* [PATCHv16 0/9] Contiguous Memory Allocator
@ 2011-10-06 13:54 ` Marek Szyprowski
From: Marek Szyprowski @ 2011-10-06 13:54 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Shariq Hasnain, Chunsang Jeong, Dave Hansen

Welcome everyone again,

Once again I decided to post an updated version of the Contiguous Memory
Allocator patches.

This version mainly provides a bugfix for a very rare issue that could
change the migration type of CMA page blocks, dropping the CMA property
from the affected page block and causing memory allocations to fail. The
issue reported by Dave Hansen has also been fixed.

This version also introduces basic support for the x86 architecture,
which allows wide testing on KVM/QEMU emulators and common x86 boxes. I
hope this will result in broader testing, more comments and easier
merging to mainline.

I've also dropped the example patch for s5p-fimc platform device private
memory declaration and replaced it with a real-life one: CMA device
private memory regions are defined for the s5p-mfc device so it can
allocate buffers from two memory banks.

The ARM integration code has not changed since the last version; it
implements all the ideas that were discussed during the Linaro Sprint
meeting. Here are the details:

  This version provides a solution for complete integration of CMA into
  the DMA mapping subsystem on the ARM architecture. The issue caused by
  double mapping of DMA pages and possible aliasing in the coherent
  memory mapping has finally been resolved, both for the GFP_ATOMIC case
  (allocations come from the coherent memory pool) and the non-GFP_ATOMIC
  case (allocations come from CMA managed areas).

  For coherent, nommu, ARMv4 and ARMv5 systems the current DMA-mapping
  implementation has been kept.

  For ARMv6+ systems, CMA has been enabled and a special pool of coherent
  memory for atomic allocations has been created. The size of this pool
  defaults to DEFAULT_CONSISTENT_DMA_SIZE/8, but can be changed with the
  coherent_pool kernel parameter (if really required).
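
  For example, booting with the following kernel command line option
  (the size value here is purely illustrative) would set the atomic pool
  to 2 MiB:

      coherent_pool=2M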

  All atomic allocations are served from this pool. I've made one small
  simplification here: there is no separate pool for writecombine memory;
  such requests are also served from the coherent pool. I don't think
  this simplification is a problem, since I found no driver that uses
  dma_alloc_writecombine with the GFP_ATOMIC flag.

  All non-atomic allocations are served from the CMA area. The kernel
  mapping is updated to reflect the required memory attribute changes.
  This is possible because during early boot all CMA areas are remapped
  with 4KiB pages in kernel low memory.
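
  From a driver's point of view nothing changes; as a minimal sketch
  (not taken from the patchset), the GFP flag alone decides which
  backend serves a dma_alloc_coherent() request:

      /* dev is the driver's struct device pointer */
      dma_addr_t dma1, dma2;

      /* may sleep: on ARMv6+ the buffer is carved out of a CMA area */
      void *big = dma_alloc_coherent(dev, SZ_1M, &dma1, GFP_KERNEL);

      /* atomic context: served from the small coherent pool */
      void *small = dma_alloc_coherent(dev, SZ_4K, &dma2, GFP_ATOMIC);

      dma_free_coherent(dev, SZ_4K, small, dma2);
      dma_free_coherent(dev, SZ_1M, big, dma1);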

  This version has been tested on the Samsung S5PC110 based Goni machine
  and the Exynos4 UniversalC210 board with various V4L2 multimedia
  drivers.

  Coherent atomic allocations have been tested by manually enabling DMA
  bounce for the s3c-sdhci device.

All patches are prepared for Linux Kernel next-20111005, which is based
on v3.1-rc8.

I hope that patches 1-7 can first be merged into the linux-mm kernel tree
to enable testing them in linux-next. Then the ARM-related patches 8-9
can be scheduled for merging.

A few words for those who see CMA for the first time:

   The Contiguous Memory Allocator (CMA) makes it possible for device
   drivers to allocate big contiguous chunks of memory after the system
   has booted. 

   The main difference from similar frameworks is that CMA allows the
   memory region reserved for big-chunk allocations to be transparently
   reused as system memory, so no memory is wasted while no big chunk is
   allocated. Once an allocation request is issued, the framework
   migrates system pages away to create the required big chunk of
   physically contiguous memory.
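
   At the allocator level this is exposed through a small API (the
   function names come from the changelog below; the exact signatures
   are my assumption and may differ slightly from the actual patches):

       /* allocate 1 MiB (256 pages) of physically contiguous memory,
        * aligned to 2^8 pages; movable pages are migrated out first */
       struct page *pages = dma_alloc_from_contiguous(dev, 256, 8);

       /* return the range so it can back ordinary system memory again */
       dma_release_from_contiguous(dev, pages, 256);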

   For more information, see the LWN articles
   http://lwn.net/Articles/447405/ and http://lwn.net/Articles/450286/,
   as well as the links to previous versions of the CMA patchset below.

   The CMA framework was initially developed by Michal Nazarewicz at
   Samsung Poland R&D Center. Since version 9, I have taken over the
   development, as Michal has left the company.

TODO (optional):
- implement support for contiguous memory areas placed in HIGHMEM zone

Best regards
Marek Szyprowski
Samsung Poland R&D Center


Links to previous versions of the patchset:
v15: <http://www.spinics.net/lists/linux-mm/msg23365.html>
v14: <http://www.spinics.net/lists/linux-media/msg36536.html>
v13: (internal, intentionally not released)
v12: <http://www.spinics.net/lists/linux-media/msg35674.html>
v11: <http://www.spinics.net/lists/linux-mm/msg21868.html>
v10: <http://www.spinics.net/lists/linux-mm/msg20761.html>
 v9: <http://article.gmane.org/gmane.linux.kernel.mm/60787>
 v8: <http://article.gmane.org/gmane.linux.kernel.mm/56855>
 v7: <http://article.gmane.org/gmane.linux.kernel.mm/55626>
 v6: <http://article.gmane.org/gmane.linux.kernel.mm/55626>
 v5: (intentionally left out as CMA v5 was identical to CMA v4)
 v4: <http://article.gmane.org/gmane.linux.kernel.mm/52010>
 v3: <http://article.gmane.org/gmane.linux.kernel.mm/51573>
 v2: <http://article.gmane.org/gmane.linux.kernel.mm/50986>
 v1: <http://article.gmane.org/gmane.linux.kernel.mm/50669>


Changelog:

v16:
    1. merged a fixup from Michal Nazarewicz to address comments from Dave
       Hansen about checking if pfns belong to the same memory zone

    2. merged a fix from Michal Nazarewicz for incorrect handling of pages
       that belong to a page block in MIGRATE_ISOLATE state; in very rare
       cases this bug could change the migrate type of a page block from
       MIGRATE_CMA to MIGRATE_MOVABLE

    3. moved some common code to include/asm-generic

    4. added support for the x86 DMA-mapping framework (pci-dma hardware),
       so CMA can now be tested even more widely on KVM/QEMU and on many
       common x86 boxes

    5. rebased onto next-20111005 kernel tree, which includes changes in ARM
       DMA-mapping subsystem (CONSISTENT_DMA_SIZE removal)

    6. removed the patch for s5p-fimc device private CMA regions (it served
       only as an example) and provided one that matches a real-life case:
       the s5p-mfc device

v15:
    1. fixed calculation of the total memory after activating a CMA area
       (broken since v12)

    2. more code cleanup in drivers/base/dma-contiguous.c

    3. added address limit for default CMA area

    4. rewrote ARM DMA integration:
	- removed "ARM: DMA: steal memory for DMA coherent mappings" patch
	- kept current DMA mapping implementation for coherent, nommu and
	  ARMv4/ARMv5 systems
	- enabled CMA for all ARMv6+ systems
	- added separate, small pool for coherent atomic allocations, defaults
	  to CONSISTENT_DMA_SIZE/8, but can be changed with kernel parameter
	  coherent_pool=[size]

v14:
    1. Merged with "ARM: DMA: steal memory for DMA coherent mappings" 
       patch, added support for GFP_ATOMIC allocations.

    2. Added checks for NULL device pointer

v13: (internal, intentionally not released)

v12:
    1. Fixed 2 nasty bugs in dma-contiguous allocator:
       - alignment argument was not passed correctly
       - range for dma_release_from_contiguous was not checked correctly

    2. Added support for an architecture-specific
       dma_contiguous_early_fixup() function

    3. CMA and DMA-mapping integration for the ARM architecture has been
       rewritten to take care of the memory aliasing issue that can occur
       on newer ARM CPUs (mapping the same pages with different cache
       attributes is forbidden). TODO: add support for GFP_ATOMIC
       allocations based on the "ARM: DMA: steal memory for DMA coherent
       mappings" patch and implement support for contiguous memory areas
       placed in the HIGHMEM zone

v11:
    1. Removed genalloc usage and replaced it with direct calls to
       bitmap_* functions, dropped patches that are not needed
       anymore (genalloc extensions)

    2. Moved all contiguous area management code from mm/cma.c
       to drivers/base/dma-contiguous.c

    3. Renamed cm_alloc/free to dma_alloc/release_from_contiguous

    4. Introduced global, system wide (default) contiguous area
       configured with kernel config and kernel cmdline parameters

    5. Simplified initialization to just one function:
       dma_declare_contiguous()

    6. Added example of device private memory contiguous area

v10:
    1. Rebased onto 3.0-rc2 and resolved all conflicts

    2. Simplified CMA to be just a pure memory allocator, for use
       with platform/bus specific subsystems, like dma-mapping.
       Removed all device-specific functions and calls.

    3. Integrated with ARM DMA-mapping subsystem.

    4. Code cleanup here and there.

    5. Removed private context support.

v9: 1. Rebased onto 2.6.39-rc1 and resolved all conflicts

    2. Fixed a bunch of nasty bugs that happened when the allocation
       failed (mainly kernel oops due to NULL ptr dereference).

    3. Introduced testing code: cma-regions compatibility layer and
       videobuf2-cma memory allocator module.

v8: 1. The alloc_contig_range() function has now been separated from
       CMA and put in mm/page_alloc.c.  This function tries to migrate
       all LRU pages in the specified range and then allocate the
       range using alloc_contig_freed_pages().

    2. Support for MIGRATE_CMA has been separated from the CMA code.
       I have not tested if CMA works with ZONE_MOVABLE but I see no
       reasons why it shouldn't.

    3. I have added a @private argument when creating CMA contexts so
       that one can reserve memory and not share it with the rest of
       the system.  This way, CMA acts only as an allocation algorithm.

v7: 1. A lot of the functionality that handled the driver->allocator_context
       mapping has been removed from the patchset.  This is not to say
       that this code is not needed; it's just not worth posting
       everything in one patchset.

       Currently, CMA is "just" an allocator.  It uses its own
       migratetype (MIGRATE_CMA) for defining ranges of pageblocks
       which behave just like ZONE_MOVABLE but, unlike the latter, can
       be put in arbitrary places.

    2. The migration code that was introduced in the previous version
       actually started working.


v6: 1. Most importantly, v6 introduces support for memory migration.
       The implementation is not yet complete though.

       Migration support means that when CMA is not using the memory
       reserved for it, the page allocator can allocate pages from it.
       When CMA wants to use the memory, the pages have to be moved
       and/or evicted to make room for CMA.

       To make this possible, it must be guaranteed that only movable
       and reclaimable pages are allocated in CMA-controlled regions.
       This is done by introducing a MIGRATE_CMA migrate type that
       guarantees exactly that.

       Some of the migration code is "borrowed" from Kamezawa
       Hiroyuki's alloc_contig_pages() implementation.  The main
       difference is that, thanks to the MIGRATE_CMA migrate type, CMA
       assumes that the memory it controls is always movable or
       reclaimable, so it makes allocation decisions regardless of
       whether some pages are actually allocated, and migrates them
       if needed.

       The most interesting patches from the patchset that implement
       the functionality are:

         09/13: mm: alloc_contig_free_pages() added
         10/13: mm: MIGRATE_CMA migration type added
         11/13: mm: MIGRATE_CMA isolation functions added
         12/13: mm: cma: Migration support added [wip]

       Currently, kernel panics in some situations which I am trying
       to investigate.

    2. cma_pin() and cma_unpin() functions have been added (after
       a conversation with Johan Mossberg).  The idea is that whenever
       the hardware is not using the memory (no transaction is in
       flight) the chunk can be moved around.  This would allow
       defragmentation to be implemented if desired.  No defragmentation
       algorithm is provided at this time.

    3. Sysfs support has been replaced with debugfs.  I always felt
       unsure about the sysfs interface, and when Greg KH pointed it
       out I finally got around to rewriting it to use debugfs.


v5: (intentionally left out as CMA v5 was identical to CMA v4)


v4: 1. The "asterisk" flag has been removed in favour of requiring
       that platform will provide a "*=<regions>" rule in the map
       attribute.

    2. The terminology has been changed slightly, renaming "kind" of
       memory to "type" of memory.  In the previous revisions, the
       documentation indicated that device drivers define memory kinds
       and now,

v3: 1. The command line parameters have been removed (and moved to
       a separate patch, the fourth one).  As a consequence, the
       cma_set_defaults() function has been changed -- it no longer
       accepts a string with a list of regions but an array of regions.

    2. The "asterisk" attribute has been removed.  Now, each region
       has an "asterisk" flag which lets one specify whether this
       region should by considered "asterisk" region.

    3. SysFS support has been moved to a separate patch (the third one
       in the series) and now also includes a list of regions.

v2: 1. The "cma_map" command line have been removed.  In exchange,
       a SysFS entry has been created under kernel/mm/contiguous.

       The intended way of specifying the attributes is
       a cma_set_defaults() function called by platform initialisation
       code.  The "regions" attribute (the string specified by the "cma"
       command line parameter) can be overridden with a command line
       parameter; the other attributes can be changed at run time using
       the SysFS entries.

    2. The behaviour of the "map" attribute has been modified
       slightly.  Currently, if no rule matches a given device, it is
       assigned the regions specified by the "asterisk" attribute, which
       is by default built from the region names given in the "regions"
       attribute.

    3. Devices can register private regions as well as regions that
       can be shared but are not reserved using standard CMA
       mechanisms.  A private region has no name and can be accessed
       only by devices that have the pointer to it.

    4. The way allocators are registered has changed.  Currently,
       a cma_allocator_register() function is used for that purpose.
       Moreover, allocators are attached to regions the first time
       memory is registered from the region or when the allocator is
       registered, which means that allocators can be dynamic modules
       loaded after the kernel has booted (of course, it won't be
       possible to allocate a chunk of memory from a region if its
       allocator is not loaded).

    5. Index of new functions:

    +static inline dma_addr_t __must_check
    +cma_alloc_from(const char *regions, size_t size,
    +               dma_addr_t alignment)

    +static inline int
    +cma_info_about(struct cma_info *info, const char *regions)

    +int __must_check cma_region_register(struct cma_region *reg);

    +dma_addr_t __must_check
    +cma_alloc_from_region(struct cma_region *reg,
    +                      size_t size, dma_addr_t alignment);

    +static inline dma_addr_t __must_check
    +cma_alloc_from(const char *regions,
    +               size_t size, dma_addr_t alignment);

    +int cma_allocator_register(struct cma_allocator *alloc);


Patches in this patchset:

  mm: move some functions from memory_hotplug.c to page_isolation.c
  mm: alloc_contig_freed_pages() added

    Code "stolen" from Kamezawa.  The first patch just moves code
    around and the second provide function for "allocates" already
    freed memory.

  mm: alloc_contig_range() added

    This is what Kamezawa asked for: a function that tries to migrate
    all pages from a given range and then uses alloc_contig_freed_pages()
    (defined by the previous commit) to allocate those pages.
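
    A hedged sketch of the intended call (the exact prototype is my
    assumption and may differ from the version in this patchset):

        /* migrate whatever is movable out of the pfn range
         * [start_pfn, end_pfn) and hand the whole range to the caller;
         * returns 0 on success */
        int err = alloc_contig_range(start_pfn, end_pfn, MIGRATE_CMA);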

  mm: MIGRATE_CMA migration type added
  mm: MIGRATE_CMA isolation functions added

    Introduction of the new migratetype and support for it in CMA.
    MIGRATE_CMA works similarly to ZONE_MOVABLE, except that almost any
    memory range can be marked as such.

  mm: cma: Contiguous Memory Allocator added

    The core CMA code.  It manages CMA contexts and performs memory
    allocations.

  X86: integrate CMA with DMA-mapping subsystem
  ARM: integrate CMA with dma-mapping subsystem

    The main clients of the CMA framework.  CMA serves as an
    alloc_pages() replacement.

  ARM: Samsung: use CMA for 2 memory banks for s5p-mfc device

    Use CMA device private memory regions instead of a custom solution
    based on memblock_reserve() + dma_declare_coherent().
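
    As a rough illustration (the device names, sizes and bank base
    addresses below are hypothetical, and the parameter order of
    dma_declare_contiguous() is assumed), the per-bank reservation boils
    down to two calls made from early platform code, one per memory bank:

        /* an 8 MiB private CMA region in each bank for the MFC device;
         * a limit of 0 is taken to mean "no address limit" */
        dma_declare_contiguous(&s5p_device_mfc_l.dev, SZ_8M, 0x43000000, 0);
        dma_declare_contiguous(&s5p_device_mfc_r.dev, SZ_8M, 0x51000000, 0);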


Patch summary:

KAMEZAWA Hiroyuki (2):
  mm: move some functions from memory_hotplug.c to page_isolation.c
  mm: alloc_contig_freed_pages() added

Marek Szyprowski (4):
  drivers: add Contiguous Memory Allocator
  ARM: integrate CMA with DMA-mapping subsystem
  ARM: Samsung: use CMA for 2 memory banks for s5p-mfc device
  X86: integrate CMA with DMA-mapping subsystem

Michal Nazarewicz (3):
  mm: alloc_contig_range() added
  mm: MIGRATE_CMA migration type added
  mm: MIGRATE_CMA isolation functions added

 arch/Kconfig                          |    3 +
 arch/arm/Kconfig                      |    2 +
 arch/arm/include/asm/dma-contiguous.h |   16 ++
 arch/arm/include/asm/mach/map.h       |    1 +
 arch/arm/mm/dma-mapping.c             |  362 +++++++++++++++++++++++++------
 arch/arm/mm/init.c                    |    8 +
 arch/arm/mm/mm.h                      |    3 +
 arch/arm/mm/mmu.c                     |   29 ++-
 arch/arm/plat-s5p/dev-mfc.c           |   51 +----
 arch/x86/Kconfig                      |    1 +
 arch/x86/include/asm/dma-contiguous.h |   13 +
 arch/x86/include/asm/dma-mapping.h    |    4 +
 arch/x86/kernel/pci-dma.c             |   18 ++-
 arch/x86/kernel/pci-nommu.c           |    8 +-
 arch/x86/kernel/setup.c               |    2 +
 drivers/base/Kconfig                  |   79 +++++++
 drivers/base/Makefile                 |    1 +
 drivers/base/dma-contiguous.c         |  386 +++++++++++++++++++++++++++++++++
 include/asm-generic/dma-contiguous.h  |   27 +++
 include/linux/device.h                |    4 +
 include/linux/dma-contiguous.h        |  106 +++++++++
 include/linux/mmzone.h                |   57 +++++-
 include/linux/page-isolation.h        |   53 ++++-
 mm/Kconfig                            |    8 +-
 mm/compaction.c                       |   10 +
 mm/memory_hotplug.c                   |  111 ----------
 mm/page_alloc.c                       |  317 +++++++++++++++++++++++++--
 mm/page_isolation.c                   |  131 +++++++++++-
 28 files changed, 1522 insertions(+), 289 deletions(-)
 create mode 100644 arch/arm/include/asm/dma-contiguous.h
 create mode 100644 arch/x86/include/asm/dma-contiguous.h
 create mode 100644 drivers/base/dma-contiguous.c
 create mode 100644 include/asm-generic/dma-contiguous.h
 create mode 100644 include/linux/dma-contiguous.h

-- 
1.7.1.569.g6f426


* [PATCH 1/9] mm: move some functions from memory_hotplug.c to page_isolation.c
  2011-10-06 13:54 ` Marek Szyprowski
  (?)
@ 2011-10-06 13:54   ` Marek Szyprowski
  -1 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-06 13:54 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Shariq Hasnain, Chunsang Jeong, Dave Hansen

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Memory hotplug contains logic for making pages unused in a specified
range of pfns, so some of its core logic can be reused for other
purposes, such as allocating a very large contiguous memory block.

This patch moves some functions from mm/memory_hotplug.c to
mm/page_isolation.c. This helps add a function for large allocations
to page_isolation.c that uses the memory-unplug technique.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
[m.nazarewicz: reworded commit message]
Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
[m.szyprowski: rebased and updated to Linux v3.0-rc1]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 include/linux/page-isolation.h |    7 +++
 mm/memory_hotplug.c            |  111 --------------------------------------
 mm/page_isolation.c            |  114 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 121 insertions(+), 111 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 051c1b1..58cdbac 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -33,5 +33,12 @@ test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
 extern int set_migratetype_isolate(struct page *page);
 extern void unset_migratetype_isolate(struct page *page);
 
+/*
+ * For migration.
+ */
+
+int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn);
+unsigned long scan_lru_pages(unsigned long start, unsigned long end);
+int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
 
 #endif
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 6e7d8b2..3419dd6 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -707,117 +707,6 @@ int is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
 }
 
 /*
- * Confirm all pages in a range [start, end) is belongs to the same zone.
- */
-static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
-{
-	unsigned long pfn;
-	struct zone *zone = NULL;
-	struct page *page;
-	int i;
-	for (pfn = start_pfn;
-	     pfn < end_pfn;
-	     pfn += MAX_ORDER_NR_PAGES) {
-		i = 0;
-		/* This is just a CONFIG_HOLES_IN_ZONE check.*/
-		while ((i < MAX_ORDER_NR_PAGES) && !pfn_valid_within(pfn + i))
-			i++;
-		if (i == MAX_ORDER_NR_PAGES)
-			continue;
-		page = pfn_to_page(pfn + i);
-		if (zone && page_zone(page) != zone)
-			return 0;
-		zone = page_zone(page);
-	}
-	return 1;
-}
-
-/*
- * Scanning pfn is much easier than scanning lru list.
- * Scan pfn from start to end and Find LRU page.
- */
-static unsigned long scan_lru_pages(unsigned long start, unsigned long end)
-{
-	unsigned long pfn;
-	struct page *page;
-	for (pfn = start; pfn < end; pfn++) {
-		if (pfn_valid(pfn)) {
-			page = pfn_to_page(pfn);
-			if (PageLRU(page))
-				return pfn;
-		}
-	}
-	return 0;
-}
-
-static struct page *
-hotremove_migrate_alloc(struct page *page, unsigned long private, int **x)
-{
-	/* This should be improooooved!! */
-	return alloc_page(GFP_HIGHUSER_MOVABLE);
-}
-
-#define NR_OFFLINE_AT_ONCE_PAGES	(256)
-static int
-do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
-{
-	unsigned long pfn;
-	struct page *page;
-	int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
-	int not_managed = 0;
-	int ret = 0;
-	LIST_HEAD(source);
-
-	for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
-		if (!pfn_valid(pfn))
-			continue;
-		page = pfn_to_page(pfn);
-		if (!get_page_unless_zero(page))
-			continue;
-		/*
-		 * We can skip free pages. And we can only deal with pages on
-		 * LRU.
-		 */
-		ret = isolate_lru_page(page);
-		if (!ret) { /* Success */
-			put_page(page);
-			list_add_tail(&page->lru, &source);
-			move_pages--;
-			inc_zone_page_state(page, NR_ISOLATED_ANON +
-					    page_is_file_cache(page));
-
-		} else {
-#ifdef CONFIG_DEBUG_VM
-			printk(KERN_ALERT "removing pfn %lx from LRU failed\n",
-			       pfn);
-			dump_page(page);
-#endif
-			put_page(page);
-			/* Because we don't have big zone->lock. we should
-			   check this again here. */
-			if (page_count(page)) {
-				not_managed++;
-				ret = -EBUSY;
-				break;
-			}
-		}
-	}
-	if (!list_empty(&source)) {
-		if (not_managed) {
-			putback_lru_pages(&source);
-			goto out;
-		}
-		/* this function returns # of failed pages */
-		ret = migrate_pages(&source, hotremove_migrate_alloc, 0,
-								true, true);
-		if (ret)
-			putback_lru_pages(&source);
-	}
-out:
-	return ret;
-}
-
-/*
  * remove from free_area[] and mark all as Reserved.
  */
 static int
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 4ae42bb..270a026 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -5,6 +5,9 @@
 #include <linux/mm.h>
 #include <linux/page-isolation.h>
 #include <linux/pageblock-flags.h>
+#include <linux/memcontrol.h>
+#include <linux/migrate.h>
+#include <linux/mm_inline.h>
 #include "internal.h"
 
 static inline struct page *
@@ -139,3 +142,114 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
 	spin_unlock_irqrestore(&zone->lock, flags);
 	return ret ? 0 : -EBUSY;
 }
+
+
+/*
+ * Confirm all pages in a range [start, end) is belongs to the same zone.
+ */
+int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long pfn;
+	struct zone *zone = NULL;
+	struct page *page;
+	int i;
+	for (pfn = start_pfn;
+	     pfn < end_pfn;
+	     pfn += MAX_ORDER_NR_PAGES) {
+		i = 0;
+		/* This is just a CONFIG_HOLES_IN_ZONE check.*/
+		while ((i < MAX_ORDER_NR_PAGES) && !pfn_valid_within(pfn + i))
+			i++;
+		if (i == MAX_ORDER_NR_PAGES)
+			continue;
+		page = pfn_to_page(pfn + i);
+		if (zone && page_zone(page) != zone)
+			return 0;
+		zone = page_zone(page);
+	}
+	return 1;
+}
+
+/*
+ * Scanning pfn is much easier than scanning lru list.
+ * Scan pfn from start to end and Find LRU page.
+ */
+unsigned long scan_lru_pages(unsigned long start, unsigned long end)
+{
+	unsigned long pfn;
+	struct page *page;
+	for (pfn = start; pfn < end; pfn++) {
+		if (pfn_valid(pfn)) {
+			page = pfn_to_page(pfn);
+			if (PageLRU(page))
+				return pfn;
+		}
+	}
+	return 0;
+}
+
+struct page *
+hotremove_migrate_alloc(struct page *page, unsigned long private, int **x)
+{
+	/* This should be improooooved!! */
+	return alloc_page(GFP_HIGHUSER_MOVABLE);
+}
+
+#define NR_OFFLINE_AT_ONCE_PAGES	(256)
+int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long pfn;
+	struct page *page;
+	int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
+	int not_managed = 0;
+	int ret = 0;
+	LIST_HEAD(source);
+
+	for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
+		if (!pfn_valid(pfn))
+			continue;
+		page = pfn_to_page(pfn);
+		if (!get_page_unless_zero(page))
+			continue;
+		/*
+		 * We can skip free pages. And we can only deal with pages on
+		 * LRU.
+		 */
+		ret = isolate_lru_page(page);
+		if (!ret) { /* Success */
+			put_page(page);
+			list_add_tail(&page->lru, &source);
+			move_pages--;
+			inc_zone_page_state(page, NR_ISOLATED_ANON +
+					    page_is_file_cache(page));
+
+		} else {
+#ifdef CONFIG_DEBUG_VM
+			printk(KERN_ALERT "removing pfn %lx from LRU failed\n",
+			       pfn);
+			dump_page(page);
+#endif
+			put_page(page);
+			/* Because we don't have big zone->lock. we should
+			   check this again here. */
+			if (page_count(page)) {
+				not_managed++;
+				ret = -EBUSY;
+				break;
+			}
+		}
+	}
+	if (!list_empty(&source)) {
+		if (not_managed) {
+			putback_lru_pages(&source);
+			goto out;
+		}
+		/* this function returns # of failed pages */
+		ret = migrate_pages(&source, hotremove_migrate_alloc, 0,
+								true, true);
+		if (ret)
+			putback_lru_pages(&source);
+	}
+out:
+	return ret;
+}
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 180+ messages in thread
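
The helpers made global by this patch are consumed later in the series.
Below is a condensed sketch of how alloc_contig_range() (patch 3/9)
drives them, with the retry accounting trimmed for brevity; the function
name here is illustrative and not part of the patch.

        #include <linux/gfp.h>
        #include <linux/page-isolation.h>
        #include <linux/sched.h>
        #include <linux/swap.h>

        /* Migrate every LRU page out of [start, end).  The real caller
         * additionally limits the number of -EBUSY retries and drains
         * the per-cpu lists again between attempts. */
        static int example_migrate_away(unsigned long start, unsigned long end)
        {
                unsigned long pfn = start;
                int ret;

                lru_add_drain_all();    /* flush pagevecs onto the LRU */
                drain_all_pages();      /* flush per-cpu free lists */

                for (;;) {
                        pfn = scan_lru_pages(pfn, end);
                        if (!pfn || pfn >= end)
                                break;
                        ret = do_migrate_range(pfn, end);
                        if (ret && ret != -EBUSY)
                                return ret;
                        cond_resched();
                }
                return 0;
        }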


* [PATCH 2/9] mm: alloc_contig_freed_pages() added
  2011-10-06 13:54 ` Marek Szyprowski
  (?)
@ 2011-10-06 13:54   ` Marek Szyprowski
  -1 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-06 13:54 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Shariq Hasnain, Chunsang Jeong, Dave Hansen

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

This commit introduces the alloc_contig_freed_pages() function,
which allocates (i.e. removes from the buddy system) free pages
in a range.  The caller has to guarantee that all pages in the
range are in the buddy system.

Along with this function, a free_contig_pages() function is
provided which frees all (or a subset) of the pages allocated
with alloc_contig_freed_pages().

Michal Nazarewicz has modified the function to make it easier to
allocate pages that are not MAX_ORDER_NR_PAGES aligned, by making
it return the pfn of the page one past the last allocated one.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
[m.nazarewicz: added checks if all allocated pages comes from the
same memory zone]
Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
[m.szyprowski: fixed wrong condition in VM_BUG_ON assert]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 include/linux/mmzone.h         |   16 +++++++++
 include/linux/page-isolation.h |    5 +++
 mm/page_alloc.c                |   67 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 88 insertions(+), 0 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index a2760bb..862a834 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1168,6 +1168,22 @@ static inline int memmap_valid_within(unsigned long pfn,
 }
 #endif /* CONFIG_ARCH_HAS_HOLES_MEMORYMODEL */
 
+#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
+/*
+ * Both PFNs must be from the same zone!  If this function returns
+ * true, pfn_to_page(pfn1) + (pfn2 - pfn1) == pfn_to_page(pfn2).
+ */
+static inline bool zone_pfn_same_memmap(unsigned long pfn1, unsigned long pfn2)
+{
+	return pfn_to_section_nr(pfn1) == pfn_to_section_nr(pfn2);
+}
+
+#else
+
+#define zone_pfn_same_memmap(pfn1, pfn2) (true)
+
+#endif
+
 #endif /* !__GENERATING_BOUNDS.H */
 #endif /* !__ASSEMBLY__ */
 #endif /* _LINUX_MMZONE_H */
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 58cdbac..b9fc428 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -33,6 +33,11 @@ test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
 extern int set_migratetype_isolate(struct page *page);
 extern void unset_migratetype_isolate(struct page *page);
 
+/* The below functions must be run on a range from a single zone. */
+extern unsigned long alloc_contig_freed_pages(unsigned long start,
+					      unsigned long end, gfp_t flag);
+extern void free_contig_pages(unsigned long pfn, unsigned nr_pages);
+
 /*
  * For migration.
  */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bf4399a..fbfb920 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5706,6 +5706,73 @@ out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
 
+unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
+				       gfp_t flag)
+{
+	unsigned long pfn = start, count;
+	struct page *page;
+	struct zone *zone;
+	int order;
+
+	VM_BUG_ON(!pfn_valid(start));
+	page = pfn_to_page(start);
+	zone = page_zone(page);
+
+	spin_lock_irq(&zone->lock);
+
+	for (;;) {
+		VM_BUG_ON(page_count(page) || !PageBuddy(page) ||
+			  page_zone(page) != zone);
+
+		list_del(&page->lru);
+		order = page_order(page);
+		count = 1UL << order;
+		zone->free_area[order].nr_free--;
+		rmv_page_order(page);
+		__mod_zone_page_state(zone, NR_FREE_PAGES, -(long)count);
+
+		pfn += count;
+		if (pfn >= end)
+			break;
+		VM_BUG_ON(!pfn_valid(pfn));
+
+		if (zone_pfn_same_memmap(pfn - count, pfn))
+			page += count;
+		else
+			page = pfn_to_page(pfn);
+	}
+
+	spin_unlock_irq(&zone->lock);
+
+	/* After this, pages in the range can be freed one be one */
+	count = pfn - start;
+	pfn = start;
+	for (page = pfn_to_page(pfn); count; --count) {
+		prep_new_page(page, 0, flag);
+		++pfn;
+		if (likely(zone_pfn_same_memmap(pfn - 1, pfn)))
+			++page;
+		else
+			page = pfn_to_page(pfn);
+	}
+
+	return pfn;
+}
+
+void free_contig_pages(unsigned long pfn, unsigned nr_pages)
+{
+	struct page *page = pfn_to_page(pfn);
+
+	while (nr_pages--) {
+		__free_page(page);
+		++pfn;
+		if (likely(zone_pfn_same_memmap(pfn - 1, pfn)))
+			++page;
+		else
+			page = pfn_to_page(pfn);
+	}
+}
+
 #ifdef CONFIG_MEMORY_HOTREMOVE
 /*
  * All pages in the range must be isolated before calling this.
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 180+ messages in thread
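
A short sketch of the calling convention this commit establishes,
mirroring the caller added by the next patch in the series.  The helper
name and the way its arguments are obtained are illustrative only.

        #include <linux/gfp.h>
        #include <linux/page-isolation.h>

        /* Every page in [outer_start, end) must already be free and
         * isolated, and outer_start must be the first pfn of a free
         * buddy block.  alloc_contig_freed_pages() may allocate past
         * 'end' (up to the order of the last buddy block it takes) and
         * returns the pfn one past the last page actually allocated,
         * so the caller trims the head and tail it did not want. */
        static void example_take_range(unsigned long outer_start,
                                       unsigned long start,
                                       unsigned long end, gfp_t flags)
        {
                unsigned long outer_end;

                outer_end = alloc_contig_freed_pages(outer_start, end, flags);

                if (start != outer_start)
                        free_contig_pages(outer_start, start - outer_start);
                if (end != outer_end)
                        free_contig_pages(end, outer_end - end);
        }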


* [PATCH 3/9] mm: alloc_contig_range() added
  2011-10-06 13:54 ` Marek Szyprowski
  (?)
@ 2011-10-06 13:54   ` Marek Szyprowski
  -1 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-06 13:54 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Shariq Hasnain, Chunsang Jeong, Dave Hansen

From: Michal Nazarewicz <m.nazarewicz@samsung.com>

This commit adds the alloc_contig_range() function, which tries
to allocate a given range of pages.  It first tries to migrate all
already-allocated pages that fall in the range, thus freeing them.
Once all pages in the range are free, they are removed from the
buddy system and are thus allocated for the caller to use.

Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
[m.szyprowski: renamed some variables for easier code reading]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 include/linux/page-isolation.h |    2 +
 mm/page_alloc.c                |  148 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 150 insertions(+), 0 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index b9fc428..774ecec 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -36,6 +36,8 @@ extern void unset_migratetype_isolate(struct page *page);
 /* The below functions must be run on a range from a single zone. */
 extern unsigned long alloc_contig_freed_pages(unsigned long start,
 					      unsigned long end, gfp_t flag);
+extern int alloc_contig_range(unsigned long start, unsigned long end,
+			      gfp_t flags);
 extern void free_contig_pages(unsigned long pfn, unsigned nr_pages);
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fbfb920..8010854 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5773,6 +5773,154 @@ void free_contig_pages(unsigned long pfn, unsigned nr_pages)
 	}
 }
 
+static unsigned long pfn_to_maxpage(unsigned long pfn)
+{
+	return pfn & ~(MAX_ORDER_NR_PAGES - 1);
+}
+
+static unsigned long pfn_to_maxpage_up(unsigned long pfn)
+{
+	return ALIGN(pfn, MAX_ORDER_NR_PAGES);
+}
+
+#define MIGRATION_RETRY	5
+static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
+{
+	int migration_failed = 0, ret;
+	unsigned long pfn = start;
+
+	/*
+	 * Some code "borrowed" from KAMEZAWA Hiroyuki's
+	 * __alloc_contig_pages().
+	 */
+
+	/* drop all pages in pagevec and pcp list */
+	lru_add_drain_all();
+	drain_all_pages();
+
+	for (;;) {
+		pfn = scan_lru_pages(pfn, end);
+		if (!pfn || pfn >= end)
+			break;
+
+		ret = do_migrate_range(pfn, end);
+		if (!ret) {
+			migration_failed = 0;
+		} else if (ret != -EBUSY
+			|| ++migration_failed >= MIGRATION_RETRY) {
+			return ret;
+		} else {
+			/* There are unstable pages.on pagevec. */
+			lru_add_drain_all();
+			/*
+			 * there may be pages on pcplist before
+			 * we mark the range as ISOLATED.
+			 */
+			drain_all_pages();
+		}
+		cond_resched();
+	}
+
+	if (!migration_failed) {
+		/* drop all pages in pagevec and pcp list */
+		lru_add_drain_all();
+		drain_all_pages();
+	}
+
+	/* Make sure all pages are isolated */
+	if (WARN_ON(test_pages_isolated(start, end)))
+		return -EBUSY;
+
+	return 0;
+}
+
+/**
+ * alloc_contig_range() -- tries to allocate given range of pages
+ * @start:	start PFN to allocate
+ * @end:	one-past-the-last PFN to allocate
+ * @flags:	flags passed to alloc_contig_freed_pages().
+ *
+ * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
+ * aligned, however it is the caller's responsibility to guarantee that we
+ * are the only thread that changes migrate type of pageblocks the
+ * pages fall in.
+ *
+ * Returns zero on success or negative error code.  On success all
+ * pages which PFN is in (start, end) are allocated for the caller and
+ * need to be freed with free_contig_pages().
+ */
+int alloc_contig_range(unsigned long start, unsigned long end,
+		       gfp_t flags)
+{
+	unsigned long outer_start, outer_end;
+	int ret;
+
+	/*
+	 * What we do here is we mark all pageblocks in range as
+	 * MIGRATE_ISOLATE.  Because of the way page allocator work, we
+	 * align the range to MAX_ORDER pages so that page allocator
+	 * won't try to merge buddies from different pageblocks and
+	 * change MIGRATE_ISOLATE to some other migration type.
+	 *
+	 * Once the pageblocks are marked as MIGRATE_ISOLATE, we
+	 * migrate the pages from an unaligned range (ie. pages that
+	 * we are interested in).  This will put all the pages in
+	 * range back to page allocator as MIGRATE_ISOLATE.
+	 *
+	 * When this is done, we take the pages in range from page
+	 * allocator removing them from the buddy system.  This way
+	 * page allocator will never consider using them.
+	 *
+	 * This lets us mark the pageblocks back as
+	 * MIGRATE_CMA/MIGRATE_MOVABLE so that free pages in the
+	 * MAX_ORDER aligned range but not in the unaligned, original
+	 * range are put back to page allocator so that buddy can use
+	 * them.
+	 */
+
+	ret = start_isolate_page_range(pfn_to_maxpage(start),
+				       pfn_to_maxpage_up(end));
+	if (ret)
+		goto done;
+
+	ret = __alloc_contig_migrate_range(start, end);
+	if (ret)
+		goto done;
+
+	/*
+	 * Pages from [start, end) are within a MAX_ORDER_NR_PAGES
+	 * aligned blocks that are marked as MIGRATE_ISOLATE.  What's
+	 * more, all pages in [start, end) are free in page allocator.
+	 * What we are going to do is to allocate all pages from
+	 * [start, end) (that is remove them from page allocater).
+	 *
+	 * The only problem is that pages at the beginning and at the
+	 * end of interesting range may be not aligned with pages that
+	 * page allocator holds, ie. they can be part of higher order
+	 * pages.  Because of this, we reserve the bigger range and
+	 * once this is done free the pages we are not interested in.
+	 */
+
+	ret = 0;
+	while (!PageBuddy(pfn_to_page(start & (~0UL << ret))))
+		if (WARN_ON(++ret >= MAX_ORDER))
+			return -EINVAL;
+
+	outer_start = start & (~0UL << ret);
+	outer_end   = alloc_contig_freed_pages(outer_start, end, flags);
+
+	/* Free head and tail (if any) */
+	if (start != outer_start)
+		free_contig_pages(outer_start, start - outer_start);
+	if (end != outer_end)
+		free_contig_pages(end, outer_end - end);
+
+	ret = 0;
+done:
+	undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end));
+	return ret;
+}
+
 #ifdef CONFIG_MEMORY_HOTREMOVE
 /*
  * All pages in the range must be isolated before calling this.
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 180+ messages in thread
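
One subtle step in alloc_contig_range() above is the loop that reuses
the 'ret' variable as a buddy order while looking for the start of the
free block containing 'start'.  The following restatement is purely
illustrative (the helper name is made up and the failure convention
differs from the original, which returns -EINVAL):

        /* Walk up the buddy orders until the block that contains
         * 'start' is itself a free buddy page, then round 'start' down
         * to the first pfn of that block.  Returns 0 if no free buddy
         * block below MAX_ORDER contains 'start'. */
        static unsigned long find_buddy_block_start(unsigned long start)
        {
                unsigned int order = 0;

                while (!PageBuddy(pfn_to_page(start & (~0UL << order))))
                        if (WARN_ON(++order >= MAX_ORDER))
                                return 0;

                return start & (~0UL << order);
        }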


* [PATCH 3/9] mm: alloc_contig_range() added
@ 2011-10-06 13:54   ` Marek Szyprowski
  0 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-06 13:54 UTC (permalink / raw)
  To: linux-arm-kernel

From: Michal Nazarewicz <m.nazarewicz@samsung.com>

This commit adds the alloc_contig_range() function, which tries
to allocate a given range of pages.  It tries to migrate all
already allocated pages that fall in the range, thus freeing them.
Once all pages in the range are freed, they are removed from the
buddy system and are thus allocated for the caller to use.
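
A rough usage sketch of the new interface (not part of this patch;
the PFN base, page count and wrapper below are made up for
illustration, and the caller must own the migrate type of the
affected pageblocks as described in the kerneldoc):

  #include <linux/mm.h>
  #include <linux/gfp.h>
  #include <linux/page-isolation.h>

  /* Illustrative only: take 1024 contiguous pages and give them back. */
  static int example_grab_region(void)
  {
  	unsigned long start_pfn = 0x40000;	/* made-up base PFN */
  	unsigned long nr_pages  = 1024;		/* 4 MiB with 4 KiB pages */
  	int ret;

  	ret = alloc_contig_range(start_pfn, start_pfn + nr_pages,
  				 GFP_KERNEL);
  	if (ret)
  		return ret;	/* range busy or migration failed */

  	/* pages [start_pfn, start_pfn + nr_pages) now belong to us */

  	free_contig_pages(start_pfn, nr_pages);
  	return 0;
  }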

Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
[m.szyprowski: renamed some variables for easier code reading]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 include/linux/page-isolation.h |    2 +
 mm/page_alloc.c                |  148 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 150 insertions(+), 0 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index b9fc428..774ecec 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -36,6 +36,8 @@ extern void unset_migratetype_isolate(struct page *page);
 /* The below functions must be run on a range from a single zone. */
 extern unsigned long alloc_contig_freed_pages(unsigned long start,
 					      unsigned long end, gfp_t flag);
+extern int alloc_contig_range(unsigned long start, unsigned long end,
+			      gfp_t flags);
 extern void free_contig_pages(unsigned long pfn, unsigned nr_pages);
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fbfb920..8010854 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5773,6 +5773,154 @@ void free_contig_pages(unsigned long pfn, unsigned nr_pages)
 	}
 }
 
+static unsigned long pfn_to_maxpage(unsigned long pfn)
+{
+	return pfn & ~(MAX_ORDER_NR_PAGES - 1);
+}
+
+static unsigned long pfn_to_maxpage_up(unsigned long pfn)
+{
+	return ALIGN(pfn, MAX_ORDER_NR_PAGES);
+}
+
+#define MIGRATION_RETRY	5
+static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
+{
+	int migration_failed = 0, ret;
+	unsigned long pfn = start;
+
+	/*
+	 * Some code "borrowed" from KAMEZAWA Hiroyuki's
+	 * __alloc_contig_pages().
+	 */
+
+	/* drop all pages in pagevec and pcp list */
+	lru_add_drain_all();
+	drain_all_pages();
+
+	for (;;) {
+		pfn = scan_lru_pages(pfn, end);
+		if (!pfn || pfn >= end)
+			break;
+
+		ret = do_migrate_range(pfn, end);
+		if (!ret) {
+			migration_failed = 0;
+		} else if (ret != -EBUSY
+			|| ++migration_failed >= MIGRATION_RETRY) {
+			return ret;
+		} else {
+			/* There are unstable pages on the pagevec. */
+			lru_add_drain_all();
+			/*
+			 * there may be pages on pcplist before
+			 * we mark the range as ISOLATED.
+			 */
+			drain_all_pages();
+		}
+		cond_resched();
+	}
+
+	if (!migration_failed) {
+		/* drop all pages in pagevec and pcp list */
+		lru_add_drain_all();
+		drain_all_pages();
+	}
+
+	/* Make sure all pages are isolated */
+	if (WARN_ON(test_pages_isolated(start, end)))
+		return -EBUSY;
+
+	return 0;
+}
+
+/**
+ * alloc_contig_range() -- tries to allocate given range of pages
+ * @start:	start PFN to allocate
+ * @end:	one-past-the-last PFN to allocate
+ * @flags:	flags passed to alloc_contig_freed_pages().
+ *
+ * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
+ * aligned, however it is the caller's responsibility to guarantee that we
+ * are the only thread that changes the migrate type of the pageblocks the
+ * pages fall in.
+ *
+ * Returns zero on success or a negative error code.  On success all
+ * pages whose PFN is in [start, end) are allocated for the caller and
+ * need to be freed with free_contig_pages().
+ */
+int alloc_contig_range(unsigned long start, unsigned long end,
+		       gfp_t flags)
+{
+	unsigned long outer_start, outer_end;
+	int ret;
+
+	/*
+	 * What we do here is we mark all pageblocks in range as
+	 * MIGRATE_ISOLATE.  Because of the way the page allocator works, we
+	 * align the range to MAX_ORDER pages so that page allocator
+	 * won't try to merge buddies from different pageblocks and
+	 * change MIGRATE_ISOLATE to some other migration type.
+	 *
+	 * Once the pageblocks are marked as MIGRATE_ISOLATE, we
+	 * migrate the pages from an unaligned range (ie. pages that
+	 * we are interested in).  This will put all the pages in
+	 * range back to page allocator as MIGRATE_ISOLATE.
+	 *
+	 * When this is done, we take the pages in range from page
+	 * allocator removing them from the buddy system.  This way
+	 * page allocator will never consider using them.
+	 *
+	 * This lets us mark the pageblocks back as
+	 * MIGRATE_CMA/MIGRATE_MOVABLE so that free pages in the
+	 * MAX_ORDER aligned range but not in the unaligned, original
+	 * range are put back to page allocator so that buddy can use
+	 * them.
+	 */
+
+	ret = start_isolate_page_range(pfn_to_maxpage(start),
+				       pfn_to_maxpage_up(end));
+	if (ret)
+		goto done;
+
+	ret = __alloc_contig_migrate_range(start, end);
+	if (ret)
+		goto done;
+
+	/*
+	 * Pages from [start, end) are within a MAX_ORDER_NR_PAGES
+	 * aligned blocks that are marked as MIGRATE_ISOLATE.  What's
+	 * more, all pages in [start, end) are free in page allocator.
+	 * What we are going to do is to allocate all pages from
+	 * [start, end) (that is, remove them from the page allocator).
+	 *
+	 * The only problem is that pages at the beginning and at the
+	 * end of the interesting range may not be aligned with pages that
+	 * the page allocator holds, i.e. they can be part of higher order
+	 * pages.  Because of this, we reserve the bigger range and
+	 * once this is done free the pages we are not interested in.
+	 */
+
+	ret = 0;
+	while (!PageBuddy(pfn_to_page(start & (~0UL << ret))))
+		if (WARN_ON(++ret >= MAX_ORDER))
+			return -EINVAL;
+
+	outer_start = start & (~0UL << ret);
+	outer_end   = alloc_contig_freed_pages(outer_start, end, flags);
+
+	/* Free head and tail (if any) */
+	if (start != outer_start)
+		free_contig_pages(outer_start, start - outer_start);
+	if (end != outer_end)
+		free_contig_pages(end, outer_end - end);
+
+	ret = 0;
+done:
+	undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end));
+	return ret;
+}
+
 #ifdef CONFIG_MEMORY_HOTREMOVE
 /*
  * All pages in the range must be isolated before calling this.
-- 
1.7.1.569.g6f426

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 4/9] mm: MIGRATE_CMA migration type added
  2011-10-06 13:54 ` Marek Szyprowski
  (?)
@ 2011-10-06 13:54   ` Marek Szyprowski
  -1 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-06 13:54 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Shariq Hasnain, Chunsang Jeong, Dave Hansen

From: Michal Nazarewicz <m.nazarewicz@samsung.com>

The MIGRATE_CMA migration type has two main characteristics:
(i) only movable pages can be allocated from MIGRATE_CMA
pageblocks and (ii) page allocator will never change migration
type of MIGRATE_CMA pageblocks.

This guarantees that a page in a MIGRATE_CMA pageblock can
always be migrated somewhere else (unless there's no memory left
in the system).

It is designed to be used with Contiguous Memory Allocator
(CMA) for allocating big chunks (e.g. 10MiB) of physically
contiguous memory.  Once a driver requests contiguous memory,
CMA will migrate pages from MIGRATE_CMA pageblocks.

To minimise the number of migrations, the MIGRATE_CMA migration
type is the last type tried when the page allocator falls back to
migration types other than the one requested.
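
As a rough illustration of the intended use (not part of this
patch; the helper name and PFN parameters are made up), a reserved
region aligned as described above would be handed over to the
MIGRATE_CMA free lists at boot time roughly like this:

  #include <linux/init.h>
  #include <linux/mm.h>
  #include <linux/page-isolation.h>

  /* Illustrative only: mark [start_pfn, end_pfn) as MIGRATE_CMA. */
  static void __init example_mark_region_cma(unsigned long start_pfn,
  					   unsigned long end_pfn)
  {
  	unsigned long pfn;

  	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages)
  		init_cma_reserved_pageblock(pfn_to_page(pfn));
  }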

Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
[m.szyprowski: cleaned up Kconfig, renamed some functions, removed ifdefs]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
[m.nazarewicz: fixed incorrect handling of pages from ISOLATE page blocks]
Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 include/linux/mmzone.h         |   41 +++++++++++++++++----
 include/linux/page-isolation.h |    1 +
 mm/Kconfig                     |    8 ++++-
 mm/compaction.c                |   10 +++++
 mm/page_alloc.c                |   79 ++++++++++++++++++++++++++++++----------
 5 files changed, 112 insertions(+), 27 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 862a834..cc34965 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -35,13 +35,35 @@
  */
 #define PAGE_ALLOC_COSTLY_ORDER 3
 
-#define MIGRATE_UNMOVABLE     0
-#define MIGRATE_RECLAIMABLE   1
-#define MIGRATE_MOVABLE       2
-#define MIGRATE_PCPTYPES      3 /* the number of types on the pcp lists */
-#define MIGRATE_RESERVE       3
-#define MIGRATE_ISOLATE       4 /* can't allocate from here */
-#define MIGRATE_TYPES         5
+enum {
+	MIGRATE_UNMOVABLE,
+	MIGRATE_RECLAIMABLE,
+	MIGRATE_MOVABLE,
+	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
+	MIGRATE_RESERVE = MIGRATE_PCPTYPES,
+	/*
+	 * MIGRATE_CMA migration type is designed to mimic the way
+	 * ZONE_MOVABLE works.  Only movable pages can be allocated
+	 * from MIGRATE_CMA pageblocks and the page allocator never
+	 * implicitly changes the migration type of MIGRATE_CMA pageblocks.
+	 *
+	 * The way to use it is to change migratetype of a range of
+	 * pageblocks to MIGRATE_CMA which can be done with the
+	 * init_cma_reserved_pageblock() function.  What is important
+	 * though is that a range of pageblocks must be aligned to
+	 * MAX_ORDER_NR_PAGES should the biggest page be bigger than
+	 * a single pageblock.
+	 */
+	MIGRATE_CMA,
+	MIGRATE_ISOLATE,	/* can't allocate from here */
+	MIGRATE_TYPES
+};
+
+#ifdef CONFIG_CMA_MIGRATE_TYPE
+#  define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
+#else
+#  define is_migrate_cma(migratetype) false
+#endif
 
 #define for_each_migratetype_order(order, type) \
 	for (order = 0; order < MAX_ORDER; order++) \
@@ -54,6 +76,11 @@ static inline int get_pageblock_migratetype(struct page *page)
 	return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
 }
 
+static inline bool is_pageblock_cma(struct page *page)
+{
+	return is_migrate_cma(get_pageblock_migratetype(page));
+}
+
 struct free_area {
 	struct list_head	free_list[MIGRATE_TYPES];
 	unsigned long		nr_free;
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 774ecec..9b6aa8a 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -48,4 +48,5 @@ int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn);
 unsigned long scan_lru_pages(unsigned long start, unsigned long end);
 int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
 
+extern void init_cma_reserved_pageblock(struct page *page);
 #endif
diff --git a/mm/Kconfig b/mm/Kconfig
index 10d7986..d067b84 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -192,7 +192,7 @@ config COMPACTION
 config MIGRATION
 	bool "Page migration"
 	def_bool y
-	depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION
+	depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION || CMA_MIGRATE_TYPE
 	help
 	  Allows the migration of the physical location of pages of processes
 	  while the virtual addresses are not changed. This is useful in
@@ -201,6 +201,12 @@ config MIGRATION
 	  pages as migration can relocate pages to satisfy a huge page
 	  allocation instead of reclaiming.
 
+config CMA_MIGRATE_TYPE
+	bool
+	help
+	  This enables the use of the MIGRATE_CMA migrate type, which lets CMA
+	  work on an almost arbitrary memory range and not only inside ZONE_MOVABLE.
+
 config PHYS_ADDR_T_64BIT
 	def_bool 64BIT || ARCH_PHYS_ADDR_T_64BIT
 
diff --git a/mm/compaction.c b/mm/compaction.c
index 97254e4..9cf6b2b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -115,6 +115,16 @@ static bool suitable_migration_target(struct page *page)
 	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
 		return false;
 
+	/* Keep MIGRATE_CMA alone as well. */
+	/*
+	 * XXX Revisit.  We currently cannot let compaction touch CMA
+	 * pages since compaction insists on changing their migration
+	 * type to MIGRATE_MOVABLE (see split_free_page() called from
+	 * isolate_freepages_block() above).
+	 */
+	if (is_migrate_cma(migratetype))
+		return false;
+
 	/* If the page is a large free page, then allow migration */
 	if (PageBuddy(page) && page_order(page) >= pageblock_order)
 		return true;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8010854..6758b9a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -733,6 +733,29 @@ void __meminit __free_pages_bootmem(struct page *page, unsigned int order)
 	}
 }
 
+#ifdef CONFIG_CMA_MIGRATE_TYPE
+/*
+ * Free the whole pageblock and set its migration type to MIGRATE_CMA.
+ */
+void __init init_cma_reserved_pageblock(struct page *page)
+{
+	struct page *p = page;
+	unsigned i = pageblock_nr_pages;
+
+	prefetchw(p);
+	do {
+		if (--i)
+			prefetchw(p + 1);
+		__ClearPageReserved(p);
+		set_page_count(p, 0);
+	} while (++p, i);
+
+	set_page_refcounted(page);
+	set_pageblock_migratetype(page, MIGRATE_CMA);
+	__free_pages(page, pageblock_order);
+	totalram_pages += pageblock_nr_pages;
+}
+#endif
 
 /*
  * The order of subdivision here is critical for the IO subsystem.
@@ -841,11 +864,11 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
  * This array describes the order lists are fallen back to when
  * the free lists for the desirable migrate type are depleted
  */
-static int fallbacks[MIGRATE_TYPES][MIGRATE_TYPES-1] = {
+static int fallbacks[MIGRATE_TYPES][4] = {
 	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_RESERVE },
 	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_RESERVE },
-	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
-	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE,     MIGRATE_RESERVE,   MIGRATE_RESERVE }, /* Never used */
+	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_CMA    , MIGRATE_RESERVE },
+	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE }, /* Never used */
 };
 
 /*
@@ -940,12 +963,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 	/* Find the largest possible block of pages in the other list */
 	for (current_order = MAX_ORDER-1; current_order >= order;
 						--current_order) {
-		for (i = 0; i < MIGRATE_TYPES - 1; i++) {
+		for (i = 0; i < ARRAY_SIZE(fallbacks[0]); i++) {
 			migratetype = fallbacks[start_migratetype][i];
 
 			/* MIGRATE_RESERVE handled later if necessary */
 			if (migratetype == MIGRATE_RESERVE)
-				continue;
+				break;
 
 			area = &(zone->free_area[current_order]);
 			if (list_empty(&area->free_list[migratetype]))
@@ -960,19 +983,29 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 			 * pages to the preferred allocation list. If falling
 			 * back for a reclaimable kernel allocation, be more
 			 * aggressive about taking ownership of free pages
+			 *
+			 * On the other hand, never change migration
+			 * type of MIGRATE_CMA pageblocks nor move CMA
+			 * pages onto different free lists. We don't
+			 * want unmovable pages to be allocated from
+			 * MIGRATE_CMA areas.
 			 */
-			if (unlikely(current_order >= (pageblock_order >> 1)) ||
-					start_migratetype == MIGRATE_RECLAIMABLE ||
-					page_group_by_mobility_disabled) {
-				unsigned long pages;
+			if (!is_pageblock_cma(page) &&
+			    (unlikely(current_order >= pageblock_order / 2) ||
+			     start_migratetype == MIGRATE_RECLAIMABLE ||
+			     page_group_by_mobility_disabled)) {
+				int pages;
 				pages = move_freepages_block(zone, page,
-								start_migratetype);
+							     start_migratetype);
 
-				/* Claim the whole block if over half of it is free */
+				/*
+				 * Claim the whole block if over half
+				 * of it is free
+				 */
 				if (pages >= (1 << (pageblock_order-1)) ||
-						page_group_by_mobility_disabled)
+				    page_group_by_mobility_disabled)
 					set_pageblock_migratetype(page,
-								start_migratetype);
+							start_migratetype);
 
 				migratetype = start_migratetype;
 			}
@@ -982,11 +1015,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 			rmv_page_order(page);
 
 			/* Take ownership for orders >= pageblock_order */
-			if (current_order >= pageblock_order)
+			if (current_order >= pageblock_order &&
+			    !is_pageblock_cma(page))
 				change_pageblock_range(page, current_order,
 							start_migratetype);
 
-			expand(zone, page, order, current_order, area, migratetype);
+			expand(zone, page, order, current_order, area,
+			       is_migrate_cma(start_migratetype)
+			     ? start_migratetype : migratetype);
 
 			trace_mm_page_alloc_extfrag(page, order, current_order,
 				start_migratetype, migratetype);
@@ -1058,7 +1094,10 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 			list_add(&page->lru, list);
 		else
 			list_add_tail(&page->lru, list);
-		set_page_private(page, migratetype);
+		if (is_pageblock_cma(page))
+			set_page_private(page, MIGRATE_CMA);
+		else
+			set_page_private(page, migratetype);
 		list = &page->lru;
 	}
 	__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
@@ -1302,7 +1341,9 @@ int split_free_page(struct page *page)
 	if (order >= pageblock_order - 1) {
 		struct page *endpage = page + (1 << order) - 1;
 		for (; page < endpage; page += pageblock_nr_pages)
-			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+			if (!is_pageblock_cma(page))
+				set_pageblock_migratetype(page,
+							  MIGRATE_MOVABLE);
 	}
 
 	return 1 << order;
@@ -5592,8 +5633,8 @@ __count_immobile_pages(struct zone *zone, struct page *page, int count)
 	 */
 	if (zone_idx(zone) == ZONE_MOVABLE)
 		return true;
-
-	if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE)
+	if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE ||
+	    is_pageblock_cma(page))
 		return true;
 
 	pfn = page_to_pfn(page);
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 4/9] mm: MIGRATE_CMA migration type added
@ 2011-10-06 13:54   ` Marek Szyprowski
  0 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-06 13:54 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Shariq Hasnain, Chunsang Jeong, Dave Hansen

From: Michal Nazarewicz <m.nazarewicz@samsung.com>

The MIGRATE_CMA migration type has two main characteristics:
(i) only movable pages can be allocated from MIGRATE_CMA
pageblocks and (ii) page allocator will never change migration
type of MIGRATE_CMA pageblocks.

This guarantees that a page in a MIGRATE_CMA pageblock can
always be migrated somewhere else (unless there's no memory left
in the system).

It is designed to be used with Contiguous Memory Allocator
(CMA) for allocating big chunks (e.g. 10MiB) of physically
contiguous memory.  Once a driver requests contiguous memory,
CMA will migrate pages from MIGRATE_CMA pageblocks.

To minimise the number of migrations, the MIGRATE_CMA migration
type is the last type tried when the page allocator falls back to
migration types other than the one requested.

Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
[m.szyprowski: cleaned up Kconfig, renamed some functions, removed ifdefs]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
[m.nazarewicz: fixed incorrect handling of pages from ISOLATE page blocks]
Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 include/linux/mmzone.h         |   41 +++++++++++++++++----
 include/linux/page-isolation.h |    1 +
 mm/Kconfig                     |    8 ++++-
 mm/compaction.c                |   10 +++++
 mm/page_alloc.c                |   79 ++++++++++++++++++++++++++++++----------
 5 files changed, 112 insertions(+), 27 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 862a834..cc34965 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -35,13 +35,35 @@
  */
 #define PAGE_ALLOC_COSTLY_ORDER 3
 
-#define MIGRATE_UNMOVABLE     0
-#define MIGRATE_RECLAIMABLE   1
-#define MIGRATE_MOVABLE       2
-#define MIGRATE_PCPTYPES      3 /* the number of types on the pcp lists */
-#define MIGRATE_RESERVE       3
-#define MIGRATE_ISOLATE       4 /* can't allocate from here */
-#define MIGRATE_TYPES         5
+enum {
+	MIGRATE_UNMOVABLE,
+	MIGRATE_RECLAIMABLE,
+	MIGRATE_MOVABLE,
+	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
+	MIGRATE_RESERVE = MIGRATE_PCPTYPES,
+	/*
+	 * MIGRATE_CMA migration type is designed to mimic the way
+	 * ZONE_MOVABLE works.  Only movable pages can be allocated
+	 * from MIGRATE_CMA pageblocks and the page allocator never
+	 * implicitly changes the migration type of MIGRATE_CMA pageblocks.
+	 *
+	 * The way to use it is to change migratetype of a range of
+	 * pageblocks to MIGRATE_CMA which can be done with the
+	 * init_cma_reserved_pageblock() function.  What is important
+	 * though is that a range of pageblocks must be aligned to
+	 * MAX_ORDER_NR_PAGES should the biggest page be bigger than
+	 * a single pageblock.
+	 */
+	MIGRATE_CMA,
+	MIGRATE_ISOLATE,	/* can't allocate from here */
+	MIGRATE_TYPES
+};
+
+#ifdef CONFIG_CMA_MIGRATE_TYPE
+#  define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
+#else
+#  define is_migrate_cma(migratetype) false
+#endif
 
 #define for_each_migratetype_order(order, type) \
 	for (order = 0; order < MAX_ORDER; order++) \
@@ -54,6 +76,11 @@ static inline int get_pageblock_migratetype(struct page *page)
 	return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
 }
 
+static inline bool is_pageblock_cma(struct page *page)
+{
+	return is_migrate_cma(get_pageblock_migratetype(page));
+}
+
 struct free_area {
 	struct list_head	free_list[MIGRATE_TYPES];
 	unsigned long		nr_free;
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 774ecec..9b6aa8a 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -48,4 +48,5 @@ int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn);
 unsigned long scan_lru_pages(unsigned long start, unsigned long end);
 int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
 
+extern void init_cma_reserved_pageblock(struct page *page);
 #endif
diff --git a/mm/Kconfig b/mm/Kconfig
index 10d7986..d067b84 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -192,7 +192,7 @@ config COMPACTION
 config MIGRATION
 	bool "Page migration"
 	def_bool y
-	depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION
+	depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION || CMA_MIGRATE_TYPE
 	help
 	  Allows the migration of the physical location of pages of processes
 	  while the virtual addresses are not changed. This is useful in
@@ -201,6 +201,12 @@ config MIGRATION
 	  pages as migration can relocate pages to satisfy a huge page
 	  allocation instead of reclaiming.
 
+config CMA_MIGRATE_TYPE
+	bool
+	help
+	  This enables the use of the MIGRATE_CMA migrate type, which lets CMA
+	  work on an almost arbitrary memory range and not only inside ZONE_MOVABLE.
+
 config PHYS_ADDR_T_64BIT
 	def_bool 64BIT || ARCH_PHYS_ADDR_T_64BIT
 
diff --git a/mm/compaction.c b/mm/compaction.c
index 97254e4..9cf6b2b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -115,6 +115,16 @@ static bool suitable_migration_target(struct page *page)
 	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
 		return false;
 
+	/* Keep MIGRATE_CMA alone as well. */
+	/*
+	 * XXX Revisit.  We currently cannot let compaction touch CMA
+	 * pages since compaction insists on changing their migration
+	 * type to MIGRATE_MOVABLE (see split_free_page() called from
+	 * isolate_freepages_block() above).
+	 */
+	if (is_migrate_cma(migratetype))
+		return false;
+
 	/* If the page is a large free page, then allow migration */
 	if (PageBuddy(page) && page_order(page) >= pageblock_order)
 		return true;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8010854..6758b9a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -733,6 +733,29 @@ void __meminit __free_pages_bootmem(struct page *page, unsigned int order)
 	}
 }
 
+#ifdef CONFIG_CMA_MIGRATE_TYPE
+/*
+ * Free the whole pageblock and set its migration type to MIGRATE_CMA.
+ */
+void __init init_cma_reserved_pageblock(struct page *page)
+{
+	struct page *p = page;
+	unsigned i = pageblock_nr_pages;
+
+	prefetchw(p);
+	do {
+		if (--i)
+			prefetchw(p + 1);
+		__ClearPageReserved(p);
+		set_page_count(p, 0);
+	} while (++p, i);
+
+	set_page_refcounted(page);
+	set_pageblock_migratetype(page, MIGRATE_CMA);
+	__free_pages(page, pageblock_order);
+	totalram_pages += pageblock_nr_pages;
+}
+#endif
 
 /*
  * The order of subdivision here is critical for the IO subsystem.
@@ -841,11 +864,11 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
  * This array describes the order lists are fallen back to when
  * the free lists for the desirable migrate type are depleted
  */
-static int fallbacks[MIGRATE_TYPES][MIGRATE_TYPES-1] = {
+static int fallbacks[MIGRATE_TYPES][4] = {
 	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_RESERVE },
 	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_RESERVE },
-	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
-	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE,     MIGRATE_RESERVE,   MIGRATE_RESERVE }, /* Never used */
+	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_CMA    , MIGRATE_RESERVE },
+	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE }, /* Never used */
 };
 
 /*
@@ -940,12 +963,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 	/* Find the largest possible block of pages in the other list */
 	for (current_order = MAX_ORDER-1; current_order >= order;
 						--current_order) {
-		for (i = 0; i < MIGRATE_TYPES - 1; i++) {
+		for (i = 0; i < ARRAY_SIZE(fallbacks[0]); i++) {
 			migratetype = fallbacks[start_migratetype][i];
 
 			/* MIGRATE_RESERVE handled later if necessary */
 			if (migratetype == MIGRATE_RESERVE)
-				continue;
+				break;
 
 			area = &(zone->free_area[current_order]);
 			if (list_empty(&area->free_list[migratetype]))
@@ -960,19 +983,29 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 			 * pages to the preferred allocation list. If falling
 			 * back for a reclaimable kernel allocation, be more
 			 * aggressive about taking ownership of free pages
+			 *
+			 * On the other hand, never change migration
+			 * type of MIGRATE_CMA pageblocks nor move CMA
+			 * pages onto different free lists. We don't
+			 * want unmovable pages to be allocated from
+			 * MIGRATE_CMA areas.
 			 */
-			if (unlikely(current_order >= (pageblock_order >> 1)) ||
-					start_migratetype == MIGRATE_RECLAIMABLE ||
-					page_group_by_mobility_disabled) {
-				unsigned long pages;
+			if (!is_pageblock_cma(page) &&
+			    (unlikely(current_order >= pageblock_order / 2) ||
+			     start_migratetype == MIGRATE_RECLAIMABLE ||
+			     page_group_by_mobility_disabled)) {
+				int pages;
 				pages = move_freepages_block(zone, page,
-								start_migratetype);
+							     start_migratetype);
 
-				/* Claim the whole block if over half of it is free */
+				/*
+				 * Claim the whole block if over half
+				 * of it is free
+				 */
 				if (pages >= (1 << (pageblock_order-1)) ||
-						page_group_by_mobility_disabled)
+				    page_group_by_mobility_disabled)
 					set_pageblock_migratetype(page,
-								start_migratetype);
+							start_migratetype);
 
 				migratetype = start_migratetype;
 			}
@@ -982,11 +1015,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 			rmv_page_order(page);
 
 			/* Take ownership for orders >= pageblock_order */
-			if (current_order >= pageblock_order)
+			if (current_order >= pageblock_order &&
+			    !is_pageblock_cma(page))
 				change_pageblock_range(page, current_order,
 							start_migratetype);
 
-			expand(zone, page, order, current_order, area, migratetype);
+			expand(zone, page, order, current_order, area,
+			       is_migrate_cma(start_migratetype)
+			     ? start_migratetype : migratetype);
 
 			trace_mm_page_alloc_extfrag(page, order, current_order,
 				start_migratetype, migratetype);
@@ -1058,7 +1094,10 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 			list_add(&page->lru, list);
 		else
 			list_add_tail(&page->lru, list);
-		set_page_private(page, migratetype);
+		if (is_pageblock_cma(page))
+			set_page_private(page, MIGRATE_CMA);
+		else
+			set_page_private(page, migratetype);
 		list = &page->lru;
 	}
 	__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
@@ -1302,7 +1341,9 @@ int split_free_page(struct page *page)
 	if (order >= pageblock_order - 1) {
 		struct page *endpage = page + (1 << order) - 1;
 		for (; page < endpage; page += pageblock_nr_pages)
-			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+			if (!is_pageblock_cma(page))
+				set_pageblock_migratetype(page,
+							  MIGRATE_MOVABLE);
 	}
 
 	return 1 << order;
@@ -5592,8 +5633,8 @@ __count_immobile_pages(struct zone *zone, struct page *page, int count)
 	 */
 	if (zone_idx(zone) == ZONE_MOVABLE)
 		return true;
-
-	if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE)
+	if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE ||
+	    is_pageblock_cma(page))
 		return true;
 
 	pfn = page_to_pfn(page);
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 4/9] mm: MIGRATE_CMA migration type added
@ 2011-10-06 13:54   ` Marek Szyprowski
  0 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-06 13:54 UTC (permalink / raw)
  To: linux-arm-kernel

From: Michal Nazarewicz <m.nazarewicz@samsung.com>

The MIGRATE_CMA migration type has two main characteristics:
(i) only movable pages can be allocated from MIGRATE_CMA
pageblocks and (ii) page allocator will never change migration
type of MIGRATE_CMA pageblocks.

This guarantees that a page in a MIGRATE_CMA pageblock can
always be migrated somewhere else (unless there's no memory left
in the system).

It is designed to be used with Contiguous Memory Allocator
(CMA) for allocating big chunks (e.g. 10MiB) of physically
contiguous memory.  Once a driver requests contiguous memory,
CMA will migrate pages from MIGRATE_CMA pageblocks.

To minimise the number of migrations, the MIGRATE_CMA migration
type is the last type tried when the page allocator falls back to
migration types other than the one requested.

Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
[m.szyprowski: cleaned up Kconfig, renamed some functions, removed ifdefs]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
[m.nazarewicz: fixed incorrect handling of pages from ISOLATE page blocks]
Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 include/linux/mmzone.h         |   41 +++++++++++++++++----
 include/linux/page-isolation.h |    1 +
 mm/Kconfig                     |    8 ++++-
 mm/compaction.c                |   10 +++++
 mm/page_alloc.c                |   79 ++++++++++++++++++++++++++++++----------
 5 files changed, 112 insertions(+), 27 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 862a834..cc34965 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -35,13 +35,35 @@
  */
 #define PAGE_ALLOC_COSTLY_ORDER 3
 
-#define MIGRATE_UNMOVABLE     0
-#define MIGRATE_RECLAIMABLE   1
-#define MIGRATE_MOVABLE       2
-#define MIGRATE_PCPTYPES      3 /* the number of types on the pcp lists */
-#define MIGRATE_RESERVE       3
-#define MIGRATE_ISOLATE       4 /* can't allocate from here */
-#define MIGRATE_TYPES         5
+enum {
+	MIGRATE_UNMOVABLE,
+	MIGRATE_RECLAIMABLE,
+	MIGRATE_MOVABLE,
+	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
+	MIGRATE_RESERVE = MIGRATE_PCPTYPES,
+	/*
+	 * MIGRATE_CMA migration type is designed to mimic the way
+	 * ZONE_MOVABLE works.  Only movable pages can be allocated
+	 * from MIGRATE_CMA pageblocks and the page allocator never
+	 * implicitly changes the migration type of MIGRATE_CMA pageblocks.
+	 *
+	 * The way to use it is to change migratetype of a range of
+	 * pageblocks to MIGRATE_CMA which can be done with the
+	 * init_cma_reserved_pageblock() function.  What is important
+	 * though is that a range of pageblocks must be aligned to
+	 * MAX_ORDER_NR_PAGES should the biggest page be bigger than
+	 * a single pageblock.
+	 */
+	MIGRATE_CMA,
+	MIGRATE_ISOLATE,	/* can't allocate from here */
+	MIGRATE_TYPES
+};
+
+#ifdef CONFIG_CMA_MIGRATE_TYPE
+#  define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
+#else
+#  define is_migrate_cma(migratetype) false
+#endif
 
 #define for_each_migratetype_order(order, type) \
 	for (order = 0; order < MAX_ORDER; order++) \
@@ -54,6 +76,11 @@ static inline int get_pageblock_migratetype(struct page *page)
 	return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
 }
 
+static inline bool is_pageblock_cma(struct page *page)
+{
+	return is_migrate_cma(get_pageblock_migratetype(page));
+}
+
 struct free_area {
 	struct list_head	free_list[MIGRATE_TYPES];
 	unsigned long		nr_free;
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 774ecec..9b6aa8a 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -48,4 +48,5 @@ int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn);
 unsigned long scan_lru_pages(unsigned long start, unsigned long end);
 int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
 
+extern void init_cma_reserved_pageblock(struct page *page);
 #endif
diff --git a/mm/Kconfig b/mm/Kconfig
index 10d7986..d067b84 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -192,7 +192,7 @@ config COMPACTION
 config MIGRATION
 	bool "Page migration"
 	def_bool y
-	depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION
+	depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION || CMA_MIGRATE_TYPE
 	help
 	  Allows the migration of the physical location of pages of processes
 	  while the virtual addresses are not changed. This is useful in
@@ -201,6 +201,12 @@ config MIGRATION
 	  pages as migration can relocate pages to satisfy a huge page
 	  allocation instead of reclaiming.
 
+config CMA_MIGRATE_TYPE
+	bool
+	help
+	  This enables the use of the MIGRATE_CMA migrate type, which lets CMA
+	  work on an almost arbitrary memory range and not only inside ZONE_MOVABLE.
+
 config PHYS_ADDR_T_64BIT
 	def_bool 64BIT || ARCH_PHYS_ADDR_T_64BIT
 
diff --git a/mm/compaction.c b/mm/compaction.c
index 97254e4..9cf6b2b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -115,6 +115,16 @@ static bool suitable_migration_target(struct page *page)
 	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
 		return false;
 
+	/* Keep MIGRATE_CMA alone as well. */
+	/*
+	 * XXX Revisit.  We currently cannot let compaction touch CMA
+	 * pages since compaction insists on changing their migration
+	 * type to MIGRATE_MOVABLE (see split_free_page() called from
+	 * isolate_freepages_block() above).
+	 */
+	if (is_migrate_cma(migratetype))
+		return false;
+
 	/* If the page is a large free page, then allow migration */
 	if (PageBuddy(page) && page_order(page) >= pageblock_order)
 		return true;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8010854..6758b9a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -733,6 +733,29 @@ void __meminit __free_pages_bootmem(struct page *page, unsigned int order)
 	}
 }
 
+#ifdef CONFIG_CMA_MIGRATE_TYPE
+/*
+ * Free the whole pageblock and set its migration type to MIGRATE_CMA.
+ */
+void __init init_cma_reserved_pageblock(struct page *page)
+{
+	struct page *p = page;
+	unsigned i = pageblock_nr_pages;
+
+	prefetchw(p);
+	do {
+		if (--i)
+			prefetchw(p + 1);
+		__ClearPageReserved(p);
+		set_page_count(p, 0);
+	} while (++p, i);
+
+	set_page_refcounted(page);
+	set_pageblock_migratetype(page, MIGRATE_CMA);
+	__free_pages(page, pageblock_order);
+	totalram_pages += pageblock_nr_pages;
+}
+#endif
 
 /*
  * The order of subdivision here is critical for the IO subsystem.
@@ -841,11 +864,11 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
  * This array describes the order lists are fallen back to when
  * the free lists for the desirable migrate type are depleted
  */
-static int fallbacks[MIGRATE_TYPES][MIGRATE_TYPES-1] = {
+static int fallbacks[MIGRATE_TYPES][4] = {
 	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_RESERVE },
 	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_RESERVE },
-	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
-	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE,     MIGRATE_RESERVE,   MIGRATE_RESERVE }, /* Never used */
+	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_CMA    , MIGRATE_RESERVE },
+	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE }, /* Never used */
 };
 
 /*
@@ -940,12 +963,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 	/* Find the largest possible block of pages in the other list */
 	for (current_order = MAX_ORDER-1; current_order >= order;
 						--current_order) {
-		for (i = 0; i < MIGRATE_TYPES - 1; i++) {
+		for (i = 0; i < ARRAY_SIZE(fallbacks[0]); i++) {
 			migratetype = fallbacks[start_migratetype][i];
 
 			/* MIGRATE_RESERVE handled later if necessary */
 			if (migratetype == MIGRATE_RESERVE)
-				continue;
+				break;
 
 			area = &(zone->free_area[current_order]);
 			if (list_empty(&area->free_list[migratetype]))
@@ -960,19 +983,29 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 			 * pages to the preferred allocation list. If falling
 			 * back for a reclaimable kernel allocation, be more
 			 * aggressive about taking ownership of free pages
+			 *
+			 * On the other hand, never change migration
+			 * type of MIGRATE_CMA pageblocks nor move CMA
+			 * pages onto different free lists. We don't
+			 * want unmovable pages to be allocated from
+			 * MIGRATE_CMA areas.
 			 */
-			if (unlikely(current_order >= (pageblock_order >> 1)) ||
-					start_migratetype == MIGRATE_RECLAIMABLE ||
-					page_group_by_mobility_disabled) {
-				unsigned long pages;
+			if (!is_pageblock_cma(page) &&
+			    (unlikely(current_order >= pageblock_order / 2) ||
+			     start_migratetype == MIGRATE_RECLAIMABLE ||
+			     page_group_by_mobility_disabled)) {
+				int pages;
 				pages = move_freepages_block(zone, page,
-								start_migratetype);
+							     start_migratetype);
 
-				/* Claim the whole block if over half of it is free */
+				/*
+				 * Claim the whole block if over half
+				 * of it is free
+				 */
 				if (pages >= (1 << (pageblock_order-1)) ||
-						page_group_by_mobility_disabled)
+				    page_group_by_mobility_disabled)
 					set_pageblock_migratetype(page,
-								start_migratetype);
+							start_migratetype);
 
 				migratetype = start_migratetype;
 			}
@@ -982,11 +1015,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 			rmv_page_order(page);
 
 			/* Take ownership for orders >= pageblock_order */
-			if (current_order >= pageblock_order)
+			if (current_order >= pageblock_order &&
+			    !is_pageblock_cma(page))
 				change_pageblock_range(page, current_order,
 							start_migratetype);
 
-			expand(zone, page, order, current_order, area, migratetype);
+			expand(zone, page, order, current_order, area,
+			       is_migrate_cma(start_migratetype)
+			     ? start_migratetype : migratetype);
 
 			trace_mm_page_alloc_extfrag(page, order, current_order,
 				start_migratetype, migratetype);
@@ -1058,7 +1094,10 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 			list_add(&page->lru, list);
 		else
 			list_add_tail(&page->lru, list);
-		set_page_private(page, migratetype);
+		if (is_pageblock_cma(page))
+			set_page_private(page, MIGRATE_CMA);
+		else
+			set_page_private(page, migratetype);
 		list = &page->lru;
 	}
 	__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
@@ -1302,7 +1341,9 @@ int split_free_page(struct page *page)
 	if (order >= pageblock_order - 1) {
 		struct page *endpage = page + (1 << order) - 1;
 		for (; page < endpage; page += pageblock_nr_pages)
-			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+			if (!is_pageblock_cma(page))
+				set_pageblock_migratetype(page,
+							  MIGRATE_MOVABLE);
 	}
 
 	return 1 << order;
@@ -5592,8 +5633,8 @@ __count_immobile_pages(struct zone *zone, struct page *page, int count)
 	 */
 	if (zone_idx(zone) == ZONE_MOVABLE)
 		return true;
-
-	if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE)
+	if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE ||
+	    is_pageblock_cma(page))
 		return true;
 
 	pfn = page_to_pfn(page);
-- 
1.7.1.569.g6f426

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 5/9] mm: MIGRATE_CMA isolation functions added
  2011-10-06 13:54 ` Marek Szyprowski
  (?)
@ 2011-10-06 13:54   ` Marek Szyprowski
  -1 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-06 13:54 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Shariq Hasnain, Chunsang Jeong, Dave Hansen

From: Michal Nazarewicz <m.nazarewicz@samsung.com>

This commit changes various functions that change the migrate
type of pages and pageblocks between MIGRATE_ISOLATE and
MIGRATE_MOVABLE so that they can also work with the
MIGRATE_CMA migrate type.
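
A rough caller sketch (not part of this patch; the wrapper, PFN and
page count are made up) showing how the extra migratetype argument
would be used for an allocation from CMA-owned pageblocks:

  #include <linux/mm.h>
  #include <linux/mmzone.h>
  #include <linux/gfp.h>
  #include <linux/page-isolation.h>

  /* Illustrative only: allocate and free a range inside a CMA area. */
  static int example_cma_alloc(unsigned long pfn, unsigned long count)
  {
  	int ret;

  	ret = alloc_contig_range(pfn, pfn + count, GFP_KERNEL,
  				 MIGRATE_CMA);
  	if (ret)
  		return ret;

  	/* ... use pages [pfn, pfn + count) ... */

  	free_contig_pages(pfn, count);
  	return 0;
  }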

Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
[m.nazarewicz: fixed incorrect handling of pages from ISOLATE page blocks]
Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 include/linux/page-isolation.h |   40 +++++++++++++++++++++++++++-------------
 mm/page_alloc.c                |   31 ++++++++++++++++++++++++-------
 mm/page_isolation.c            |   17 +++++++++--------
 3 files changed, 60 insertions(+), 28 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 9b6aa8a..003c52f 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -3,41 +3,55 @@
 
 /*
  * Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE.
- * If specified range includes migrate types other than MOVABLE,
+ * If specified range includes migrate types other than MOVABLE or CMA,
  * this will fail with -EBUSY.
  *
  * For isolating all pages in the range finally, the caller have to
  * free all pages in the range. test_page_isolated() can be used for
  * test it.
  */
-extern int
-start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn);
+int __start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			       unsigned migratetype);
+
+static inline int
+start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+	return __start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+}
+
+int __undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			      unsigned migratetype);
 
 /*
  * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
  * target range is [start_pfn, end_pfn)
  */
-extern int
-undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn);
+static inline int
+undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+	return __undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+}
 
 /*
- * test all pages in [start_pfn, end_pfn)are isolated or not.
+ * Test all pages in [start_pfn, end_pfn) are isolated or not.
  */
-extern int
-test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
+int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
 
 /*
- * Internal funcs.Changes pageblock's migrate type.
- * Please use make_pagetype_isolated()/make_pagetype_movable().
+ * Internal functions. Changes pageblock's migrate type.
  */
-extern int set_migratetype_isolate(struct page *page);
-extern void unset_migratetype_isolate(struct page *page);
+int set_migratetype_isolate(struct page *page);
+void __unset_migratetype_isolate(struct page *page, unsigned migratetype);
+static inline void unset_migratetype_isolate(struct page *page)
+{
+	__unset_migratetype_isolate(page, MIGRATE_MOVABLE);
+}
 
 /* The below functions must be run on a range from a single zone. */
 extern unsigned long alloc_contig_freed_pages(unsigned long start,
 					      unsigned long end, gfp_t flag);
 extern int alloc_contig_range(unsigned long start, unsigned long end,
-			      gfp_t flags);
+			      gfp_t flags, unsigned migratetype);
 extern void free_contig_pages(unsigned long pfn, unsigned nr_pages);
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6758b9a..6dd6fb5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -641,6 +641,18 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 			page = list_entry(list->prev, struct page, lru);
 			/* must delete as __free_one_page list manipulates */
 			list_del(&page->lru);
+
+			/*
+			 * When a page is isolated in the set_migratetype_isolate()
+			 * function its page_private is not changed since the
+			 * function has no way of knowing if it can touch it.
+			 * This means that when a page is on a PCP list, its
+			 * page_private no longer matches the desired migrate
+			 * type.
+			 */
+			if (get_pageblock_migratetype(page) == MIGRATE_ISOLATE)
+				set_page_private(page, MIGRATE_ISOLATE);
+
 			/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
 			__free_one_page(page, zone, 0, page_private(page));
 			trace_mm_page_pcpu_drain(page, 0, page_private(page));
@@ -5733,7 +5745,7 @@ out:
 	return ret;
 }
 
-void unset_migratetype_isolate(struct page *page)
+void __unset_migratetype_isolate(struct page *page, unsigned migratetype)
 {
 	struct zone *zone;
 	unsigned long flags;
@@ -5741,8 +5753,8 @@ void unset_migratetype_isolate(struct page *page)
 	spin_lock_irqsave(&zone->lock, flags);
 	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
 		goto out;
-	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
-	move_freepages_block(zone, page, MIGRATE_MOVABLE);
+	set_pageblock_migratetype(page, migratetype);
+	move_freepages_block(zone, page, migratetype);
 out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
@@ -5880,6 +5892,10 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
  * @start:	start PFN to allocate
  * @end:	one-past-the-last PFN to allocate
  * @flags:	flags passed to alloc_contig_freed_pages().
+ * @migratetype:	migratetype of the underlying pageblocks (either
+ *			#MIGRATE_MOVABLE or #MIGRATE_CMA).  All pageblocks
+ *			in range must have the same migratetype and it must
+ *			be either of the two.
  *
  * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
 * aligned, however it is the caller's responsibility to guarantee that we
@@ -5891,7 +5907,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
  * need to be freed with free_contig_pages().
  */
 int alloc_contig_range(unsigned long start, unsigned long end,
-		       gfp_t flags)
+		       gfp_t flags, unsigned migratetype)
 {
 	unsigned long outer_start, outer_end;
 	int ret;
@@ -5919,8 +5935,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	 * them.
 	 */
 
-	ret = start_isolate_page_range(pfn_to_maxpage(start),
-				       pfn_to_maxpage_up(end));
+	ret = __start_isolate_page_range(pfn_to_maxpage(start),
+					 pfn_to_maxpage_up(end), migratetype);
 	if (ret)
 		goto done;
 
@@ -5958,7 +5974,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 
 	ret = 0;
 done:
-	undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end));
+	__undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end),
+				  migratetype);
 	return ret;
 }
 
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 270a026..bcbed1d 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -23,10 +23,11 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
 }
 
 /*
- * start_isolate_page_range() -- make page-allocation-type of range of pages
+ * __start_isolate_page_range() -- make page-allocation-type of range of pages
  * to be MIGRATE_ISOLATE.
  * @start_pfn: The lower PFN of the range to be isolated.
  * @end_pfn: The upper PFN of the range to be isolated.
+ * @migratetype: migrate type to set in error recovery.
  *
  * Making page-allocation-type to be MIGRATE_ISOLATE means free pages in
  * the range will never be allocated. Any free pages and pages freed in the
@@ -35,8 +36,8 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * start_pfn/end_pfn must be aligned to pageblock_order.
  * Returns 0 on success and -EBUSY if any part of range cannot be isolated.
  */
-int
-start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+int __start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			       unsigned migratetype)
 {
 	unsigned long pfn;
 	unsigned long undo_pfn;
@@ -59,7 +60,7 @@ undo:
 	for (pfn = start_pfn;
 	     pfn < undo_pfn;
 	     pfn += pageblock_nr_pages)
-		unset_migratetype_isolate(pfn_to_page(pfn));
+		__unset_migratetype_isolate(pfn_to_page(pfn), migratetype);
 
 	return -EBUSY;
 }
@@ -67,8 +68,8 @@ undo:
 /*
  * Make isolated pages available again.
  */
-int
-undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+int __undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			      unsigned migratetype)
 {
 	unsigned long pfn;
 	struct page *page;
@@ -80,7 +81,7 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
 		page = __first_valid_page(pfn, pageblock_nr_pages);
 		if (!page || get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
 			continue;
-		unset_migratetype_isolate(page);
+		__unset_migratetype_isolate(page, migratetype);
 	}
 	return 0;
 }
@@ -89,7 +90,7 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
  * all pages in [start_pfn...end_pfn) must be in the same zone.
  * zone->lock must be held before call this.
  *
- * Returns 1 if all pages in the range is isolated.
+ * Returns 1 if all pages in the range are isolated.
  */
 static int
 __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn)
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 5/9] mm: MIGRATE_CMA isolation functions added
@ 2011-10-06 13:54   ` Marek Szyprowski
  0 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-06 13:54 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Shariq Hasnain, Chunsang Jeong, Dave Hansen

From: Michal Nazarewicz <m.nazarewicz@samsung.com>

This commit changes various functions that change the migrate
type of pages and pageblocks between MIGRATE_ISOLATE and
MIGRATE_MOVABLE so that they can also work with the
MIGRATE_CMA migrate type.

Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
[m.nazarewicz: fixed incorrect handling of pages from ISOLATE page blocks]
Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 include/linux/page-isolation.h |   40 +++++++++++++++++++++++++++-------------
 mm/page_alloc.c                |   31 ++++++++++++++++++++++++-------
 mm/page_isolation.c            |   17 +++++++++--------
 3 files changed, 60 insertions(+), 28 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 9b6aa8a..003c52f 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -3,41 +3,55 @@
 
 /*
  * Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE.
- * If specified range includes migrate types other than MOVABLE,
+ * If specified range includes migrate types other than MOVABLE or CMA,
  * this will fail with -EBUSY.
  *
  * For isolating all pages in the range finally, the caller have to
  * free all pages in the range. test_page_isolated() can be used for
  * test it.
  */
-extern int
-start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn);
+int __start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			       unsigned migratetype);
+
+static inline int
+start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+	return __start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+}
+
+int __undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			      unsigned migratetype);
 
 /*
  * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
  * target range is [start_pfn, end_pfn)
  */
-extern int
-undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn);
+static inline int
+undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+	return __undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+}
 
 /*
- * test all pages in [start_pfn, end_pfn)are isolated or not.
+ * Test all pages in [start_pfn, end_pfn) are isolated or not.
  */
-extern int
-test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
+int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
 
 /*
- * Internal funcs.Changes pageblock's migrate type.
- * Please use make_pagetype_isolated()/make_pagetype_movable().
+ * Internal functions. Changes pageblock's migrate type.
  */
-extern int set_migratetype_isolate(struct page *page);
-extern void unset_migratetype_isolate(struct page *page);
+int set_migratetype_isolate(struct page *page);
+void __unset_migratetype_isolate(struct page *page, unsigned migratetype);
+static inline void unset_migratetype_isolate(struct page *page)
+{
+	__unset_migratetype_isolate(page, MIGRATE_MOVABLE);
+}
 
 /* The below functions must be run on a range from a single zone. */
 extern unsigned long alloc_contig_freed_pages(unsigned long start,
 					      unsigned long end, gfp_t flag);
 extern int alloc_contig_range(unsigned long start, unsigned long end,
-			      gfp_t flags);
+			      gfp_t flags, unsigned migratetype);
 extern void free_contig_pages(unsigned long pfn, unsigned nr_pages);
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6758b9a..6dd6fb5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -641,6 +641,18 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 			page = list_entry(list->prev, struct page, lru);
 			/* must delete as __free_one_page list manipulates */
 			list_del(&page->lru);
+
+			/*
+			 * When a page is isolated in the set_migratetype_isolate()
+			 * function its page_private is not changed since the
+			 * function has no way of knowing if it can touch it.
+			 * This means that when a page is on a PCP list, its
+			 * page_private no longer matches the desired migrate
+			 * type.
+			 */
+			if (get_pageblock_migratetype(page) == MIGRATE_ISOLATE)
+				set_page_private(page, MIGRATE_ISOLATE);
+
 			/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
 			__free_one_page(page, zone, 0, page_private(page));
 			trace_mm_page_pcpu_drain(page, 0, page_private(page));
@@ -5733,7 +5745,7 @@ out:
 	return ret;
 }
 
-void unset_migratetype_isolate(struct page *page)
+void __unset_migratetype_isolate(struct page *page, unsigned migratetype)
 {
 	struct zone *zone;
 	unsigned long flags;
@@ -5741,8 +5753,8 @@ void unset_migratetype_isolate(struct page *page)
 	spin_lock_irqsave(&zone->lock, flags);
 	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
 		goto out;
-	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
-	move_freepages_block(zone, page, MIGRATE_MOVABLE);
+	set_pageblock_migratetype(page, migratetype);
+	move_freepages_block(zone, page, migratetype);
 out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
@@ -5880,6 +5892,10 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
  * @start:	start PFN to allocate
  * @end:	one-past-the-last PFN to allocate
  * @flags:	flags passed to alloc_contig_freed_pages().
+ * @migratetype:	migratetype of the underlaying pageblocks (either
+ *			#MIGRATE_MOVABLE or #MIGRATE_CMA).  All pageblocks
+ *			in range must have the same migratetype and it must
+ *			be either of the two.
  *
  * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
  * aligned, hovewer it's callers responsibility to guarantee that we
@@ -5891,7 +5907,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
  * need to be freed with free_contig_pages().
  */
 int alloc_contig_range(unsigned long start, unsigned long end,
-		       gfp_t flags)
+		       gfp_t flags, unsigned migratetype)
 {
 	unsigned long outer_start, outer_end;
 	int ret;
@@ -5919,8 +5935,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	 * them.
 	 */
 
-	ret = start_isolate_page_range(pfn_to_maxpage(start),
-				       pfn_to_maxpage_up(end));
+	ret = __start_isolate_page_range(pfn_to_maxpage(start),
+					 pfn_to_maxpage_up(end), migratetype);
 	if (ret)
 		goto done;
 
@@ -5958,7 +5974,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 
 	ret = 0;
 done:
-	undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end));
+	__undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end),
+				  migratetype);
 	return ret;
 }
 
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 270a026..bcbed1d 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -23,10 +23,11 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
 }
 
 /*
- * start_isolate_page_range() -- make page-allocation-type of range of pages
+ * __start_isolate_page_range() -- make page-allocation-type of range of pages
  * to be MIGRATE_ISOLATE.
  * @start_pfn: The lower PFN of the range to be isolated.
  * @end_pfn: The upper PFN of the range to be isolated.
+ * @migratetype: migrate type to set in error recovery.
  *
  * Making page-allocation-type to be MIGRATE_ISOLATE means free pages in
  * the range will never be allocated. Any free pages and pages freed in the
@@ -35,8 +36,8 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * start_pfn/end_pfn must be aligned to pageblock_order.
  * Returns 0 on success and -EBUSY if any part of range cannot be isolated.
  */
-int
-start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+int __start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			       unsigned migratetype)
 {
 	unsigned long pfn;
 	unsigned long undo_pfn;
@@ -59,7 +60,7 @@ undo:
 	for (pfn = start_pfn;
 	     pfn < undo_pfn;
 	     pfn += pageblock_nr_pages)
-		unset_migratetype_isolate(pfn_to_page(pfn));
+		__unset_migratetype_isolate(pfn_to_page(pfn), migratetype);
 
 	return -EBUSY;
 }
@@ -67,8 +68,8 @@ undo:
 /*
  * Make isolated pages available again.
  */
-int
-undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+int __undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			      unsigned migratetype)
 {
 	unsigned long pfn;
 	struct page *page;
@@ -80,7 +81,7 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
 		page = __first_valid_page(pfn, pageblock_nr_pages);
 		if (!page || get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
 			continue;
-		unset_migratetype_isolate(page);
+		__unset_migratetype_isolate(page, migratetype);
 	}
 	return 0;
 }
@@ -89,7 +90,7 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
  * all pages in [start_pfn...end_pfn) must be in the same zone.
  * zone->lock must be held before call this.
  *
- * Returns 1 if all pages in the range is isolated.
+ * Returns 1 if all pages in the range are isolated.
  */
 static int
 __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn)
-- 
1.7.1.569.g6f426

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 5/9] mm: MIGRATE_CMA isolation functions added
@ 2011-10-06 13:54   ` Marek Szyprowski
  0 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-06 13:54 UTC (permalink / raw)
  To: linux-arm-kernel

From: Michal Nazarewicz <m.nazarewicz@samsung.com>

This commit changes the various functions that switch pages and
pageblocks between the MIGRATE_ISOLATE and MIGRATE_MOVABLE migrate
types so that they can also operate on the MIGRATE_CMA migrate type.

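A rough usage sketch, not part of the patch itself, of how a
MIGRATE_CMA user such as alloc_contig_range() is expected to drive the
new entry points (start_pfn/end_pfn stand for a pageblock-aligned
range; error handling is trimmed):

	int ret;

	/* Mark every pageblock in the CMA range as MIGRATE_ISOLATE. */
	ret = __start_isolate_page_range(start_pfn, end_pfn, MIGRATE_CMA);
	if (ret)
		return ret;	/* a pageblock was neither MOVABLE nor CMA */

	/* ... migrate busy pages and take the free ones off the lists ... */

	/*
	 * Put the pageblocks back to MIGRATE_CMA (not MIGRATE_MOVABLE) so
	 * that pages freed later return to the CMA free lists.
	 */
	__undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_CMA);

Existing callers such as memory hotplug are unaffected, because
start_isolate_page_range()/undo_isolate_page_range() remain available
as inline wrappers that simply pass MIGRATE_MOVABLE.
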
Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
[m.nazarewicz: fixed incorrect handling of pages from ISOLATE page blocks]
Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 include/linux/page-isolation.h |   40 +++++++++++++++++++++++++++-------------
 mm/page_alloc.c                |   31 ++++++++++++++++++++++++-------
 mm/page_isolation.c            |   17 +++++++++--------
 3 files changed, 60 insertions(+), 28 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 9b6aa8a..003c52f 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -3,41 +3,55 @@
 
 /*
  * Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE.
- * If specified range includes migrate types other than MOVABLE,
+ * If specified range includes migrate types other than MOVABLE or CMA,
  * this will fail with -EBUSY.
  *
  * For isolating all pages in the range finally, the caller have to
  * free all pages in the range. test_page_isolated() can be used for
  * test it.
  */
-extern int
-start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn);
+int __start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			       unsigned migratetype);
+
+static inline int
+start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+	return __start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+}
+
+int __undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			      unsigned migratetype);
 
 /*
  * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
  * target range is [start_pfn, end_pfn)
  */
-extern int
-undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn);
+static inline int
+undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+	return __undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+}
 
 /*
- * test all pages in [start_pfn, end_pfn)are isolated or not.
+ * Test all pages in [start_pfn, end_pfn) are isolated or not.
  */
-extern int
-test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
+int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
 
 /*
- * Internal funcs.Changes pageblock's migrate type.
- * Please use make_pagetype_isolated()/make_pagetype_movable().
+ * Internal functions. Changes pageblock's migrate type.
  */
-extern int set_migratetype_isolate(struct page *page);
-extern void unset_migratetype_isolate(struct page *page);
+int set_migratetype_isolate(struct page *page);
+void __unset_migratetype_isolate(struct page *page, unsigned migratetype);
+static inline void unset_migratetype_isolate(struct page *page)
+{
+	__unset_migratetype_isolate(page, MIGRATE_MOVABLE);
+}
 
 /* The below functions must be run on a range from a single zone. */
 extern unsigned long alloc_contig_freed_pages(unsigned long start,
 					      unsigned long end, gfp_t flag);
 extern int alloc_contig_range(unsigned long start, unsigned long end,
-			      gfp_t flags);
+			      gfp_t flags, unsigned migratetype);
 extern void free_contig_pages(unsigned long pfn, unsigned nr_pages);
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6758b9a..6dd6fb5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -641,6 +641,18 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 			page = list_entry(list->prev, struct page, lru);
 			/* must delete as __free_one_page list manipulates */
 			list_del(&page->lru);
+
+			/*
+			 * When page is isolated in set_migratetype_isolate()
+			 * function it's page_private is not changed since the
+			 * function has no way of knowing if it can touch it.
+			 * This means that when a page is on PCP list, it's
+			 * page_private no longer matches the desired migrate
+			 * type.
+			 */
+			if (get_pageblock_migratetype(page) == MIGRATE_ISOLATE)
+				set_page_private(page, MIGRATE_ISOLATE);
+
 			/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
 			__free_one_page(page, zone, 0, page_private(page));
 			trace_mm_page_pcpu_drain(page, 0, page_private(page));
@@ -5733,7 +5745,7 @@ out:
 	return ret;
 }
 
-void unset_migratetype_isolate(struct page *page)
+void __unset_migratetype_isolate(struct page *page, unsigned migratetype)
 {
 	struct zone *zone;
 	unsigned long flags;
@@ -5741,8 +5753,8 @@ void unset_migratetype_isolate(struct page *page)
 	spin_lock_irqsave(&zone->lock, flags);
 	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
 		goto out;
-	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
-	move_freepages_block(zone, page, MIGRATE_MOVABLE);
+	set_pageblock_migratetype(page, migratetype);
+	move_freepages_block(zone, page, migratetype);
 out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
@@ -5880,6 +5892,10 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
  * @start:	start PFN to allocate
  * @end:	one-past-the-last PFN to allocate
  * @flags:	flags passed to alloc_contig_freed_pages().
+ * @migratetype:	migratetype of the underlaying pageblocks (either
+ *			#MIGRATE_MOVABLE or #MIGRATE_CMA).  All pageblocks
+ *			in range must have the same migratetype and it must
+ *			be either of the two.
  *
  * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
  * aligned, hovewer it's callers responsibility to guarantee that we
@@ -5891,7 +5907,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
  * need to be freed with free_contig_pages().
  */
 int alloc_contig_range(unsigned long start, unsigned long end,
-		       gfp_t flags)
+		       gfp_t flags, unsigned migratetype)
 {
 	unsigned long outer_start, outer_end;
 	int ret;
@@ -5919,8 +5935,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	 * them.
 	 */
 
-	ret = start_isolate_page_range(pfn_to_maxpage(start),
-				       pfn_to_maxpage_up(end));
+	ret = __start_isolate_page_range(pfn_to_maxpage(start),
+					 pfn_to_maxpage_up(end), migratetype);
 	if (ret)
 		goto done;
 
@@ -5958,7 +5974,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 
 	ret = 0;
 done:
-	undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end));
+	__undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end),
+				  migratetype);
 	return ret;
 }
 
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 270a026..bcbed1d 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -23,10 +23,11 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
 }
 
 /*
- * start_isolate_page_range() -- make page-allocation-type of range of pages
+ * __start_isolate_page_range() -- make page-allocation-type of range of pages
  * to be MIGRATE_ISOLATE.
  * @start_pfn: The lower PFN of the range to be isolated.
  * @end_pfn: The upper PFN of the range to be isolated.
+ * @migratetype: migrate type to set in error recovery.
  *
  * Making page-allocation-type to be MIGRATE_ISOLATE means free pages in
  * the range will never be allocated. Any free pages and pages freed in the
@@ -35,8 +36,8 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * start_pfn/end_pfn must be aligned to pageblock_order.
  * Returns 0 on success and -EBUSY if any part of range cannot be isolated.
  */
-int
-start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+int __start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			       unsigned migratetype)
 {
 	unsigned long pfn;
 	unsigned long undo_pfn;
@@ -59,7 +60,7 @@ undo:
 	for (pfn = start_pfn;
 	     pfn < undo_pfn;
 	     pfn += pageblock_nr_pages)
-		unset_migratetype_isolate(pfn_to_page(pfn));
+		__unset_migratetype_isolate(pfn_to_page(pfn), migratetype);
 
 	return -EBUSY;
 }
@@ -67,8 +68,8 @@ undo:
 /*
  * Make isolated pages available again.
  */
-int
-undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+int __undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			      unsigned migratetype)
 {
 	unsigned long pfn;
 	struct page *page;
@@ -80,7 +81,7 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
 		page = __first_valid_page(pfn, pageblock_nr_pages);
 		if (!page || get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
 			continue;
-		unset_migratetype_isolate(page);
+		__unset_migratetype_isolate(page, migratetype);
 	}
 	return 0;
 }
@@ -89,7 +90,7 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
  * all pages in [start_pfn...end_pfn) must be in the same zone.
  * zone->lock must be held before call this.
  *
- * Returns 1 if all pages in the range is isolated.
+ * Returns 1 if all pages in the range are isolated.
  */
 static int
 __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn)
-- 
1.7.1.569.g6f426

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 6/9] drivers: add Contiguous Memory Allocator
  2011-10-06 13:54 ` Marek Szyprowski
  (?)
@ 2011-10-06 13:54   ` Marek Szyprowski
  -1 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-06 13:54 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Shariq Hasnain, Chunsang Jeong, Dave Hansen

The Contiguous Memory Allocator is a set of helper functions for the
DMA mapping framework that improves allocation of contiguous memory
chunks.

CMA grabs memory on system boot, marks it with CMA_MIGRATE_TYPE and
gives it back to the system. The kernel is allowed to allocate movable
pages within CMA's managed memory, so it can be used for example for
page cache when the DMA mapping framework does not need it. On a
dma_alloc_from_contiguous() request, such pages are migrated out of the
CMA area to free the required contiguous block and fulfill the request.
This makes it possible to allocate large contiguous chunks of memory at
any time, as long as there is enough free memory in the system.
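
As a usage illustration only (the helper names below are invented here;
the real consumers are the DMA-mapping integration patches later in
this series), a dma_alloc_coherent() style backend could sit on top of
the two exported calls like this:

	#include <linux/dma-contiguous.h>
	#include <linux/gfp.h>
	#include <linux/mm.h>

	/* Allocate 'size' bytes of physically contiguous memory for 'dev'. */
	static struct page *cma_alloc_pages(struct device *dev, size_t size)
	{
		int count = PAGE_ALIGN(size) >> PAGE_SHIFT;

		/* May first migrate movable pages out of the CMA area. */
		return dma_alloc_from_contiguous(dev, count, get_order(size));
	}

	static void cma_free_pages(struct device *dev, struct page *page,
				   size_t size)
	{
		int count = PAGE_ALIGN(size) >> PAGE_SHIFT;

		/* Returns 0 if the pages did not come from a CMA area. */
		if (!dma_release_from_contiguous(dev, page, count))
			__free_pages(page, get_order(size));
	}

The __free_pages() fallback covers buffers that were obtained outside
the CMA area, which is how a backend would be expected to mix CMA and
ordinary page allocations.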

This code is heavily based on earlier works by Michal Nazarewicz.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 arch/Kconfig                         |    3 +
 drivers/base/Kconfig                 |   79 +++++++
 drivers/base/Makefile                |    1 +
 drivers/base/dma-contiguous.c        |  386 ++++++++++++++++++++++++++++++++++
 include/asm-generic/dma-contiguous.h |   27 +++
 include/linux/device.h               |    4 +
 include/linux/dma-contiguous.h       |  106 ++++++++++
 7 files changed, 606 insertions(+), 0 deletions(-)
 create mode 100644 drivers/base/dma-contiguous.c
 create mode 100644 include/asm-generic/dma-contiguous.h
 create mode 100644 include/linux/dma-contiguous.h

diff --git a/arch/Kconfig b/arch/Kconfig
index 4b0669c..a3b39a2 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -124,6 +124,9 @@ config HAVE_ARCH_TRACEHOOK
 config HAVE_DMA_ATTRS
 	bool
 
+config HAVE_DMA_CONTIGUOUS
+	bool
+
 config USE_GENERIC_SMP_HELPERS
 	bool
 
diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 21cf46f..a5e6d75 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -174,4 +174,83 @@ config SYS_HYPERVISOR
 
 source "drivers/base/regmap/Kconfig"
 
+config CMA
+	bool "Contiguous Memory Allocator (EXPERIMENTAL)"
+	depends on HAVE_DMA_CONTIGUOUS && HAVE_MEMBLOCK && EXPERIMENTAL
+	select MIGRATION
+	select CMA_MIGRATE_TYPE
+	help
+	  This enables the Contiguous Memory Allocator which allows drivers
+	  to allocate big physically-contiguous blocks of memory for use with
+	  hardware components that do not support I/O map nor scatter-gather.
+
+	  For more information see <include/linux/dma-contiguous.h>.
+	  If unsure, say "n".
+
+if CMA
+
+config CMA_DEBUG
+	bool "CMA debug messages (DEVELOPEMENT)"
+	help
+	  Turns on debug messages in CMA.  This produces KERN_DEBUG
+	  messages for every CMA call as well as various messages while
+	  processing calls such as dma_alloc_from_contiguous().
+	  This option does not affect warning and error messages.
+
+comment "Default contiguous memory area size:"
+
+config CMA_SIZE_ABSOLUTE
+	int "Absolute size (in MiB)"
+	depends on !CMA_SIZE_SEL_PERCENTAGE
+	default 16
+	help
+	  Defines the size (in MiB) of the default memory area for Contiguous
+	  Memory Allocator.
+
+config CMA_SIZE_PERCENTAGE
+	int "Percentage of total memory"
+	depends on !CMA_SIZE_SEL_ABSOLUTE
+	default 10
+	help
+	  Defines the size of the default memory area for Contiguous Memory
+	  Allocator as a percentage of the total memory in the system.
+
+choice
+	prompt "Selected region size"
+	default CMA_SIZE_SEL_ABSOLUTE
+
+config CMA_SIZE_SEL_ABSOLUTE
+	bool "Use absolute value only"
+
+config CMA_SIZE_SEL_PERCENTAGE
+	bool "Use percentage value only"
+
+config CMA_SIZE_SEL_MIN
+	bool "Use lower value (minimum)"
+
+config CMA_SIZE_SEL_MAX
+	bool "Use higher value (maximum)"
+
+endchoice
+
+config CMA_ALIGNMENT
+	int "Maximum PAGE_SIZE order of alignment for contiguous buffers"
+	range 4 9
+	default 8
+	help
+	  DMA mapping framework by default aligns all buffers to the smallest
+	  PAGE_SIZE order which is greater than or equal to the requested buffer
+	  size. This works well for buffers up to a few hundreds kilobytes, but
+	  for larger buffers it just a memory waste. With this parameter you can
+	  specify the maximum PAGE_SIZE order for contiguous buffers. Larger
+	  buffers will be aligned only to this specified order. The order is
+	  expressed as a power of two multiplied by the PAGE_SIZE.
+
+	  For example, if your system defaults to 4KiB pages, the order value
+	  of 8 means that the buffers will be aligned up to 1MiB only.
+
+	  If unsure, leave the default value "8".
+
+endif
+
 endmenu
diff --git a/drivers/base/Makefile b/drivers/base/Makefile
index 99a375a..794546f 100644
--- a/drivers/base/Makefile
+++ b/drivers/base/Makefile
@@ -5,6 +5,7 @@ obj-y			:= core.o sys.o bus.o dd.o syscore.o \
 			   cpu.o firmware.o init.o map.o devres.o \
 			   attribute_container.o transport_class.o
 obj-$(CONFIG_DEVTMPFS)	+= devtmpfs.o
+obj-$(CONFIG_CMA) += dma-contiguous.o
 obj-y			+= power/
 obj-$(CONFIG_HAS_DMA)	+= dma-mapping.o
 obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
new file mode 100644
index 0000000..e54bb76
--- /dev/null
+++ b/drivers/base/dma-contiguous.c
@@ -0,0 +1,386 @@
+/*
+ * Contiguous Memory Allocator for DMA mapping framework
+ * Copyright (c) 2010-2011 by Samsung Electronics.
+ * Written by:
+ *	Marek Szyprowski <m.szyprowski@samsung.com>
+ *	Michal Nazarewicz <mina86@mina86.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License or (at your optional) any later version of the license.
+ */
+
+#define pr_fmt(fmt) "cma: " fmt
+
+#ifdef CONFIG_CMA_DEBUG
+#ifndef DEBUG
+#  define DEBUG
+#endif
+#endif
+
+#include <asm/page.h>
+#include <asm/dma-contiguous.h>
+
+#include <linux/memblock.h>
+#include <linux/err.h>
+#include <linux/mm.h>
+#include <linux/mutex.h>
+#include <linux/page-isolation.h>
+#include <linux/slab.h>
+#include <linux/swap.h>
+#include <linux/mm_types.h>
+#include <linux/dma-contiguous.h>
+
+#ifndef SZ_1M
+#define SZ_1M (1 << 20)
+#endif
+
+#ifdef phys_to_pfn
+/* nothing to do */
+#elif defined __phys_to_pfn
+#  define phys_to_pfn __phys_to_pfn
+#elif defined __va
+#  define phys_to_pfn(x) page_to_pfn(virt_to_page(__va(x)))
+#else
+#  error phys_to_pfn implementation needed
+#endif
+
+struct cma {
+	unsigned long	base_pfn;
+	unsigned long	count;
+	unsigned long	*bitmap;
+};
+
+struct cma *dma_contiguous_default_area;
+
+#ifndef CONFIG_CMA_SIZE_ABSOLUTE
+#define CONFIG_CMA_SIZE_ABSOLUTE 0
+#endif
+
+#ifndef CONFIG_CMA_SIZE_PERCENTAGE
+#define CONFIG_CMA_SIZE_PERCENTAGE 0
+#endif
+
+static unsigned long size_abs = CONFIG_CMA_SIZE_ABSOLUTE * SZ_1M;
+static unsigned long size_percent = CONFIG_CMA_SIZE_PERCENTAGE;
+static long size_cmdline = -1;
+
+static int __init early_cma(char *p)
+{
+	pr_debug("%s(%s)\n", __func__, p);
+	size_cmdline = memparse(p, &p);
+	return 0;
+}
+early_param("cma", early_cma);
+
+static unsigned long __init __cma_early_get_total_pages(void)
+{
+	struct memblock_region *reg;
+	unsigned long total_pages = 0;
+
+	/*
+	 * We cannot use memblock_phys_mem_size() here, because
+	 * memblock_analyze() has not been called yet.
+	 */
+	for_each_memblock(memory, reg)
+		total_pages += memblock_region_memory_end_pfn(reg) -
+			       memblock_region_memory_base_pfn(reg);
+	return total_pages;
+}
+
+/**
+ * dma_contiguous_reserve() - reserve area for contiguous memory handling
+ *
+ * This funtion reserves memory from early allocator. It should be
+ * called by arch specific code once the early allocator (memblock or bootmem)
+ * has been activated and all other subsystems have already allocated/reserved
+ * memory.
+ */
+void __init dma_contiguous_reserve(phys_addr_t limit)
+{
+	unsigned long selected_size = 0;
+	unsigned long total_pages;
+
+	pr_debug("%s(limit %08lx)\n", __func__, (unsigned long)limit);
+
+	total_pages = __cma_early_get_total_pages();
+	size_percent *= (total_pages << PAGE_SHIFT) / 100;
+
+	pr_debug("%s: total available: %ld MiB, size absolute: %ld MiB, size percentage: %ld MiB\n",
+		 __func__, (total_pages << PAGE_SHIFT) / SZ_1M,
+		size_abs / SZ_1M, size_percent / SZ_1M);
+
+#ifdef CONFIG_CMA_SIZE_SEL_ABSOLUTE
+	selected_size = size_abs;
+#elif defined(CONFIG_CMA_SIZE_SEL_PERCENTAGE)
+	selected_size = size_percent;
+#elif defined(CONFIG_CMA_SIZE_SEL_MIN)
+	selected_size = min(size_abs, size_percent);
+#elif defined(CONFIG_CMA_SIZE_SEL_MAX)
+	selected_size = max(size_abs, size_percent);
+#endif
+
+	if (size_cmdline != -1)
+		selected_size = size_cmdline;
+
+	if (!selected_size)
+		return;
+
+	pr_debug("%s: reserving %ld MiB for global area\n", __func__,
+		 selected_size / SZ_1M);
+
+	dma_declare_contiguous(NULL, selected_size, 0, limit);
+};
+
+static DEFINE_MUTEX(cma_mutex);
+
+static void __cma_activate_area(unsigned long base_pfn, unsigned long count)
+{
+	unsigned long pfn = base_pfn;
+	unsigned i = count >> pageblock_order;
+	struct zone *zone;
+
+	VM_BUG_ON(!pfn_valid(pfn));
+	zone = page_zone(pfn_to_page(pfn));
+
+	do {
+		unsigned j;
+		base_pfn = pfn;
+		for (j = pageblock_nr_pages; j; --j, pfn++) {
+			VM_BUG_ON(!pfn_valid(pfn));
+			VM_BUG_ON(page_zone(pfn_to_page(pfn)) != zone);
+		}
+		init_cma_reserved_pageblock(pfn_to_page(base_pfn));
+	} while (--i);
+}
+
+static struct cma *__cma_create_area(unsigned long base_pfn,
+				     unsigned long count)
+{
+	int bitmap_size = BITS_TO_LONGS(count) * sizeof(long);
+	struct cma *cma;
+
+	pr_debug("%s(base %08lx, count %lx)\n", __func__, base_pfn, count);
+
+	cma = kmalloc(sizeof *cma, GFP_KERNEL);
+	if (!cma)
+		return ERR_PTR(-ENOMEM);
+
+	cma->base_pfn = base_pfn;
+	cma->count = count;
+	cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
+
+	if (!cma->bitmap)
+		goto no_mem;
+
+	__cma_activate_area(base_pfn, count);
+
+	pr_debug("%s: returned %p\n", __func__, (void *)cma);
+	return cma;
+
+no_mem:
+	kfree(cma);
+	return ERR_PTR(-ENOMEM);
+}
+
+static struct cma_reserved {
+	phys_addr_t start;
+	unsigned long size;
+	struct device *dev;
+} cma_reserved[MAX_CMA_AREAS] __initdata;
+static unsigned cma_reserved_count __initdata;
+
+static int __init __cma_init_reserved_areas(void)
+{
+	struct cma_reserved *r = cma_reserved;
+	unsigned i = cma_reserved_count;
+
+	pr_debug("%s()\n", __func__);
+
+	for (; i; --i, ++r) {
+		struct cma *cma;
+		cma = __cma_create_area(phys_to_pfn(r->start),
+					r->size >> PAGE_SHIFT);
+		if (!IS_ERR(cma)) {
+			if (r->dev)
+				set_dev_cma_area(r->dev, cma);
+			else
+				dma_contiguous_default_area = cma;
+		}
+	}
+	return 0;
+}
+core_initcall(__cma_init_reserved_areas);
+
+/**
+ * dma_declare_contiguous() - reserve area for contiguous memory handling
+ *			      for particular device
+ * @dev:   Pointer to device structure.
+ * @size:  Size of the reserved memory.
+ * @start: Start address of the reserved memory (optional, 0 for any).
+ * @limit: End address of the reserved memory (optional, 0 for any).
+ *
+ * This funtion reserves memory for specified device. It should be
+ * called by board specific code when early allocator (memblock or bootmem)
+ * is still activate.
+ */
+int __init dma_declare_contiguous(struct device *dev, unsigned long size,
+				  phys_addr_t base, phys_addr_t limit)
+{
+	struct cma_reserved *r = &cma_reserved[cma_reserved_count];
+	unsigned long alignment;
+
+	pr_debug("%s(size %lx, base %08lx, limit %08lx)\n", __func__,
+		 (unsigned long)size, (unsigned long)base,
+		 (unsigned long)limit);
+
+	/* Sanity checks */
+	if (cma_reserved_count == ARRAY_SIZE(cma_reserved))
+		return -ENOSPC;
+
+	if (!size)
+		return -EINVAL;
+
+	/* Sanitise input arguments */
+	alignment = PAGE_SIZE << max(MAX_ORDER, pageblock_order);
+	base = ALIGN(base, alignment);
+	size = ALIGN(size, alignment);
+	limit = ALIGN(limit, alignment);
+
+	/* Reserve memory */
+	if (base) {
+		if (memblock_is_region_reserved(base, size) ||
+		    memblock_reserve(base, size) < 0) {
+			base = -EBUSY;
+			goto err;
+		}
+	} else {
+		/*
+		 * Use __memblock_alloc_base() since
+		 * memblock_alloc_base() panic()s.
+		 */
+		phys_addr_t addr = __memblock_alloc_base(size, alignment, limit);
+		if (!addr) {
+			base = -ENOMEM;
+			goto err;
+		} else if (addr + size > ~(unsigned long)0) {
+			memblock_free(addr, size);
+			base = -EOVERFLOW;
+			goto err;
+		} else {
+			base = addr;
+		}
+	}
+
+	/*
+	 * Each reserved area must be initialised later, when more kernel
+	 * subsystems (like slab allocator) are available.
+	 */
+	r->start = base;
+	r->size = size;
+	r->dev = dev;
+	cma_reserved_count++;
+	printk(KERN_INFO "CMA: reserved %ld MiB at %08lx\n", size / SZ_1M,
+	       (unsigned long)base);
+
+	/*
+	 * Architecture specific contiguous memory fixup.
+	 */
+	dma_contiguous_early_fixup(base, size);
+	return 0;
+err:
+	printk(KERN_ERR "CMA: failed to reserve %ld MiB\n", size / SZ_1M);
+	return base;
+}
+
+/**
+ * dma_alloc_from_contiguous() - allocate pages from contiguous area
+ * @dev:   Pointer to device for which the allocation is performed.
+ * @count: Requested number of pages.
+ * @align: Requested alignment of pages (in PAGE_SIZE order).
+ *
+ * This funtion allocates memory buffer for specified device. It uses
+ * device specific contiguous memory area if available or the default
+ * global one. Requires architecture specific get_dev_cma_area() helper
+ * function.
+ */
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int align)
+{
+	struct cma *cma = get_dev_cma_area(dev);
+	unsigned long pfn, pageno;
+	int ret;
+
+	if (!cma)
+		return NULL;
+
+	if (align > CONFIG_CMA_ALIGNMENT)
+		align = CONFIG_CMA_ALIGNMENT;
+
+	pr_debug("%s(cma %p, count %d, align %d)\n", __func__, (void *)cma,
+		 count, align);
+
+	if (!count)
+		return NULL;
+
+	mutex_lock(&cma_mutex);
+
+	pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count, 0, count,
+					    (1 << align) - 1);
+	if (pageno >= cma->count) {
+		ret = -ENOMEM;
+		goto error;
+	}
+	bitmap_set(cma->bitmap, pageno, count);
+
+	pfn = cma->base_pfn + pageno;
+	ret = alloc_contig_range(pfn, pfn + count, 0, MIGRATE_CMA);
+	if (ret)
+		goto free;
+
+	mutex_unlock(&cma_mutex);
+
+	pr_debug("%s(): returned %p\n", __func__, pfn_to_page(pfn));
+	return pfn_to_page(pfn);
+free:
+	bitmap_clear(cma->bitmap, pageno, count);
+error:
+	mutex_unlock(&cma_mutex);
+	return NULL;
+}
+
+/**
+ * dma_release_from_contiguous() - release allocated pages
+ * @dev:   Pointer to device for which the pages were allocated.
+ * @pages: Allocated pages.
+ * @count: Number of allocated pages.
+ *
+ * This funtion releases memory allocated by dma_alloc_from_contiguous().
+ * It return 0 when provided pages doen't belongs to contiguous area and
+ * 1 on success.
+ */
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count)
+{
+	struct cma *cma = get_dev_cma_area(dev);
+	unsigned long pfn;
+
+	if (!cma || !pages)
+		return 0;
+
+	pr_debug("%s(page %p)\n", __func__, (void *)pages);
+
+	pfn = page_to_pfn(pages);
+
+	if (pfn < cma->base_pfn || pfn >= cma->base_pfn + cma->count)
+		return 0;
+
+	mutex_lock(&cma_mutex);
+
+	bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
+	free_contig_pages(pfn, count);
+
+	mutex_unlock(&cma_mutex);
+	return 1;
+}
diff --git a/include/asm-generic/dma-contiguous.h b/include/asm-generic/dma-contiguous.h
new file mode 100644
index 0000000..8c76649
--- /dev/null
+++ b/include/asm-generic/dma-contiguous.h
@@ -0,0 +1,27 @@
+#ifndef ASM_DMA_CONTIGUOUS_H
+#define ASM_DMA_CONTIGUOUS_H
+
+#ifdef __KERNEL__
+
+#include <linux/device.h>
+#include <linux/dma-contiguous.h>
+
+#ifdef CONFIG_CMA
+
+static inline struct cma *get_dev_cma_area(struct device *dev)
+{
+	if (dev && dev->cma_area)
+		return dev->cma_area;
+	return dma_contiguous_default_area;
+}
+
+static inline void set_dev_cma_area(struct device *dev, struct cma *cma)
+{
+	if (dev)
+		dev->cma_area = cma;
+	dma_contiguous_default_area = cma;
+}
+
+#endif
+#endif
+#endif
diff --git a/include/linux/device.h b/include/linux/device.h
index 8bab5c4..cc1e7f0 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -592,6 +592,10 @@ struct device {
 
 	struct dma_coherent_mem	*dma_mem; /* internal for coherent mem
 					     override */
+#ifdef CONFIG_CMA
+	struct cma *cma_area;		/* contiguous memory area for dma
+					   allocations */
+#endif
 	/* arch specific additions */
 	struct dev_archdata	archdata;
 
diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h
new file mode 100644
index 0000000..7ca81c9
--- /dev/null
+++ b/include/linux/dma-contiguous.h
@@ -0,0 +1,106 @@
+#ifndef __LINUX_CMA_H
+#define __LINUX_CMA_H
+
+/*
+ * Contiguous Memory Allocator for DMA mapping framework
+ * Copyright (c) 2010-2011 by Samsung Electronics.
+ * Written by:
+ *	Marek Szyprowski <m.szyprowski@samsung.com>
+ *	Michal Nazarewicz <mina86@mina86.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License or (at your optional) any later version of the license.
+ */
+
+/*
+ * Contiguous Memory Allocator
+ *
+ *   The Contiguous Memory Allocator (CMA) makes it possible to
+ *   allocate big contiguous chunks of memory after the system has
+ *   booted.
+ *
+ * Why is it needed?
+ *
+ *   Various devices on embedded systems have no scatter-getter and/or
+ *   IO map support and require contiguous blocks of memory to
+ *   operate.  They include devices such as cameras, hardware video
+ *   coders, etc.
+ *
+ *   Such devices often require big memory buffers (a full HD frame
+ *   is, for instance, more then 2 mega pixels large, i.e. more than 6
+ *   MB of memory), which makes mechanisms such as kmalloc() or
+ *   alloc_page() ineffective.
+ *
+ *   At the same time, a solution where a big memory region is
+ *   reserved for a device is suboptimal since often more memory is
+ *   reserved then strictly required and, moreover, the memory is
+ *   inaccessible to page system even if device drivers don't use it.
+ *
+ *   CMA tries to solve this issue by operating on memory regions
+ *   where only movable pages can be allocated from.  This way, kernel
+ *   can use the memory for pagecache and when device driver requests
+ *   it, allocated pages can be migrated.
+ *
+ * Driver usage
+ *
+ *   CMA should not be used by the device drivers directly. It is
+ *   only a helper framework for dma-mapping subsystem.
+ *
+ *   For more information, see kernel-docs in drivers/base/dma-contiguous.c
+ */
+
+#ifdef __KERNEL__
+
+struct cma;
+struct page;
+struct device;
+
+#ifdef CONFIG_CMA
+
+#define MAX_CMA_AREAS	(8)
+
+extern struct cma *dma_contiguous_default_area;
+
+void dma_contiguous_reserve(phys_addr_t addr_limit);
+int dma_declare_contiguous(struct device *dev, unsigned long size,
+			   phys_addr_t base, phys_addr_t limit);
+
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int order);
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count);
+
+#else
+
+#define MAX_CMA_AREAS	(0)
+
+static inline void dma_contiguous_reserve(phys_addr_t limit) { }
+
+static inline
+int dma_declare_contiguous(struct device *dev, unsigned long size,
+			   phys_addr_t base, phys_addr_t limit)
+{
+	return -ENOSYS;
+}
+
+static inline
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int order)
+{
+	return NULL;
+}
+
+static inline
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count)
+{
+	return 0;
+}
+
+#endif
+
+#endif
+
+#endif
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 6/9] drivers: add Contiguous Memory Allocator
@ 2011-10-06 13:54   ` Marek Szyprowski
  0 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-06 13:54 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Shariq Hasnain, Chunsang Jeong, Dave Hansen

The Contiguous Memory Allocator is a set of helper functions for the
DMA mapping framework that improves allocation of contiguous memory
chunks.

CMA grabs memory on system boot, marks it with CMA_MIGRATE_TYPE and
gives it back to the system. The kernel is allowed to allocate movable
pages within CMA's managed memory, so it can be used for example for
page cache when the DMA mapping framework does not need it. On a
dma_alloc_from_contiguous() request, such pages are migrated out of the
CMA area to free the required contiguous block and fulfill the request.
This makes it possible to allocate large contiguous chunks of memory at
any time, as long as there is enough free memory in the system.
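
To show where the reservation entry points are meant to be called, here
is a hedged sketch of early platform code (the function name, the
placeholder device and the DMA limit value are made up for this example
and are not part of the patch):

	#include <linux/init.h>
	#include <linux/device.h>
	#include <linux/dma-contiguous.h>

	static struct device example_mfc_dev;	/* placeholder device */

	/*
	 * Runs while memblock/bootmem is still active, before the buddy
	 * allocator takes over, e.g. from the board's reserve() hook.
	 */
	void __init board_reserve_cma(void)
	{
		/* Default global area; size comes from Kconfig or "cma=". */
		dma_contiguous_reserve(0x40000000);	/* placeholder limit */

		/* Plus a private 16 MiB area bound to one device. */
		dma_declare_contiguous(&example_mfc_dev, 16 << 20, 0, 0);
	}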

This code is heavily based on earlier works by Michal Nazarewicz.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 arch/Kconfig                         |    3 +
 drivers/base/Kconfig                 |   79 +++++++
 drivers/base/Makefile                |    1 +
 drivers/base/dma-contiguous.c        |  386 ++++++++++++++++++++++++++++++++++
 include/asm-generic/dma-contiguous.h |   27 +++
 include/linux/device.h               |    4 +
 include/linux/dma-contiguous.h       |  106 ++++++++++
 7 files changed, 606 insertions(+), 0 deletions(-)
 create mode 100644 drivers/base/dma-contiguous.c
 create mode 100644 include/asm-generic/dma-contiguous.h
 create mode 100644 include/linux/dma-contiguous.h

diff --git a/arch/Kconfig b/arch/Kconfig
index 4b0669c..a3b39a2 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -124,6 +124,9 @@ config HAVE_ARCH_TRACEHOOK
 config HAVE_DMA_ATTRS
 	bool
 
+config HAVE_DMA_CONTIGUOUS
+	bool
+
 config USE_GENERIC_SMP_HELPERS
 	bool
 
diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 21cf46f..a5e6d75 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -174,4 +174,83 @@ config SYS_HYPERVISOR
 
 source "drivers/base/regmap/Kconfig"
 
+config CMA
+	bool "Contiguous Memory Allocator (EXPERIMENTAL)"
+	depends on HAVE_DMA_CONTIGUOUS && HAVE_MEMBLOCK && EXPERIMENTAL
+	select MIGRATION
+	select CMA_MIGRATE_TYPE
+	help
+	  This enables the Contiguous Memory Allocator which allows drivers
+	  to allocate big physically-contiguous blocks of memory for use with
+	  hardware components that do not support I/O map nor scatter-gather.
+
+	  For more information see <include/linux/dma-contiguous.h>.
+	  If unsure, say "n".
+
+if CMA
+
+config CMA_DEBUG
+	bool "CMA debug messages (DEVELOPEMENT)"
+	help
+	  Turns on debug messages in CMA.  This produces KERN_DEBUG
+	  messages for every CMA call as well as various messages while
+	  processing calls such as dma_alloc_from_contiguous().
+	  This option does not affect warning and error messages.
+
+comment "Default contiguous memory area size:"
+
+config CMA_SIZE_ABSOLUTE
+	int "Absolute size (in MiB)"
+	depends on !CMA_SIZE_SEL_PERCENTAGE
+	default 16
+	help
+	  Defines the size (in MiB) of the default memory area for Contiguous
+	  Memory Allocator.
+
+config CMA_SIZE_PERCENTAGE
+	int "Percentage of total memory"
+	depends on !CMA_SIZE_SEL_ABSOLUTE
+	default 10
+	help
+	  Defines the size of the default memory area for Contiguous Memory
+	  Allocator as a percentage of the total memory in the system.
+
+choice
+	prompt "Selected region size"
+	default CMA_SIZE_SEL_ABSOLUTE
+
+config CMA_SIZE_SEL_ABSOLUTE
+	bool "Use absolute value only"
+
+config CMA_SIZE_SEL_PERCENTAGE
+	bool "Use percentage value only"
+
+config CMA_SIZE_SEL_MIN
+	bool "Use lower value (minimum)"
+
+config CMA_SIZE_SEL_MAX
+	bool "Use higher value (maximum)"
+
+endchoice
+
+config CMA_ALIGNMENT
+	int "Maximum PAGE_SIZE order of alignment for contiguous buffers"
+	range 4 9
+	default 8
+	help
+	  DMA mapping framework by default aligns all buffers to the smallest
+	  PAGE_SIZE order which is greater than or equal to the requested buffer
+	  size. This works well for buffers up to a few hundreds kilobytes, but
+	  for larger buffers it just a memory waste. With this parameter you can
+	  specify the maximum PAGE_SIZE order for contiguous buffers. Larger
+	  buffers will be aligned only to this specified order. The order is
+	  expressed as a power of two multiplied by the PAGE_SIZE.
+
+	  For example, if your system defaults to 4KiB pages, the order value
+	  of 8 means that the buffers will be aligned up to 1MiB only.
+
+	  If unsure, leave the default value "8".
+
+endif
+
 endmenu
diff --git a/drivers/base/Makefile b/drivers/base/Makefile
index 99a375a..794546f 100644
--- a/drivers/base/Makefile
+++ b/drivers/base/Makefile
@@ -5,6 +5,7 @@ obj-y			:= core.o sys.o bus.o dd.o syscore.o \
 			   cpu.o firmware.o init.o map.o devres.o \
 			   attribute_container.o transport_class.o
 obj-$(CONFIG_DEVTMPFS)	+= devtmpfs.o
+obj-$(CONFIG_CMA) += dma-contiguous.o
 obj-y			+= power/
 obj-$(CONFIG_HAS_DMA)	+= dma-mapping.o
 obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
new file mode 100644
index 0000000..e54bb76
--- /dev/null
+++ b/drivers/base/dma-contiguous.c
@@ -0,0 +1,386 @@
+/*
+ * Contiguous Memory Allocator for DMA mapping framework
+ * Copyright (c) 2010-2011 by Samsung Electronics.
+ * Written by:
+ *	Marek Szyprowski <m.szyprowski@samsung.com>
+ *	Michal Nazarewicz <mina86@mina86.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License or (at your optional) any later version of the license.
+ */
+
+#define pr_fmt(fmt) "cma: " fmt
+
+#ifdef CONFIG_CMA_DEBUG
+#ifndef DEBUG
+#  define DEBUG
+#endif
+#endif
+
+#include <asm/page.h>
+#include <asm/dma-contiguous.h>
+
+#include <linux/memblock.h>
+#include <linux/err.h>
+#include <linux/mm.h>
+#include <linux/mutex.h>
+#include <linux/page-isolation.h>
+#include <linux/slab.h>
+#include <linux/swap.h>
+#include <linux/mm_types.h>
+#include <linux/dma-contiguous.h>
+
+#ifndef SZ_1M
+#define SZ_1M (1 << 20)
+#endif
+
+#ifdef phys_to_pfn
+/* nothing to do */
+#elif defined __phys_to_pfn
+#  define phys_to_pfn __phys_to_pfn
+#elif defined __va
+#  define phys_to_pfn(x) page_to_pfn(virt_to_page(__va(x)))
+#else
+#  error phys_to_pfn implementation needed
+#endif
+
+struct cma {
+	unsigned long	base_pfn;
+	unsigned long	count;
+	unsigned long	*bitmap;
+};
+
+struct cma *dma_contiguous_default_area;
+
+#ifndef CONFIG_CMA_SIZE_ABSOLUTE
+#define CONFIG_CMA_SIZE_ABSOLUTE 0
+#endif
+
+#ifndef CONFIG_CMA_SIZE_PERCENTAGE
+#define CONFIG_CMA_SIZE_PERCENTAGE 0
+#endif
+
+static unsigned long size_abs = CONFIG_CMA_SIZE_ABSOLUTE * SZ_1M;
+static unsigned long size_percent = CONFIG_CMA_SIZE_PERCENTAGE;
+static long size_cmdline = -1;
+
+static int __init early_cma(char *p)
+{
+	pr_debug("%s(%s)\n", __func__, p);
+	size_cmdline = memparse(p, &p);
+	return 0;
+}
+early_param("cma", early_cma);
+
+static unsigned long __init __cma_early_get_total_pages(void)
+{
+	struct memblock_region *reg;
+	unsigned long total_pages = 0;
+
+	/*
+	 * We cannot use memblock_phys_mem_size() here, because
+	 * memblock_analyze() has not been called yet.
+	 */
+	for_each_memblock(memory, reg)
+		total_pages += memblock_region_memory_end_pfn(reg) -
+			       memblock_region_memory_base_pfn(reg);
+	return total_pages;
+}
+
+/**
+ * dma_contiguous_reserve() - reserve area for contiguous memory handling
+ *
+ * This funtion reserves memory from early allocator. It should be
+ * called by arch specific code once the early allocator (memblock or bootmem)
+ * has been activated and all other subsystems have already allocated/reserved
+ * memory.
+ */
+void __init dma_contiguous_reserve(phys_addr_t limit)
+{
+	unsigned long selected_size = 0;
+	unsigned long total_pages;
+
+	pr_debug("%s(limit %08lx)\n", __func__, (unsigned long)limit);
+
+	total_pages = __cma_early_get_total_pages();
+	size_percent *= (total_pages << PAGE_SHIFT) / 100;
+
+	pr_debug("%s: total available: %ld MiB, size absolute: %ld MiB, size percentage: %ld MiB\n",
+		 __func__, (total_pages << PAGE_SHIFT) / SZ_1M,
+		size_abs / SZ_1M, size_percent / SZ_1M);
+
+#ifdef CONFIG_CMA_SIZE_SEL_ABSOLUTE
+	selected_size = size_abs;
+#elif defined(CONFIG_CMA_SIZE_SEL_PERCENTAGE)
+	selected_size = size_percent;
+#elif defined(CONFIG_CMA_SIZE_SEL_MIN)
+	selected_size = min(size_abs, size_percent);
+#elif defined(CONFIG_CMA_SIZE_SEL_MAX)
+	selected_size = max(size_abs, size_percent);
+#endif
+
+	if (size_cmdline != -1)
+		selected_size = size_cmdline;
+
+	if (!selected_size)
+		return;
+
+	pr_debug("%s: reserving %ld MiB for global area\n", __func__,
+		 selected_size / SZ_1M);
+
+	dma_declare_contiguous(NULL, selected_size, 0, limit);
+};
+
+static DEFINE_MUTEX(cma_mutex);
+
+static void __cma_activate_area(unsigned long base_pfn, unsigned long count)
+{
+	unsigned long pfn = base_pfn;
+	unsigned i = count >> pageblock_order;
+	struct zone *zone;
+
+	VM_BUG_ON(!pfn_valid(pfn));
+	zone = page_zone(pfn_to_page(pfn));
+
+	do {
+		unsigned j;
+		base_pfn = pfn;
+		for (j = pageblock_nr_pages; j; --j, pfn++) {
+			VM_BUG_ON(!pfn_valid(pfn));
+			VM_BUG_ON(page_zone(pfn_to_page(pfn)) != zone);
+		}
+		init_cma_reserved_pageblock(pfn_to_page(base_pfn));
+	} while (--i);
+}
+
+static struct cma *__cma_create_area(unsigned long base_pfn,
+				     unsigned long count)
+{
+	int bitmap_size = BITS_TO_LONGS(count) * sizeof(long);
+	struct cma *cma;
+
+	pr_debug("%s(base %08lx, count %lx)\n", __func__, base_pfn, count);
+
+	cma = kmalloc(sizeof *cma, GFP_KERNEL);
+	if (!cma)
+		return ERR_PTR(-ENOMEM);
+
+	cma->base_pfn = base_pfn;
+	cma->count = count;
+	cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
+
+	if (!cma->bitmap)
+		goto no_mem;
+
+	__cma_activate_area(base_pfn, count);
+
+	pr_debug("%s: returned %p\n", __func__, (void *)cma);
+	return cma;
+
+no_mem:
+	kfree(cma);
+	return ERR_PTR(-ENOMEM);
+}
+
+static struct cma_reserved {
+	phys_addr_t start;
+	unsigned long size;
+	struct device *dev;
+} cma_reserved[MAX_CMA_AREAS] __initdata;
+static unsigned cma_reserved_count __initdata;
+
+static int __init __cma_init_reserved_areas(void)
+{
+	struct cma_reserved *r = cma_reserved;
+	unsigned i = cma_reserved_count;
+
+	pr_debug("%s()\n", __func__);
+
+	for (; i; --i, ++r) {
+		struct cma *cma;
+		cma = __cma_create_area(phys_to_pfn(r->start),
+					r->size >> PAGE_SHIFT);
+		if (!IS_ERR(cma)) {
+			if (r->dev)
+				set_dev_cma_area(r->dev, cma);
+			else
+				dma_contiguous_default_area = cma;
+		}
+	}
+	return 0;
+}
+core_initcall(__cma_init_reserved_areas);
+
+/**
+ * dma_declare_contiguous() - reserve area for contiguous memory handling
+ *			      for particular device
+ * @dev:   Pointer to device structure.
+ * @size:  Size of the reserved memory.
+ * @start: Start address of the reserved memory (optional, 0 for any).
+ * @limit: End address of the reserved memory (optional, 0 for any).
+ *
+ * This funtion reserves memory for specified device. It should be
+ * called by board specific code when early allocator (memblock or bootmem)
+ * is still activate.
+ */
+int __init dma_declare_contiguous(struct device *dev, unsigned long size,
+				  phys_addr_t base, phys_addr_t limit)
+{
+	struct cma_reserved *r = &cma_reserved[cma_reserved_count];
+	unsigned long alignment;
+
+	pr_debug("%s(size %lx, base %08lx, limit %08lx)\n", __func__,
+		 (unsigned long)size, (unsigned long)base,
+		 (unsigned long)limit);
+
+	/* Sanity checks */
+	if (cma_reserved_count == ARRAY_SIZE(cma_reserved))
+		return -ENOSPC;
+
+	if (!size)
+		return -EINVAL;
+
+	/* Sanitise input arguments */
+	alignment = PAGE_SIZE << max(MAX_ORDER, pageblock_order);
+	base = ALIGN(base, alignment);
+	size = ALIGN(size, alignment);
+	limit = ALIGN(limit, alignment);
+
+	/* Reserve memory */
+	if (base) {
+		if (memblock_is_region_reserved(base, size) ||
+		    memblock_reserve(base, size) < 0) {
+			base = -EBUSY;
+			goto err;
+		}
+	} else {
+		/*
+		 * Use __memblock_alloc_base() since
+		 * memblock_alloc_base() panic()s.
+		 */
+		phys_addr_t addr = __memblock_alloc_base(size, alignment, limit);
+		if (!addr) {
+			base = -ENOMEM;
+			goto err;
+		} else if (addr + size > ~(unsigned long)0) {
+			memblock_free(addr, size);
+			base = -EOVERFLOW;
+			goto err;
+		} else {
+			base = addr;
+		}
+	}
+
+	/*
+	 * Each reserved area must be initialised later, when more kernel
+	 * subsystems (like slab allocator) are available.
+	 */
+	r->start = base;
+	r->size = size;
+	r->dev = dev;
+	cma_reserved_count++;
+	printk(KERN_INFO "CMA: reserved %ld MiB at %08lx\n", size / SZ_1M,
+	       (unsigned long)base);
+
+	/*
+	 * Architecture specific contiguous memory fixup.
+	 */
+	dma_contiguous_early_fixup(base, size);
+	return 0;
+err:
+	printk(KERN_ERR "CMA: failed to reserve %ld MiB\n", size / SZ_1M);
+	return base;
+}
+
+/**
+ * dma_alloc_from_contiguous() - allocate pages from contiguous area
+ * @dev:   Pointer to device for which the allocation is performed.
+ * @count: Requested number of pages.
+ * @align: Requested alignment of pages (in PAGE_SIZE order).
+ *
+ * This funtion allocates memory buffer for specified device. It uses
+ * device specific contiguous memory area if available or the default
+ * global one. Requires architecture specific get_dev_cma_area() helper
+ * function.
+ */
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int align)
+{
+	struct cma *cma = get_dev_cma_area(dev);
+	unsigned long pfn, pageno;
+	int ret;
+
+	if (!cma)
+		return NULL;
+
+	if (align > CONFIG_CMA_ALIGNMENT)
+		align = CONFIG_CMA_ALIGNMENT;
+
+	pr_debug("%s(cma %p, count %d, align %d)\n", __func__, (void *)cma,
+		 count, align);
+
+	if (!count)
+		return NULL;
+
+	mutex_lock(&cma_mutex);
+
+	pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count, 0, count,
+					    (1 << align) - 1);
+	if (pageno >= cma->count) {
+		ret = -ENOMEM;
+		goto error;
+	}
+	bitmap_set(cma->bitmap, pageno, count);
+
+	pfn = cma->base_pfn + pageno;
+	ret = alloc_contig_range(pfn, pfn + count, 0, MIGRATE_CMA);
+	if (ret)
+		goto free;
+
+	mutex_unlock(&cma_mutex);
+
+	pr_debug("%s(): returned %p\n", __func__, pfn_to_page(pfn));
+	return pfn_to_page(pfn);
+free:
+	bitmap_clear(cma->bitmap, pageno, count);
+error:
+	mutex_unlock(&cma_mutex);
+	return NULL;
+}
+
+/**
+ * dma_release_from_contiguous() - release allocated pages
+ * @dev:   Pointer to device for which the pages were allocated.
+ * @pages: Allocated pages.
+ * @count: Number of allocated pages.
+ *
+ * This funtion releases memory allocated by dma_alloc_from_contiguous().
+ * It return 0 when provided pages doen't belongs to contiguous area and
+ * 1 on success.
+ */
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count)
+{
+	struct cma *cma = get_dev_cma_area(dev);
+	unsigned long pfn;
+
+	if (!cma || !pages)
+		return 0;
+
+	pr_debug("%s(page %p)\n", __func__, (void *)pages);
+
+	pfn = page_to_pfn(pages);
+
+	if (pfn < cma->base_pfn || pfn >= cma->base_pfn + cma->count)
+		return 0;
+
+	mutex_lock(&cma_mutex);
+
+	bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
+	free_contig_pages(pfn, count);
+
+	mutex_unlock(&cma_mutex);
+	return 1;
+}
diff --git a/include/asm-generic/dma-contiguous.h b/include/asm-generic/dma-contiguous.h
new file mode 100644
index 0000000..8c76649
--- /dev/null
+++ b/include/asm-generic/dma-contiguous.h
@@ -0,0 +1,27 @@
+#ifndef ASM_DMA_CONTIGUOUS_H
+#define ASM_DMA_CONTIGUOUS_H
+
+#ifdef __KERNEL__
+
+#include <linux/device.h>
+#include <linux/dma-contiguous.h>
+
+#ifdef CONFIG_CMA
+
+static inline struct cma *get_dev_cma_area(struct device *dev)
+{
+	if (dev && dev->cma_area)
+		return dev->cma_area;
+	return dma_contiguous_default_area;
+}
+
+static inline void set_dev_cma_area(struct device *dev, struct cma *cma)
+{
+	if (dev)
+		dev->cma_area = cma;
+	dma_contiguous_default_area = cma;
+}
+
+#endif
+#endif
+#endif
diff --git a/include/linux/device.h b/include/linux/device.h
index 8bab5c4..cc1e7f0 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -592,6 +592,10 @@ struct device {
 
 	struct dma_coherent_mem	*dma_mem; /* internal for coherent mem
 					     override */
+#ifdef CONFIG_CMA
+	struct cma *cma_area;		/* contiguous memory area for dma
+					   allocations */
+#endif
 	/* arch specific additions */
 	struct dev_archdata	archdata;
 
diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h
new file mode 100644
index 0000000..7ca81c9
--- /dev/null
+++ b/include/linux/dma-contiguous.h
@@ -0,0 +1,106 @@
+#ifndef __LINUX_CMA_H
+#define __LINUX_CMA_H
+
+/*
+ * Contiguous Memory Allocator for DMA mapping framework
+ * Copyright (c) 2010-2011 by Samsung Electronics.
+ * Written by:
+ *	Marek Szyprowski <m.szyprowski@samsung.com>
+ *	Michal Nazarewicz <mina86@mina86.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License or (at your optional) any later version of the license.
+ */
+
+/*
+ * Contiguous Memory Allocator
+ *
+ *   The Contiguous Memory Allocator (CMA) makes it possible to
+ *   allocate big contiguous chunks of memory after the system has
+ *   booted.
+ *
+ * Why is it needed?
+ *
+ *   Various devices on embedded systems have no scatter-getter and/or
+ *   IO map support and require contiguous blocks of memory to
+ *   operate.  They include devices such as cameras, hardware video
+ *   coders, etc.
+ *
+ *   Such devices often require big memory buffers (a full HD frame
+ *   is, for instance, more then 2 mega pixels large, i.e. more than 6
+ *   MB of memory), which makes mechanisms such as kmalloc() or
+ *   alloc_page() ineffective.
+ *
+ *   At the same time, a solution where a big memory region is
+ *   reserved for a device is suboptimal since often more memory is
+ *   reserved then strictly required and, moreover, the memory is
+ *   inaccessible to page system even if device drivers don't use it.
+ *
+ *   CMA tries to solve this issue by operating on memory regions
+ *   where only movable pages can be allocated from.  This way, kernel
+ *   can use the memory for pagecache and when device driver requests
+ *   it, allocated pages can be migrated.
+ *
+ * Driver usage
+ *
+ *   CMA should not be used by the device drivers directly. It is
+ *   only a helper framework for dma-mapping subsystem.
+ *
+ *   For more information, see kernel-docs in drivers/base/dma-contiguous.c
+ */
+
+#ifdef __KERNEL__
+
+struct cma;
+struct page;
+struct device;
+
+#ifdef CONFIG_CMA
+
+#define MAX_CMA_AREAS	(8)
+
+extern struct cma *dma_contiguous_default_area;
+
+void dma_contiguous_reserve(phys_addr_t addr_limit);
+int dma_declare_contiguous(struct device *dev, unsigned long size,
+			   phys_addr_t base, phys_addr_t limit);
+
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int order);
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count);
+
+#else
+
+#define MAX_CMA_AREAS	(0)
+
+static inline void dma_contiguous_reserve(phys_addr_t limit) { }
+
+static inline
+int dma_declare_contiguous(struct device *dev, unsigned long size,
+			   phys_addr_t base, phys_addr_t limit)
+{
+	return -ENOSYS;
+}
+
+static inline
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int order)
+{
+	return NULL;
+}
+
+static inline
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count)
+{
+	return 0;
+}
+
+#endif
+
+#endif
+
+#endif
-- 
1.7.1.569.g6f426

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 6/9] drivers: add Contiguous Memory Allocator
@ 2011-10-06 13:54   ` Marek Szyprowski
  0 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-06 13:54 UTC (permalink / raw)
  To: linux-arm-kernel

The Contiguous Memory Allocator is a set of helper functions for the
DMA mapping framework that improves allocation of contiguous memory
chunks.

CMA grabs memory on system boot, marks it with CMA_MIGRATE_TYPE and
gives it back to the system. The kernel is allowed to allocate movable
pages within CMA's managed memory, so it can be used for example for
page cache when the DMA mapping framework does not need it. On a
dma_alloc_from_contiguous() request, such pages are migrated out of the
CMA area to free the required contiguous block and fulfill the request.
This makes it possible to allocate large contiguous chunks of memory at
any time, as long as there is enough free memory in the system.
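
Because dma-contiguous.c includes <asm/dma-contiguous.h> and calls
dma_contiguous_early_fixup(), every architecture enabling CMA needs a
small glue header. A minimal sketch, assuming no kernel-mapping fixup
is needed on that architecture (the header guard name is illustrative;
the get/set_dev_cma_area() helpers come from the asm-generic header
added by this patch):

	/* arch/<arch>/include/asm/dma-contiguous.h */
	#ifndef ASM_ARCH_DMA_CONTIGUOUS_H
	#define ASM_ARCH_DMA_CONTIGUOUS_H

	#include <linux/types.h>
	#include <asm-generic/dma-contiguous.h>	/* get/set_dev_cma_area() */

	/* No extra fixup of the kernel mapping required here. */
	static inline void
	dma_contiguous_early_fixup(phys_addr_t base, unsigned long size) { }

	#endif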

This code is heavily based on earlier works by Michal Nazarewicz.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 arch/Kconfig                         |    3 +
 drivers/base/Kconfig                 |   79 +++++++
 drivers/base/Makefile                |    1 +
 drivers/base/dma-contiguous.c        |  386 ++++++++++++++++++++++++++++++++++
 include/asm-generic/dma-contiguous.h |   27 +++
 include/linux/device.h               |    4 +
 include/linux/dma-contiguous.h       |  106 ++++++++++
 7 files changed, 606 insertions(+), 0 deletions(-)
 create mode 100644 drivers/base/dma-contiguous.c
 create mode 100644 include/asm-generic/dma-contiguous.h
 create mode 100644 include/linux/dma-contiguous.h

diff --git a/arch/Kconfig b/arch/Kconfig
index 4b0669c..a3b39a2 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -124,6 +124,9 @@ config HAVE_ARCH_TRACEHOOK
 config HAVE_DMA_ATTRS
 	bool
 
+config HAVE_DMA_CONTIGUOUS
+	bool
+
 config USE_GENERIC_SMP_HELPERS
 	bool
 
diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 21cf46f..a5e6d75 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -174,4 +174,83 @@ config SYS_HYPERVISOR
 
 source "drivers/base/regmap/Kconfig"
 
+config CMA
+	bool "Contiguous Memory Allocator (EXPERIMENTAL)"
+	depends on HAVE_DMA_CONTIGUOUS && HAVE_MEMBLOCK && EXPERIMENTAL
+	select MIGRATION
+	select CMA_MIGRATE_TYPE
+	help
+	  This enables the Contiguous Memory Allocator which allows drivers
+	  to allocate big physically-contiguous blocks of memory for use with
+	  hardware components that support neither I/O mapping nor scatter-gather.
+
+	  For more information see <include/linux/dma-contiguous.h>.
+	  If unsure, say "n".
+
+if CMA
+
+config CMA_DEBUG
+	bool "CMA debug messages (DEVELOPMENT)"
+	help
+	  Turns on debug messages in CMA.  This produces KERN_DEBUG
+	  messages for every CMA call as well as various messages while
+	  processing calls such as dma_alloc_from_contiguous().
+	  This option does not affect warning and error messages.
+
+comment "Default contiguous memory area size:"
+
+config CMA_SIZE_ABSOLUTE
+	int "Absolute size (in MiB)"
+	depends on !CMA_SIZE_SEL_PERCENTAGE
+	default 16
+	help
+	  Defines the size (in MiB) of the default memory area for Contiguous
+	  Memory Allocator.
+
+config CMA_SIZE_PERCENTAGE
+	int "Percentage of total memory"
+	depends on !CMA_SIZE_SEL_ABSOLUTE
+	default 10
+	help
+	  Defines the size of the default memory area for Contiguous Memory
+	  Allocator as a percentage of the total memory in the system.
+
+choice
+	prompt "Selected region size"
+	default CMA_SIZE_SEL_ABSOLUTE
+
+config CMA_SIZE_SEL_ABSOLUTE
+	bool "Use absolute value only"
+
+config CMA_SIZE_SEL_PERCENTAGE
+	bool "Use percentage value only"
+
+config CMA_SIZE_SEL_MIN
+	bool "Use lower value (minimum)"
+
+config CMA_SIZE_SEL_MAX
+	bool "Use higher value (maximum)"
+
+endchoice
+
+config CMA_ALIGNMENT
+	int "Maximum PAGE_SIZE order of alignment for contiguous buffers"
+	range 4 9
+	default 8
+	help
+	  The DMA mapping framework by default aligns all buffers to the
+	  smallest PAGE_SIZE order which is greater than or equal to the
+	  requested buffer size. This works well for buffers up to a few
+	  hundred kilobytes, but for larger buffers it is just a waste of
+	  memory. With this parameter you can
+	  specify the maximum PAGE_SIZE order for contiguous buffers. Larger
+	  buffers will be aligned only to this specified order. The order is
+	  expressed as a power of two multiplied by the PAGE_SIZE.
+
+	  For example, if your system defaults to 4KiB pages, the order value
+	  of 8 means that the buffers will be aligned up to 1MiB only.
+
+	  If unsure, leave the default value "8".
+
+endif
+
 endmenu
diff --git a/drivers/base/Makefile b/drivers/base/Makefile
index 99a375a..794546f 100644
--- a/drivers/base/Makefile
+++ b/drivers/base/Makefile
@@ -5,6 +5,7 @@ obj-y			:= core.o sys.o bus.o dd.o syscore.o \
 			   cpu.o firmware.o init.o map.o devres.o \
 			   attribute_container.o transport_class.o
 obj-$(CONFIG_DEVTMPFS)	+= devtmpfs.o
+obj-$(CONFIG_CMA) += dma-contiguous.o
 obj-y			+= power/
 obj-$(CONFIG_HAS_DMA)	+= dma-mapping.o
 obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
new file mode 100644
index 0000000..e54bb76
--- /dev/null
+++ b/drivers/base/dma-contiguous.c
@@ -0,0 +1,386 @@
+/*
+ * Contiguous Memory Allocator for DMA mapping framework
+ * Copyright (c) 2010-2011 by Samsung Electronics.
+ * Written by:
+ *	Marek Szyprowski <m.szyprowski@samsung.com>
+ *	Michal Nazarewicz <mina86@mina86.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License or (at your option) any later version of the license.
+ */
+
+#define pr_fmt(fmt) "cma: " fmt
+
+#ifdef CONFIG_CMA_DEBUG
+#ifndef DEBUG
+#  define DEBUG
+#endif
+#endif
+
+#include <asm/page.h>
+#include <asm/dma-contiguous.h>
+
+#include <linux/memblock.h>
+#include <linux/err.h>
+#include <linux/mm.h>
+#include <linux/mutex.h>
+#include <linux/page-isolation.h>
+#include <linux/slab.h>
+#include <linux/swap.h>
+#include <linux/mm_types.h>
+#include <linux/dma-contiguous.h>
+
+#ifndef SZ_1M
+#define SZ_1M (1 << 20)
+#endif
+
+#ifdef phys_to_pfn
+/* nothing to do */
+#elif defined __phys_to_pfn
+#  define phys_to_pfn __phys_to_pfn
+#elif defined __va
+#  define phys_to_pfn(x) page_to_pfn(virt_to_page(__va(x)))
+#else
+#  error phys_to_pfn implementation needed
+#endif
+
+struct cma {
+	unsigned long	base_pfn;
+	unsigned long	count;
+	unsigned long	*bitmap;
+};
+
+struct cma *dma_contiguous_default_area;
+
+#ifndef CONFIG_CMA_SIZE_ABSOLUTE
+#define CONFIG_CMA_SIZE_ABSOLUTE 0
+#endif
+
+#ifndef CONFIG_CMA_SIZE_PERCENTAGE
+#define CONFIG_CMA_SIZE_PERCENTAGE 0
+#endif
+
+static unsigned long size_abs = CONFIG_CMA_SIZE_ABSOLUTE * SZ_1M;
+static unsigned long size_percent = CONFIG_CMA_SIZE_PERCENTAGE;
+static long size_cmdline = -1;
+
+static int __init early_cma(char *p)
+{
+	pr_debug("%s(%s)\n", __func__, p);
+	size_cmdline = memparse(p, &p);
+	return 0;
+}
+early_param("cma", early_cma);
+
+static unsigned long __init __cma_early_get_total_pages(void)
+{
+	struct memblock_region *reg;
+	unsigned long total_pages = 0;
+
+	/*
+	 * We cannot use memblock_phys_mem_size() here, because
+	 * memblock_analyze() has not been called yet.
+	 */
+	for_each_memblock(memory, reg)
+		total_pages += memblock_region_memory_end_pfn(reg) -
+			       memblock_region_memory_base_pfn(reg);
+	return total_pages;
+}
+
+/**
+ * dma_contiguous_reserve() - reserve area for contiguous memory handling
+ *
+ * This function reserves memory from the early allocator. It should be
+ * called by arch specific code once the early allocator (memblock or bootmem)
+ * has been activated and all other subsystems have already allocated/reserved
+ * memory.
+ */
+void __init dma_contiguous_reserve(phys_addr_t limit)
+{
+	unsigned long selected_size = 0;
+	unsigned long total_pages;
+
+	pr_debug("%s(limit %08lx)\n", __func__, (unsigned long)limit);
+
+	total_pages = __cma_early_get_total_pages();
+	size_percent *= (total_pages << PAGE_SHIFT) / 100;
+
+	pr_debug("%s: total available: %ld MiB, size absolute: %ld MiB, size percentage: %ld MiB\n",
+		 __func__, (total_pages << PAGE_SHIFT) / SZ_1M,
+		size_abs / SZ_1M, size_percent / SZ_1M);
+
+#ifdef CONFIG_CMA_SIZE_SEL_ABSOLUTE
+	selected_size = size_abs;
+#elif defined(CONFIG_CMA_SIZE_SEL_PERCENTAGE)
+	selected_size = size_percent;
+#elif defined(CONFIG_CMA_SIZE_SEL_MIN)
+	selected_size = min(size_abs, size_percent);
+#elif defined(CONFIG_CMA_SIZE_SEL_MAX)
+	selected_size = max(size_abs, size_percent);
+#endif
+
+	if (size_cmdline != -1)
+		selected_size = size_cmdline;
+
+	if (!selected_size)
+		return;
+
+	pr_debug("%s: reserving %ld MiB for global area\n", __func__,
+		 selected_size / SZ_1M);
+
+	dma_declare_contiguous(NULL, selected_size, 0, limit);
+};
+
+static DEFINE_MUTEX(cma_mutex);
+
+static void __cma_activate_area(unsigned long base_pfn, unsigned long count)
+{
+	unsigned long pfn = base_pfn;
+	unsigned i = count >> pageblock_order;
+	struct zone *zone;
+
+	VM_BUG_ON(!pfn_valid(pfn));
+	zone = page_zone(pfn_to_page(pfn));
+
+	do {
+		unsigned j;
+		base_pfn = pfn;
+		for (j = pageblock_nr_pages; j; --j, pfn++) {
+			VM_BUG_ON(!pfn_valid(pfn));
+			VM_BUG_ON(page_zone(pfn_to_page(pfn)) != zone);
+		}
+		init_cma_reserved_pageblock(pfn_to_page(base_pfn));
+	} while (--i);
+}
+
+static struct cma *__cma_create_area(unsigned long base_pfn,
+				     unsigned long count)
+{
+	int bitmap_size = BITS_TO_LONGS(count) * sizeof(long);
+	struct cma *cma;
+
+	pr_debug("%s(base %08lx, count %lx)\n", __func__, base_pfn, count);
+
+	cma = kmalloc(sizeof *cma, GFP_KERNEL);
+	if (!cma)
+		return ERR_PTR(-ENOMEM);
+
+	cma->base_pfn = base_pfn;
+	cma->count = count;
+	cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
+
+	if (!cma->bitmap)
+		goto no_mem;
+
+	__cma_activate_area(base_pfn, count);
+
+	pr_debug("%s: returned %p\n", __func__, (void *)cma);
+	return cma;
+
+no_mem:
+	kfree(cma);
+	return ERR_PTR(-ENOMEM);
+}
+
+static struct cma_reserved {
+	phys_addr_t start;
+	unsigned long size;
+	struct device *dev;
+} cma_reserved[MAX_CMA_AREAS] __initdata;
+static unsigned cma_reserved_count __initdata;
+
+static int __init __cma_init_reserved_areas(void)
+{
+	struct cma_reserved *r = cma_reserved;
+	unsigned i = cma_reserved_count;
+
+	pr_debug("%s()\n", __func__);
+
+	for (; i; --i, ++r) {
+		struct cma *cma;
+		cma = __cma_create_area(phys_to_pfn(r->start),
+					r->size >> PAGE_SHIFT);
+		if (!IS_ERR(cma)) {
+			if (r->dev)
+				set_dev_cma_area(r->dev, cma);
+			else
+				dma_contiguous_default_area = cma;
+		}
+	}
+	return 0;
+}
+core_initcall(__cma_init_reserved_areas);
+
+/**
+ * dma_declare_contiguous() - reserve area for contiguous memory handling
+ *			      for particular device
+ * @dev:   Pointer to device structure.
+ * @size:  Size of the reserved memory.
+ * @start: Start address of the reserved memory (optional, 0 for any).
+ * @limit: End address of the reserved memory (optional, 0 for any).
+ *
+ * This function reserves memory for the specified device. It should be
+ * called by board specific code while the early allocator (memblock or
+ * bootmem) is still active.
+ */
+int __init dma_declare_contiguous(struct device *dev, unsigned long size,
+				  phys_addr_t base, phys_addr_t limit)
+{
+	struct cma_reserved *r = &cma_reserved[cma_reserved_count];
+	unsigned long alignment;
+
+	pr_debug("%s(size %lx, base %08lx, limit %08lx)\n", __func__,
+		 (unsigned long)size, (unsigned long)base,
+		 (unsigned long)limit);
+
+	/* Sanity checks */
+	if (cma_reserved_count == ARRAY_SIZE(cma_reserved))
+		return -ENOSPC;
+
+	if (!size)
+		return -EINVAL;
+
+	/* Sanitise input arguments */
+	alignment = PAGE_SIZE << max(MAX_ORDER, pageblock_order);
+	base = ALIGN(base, alignment);
+	size = ALIGN(size, alignment);
+	limit = ALIGN(limit, alignment);
+
+	/* Reserve memory */
+	if (base) {
+		if (memblock_is_region_reserved(base, size) ||
+		    memblock_reserve(base, size) < 0) {
+			base = -EBUSY;
+			goto err;
+		}
+	} else {
+		/*
+		 * Use __memblock_alloc_base() since
+		 * memblock_alloc_base() panic()s.
+		 */
+		phys_addr_t addr = __memblock_alloc_base(size, alignment, limit);
+		if (!addr) {
+			base = -ENOMEM;
+			goto err;
+		} else if (addr + size > ~(unsigned long)0) {
+			memblock_free(addr, size);
+			base = -EOVERFLOW;
+			goto err;
+		} else {
+			base = addr;
+		}
+	}
+
+	/*
+	 * Each reserved area must be initialised later, when more kernel
+	 * subsystems (like slab allocator) are available.
+	 */
+	r->start = base;
+	r->size = size;
+	r->dev = dev;
+	cma_reserved_count++;
+	printk(KERN_INFO "CMA: reserved %ld MiB at %08lx\n", size / SZ_1M,
+	       (unsigned long)base);
+
+	/*
+	 * Architecture specific contiguous memory fixup.
+	 */
+	dma_contiguous_early_fixup(base, size);
+	return 0;
+err:
+	printk(KERN_ERR "CMA: failed to reserve %ld MiB\n", size / SZ_1M);
+	return base;
+}
+
+/**
+ * dma_alloc_from_contiguous() - allocate pages from contiguous area
+ * @dev:   Pointer to device for which the allocation is performed.
+ * @count: Requested number of pages.
+ * @align: Requested alignment of pages (in PAGE_SIZE order).
+ *
+ * This function allocates a memory buffer for the specified device. It uses
+ * device specific contiguous memory area if available or the default
+ * global one. Requires architecture specific get_dev_cma_area() helper
+ * function.
+ */
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int align)
+{
+	struct cma *cma = get_dev_cma_area(dev);
+	unsigned long pfn, pageno;
+	int ret;
+
+	if (!cma)
+		return NULL;
+
+	if (align > CONFIG_CMA_ALIGNMENT)
+		align = CONFIG_CMA_ALIGNMENT;
+
+	pr_debug("%s(cma %p, count %d, align %d)\n", __func__, (void *)cma,
+		 count, align);
+
+	if (!count)
+		return NULL;
+
+	mutex_lock(&cma_mutex);
+
+	pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count, 0, count,
+					    (1 << align) - 1);
+	if (pageno >= cma->count) {
+		ret = -ENOMEM;
+		goto error;
+	}
+	bitmap_set(cma->bitmap, pageno, count);
+
+	pfn = cma->base_pfn + pageno;
+	ret = alloc_contig_range(pfn, pfn + count, 0, MIGRATE_CMA);
+	if (ret)
+		goto free;
+
+	mutex_unlock(&cma_mutex);
+
+	pr_debug("%s(): returned %p\n", __func__, pfn_to_page(pfn));
+	return pfn_to_page(pfn);
+free:
+	bitmap_clear(cma->bitmap, pageno, count);
+error:
+	mutex_unlock(&cma_mutex);
+	return NULL;
+}
+
+/**
+ * dma_release_from_contiguous() - release allocated pages
+ * @dev:   Pointer to device for which the pages were allocated.
+ * @pages: Allocated pages.
+ * @count: Number of allocated pages.
+ *
+ * This function releases memory allocated by dma_alloc_from_contiguous().
+ * It returns 0 when the provided pages do not belong to the contiguous
+ * area and 1 on success.
+ */
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count)
+{
+	struct cma *cma = get_dev_cma_area(dev);
+	unsigned long pfn;
+
+	if (!cma || !pages)
+		return 0;
+
+	pr_debug("%s(page %p)\n", __func__, (void *)pages);
+
+	pfn = page_to_pfn(pages);
+
+	if (pfn < cma->base_pfn || pfn >= cma->base_pfn + cma->count)
+		return 0;
+
+	mutex_lock(&cma_mutex);
+
+	bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
+	free_contig_pages(pfn, count);
+
+	mutex_unlock(&cma_mutex);
+	return 1;
+}
diff --git a/include/asm-generic/dma-contiguous.h b/include/asm-generic/dma-contiguous.h
new file mode 100644
index 0000000..8c76649
--- /dev/null
+++ b/include/asm-generic/dma-contiguous.h
@@ -0,0 +1,27 @@
+#ifndef ASM_DMA_CONTIGUOUS_H
+#define ASM_DMA_CONTIGUOUS_H
+
+#ifdef __KERNEL__
+
+#include <linux/device.h>
+#include <linux/dma-contiguous.h>
+
+#ifdef CONFIG_CMA
+
+static inline struct cma *get_dev_cma_area(struct device *dev)
+{
+	if (dev && dev->cma_area)
+		return dev->cma_area;
+	return dma_contiguous_default_area;
+}
+
+static inline void set_dev_cma_area(struct device *dev, struct cma *cma)
+{
+	if (dev)
+		dev->cma_area = cma;
+	dma_contiguous_default_area = cma;
+}
+
+#endif
+#endif
+#endif
diff --git a/include/linux/device.h b/include/linux/device.h
index 8bab5c4..cc1e7f0 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -592,6 +592,10 @@ struct device {
 
 	struct dma_coherent_mem	*dma_mem; /* internal for coherent mem
 					     override */
+#ifdef CONFIG_CMA
+	struct cma *cma_area;		/* contiguous memory area for dma
+					   allocations */
+#endif
 	/* arch specific additions */
 	struct dev_archdata	archdata;
 
diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h
new file mode 100644
index 0000000..7ca81c9
--- /dev/null
+++ b/include/linux/dma-contiguous.h
@@ -0,0 +1,106 @@
+#ifndef __LINUX_CMA_H
+#define __LINUX_CMA_H
+
+/*
+ * Contiguous Memory Allocator for DMA mapping framework
+ * Copyright (c) 2010-2011 by Samsung Electronics.
+ * Written by:
+ *	Marek Szyprowski <m.szyprowski@samsung.com>
+ *	Michal Nazarewicz <mina86@mina86.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License or (at your option) any later version of the license.
+ */
+
+/*
+ * Contiguous Memory Allocator
+ *
+ *   The Contiguous Memory Allocator (CMA) makes it possible to
+ *   allocate big contiguous chunks of memory after the system has
+ *   booted.
+ *
+ * Why is it needed?
+ *
+ *   Various devices on embedded systems have no scatter-gather and/or
+ *   IO map support and require contiguous blocks of memory to
+ *   operate.  They include devices such as cameras, hardware video
+ *   coders, etc.
+ *
+ *   Such devices often require big memory buffers (a full HD frame
+ *   is, for instance, more than 2 megapixels large, i.e. more than 6
+ *   MB of memory), which makes mechanisms such as kmalloc() or
+ *   alloc_page() ineffective.
+ *
+ *   At the same time, a solution where a big memory region is
+ *   reserved for a device is suboptimal since often more memory is
+ *   reserved than strictly required and, moreover, the memory is
+ *   inaccessible to the page allocator even if device drivers don't use it.
+ *
+ *   CMA tries to solve this issue by operating on memory regions
+ *   where only movable pages can be allocated from.  This way, the
+ *   kernel can use the memory for pagecache and, when a device driver
+ *   requests it, the allocated pages can be migrated.
+ *
+ * Driver usage
+ *
+ *   CMA should not be used by the device drivers directly. It is
+ *   only a helper framework for the dma-mapping subsystem.
+ *
+ *   For more information, see kernel-docs in drivers/base/dma-contiguous.c
+ */
+
+#ifdef __KERNEL__
+
+struct cma;
+struct page;
+struct device;
+
+#ifdef CONFIG_CMA
+
+#define MAX_CMA_AREAS	(8)
+
+extern struct cma *dma_contiguous_default_area;
+
+void dma_contiguous_reserve(phys_addr_t addr_limit);
+int dma_declare_contiguous(struct device *dev, unsigned long size,
+			   phys_addr_t base, phys_addr_t limit);
+
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int order);
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count);
+
+#else
+
+#define MAX_CMA_AREAS	(0)
+
+static inline void dma_contiguous_reserve(phys_addr_t limit) { }
+
+static inline
+int dma_declare_contiguous(struct device *dev, unsigned long size,
+			   phys_addr_t base, phys_addr_t limit)
+{
+	return -ENOSYS;
+}
+
+static inline
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int order)
+{
+	return NULL;
+}
+
+static inline
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count)
+{
+	return 0;
+}
+
+#endif
+
+#endif
+
+#endif
-- 
1.7.1.569.g6f426

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 7/7] ARM: integrate CMA with DMA-mapping subsystem
  2011-10-06 13:54 ` Marek Szyprowski
  (?)
@ 2011-10-06 13:54   ` Marek Szyprowski
  -1 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-06 13:54 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Shariq Hasnain, Chunsang Jeong, Dave Hansen

This patch adds support for CMA to the dma-mapping subsystem on the ARM
architecture. By default a global CMA area is used, but specific devices
are allowed to have their own private memory areas if required (they can
be created with the dma_declare_contiguous() function during board
initialization).

Contiguous memory areas reserved for DMA are remapped with 2-level page
tables on boot. Once a buffer is requested, the low memory kernel mapping
is updated to match the requested memory access type.

GFP_ATOMIC allocations are performed from a special pool which is created
early during boot. This way, remapping page attributes is not needed at
allocation time.

CMA has been enabled unconditionally for ARMv6+ systems.
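
As a rough, hypothetical sketch (not taken from this series), a board
file could reserve a private CMA area for one of its devices from the
machine_desc ->reserve() callback, while memblock is still active; the
example_camera_device name and the 16 MiB / 256 MiB values below are
made up for illustration:

  #include <linux/init.h>
  #include <linux/platform_device.h>
  #include <linux/dma-contiguous.h>

  /* Hypothetical platform device defined elsewhere in the board code. */
  extern struct platform_device example_camera_device;

  /* Hooked up as machine_desc.reserve on this imaginary board. */
  static void __init example_board_reserve(void)
  {
          /* 16 MiB private area below the 256 MiB physical boundary. */
          dma_declare_contiguous(&example_camera_device.dev,
                                 16 * 1024 * 1024, 0, 256 * 1024 * 1024);
  }

A driver bound to that device then simply keeps calling
dma_alloc_coherent(): non-atomic requests are satisfied from the
device's private (or the global) CMA area, while GFP_ATOMIC requests
are served from the boot-time coherent pool, whose size can be changed
with the coherent_pool= kernel parameter.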

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
 arch/arm/Kconfig                      |    2 +
 arch/arm/include/asm/device.h         |    3 +
 arch/arm/include/asm/dma-contiguous.h |   33 +++
 arch/arm/include/asm/mach/map.h       |    1 +
 arch/arm/mm/dma-mapping.c             |  362 +++++++++++++++++++++++++++------
 arch/arm/mm/init.c                    |    8 +
 arch/arm/mm/mm.h                      |    3 +
 arch/arm/mm/mmu.c                     |   29 ++-
 8 files changed, 366 insertions(+), 75 deletions(-)
 create mode 100644 arch/arm/include/asm/dma-contiguous.h

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 6f231d5..e1705c9 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -3,6 +3,8 @@ config ARM
 	default y
 	select HAVE_AOUT
 	select HAVE_DMA_API_DEBUG
+	select HAVE_DMA_CONTIGUOUS if (CPU_V6 || CPU_V6K || CPU_V7)
+	select CMA if (CPU_V6 || CPU_V6K || CPU_V7)
 	select HAVE_IDE
 	select HAVE_MEMBLOCK
 	select RTC_LIB
diff --git a/arch/arm/include/asm/device.h b/arch/arm/include/asm/device.h
index b5c9f5b..b1ee416 100644
--- a/arch/arm/include/asm/device.h
+++ b/arch/arm/include/asm/device.h
@@ -10,6 +10,9 @@ struct dev_archdata {
 #ifdef CONFIG_DMABOUNCE
 	struct dmabounce_device_info *dmabounce;
 #endif
+#ifdef CONFIG_CMA
+	struct cma *cma_area;
+#endif
 };
 
 struct omap_device;
diff --git a/arch/arm/include/asm/dma-contiguous.h b/arch/arm/include/asm/dma-contiguous.h
new file mode 100644
index 0000000..6be12ba
--- /dev/null
+++ b/arch/arm/include/asm/dma-contiguous.h
@@ -0,0 +1,33 @@
+#ifndef ASMARM_DMA_CONTIGUOUS_H
+#define ASMARM_DMA_CONTIGUOUS_H
+
+#ifdef __KERNEL__
+
+#include <linux/device.h>
+#include <linux/dma-contiguous.h>
+
+#ifdef CONFIG_CMA
+
+#define MAX_CMA_AREAS	(8)
+
+void dma_contiguous_early_fixup(phys_addr_t base, unsigned long size);
+
+static inline struct cma *get_dev_cma_area(struct device *dev)
+{
+	if (dev && dev->archdata.cma_area)
+		return dev->archdata.cma_area;
+	return dma_contiguous_default_area;
+}
+
+static inline void set_dev_cma_area(struct device *dev, struct cma *cma)
+{
+	dev->archdata.cma_area = cma;
+}
+
+#else
+
+#define MAX_CMA_AREAS	(0)
+
+#endif
+#endif
+#endif
diff --git a/arch/arm/include/asm/mach/map.h b/arch/arm/include/asm/mach/map.h
index d2fedb5..05343de 100644
--- a/arch/arm/include/asm/mach/map.h
+++ b/arch/arm/include/asm/mach/map.h
@@ -29,6 +29,7 @@ struct map_desc {
 #define MT_MEMORY_NONCACHED	11
 #define MT_MEMORY_DTCM		12
 #define MT_MEMORY_ITCM		13
+#define MT_MEMORY_DMA_READY	14
 
 #ifdef CONFIG_MMU
 extern void iotable_init(struct map_desc *, int);
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 66e3053..3d6a33d 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -17,7 +17,9 @@
 #include <linux/init.h>
 #include <linux/device.h>
 #include <linux/dma-mapping.h>
+#include <linux/dma-contiguous.h>
 #include <linux/highmem.h>
+#include <linux/memblock.h>
 #include <linux/slab.h>
 
 #include <asm/memory.h>
@@ -26,6 +28,8 @@
 #include <asm/tlbflush.h>
 #include <asm/sizes.h>
 #include <asm/mach/arch.h>
+#include <asm/mach/map.h>
+#include <asm/dma-contiguous.h>
 
 #include "mm.h"
 
@@ -56,6 +60,19 @@ static u64 get_coherent_dma_mask(struct device *dev)
 	return mask;
 }
 
+static void __dma_clear_buffer(struct page *page, size_t size)
+{
+	void *ptr;
+	/*
+	 * Ensure that the allocated pages are zeroed, and that any data
+	 * lurking in the kernel direct-mapped region is invalidated.
+	 */
+	ptr = page_address(page);
+	memset(ptr, 0, size);
+	dmac_flush_range(ptr, ptr + size);
+	outer_flush_range(__pa(ptr), __pa(ptr) + size);
+}
+
 /*
  * Allocate a DMA buffer for 'dev' of size 'size' using the
  * specified gfp mask.  Note that 'size' must be page aligned.
@@ -64,23 +81,6 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
 {
 	unsigned long order = get_order(size);
 	struct page *page, *p, *e;
-	void *ptr;
-	u64 mask = get_coherent_dma_mask(dev);
-
-#ifdef CONFIG_DMA_API_DEBUG
-	u64 limit = (mask + 1) & ~mask;
-	if (limit && size >= limit) {
-		dev_warn(dev, "coherent allocation too big (requested %#x mask %#llx)\n",
-			size, mask);
-		return NULL;
-	}
-#endif
-
-	if (!mask)
-		return NULL;
-
-	if (mask < 0xffffffffULL)
-		gfp |= GFP_DMA;
 
 	page = alloc_pages(gfp, order);
 	if (!page)
@@ -93,14 +93,7 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
 	for (p = page + (size >> PAGE_SHIFT), e = page + (1 << order); p < e; p++)
 		__free_page(p);
 
-	/*
-	 * Ensure that the allocated pages are zeroed, and that any data
-	 * lurking in the kernel direct-mapped region is invalidated.
-	 */
-	ptr = page_address(page);
-	memset(ptr, 0, size);
-	dmac_flush_range(ptr, ptr + size);
-	outer_flush_range(__pa(ptr), __pa(ptr) + size);
+	__dma_clear_buffer(page, size);
 
 	return page;
 }
@@ -170,6 +163,9 @@ static int __init consistent_init(void)
 	unsigned long base = consistent_base;
 	unsigned long num_ptes = (CONSISTENT_END - base) >> PGDIR_SHIFT;
 
+	if (cpu_architecture() >= CPU_ARCH_ARMv6)
+		return 0;
+
 	consistent_pte = kmalloc(num_ptes * sizeof(pte_t), GFP_KERNEL);
 	if (!consistent_pte) {
 		pr_err("%s: no memory\n", __func__);
@@ -210,9 +206,102 @@ static int __init consistent_init(void)
 
 	return ret;
 }
-
 core_initcall(consistent_init);
 
+static void *__alloc_from_contiguous(struct device *dev, size_t size,
+				     pgprot_t prot, struct page **ret_page);
+
+static struct arm_vmregion_head coherent_head = {
+	.vm_lock	= __SPIN_LOCK_UNLOCKED(&coherent_head.vm_lock),
+	.vm_list	= LIST_HEAD_INIT(coherent_head.vm_list),
+};
+
+size_t coherent_pool_size = DEFAULT_CONSISTENT_DMA_SIZE / 8;
+
+static int __init early_coherent_pool(char *p)
+{
+	coherent_pool_size = memparse(p, &p);
+	return 0;
+}
+early_param("coherent_pool", early_coherent_pool);
+
+/*
+ * Initialise the coherent pool for atomic allocations.
+ */
+static int __init coherent_init(void)
+{
+	pgprot_t prot = pgprot_dmacoherent(pgprot_kernel);
+	size_t size = coherent_pool_size;
+	struct page *page;
+	void *ptr;
+
+	if (cpu_architecture() < CPU_ARCH_ARMv6)
+		return 0;
+
+	ptr = __alloc_from_contiguous(NULL, size, prot, &page);
+	if (ptr) {
+		coherent_head.vm_start = (unsigned long) ptr;
+		coherent_head.vm_end = (unsigned long) ptr + size;
+		printk(KERN_INFO "DMA: preallocated %u KiB pool for atomic coherent allocations\n",
+		       (unsigned)size / 1024);
+		return 0;
+	}
+	printk(KERN_ERR "DMA: failed to allocate %u KiB pool for atomic coherent allocation\n",
+	       (unsigned)size / 1024);
+	return -ENOMEM;
+}
+/*
+ * CMA is activated by core_initcall, so we must be called after it
+ */
+postcore_initcall(coherent_init);
+
+struct dma_contiguous_early_reserve {
+	phys_addr_t base;
+	unsigned long size;
+};
+
+static struct dma_contiguous_early_reserve
+dma_mmu_remap[MAX_CMA_AREAS] __initdata;
+
+static int dma_mmu_remap_num __initdata;
+
+void __init dma_contiguous_early_fixup(phys_addr_t base, unsigned long size)
+{
+	dma_mmu_remap[dma_mmu_remap_num].base = base;
+	dma_mmu_remap[dma_mmu_remap_num].size = size;
+	dma_mmu_remap_num++;
+}
+
+void __init dma_contiguous_remap(void)
+{
+	int i;
+	for (i = 0; i < dma_mmu_remap_num; i++) {
+		phys_addr_t start = dma_mmu_remap[i].base;
+		phys_addr_t end = start + dma_mmu_remap[i].size;
+		struct map_desc map;
+		unsigned long addr;
+
+		if (end > arm_lowmem_limit)
+			end = arm_lowmem_limit;
+		if (start >= end)
+			return;
+
+		map.pfn = __phys_to_pfn(start);
+		map.virtual = __phys_to_virt(start);
+		map.length = end - start;
+		map.type = MT_MEMORY_DMA_READY;
+
+		/*
+		 * Clear previous low-memory mapping
+		 */
+		for (addr = __phys_to_virt(start); addr < __phys_to_virt(end);
+		     addr += PGDIR_SIZE)
+			pmd_clear(pmd_off_k(addr));
+
+		iotable_init(&map, 1);
+	}
+}
+
 static void *
 __dma_alloc_remap(struct page *page, size_t size, gfp_t gfp, pgprot_t prot)
 {
@@ -318,31 +407,178 @@ static void __dma_free_remap(void *cpu_addr, size_t size)
 	arm_vmregion_free(&consistent_head, c);
 }
 
+static int __dma_update_pte(pte_t *pte, pgtable_t token, unsigned long addr,
+			    void *data)
+{
+	struct page *page = virt_to_page(addr);
+	pgprot_t prot = *(pgprot_t *)data;
+
+	set_pte_ext(pte, mk_pte(page, prot), 0);
+	return 0;
+}
+
+static void __dma_remap(struct page *page, size_t size, pgprot_t prot)
+{
+	unsigned long start = (unsigned long) page_address(page);
+	unsigned end = start + size;
+
+	apply_to_page_range(&init_mm, start, size, __dma_update_pte, &prot);
+	dsb();
+	flush_tlb_kernel_range(start, end);
+}
+
+static void *__alloc_remap_buffer(struct device *dev, size_t size, gfp_t gfp,
+				 pgprot_t prot, struct page **ret_page)
+{
+	struct page *page;
+	void *ptr;
+	page = __dma_alloc_buffer(dev, size, gfp);
+	if (!page)
+		return NULL;
+
+	ptr = __dma_alloc_remap(page, size, gfp, prot);
+	if (!ptr) {
+		__dma_free_buffer(page, size);
+		return NULL;
+	}
+
+	*ret_page = page;
+	return ptr;
+}
+
+static void *__alloc_from_pool(struct device *dev, size_t size,
+			       struct page **ret_page)
+{
+	struct arm_vmregion *c;
+	size_t align;
+
+	if (!coherent_head.vm_start) {
+		printk(KERN_ERR "%s: coherent pool not initialised!\n",
+		       __func__);
+		dump_stack();
+		return NULL;
+	}
+
+	align = 1 << fls(size - 1);
+	c = arm_vmregion_alloc(&coherent_head, align, size, 0);
+	if (c) {
+		void *ptr = (void *)c->vm_start;
+		struct page *page = virt_to_page(ptr);
+		*ret_page = page;
+		return ptr;
+	}
+	return NULL;
+}
+
+static int __free_from_pool(void *cpu_addr, size_t size)
+{
+	unsigned long start = (unsigned long)cpu_addr;
+	unsigned long end = start + size;
+	struct arm_vmregion *c;
+
+	if (start < coherent_head.vm_start || end > coherent_head.vm_end)
+		return 0;
+
+	c = arm_vmregion_find_remove(&coherent_head, (unsigned long)start);
+
+	if ((c->vm_end - c->vm_start) != size) {
+		printk(KERN_ERR "%s: freeing wrong coherent size (%ld != %d)\n",
+		       __func__, c->vm_end - c->vm_start, size);
+		dump_stack();
+		size = c->vm_end - c->vm_start;
+	}
+
+	arm_vmregion_free(&coherent_head, c);
+	return 1;
+}
+
+static void *__alloc_from_contiguous(struct device *dev, size_t size,
+				     pgprot_t prot, struct page **ret_page)
+{
+	unsigned long order = get_order(size);
+	size_t count = size >> PAGE_SHIFT;
+	struct page *page;
+
+	page = dma_alloc_from_contiguous(dev, count, order);
+	if (!page)
+		return NULL;
+
+	__dma_clear_buffer(page, size);
+	__dma_remap(page, size, prot);
+
+	*ret_page = page;
+	return page_address(page);
+}
+
+static void __free_from_contiguous(struct device *dev, struct page *page,
+				   size_t size)
+{
+	__dma_remap(page, size, pgprot_kernel);
+	dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT);
+}
+
+#define nommu() 0
+
 #else	/* !CONFIG_MMU */
 
-#define __dma_alloc_remap(page, size, gfp, prot)	page_address(page)
-#define __dma_free_remap(addr, size)			do { } while (0)
+#define nommu() 1
+
+#define __alloc_remap_buffer(dev, size, gfp, prot, ret)	NULL
+#define __alloc_from_pool(dev, size, ret_page)		NULL
+#define __alloc_from_contiguous(dev, size, prot, ret)	NULL
+#define __free_from_pool(cpu_addr, size)		0
+#define __free_from_contiguous(dev, page, size)		do { } while (0)
+#define __dma_free_remap(cpu_addr, size)		do { } while (0)
 
 #endif	/* CONFIG_MMU */
 
-static void *
-__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp,
-	    pgprot_t prot)
+static void *__alloc_simple_buffer(struct device *dev, size_t size, gfp_t gfp,
+				   struct page **ret_page)
 {
 	struct page *page;
+	page = __dma_alloc_buffer(dev, size, gfp);
+	if (!page)
+		return NULL;
+
+	*ret_page = page;
+	return page_address(page);
+}
+
+
+
+static void *__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
+			 gfp_t gfp, pgprot_t prot)
+{
+	u64 mask = get_coherent_dma_mask(dev);
+	struct page *page = NULL;
 	void *addr;
 
-	*handle = ~0;
-	size = PAGE_ALIGN(size);
+#ifdef CONFIG_DMA_API_DEBUG
+	u64 limit = (mask + 1) & ~mask;
+	if (limit && size >= limit) {
+		dev_warn(dev, "coherent allocation too big (requested %#x mask %#llx)\n",
+			size, mask);
+		return NULL;
+	}
+#endif
 
-	page = __dma_alloc_buffer(dev, size, gfp);
-	if (!page)
+	if (!mask)
 		return NULL;
 
-	if (!arch_is_coherent())
-		addr = __dma_alloc_remap(page, size, gfp, prot);
+	if (mask < 0xffffffffULL)
+		gfp |= GFP_DMA;
+
+	*handle = ~0;
+	size = PAGE_ALIGN(size);
+
+	if (arch_is_coherent() || nommu())
+		addr = __alloc_simple_buffer(dev, size, gfp, &page);
+	else if (cpu_architecture() < CPU_ARCH_ARMv6)
+		addr = __alloc_remap_buffer(dev, size, gfp, prot, &page);
+	else if (gfp & GFP_ATOMIC)
+		addr = __alloc_from_pool(dev, size, &page);
 	else
-		addr = page_address(page);
+		addr = __alloc_from_contiguous(dev, size, prot, &page);
 
 	if (addr)
 		*handle = pfn_to_dma(dev, page_to_pfn(page));
@@ -354,8 +590,8 @@ __dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp,
  * Allocate DMA-coherent memory space and return both the kernel remapped
  * virtual and bus address for that space.
  */
-void *
-dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp)
+void *dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle,
+			 gfp_t gfp)
 {
 	void *memory;
 
@@ -384,25 +620,11 @@ static int dma_mmap(struct device *dev, struct vm_area_struct *vma,
 {
 	int ret = -ENXIO;
 #ifdef CONFIG_MMU
-	unsigned long user_size, kern_size;
-	struct arm_vmregion *c;
-
-	user_size = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
-
-	c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr);
-	if (c) {
-		unsigned long off = vma->vm_pgoff;
-
-		kern_size = (c->vm_end - c->vm_start) >> PAGE_SHIFT;
-
-		if (off < kern_size &&
-		    user_size <= (kern_size - off)) {
-			ret = remap_pfn_range(vma, vma->vm_start,
-					      page_to_pfn(c->vm_pages) + off,
-					      user_size << PAGE_SHIFT,
-					      vma->vm_page_prot);
-		}
-	}
+	unsigned long pfn = dma_to_pfn(dev, dma_addr);
+	ret = remap_pfn_range(vma, vma->vm_start,
+			      pfn + vma->vm_pgoff,
+			      vma->vm_end - vma->vm_start,
+			      vma->vm_page_prot);
 #endif	/* CONFIG_MMU */
 
 	return ret;
@@ -424,23 +646,33 @@ int dma_mmap_writecombine(struct device *dev, struct vm_area_struct *vma,
 }
 EXPORT_SYMBOL(dma_mmap_writecombine);
 
+
 /*
- * free a page as defined by the above mapping.
- * Must not be called with IRQs disabled.
+ * Free a buffer as defined by the above mapping.
  */
 void dma_free_coherent(struct device *dev, size_t size, void *cpu_addr, dma_addr_t handle)
 {
-	WARN_ON(irqs_disabled());
+	struct page *page = pfn_to_page(dma_to_pfn(dev, handle));
 
 	if (dma_release_from_coherent(dev, get_order(size), cpu_addr))
 		return;
 
 	size = PAGE_ALIGN(size);
 
-	if (!arch_is_coherent())
+	if (arch_is_coherent() || nommu()) {
+		__dma_free_buffer(page, size);
+	} else if (cpu_architecture() < CPU_ARCH_ARMv6) {
 		__dma_free_remap(cpu_addr, size);
-
-	__dma_free_buffer(pfn_to_page(dma_to_pfn(dev, handle)), size);
+		__dma_free_buffer(page, size);
+	} else {
+		if (__free_from_pool(cpu_addr, size))
+			return;
+		/*
+		 * Non-atomic allocations cannot be freed with IRQs disabled
+		 */
+		WARN_ON(irqs_disabled());
+		__free_from_contiguous(dev, page, size);
+	}
 }
 EXPORT_SYMBOL(dma_free_coherent);
 
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 34409a0..44619f0 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -20,6 +20,7 @@
 #include <linux/gfp.h>
 #include <linux/memblock.h>
 #include <linux/sort.h>
+#include <linux/dma-contiguous.h>
 
 #include <asm/mach-types.h>
 #include <asm/prom.h>
@@ -370,6 +371,13 @@ void __init arm_memblock_init(struct meminfo *mi, struct machine_desc *mdesc)
 	if (mdesc->reserve)
 		mdesc->reserve();
 
+	/* reserve memory for DMA contiguous allocations */
+#ifdef CONFIG_ZONE_DMA
+	dma_contiguous_reserve(PHYS_OFFSET + mdesc->dma_zone_size - 1);
+#else
+	dma_contiguous_reserve(0);
+#endif
+
 	memblock_analyze();
 	memblock_dump_all();
 }
diff --git a/arch/arm/mm/mm.h b/arch/arm/mm/mm.h
index ad7cce3..fa95d9b 100644
--- a/arch/arm/mm/mm.h
+++ b/arch/arm/mm/mm.h
@@ -29,5 +29,8 @@ extern u32 arm_dma_limit;
 #define arm_dma_limit ((u32)~0)
 #endif
 
+extern phys_addr_t arm_lowmem_limit;
+
 void __init bootmem_init(void);
 void arm_mm_memblock_reserve(void);
+void dma_contiguous_remap(void);
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 226f180..cdc9c38 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -273,6 +273,11 @@ static struct mem_type mem_types[] = {
 		.prot_l1   = PMD_TYPE_TABLE,
 		.domain    = DOMAIN_KERNEL,
 	},
+	[MT_MEMORY_DMA_READY] = {
+		.prot_pte  = L_PTE_PRESENT | L_PTE_YOUNG | L_PTE_DIRTY,
+		.prot_l1   = PMD_TYPE_TABLE,
+		.domain    = DOMAIN_KERNEL,
+	},
 };
 
 const struct mem_type *get_mem_type(unsigned int type)
@@ -414,6 +419,7 @@ static void __init build_mem_type_table(void)
 	if (arch_is_coherent() && cpu_is_xsc3()) {
 		mem_types[MT_MEMORY].prot_sect |= PMD_SECT_S;
 		mem_types[MT_MEMORY].prot_pte |= L_PTE_SHARED;
+		mem_types[MT_MEMORY_DMA_READY].prot_pte |= L_PTE_SHARED;
 		mem_types[MT_MEMORY_NONCACHED].prot_sect |= PMD_SECT_S;
 		mem_types[MT_MEMORY_NONCACHED].prot_pte |= L_PTE_SHARED;
 	}
@@ -443,6 +449,7 @@ static void __init build_mem_type_table(void)
 			mem_types[MT_DEVICE_CACHED].prot_pte |= L_PTE_SHARED;
 			mem_types[MT_MEMORY].prot_sect |= PMD_SECT_S;
 			mem_types[MT_MEMORY].prot_pte |= L_PTE_SHARED;
+			mem_types[MT_MEMORY_DMA_READY].prot_pte |= L_PTE_SHARED;
 			mem_types[MT_MEMORY_NONCACHED].prot_sect |= PMD_SECT_S;
 			mem_types[MT_MEMORY_NONCACHED].prot_pte |= L_PTE_SHARED;
 		}
@@ -482,6 +489,7 @@ static void __init build_mem_type_table(void)
 	mem_types[MT_HIGH_VECTORS].prot_l1 |= ecc_mask;
 	mem_types[MT_MEMORY].prot_sect |= ecc_mask | cp->pmd;
 	mem_types[MT_MEMORY].prot_pte |= kern_pgprot;
+	mem_types[MT_MEMORY_DMA_READY].prot_pte |= kern_pgprot;
 	mem_types[MT_MEMORY_NONCACHED].prot_sect |= ecc_mask;
 	mem_types[MT_ROM].prot_sect |= cp->pmd;
 
@@ -561,7 +569,7 @@ static void __init alloc_init_section(pud_t *pud, unsigned long addr,
 	 * L1 entries, whereas PGDs refer to a group of L1 entries making
 	 * up one logical pointer to an L2 table.
 	 */
-	if (((addr | end | phys) & ~SECTION_MASK) == 0) {
+	if (type->prot_sect && ((addr | end | phys) & ~SECTION_MASK) == 0) {
 		pmd_t *p = pmd;
 
 		if (addr & SECTION_SIZE)
@@ -757,7 +765,7 @@ static int __init early_vmalloc(char *arg)
 }
 early_param("vmalloc", early_vmalloc);
 
-static phys_addr_t lowmem_limit __initdata = 0;
+phys_addr_t arm_lowmem_limit __initdata = 0;
 
 void __init sanity_check_meminfo(void)
 {
@@ -826,8 +834,8 @@ void __init sanity_check_meminfo(void)
 			bank->size = newsize;
 		}
 #endif
-		if (!bank->highmem && bank->start + bank->size > lowmem_limit)
-			lowmem_limit = bank->start + bank->size;
+		if (!bank->highmem && bank->start + bank->size > arm_lowmem_limit)
+			arm_lowmem_limit = bank->start + bank->size;
 
 		j++;
 	}
@@ -852,7 +860,7 @@ void __init sanity_check_meminfo(void)
 	}
 #endif
 	meminfo.nr_banks = j;
-	memblock_set_current_limit(lowmem_limit);
+	memblock_set_current_limit(arm_lowmem_limit);
 }
 
 static inline void prepare_page_table(void)
@@ -877,8 +885,8 @@ static inline void prepare_page_table(void)
 	 * Find the end of the first block of lowmem.
 	 */
 	end = memblock.memory.regions[0].base + memblock.memory.regions[0].size;
-	if (end >= lowmem_limit)
-		end = lowmem_limit;
+	if (end >= arm_lowmem_limit)
+		end = arm_lowmem_limit;
 
 	/*
 	 * Clear out all the kernel space mappings, except for the first
@@ -1012,8 +1020,8 @@ static void __init map_lowmem(void)
 		phys_addr_t end = start + reg->size;
 		struct map_desc map;
 
-		if (end > lowmem_limit)
-			end = lowmem_limit;
+		if (end > arm_lowmem_limit)
+			end = arm_lowmem_limit;
 		if (start >= end)
 			break;
 
@@ -1034,11 +1042,12 @@ void __init paging_init(struct machine_desc *mdesc)
 {
 	void *zero_page;
 
-	memblock_set_current_limit(lowmem_limit);
+	memblock_set_current_limit(arm_lowmem_limit);
 
 	build_mem_type_table();
 	prepare_page_table();
 	map_lowmem();
+	dma_contiguous_remap();
 	devicemaps_init(mdesc);
 	kmap_init();
 
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 7/7] ARM: integrate CMA with DMA-mapping subsystem
@ 2011-10-06 13:54   ` Marek Szyprowski
  0 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-06 13:54 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Shariq Hasnain, Chunsang Jeong, Dave Hansen

This patch adds support for CMA to the dma-mapping subsystem on the ARM
architecture. By default a global CMA area is used, but specific devices
are allowed to have their own private memory areas if required (they can
be created with the dma_declare_contiguous() function during board
initialization).

Contiguous memory areas reserved for DMA are remapped with 2-level page
tables on boot. Once a buffer is requested, the low memory kernel mapping
is updated to match the requested memory access type.

GFP_ATOMIC allocations are performed from a special pool which is created
early during boot. This way, remapping page attributes is not needed at
allocation time.

CMA has been enabled unconditionally for ARMv6+ systems.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
 arch/arm/Kconfig                      |    2 +
 arch/arm/include/asm/device.h         |    3 +
 arch/arm/include/asm/dma-contiguous.h |   33 +++
 arch/arm/include/asm/mach/map.h       |    1 +
 arch/arm/mm/dma-mapping.c             |  362 +++++++++++++++++++++++++++------
 arch/arm/mm/init.c                    |    8 +
 arch/arm/mm/mm.h                      |    3 +
 arch/arm/mm/mmu.c                     |   29 ++-
 8 files changed, 366 insertions(+), 75 deletions(-)
 create mode 100644 arch/arm/include/asm/dma-contiguous.h

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 6f231d5..e1705c9 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -3,6 +3,8 @@ config ARM
 	default y
 	select HAVE_AOUT
 	select HAVE_DMA_API_DEBUG
+	select HAVE_DMA_CONTIGUOUS if (CPU_V6 || CPU_V6K || CPU_V7)
+	select CMA if (CPU_V6 || CPU_V6K || CPU_V7)
 	select HAVE_IDE
 	select HAVE_MEMBLOCK
 	select RTC_LIB
diff --git a/arch/arm/include/asm/device.h b/arch/arm/include/asm/device.h
index b5c9f5b..b1ee416 100644
--- a/arch/arm/include/asm/device.h
+++ b/arch/arm/include/asm/device.h
@@ -10,6 +10,9 @@ struct dev_archdata {
 #ifdef CONFIG_DMABOUNCE
 	struct dmabounce_device_info *dmabounce;
 #endif
+#ifdef CONFIG_CMA
+	struct cma *cma_area;
+#endif
 };
 
 struct omap_device;
diff --git a/arch/arm/include/asm/dma-contiguous.h b/arch/arm/include/asm/dma-contiguous.h
new file mode 100644
index 0000000..6be12ba
--- /dev/null
+++ b/arch/arm/include/asm/dma-contiguous.h
@@ -0,0 +1,33 @@
+#ifndef ASMARM_DMA_CONTIGUOUS_H
+#define ASMARM_DMA_CONTIGUOUS_H
+
+#ifdef __KERNEL__
+
+#include <linux/device.h>
+#include <linux/dma-contiguous.h>
+
+#ifdef CONFIG_CMA
+
+#define MAX_CMA_AREAS	(8)
+
+void dma_contiguous_early_fixup(phys_addr_t base, unsigned long size);
+
+static inline struct cma *get_dev_cma_area(struct device *dev)
+{
+	if (dev && dev->archdata.cma_area)
+		return dev->archdata.cma_area;
+	return dma_contiguous_default_area;
+}
+
+static inline void set_dev_cma_area(struct device *dev, struct cma *cma)
+{
+	dev->archdata.cma_area = cma;
+}
+
+#else
+
+#define MAX_CMA_AREAS	(0)
+
+#endif
+#endif
+#endif
diff --git a/arch/arm/include/asm/mach/map.h b/arch/arm/include/asm/mach/map.h
index d2fedb5..05343de 100644
--- a/arch/arm/include/asm/mach/map.h
+++ b/arch/arm/include/asm/mach/map.h
@@ -29,6 +29,7 @@ struct map_desc {
 #define MT_MEMORY_NONCACHED	11
 #define MT_MEMORY_DTCM		12
 #define MT_MEMORY_ITCM		13
+#define MT_MEMORY_DMA_READY	14
 
 #ifdef CONFIG_MMU
 extern void iotable_init(struct map_desc *, int);
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 66e3053..3d6a33d 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -17,7 +17,9 @@
 #include <linux/init.h>
 #include <linux/device.h>
 #include <linux/dma-mapping.h>
+#include <linux/dma-contiguous.h>
 #include <linux/highmem.h>
+#include <linux/memblock.h>
 #include <linux/slab.h>
 
 #include <asm/memory.h>
@@ -26,6 +28,8 @@
 #include <asm/tlbflush.h>
 #include <asm/sizes.h>
 #include <asm/mach/arch.h>
+#include <asm/mach/map.h>
+#include <asm/dma-contiguous.h>
 
 #include "mm.h"
 
@@ -56,6 +60,19 @@ static u64 get_coherent_dma_mask(struct device *dev)
 	return mask;
 }
 
+static void __dma_clear_buffer(struct page *page, size_t size)
+{
+	void *ptr;
+	/*
+	 * Ensure that the allocated pages are zeroed, and that any data
+	 * lurking in the kernel direct-mapped region is invalidated.
+	 */
+	ptr = page_address(page);
+	memset(ptr, 0, size);
+	dmac_flush_range(ptr, ptr + size);
+	outer_flush_range(__pa(ptr), __pa(ptr) + size);
+}
+
 /*
  * Allocate a DMA buffer for 'dev' of size 'size' using the
  * specified gfp mask.  Note that 'size' must be page aligned.
@@ -64,23 +81,6 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
 {
 	unsigned long order = get_order(size);
 	struct page *page, *p, *e;
-	void *ptr;
-	u64 mask = get_coherent_dma_mask(dev);
-
-#ifdef CONFIG_DMA_API_DEBUG
-	u64 limit = (mask + 1) & ~mask;
-	if (limit && size >= limit) {
-		dev_warn(dev, "coherent allocation too big (requested %#x mask %#llx)\n",
-			size, mask);
-		return NULL;
-	}
-#endif
-
-	if (!mask)
-		return NULL;
-
-	if (mask < 0xffffffffULL)
-		gfp |= GFP_DMA;
 
 	page = alloc_pages(gfp, order);
 	if (!page)
@@ -93,14 +93,7 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
 	for (p = page + (size >> PAGE_SHIFT), e = page + (1 << order); p < e; p++)
 		__free_page(p);
 
-	/*
-	 * Ensure that the allocated pages are zeroed, and that any data
-	 * lurking in the kernel direct-mapped region is invalidated.
-	 */
-	ptr = page_address(page);
-	memset(ptr, 0, size);
-	dmac_flush_range(ptr, ptr + size);
-	outer_flush_range(__pa(ptr), __pa(ptr) + size);
+	__dma_clear_buffer(page, size);
 
 	return page;
 }
@@ -170,6 +163,9 @@ static int __init consistent_init(void)
 	unsigned long base = consistent_base;
 	unsigned long num_ptes = (CONSISTENT_END - base) >> PGDIR_SHIFT;
 
+	if (cpu_architecture() >= CPU_ARCH_ARMv6)
+		return 0;
+
 	consistent_pte = kmalloc(num_ptes * sizeof(pte_t), GFP_KERNEL);
 	if (!consistent_pte) {
 		pr_err("%s: no memory\n", __func__);
@@ -210,9 +206,102 @@ static int __init consistent_init(void)
 
 	return ret;
 }
-
 core_initcall(consistent_init);
 
+static void *__alloc_from_contiguous(struct device *dev, size_t size,
+				     pgprot_t prot, struct page **ret_page);
+
+static struct arm_vmregion_head coherent_head = {
+	.vm_lock	= __SPIN_LOCK_UNLOCKED(&coherent_head.vm_lock),
+	.vm_list	= LIST_HEAD_INIT(coherent_head.vm_list),
+};
+
+size_t coherent_pool_size = DEFAULT_CONSISTENT_DMA_SIZE / 8;
+
+static int __init early_coherent_pool(char *p)
+{
+	coherent_pool_size = memparse(p, &p);
+	return 0;
+}
+early_param("coherent_pool", early_coherent_pool);
+
+/*
+ * Initialise the coherent pool for atomic allocations.
+ */
+static int __init coherent_init(void)
+{
+	pgprot_t prot = pgprot_dmacoherent(pgprot_kernel);
+	size_t size = coherent_pool_size;
+	struct page *page;
+	void *ptr;
+
+	if (cpu_architecture() < CPU_ARCH_ARMv6)
+		return 0;
+
+	ptr = __alloc_from_contiguous(NULL, size, prot, &page);
+	if (ptr) {
+		coherent_head.vm_start = (unsigned long) ptr;
+		coherent_head.vm_end = (unsigned long) ptr + size;
+		printk(KERN_INFO "DMA: preallocated %u KiB pool for atomic coherent allocations\n",
+		       (unsigned)size / 1024);
+		return 0;
+	}
+	printk(KERN_ERR "DMA: failed to allocate %u KiB pool for atomic coherent allocation\n",
+	       (unsigned)size / 1024);
+	return -ENOMEM;
+}
+/*
+ * CMA is activated by core_initcall, so we must be called after it
+ */
+postcore_initcall(coherent_init);
+
+struct dma_contiguous_early_reserve {
+	phys_addr_t base;
+	unsigned long size;
+};
+
+static struct dma_contiguous_early_reserve
+dma_mmu_remap[MAX_CMA_AREAS] __initdata;
+
+static int dma_mmu_remap_num __initdata;
+
+void __init dma_contiguous_early_fixup(phys_addr_t base, unsigned long size)
+{
+	dma_mmu_remap[dma_mmu_remap_num].base = base;
+	dma_mmu_remap[dma_mmu_remap_num].size = size;
+	dma_mmu_remap_num++;
+}
+
+void __init dma_contiguous_remap(void)
+{
+	int i;
+	for (i = 0; i < dma_mmu_remap_num; i++) {
+		phys_addr_t start = dma_mmu_remap[i].base;
+		phys_addr_t end = start + dma_mmu_remap[i].size;
+		struct map_desc map;
+		unsigned long addr;
+
+		if (end > arm_lowmem_limit)
+			end = arm_lowmem_limit;
+		if (start >= end)
+			return;
+
+		map.pfn = __phys_to_pfn(start);
+		map.virtual = __phys_to_virt(start);
+		map.length = end - start;
+		map.type = MT_MEMORY_DMA_READY;
+
+		/*
+		 * Clear previous low-memory mapping
+		 */
+		for (addr = __phys_to_virt(start); addr < __phys_to_virt(end);
+		     addr += PGDIR_SIZE)
+			pmd_clear(pmd_off_k(addr));
+
+		iotable_init(&map, 1);
+	}
+}
+
 static void *
 __dma_alloc_remap(struct page *page, size_t size, gfp_t gfp, pgprot_t prot)
 {
@@ -318,31 +407,178 @@ static void __dma_free_remap(void *cpu_addr, size_t size)
 	arm_vmregion_free(&consistent_head, c);
 }
 
+static int __dma_update_pte(pte_t *pte, pgtable_t token, unsigned long addr,
+			    void *data)
+{
+	struct page *page = virt_to_page(addr);
+	pgprot_t prot = *(pgprot_t *)data;
+
+	set_pte_ext(pte, mk_pte(page, prot), 0);
+	return 0;
+}
+
+static void __dma_remap(struct page *page, size_t size, pgprot_t prot)
+{
+	unsigned long start = (unsigned long) page_address(page);
+	unsigned end = start + size;
+
+	apply_to_page_range(&init_mm, start, size, __dma_update_pte, &prot);
+	dsb();
+	flush_tlb_kernel_range(start, end);
+}
+
+static void *__alloc_remap_buffer(struct device *dev, size_t size, gfp_t gfp,
+				 pgprot_t prot, struct page **ret_page)
+{
+	struct page *page;
+	void *ptr;
+	page = __dma_alloc_buffer(dev, size, gfp);
+	if (!page)
+		return NULL;
+
+	ptr = __dma_alloc_remap(page, size, gfp, prot);
+	if (!ptr) {
+		__dma_free_buffer(page, size);
+		return NULL;
+	}
+
+	*ret_page = page;
+	return ptr;
+}
+
+static void *__alloc_from_pool(struct device *dev, size_t size,
+			       struct page **ret_page)
+{
+	struct arm_vmregion *c;
+	size_t align;
+
+	if (!coherent_head.vm_start) {
+		printk(KERN_ERR "%s: coherent pool not initialised!\n",
+		       __func__);
+		dump_stack();
+		return NULL;
+	}
+
+	align = 1 << fls(size - 1);
+	c = arm_vmregion_alloc(&coherent_head, align, size, 0);
+	if (c) {
+		void *ptr = (void *)c->vm_start;
+		struct page *page = virt_to_page(ptr);
+		*ret_page = page;
+		return ptr;
+	}
+	return NULL;
+}
+
+static int __free_from_pool(void *cpu_addr, size_t size)
+{
+	unsigned long start = (unsigned long)cpu_addr;
+	unsigned long end = start + size;
+	struct arm_vmregion *c;
+
+	if (start < coherent_head.vm_start || end > coherent_head.vm_end)
+		return 0;
+
+	c = arm_vmregion_find_remove(&coherent_head, (unsigned long)start);
+
+	if ((c->vm_end - c->vm_start) != size) {
+		printk(KERN_ERR "%s: freeing wrong coherent size (%ld != %d)\n",
+		       __func__, c->vm_end - c->vm_start, size);
+		dump_stack();
+		size = c->vm_end - c->vm_start;
+	}
+
+	arm_vmregion_free(&coherent_head, c);
+	return 1;
+}
+
+static void *__alloc_from_contiguous(struct device *dev, size_t size,
+				     pgprot_t prot, struct page **ret_page)
+{
+	unsigned long order = get_order(size);
+	size_t count = size >> PAGE_SHIFT;
+	struct page *page;
+
+	page = dma_alloc_from_contiguous(dev, count, order);
+	if (!page)
+		return NULL;
+
+	__dma_clear_buffer(page, size);
+	__dma_remap(page, size, prot);
+
+	*ret_page = page;
+	return page_address(page);
+}
+
+static void __free_from_contiguous(struct device *dev, struct page *page,
+				   size_t size)
+{
+	__dma_remap(page, size, pgprot_kernel);
+	dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT);
+}
+
+#define nommu() 0
+
 #else	/* !CONFIG_MMU */
 
-#define __dma_alloc_remap(page, size, gfp, prot)	page_address(page)
-#define __dma_free_remap(addr, size)			do { } while (0)
+#define nommu() 1
+
+#define __alloc_remap_buffer(dev, size, gfp, prot, ret)	NULL
+#define __alloc_from_pool(dev, size, ret_page)		NULL
+#define __alloc_from_contiguous(dev, size, prot, ret)	NULL
+#define __free_from_pool(cpu_addr, size)		0
+#define __free_from_contiguous(dev, page, size)		do { } while (0)
+#define __dma_free_remap(cpu_addr, size)		do { } while (0)
 
 #endif	/* CONFIG_MMU */
 
-static void *
-__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp,
-	    pgprot_t prot)
+static void *__alloc_simple_buffer(struct device *dev, size_t size, gfp_t gfp,
+				   struct page **ret_page)
 {
 	struct page *page;
+	page = __dma_alloc_buffer(dev, size, gfp);
+	if (!page)
+		return NULL;
+
+	*ret_page = page;
+	return page_address(page);
+}
+
+
+
+static void *__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
+			 gfp_t gfp, pgprot_t prot)
+{
+	u64 mask = get_coherent_dma_mask(dev);
+	struct page *page = NULL;
 	void *addr;
 
-	*handle = ~0;
-	size = PAGE_ALIGN(size);
+#ifdef CONFIG_DMA_API_DEBUG
+	u64 limit = (mask + 1) & ~mask;
+	if (limit && size >= limit) {
+		dev_warn(dev, "coherent allocation too big (requested %#x mask %#llx)\n",
+			size, mask);
+		return NULL;
+	}
+#endif
 
-	page = __dma_alloc_buffer(dev, size, gfp);
-	if (!page)
+	if (!mask)
 		return NULL;
 
-	if (!arch_is_coherent())
-		addr = __dma_alloc_remap(page, size, gfp, prot);
+	if (mask < 0xffffffffULL)
+		gfp |= GFP_DMA;
+
+	*handle = ~0;
+	size = PAGE_ALIGN(size);
+
+	if (arch_is_coherent() || nommu())
+		addr = __alloc_simple_buffer(dev, size, gfp, &page);
+	else if (cpu_architecture() < CPU_ARCH_ARMv6)
+		addr = __alloc_remap_buffer(dev, size, gfp, prot, &page);
+	else if (gfp & GFP_ATOMIC)
+		addr = __alloc_from_pool(dev, size, &page);
 	else
-		addr = page_address(page);
+		addr = __alloc_from_contiguous(dev, size, prot, &page);
 
 	if (addr)
 		*handle = pfn_to_dma(dev, page_to_pfn(page));
@@ -354,8 +590,8 @@ __dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp,
  * Allocate DMA-coherent memory space and return both the kernel remapped
  * virtual and bus address for that space.
  */
-void *
-dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp)
+void *dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle,
+			 gfp_t gfp)
 {
 	void *memory;
 
@@ -384,25 +620,11 @@ static int dma_mmap(struct device *dev, struct vm_area_struct *vma,
 {
 	int ret = -ENXIO;
 #ifdef CONFIG_MMU
-	unsigned long user_size, kern_size;
-	struct arm_vmregion *c;
-
-	user_size = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
-
-	c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr);
-	if (c) {
-		unsigned long off = vma->vm_pgoff;
-
-		kern_size = (c->vm_end - c->vm_start) >> PAGE_SHIFT;
-
-		if (off < kern_size &&
-		    user_size <= (kern_size - off)) {
-			ret = remap_pfn_range(vma, vma->vm_start,
-					      page_to_pfn(c->vm_pages) + off,
-					      user_size << PAGE_SHIFT,
-					      vma->vm_page_prot);
-		}
-	}
+	unsigned long pfn = dma_to_pfn(dev, dma_addr);
+	ret = remap_pfn_range(vma, vma->vm_start,
+			      pfn + vma->vm_pgoff,
+			      vma->vm_end - vma->vm_start,
+			      vma->vm_page_prot);
 #endif	/* CONFIG_MMU */
 
 	return ret;
@@ -424,23 +646,33 @@ int dma_mmap_writecombine(struct device *dev, struct vm_area_struct *vma,
 }
 EXPORT_SYMBOL(dma_mmap_writecombine);
 
+
 /*
- * free a page as defined by the above mapping.
- * Must not be called with IRQs disabled.
+ * Free a buffer as defined by the above mapping.
  */
 void dma_free_coherent(struct device *dev, size_t size, void *cpu_addr, dma_addr_t handle)
 {
-	WARN_ON(irqs_disabled());
+	struct page *page = pfn_to_page(dma_to_pfn(dev, handle));
 
 	if (dma_release_from_coherent(dev, get_order(size), cpu_addr))
 		return;
 
 	size = PAGE_ALIGN(size);
 
-	if (!arch_is_coherent())
+	if (arch_is_coherent() || nommu()) {
+		__dma_free_buffer(page, size);
+	} else if (cpu_architecture() < CPU_ARCH_ARMv6) {
 		__dma_free_remap(cpu_addr, size);
-
-	__dma_free_buffer(pfn_to_page(dma_to_pfn(dev, handle)), size);
+		__dma_free_buffer(page, size);
+	} else {
+		if (__free_from_pool(cpu_addr, size))
+			return;
+		/*
+		 * Non-atomic allocations cannot be freed with IRQs disabled
+		 */
+		WARN_ON(irqs_disabled());
+		__free_from_contiguous(dev, page, size);
+	}
 }
 EXPORT_SYMBOL(dma_free_coherent);
 
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 34409a0..44619f0 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -20,6 +20,7 @@
 #include <linux/gfp.h>
 #include <linux/memblock.h>
 #include <linux/sort.h>
+#include <linux/dma-contiguous.h>
 
 #include <asm/mach-types.h>
 #include <asm/prom.h>
@@ -370,6 +371,13 @@ void __init arm_memblock_init(struct meminfo *mi, struct machine_desc *mdesc)
 	if (mdesc->reserve)
 		mdesc->reserve();
 
+	/* reserve memory for DMA contiguous allocations */
+#ifdef CONFIG_ZONE_DMA
+	dma_contiguous_reserve(PHYS_OFFSET + mdesc->dma_zone_size - 1);
+#else
+	dma_contiguous_reserve(0);
+#endif
+
 	memblock_analyze();
 	memblock_dump_all();
 }
diff --git a/arch/arm/mm/mm.h b/arch/arm/mm/mm.h
index ad7cce3..fa95d9b 100644
--- a/arch/arm/mm/mm.h
+++ b/arch/arm/mm/mm.h
@@ -29,5 +29,8 @@ extern u32 arm_dma_limit;
 #define arm_dma_limit ((u32)~0)
 #endif
 
+extern phys_addr_t arm_lowmem_limit;
+
 void __init bootmem_init(void);
 void arm_mm_memblock_reserve(void);
+void dma_contiguous_remap(void);
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 226f180..cdc9c38 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -273,6 +273,11 @@ static struct mem_type mem_types[] = {
 		.prot_l1   = PMD_TYPE_TABLE,
 		.domain    = DOMAIN_KERNEL,
 	},
+	[MT_MEMORY_DMA_READY] = {
+		.prot_pte  = L_PTE_PRESENT | L_PTE_YOUNG | L_PTE_DIRTY,
+		.prot_l1   = PMD_TYPE_TABLE,
+		.domain    = DOMAIN_KERNEL,
+	},
 };
 
 const struct mem_type *get_mem_type(unsigned int type)
@@ -414,6 +419,7 @@ static void __init build_mem_type_table(void)
 	if (arch_is_coherent() && cpu_is_xsc3()) {
 		mem_types[MT_MEMORY].prot_sect |= PMD_SECT_S;
 		mem_types[MT_MEMORY].prot_pte |= L_PTE_SHARED;
+		mem_types[MT_MEMORY_DMA_READY].prot_pte |= L_PTE_SHARED;
 		mem_types[MT_MEMORY_NONCACHED].prot_sect |= PMD_SECT_S;
 		mem_types[MT_MEMORY_NONCACHED].prot_pte |= L_PTE_SHARED;
 	}
@@ -443,6 +449,7 @@ static void __init build_mem_type_table(void)
 			mem_types[MT_DEVICE_CACHED].prot_pte |= L_PTE_SHARED;
 			mem_types[MT_MEMORY].prot_sect |= PMD_SECT_S;
 			mem_types[MT_MEMORY].prot_pte |= L_PTE_SHARED;
+			mem_types[MT_MEMORY_DMA_READY].prot_pte |= L_PTE_SHARED;
 			mem_types[MT_MEMORY_NONCACHED].prot_sect |= PMD_SECT_S;
 			mem_types[MT_MEMORY_NONCACHED].prot_pte |= L_PTE_SHARED;
 		}
@@ -482,6 +489,7 @@ static void __init build_mem_type_table(void)
 	mem_types[MT_HIGH_VECTORS].prot_l1 |= ecc_mask;
 	mem_types[MT_MEMORY].prot_sect |= ecc_mask | cp->pmd;
 	mem_types[MT_MEMORY].prot_pte |= kern_pgprot;
+	mem_types[MT_MEMORY_DMA_READY].prot_pte |= kern_pgprot;
 	mem_types[MT_MEMORY_NONCACHED].prot_sect |= ecc_mask;
 	mem_types[MT_ROM].prot_sect |= cp->pmd;
 
@@ -561,7 +569,7 @@ static void __init alloc_init_section(pud_t *pud, unsigned long addr,
 	 * L1 entries, whereas PGDs refer to a group of L1 entries making
 	 * up one logical pointer to an L2 table.
 	 */
-	if (((addr | end | phys) & ~SECTION_MASK) == 0) {
+	if (type->prot_sect && ((addr | end | phys) & ~SECTION_MASK) == 0) {
 		pmd_t *p = pmd;
 
 		if (addr & SECTION_SIZE)
@@ -757,7 +765,7 @@ static int __init early_vmalloc(char *arg)
 }
 early_param("vmalloc", early_vmalloc);
 
-static phys_addr_t lowmem_limit __initdata = 0;
+phys_addr_t arm_lowmem_limit __initdata = 0;
 
 void __init sanity_check_meminfo(void)
 {
@@ -826,8 +834,8 @@ void __init sanity_check_meminfo(void)
 			bank->size = newsize;
 		}
 #endif
-		if (!bank->highmem && bank->start + bank->size > lowmem_limit)
-			lowmem_limit = bank->start + bank->size;
+		if (!bank->highmem && bank->start + bank->size > arm_lowmem_limit)
+			arm_lowmem_limit = bank->start + bank->size;
 
 		j++;
 	}
@@ -852,7 +860,7 @@ void __init sanity_check_meminfo(void)
 	}
 #endif
 	meminfo.nr_banks = j;
-	memblock_set_current_limit(lowmem_limit);
+	memblock_set_current_limit(arm_lowmem_limit);
 }
 
 static inline void prepare_page_table(void)
@@ -877,8 +885,8 @@ static inline void prepare_page_table(void)
 	 * Find the end of the first block of lowmem.
 	 */
 	end = memblock.memory.regions[0].base + memblock.memory.regions[0].size;
-	if (end >= lowmem_limit)
-		end = lowmem_limit;
+	if (end >= arm_lowmem_limit)
+		end = arm_lowmem_limit;
 
 	/*
 	 * Clear out all the kernel space mappings, except for the first
@@ -1012,8 +1020,8 @@ static void __init map_lowmem(void)
 		phys_addr_t end = start + reg->size;
 		struct map_desc map;
 
-		if (end > lowmem_limit)
-			end = lowmem_limit;
+		if (end > arm_lowmem_limit)
+			end = arm_lowmem_limit;
 		if (start >= end)
 			break;
 
@@ -1034,11 +1042,12 @@ void __init paging_init(struct machine_desc *mdesc)
 {
 	void *zero_page;
 
-	memblock_set_current_limit(lowmem_limit);
+	memblock_set_current_limit(arm_lowmem_limit);
 
 	build_mem_type_table();
 	prepare_page_table();
 	map_lowmem();
+	dma_contiguous_remap();
 	devicemaps_init(mdesc);
 	kmap_init();
 
-- 
1.7.1.569.g6f426

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 7/7] ARM: integrate CMA with DMA-mapping subsystem
@ 2011-10-06 13:54   ` Marek Szyprowski
  0 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-06 13:54 UTC (permalink / raw)
  To: linux-arm-kernel

This patch adds support for CMA to the DMA-mapping subsystem for the
ARM architecture. By default a global CMA area is used, but specific
devices may have their own private memory areas if required (these can
be created with the dma_declare_contiguous() function during board
initialization).
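
A board file could declare such a private area from its ->reserve()
callback roughly as follows. This is only a minimal sketch: the device
name and the 16 MiB size are made up, and it assumes the
dma_declare_contiguous(dev, size, base, limit) helper introduced
earlier in this series.

	#include <linux/dma-contiguous.h>
	#include <linux/platform_device.h>

	/* hypothetical multimedia device owning a private CMA area */
	static struct platform_device foo_mfc_device;

	static void __init foo_board_reserve(void)
	{
		/* base/limit of 0 let CMA choose the placement */
		if (dma_declare_contiguous(&foo_mfc_device.dev,
					   16 * 1024 * 1024, 0, 0))
			pr_warn("foo: CMA area reservation failed\n");
	}

Any later coherent allocation for such a device is then served from
its private area instead of the global one.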

Contiguous memory areas reserved for DMA are remapped with 2-level page
tables on boot. Once a buffer is requested, the low-memory kernel
mapping is updated to match the requested memory access type.

GFP_ATOMIC allocations are served from a special pool that is created
early during boot, so page attributes do not need to be remapped at
allocation time.
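
If the default pool size (DEFAULT_CONSISTENT_DMA_SIZE / 8) turns out to
be too small for a particular board, it can be changed with the early
parameter added by this patch, e.g.:

	coherent_pool=2M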

CMA has been enabled unconditionally for ARMv6+ systems.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
 arch/arm/Kconfig                      |    2 +
 arch/arm/include/asm/device.h         |    3 +
 arch/arm/include/asm/dma-contiguous.h |   33 +++
 arch/arm/include/asm/mach/map.h       |    1 +
 arch/arm/mm/dma-mapping.c             |  362 +++++++++++++++++++++++++++------
 arch/arm/mm/init.c                    |    8 +
 arch/arm/mm/mm.h                      |    3 +
 arch/arm/mm/mmu.c                     |   29 ++-
 8 files changed, 366 insertions(+), 75 deletions(-)
 create mode 100644 arch/arm/include/asm/dma-contiguous.h

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 6f231d5..e1705c9 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -3,6 +3,8 @@ config ARM
 	default y
 	select HAVE_AOUT
 	select HAVE_DMA_API_DEBUG
+	select HAVE_DMA_CONTIGUOUS if (CPU_V6 || CPU_V6K || CPU_V7)
+	select CMA if (CPU_V6 || CPU_V6K || CPU_V7)
 	select HAVE_IDE
 	select HAVE_MEMBLOCK
 	select RTC_LIB
diff --git a/arch/arm/include/asm/device.h b/arch/arm/include/asm/device.h
index b5c9f5b..b1ee416 100644
--- a/arch/arm/include/asm/device.h
+++ b/arch/arm/include/asm/device.h
@@ -10,6 +10,9 @@ struct dev_archdata {
 #ifdef CONFIG_DMABOUNCE
 	struct dmabounce_device_info *dmabounce;
 #endif
+#ifdef CONFIG_CMA
+	struct cma *cma_area;
+#endif
 };
 
 struct omap_device;
diff --git a/arch/arm/include/asm/dma-contiguous.h b/arch/arm/include/asm/dma-contiguous.h
new file mode 100644
index 0000000..6be12ba
--- /dev/null
+++ b/arch/arm/include/asm/dma-contiguous.h
@@ -0,0 +1,33 @@
+#ifndef ASMARM_DMA_CONTIGUOUS_H
+#define ASMARM_DMA_CONTIGUOUS_H
+
+#ifdef __KERNEL__
+
+#include <linux/device.h>
+#include <linux/dma-contiguous.h>
+
+#ifdef CONFIG_CMA
+
+#define MAX_CMA_AREAS	(8)
+
+void dma_contiguous_early_fixup(phys_addr_t base, unsigned long size);
+
+static inline struct cma *get_dev_cma_area(struct device *dev)
+{
+	if (dev && dev->archdata.cma_area)
+		return dev->archdata.cma_area;
+	return dma_contiguous_default_area;
+}
+
+static inline void set_dev_cma_area(struct device *dev, struct cma *cma)
+{
+	dev->archdata.cma_area = cma;
+}
+
+#else
+
+#define MAX_CMA_AREAS	(0)
+
+#endif
+#endif
+#endif
diff --git a/arch/arm/include/asm/mach/map.h b/arch/arm/include/asm/mach/map.h
index d2fedb5..05343de 100644
--- a/arch/arm/include/asm/mach/map.h
+++ b/arch/arm/include/asm/mach/map.h
@@ -29,6 +29,7 @@ struct map_desc {
 #define MT_MEMORY_NONCACHED	11
 #define MT_MEMORY_DTCM		12
 #define MT_MEMORY_ITCM		13
+#define MT_MEMORY_DMA_READY	14
 
 #ifdef CONFIG_MMU
 extern void iotable_init(struct map_desc *, int);
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 66e3053..3d6a33d 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -17,7 +17,9 @@
 #include <linux/init.h>
 #include <linux/device.h>
 #include <linux/dma-mapping.h>
+#include <linux/dma-contiguous.h>
 #include <linux/highmem.h>
+#include <linux/memblock.h>
 #include <linux/slab.h>
 
 #include <asm/memory.h>
@@ -26,6 +28,8 @@
 #include <asm/tlbflush.h>
 #include <asm/sizes.h>
 #include <asm/mach/arch.h>
+#include <asm/mach/map.h>
+#include <asm/dma-contiguous.h>
 
 #include "mm.h"
 
@@ -56,6 +60,19 @@ static u64 get_coherent_dma_mask(struct device *dev)
 	return mask;
 }
 
+static void __dma_clear_buffer(struct page *page, size_t size)
+{
+	void *ptr;
+	/*
+	 * Ensure that the allocated pages are zeroed, and that any data
+	 * lurking in the kernel direct-mapped region is invalidated.
+	 */
+	ptr = page_address(page);
+	memset(ptr, 0, size);
+	dmac_flush_range(ptr, ptr + size);
+	outer_flush_range(__pa(ptr), __pa(ptr) + size);
+}
+
 /*
  * Allocate a DMA buffer for 'dev' of size 'size' using the
  * specified gfp mask.  Note that 'size' must be page aligned.
@@ -64,23 +81,6 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
 {
 	unsigned long order = get_order(size);
 	struct page *page, *p, *e;
-	void *ptr;
-	u64 mask = get_coherent_dma_mask(dev);
-
-#ifdef CONFIG_DMA_API_DEBUG
-	u64 limit = (mask + 1) & ~mask;
-	if (limit && size >= limit) {
-		dev_warn(dev, "coherent allocation too big (requested %#x mask %#llx)\n",
-			size, mask);
-		return NULL;
-	}
-#endif
-
-	if (!mask)
-		return NULL;
-
-	if (mask < 0xffffffffULL)
-		gfp |= GFP_DMA;
 
 	page = alloc_pages(gfp, order);
 	if (!page)
@@ -93,14 +93,7 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
 	for (p = page + (size >> PAGE_SHIFT), e = page + (1 << order); p < e; p++)
 		__free_page(p);
 
-	/*
-	 * Ensure that the allocated pages are zeroed, and that any data
-	 * lurking in the kernel direct-mapped region is invalidated.
-	 */
-	ptr = page_address(page);
-	memset(ptr, 0, size);
-	dmac_flush_range(ptr, ptr + size);
-	outer_flush_range(__pa(ptr), __pa(ptr) + size);
+	__dma_clear_buffer(page, size);
 
 	return page;
 }
@@ -170,6 +163,9 @@ static int __init consistent_init(void)
 	unsigned long base = consistent_base;
 	unsigned long num_ptes = (CONSISTENT_END - base) >> PGDIR_SHIFT;
 
+	if (cpu_architecture() >= CPU_ARCH_ARMv6)
+		return 0;
+
 	consistent_pte = kmalloc(num_ptes * sizeof(pte_t), GFP_KERNEL);
 	if (!consistent_pte) {
 		pr_err("%s: no memory\n", __func__);
@@ -210,9 +206,102 @@ static int __init consistent_init(void)
 
 	return ret;
 }
-
 core_initcall(consistent_init);
 
+static void *__alloc_from_contiguous(struct device *dev, size_t size,
+				     pgprot_t prot, struct page **ret_page);
+
+static struct arm_vmregion_head coherent_head = {
+	.vm_lock	= __SPIN_LOCK_UNLOCKED(&coherent_head.vm_lock),
+	.vm_list	= LIST_HEAD_INIT(coherent_head.vm_list),
+};
+
+size_t coherent_pool_size = DEFAULT_CONSISTENT_DMA_SIZE / 8;
+
+static int __init early_coherent_pool(char *p)
+{
+	coherent_pool_size = memparse(p, &p);
+	return 0;
+}
+early_param("coherent_pool", early_coherent_pool);
+
+/*
+ * Initialise the coherent pool for atomic allocations.
+ */
+static int __init coherent_init(void)
+{
+	pgprot_t prot = pgprot_dmacoherent(pgprot_kernel);
+	size_t size = coherent_pool_size;
+	struct page *page;
+	void *ptr;
+
+	if (cpu_architecture() < CPU_ARCH_ARMv6)
+		return 0;
+
+	ptr = __alloc_from_contiguous(NULL, size, prot, &page);
+	if (ptr) {
+		coherent_head.vm_start = (unsigned long) ptr;
+		coherent_head.vm_end = (unsigned long) ptr + size;
+		printk(KERN_INFO "DMA: preallocated %u KiB pool for atomic coherent allocations\n",
+		       (unsigned)size / 1024);
+		return 0;
+	}
+	printk(KERN_ERR "DMA: failed to allocate %u KiB pool for atomic coherent allocation\n",
+	       (unsigned)size / 1024);
+	return -ENOMEM;
+}
+/*
+ * CMA is activated by core_initcall, so we must be called after it
+ */
+postcore_initcall(coherent_init);
+
+struct dma_contiguous_early_reserve {
+	phys_addr_t base;
+	unsigned long size;
+};
+
+static struct dma_contiguous_early_reserve
+dma_mmu_remap[MAX_CMA_AREAS] __initdata;
+
+static int dma_mmu_remap_num __initdata;
+
+void __init dma_contiguous_early_fixup(phys_addr_t base, unsigned long size)
+{
+	dma_mmu_remap[dma_mmu_remap_num].base = base;
+	dma_mmu_remap[dma_mmu_remap_num].size = size;
+	dma_mmu_remap_num++;
+}
+
+void __init dma_contiguous_remap(void)
+{
+	int i;
+	for (i = 0; i < dma_mmu_remap_num; i++) {
+		phys_addr_t start = dma_mmu_remap[i].base;
+		phys_addr_t end = start + dma_mmu_remap[i].size;
+		struct map_desc map;
+		unsigned long addr;
+
+		if (end > arm_lowmem_limit)
+			end = arm_lowmem_limit;
+		if (start >= end)
+			return;
+
+		map.pfn = __phys_to_pfn(start);
+		map.virtual = __phys_to_virt(start);
+		map.length = end - start;
+		map.type = MT_MEMORY_DMA_READY;
+
+		/*
+		 * Clear previous low-memory mapping
+		 */
+		for (addr = __phys_to_virt(start); addr < __phys_to_virt(end);
+		     addr += PGDIR_SIZE)
+			pmd_clear(pmd_off_k(addr));
+
+		iotable_init(&map, 1);
+	}
+}
+
 static void *
 __dma_alloc_remap(struct page *page, size_t size, gfp_t gfp, pgprot_t prot)
 {
@@ -318,31 +407,178 @@ static void __dma_free_remap(void *cpu_addr, size_t size)
 	arm_vmregion_free(&consistent_head, c);
 }
 
+static int __dma_update_pte(pte_t *pte, pgtable_t token, unsigned long addr,
+			    void *data)
+{
+	struct page *page = virt_to_page(addr);
+	pgprot_t prot = *(pgprot_t *)data;
+
+	set_pte_ext(pte, mk_pte(page, prot), 0);
+	return 0;
+}
+
+static void __dma_remap(struct page *page, size_t size, pgprot_t prot)
+{
+	unsigned long start = (unsigned long) page_address(page);
+	unsigned end = start + size;
+
+	apply_to_page_range(&init_mm, start, size, __dma_update_pte, &prot);
+	dsb();
+	flush_tlb_kernel_range(start, end);
+}
+
+static void *__alloc_remap_buffer(struct device *dev, size_t size, gfp_t gfp,
+				 pgprot_t prot, struct page **ret_page)
+{
+	struct page *page;
+	void *ptr;
+	page = __dma_alloc_buffer(dev, size, gfp);
+	if (!page)
+		return NULL;
+
+	ptr = __dma_alloc_remap(page, size, gfp, prot);
+	if (!ptr) {
+		__dma_free_buffer(page, size);
+		return NULL;
+	}
+
+	*ret_page = page;
+	return ptr;
+}
+
+static void *__alloc_from_pool(struct device *dev, size_t size,
+			       struct page **ret_page)
+{
+	struct arm_vmregion *c;
+	size_t align;
+
+	if (!coherent_head.vm_start) {
+		printk(KERN_ERR "%s: coherent pool not initialised!\n",
+		       __func__);
+		dump_stack();
+		return NULL;
+	}
+
+	align = 1 << fls(size - 1);
+	c = arm_vmregion_alloc(&coherent_head, align, size, 0);
+	if (c) {
+		void *ptr = (void *)c->vm_start;
+		struct page *page = virt_to_page(ptr);
+		*ret_page = page;
+		return ptr;
+	}
+	return NULL;
+}
+
+static int __free_from_pool(void *cpu_addr, size_t size)
+{
+	unsigned long start = (unsigned long)cpu_addr;
+	unsigned long end = start + size;
+	struct arm_vmregion *c;
+
+	if (start < coherent_head.vm_start || end > coherent_head.vm_end)
+		return 0;
+
+	c = arm_vmregion_find_remove(&coherent_head, (unsigned long)start);
+
+	if ((c->vm_end - c->vm_start) != size) {
+		printk(KERN_ERR "%s: freeing wrong coherent size (%ld != %d)\n",
+		       __func__, c->vm_end - c->vm_start, size);
+		dump_stack();
+		size = c->vm_end - c->vm_start;
+	}
+
+	arm_vmregion_free(&coherent_head, c);
+	return 1;
+}
+
+static void *__alloc_from_contiguous(struct device *dev, size_t size,
+				     pgprot_t prot, struct page **ret_page)
+{
+	unsigned long order = get_order(size);
+	size_t count = size >> PAGE_SHIFT;
+	struct page *page;
+
+	page = dma_alloc_from_contiguous(dev, count, order);
+	if (!page)
+		return NULL;
+
+	__dma_clear_buffer(page, size);
+	__dma_remap(page, size, prot);
+
+	*ret_page = page;
+	return page_address(page);
+}
+
+static void __free_from_contiguous(struct device *dev, struct page *page,
+				   size_t size)
+{
+	__dma_remap(page, size, pgprot_kernel);
+	dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT);
+}
+
+#define nommu() 0
+
 #else	/* !CONFIG_MMU */
 
-#define __dma_alloc_remap(page, size, gfp, prot)	page_address(page)
-#define __dma_free_remap(addr, size)			do { } while (0)
+#define nommu() 1
+
+#define __alloc_remap_buffer(dev, size, gfp, prot, ret)	NULL
+#define __alloc_from_pool(dev, size, ret_page)		NULL
+#define __alloc_from_contiguous(dev, size, prot, ret)	NULL
+#define __free_from_pool(cpu_addr, size)		0
+#define __free_from_contiguous(dev, page, size)		do { } while (0)
+#define __dma_free_remap(cpu_addr, size)		do { } while (0)
 
 #endif	/* CONFIG_MMU */
 
-static void *
-__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp,
-	    pgprot_t prot)
+static void *__alloc_simple_buffer(struct device *dev, size_t size, gfp_t gfp,
+				   struct page **ret_page)
 {
 	struct page *page;
+	page = __dma_alloc_buffer(dev, size, gfp);
+	if (!page)
+		return NULL;
+
+	*ret_page = page;
+	return page_address(page);
+}
+
+
+
+static void *__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
+			 gfp_t gfp, pgprot_t prot)
+{
+	u64 mask = get_coherent_dma_mask(dev);
+	struct page *page = NULL;
 	void *addr;
 
-	*handle = ~0;
-	size = PAGE_ALIGN(size);
+#ifdef CONFIG_DMA_API_DEBUG
+	u64 limit = (mask + 1) & ~mask;
+	if (limit && size >= limit) {
+		dev_warn(dev, "coherent allocation too big (requested %#x mask %#llx)\n",
+			size, mask);
+		return NULL;
+	}
+#endif
 
-	page = __dma_alloc_buffer(dev, size, gfp);
-	if (!page)
+	if (!mask)
 		return NULL;
 
-	if (!arch_is_coherent())
-		addr = __dma_alloc_remap(page, size, gfp, prot);
+	if (mask < 0xffffffffULL)
+		gfp |= GFP_DMA;
+
+	*handle = ~0;
+	size = PAGE_ALIGN(size);
+
+	if (arch_is_coherent() || nommu())
+		addr = __alloc_simple_buffer(dev, size, gfp, &page);
+	else if (cpu_architecture() < CPU_ARCH_ARMv6)
+		addr = __alloc_remap_buffer(dev, size, gfp, prot, &page);
+	else if (gfp & GFP_ATOMIC)
+		addr = __alloc_from_pool(dev, size, &page);
 	else
-		addr = page_address(page);
+		addr = __alloc_from_contiguous(dev, size, prot, &page);
 
 	if (addr)
 		*handle = pfn_to_dma(dev, page_to_pfn(page));
@@ -354,8 +590,8 @@ __dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp,
  * Allocate DMA-coherent memory space and return both the kernel remapped
  * virtual and bus address for that space.
  */
-void *
-dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp)
+void *dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle,
+			 gfp_t gfp)
 {
 	void *memory;
 
@@ -384,25 +620,11 @@ static int dma_mmap(struct device *dev, struct vm_area_struct *vma,
 {
 	int ret = -ENXIO;
 #ifdef CONFIG_MMU
-	unsigned long user_size, kern_size;
-	struct arm_vmregion *c;
-
-	user_size = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
-
-	c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr);
-	if (c) {
-		unsigned long off = vma->vm_pgoff;
-
-		kern_size = (c->vm_end - c->vm_start) >> PAGE_SHIFT;
-
-		if (off < kern_size &&
-		    user_size <= (kern_size - off)) {
-			ret = remap_pfn_range(vma, vma->vm_start,
-					      page_to_pfn(c->vm_pages) + off,
-					      user_size << PAGE_SHIFT,
-					      vma->vm_page_prot);
-		}
-	}
+	unsigned long pfn = dma_to_pfn(dev, dma_addr);
+	ret = remap_pfn_range(vma, vma->vm_start,
+			      pfn + vma->vm_pgoff,
+			      vma->vm_end - vma->vm_start,
+			      vma->vm_page_prot);
 #endif	/* CONFIG_MMU */
 
 	return ret;
@@ -424,23 +646,33 @@ int dma_mmap_writecombine(struct device *dev, struct vm_area_struct *vma,
 }
 EXPORT_SYMBOL(dma_mmap_writecombine);
 
+
 /*
- * free a page as defined by the above mapping.
- * Must not be called with IRQs disabled.
+ * Free a buffer as defined by the above mapping.
  */
 void dma_free_coherent(struct device *dev, size_t size, void *cpu_addr, dma_addr_t handle)
 {
-	WARN_ON(irqs_disabled());
+	struct page *page = pfn_to_page(dma_to_pfn(dev, handle));
 
 	if (dma_release_from_coherent(dev, get_order(size), cpu_addr))
 		return;
 
 	size = PAGE_ALIGN(size);
 
-	if (!arch_is_coherent())
+	if (arch_is_coherent() || nommu()) {
+		__dma_free_buffer(page, size);
+	} else if (cpu_architecture() < CPU_ARCH_ARMv6) {
 		__dma_free_remap(cpu_addr, size);
-
-	__dma_free_buffer(pfn_to_page(dma_to_pfn(dev, handle)), size);
+		__dma_free_buffer(page, size);
+	} else {
+		if (__free_from_pool(cpu_addr, size))
+			return;
+		/*
+		 * Non-atomic allocations cannot be freed with IRQs disabled
+		 */
+		WARN_ON(irqs_disabled());
+		__free_from_contiguous(dev, page, size);
+	}
 }
 EXPORT_SYMBOL(dma_free_coherent);
 
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 34409a0..44619f0 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -20,6 +20,7 @@
 #include <linux/gfp.h>
 #include <linux/memblock.h>
 #include <linux/sort.h>
+#include <linux/dma-contiguous.h>
 
 #include <asm/mach-types.h>
 #include <asm/prom.h>
@@ -370,6 +371,13 @@ void __init arm_memblock_init(struct meminfo *mi, struct machine_desc *mdesc)
 	if (mdesc->reserve)
 		mdesc->reserve();
 
+	/* reserve memory for DMA contiguous allocations */
+#ifdef CONFIG_ZONE_DMA
+	dma_contiguous_reserve(PHYS_OFFSET + mdesc->dma_zone_size - 1);
+#else
+	dma_contiguous_reserve(0);
+#endif
+
 	memblock_analyze();
 	memblock_dump_all();
 }
diff --git a/arch/arm/mm/mm.h b/arch/arm/mm/mm.h
index ad7cce3..fa95d9b 100644
--- a/arch/arm/mm/mm.h
+++ b/arch/arm/mm/mm.h
@@ -29,5 +29,8 @@ extern u32 arm_dma_limit;
 #define arm_dma_limit ((u32)~0)
 #endif
 
+extern phys_addr_t arm_lowmem_limit;
+
 void __init bootmem_init(void);
 void arm_mm_memblock_reserve(void);
+void dma_contiguous_remap(void);
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 226f180..cdc9c38 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -273,6 +273,11 @@ static struct mem_type mem_types[] = {
 		.prot_l1   = PMD_TYPE_TABLE,
 		.domain    = DOMAIN_KERNEL,
 	},
+	[MT_MEMORY_DMA_READY] = {
+		.prot_pte  = L_PTE_PRESENT | L_PTE_YOUNG | L_PTE_DIRTY,
+		.prot_l1   = PMD_TYPE_TABLE,
+		.domain    = DOMAIN_KERNEL,
+	},
 };
 
 const struct mem_type *get_mem_type(unsigned int type)
@@ -414,6 +419,7 @@ static void __init build_mem_type_table(void)
 	if (arch_is_coherent() && cpu_is_xsc3()) {
 		mem_types[MT_MEMORY].prot_sect |= PMD_SECT_S;
 		mem_types[MT_MEMORY].prot_pte |= L_PTE_SHARED;
+		mem_types[MT_MEMORY_DMA_READY].prot_pte |= L_PTE_SHARED;
 		mem_types[MT_MEMORY_NONCACHED].prot_sect |= PMD_SECT_S;
 		mem_types[MT_MEMORY_NONCACHED].prot_pte |= L_PTE_SHARED;
 	}
@@ -443,6 +449,7 @@ static void __init build_mem_type_table(void)
 			mem_types[MT_DEVICE_CACHED].prot_pte |= L_PTE_SHARED;
 			mem_types[MT_MEMORY].prot_sect |= PMD_SECT_S;
 			mem_types[MT_MEMORY].prot_pte |= L_PTE_SHARED;
+			mem_types[MT_MEMORY_DMA_READY].prot_pte |= L_PTE_SHARED;
 			mem_types[MT_MEMORY_NONCACHED].prot_sect |= PMD_SECT_S;
 			mem_types[MT_MEMORY_NONCACHED].prot_pte |= L_PTE_SHARED;
 		}
@@ -482,6 +489,7 @@ static void __init build_mem_type_table(void)
 	mem_types[MT_HIGH_VECTORS].prot_l1 |= ecc_mask;
 	mem_types[MT_MEMORY].prot_sect |= ecc_mask | cp->pmd;
 	mem_types[MT_MEMORY].prot_pte |= kern_pgprot;
+	mem_types[MT_MEMORY_DMA_READY].prot_pte |= kern_pgprot;
 	mem_types[MT_MEMORY_NONCACHED].prot_sect |= ecc_mask;
 	mem_types[MT_ROM].prot_sect |= cp->pmd;
 
@@ -561,7 +569,7 @@ static void __init alloc_init_section(pud_t *pud, unsigned long addr,
 	 * L1 entries, whereas PGDs refer to a group of L1 entries making
 	 * up one logical pointer to an L2 table.
 	 */
-	if (((addr | end | phys) & ~SECTION_MASK) == 0) {
+	if (type->prot_sect && ((addr | end | phys) & ~SECTION_MASK) == 0) {
 		pmd_t *p = pmd;
 
 		if (addr & SECTION_SIZE)
@@ -757,7 +765,7 @@ static int __init early_vmalloc(char *arg)
 }
 early_param("vmalloc", early_vmalloc);
 
-static phys_addr_t lowmem_limit __initdata = 0;
+phys_addr_t arm_lowmem_limit __initdata = 0;
 
 void __init sanity_check_meminfo(void)
 {
@@ -826,8 +834,8 @@ void __init sanity_check_meminfo(void)
 			bank->size = newsize;
 		}
 #endif
-		if (!bank->highmem && bank->start + bank->size > lowmem_limit)
-			lowmem_limit = bank->start + bank->size;
+		if (!bank->highmem && bank->start + bank->size > arm_lowmem_limit)
+			arm_lowmem_limit = bank->start + bank->size;
 
 		j++;
 	}
@@ -852,7 +860,7 @@ void __init sanity_check_meminfo(void)
 	}
 #endif
 	meminfo.nr_banks = j;
-	memblock_set_current_limit(lowmem_limit);
+	memblock_set_current_limit(arm_lowmem_limit);
 }
 
 static inline void prepare_page_table(void)
@@ -877,8 +885,8 @@ static inline void prepare_page_table(void)
 	 * Find the end of the first block of lowmem.
 	 */
 	end = memblock.memory.regions[0].base + memblock.memory.regions[0].size;
-	if (end >= lowmem_limit)
-		end = lowmem_limit;
+	if (end >= arm_lowmem_limit)
+		end = arm_lowmem_limit;
 
 	/*
 	 * Clear out all the kernel space mappings, except for the first
@@ -1012,8 +1020,8 @@ static void __init map_lowmem(void)
 		phys_addr_t end = start + reg->size;
 		struct map_desc map;
 
-		if (end > lowmem_limit)
-			end = lowmem_limit;
+		if (end > arm_lowmem_limit)
+			end = arm_lowmem_limit;
 		if (start >= end)
 			break;
 
@@ -1034,11 +1042,12 @@ void __init paging_init(struct machine_desc *mdesc)
 {
 	void *zero_page;
 
-	memblock_set_current_limit(lowmem_limit);
+	memblock_set_current_limit(arm_lowmem_limit);
 
 	build_mem_type_table();
 	prepare_page_table();
 	map_lowmem();
+	dma_contiguous_remap();
 	devicemaps_init(mdesc);
 	kmap_init();
 
-- 
1.7.1.569.g6f426

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 7/9] X86: integrate CMA with DMA-mapping subsystem
  2011-10-06 13:54 ` Marek Szyprowski
  (?)
@ 2011-10-06 13:54   ` Marek Szyprowski
  -1 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-06 13:54 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Shariq Hasnain, Chunsang Jeong, Dave Hansen

This patch adds support for CMA to the DMA-mapping subsystem on x86 for
configurations that use the common pci-dma/pci-nommu implementation.
This allows CMA to be tested on KVM/QEMU and on many common x86 boxes.
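
No driver changes are needed to benefit from this. On configurations
using the pci-dma/pci-nommu path, an ordinary coherent allocation like
the sketch below (device and buffer size are illustrative) is attempted
from the global CMA area whenever GFP_ATOMIC is not set, with
alloc_pages_node() as the fallback.

	/* hypothetical fragment of a driver's probe routine */
	dma_addr_t dma_handle;
	void *buf;

	buf = dma_alloc_coherent(&pdev->dev, 4 * 1024 * 1024,
				 &dma_handle, GFP_KERNEL);
	if (!buf)
		return -ENOMEM;

	/* ... program the device with dma_handle, do the DMA ... */

	dma_free_coherent(&pdev->dev, 4 * 1024 * 1024, buf, dma_handle);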

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 arch/x86/Kconfig                      |    1 +
 arch/x86/include/asm/dma-contiguous.h |   13 +++++++++++++
 arch/x86/include/asm/dma-mapping.h    |    4 ++++
 arch/x86/kernel/pci-dma.c             |   18 ++++++++++++++++--
 arch/x86/kernel/pci-nommu.c           |    8 +-------
 arch/x86/kernel/setup.c               |    2 ++
 6 files changed, 37 insertions(+), 9 deletions(-)
 create mode 100644 arch/x86/include/asm/dma-contiguous.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 82830ef..b120c80 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -29,6 +29,7 @@ config X86
 	select ARCH_WANT_OPTIONAL_GPIOLIB
 	select ARCH_WANT_FRAME_POINTERS
 	select HAVE_DMA_ATTRS
+	select HAVE_DMA_CONTIGUOUS if !SWIOTLB
 	select HAVE_KRETPROBES
 	select HAVE_OPTPROBES
 	select HAVE_FTRACE_MCOUNT_RECORD
diff --git a/arch/x86/include/asm/dma-contiguous.h b/arch/x86/include/asm/dma-contiguous.h
new file mode 100644
index 0000000..8fb117d
--- /dev/null
+++ b/arch/x86/include/asm/dma-contiguous.h
@@ -0,0 +1,13 @@
+#ifndef ASMX86_DMA_CONTIGUOUS_H
+#define ASMX86_DMA_CONTIGUOUS_H
+
+#ifdef __KERNEL__
+
+#include <linux/device.h>
+#include <linux/dma-contiguous.h>
+#include <asm-generic/dma-contiguous.h>
+
+static inline void dma_contiguous_early_fixup(phys_addr_t base, unsigned long size) { }
+
+#endif
+#endif
diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
index ed3065f..90ac6f0 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -13,6 +13,7 @@
 #include <asm/io.h>
 #include <asm/swiotlb.h>
 #include <asm-generic/dma-coherent.h>
+#include <linux/dma-contiguous.h>
 
 #ifdef CONFIG_ISA
 # define ISA_DMA_BIT_MASK DMA_BIT_MASK(24)
@@ -61,6 +62,9 @@ extern int dma_set_mask(struct device *dev, u64 mask);
 extern void *dma_generic_alloc_coherent(struct device *dev, size_t size,
 					dma_addr_t *dma_addr, gfp_t flag);
 
+extern void dma_generic_free_coherent(struct device *dev, size_t size,
+				      void *vaddr, dma_addr_t dma_addr);
+
 static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)
 {
 	if (!dev->dma_mask)
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 80dc793..f4abafc 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -90,14 +90,18 @@ void *dma_generic_alloc_coherent(struct device *dev, size_t size,
 				 dma_addr_t *dma_addr, gfp_t flag)
 {
 	unsigned long dma_mask;
-	struct page *page;
+	struct page *page = NULL;
+	unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
 	dma_addr_t addr;
 
 	dma_mask = dma_alloc_coherent_mask(dev, flag);
 
 	flag |= __GFP_ZERO;
 again:
-	page = alloc_pages_node(dev_to_node(dev), flag, get_order(size));
+	if (!(flag & GFP_ATOMIC))
+		page = dma_alloc_from_contiguous(dev, count, get_order(size));
+	if (!page)
+		page = alloc_pages_node(dev_to_node(dev), flag, get_order(size));
 	if (!page)
 		return NULL;
 
@@ -117,6 +121,16 @@ again:
 	return page_address(page);
 }
 
+void dma_generic_free_coherent(struct device *dev, size_t size, void *vaddr,
+			       dma_addr_t dma_addr)
+{
+	unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+	struct page *page = virt_to_page(vaddr);
+
+	if (!dma_release_from_contiguous(dev, page, count))
+		free_pages((unsigned long)vaddr, get_order(size));
+}
+
 /*
  * See <Documentation/x86/x86_64/boot-options.txt> for the iommu kernel
  * parameter documentation.
diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
index 3af4af8..656566f 100644
--- a/arch/x86/kernel/pci-nommu.c
+++ b/arch/x86/kernel/pci-nommu.c
@@ -74,12 +74,6 @@ static int nommu_map_sg(struct device *hwdev, struct scatterlist *sg,
 	return nents;
 }
 
-static void nommu_free_coherent(struct device *dev, size_t size, void *vaddr,
-				dma_addr_t dma_addr)
-{
-	free_pages((unsigned long)vaddr, get_order(size));
-}
-
 static void nommu_sync_single_for_device(struct device *dev,
 			dma_addr_t addr, size_t size,
 			enum dma_data_direction dir)
@@ -97,7 +91,7 @@ static void nommu_sync_sg_for_device(struct device *dev,
 
 struct dma_map_ops nommu_dma_ops = {
 	.alloc_coherent		= dma_generic_alloc_coherent,
-	.free_coherent		= nommu_free_coherent,
+	.free_coherent		= dma_generic_free_coherent,
 	.map_sg			= nommu_map_sg,
 	.map_page		= nommu_map_page,
 	.sync_single_for_device = nommu_sync_single_for_device,
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index afaf384..6c9efb5 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -50,6 +50,7 @@
 #include <asm/pci-direct.h>
 #include <linux/init_ohci1394_dma.h>
 #include <linux/kvm_para.h>
+#include <linux/dma-contiguous.h>
 
 #include <linux/errno.h>
 #include <linux/kernel.h>
@@ -944,6 +945,7 @@ void __init setup_arch(char **cmdline_p)
 	}
 #endif
 	memblock.current_limit = get_max_mapped();
+	dma_contiguous_reserve(0);
 
 	/*
 	 * NOTE: On x86-32, only from this point on, fixmaps are ready for use.
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 8/9] ARM: integrate CMA with DMA-mapping subsystem
  2011-10-06 13:54 ` Marek Szyprowski
  (?)
@ 2011-10-06 13:54   ` Marek Szyprowski
  -1 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-06 13:54 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Shariq Hasnain, Chunsang Jeong, Dave Hansen

This patch adds support for CMA to the DMA-mapping subsystem for the
ARM architecture. By default a global CMA area is used, but specific
devices may have their own private memory areas if required (these can
be created with the dma_declare_contiguous() function during board
initialization).

Contiguous memory areas reserved for DMA are remapped with 2-level page
tables on boot. Once a buffer is requested, the low-memory kernel
mapping is updated to match the requested memory access type.

GFP_ATOMIC allocations are served from a special pool that is created
early during boot, so page attributes do not need to be remapped at
allocation time.
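
A quick way to verify the atomic pool on a test board is the message
printed by coherent_init() during boot; with the default
DEFAULT_CONSISTENT_DMA_SIZE of 2 MiB the expected line is roughly:

	DMA: preallocated 256 KiB pool for atomic coherent allocations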

CMA has been enabled unconditionally for ARMv6+ systems.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 arch/arm/Kconfig                      |    2 +
 arch/arm/include/asm/dma-contiguous.h |   16 ++
 arch/arm/include/asm/mach/map.h       |    1 +
 arch/arm/mm/dma-mapping.c             |  362 +++++++++++++++++++++++++++------
 arch/arm/mm/init.c                    |    8 +
 arch/arm/mm/mm.h                      |    3 +
 arch/arm/mm/mmu.c                     |   29 ++-
 7 files changed, 346 insertions(+), 75 deletions(-)
 create mode 100644 arch/arm/include/asm/dma-contiguous.h

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 92870e3..d0abf54 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -3,6 +3,8 @@ config ARM
 	default y
 	select HAVE_AOUT
 	select HAVE_DMA_API_DEBUG
+	select HAVE_DMA_CONTIGUOUS if (CPU_V6 || CPU_V6K || CPU_V7)
+	select CMA if (CPU_V6 || CPU_V6K || CPU_V7)
 	select HAVE_IDE
 	select HAVE_MEMBLOCK
 	select RTC_LIB
diff --git a/arch/arm/include/asm/dma-contiguous.h b/arch/arm/include/asm/dma-contiguous.h
new file mode 100644
index 0000000..c7ba05e
--- /dev/null
+++ b/arch/arm/include/asm/dma-contiguous.h
@@ -0,0 +1,16 @@
+#ifndef ASMARM_DMA_CONTIGUOUS_H
+#define ASMARM_DMA_CONTIGUOUS_H
+
+#ifdef __KERNEL__
+
+#include <linux/device.h>
+#include <linux/dma-contiguous.h>
+#include <asm-generic/dma-contiguous.h>
+
+#ifdef CONFIG_CMA
+
+void dma_contiguous_early_fixup(phys_addr_t base, unsigned long size);
+
+#endif
+#endif
+#endif
diff --git a/arch/arm/include/asm/mach/map.h b/arch/arm/include/asm/mach/map.h
index b36f365..a6efcdd 100644
--- a/arch/arm/include/asm/mach/map.h
+++ b/arch/arm/include/asm/mach/map.h
@@ -30,6 +30,7 @@ struct map_desc {
 #define MT_MEMORY_DTCM		12
 #define MT_MEMORY_ITCM		13
 #define MT_MEMORY_SO		14
+#define MT_MEMORY_DMA_READY	15
 
 #ifdef CONFIG_MMU
 extern void iotable_init(struct map_desc *, int);
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index e4e7f6c..879a658 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -17,7 +17,9 @@
 #include <linux/init.h>
 #include <linux/device.h>
 #include <linux/dma-mapping.h>
+#include <linux/dma-contiguous.h>
 #include <linux/highmem.h>
+#include <linux/memblock.h>
 #include <linux/slab.h>
 
 #include <asm/memory.h>
@@ -26,6 +28,8 @@
 #include <asm/tlbflush.h>
 #include <asm/sizes.h>
 #include <asm/mach/arch.h>
+#include <asm/mach/map.h>
+#include <asm/dma-contiguous.h>
 
 #include "mm.h"
 
@@ -56,6 +60,19 @@ static u64 get_coherent_dma_mask(struct device *dev)
 	return mask;
 }
 
+static void __dma_clear_buffer(struct page *page, size_t size)
+{
+	void *ptr;
+	/*
+	 * Ensure that the allocated pages are zeroed, and that any data
+	 * lurking in the kernel direct-mapped region is invalidated.
+	 */
+	ptr = page_address(page);
+	memset(ptr, 0, size);
+	dmac_flush_range(ptr, ptr + size);
+	outer_flush_range(__pa(ptr), __pa(ptr) + size);
+}
+
 /*
  * Allocate a DMA buffer for 'dev' of size 'size' using the
  * specified gfp mask.  Note that 'size' must be page aligned.
@@ -64,23 +81,6 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
 {
 	unsigned long order = get_order(size);
 	struct page *page, *p, *e;
-	void *ptr;
-	u64 mask = get_coherent_dma_mask(dev);
-
-#ifdef CONFIG_DMA_API_DEBUG
-	u64 limit = (mask + 1) & ~mask;
-	if (limit && size >= limit) {
-		dev_warn(dev, "coherent allocation too big (requested %#x mask %#llx)\n",
-			size, mask);
-		return NULL;
-	}
-#endif
-
-	if (!mask)
-		return NULL;
-
-	if (mask < 0xffffffffULL)
-		gfp |= GFP_DMA;
 
 	page = alloc_pages(gfp, order);
 	if (!page)
@@ -93,14 +93,7 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
 	for (p = page + (size >> PAGE_SHIFT), e = page + (1 << order); p < e; p++)
 		__free_page(p);
 
-	/*
-	 * Ensure that the allocated pages are zeroed, and that any data
-	 * lurking in the kernel direct-mapped region is invalidated.
-	 */
-	ptr = page_address(page);
-	memset(ptr, 0, size);
-	dmac_flush_range(ptr, ptr + size);
-	outer_flush_range(__pa(ptr), __pa(ptr) + size);
+	__dma_clear_buffer(page, size);
 
 	return page;
 }
@@ -170,6 +163,9 @@ static int __init consistent_init(void)
 	unsigned long base = consistent_base;
 	unsigned long num_ptes = (CONSISTENT_END - base) >> PGDIR_SHIFT;
 
+	if (cpu_architecture() >= CPU_ARCH_ARMv6)
+		return 0;
+
 	consistent_pte = kmalloc(num_ptes * sizeof(pte_t), GFP_KERNEL);
 	if (!consistent_pte) {
 		pr_err("%s: no memory\n", __func__);
@@ -210,9 +206,102 @@ static int __init consistent_init(void)
 
 	return ret;
 }
-
 core_initcall(consistent_init);
 
+static void *__alloc_from_contiguous(struct device *dev, size_t size,
+				     pgprot_t prot, struct page **ret_page);
+
+static struct arm_vmregion_head coherent_head = {
+	.vm_lock	= __SPIN_LOCK_UNLOCKED(&coherent_head.vm_lock),
+	.vm_list	= LIST_HEAD_INIT(coherent_head.vm_list),
+};
+
+size_t coherent_pool_size = DEFAULT_CONSISTENT_DMA_SIZE / 8;
+
+static int __init early_coherent_pool(char *p)
+{
+	coherent_pool_size = memparse(p, &p);
+	return 0;
+}
+early_param("coherent_pool", early_coherent_pool);
+
+/*
+ * Initialise the coherent pool for atomic allocations.
+ */
+static int __init coherent_init(void)
+{
+	pgprot_t prot = pgprot_dmacoherent(pgprot_kernel);
+	size_t size = coherent_pool_size;
+	struct page *page;
+	void *ptr;
+
+	if (cpu_architecture() < CPU_ARCH_ARMv6)
+		return 0;
+
+	ptr = __alloc_from_contiguous(NULL, size, prot, &page);
+	if (ptr) {
+		coherent_head.vm_start = (unsigned long) ptr;
+		coherent_head.vm_end = (unsigned long) ptr + size;
+		printk(KERN_INFO "DMA: preallocated %u KiB pool for atomic coherent allocations\n",
+		       (unsigned)size / 1024);
+		return 0;
+	}
+	printk(KERN_ERR "DMA: failed to allocate %u KiB pool for atomic coherent allocation\n",
+	       (unsigned)size / 1024);
+	return -ENOMEM;
+}
+/*
+ * CMA is activated by core_initcall, so we must be called after it
+ */
+postcore_initcall(coherent_init);
+
+struct dma_contiguous_early_reserve {
+	phys_addr_t base;
+	unsigned long size;
+};
+
+static struct dma_contiguous_early_reserve
+dma_mmu_remap[MAX_CMA_AREAS] __initdata;
+
+static int dma_mmu_remap_num __initdata;
+
+void __init dma_contiguous_early_fixup(phys_addr_t base, unsigned long size)
+{
+	dma_mmu_remap[dma_mmu_remap_num].base = base;
+	dma_mmu_remap[dma_mmu_remap_num].size = size;
+	dma_mmu_remap_num++;
+}
+
+void __init dma_contiguous_remap(void)
+{
+	int i;
+	for (i = 0; i < dma_mmu_remap_num; i++) {
+		phys_addr_t start = dma_mmu_remap[i].base;
+		phys_addr_t end = start + dma_mmu_remap[i].size;
+		struct map_desc map;
+		unsigned long addr;
+
+		if (end > arm_lowmem_limit)
+			end = arm_lowmem_limit;
+		if (start >= end)
+			return;
+
+		map.pfn = __phys_to_pfn(start);
+		map.virtual = __phys_to_virt(start);
+		map.length = end - start;
+		map.type = MT_MEMORY_DMA_READY;
+
+		/*
+		 * Clear previous low-memory mapping
+		 */
+		for (addr = __phys_to_virt(start); addr < __phys_to_virt(end);
+		     addr += PGDIR_SIZE)
+			pmd_clear(pmd_off_k(addr));
+
+		iotable_init(&map, 1);
+	}
+}
+
 static void *
 __dma_alloc_remap(struct page *page, size_t size, gfp_t gfp, pgprot_t prot)
 {
@@ -318,31 +407,178 @@ static void __dma_free_remap(void *cpu_addr, size_t size)
 	arm_vmregion_free(&consistent_head, c);
 }
 
+static int __dma_update_pte(pte_t *pte, pgtable_t token, unsigned long addr,
+			    void *data)
+{
+	struct page *page = virt_to_page(addr);
+	pgprot_t prot = *(pgprot_t *)data;
+
+	set_pte_ext(pte, mk_pte(page, prot), 0);
+	return 0;
+}
+
+static void __dma_remap(struct page *page, size_t size, pgprot_t prot)
+{
+	unsigned long start = (unsigned long) page_address(page);
+	unsigned end = start + size;
+
+	apply_to_page_range(&init_mm, start, size, __dma_update_pte, &prot);
+	dsb();
+	flush_tlb_kernel_range(start, end);
+}
+
+static void *__alloc_remap_buffer(struct device *dev, size_t size, gfp_t gfp,
+				 pgprot_t prot, struct page **ret_page)
+{
+	struct page *page;
+	void *ptr;
+	page = __dma_alloc_buffer(dev, size, gfp);
+	if (!page)
+		return NULL;
+
+	ptr = __dma_alloc_remap(page, size, gfp, prot);
+	if (!ptr) {
+		__dma_free_buffer(page, size);
+		return NULL;
+	}
+
+	*ret_page = page;
+	return ptr;
+}
+
+static void *__alloc_from_pool(struct device *dev, size_t size,
+			       struct page **ret_page)
+{
+	struct arm_vmregion *c;
+	size_t align;
+
+	if (!coherent_head.vm_start) {
+		printk(KERN_ERR "%s: coherent pool not initialised!\n",
+		       __func__);
+		dump_stack();
+		return NULL;
+	}
+
+	align = 1 << fls(size - 1);
+	c = arm_vmregion_alloc(&coherent_head, align, size, 0);
+	if (c) {
+		void *ptr = (void *)c->vm_start;
+		struct page *page = virt_to_page(ptr);
+		*ret_page = page;
+		return ptr;
+	}
+	return NULL;
+}
+
+static int __free_from_pool(void *cpu_addr, size_t size)
+{
+	unsigned long start = (unsigned long)cpu_addr;
+	unsigned long end = start + size;
+	struct arm_vmregion *c;
+
+	if (start < coherent_head.vm_start || end > coherent_head.vm_end)
+		return 0;
+
+	c = arm_vmregion_find_remove(&coherent_head, (unsigned long)start);
+
+	if ((c->vm_end - c->vm_start) != size) {
+		printk(KERN_ERR "%s: freeing wrong coherent size (%ld != %d)\n",
+		       __func__, c->vm_end - c->vm_start, size);
+		dump_stack();
+		size = c->vm_end - c->vm_start;
+	}
+
+	arm_vmregion_free(&coherent_head, c);
+	return 1;
+}
+
+static void *__alloc_from_contiguous(struct device *dev, size_t size,
+				     pgprot_t prot, struct page **ret_page)
+{
+	unsigned long order = get_order(size);
+	size_t count = size >> PAGE_SHIFT;
+	struct page *page;
+
+	page = dma_alloc_from_contiguous(dev, count, order);
+	if (!page)
+		return NULL;
+
+	__dma_clear_buffer(page, size);
+	__dma_remap(page, size, prot);
+
+	*ret_page = page;
+	return page_address(page);
+}
+
+static void __free_from_contiguous(struct device *dev, struct page *page,
+				   size_t size)
+{
+	__dma_remap(page, size, pgprot_kernel);
+	dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT);
+}
+
+#define nommu() 0
+
 #else	/* !CONFIG_MMU */
 
-#define __dma_alloc_remap(page, size, gfp, prot)	page_address(page)
-#define __dma_free_remap(addr, size)			do { } while (0)
+#define nommu() 1
+
+#define __alloc_remap_buffer(dev, size, gfp, prot, ret)	NULL
+#define __alloc_from_pool(dev, size, ret_page)		NULL
+#define __alloc_from_contiguous(dev, size, prot, ret)	NULL
+#define __free_from_pool(cpu_addr, size)		0
+#define __free_from_contiguous(dev, page, size)		do { } while (0)
+#define __dma_free_remap(cpu_addr, size)		do { } while (0)
 
 #endif	/* CONFIG_MMU */
 
-static void *
-__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp,
-	    pgprot_t prot)
+static void *__alloc_simple_buffer(struct device *dev, size_t size, gfp_t gfp,
+				   struct page **ret_page)
 {
 	struct page *page;
+	page = __dma_alloc_buffer(dev, size, gfp);
+	if (!page)
+		return NULL;
+
+	*ret_page = page;
+	return page_address(page);
+}
+
+
+
+static void *__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
+			 gfp_t gfp, pgprot_t prot)
+{
+	u64 mask = get_coherent_dma_mask(dev);
+	struct page *page = NULL;
 	void *addr;
 
-	*handle = ~0;
-	size = PAGE_ALIGN(size);
+#ifdef CONFIG_DMA_API_DEBUG
+	u64 limit = (mask + 1) & ~mask;
+	if (limit && size >= limit) {
+		dev_warn(dev, "coherent allocation too big (requested %#x mask %#llx)\n",
+			size, mask);
+		return NULL;
+	}
+#endif
 
-	page = __dma_alloc_buffer(dev, size, gfp);
-	if (!page)
+	if (!mask)
 		return NULL;
 
-	if (!arch_is_coherent())
-		addr = __dma_alloc_remap(page, size, gfp, prot);
+	if (mask < 0xffffffffULL)
+		gfp |= GFP_DMA;
+
+	*handle = ~0;
+	size = PAGE_ALIGN(size);
+
+	if (arch_is_coherent() || nommu())
+		addr = __alloc_simple_buffer(dev, size, gfp, &page);
+	else if (cpu_architecture() < CPU_ARCH_ARMv6)
+		addr = __alloc_remap_buffer(dev, size, gfp, prot, &page);
+	else if (gfp & GFP_ATOMIC)
+		addr = __alloc_from_pool(dev, size, &page);
 	else
-		addr = page_address(page);
+		addr = __alloc_from_contiguous(dev, size, prot, &page);
 
 	if (addr)
 		*handle = pfn_to_dma(dev, page_to_pfn(page));
@@ -356,8 +592,8 @@ __dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp,
  * Allocate DMA-coherent memory space and return both the kernel remapped
  * virtual and bus address for that space.
  */
-void *
-dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp)
+void *dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle,
+			 gfp_t gfp)
 {
 	void *memory;
 
@@ -386,25 +622,11 @@ static int dma_mmap(struct device *dev, struct vm_area_struct *vma,
 {
 	int ret = -ENXIO;
 #ifdef CONFIG_MMU
-	unsigned long user_size, kern_size;
-	struct arm_vmregion *c;
-
-	user_size = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
-
-	c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr);
-	if (c) {
-		unsigned long off = vma->vm_pgoff;
-
-		kern_size = (c->vm_end - c->vm_start) >> PAGE_SHIFT;
-
-		if (off < kern_size &&
-		    user_size <= (kern_size - off)) {
-			ret = remap_pfn_range(vma, vma->vm_start,
-					      page_to_pfn(c->vm_pages) + off,
-					      user_size << PAGE_SHIFT,
-					      vma->vm_page_prot);
-		}
-	}
+	unsigned long pfn = dma_to_pfn(dev, dma_addr);
+	ret = remap_pfn_range(vma, vma->vm_start,
+			      pfn + vma->vm_pgoff,
+			      vma->vm_end - vma->vm_start,
+			      vma->vm_page_prot);
 #endif	/* CONFIG_MMU */
 
 	return ret;
@@ -426,23 +648,33 @@ int dma_mmap_writecombine(struct device *dev, struct vm_area_struct *vma,
 }
 EXPORT_SYMBOL(dma_mmap_writecombine);
 
+
 /*
- * free a page as defined by the above mapping.
- * Must not be called with IRQs disabled.
+ * Free a buffer as defined by the above mapping.
  */
 void dma_free_coherent(struct device *dev, size_t size, void *cpu_addr, dma_addr_t handle)
 {
-	WARN_ON(irqs_disabled());
+	struct page *page = pfn_to_page(dma_to_pfn(dev, handle));
 
 	if (dma_release_from_coherent(dev, get_order(size), cpu_addr))
 		return;
 
 	size = PAGE_ALIGN(size);
 
-	if (!arch_is_coherent())
+	if (arch_is_coherent() || nommu()) {
+		__dma_free_buffer(page, size);
+	} else if (cpu_architecture() < CPU_ARCH_ARMv6) {
 		__dma_free_remap(cpu_addr, size);
-
-	__dma_free_buffer(pfn_to_page(dma_to_pfn(dev, handle)), size);
+		__dma_free_buffer(page, size);
+	} else {
+		if (__free_from_pool(cpu_addr, size))
+			return;
+		/*
+		 * Non-atomic allocations cannot be freed with IRQs disabled
+		 */
+		WARN_ON(irqs_disabled());
+		__free_from_contiguous(dev, page, size);
+	}
 }
 EXPORT_SYMBOL(dma_free_coherent);
 
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index fbdd12e..9c27fbd 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -21,6 +21,7 @@
 #include <linux/gfp.h>
 #include <linux/memblock.h>
 #include <linux/sort.h>
+#include <linux/dma-contiguous.h>
 
 #include <asm/mach-types.h>
 #include <asm/prom.h>
@@ -371,6 +372,13 @@ void __init arm_memblock_init(struct meminfo *mi, struct machine_desc *mdesc)
 	if (mdesc->reserve)
 		mdesc->reserve();
 
+	/* reserve memory for DMA contiguous allocations */
+#ifdef CONFIG_ZONE_DMA
+	dma_contiguous_reserve(PHYS_OFFSET + mdesc->dma_zone_size - 1);
+#else
+	dma_contiguous_reserve(0);
+#endif
+
 	memblock_analyze();
 	memblock_dump_all();
 }
diff --git a/arch/arm/mm/mm.h b/arch/arm/mm/mm.h
index ad7cce3..fa95d9b 100644
--- a/arch/arm/mm/mm.h
+++ b/arch/arm/mm/mm.h
@@ -29,5 +29,8 @@ extern u32 arm_dma_limit;
 #define arm_dma_limit ((u32)~0)
 #endif
 
+extern phys_addr_t arm_lowmem_limit;
+
 void __init bootmem_init(void);
 void arm_mm_memblock_reserve(void);
+void dma_contiguous_remap(void);
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 5a51cc5..ad8e9fc 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -285,6 +285,11 @@ static struct mem_type mem_types[] = {
 				PMD_SECT_UNCACHED | PMD_SECT_XN,
 		.domain    = DOMAIN_KERNEL,
 	},
+	[MT_MEMORY_DMA_READY] = {
+		.prot_pte  = L_PTE_PRESENT | L_PTE_YOUNG | L_PTE_DIRTY,
+		.prot_l1   = PMD_TYPE_TABLE,
+		.domain    = DOMAIN_KERNEL,
+	},
 };
 
 const struct mem_type *get_mem_type(unsigned int type)
@@ -426,6 +431,7 @@ static void __init build_mem_type_table(void)
 	if (arch_is_coherent() && cpu_is_xsc3()) {
 		mem_types[MT_MEMORY].prot_sect |= PMD_SECT_S;
 		mem_types[MT_MEMORY].prot_pte |= L_PTE_SHARED;
+		mem_types[MT_MEMORY_DMA_READY].prot_pte |= L_PTE_SHARED;
 		mem_types[MT_MEMORY_NONCACHED].prot_sect |= PMD_SECT_S;
 		mem_types[MT_MEMORY_NONCACHED].prot_pte |= L_PTE_SHARED;
 	}
@@ -457,6 +463,7 @@ static void __init build_mem_type_table(void)
 			mem_types[MT_DEVICE_CACHED].prot_pte |= L_PTE_SHARED;
 			mem_types[MT_MEMORY].prot_sect |= PMD_SECT_S;
 			mem_types[MT_MEMORY].prot_pte |= L_PTE_SHARED;
+			mem_types[MT_MEMORY_DMA_READY].prot_pte |= L_PTE_SHARED;
 			mem_types[MT_MEMORY_NONCACHED].prot_sect |= PMD_SECT_S;
 			mem_types[MT_MEMORY_NONCACHED].prot_pte |= L_PTE_SHARED;
 		}
@@ -508,6 +515,7 @@ static void __init build_mem_type_table(void)
 	mem_types[MT_HIGH_VECTORS].prot_l1 |= ecc_mask;
 	mem_types[MT_MEMORY].prot_sect |= ecc_mask | cp->pmd;
 	mem_types[MT_MEMORY].prot_pte |= kern_pgprot;
+	mem_types[MT_MEMORY_DMA_READY].prot_pte |= kern_pgprot;
 	mem_types[MT_MEMORY_NONCACHED].prot_sect |= ecc_mask;
 	mem_types[MT_ROM].prot_sect |= cp->pmd;
 
@@ -587,7 +595,7 @@ static void __init alloc_init_section(pud_t *pud, unsigned long addr,
 	 * L1 entries, whereas PGDs refer to a group of L1 entries making
 	 * up one logical pointer to an L2 table.
 	 */
-	if (((addr | end | phys) & ~SECTION_MASK) == 0) {
+	if (type->prot_sect && ((addr | end | phys) & ~SECTION_MASK) == 0) {
 		pmd_t *p = pmd;
 
 #ifndef CONFIG_ARM_LPAE
@@ -789,7 +797,7 @@ static int __init early_vmalloc(char *arg)
 }
 early_param("vmalloc", early_vmalloc);
 
-static phys_addr_t lowmem_limit __initdata = 0;
+phys_addr_t arm_lowmem_limit __initdata = 0;
 
 void __init sanity_check_meminfo(void)
 {
@@ -859,8 +867,8 @@ void __init sanity_check_meminfo(void)
 			bank->size = newsize;
 		}
 #endif
-		if (!bank->highmem && bank->start + bank->size > lowmem_limit)
-			lowmem_limit = bank->start + bank->size;
+		if (!bank->highmem && bank->start + bank->size > arm_lowmem_limit)
+			arm_lowmem_limit = bank->start + bank->size;
 
 		j++;
 	}
@@ -885,7 +893,7 @@ void __init sanity_check_meminfo(void)
 	}
 #endif
 	meminfo.nr_banks = j;
-	memblock_set_current_limit(lowmem_limit);
+	memblock_set_current_limit(arm_lowmem_limit);
 }
 
 static inline void prepare_page_table(void)
@@ -910,8 +918,8 @@ static inline void prepare_page_table(void)
 	 * Find the end of the first block of lowmem.
 	 */
 	end = memblock.memory.regions[0].base + memblock.memory.regions[0].size;
-	if (end >= lowmem_limit)
-		end = lowmem_limit;
+	if (end >= arm_lowmem_limit)
+		end = arm_lowmem_limit;
 
 	/*
 	 * Clear out all the kernel space mappings, except for the first
@@ -1051,8 +1059,8 @@ static void __init map_lowmem(void)
 		phys_addr_t end = start + reg->size;
 		struct map_desc map;
 
-		if (end > lowmem_limit)
-			end = lowmem_limit;
+		if (end > arm_lowmem_limit)
+			end = arm_lowmem_limit;
 		if (start >= end)
 			break;
 
@@ -1073,11 +1081,12 @@ void __init paging_init(struct machine_desc *mdesc)
 {
 	void *zero_page;
 
-	memblock_set_current_limit(lowmem_limit);
+	memblock_set_current_limit(arm_lowmem_limit);
 
 	build_mem_type_table();
 	prepare_page_table();
 	map_lowmem();
+	dma_contiguous_remap();
 	devicemaps_init(mdesc);
 	kmap_init();
 
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 8/9] ARM: integrate CMA with DMA-mapping subsystem
@ 2011-10-06 13:54   ` Marek Szyprowski
  0 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-06 13:54 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Shariq Hasnain, Chunsang Jeong, Dave Hansen

This patch adds support for CMA to the dma-mapping subsystem for the
ARM architecture. By default a global CMA area is used, but specific
devices may have their own private memory areas if required (these can
be created with the dma_declare_contiguous() function during board
initialization).
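
For reference, a board file could declare such a private area roughly
like this (a hypothetical sketch: the "foo" device, the 16 MiB size and
the placement arguments are made up; only dma_declare_contiguous() and
its (dev, size, base, limit) ordering come from this series):

  #include <linux/dma-contiguous.h>

  static struct platform_device foo_device;       /* assumed to exist */

  static void __init foo_reserve(void)
  {
          /* 16 MiB, base 0 (let CMA place it), limit 0 (no limit) */
          if (dma_declare_contiguous(&foo_device.dev, 16 << 20, 0, 0))
                  pr_err("foo: failed to reserve private CMA area\n");
  }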

Contiguous memory areas reserved for DMA are remapped with 2-level page
tables on boot. Once a buffer is requested, the low memory kernel mapping
is updated to match the requested memory access type.

GFP_ATOMIC allocations are performed from a special pool which is
created early during boot. This way remapping of page attributes is not
needed at allocation time.
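
Seen from a driver, the path is chosen purely by the gfp flags; an
illustrative sketch (foo_dev and the sizes are invented):

  dma_addr_t dma;
  void *buf;

  /* may sleep: served from the CMA area, low-memory mapping updated */
  buf = dma_alloc_coherent(foo_dev, 1 << 20, &dma, GFP_KERNEL);

  /* atomic context: served from the preallocated coherent pool */
  buf = dma_alloc_coherent(foo_dev, 4096, &dma, GFP_ATOMIC);

The pool defaults to DEFAULT_CONSISTENT_DMA_SIZE / 8 and can be resized
with the coherent_pool= kernel parameter (see early_coherent_pool()
below).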

CMA has been enabled unconditionally for ARMv6+ systems.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 arch/arm/Kconfig                      |    2 +
 arch/arm/include/asm/dma-contiguous.h |   16 ++
 arch/arm/include/asm/mach/map.h       |    1 +
 arch/arm/mm/dma-mapping.c             |  362 +++++++++++++++++++++++++++------
 arch/arm/mm/init.c                    |    8 +
 arch/arm/mm/mm.h                      |    3 +
 arch/arm/mm/mmu.c                     |   29 ++-
 7 files changed, 346 insertions(+), 75 deletions(-)
 create mode 100644 arch/arm/include/asm/dma-contiguous.h

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 92870e3..d0abf54 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -3,6 +3,8 @@ config ARM
 	default y
 	select HAVE_AOUT
 	select HAVE_DMA_API_DEBUG
+	select HAVE_DMA_CONTIGUOUS if (CPU_V6 || CPU_V6K || CPU_V7)
+	select CMA if (CPU_V6 || CPU_V6K || CPU_V7)
 	select HAVE_IDE
 	select HAVE_MEMBLOCK
 	select RTC_LIB
diff --git a/arch/arm/include/asm/dma-contiguous.h b/arch/arm/include/asm/dma-contiguous.h
new file mode 100644
index 0000000..c7ba05e
--- /dev/null
+++ b/arch/arm/include/asm/dma-contiguous.h
@@ -0,0 +1,16 @@
+#ifndef ASMARM_DMA_CONTIGUOUS_H
+#define ASMARM_DMA_CONTIGUOUS_H
+
+#ifdef __KERNEL__
+
+#include <linux/device.h>
+#include <linux/dma-contiguous.h>
+#include <asm-generic/dma-contiguous.h>
+
+#ifdef CONFIG_CMA
+
+void dma_contiguous_early_fixup(phys_addr_t base, unsigned long size);
+
+#endif
+#endif
+#endif
diff --git a/arch/arm/include/asm/mach/map.h b/arch/arm/include/asm/mach/map.h
index b36f365..a6efcdd 100644
--- a/arch/arm/include/asm/mach/map.h
+++ b/arch/arm/include/asm/mach/map.h
@@ -30,6 +30,7 @@ struct map_desc {
 #define MT_MEMORY_DTCM		12
 #define MT_MEMORY_ITCM		13
 #define MT_MEMORY_SO		14
+#define MT_MEMORY_DMA_READY	15
 
 #ifdef CONFIG_MMU
 extern void iotable_init(struct map_desc *, int);
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index e4e7f6c..879a658 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -17,7 +17,9 @@
 #include <linux/init.h>
 #include <linux/device.h>
 #include <linux/dma-mapping.h>
+#include <linux/dma-contiguous.h>
 #include <linux/highmem.h>
+#include <linux/memblock.h>
 #include <linux/slab.h>
 
 #include <asm/memory.h>
@@ -26,6 +28,8 @@
 #include <asm/tlbflush.h>
 #include <asm/sizes.h>
 #include <asm/mach/arch.h>
+#include <asm/mach/map.h>
+#include <asm/dma-contiguous.h>
 
 #include "mm.h"
 
@@ -56,6 +60,19 @@ static u64 get_coherent_dma_mask(struct device *dev)
 	return mask;
 }
 
+static void __dma_clear_buffer(struct page *page, size_t size)
+{
+	void *ptr;
+	/*
+	 * Ensure that the allocated pages are zeroed, and that any data
+	 * lurking in the kernel direct-mapped region is invalidated.
+	 */
+	ptr = page_address(page);
+	memset(ptr, 0, size);
+	dmac_flush_range(ptr, ptr + size);
+	outer_flush_range(__pa(ptr), __pa(ptr) + size);
+}
+
 /*
  * Allocate a DMA buffer for 'dev' of size 'size' using the
  * specified gfp mask.  Note that 'size' must be page aligned.
@@ -64,23 +81,6 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
 {
 	unsigned long order = get_order(size);
 	struct page *page, *p, *e;
-	void *ptr;
-	u64 mask = get_coherent_dma_mask(dev);
-
-#ifdef CONFIG_DMA_API_DEBUG
-	u64 limit = (mask + 1) & ~mask;
-	if (limit && size >= limit) {
-		dev_warn(dev, "coherent allocation too big (requested %#x mask %#llx)\n",
-			size, mask);
-		return NULL;
-	}
-#endif
-
-	if (!mask)
-		return NULL;
-
-	if (mask < 0xffffffffULL)
-		gfp |= GFP_DMA;
 
 	page = alloc_pages(gfp, order);
 	if (!page)
@@ -93,14 +93,7 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
 	for (p = page + (size >> PAGE_SHIFT), e = page + (1 << order); p < e; p++)
 		__free_page(p);
 
-	/*
-	 * Ensure that the allocated pages are zeroed, and that any data
-	 * lurking in the kernel direct-mapped region is invalidated.
-	 */
-	ptr = page_address(page);
-	memset(ptr, 0, size);
-	dmac_flush_range(ptr, ptr + size);
-	outer_flush_range(__pa(ptr), __pa(ptr) + size);
+	__dma_clear_buffer(page, size);
 
 	return page;
 }
@@ -170,6 +163,9 @@ static int __init consistent_init(void)
 	unsigned long base = consistent_base;
 	unsigned long num_ptes = (CONSISTENT_END - base) >> PGDIR_SHIFT;
 
+	if (cpu_architecture() >= CPU_ARCH_ARMv6)
+		return 0;
+
 	consistent_pte = kmalloc(num_ptes * sizeof(pte_t), GFP_KERNEL);
 	if (!consistent_pte) {
 		pr_err("%s: no memory\n", __func__);
@@ -210,9 +206,102 @@ static int __init consistent_init(void)
 
 	return ret;
 }
-
 core_initcall(consistent_init);
 
+static void *__alloc_from_contiguous(struct device *dev, size_t size,
+				     pgprot_t prot, struct page **ret_page);
+
+static struct arm_vmregion_head coherent_head = {
+	.vm_lock	= __SPIN_LOCK_UNLOCKED(&coherent_head.vm_lock),
+	.vm_list	= LIST_HEAD_INIT(coherent_head.vm_list),
+};
+
+size_t coherent_pool_size = DEFAULT_CONSISTENT_DMA_SIZE / 8;
+
+static int __init early_coherent_pool(char *p)
+{
+	coherent_pool_size = memparse(p, &p);
+	return 0;
+}
+early_param("coherent_pool", early_coherent_pool);
+
+/*
+ * Initialise the coherent pool for atomic allocations.
+ */
+static int __init coherent_init(void)
+{
+	pgprot_t prot = pgprot_dmacoherent(pgprot_kernel);
+	size_t size = coherent_pool_size;
+	struct page *page;
+	void *ptr;
+
+	if (cpu_architecture() < CPU_ARCH_ARMv6)
+		return 0;
+
+	ptr = __alloc_from_contiguous(NULL, size, prot, &page);
+	if (ptr) {
+		coherent_head.vm_start = (unsigned long) ptr;
+		coherent_head.vm_end = (unsigned long) ptr + size;
+		printk(KERN_INFO "DMA: preallocated %u KiB pool for atomic coherent allocations\n",
+		       (unsigned)size / 1024);
+		return 0;
+	}
+	printk(KERN_ERR "DMA: failed to allocate %u KiB pool for atomic coherent allocation\n",
+	       (unsigned)size / 1024);
+	return -ENOMEM;
+}
+/*
+ * CMA is activated by core_initcall, so we must be called after it
+ */
+postcore_initcall(coherent_init);
+
+struct dma_contiguous_early_reserve {
+	phys_addr_t base;
+	unsigned long size;
+};
+
+static struct dma_contiguous_early_reserve
+dma_mmu_remap[MAX_CMA_AREAS] __initdata;
+
+static int dma_mmu_remap_num __initdata;
+
+void __init dma_contiguous_early_fixup(phys_addr_t base, unsigned long size)
+{
+	dma_mmu_remap[dma_mmu_remap_num].base = base;
+	dma_mmu_remap[dma_mmu_remap_num].size = size;
+	dma_mmu_remap_num++;
+}
+
+void __init dma_contiguous_remap(void)
+{
+	int i;
+	for (i = 0; i < dma_mmu_remap_num; i++) {
+		phys_addr_t start = dma_mmu_remap[i].base;
+		phys_addr_t end = start + dma_mmu_remap[i].size;
+		struct map_desc map;
+		unsigned long addr;
+
+		if (end > arm_lowmem_limit)
+			end = arm_lowmem_limit;
+		if (start >= end)
+			return;
+
+		map.pfn = __phys_to_pfn(start);
+		map.virtual = __phys_to_virt(start);
+		map.length = end - start;
+		map.type = MT_MEMORY_DMA_READY;
+
+		/*
+		 * Clear previous low-memory mapping
+		 */
+		for (addr = __phys_to_virt(start); addr < __phys_to_virt(end);
+		     addr += PGDIR_SIZE)
+			pmd_clear(pmd_off_k(addr));
+
+		iotable_init(&map, 1);
+	}
+}
+
 static void *
 __dma_alloc_remap(struct page *page, size_t size, gfp_t gfp, pgprot_t prot)
 {
@@ -318,31 +407,178 @@ static void __dma_free_remap(void *cpu_addr, size_t size)
 	arm_vmregion_free(&consistent_head, c);
 }
 
+static int __dma_update_pte(pte_t *pte, pgtable_t token, unsigned long addr,
+			    void *data)
+{
+	struct page *page = virt_to_page(addr);
+	pgprot_t prot = *(pgprot_t *)data;
+
+	set_pte_ext(pte, mk_pte(page, prot), 0);
+	return 0;
+}
+
+static void __dma_remap(struct page *page, size_t size, pgprot_t prot)
+{
+	unsigned long start = (unsigned long) page_address(page);
+	unsigned long end = start + size;
+
+	apply_to_page_range(&init_mm, start, size, __dma_update_pte, &prot);
+	dsb();
+	flush_tlb_kernel_range(start, end);
+}
+
+static void *__alloc_remap_buffer(struct device *dev, size_t size, gfp_t gfp,
+				 pgprot_t prot, struct page **ret_page)
+{
+	struct page *page;
+	void *ptr;
+	page = __dma_alloc_buffer(dev, size, gfp);
+	if (!page)
+		return NULL;
+
+	ptr = __dma_alloc_remap(page, size, gfp, prot);
+	if (!ptr) {
+		__dma_free_buffer(page, size);
+		return NULL;
+	}
+
+	*ret_page = page;
+	return ptr;
+}
+
+static void *__alloc_from_pool(struct device *dev, size_t size,
+			       struct page **ret_page)
+{
+	struct arm_vmregion *c;
+	size_t align;
+
+	if (!coherent_head.vm_start) {
+		printk(KERN_ERR "%s: coherent pool not initialised!\n",
+		       __func__);
+		dump_stack();
+		return NULL;
+	}
+
+	align = 1 << fls(size - 1);
+	c = arm_vmregion_alloc(&coherent_head, align, size, 0);
+	if (c) {
+		void *ptr = (void *)c->vm_start;
+		struct page *page = virt_to_page(ptr);
+		*ret_page = page;
+		return ptr;
+	}
+	return NULL;
+}
+
+static int __free_from_pool(void *cpu_addr, size_t size)
+{
+	unsigned long start = (unsigned long)cpu_addr;
+	unsigned long end = start + size;
+	struct arm_vmregion *c;
+
+	if (start < coherent_head.vm_start || end > coherent_head.vm_end)
+		return 0;
+
+	c = arm_vmregion_find_remove(&coherent_head, (unsigned long)start);
+
+	if ((c->vm_end - c->vm_start) != size) {
+		printk(KERN_ERR "%s: freeing wrong coherent size (%ld != %d)\n",
+		       __func__, c->vm_end - c->vm_start, size);
+		dump_stack();
+		size = c->vm_end - c->vm_start;
+	}
+
+	arm_vmregion_free(&coherent_head, c);
+	return 1;
+}
+
+static void *__alloc_from_contiguous(struct device *dev, size_t size,
+				     pgprot_t prot, struct page **ret_page)
+{
+	unsigned long order = get_order(size);
+	size_t count = size >> PAGE_SHIFT;
+	struct page *page;
+
+	page = dma_alloc_from_contiguous(dev, count, order);
+	if (!page)
+		return NULL;
+
+	__dma_clear_buffer(page, size);
+	__dma_remap(page, size, prot);
+
+	*ret_page = page;
+	return page_address(page);
+}
+
+static void __free_from_contiguous(struct device *dev, struct page *page,
+				   size_t size)
+{
+	__dma_remap(page, size, pgprot_kernel);
+	dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT);
+}
+
+#define nommu() 0
+
 #else	/* !CONFIG_MMU */
 
-#define __dma_alloc_remap(page, size, gfp, prot)	page_address(page)
-#define __dma_free_remap(addr, size)			do { } while (0)
+#define nommu() 1
+
+#define __alloc_remap_buffer(dev, size, gfp, prot, ret)	NULL
+#define __alloc_from_pool(dev, size, ret_page)		NULL
+#define __alloc_from_contiguous(dev, size, prot, ret)	NULL
+#define __free_from_pool(cpu_addr, size)		0
+#define __free_from_contiguous(dev, page, size)		do { } while (0)
+#define __dma_free_remap(cpu_addr, size)		do { } while (0)
 
 #endif	/* CONFIG_MMU */
 
-static void *
-__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp,
-	    pgprot_t prot)
+static void *__alloc_simple_buffer(struct device *dev, size_t size, gfp_t gfp,
+				   struct page **ret_page)
 {
 	struct page *page;
+	page = __dma_alloc_buffer(dev, size, gfp);
+	if (!page)
+		return NULL;
+
+	*ret_page = page;
+	return page_address(page);
+}
+
+
+
+static void *__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
+			 gfp_t gfp, pgprot_t prot)
+{
+	u64 mask = get_coherent_dma_mask(dev);
+	struct page *page = NULL;
 	void *addr;
 
-	*handle = ~0;
-	size = PAGE_ALIGN(size);
+#ifdef CONFIG_DMA_API_DEBUG
+	u64 limit = (mask + 1) & ~mask;
+	if (limit && size >= limit) {
+		dev_warn(dev, "coherent allocation too big (requested %#x mask %#llx)\n",
+			size, mask);
+		return NULL;
+	}
+#endif
 
-	page = __dma_alloc_buffer(dev, size, gfp);
-	if (!page)
+	if (!mask)
 		return NULL;
 
-	if (!arch_is_coherent())
-		addr = __dma_alloc_remap(page, size, gfp, prot);
+	if (mask < 0xffffffffULL)
+		gfp |= GFP_DMA;
+
+	*handle = ~0;
+	size = PAGE_ALIGN(size);
+
+	if (arch_is_coherent() || nommu())
+		addr = __alloc_simple_buffer(dev, size, gfp, &page);
+	else if (cpu_architecture() < CPU_ARCH_ARMv6)
+		addr = __alloc_remap_buffer(dev, size, gfp, prot, &page);
+	else if (gfp & GFP_ATOMIC)
+		addr = __alloc_from_pool(dev, size, &page);
 	else
-		addr = page_address(page);
+		addr = __alloc_from_contiguous(dev, size, prot, &page);
 
 	if (addr)
 		*handle = pfn_to_dma(dev, page_to_pfn(page));
@@ -356,8 +592,8 @@ __dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp,
  * Allocate DMA-coherent memory space and return both the kernel remapped
  * virtual and bus address for that space.
  */
-void *
-dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp)
+void *dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle,
+			 gfp_t gfp)
 {
 	void *memory;
 
@@ -386,25 +622,11 @@ static int dma_mmap(struct device *dev, struct vm_area_struct *vma,
 {
 	int ret = -ENXIO;
 #ifdef CONFIG_MMU
-	unsigned long user_size, kern_size;
-	struct arm_vmregion *c;
-
-	user_size = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
-
-	c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr);
-	if (c) {
-		unsigned long off = vma->vm_pgoff;
-
-		kern_size = (c->vm_end - c->vm_start) >> PAGE_SHIFT;
-
-		if (off < kern_size &&
-		    user_size <= (kern_size - off)) {
-			ret = remap_pfn_range(vma, vma->vm_start,
-					      page_to_pfn(c->vm_pages) + off,
-					      user_size << PAGE_SHIFT,
-					      vma->vm_page_prot);
-		}
-	}
+	unsigned long pfn = dma_to_pfn(dev, dma_addr);
+	ret = remap_pfn_range(vma, vma->vm_start,
+			      pfn + vma->vm_pgoff,
+			      vma->vm_end - vma->vm_start,
+			      vma->vm_page_prot);
 #endif	/* CONFIG_MMU */
 
 	return ret;
@@ -426,23 +648,33 @@ int dma_mmap_writecombine(struct device *dev, struct vm_area_struct *vma,
 }
 EXPORT_SYMBOL(dma_mmap_writecombine);
 
+
 /*
- * free a page as defined by the above mapping.
- * Must not be called with IRQs disabled.
+ * Free a buffer as defined by the above mapping.
  */
 void dma_free_coherent(struct device *dev, size_t size, void *cpu_addr, dma_addr_t handle)
 {
-	WARN_ON(irqs_disabled());
+	struct page *page = pfn_to_page(dma_to_pfn(dev, handle));
 
 	if (dma_release_from_coherent(dev, get_order(size), cpu_addr))
 		return;
 
 	size = PAGE_ALIGN(size);
 
-	if (!arch_is_coherent())
+	if (arch_is_coherent() || nommu()) {
+		__dma_free_buffer(page, size);
+	} else if (cpu_architecture() < CPU_ARCH_ARMv6) {
 		__dma_free_remap(cpu_addr, size);
-
-	__dma_free_buffer(pfn_to_page(dma_to_pfn(dev, handle)), size);
+		__dma_free_buffer(page, size);
+	} else {
+		if (__free_from_pool(cpu_addr, size))
+			return;
+		/*
+		 * Non-atomic allocations cannot be freed with IRQs disabled
+		 */
+		WARN_ON(irqs_disabled());
+		__free_from_contiguous(dev, page, size);
+	}
 }
 EXPORT_SYMBOL(dma_free_coherent);
 
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index fbdd12e..9c27fbd 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -21,6 +21,7 @@
 #include <linux/gfp.h>
 #include <linux/memblock.h>
 #include <linux/sort.h>
+#include <linux/dma-contiguous.h>
 
 #include <asm/mach-types.h>
 #include <asm/prom.h>
@@ -371,6 +372,13 @@ void __init arm_memblock_init(struct meminfo *mi, struct machine_desc *mdesc)
 	if (mdesc->reserve)
 		mdesc->reserve();
 
+	/* reserve memory for DMA contiguous allocations */
+#ifdef CONFIG_ZONE_DMA
+	dma_contiguous_reserve(PHYS_OFFSET + mdesc->dma_zone_size - 1);
+#else
+	dma_contiguous_reserve(0);
+#endif
+
 	memblock_analyze();
 	memblock_dump_all();
 }
diff --git a/arch/arm/mm/mm.h b/arch/arm/mm/mm.h
index ad7cce3..fa95d9b 100644
--- a/arch/arm/mm/mm.h
+++ b/arch/arm/mm/mm.h
@@ -29,5 +29,8 @@ extern u32 arm_dma_limit;
 #define arm_dma_limit ((u32)~0)
 #endif
 
+extern phys_addr_t arm_lowmem_limit;
+
 void __init bootmem_init(void);
 void arm_mm_memblock_reserve(void);
+void dma_contiguous_remap(void);
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 5a51cc5..ad8e9fc 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -285,6 +285,11 @@ static struct mem_type mem_types[] = {
 				PMD_SECT_UNCACHED | PMD_SECT_XN,
 		.domain    = DOMAIN_KERNEL,
 	},
+	[MT_MEMORY_DMA_READY] = {
+		.prot_pte  = L_PTE_PRESENT | L_PTE_YOUNG | L_PTE_DIRTY,
+		.prot_l1   = PMD_TYPE_TABLE,
+		.domain    = DOMAIN_KERNEL,
+	},
 };
 
 const struct mem_type *get_mem_type(unsigned int type)
@@ -426,6 +431,7 @@ static void __init build_mem_type_table(void)
 	if (arch_is_coherent() && cpu_is_xsc3()) {
 		mem_types[MT_MEMORY].prot_sect |= PMD_SECT_S;
 		mem_types[MT_MEMORY].prot_pte |= L_PTE_SHARED;
+		mem_types[MT_MEMORY_DMA_READY].prot_pte |= L_PTE_SHARED;
 		mem_types[MT_MEMORY_NONCACHED].prot_sect |= PMD_SECT_S;
 		mem_types[MT_MEMORY_NONCACHED].prot_pte |= L_PTE_SHARED;
 	}
@@ -457,6 +463,7 @@ static void __init build_mem_type_table(void)
 			mem_types[MT_DEVICE_CACHED].prot_pte |= L_PTE_SHARED;
 			mem_types[MT_MEMORY].prot_sect |= PMD_SECT_S;
 			mem_types[MT_MEMORY].prot_pte |= L_PTE_SHARED;
+			mem_types[MT_MEMORY_DMA_READY].prot_pte |= L_PTE_SHARED;
 			mem_types[MT_MEMORY_NONCACHED].prot_sect |= PMD_SECT_S;
 			mem_types[MT_MEMORY_NONCACHED].prot_pte |= L_PTE_SHARED;
 		}
@@ -508,6 +515,7 @@ static void __init build_mem_type_table(void)
 	mem_types[MT_HIGH_VECTORS].prot_l1 |= ecc_mask;
 	mem_types[MT_MEMORY].prot_sect |= ecc_mask | cp->pmd;
 	mem_types[MT_MEMORY].prot_pte |= kern_pgprot;
+	mem_types[MT_MEMORY_DMA_READY].prot_pte |= kern_pgprot;
 	mem_types[MT_MEMORY_NONCACHED].prot_sect |= ecc_mask;
 	mem_types[MT_ROM].prot_sect |= cp->pmd;
 
@@ -587,7 +595,7 @@ static void __init alloc_init_section(pud_t *pud, unsigned long addr,
 	 * L1 entries, whereas PGDs refer to a group of L1 entries making
 	 * up one logical pointer to an L2 table.
 	 */
-	if (((addr | end | phys) & ~SECTION_MASK) == 0) {
+	if (type->prot_sect && ((addr | end | phys) & ~SECTION_MASK) == 0) {
 		pmd_t *p = pmd;
 
 #ifndef CONFIG_ARM_LPAE
@@ -789,7 +797,7 @@ static int __init early_vmalloc(char *arg)
 }
 early_param("vmalloc", early_vmalloc);
 
-static phys_addr_t lowmem_limit __initdata = 0;
+phys_addr_t arm_lowmem_limit __initdata = 0;
 
 void __init sanity_check_meminfo(void)
 {
@@ -859,8 +867,8 @@ void __init sanity_check_meminfo(void)
 			bank->size = newsize;
 		}
 #endif
-		if (!bank->highmem && bank->start + bank->size > lowmem_limit)
-			lowmem_limit = bank->start + bank->size;
+		if (!bank->highmem && bank->start + bank->size > arm_lowmem_limit)
+			arm_lowmem_limit = bank->start + bank->size;
 
 		j++;
 	}
@@ -885,7 +893,7 @@ void __init sanity_check_meminfo(void)
 	}
 #endif
 	meminfo.nr_banks = j;
-	memblock_set_current_limit(lowmem_limit);
+	memblock_set_current_limit(arm_lowmem_limit);
 }
 
 static inline void prepare_page_table(void)
@@ -910,8 +918,8 @@ static inline void prepare_page_table(void)
 	 * Find the end of the first block of lowmem.
 	 */
 	end = memblock.memory.regions[0].base + memblock.memory.regions[0].size;
-	if (end >= lowmem_limit)
-		end = lowmem_limit;
+	if (end >= arm_lowmem_limit)
+		end = arm_lowmem_limit;
 
 	/*
 	 * Clear out all the kernel space mappings, except for the first
@@ -1051,8 +1059,8 @@ static void __init map_lowmem(void)
 		phys_addr_t end = start + reg->size;
 		struct map_desc map;
 
-		if (end > lowmem_limit)
-			end = lowmem_limit;
+		if (end > arm_lowmem_limit)
+			end = arm_lowmem_limit;
 		if (start >= end)
 			break;
 
@@ -1073,11 +1081,12 @@ void __init paging_init(struct machine_desc *mdesc)
 {
 	void *zero_page;
 
-	memblock_set_current_limit(lowmem_limit);
+	memblock_set_current_limit(arm_lowmem_limit);
 
 	build_mem_type_table();
 	prepare_page_table();
 	map_lowmem();
+	dma_contiguous_remap();
 	devicemaps_init(mdesc);
 	kmap_init();
 
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 9/9] ARM: Samsung: use CMA for 2 memory banks for s5p-mfc device
@ 2011-10-06 13:54   ` Marek Szyprowski
  -1 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-06 13:54 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Shariq Hasnain, Chunsang Jeong, Dave Hansen

Replace custom memory bank initialization using memblock_reserve and
dma_declare_coherent with a single call to CMA's dma_declare_contiguous.
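
For context, a board's reserve hook now boils down to something like
this (a hypothetical sketch; the function name, base addresses and sizes
below are illustrative and not taken from this patch):

  /* hypothetical machine code: carve out the two MFC banks via CMA */
  static void __init smdk_reserve(void)
  {
          /* right bank: 32 MiB at 0x43000000, left bank: 32 MiB at 0x51000000 */
          s5p_mfc_reserve_mem(0x43000000, 32 << 20, 0x51000000, 32 << 20);
  }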

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
 arch/arm/plat-s5p/dev-mfc.c |   51 ++++++-------------------------------------
 1 files changed, 7 insertions(+), 44 deletions(-)

diff --git a/arch/arm/plat-s5p/dev-mfc.c b/arch/arm/plat-s5p/dev-mfc.c
index 94226a0..0dec422 100644
--- a/arch/arm/plat-s5p/dev-mfc.c
+++ b/arch/arm/plat-s5p/dev-mfc.c
@@ -14,6 +14,7 @@
 #include <linux/interrupt.h>
 #include <linux/platform_device.h>
 #include <linux/dma-mapping.h>
+#include <linux/dma-contiguous.h>
 #include <linux/memblock.h>
 #include <linux/ioport.h>
 
@@ -72,52 +73,14 @@ struct platform_device s5p_device_mfc_r = {
 	},
 };
 
-struct s5p_mfc_reserved_mem {
-	phys_addr_t	base;
-	unsigned long	size;
-	struct device	*dev;
-};
-
-static struct s5p_mfc_reserved_mem s5p_mfc_mem[2] __initdata;
-
 void __init s5p_mfc_reserve_mem(phys_addr_t rbase, unsigned int rsize,
 				phys_addr_t lbase, unsigned int lsize)
 {
-	int i;
-
-	s5p_mfc_mem[0].dev = &s5p_device_mfc_r.dev;
-	s5p_mfc_mem[0].base = rbase;
-	s5p_mfc_mem[0].size = rsize;
-
-	s5p_mfc_mem[1].dev = &s5p_device_mfc_l.dev;
-	s5p_mfc_mem[1].base = lbase;
-	s5p_mfc_mem[1].size = lsize;
-
-	for (i = 0; i < ARRAY_SIZE(s5p_mfc_mem); i++) {
-		struct s5p_mfc_reserved_mem *area = &s5p_mfc_mem[i];
-		if (memblock_remove(area->base, area->size)) {
-			printk(KERN_ERR "Failed to reserve memory for MFC device (%ld bytes at 0x%08lx)\n",
-			       area->size, (unsigned long) area->base);
-			area->base = 0;
-		}
-	}
-}
-
-static int __init s5p_mfc_memory_init(void)
-{
-	int i;
-
-	for (i = 0; i < ARRAY_SIZE(s5p_mfc_mem); i++) {
-		struct s5p_mfc_reserved_mem *area = &s5p_mfc_mem[i];
-		if (!area->base)
-			continue;
+	if (dma_declare_contiguous(&s5p_device_mfc_r.dev, rsize, rbase, 0))
+		printk(KERN_ERR "Failed to reserve memory for MFC device (%u bytes at 0x%08lx)\n",
+		       rsize, (unsigned long) rbase);
 
-		if (dma_declare_coherent_memory(area->dev, area->base,
-				area->base, area->size,
-				DMA_MEMORY_MAP | DMA_MEMORY_EXCLUSIVE) == 0)
-			printk(KERN_ERR "Failed to declare coherent memory for MFC device (%ld bytes at 0x%08lx)\n",
-			       area->size, (unsigned long) area->base);
-	}
-	return 0;
+	if (dma_declare_contiguous(&s5p_device_mfc_l.dev, lsize, lbase, 0))
+		printk(KERN_ERR "Failed to reserve memory for MFC device (%u bytes at 0x%08lx)\n",
+		       lsize, (unsigned long) lbase);
 }
-device_initcall(s5p_mfc_memory_init);
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 9/9] ARM: Samsung: use CMA for 2 memory banks for s5p-mfc device
@ 2011-10-06 13:54   ` Marek Szyprowski
  0 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-06 13:54 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Shariq Hasnain, Chunsang Jeong, Dave Hansen

Replace custom memory bank initialization using memblock_reserve and
dma_declare_coherent with a single call to CMA's dma_declare_contiguous.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
 arch/arm/plat-s5p/dev-mfc.c |   51 ++++++-------------------------------------
 1 files changed, 7 insertions(+), 44 deletions(-)

diff --git a/arch/arm/plat-s5p/dev-mfc.c b/arch/arm/plat-s5p/dev-mfc.c
index 94226a0..0dec422 100644
--- a/arch/arm/plat-s5p/dev-mfc.c
+++ b/arch/arm/plat-s5p/dev-mfc.c
@@ -14,6 +14,7 @@
 #include <linux/interrupt.h>
 #include <linux/platform_device.h>
 #include <linux/dma-mapping.h>
+#include <linux/dma-contiguous.h>
 #include <linux/memblock.h>
 #include <linux/ioport.h>
 
@@ -72,52 +73,14 @@ struct platform_device s5p_device_mfc_r = {
 	},
 };
 
-struct s5p_mfc_reserved_mem {
-	phys_addr_t	base;
-	unsigned long	size;
-	struct device	*dev;
-};
-
-static struct s5p_mfc_reserved_mem s5p_mfc_mem[2] __initdata;
-
 void __init s5p_mfc_reserve_mem(phys_addr_t rbase, unsigned int rsize,
 				phys_addr_t lbase, unsigned int lsize)
 {
-	int i;
-
-	s5p_mfc_mem[0].dev = &s5p_device_mfc_r.dev;
-	s5p_mfc_mem[0].base = rbase;
-	s5p_mfc_mem[0].size = rsize;
-
-	s5p_mfc_mem[1].dev = &s5p_device_mfc_l.dev;
-	s5p_mfc_mem[1].base = lbase;
-	s5p_mfc_mem[1].size = lsize;
-
-	for (i = 0; i < ARRAY_SIZE(s5p_mfc_mem); i++) {
-		struct s5p_mfc_reserved_mem *area = &s5p_mfc_mem[i];
-		if (memblock_remove(area->base, area->size)) {
-			printk(KERN_ERR "Failed to reserve memory for MFC device (%ld bytes at 0x%08lx)\n",
-			       area->size, (unsigned long) area->base);
-			area->base = 0;
-		}
-	}
-}
-
-static int __init s5p_mfc_memory_init(void)
-{
-	int i;
-
-	for (i = 0; i < ARRAY_SIZE(s5p_mfc_mem); i++) {
-		struct s5p_mfc_reserved_mem *area = &s5p_mfc_mem[i];
-		if (!area->base)
-			continue;
+	if (dma_declare_contiguous(&s5p_device_mfc_r.dev, rsize, rbase, 0))
+		printk(KERN_ERR "Failed to reserve memory for MFC device (%u bytes at 0x%08lx)\n",
+		       rsize, (unsigned long) rbase);
 
-		if (dma_declare_coherent_memory(area->dev, area->base,
-				area->base, area->size,
-				DMA_MEMORY_MAP | DMA_MEMORY_EXCLUSIVE) == 0)
-			printk(KERN_ERR "Failed to declare coherent memory for MFC device (%ld bytes at 0x%08lx)\n",
-			       area->size, (unsigned long) area->base);
-	}
-	return 0;
+	if (dma_declare_contiguous(&s5p_device_mfc_l.dev, lsize, lbase, 0))
+		printk(KERN_ERR "Failed to reserve memory for MFC device (%u bytes at 0x%08lx)\n",
+		       lsize, (unsigned long) lbase);
 }
-device_initcall(s5p_mfc_memory_init);
-- 
1.7.1.569.g6f426
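
For context, the helper above is meant to be called from a board's early
memory reservation path. A minimal sketch of such a call site, assuming
that plat/mfc.h declares s5p_mfc_reserve_mem() and using made-up bank
addresses and sizes purely for illustration:

#include <linux/init.h>
#include <linux/types.h>
#include <plat/mfc.h>	/* assumed to declare s5p_mfc_reserve_mem() */

/*
 * Hypothetical board code: reserve 8 MiB for the MFC device in each
 * memory bank.  In a real board file this would be wired into the
 * machine's .reserve hook; the addresses and sizes are illustrative
 * assumptions, not values taken from this series.
 */
static void __init example_board_reserve(void)
{
	s5p_mfc_reserve_mem(0x43000000, 8 << 20,	/* right bank */
			    0x51000000, 8 << 20);	/* left bank */
}

With the patch applied, the reservation is handed to CMA through
dma_declare_contiguous() instead of being carved out with
memblock_remove() and registered with dma_declare_coherent_memory().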


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* RE: [PATCH 7/7] ARM: integrate CMA with DMA-mapping subsystem
  2011-10-06 13:54   ` Marek Szyprowski
@ 2011-10-06 14:18     ` Marek Szyprowski
  -1 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-06 14:18 UTC (permalink / raw)
  To: Marek Szyprowski, linux-kernel, linux-arm-kernel, linux-media,
	linux-mm, linaro-mm-sig
  Cc: 'Michal Nazarewicz', 'Kyungmin Park',
	'Russell King', 'Andrew Morton',
	'KAMEZAWA Hiroyuki', 'Ankita Garg',
	'Daniel Walker', 'Mel Gorman',
	'Arnd Bergmann', 'Jesse Barker',
	'Jonathan Corbet', 'Shariq Hasnain',
	'Chunsang Jeong', 'Dave Hansen'

Hello,

On Thursday, October 06, 2011 3:55 PM Marek Szyprowski wrote:

> This patch adds support for CMA to dma-mapping subsystem for ARM
> architecture. By default a global CMA area is used, but specific devices
> are allowed to have their private memory areas if required (they can be
> created with dma_declare_contiguous() function during board
> initialization).
> 
> Contiguous memory areas reserved for DMA are remapped with 2-level page
> tables on boot. Once a buffer is requested, a low memory kernel mapping
> is updated to match requested memory access type.
> 
> GFP_ATOMIC allocations are performed from special pool which is created
> early during boot. This way remapping page attributes is not needed on
> allocation time.
> 
> CMA has been enabled unconditionally for ARMv6+ systems.
> 
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>

Please ignore this patch. The patch named as "[PATCH 8/9] ARM: integrate
CMA with DMA-mapping subsystem" in this thread is the correct one.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center


^ permalink raw reply	[flat|nested] 180+ messages in thread
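
The quoted description above notes that devices may be given private CMA
areas created with dma_declare_contiguous() during board initialization.
A minimal sketch of such a declaration, following the four-argument call
used in the s5p-mfc patch; the device, size and placement below are
illustrative assumptions:

#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/platform_device.h>
#include <linux/dma-contiguous.h>

/* hypothetical platform device that needs its own contiguous area */
static struct platform_device example_vid_device = {
	.name	= "example-vid",
	.id	= -1,
};

static void __init example_reserve(void)
{
	/*
	 * 16 MiB private area; base 0 and limit 0 are assumed here to
	 * leave the placement entirely up to CMA.
	 */
	if (dma_declare_contiguous(&example_vid_device.dev, 16 << 20, 0, 0))
		pr_err("example-vid: failed to reserve private CMA area\n");
}

Devices without a private area fall back to the global CMA pool, which is
what Ohad's comment further down in the thread is about.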

* Re: [PATCHv16 0/9] Contiguous Memory Allocator
  2011-10-06 13:54 ` Marek Szyprowski
@ 2011-10-07 16:27   ` Arnd Bergmann
  -1 siblings, 0 replies; 180+ messages in thread
From: Arnd Bergmann @ 2011-10-07 16:27 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen

On Thursday 06 October 2011, Marek Szyprowski wrote:
> Once again I decided to post an updated version of the Contiguous Memory
> Allocator patches.
> 
> This version provides mainly a bugfix for a very rare issue that might
> have changed migration type of the CMA page blocks resulting in dropping
> CMA features from the affected page block and causing memory allocation
> to fail. Also the issue reported by Dave Hansen has been fixed.
> 
> This version also introduces basic support for x86 architecture, what
> allows wide testing on KVM/QEMU emulators and all common x86 boxes. I
> hope this will result in wider testing, comments and easier merging to
> mainline.

Hi Marek,

I think we need to finally get this into linux-next now, to get some
broader testing. Having the x86 patch definitely helps here because
it potentially exposes the code to many more testers.

IMHO it would be good to merge the entire series into 3.2, since
the ARM portion fixes an important bug (double mapping of memory
ranges with conflicting attributes) that we've lived with for far
too long, but it really depends on how everyone sees the risk
for regressions here. If something breaks in unfixable ways before
the 3.2 release, we can always revert the patches and have another
try later.

It's also not clear how we should merge it. Ideally the first bunch
would go through linux-mm, and the architecture specific patches
through the respective architecture trees, but there is an obvious
interdependency between these sets.

Russell, Andrew, are you both comfortable with putting the entire
set into linux-mm to solve this? Do you see this as 3.2 or rather
as 3.3 material?

	Arnd

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Linaro-mm-sig] [PATCHv16 0/9] Contiguous Memory Allocator
  2011-10-07 16:27   ` Arnd Bergmann
@ 2011-10-10  6:58     ` Ohad Ben-Cohen
  -1 siblings, 0 replies; 180+ messages in thread
From: Ohad Ben-Cohen @ 2011-10-10  6:58 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Marek Szyprowski, Ankita Garg, Daniel Walker, Russell King,
	Jesse Barker, Mel Gorman, Chunsang Jeong, Jonathan Corbet,
	linux-kernel, Michal Nazarewicz, Dave Hansen, linaro-mm-sig,
	linux-mm, Kyungmin Park, KAMEZAWA Hiroyuki, Andrew Morton,
	linux-arm-kernel, linux-media

On Fri, Oct 7, 2011 at 6:27 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> IMHO it would be good to merge the entire series into 3.2, since
> the ARM portion fixes an important bug (double mapping of memory
> ranges with conflicting attributes) that we've lived with for far
> too long, but it really depends on how everyone sees the risk
> for regressions here. If something breaks in unfixable ways before
> the 3.2 release, we can always revert the patches and have another
> try later.

I didn't thoroughly review the patches, but I did try them out (to be
precise, I tried v15) on an OMAP4 PandaBoard, and really liked the
result.

The interfaces seem clean and convenient and things seem to work (I
used a private CMA pool with rpmsg and remoteproc, but also noticed
that several other drivers were utilizing the global pool). And with
this in hand we can finally ditch the old reserve+ioremap approach.

So from a user perspective, I sure do hope this patch set gets into
3.2; hopefully we can just fix anything that would show up during the
3.2 cycle.

Marek, Michal (and everyone involved!), thanks so much for pushing
this! Judging from the history of this patch set and the areas that it
touches (and from the number of LWN articles ;) it looks like a
considerable feat.

FWIW, feel free to add my

Tested-by: Ohad Ben-Cohen <ohad@wizery.com>

(small and optional comment: I think it'd be nice if
dma_declare_contiguous would fail if called too late, otherwise users
of that misconfigured device will end up using the global pool without
easily knowing that something went wrong)

Thanks,
Ohad.

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Linaro-mm-sig] [PATCHv16 0/9] Contiguous Memory Allocator
  2011-10-10  6:58     ` Ohad Ben-Cohen
@ 2011-10-10 12:02       ` Clark, Rob
  -1 siblings, 0 replies; 180+ messages in thread
From: Clark, Rob @ 2011-10-10 12:02 UTC (permalink / raw)
  To: Ohad Ben-Cohen
  Cc: Arnd Bergmann, linux-arm-kernel, Daniel Walker, Russell King,
	Jonathan Corbet, Mel Gorman, Chunsang Jeong, linux-kernel,
	Michal Nazarewicz, Dave Hansen, linaro-mm-sig, Jesse Barker,
	Kyungmin Park, Ankita Garg, Andrew Morton, linux-media, linux-mm,
	KAMEZAWA Hiroyuki

On Mon, Oct 10, 2011 at 1:58 AM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
> On Fri, Oct 7, 2011 at 6:27 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>> IMHO it would be good to merge the entire series into 3.2, since
>> the ARM portion fixes an important bug (double mapping of memory
>> ranges with conflicting attributes) that we've lived with for far
>> too long, but it really depends on how everyone sees the risk
>> for regressions here. If something breaks in unfixable ways before
>> the 3.2 release, we can always revert the patches and have another
>> try later.
>
> I didn't thoroughly review the patches, but I did try them out (to be
> precise, I tried v15) on an OMAP4 PandaBoard, and really liked the
> result.
>
> The interfaces seem clean and convenient and things seem to work (I
> used a private CMA pool with rpmsg and remoteproc, but also noticed
> that several other drivers were utilizing the global pool). And with
> this in hand we can finally ditch the old reserve+ioremap approach.
>
> So from a user perspective, I sure do hope this patch set gets into
> 3.2; hopefully we can just fix anything that would show up during the
> 3.2 cycle.
>
> Marek, Michal (and everyone involved!), thanks so much for pushing
> this! Judging from the history of this patch set and the areas that it
> touches (and from the number of LWN articles ;) it looks like a
> considerable feat.
>
> FWIW, feel free to add my
>
> Tested-by: Ohad Ben-Cohen <ohad@wizery.com>

Marek, I guess I forgot to mention earlier, but I've been using CMA
for a couple of weeks now with the omapdrm driver, so you can also add my:

Tested-by: Rob Clark <rob@ti.com>

BR,
-R

> (small and optional comment: I think it'd be nice if
> dma_declare_contiguous would fail if called too late, otherwise users
> of that misconfigured device will end up using the global pool without
> easily knowing that something went wrong)
>
> Thanks,
> Ohad.
>
> _______________________________________________
> Linaro-mm-sig mailing list
> Linaro-mm-sig@lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/linaro-mm-sig
>

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Linaro-mm-sig] [PATCHv16 0/9] Contiguous Memory Allocator
  2011-10-06 13:54 ` Marek Szyprowski
@ 2011-10-10 12:07   ` Maxime Coquelin
  -1 siblings, 0 replies; 180+ messages in thread
From: Maxime Coquelin @ 2011-10-10 12:07 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Daniel Walker, Russell King, Arnd Bergmann,
	Jonathan Corbet, Mel Gorman, Chunsang Jeong, Michal Nazarewicz,
	Dave Hansen, Jesse Barker, Kyungmin Park, Ankita Garg,
	Andrew Morton, KAMEZAWA Hiroyuki, benjamin.gaignard, frq09524,
	vincent.guittot

On 10/06/2011 03:54 PM, Marek Szyprowski wrote:
> Welcome everyone again,
>
> Once again I decided to post an updated version of the Contiguous Memory
> Allocator patches.
>
> This version provides mainly a bugfix for a very rare issue that might
> have changed migration type of the CMA page blocks resulting in dropping
> CMA features from the affected page block and causing memory allocation
> to fail. Also the issue reported by Dave Hansen has been fixed.
>
> This version also introduces basic support for x86 architecture, what
> allows wide testing on KVM/QEMU emulators and all common x86 boxes. I
> hope this will result in wider testing, comments and easier merging to
> mainline.
>
> I've also dropped an examplary patch for s5p-fimc platform device
> private memory declaration and added the one from real life. CMA device
> private memory regions are defined for s5p-mfc device to let it allocate
> buffers from two memory banks.
>
> ARM integration code has not been changed since last version, it
> provides implementation of all the ideas that has been discussed during

Hello Marek,

     We are currently testing CMA (v16) on the Snowball platform.
     This feature is very promising, thanks for pushing it!

     During our stress tests, we encountered some problems:

     1) Contiguous allocation lockup:
         When system RAM is full of anonymous pages and we try to allocate a
contiguous buffer larger than the min_free value, we face a
dma_alloc_from_contiguous() lockup.
         The expected result would be for dma_alloc_from_contiguous() to fail.
         The problem reproduces systematically on our side.

     2) Contiguous allocation failure:
         We have developed a small driver and a shell script to 
allocate/release contiguous buffers.
         Sometimes, dma_alloc_from_contiguous() fails to allocate the 
contiguous buffer (about once every 30 runs).
         We have 270MB of memory passed to the kernel in our configuration,
and the CMA pool is 90MB large.
         In this setup, the overall memory is either free or full of 
reclaimable pages.


     For now, we have not had time to investigate these problems further.
     Have you already faced this kind of issue?
     Could someone testing CMA on other boards confirm or refute these
problems?

Best regards,
Maxime



> Patches in this patchset:
>
>    mm: move some functions from memory_hotplug.c to page_isolation.c
>    mm: alloc_contig_freed_pages() added
>
>      Code "stolen" from Kamezawa.  The first patch just moves code
>      around and the second provide function for "allocates" already
>      freed memory.
>
>    mm: alloc_contig_range() added
>
>      This is what Kamezawa asked: a function that tries to migrate all
>      pages from given range and then use alloc_contig_freed_pages()
>      (defined by the previous commit) to allocate those pages.
>
>    mm: MIGRATE_CMA migration type added
>    mm: MIGRATE_CMA isolation functions added
>
>      Introduction of the new migratetype and support for it in CMA.
>      MIGRATE_CMA works similar to ZONE_MOVABLE expect almost any
>      memory range can be marked as one.
>
>    mm: cma: Contiguous Memory Allocator added
>
>      The code CMA code. Manages CMA contexts and performs memory
>      allocations.
>
>    X86: integrate CMA with DMA-mapping subsystem
>    ARM: integrate CMA with dma-mapping subsystem
>
>      Main clients of CMA framework. CMA serves as a alloc_pages()
>      replacement.
>
>    ARM: Samsung: use CMA for 2 memory banks for s5p-mfc device
>
>      Use CMA device private memory regions instead of custom solution
>      based on memblock_reserve() + dma_declare_coherent().
>
>
> Patch summary:
>
> KAMEZAWA Hiroyuki (2):
>    mm: move some functions from memory_hotplug.c to page_isolation.c
>    mm: alloc_contig_freed_pages() added
>
> Marek Szyprowski (4):
>    drivers: add Contiguous Memory Allocator
>    ARM: integrate CMA with DMA-mapping subsystem
>    ARM: Samsung: use CMA for 2 memory banks for s5p-mfc device
>    X86: integrate CMA with DMA-mapping subsystem
>
> Michal Nazarewicz (3):
>    mm: alloc_contig_range() added
>    mm: MIGRATE_CMA migration type added
>    mm: MIGRATE_CMA isolation functions added
>
>   arch/Kconfig                          |    3 +
>   arch/arm/Kconfig                      |    2 +
>   arch/arm/include/asm/dma-contiguous.h |   16 ++
>   arch/arm/include/asm/mach/map.h       |    1 +
>   arch/arm/mm/dma-mapping.c             |  362 +++++++++++++++++++++++++------
>   arch/arm/mm/init.c                    |    8 +
>   arch/arm/mm/mm.h                      |    3 +
>   arch/arm/mm/mmu.c                     |   29 ++-
>   arch/arm/plat-s5p/dev-mfc.c           |   51 +----
>   arch/x86/Kconfig                      |    1 +
>   arch/x86/include/asm/dma-contiguous.h |   13 +
>   arch/x86/include/asm/dma-mapping.h    |    4 +
>   arch/x86/kernel/pci-dma.c             |   18 ++-
>   arch/x86/kernel/pci-nommu.c           |    8 +-
>   arch/x86/kernel/setup.c               |    2 +
>   drivers/base/Kconfig                  |   79 +++++++
>   drivers/base/Makefile                 |    1 +
>   drivers/base/dma-contiguous.c         |  386 +++++++++++++++++++++++++++++++++
>   include/asm-generic/dma-contiguous.h  |   27 +++
>   include/linux/device.h                |    4 +
>   include/linux/dma-contiguous.h        |  106 +++++++++
>   include/linux/mmzone.h                |   57 +++++-
>   include/linux/page-isolation.h        |   53 ++++-
>   mm/Kconfig                            |    8 +-
>   mm/compaction.c                       |   10 +
>   mm/memory_hotplug.c                   |  111 ----------
>   mm/page_alloc.c                       |  317 +++++++++++++++++++++++++--
>   mm/page_isolation.c                   |  131 +++++++++++-
>   28 files changed, 1522 insertions(+), 289 deletions(-)
>   create mode 100644 arch/arm/include/asm/dma-contiguous.h
>   create mode 100644 arch/x86/include/asm/dma-contiguous.h
>   create mode 100644 drivers/base/dma-contiguous.c
>   create mode 100644 include/asm-generic/dma-contiguous.h
>   create mode 100644 include/linux/dma-contiguous.h
>
> --
> 1.7.1.569.g6f426
>
>
> _______________________________________________
> Linaro-mm-sig mailing list
> Linaro-mm-sig@lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/linaro-mm-sig


^ permalink raw reply	[flat|nested] 180+ messages in thread
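
The allocate/release stress test described in the report above can be
approximated with a very small platform driver; a minimal sketch, in
which the buffer size, iteration count and driver name are assumptions
and the driver/device registration boilerplate is omitted:

#include <linux/module.h>
#include <linux/platform_device.h>
#include <linux/dma-mapping.h>
#include <linux/string.h>
#include <linux/gfp.h>

#define TEST_BUF_SIZE	(8 << 20)	/* 8 MiB per allocation, assumed */
#define TEST_ITERS	100		/* assumed iteration count */

/* probe routine of a hypothetical "cma-stress" platform driver */
static int cma_stress_probe(struct platform_device *pdev)
{
	int i;

	for (i = 0; i < TEST_ITERS; i++) {
		dma_addr_t dma;
		void *cpu = dma_alloc_coherent(&pdev->dev, TEST_BUF_SIZE,
					       &dma, GFP_KERNEL);

		if (!cpu) {
			dev_err(&pdev->dev, "allocation %d failed\n", i);
			return -ENOMEM;
		}
		memset(cpu, 0, TEST_BUF_SIZE);	/* touch the buffer */
		dma_free_coherent(&pdev->dev, TEST_BUF_SIZE, cpu, dma);
	}
	return 0;
}

Whether such a loop reproduces the reported lockup will of course depend
on how much anonymous and reclaimable memory the rest of the system is
holding at the time.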

* Re: [Linaro-mm-sig] [PATCHv16 0/9] Contiguous Memory Allocator
@ 2011-10-10 12:07   ` Maxime Coquelin
  0 siblings, 0 replies; 180+ messages in thread
From: Maxime Coquelin @ 2011-10-10 12:07 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Daniel Walker, Russell King, Arnd Bergmann,
	Jonathan Corbet, Mel Gorman, Chunsang Jeong, Michal Nazarewicz,
	Dave Hansen, Jesse Barker, Kyungmin Park, Ankita Garg,
	Andrew Morton, KAMEZAWA Hiroyuki, benjamin.gaignard, frq09524,
	vincent.guittot

On 10/06/2011 03:54 PM, Marek Szyprowski wrote:
> Welcome everyone again,
>
> Once again I decided to post an updated version of the Contiguous Memory
> Allocator patches.
>
> This version provides mainly a bugfix for a very rare issue that might
> have changed migration type of the CMA page blocks resulting in dropping
> CMA features from the affected page block and causing memory allocation
> to fail. Also the issue reported by Dave Hansen has been fixed.
>
> This version also introduces basic support for x86 architecture, what
> allows wide testing on KVM/QEMU emulators and all common x86 boxes. I
> hope this will result in wider testing, comments and easier merging to
> mainline.
>
> I've also dropped an examplary patch for s5p-fimc platform device
> private memory declaration and added the one from real life. CMA device
> private memory regions are defined for s5p-mfc device to let it allocate
> buffers from two memory banks.
>
> ARM integration code has not been changed since last version, it
> provides implementation of all the ideas that has been discussed during

Hello Marek,

     We are currently testing CMA (v16) on Snowball platform.
     This feature is very promising, thanks for pushing it!

     During our stress tests, we encountered some problems :

     1) Contiguous allocation lockup:
         When system RAM is full of Anon pages, if we try to allocate a 
contiguous buffer greater than the min_free value, we face a 
dma_alloc_from_contiguous lockup.
         The expected result would be dma_alloc_from_contiguous() to fail.
         The problem is reproduced systematically on our side.

     2) Contiguous allocation fail:
         We have developed a small driver and a shell script to 
allocate/release contiguous buffers.
         Sometimes, dma_alloc_from_contiguous() fails to allocate the 
contiguous buffer (about once every 30 runs).
         We have 270MB Memory passed to the kernel in our configuration, 
and the CMA pool is 90MB large.
         In this setup, the overall memory is either free or full of 
reclaimable pages.


     For now, we didn't had time to investigate further theses problems.
     Have you already faced this kind of issues?
     Could someone testing CMA on other boards confirm/infirm theses 
problems?

Best regards,
Maxime



> Patches in this patchset:
>
>    mm: move some functions from memory_hotplug.c to page_isolation.c
>    mm: alloc_contig_freed_pages() added
>
>      Code "stolen" from Kamezawa.  The first patch just moves code
>      around and the second provide function for "allocates" already
>      freed memory.
>
>    mm: alloc_contig_range() added
>
>      This is what Kamezawa asked: a function that tries to migrate all
>      pages from given range and then use alloc_contig_freed_pages()
>      (defined by the previous commit) to allocate those pages.
>
>    mm: MIGRATE_CMA migration type added
>    mm: MIGRATE_CMA isolation functions added
>
>      Introduction of the new migratetype and support for it in CMA.
>      MIGRATE_CMA works similar to ZONE_MOVABLE expect almost any
>      memory range can be marked as one.
>
>    mm: cma: Contiguous Memory Allocator added
>
>      The code CMA code. Manages CMA contexts and performs memory
>      allocations.
>
>    X86: integrate CMA with DMA-mapping subsystem
>    ARM: integrate CMA with dma-mapping subsystem
>
>      Main clients of CMA framework. CMA serves as a alloc_pages()
>      replacement.
>
>    ARM: Samsung: use CMA for 2 memory banks for s5p-mfc device
>
>      Use CMA device private memory regions instead of custom solution
>      based on memblock_reserve() + dma_declare_coherent().
>
>
> Patch summary:
>
> KAMEZAWA Hiroyuki (2):
>    mm: move some functions from memory_hotplug.c to page_isolation.c
>    mm: alloc_contig_freed_pages() added
>
> Marek Szyprowski (4):
>    drivers: add Contiguous Memory Allocator
>    ARM: integrate CMA with DMA-mapping subsystem
>    ARM: Samsung: use CMA for 2 memory banks for s5p-mfc device
>    X86: integrate CMA with DMA-mapping subsystem
>
> Michal Nazarewicz (3):
>    mm: alloc_contig_range() added
>    mm: MIGRATE_CMA migration type added
>    mm: MIGRATE_CMA isolation functions added
>
>   arch/Kconfig                          |    3 +
>   arch/arm/Kconfig                      |    2 +
>   arch/arm/include/asm/dma-contiguous.h |   16 ++
>   arch/arm/include/asm/mach/map.h       |    1 +
>   arch/arm/mm/dma-mapping.c             |  362 +++++++++++++++++++++++++------
>   arch/arm/mm/init.c                    |    8 +
>   arch/arm/mm/mm.h                      |    3 +
>   arch/arm/mm/mmu.c                     |   29 ++-
>   arch/arm/plat-s5p/dev-mfc.c           |   51 +----
>   arch/x86/Kconfig                      |    1 +
>   arch/x86/include/asm/dma-contiguous.h |   13 +
>   arch/x86/include/asm/dma-mapping.h    |    4 +
>   arch/x86/kernel/pci-dma.c             |   18 ++-
>   arch/x86/kernel/pci-nommu.c           |    8 +-
>   arch/x86/kernel/setup.c               |    2 +
>   drivers/base/Kconfig                  |   79 +++++++
>   drivers/base/Makefile                 |    1 +
>   drivers/base/dma-contiguous.c         |  386 +++++++++++++++++++++++++++++++++
>   include/asm-generic/dma-contiguous.h  |   27 +++
>   include/linux/device.h                |    4 +
>   include/linux/dma-contiguous.h        |  106 +++++++++
>   include/linux/mmzone.h                |   57 +++++-
>   include/linux/page-isolation.h        |   53 ++++-
>   mm/Kconfig                            |    8 +-
>   mm/compaction.c                       |   10 +
>   mm/memory_hotplug.c                   |  111 ----------
>   mm/page_alloc.c                       |  317 +++++++++++++++++++++++++--
>   mm/page_isolation.c                   |  131 +++++++++++-
>   28 files changed, 1522 insertions(+), 289 deletions(-)
>   create mode 100644 arch/arm/include/asm/dma-contiguous.h
>   create mode 100644 arch/x86/include/asm/dma-contiguous.h
>   create mode 100644 drivers/base/dma-contiguous.c
>   create mode 100644 include/asm-generic/dma-contiguous.h
>   create mode 100644 include/linux/dma-contiguous.h
>
> --
> 1.7.1.569.g6f426
>
>
> _______________________________________________
> Linaro-mm-sig mailing list
> Linaro-mm-sig@lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/linaro-mm-sig


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Linaro-mm-sig] [PATCHv16 0/9] Contiguous Memory Allocator
@ 2011-10-10 12:07   ` Maxime Coquelin
  0 siblings, 0 replies; 180+ messages in thread
From: Maxime Coquelin @ 2011-10-10 12:07 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Daniel Walker, Russell King, Arnd Bergmann,
	Jonathan Corbet, Mel Gorman, Chunsang Jeong, Michal Nazarewicz,
	Dave Hansen, Jesse Barker, Kyungmin Park, Ankita Garg,
	Andrew Morton, KAMEZAWA Hiroyuki, benjamin.gaignard, frq09524,
	vincent.guittot

On 10/06/2011 03:54 PM, Marek Szyprowski wrote:
> Welcome everyone again,
>
> Once again I decided to post an updated version of the Contiguous Memory
> Allocator patches.
>
> This version provides mainly a bugfix for a very rare issue that might
> have changed migration type of the CMA page blocks resulting in dropping
> CMA features from the affected page block and causing memory allocation
> to fail. Also the issue reported by Dave Hansen has been fixed.
>
> This version also introduces basic support for x86 architecture, what
> allows wide testing on KVM/QEMU emulators and all common x86 boxes. I
> hope this will result in wider testing, comments and easier merging to
> mainline.
>
> I've also dropped an examplary patch for s5p-fimc platform device
> private memory declaration and added the one from real life. CMA device
> private memory regions are defined for s5p-mfc device to let it allocate
> buffers from two memory banks.
>
> ARM integration code has not been changed since last version, it
> provides implementation of all the ideas that has been discussed during

Hello Marek,

     We are currently testing CMA (v16) on Snowball platform.
     This feature is very promising, thanks for pushing it!

     During our stress tests, we encountered some problems :

     1) Contiguous allocation lockup:
         When system RAM is full of Anon pages, if we try to allocate a 
contiguous buffer greater than the min_free value, we face a 
dma_alloc_from_contiguous lockup.
         The expected result would be dma_alloc_from_contiguous() to fail.
         The problem is reproduced systematically on our side.

     2) Contiguous allocation fail:
         We have developed a small driver and a shell script to 
allocate/release contiguous buffers.
         Sometimes, dma_alloc_from_contiguous() fails to allocate the 
contiguous buffer (about once every 30 runs).
         We have 270MB Memory passed to the kernel in our configuration, 
and the CMA pool is 90MB large.
         In this setup, the overall memory is either free or full of 
reclaimable pages.


     For now, we didn't had time to investigate further theses problems.
     Have you already faced this kind of issues?
     Could someone testing CMA on other boards confirm/infirm theses 
problems?

Best regards,
Maxime



> Patches in this patchset:
>
>    mm: move some functions from memory_hotplug.c to page_isolation.c
>    mm: alloc_contig_freed_pages() added
>
>      Code "stolen" from Kamezawa.  The first patch just moves code
>      around and the second provide function for "allocates" already
>      freed memory.
>
>    mm: alloc_contig_range() added
>
>      This is what Kamezawa asked: a function that tries to migrate all
>      pages from given range and then use alloc_contig_freed_pages()
>      (defined by the previous commit) to allocate those pages.
>
>    mm: MIGRATE_CMA migration type added
>    mm: MIGRATE_CMA isolation functions added
>
>      Introduction of the new migratetype and support for it in CMA.
>      MIGRATE_CMA works similar to ZONE_MOVABLE expect almost any
>      memory range can be marked as one.
>
>    mm: cma: Contiguous Memory Allocator added
>
>      The code CMA code. Manages CMA contexts and performs memory
>      allocations.
>
>    X86: integrate CMA with DMA-mapping subsystem
>    ARM: integrate CMA with dma-mapping subsystem
>
>      Main clients of CMA framework. CMA serves as a alloc_pages()
>      replacement.
>
>    ARM: Samsung: use CMA for 2 memory banks for s5p-mfc device
>
>      Use CMA device private memory regions instead of custom solution
>      based on memblock_reserve() + dma_declare_coherent().
>
>
> Patch summary:
>
> KAMEZAWA Hiroyuki (2):
>    mm: move some functions from memory_hotplug.c to page_isolation.c
>    mm: alloc_contig_freed_pages() added
>
> Marek Szyprowski (4):
>    drivers: add Contiguous Memory Allocator
>    ARM: integrate CMA with DMA-mapping subsystem
>    ARM: Samsung: use CMA for 2 memory banks for s5p-mfc device
>    X86: integrate CMA with DMA-mapping subsystem
>
> Michal Nazarewicz (3):
>    mm: alloc_contig_range() added
>    mm: MIGRATE_CMA migration type added
>    mm: MIGRATE_CMA isolation functions added
>
>   arch/Kconfig                          |    3 +
>   arch/arm/Kconfig                      |    2 +
>   arch/arm/include/asm/dma-contiguous.h |   16 ++
>   arch/arm/include/asm/mach/map.h       |    1 +
>   arch/arm/mm/dma-mapping.c             |  362 +++++++++++++++++++++++++------
>   arch/arm/mm/init.c                    |    8 +
>   arch/arm/mm/mm.h                      |    3 +
>   arch/arm/mm/mmu.c                     |   29 ++-
>   arch/arm/plat-s5p/dev-mfc.c           |   51 +----
>   arch/x86/Kconfig                      |    1 +
>   arch/x86/include/asm/dma-contiguous.h |   13 +
>   arch/x86/include/asm/dma-mapping.h    |    4 +
>   arch/x86/kernel/pci-dma.c             |   18 ++-
>   arch/x86/kernel/pci-nommu.c           |    8 +-
>   arch/x86/kernel/setup.c               |    2 +
>   drivers/base/Kconfig                  |   79 +++++++
>   drivers/base/Makefile                 |    1 +
>   drivers/base/dma-contiguous.c         |  386 +++++++++++++++++++++++++++++++++
>   include/asm-generic/dma-contiguous.h  |   27 +++
>   include/linux/device.h                |    4 +
>   include/linux/dma-contiguous.h        |  106 +++++++++
>   include/linux/mmzone.h                |   57 +++++-
>   include/linux/page-isolation.h        |   53 ++++-
>   mm/Kconfig                            |    8 +-
>   mm/compaction.c                       |   10 +
>   mm/memory_hotplug.c                   |  111 ----------
>   mm/page_alloc.c                       |  317 +++++++++++++++++++++++++--
>   mm/page_isolation.c                   |  131 +++++++++++-
>   28 files changed, 1522 insertions(+), 289 deletions(-)
>   create mode 100644 arch/arm/include/asm/dma-contiguous.h
>   create mode 100644 arch/x86/include/asm/dma-contiguous.h
>   create mode 100644 drivers/base/dma-contiguous.c
>   create mode 100644 include/asm-generic/dma-contiguous.h
>   create mode 100644 include/linux/dma-contiguous.h
>
> --
> 1.7.1.569.g6f426
>
>
> _______________________________________________
> Linaro-mm-sig mailing list
> Linaro-mm-sig@lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/linaro-mm-sig

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 180+ messages in thread

* [Linaro-mm-sig] [PATCHv16 0/9] Contiguous Memory Allocator
@ 2011-10-10 12:07   ` Maxime Coquelin
  0 siblings, 0 replies; 180+ messages in thread
From: Maxime Coquelin @ 2011-10-10 12:07 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/06/2011 03:54 PM, Marek Szyprowski wrote:
> Welcome everyone again,
>
> Once again I decided to post an updated version of the Contiguous Memory
> Allocator patches.
>
> This version provides mainly a bugfix for a very rare issue that might
> have changed migration type of the CMA page blocks resulting in dropping
> CMA features from the affected page block and causing memory allocation
> to fail. Also the issue reported by Dave Hansen has been fixed.
>
> This version also introduces basic support for x86 architecture, what
> allows wide testing on KVM/QEMU emulators and all common x86 boxes. I
> hope this will result in wider testing, comments and easier merging to
> mainline.
>
> I've also dropped an examplary patch for s5p-fimc platform device
> private memory declaration and added the one from real life. CMA device
> private memory regions are defined for s5p-mfc device to let it allocate
> buffers from two memory banks.
>
> ARM integration code has not been changed since last version, it
> provides implementation of all the ideas that has been discussed during

Hello Marek,

     We are currently testing CMA (v16) on Snowball platform.
     This feature is very promising, thanks for pushing it!

     During our stress tests, we encountered some problems :

     1) Contiguous allocation lockup:
         When system RAM is full of Anon pages, if we try to allocate a 
contiguous buffer greater than the min_free value, we face a 
dma_alloc_from_contiguous lockup.
         The expected result would be dma_alloc_from_contiguous() to fail.
         The problem is reproduced systematically on our side.

     2) Contiguous allocation fail:
         We have developed a small driver and a shell script to 
allocate/release contiguous buffers.
         Sometimes, dma_alloc_from_contiguous() fails to allocate the 
contiguous buffer (about once every 30 runs).
         We have 270MB Memory passed to the kernel in our configuration, 
and the CMA pool is 90MB large.
         In this setup, the overall memory is either free or full of 
reclaimable pages.


     For now, we didn't had time to investigate further theses problems.
     Have you already faced this kind of issues?
     Could someone testing CMA on other boards confirm/infirm theses 
problems?

Best regards,
Maxime



> Patches in this patchset:
>
>    mm: move some functions from memory_hotplug.c to page_isolation.c
>    mm: alloc_contig_freed_pages() added
>
>      Code "stolen" from Kamezawa.  The first patch just moves code
>      around and the second provide function for "allocates" already
>      freed memory.
>
>    mm: alloc_contig_range() added
>
>      This is what Kamezawa asked: a function that tries to migrate all
>      pages from given range and then use alloc_contig_freed_pages()
>      (defined by the previous commit) to allocate those pages.
>
>    mm: MIGRATE_CMA migration type added
>    mm: MIGRATE_CMA isolation functions added
>
>      Introduction of the new migratetype and support for it in CMA.
>      MIGRATE_CMA works similar to ZONE_MOVABLE expect almost any
>      memory range can be marked as one.
>
>    mm: cma: Contiguous Memory Allocator added
>
>      The code CMA code. Manages CMA contexts and performs memory
>      allocations.
>
>    X86: integrate CMA with DMA-mapping subsystem
>    ARM: integrate CMA with dma-mapping subsystem
>
>      Main clients of CMA framework. CMA serves as a alloc_pages()
>      replacement.
>
>    ARM: Samsung: use CMA for 2 memory banks for s5p-mfc device
>
>      Use CMA device private memory regions instead of custom solution
>      based on memblock_reserve() + dma_declare_coherent().
>
>
> Patch summary:
>
> KAMEZAWA Hiroyuki (2):
>    mm: move some functions from memory_hotplug.c to page_isolation.c
>    mm: alloc_contig_freed_pages() added
>
> Marek Szyprowski (4):
>    drivers: add Contiguous Memory Allocator
>    ARM: integrate CMA with DMA-mapping subsystem
>    ARM: Samsung: use CMA for 2 memory banks for s5p-mfc device
>    X86: integrate CMA with DMA-mapping subsystem
>
> Michal Nazarewicz (3):
>    mm: alloc_contig_range() added
>    mm: MIGRATE_CMA migration type added
>    mm: MIGRATE_CMA isolation functions added
>
>   arch/Kconfig                          |    3 +
>   arch/arm/Kconfig                      |    2 +
>   arch/arm/include/asm/dma-contiguous.h |   16 ++
>   arch/arm/include/asm/mach/map.h       |    1 +
>   arch/arm/mm/dma-mapping.c             |  362 +++++++++++++++++++++++++------
>   arch/arm/mm/init.c                    |    8 +
>   arch/arm/mm/mm.h                      |    3 +
>   arch/arm/mm/mmu.c                     |   29 ++-
>   arch/arm/plat-s5p/dev-mfc.c           |   51 +----
>   arch/x86/Kconfig                      |    1 +
>   arch/x86/include/asm/dma-contiguous.h |   13 +
>   arch/x86/include/asm/dma-mapping.h    |    4 +
>   arch/x86/kernel/pci-dma.c             |   18 ++-
>   arch/x86/kernel/pci-nommu.c           |    8 +-
>   arch/x86/kernel/setup.c               |    2 +
>   drivers/base/Kconfig                  |   79 +++++++
>   drivers/base/Makefile                 |    1 +
>   drivers/base/dma-contiguous.c         |  386 +++++++++++++++++++++++++++++++++
>   include/asm-generic/dma-contiguous.h  |   27 +++
>   include/linux/device.h                |    4 +
>   include/linux/dma-contiguous.h        |  106 +++++++++
>   include/linux/mmzone.h                |   57 +++++-
>   include/linux/page-isolation.h        |   53 ++++-
>   mm/Kconfig                            |    8 +-
>   mm/compaction.c                       |   10 +
>   mm/memory_hotplug.c                   |  111 ----------
>   mm/page_alloc.c                       |  317 +++++++++++++++++++++++++--
>   mm/page_isolation.c                   |  131 +++++++++++-
>   28 files changed, 1522 insertions(+), 289 deletions(-)
>   create mode 100644 arch/arm/include/asm/dma-contiguous.h
>   create mode 100644 arch/x86/include/asm/dma-contiguous.h
>   create mode 100644 drivers/base/dma-contiguous.c
>   create mode 100644 include/asm-generic/dma-contiguous.h
>   create mode 100644 include/linux/dma-contiguous.h
>
> --
> 1.7.1.569.g6f426
>
>
> _______________________________________________
> Linaro-mm-sig mailing list
> Linaro-mm-sig at lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/linaro-mm-sig
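
For reference, the per-device CMA regions mentioned in the s5p-mfc patch
quoted above boil down to one declaration per memory bank, made from the
board's early ->reserve hook. The sketch below is hypothetical: the
device symbols are placeholders (not the real s5p platform devices), and
the dma_declare_contiguous() signature is the one used by the
drivers/base/dma-contiguous.c code as merged into mainline, so details
may differ in v16.

#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/platform_device.h>
#include <linux/dma-contiguous.h>

/* Placeholders standing in for the two s5p-mfc memory-bank devices */
extern struct platform_device example_mfc_left_dev;
extern struct platform_device example_mfc_right_dev;

void __init example_mfc_reserve_cma(void)
{
	/*
	 * One private 8 MiB CMA area per bank; a base/limit of 0 lets
	 * the allocator pick the placement.
	 */
	if (dma_declare_contiguous(&example_mfc_left_dev.dev, 8 << 20, 0, 0))
		pr_warn("mfc-l: CMA region reservation failed\n");
	if (dma_declare_contiguous(&example_mfc_right_dev.dev, 8 << 20, 0, 0))
		pr_warn("mfc-r: CMA region reservation failed\n");
}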

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCHv16 0/9] Contiguous Memory Allocator
  2011-10-07 16:27   ` Arnd Bergmann
  (?)
@ 2011-10-10 22:56     ` Andrew Morton
  -1 siblings, 0 replies; 180+ messages in thread
From: Andrew Morton @ 2011-10-10 22:56 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Marek Szyprowski, linux-kernel, linux-arm-kernel, linux-media,
	linux-mm, linaro-mm-sig, Michal Nazarewicz, Kyungmin Park,
	Russell King, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen

On Fri, 7 Oct 2011 18:27:06 +0200
Arnd Bergmann <arnd@arndb.de> wrote:

> On Thursday 06 October 2011, Marek Szyprowski wrote:
> > Once again I decided to post an updated version of the Contiguous Memory
> > Allocator patches.
> > 
> > This version provides mainly a bugfix for a very rare issue that might
> > have changed migration type of the CMA page blocks resulting in dropping
> > CMA features from the affected page block and causing memory allocation
> > to fail. Also the issue reported by Dave Hansen has been fixed.
> > 
> > This version also introduces basic support for x86 architecture, what
> > allows wide testing on KVM/QEMU emulators and all common x86 boxes. I
> > hope this will result in wider testing, comments and easier merging to
> > mainline.
> 
> Hi Marek,
> 
> I think we need to finally get this into linux-next now, to get some
> broader testing. Having the x86 patch definitely helps here becauses
> it potentially exposes the code to many more testers.
> 
> IMHO it would be good to merge the entire series into 3.2, since
> the ARM portion fixes an important bug (double mapping of memory
> ranges with conflicting attributes) that we've lived with for far
> too long, but it really depends on how everyone sees the risk
> for regressions here. If something breaks in unfixable ways before
> the 3.2 release, we can always revert the patches and have another
> try later.
> 
> It's also not clear how we should merge it. Ideally the first bunch
> would go through linux-mm, and the architecture specific patches
> through the respective architecture trees, but there is an obvious
> inderdependency between these sets.
> 
> Russell, Andrew, are you both comfortable with putting the entire
> set into linux-mm to solve this? Do you see this as 3.2 or rather
> as 3.3 material?
> 

Russell's going to hate me, but...

I do know that he had substantial objections to at least earlier
versions of this, and he is a guy who knows of what he speaks.

So I would want to get a nod from rmk on this work before proceeding. 
If that nod isn't available then let's please identify the issues and
see what we can do about them.


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCHv16 0/9] Contiguous Memory Allocator
@ 2011-10-10 22:56     ` Andrew Morton
  0 siblings, 0 replies; 180+ messages in thread
From: Andrew Morton @ 2011-10-10 22:56 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Marek Szyprowski, linux-kernel, linux-arm-kernel, linux-media,
	linux-mm, linaro-mm-sig, Michal Nazarewicz, Kyungmin Park,
	Russell King, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen

On Fri, 7 Oct 2011 18:27:06 +0200
Arnd Bergmann <arnd@arndb.de> wrote:

> On Thursday 06 October 2011, Marek Szyprowski wrote:
> > Once again I decided to post an updated version of the Contiguous Memory
> > Allocator patches.
> > 
> > This version provides mainly a bugfix for a very rare issue that might
> > have changed migration type of the CMA page blocks resulting in dropping
> > CMA features from the affected page block and causing memory allocation
> > to fail. Also the issue reported by Dave Hansen has been fixed.
> > 
> > This version also introduces basic support for x86 architecture, what
> > allows wide testing on KVM/QEMU emulators and all common x86 boxes. I
> > hope this will result in wider testing, comments and easier merging to
> > mainline.
> 
> Hi Marek,
> 
> I think we need to finally get this into linux-next now, to get some
> broader testing. Having the x86 patch definitely helps here becauses
> it potentially exposes the code to many more testers.
> 
> IMHO it would be good to merge the entire series into 3.2, since
> the ARM portion fixes an important bug (double mapping of memory
> ranges with conflicting attributes) that we've lived with for far
> too long, but it really depends on how everyone sees the risk
> for regressions here. If something breaks in unfixable ways before
> the 3.2 release, we can always revert the patches and have another
> try later.
> 
> It's also not clear how we should merge it. Ideally the first bunch
> would go through linux-mm, and the architecture specific patches
> through the respective architecture trees, but there is an obvious
> inderdependency between these sets.
> 
> Russell, Andrew, are you both comfortable with putting the entire
> set into linux-mm to solve this? Do you see this as 3.2 or rather
> as 3.3 material?
> 

Russell's going to hate me, but...

I do know that he had substantial objections to at least earlier
versions of this, and he is a guy who knows of what he speaks.

So I would want to get a nod from rmk on this work before proceeding. 
If that nod isn't available then let's please identify the issues and
see what we can do about them.


^ permalink raw reply	[flat|nested] 180+ messages in thread

* [PATCHv16 0/9] Contiguous Memory Allocator
@ 2011-10-10 22:56     ` Andrew Morton
  0 siblings, 0 replies; 180+ messages in thread
From: Andrew Morton @ 2011-10-10 22:56 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 7 Oct 2011 18:27:06 +0200
Arnd Bergmann <arnd@arndb.de> wrote:

> On Thursday 06 October 2011, Marek Szyprowski wrote:
> > Once again I decided to post an updated version of the Contiguous Memory
> > Allocator patches.
> > 
> > This version provides mainly a bugfix for a very rare issue that might
> > have changed migration type of the CMA page blocks resulting in dropping
> > CMA features from the affected page block and causing memory allocation
> > to fail. Also the issue reported by Dave Hansen has been fixed.
> > 
> > This version also introduces basic support for x86 architecture, what
> > allows wide testing on KVM/QEMU emulators and all common x86 boxes. I
> > hope this will result in wider testing, comments and easier merging to
> > mainline.
> 
> Hi Marek,
> 
> I think we need to finally get this into linux-next now, to get some
> broader testing. Having the x86 patch definitely helps here becauses
> it potentially exposes the code to many more testers.
> 
> IMHO it would be good to merge the entire series into 3.2, since
> the ARM portion fixes an important bug (double mapping of memory
> ranges with conflicting attributes) that we've lived with for far
> too long, but it really depends on how everyone sees the risk
> for regressions here. If something breaks in unfixable ways before
> the 3.2 release, we can always revert the patches and have another
> try later.
> 
> It's also not clear how we should merge it. Ideally the first bunch
> would go through linux-mm, and the architecture specific patches
> through the respective architecture trees, but there is an obvious
> inderdependency between these sets.
> 
> Russell, Andrew, are you both comfortable with putting the entire
> set into linux-mm to solve this? Do you see this as 3.2 or rather
> as 3.3 material?
> 

Russell's going to hate me, but...

I do know that he had substantial objections to at least earlier
versions of this, and he is a guy who knows of what he speaks.

So I would want to get a nod from rmk on this work before proceeding. 
If that nod isn't available then let's please identify the issues and
see what we can do about them.

^ permalink raw reply	[flat|nested] 180+ messages in thread

* RE: [PATCHv16 0/9] Contiguous Memory Allocator
  2011-10-10 22:56     ` Andrew Morton
  (?)
@ 2011-10-11  6:57       ` Marek Szyprowski
  -1 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-11  6:57 UTC (permalink / raw)
  To: 'Andrew Morton', 'Arnd Bergmann'
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Michal Nazarewicz',
	'Kyungmin Park', 'Russell King',
	'KAMEZAWA Hiroyuki', 'Ankita Garg',
	'Daniel Walker', 'Mel Gorman',
	'Jesse Barker', 'Jonathan Corbet',
	'Shariq Hasnain', 'Chunsang Jeong',
	'Dave Hansen'

Hello,

On Tuesday, October 11, 2011 12:57 AM Andrew Morton wrote:

> On Fri, 7 Oct 2011 18:27:06 +0200 Arnd Bergmann <arnd@arndb.de> wrote:
> 
> > On Thursday 06 October 2011, Marek Szyprowski wrote:
> > > Once again I decided to post an updated version of the Contiguous Memory
> > > Allocator patches.
> > >
> > > This version provides mainly a bugfix for a very rare issue that might
> > > have changed migration type of the CMA page blocks resulting in dropping
> > > CMA features from the affected page block and causing memory allocation
> > > to fail. Also the issue reported by Dave Hansen has been fixed.
> > >
> > > This version also introduces basic support for x86 architecture, what
> > > allows wide testing on KVM/QEMU emulators and all common x86 boxes. I
> > > hope this will result in wider testing, comments and easier merging to
> > > mainline.
> >
> > Hi Marek,
> >
> > I think we need to finally get this into linux-next now, to get some
> > broader testing. Having the x86 patch definitely helps here becauses
> > it potentially exposes the code to many more testers.
> >
> > IMHO it would be good to merge the entire series into 3.2, since
> > the ARM portion fixes an important bug (double mapping of memory
> > ranges with conflicting attributes) that we've lived with for far
> > too long, but it really depends on how everyone sees the risk
> > for regressions here. If something breaks in unfixable ways before
> > the 3.2 release, we can always revert the patches and have another
> > try later.
> >
> > It's also not clear how we should merge it. Ideally the first bunch
> > would go through linux-mm, and the architecture specific patches
> > through the respective architecture trees, but there is an obvious
> > inderdependency between these sets.
> >
> > Russell, Andrew, are you both comfortable with putting the entire
> > set into linux-mm to solve this? Do you see this as 3.2 or rather
> > as 3.3 material?
> >
> 
> Russell's going to hate me, but...
> 
> I do know that he had substantial objections to at least earlier
> versions of this, and he is a guy who knows of what he speaks.

I've done my best to fix these issues. I'm still waiting for comments...

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center


^ permalink raw reply	[flat|nested] 180+ messages in thread

* RE: [PATCHv16 0/9] Contiguous Memory Allocator
@ 2011-10-11  6:57       ` Marek Szyprowski
  0 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-11  6:57 UTC (permalink / raw)
  To: 'Andrew Morton', 'Arnd Bergmann'
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Michal Nazarewicz',
	'Kyungmin Park', 'Russell King',
	'KAMEZAWA Hiroyuki', 'Ankita Garg',
	'Daniel Walker', 'Mel Gorman',
	'Jesse Barker', 'Jonathan Corbet',
	'Shariq Hasnain', 'Chunsang Jeong',
	'Dave Hansen'

Hello,

On Tuesday, October 11, 2011 12:57 AM Andrew Morton wrote:

> On Fri, 7 Oct 2011 18:27:06 +0200 Arnd Bergmann <arnd@arndb.de> wrote:
> 
> > On Thursday 06 October 2011, Marek Szyprowski wrote:
> > > Once again I decided to post an updated version of the Contiguous Memory
> > > Allocator patches.
> > >
> > > This version provides mainly a bugfix for a very rare issue that might
> > > have changed migration type of the CMA page blocks resulting in dropping
> > > CMA features from the affected page block and causing memory allocation
> > > to fail. Also the issue reported by Dave Hansen has been fixed.
> > >
> > > This version also introduces basic support for x86 architecture, what
> > > allows wide testing on KVM/QEMU emulators and all common x86 boxes. I
> > > hope this will result in wider testing, comments and easier merging to
> > > mainline.
> >
> > Hi Marek,
> >
> > I think we need to finally get this into linux-next now, to get some
> > broader testing. Having the x86 patch definitely helps here becauses
> > it potentially exposes the code to many more testers.
> >
> > IMHO it would be good to merge the entire series into 3.2, since
> > the ARM portion fixes an important bug (double mapping of memory
> > ranges with conflicting attributes) that we've lived with for far
> > too long, but it really depends on how everyone sees the risk
> > for regressions here. If something breaks in unfixable ways before
> > the 3.2 release, we can always revert the patches and have another
> > try later.
> >
> > It's also not clear how we should merge it. Ideally the first bunch
> > would go through linux-mm, and the architecture specific patches
> > through the respective architecture trees, but there is an obvious
> > inderdependency between these sets.
> >
> > Russell, Andrew, are you both comfortable with putting the entire
> > set into linux-mm to solve this? Do you see this as 3.2 or rather
> > as 3.3 material?
> >
> 
> Russell's going to hate me, but...
> 
> I do know that he had substantial objections to at least earlier
> versions of this, and he is a guy who knows of what he speaks.

I've done my best to fix these issues. I'm still waiting for comments...

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center


^ permalink raw reply	[flat|nested] 180+ messages in thread

* [PATCHv16 0/9] Contiguous Memory Allocator
@ 2011-10-11  6:57       ` Marek Szyprowski
  0 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-11  6:57 UTC (permalink / raw)
  To: linux-arm-kernel

Hello,

On Tuesday, October 11, 2011 12:57 AM Andrew Morton wrote:

> On Fri, 7 Oct 2011 18:27:06 +0200 Arnd Bergmann <arnd@arndb.de> wrote:
> 
> > On Thursday 06 October 2011, Marek Szyprowski wrote:
> > > Once again I decided to post an updated version of the Contiguous Memory
> > > Allocator patches.
> > >
> > > This version provides mainly a bugfix for a very rare issue that might
> > > have changed migration type of the CMA page blocks resulting in dropping
> > > CMA features from the affected page block and causing memory allocation
> > > to fail. Also the issue reported by Dave Hansen has been fixed.
> > >
> > > This version also introduces basic support for x86 architecture, what
> > > allows wide testing on KVM/QEMU emulators and all common x86 boxes. I
> > > hope this will result in wider testing, comments and easier merging to
> > > mainline.
> >
> > Hi Marek,
> >
> > I think we need to finally get this into linux-next now, to get some
> > broader testing. Having the x86 patch definitely helps here becauses
> > it potentially exposes the code to many more testers.
> >
> > IMHO it would be good to merge the entire series into 3.2, since
> > the ARM portion fixes an important bug (double mapping of memory
> > ranges with conflicting attributes) that we've lived with for far
> > too long, but it really depends on how everyone sees the risk
> > for regressions here. If something breaks in unfixable ways before
> > the 3.2 release, we can always revert the patches and have another
> > try later.
> >
> > It's also not clear how we should merge it. Ideally the first bunch
> > would go through linux-mm, and the architecture specific patches
> > through the respective architecture trees, but there is an obvious
> > inderdependency between these sets.
> >
> > Russell, Andrew, are you both comfortable with putting the entire
> > set into linux-mm to solve this? Do you see this as 3.2 or rather
> > as 3.3 material?
> >
> 
> Russell's going to hate me, but...
> 
> I do know that he had substantial objections to at least earlier
> versions of this, and he is a guy who knows of what he speaks.

I've done my best to fix these issues. I'm still waiting for comments...

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center

^ permalink raw reply	[flat|nested] 180+ messages in thread

* RE: [Linaro-mm-sig] [PATCHv16 0/9] Contiguous Memory Allocator
  2011-10-10 12:07   ` Maxime Coquelin
  (?)
@ 2011-10-11  7:17     ` Marek Szyprowski
  -1 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-11  7:17 UTC (permalink / raw)
  To: 'Maxime Coquelin'
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Daniel Walker', 'Russell King',
	'Arnd Bergmann', 'Jonathan Corbet',
	'Mel Gorman', 'Chunsang Jeong',
	'Michal Nazarewicz', 'Dave Hansen',
	'Jesse Barker', 'Kyungmin Park',
	'Ankita Garg', 'Andrew Morton',
	'KAMEZAWA Hiroyuki',
	benjamin.gaignard, 'frq09524',
	vincent.guittot


Hello,

On Monday, October 10, 2011 2:08 PM Maxime Coquelin wrote:

> On 10/06/2011 03:54 PM, Marek Szyprowski wrote:
> > Welcome everyone again,
> >
> > Once again I decided to post an updated version of the Contiguous Memory
> > Allocator patches.
> >
> > This version provides mainly a bugfix for a very rare issue that might
> > have changed migration type of the CMA page blocks resulting in dropping
> > CMA features from the affected page block and causing memory allocation
> > to fail. Also the issue reported by Dave Hansen has been fixed.
> >
> > This version also introduces basic support for x86 architecture, what
> > allows wide testing on KVM/QEMU emulators and all common x86 boxes. I
> > hope this will result in wider testing, comments and easier merging to
> > mainline.
> >
> > I've also dropped an examplary patch for s5p-fimc platform device
> > private memory declaration and added the one from real life. CMA device
> > private memory regions are defined for s5p-mfc device to let it allocate
> > buffers from two memory banks.
> >
> > ARM integration code has not been changed since last version, it
> > provides implementation of all the ideas that has been discussed during
> 
> Hello Marek,
> 
>      We are currently testing CMA (v16) on Snowball platform.
>      This feature is very promising, thanks for pushing it!
> 
>      During our stress tests, we encountered some problems :
> 
>      1) Contiguous allocation lockup:
>          When system RAM is full of Anon pages, if we try to allocate a
> contiguous buffer greater than the min_free value, we face a
> dma_alloc_from_contiguous lockup.
>          The expected result would be dma_alloc_from_contiguous() to fail.
>          The problem is reproduced systematically on our side.

Thanks for the report. Do you use Android's lowmemorykiller? I haven't 
tested CMA on an Android kernel yet. I have no idea how it will interfere 
with Android patches.

> 
>      2) Contiguous allocation fail:
>          We have developed a small driver and a shell script to
> allocate/release contiguous buffers.
>          Sometimes, dma_alloc_from_contiguous() fails to allocate the
> contiguous buffer (about once every 30 runs).
>          We have 270MB Memory passed to the kernel in our configuration,
> and the CMA pool is 90MB large.
>          In this setup, the overall memory is either free or full of
> reclaimable pages.

Yeah. We also did such stress tests recently and faced this issue. I've
spent some time investigating it but I have no solution yet. 

The problem is caused by a page that is placed in the CMA area. This page 
is movable, but its address space provides no 'migratepage' method. In 
such a case the mm subsystem uses the fallback_migrate_page() function. 
Sadly, this function only returns -EAGAIN. The migration code loops over 
it a few times and fails, causing the allocation procedure to fail.

We are now investigating which kernel code created/allocated such 
problematic pages and how to add real migration support for them.
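
As an illustration only (this is not part of the patch set): making such
pages migratable typically comes down to giving their address_space an
explicit 'migratepage' callback. For pages that carry no private state,
the generic migrate_page() helper is sufficient; a hypothetical sketch:

#include <linux/fs.h>
#include <linux/migrate.h>

static const struct address_space_operations example_movable_aops = {
	/* Generic helper: enough when pages carry no private/buffer state */
	.migratepage	= migrate_page,
	/* ... plus the driver's usual readpage/writepage/etc. callbacks */
};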

>      For now, we didn't had time to investigate further theses problems.
>      Have you already faced this kind of issues?
>      Could someone testing CMA on other boards confirm/infirm theses
> problems?

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center


^ permalink raw reply	[flat|nested] 180+ messages in thread

* RE: [Linaro-mm-sig] [PATCHv16 0/9] Contiguous Memory Allocator
@ 2011-10-11  7:17     ` Marek Szyprowski
  0 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-11  7:17 UTC (permalink / raw)
  To: 'Maxime Coquelin'
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Daniel Walker', 'Russell King',
	'Arnd Bergmann', 'Jonathan Corbet',
	'Mel Gorman', 'Chunsang Jeong',
	'Michal Nazarewicz', 'Dave Hansen',
	'Jesse Barker', 'Kyungmin Park',
	'Ankita Garg', 'Andrew Morton',
	'KAMEZAWA Hiroyuki',
	benjamin.gaignard, 'frq09524',
	vincent.guittot


Hello,

On Monday, October 10, 2011 2:08 PM Maxime Coquelin wrote:

> On 10/06/2011 03:54 PM, Marek Szyprowski wrote:
> > Welcome everyone again,
> >
> > Once again I decided to post an updated version of the Contiguous Memory
> > Allocator patches.
> >
> > This version provides mainly a bugfix for a very rare issue that might
> > have changed migration type of the CMA page blocks resulting in dropping
> > CMA features from the affected page block and causing memory allocation
> > to fail. Also the issue reported by Dave Hansen has been fixed.
> >
> > This version also introduces basic support for x86 architecture, what
> > allows wide testing on KVM/QEMU emulators and all common x86 boxes. I
> > hope this will result in wider testing, comments and easier merging to
> > mainline.
> >
> > I've also dropped an examplary patch for s5p-fimc platform device
> > private memory declaration and added the one from real life. CMA device
> > private memory regions are defined for s5p-mfc device to let it allocate
> > buffers from two memory banks.
> >
> > ARM integration code has not been changed since last version, it
> > provides implementation of all the ideas that has been discussed during
> 
> Hello Marek,
> 
>      We are currently testing CMA (v16) on Snowball platform.
>      This feature is very promising, thanks for pushing it!
> 
>      During our stress tests, we encountered some problems :
> 
>      1) Contiguous allocation lockup:
>          When system RAM is full of Anon pages, if we try to allocate a
> contiguous buffer greater than the min_free value, we face a
> dma_alloc_from_contiguous lockup.
>          The expected result would be dma_alloc_from_contiguous() to fail.
>          The problem is reproduced systematically on our side.

Thanks for the report. Do you use Android's lowmemorykiller? I haven't 
tested CMA on an Android kernel yet. I have no idea how it will interfere 
with Android patches.

> 
>      2) Contiguous allocation fail:
>          We have developed a small driver and a shell script to
> allocate/release contiguous buffers.
>          Sometimes, dma_alloc_from_contiguous() fails to allocate the
> contiguous buffer (about once every 30 runs).
>          We have 270MB Memory passed to the kernel in our configuration,
> and the CMA pool is 90MB large.
>          In this setup, the overall memory is either free or full of
> reclaimable pages.

Yeah. We also did such stress tests recently and faced this issue. I've
spent some time investigating it but I have no solution yet. 

The problem is caused by a page that is placed in the CMA area. This page 
is movable, but its address space provides no 'migratepage' method. In 
such a case the mm subsystem uses the fallback_migrate_page() function. 
Sadly, this function only returns -EAGAIN. The migration code loops over 
it a few times and fails, causing the allocation procedure to fail.

We are now investigating which kernel code created/allocated such 
problematic pages and how to add real migration support for them.

>      For now, we didn't had time to investigate further theses problems.
>      Have you already faced this kind of issues?
>      Could someone testing CMA on other boards confirm/infirm theses
> problems?

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center


^ permalink raw reply	[flat|nested] 180+ messages in thread

* [Linaro-mm-sig] [PATCHv16 0/9] Contiguous Memory Allocator
@ 2011-10-11  7:17     ` Marek Szyprowski
  0 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-11  7:17 UTC (permalink / raw)
  To: linux-arm-kernel


Hello,

On Monday, October 10, 2011 2:08 PM Maxime Coquelin wrote:

> On 10/06/2011 03:54 PM, Marek Szyprowski wrote:
> > Welcome everyone again,
> >
> > Once again I decided to post an updated version of the Contiguous Memory
> > Allocator patches.
> >
> > This version provides mainly a bugfix for a very rare issue that might
> > have changed migration type of the CMA page blocks resulting in dropping
> > CMA features from the affected page block and causing memory allocation
> > to fail. Also the issue reported by Dave Hansen has been fixed.
> >
> > This version also introduces basic support for x86 architecture, what
> > allows wide testing on KVM/QEMU emulators and all common x86 boxes. I
> > hope this will result in wider testing, comments and easier merging to
> > mainline.
> >
> > I've also dropped an examplary patch for s5p-fimc platform device
> > private memory declaration and added the one from real life. CMA device
> > private memory regions are defined for s5p-mfc device to let it allocate
> > buffers from two memory banks.
> >
> > ARM integration code has not been changed since last version, it
> > provides implementation of all the ideas that has been discussed during
> 
> Hello Marek,
> 
>      We are currently testing CMA (v16) on Snowball platform.
>      This feature is very promising, thanks for pushing it!
> 
>      During our stress tests, we encountered some problems :
> 
>      1) Contiguous allocation lockup:
>          When system RAM is full of Anon pages, if we try to allocate a
> contiguous buffer greater than the min_free value, we face a
> dma_alloc_from_contiguous lockup.
>          The expected result would be dma_alloc_from_contiguous() to fail.
>          The problem is reproduced systematically on our side.

Thanks for the report. Do you use Android's lowmemorykiller? I haven't 
tested CMA on an Android kernel yet. I have no idea how it will interfere 
with Android patches.

> 
>      2) Contiguous allocation fail:
>          We have developed a small driver and a shell script to
> allocate/release contiguous buffers.
>          Sometimes, dma_alloc_from_contiguous() fails to allocate the
> contiguous buffer (about once every 30 runs).
>          We have 270MB Memory passed to the kernel in our configuration,
> and the CMA pool is 90MB large.
>          In this setup, the overall memory is either free or full of
> reclaimable pages.

Yeah. We also did such stress tests recently and faced this issue. I've
spent some time investigating it but I have no solution yet. 

The problem is caused by a page that is placed in the CMA area. This page 
is movable, but its address space provides no 'migratepage' method. In 
such a case the mm subsystem uses the fallback_migrate_page() function. 
Sadly, this function only returns -EAGAIN. The migration code loops over 
it a few times and fails, causing the allocation procedure to fail.

We are now investigating which kernel code created/allocated such 
problematic pages and how to add real migration support for them.

>      For now, we didn't had time to investigate further theses problems.
>      Have you already faced this kind of issues?
>      Could someone testing CMA on other boards confirm/infirm theses
> problems?

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Linaro-mm-sig] [PATCHv16 0/9] Contiguous Memory Allocator
  2011-10-11  7:17     ` Marek Szyprowski
@ 2011-10-11  7:30       ` Maxime Coquelin
  -1 siblings, 0 replies; 180+ messages in thread
From: Maxime Coquelin @ 2011-10-11  7:30 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Daniel Walker', 'Russell King',
	'Arnd Bergmann', 'Jonathan Corbet',
	'Mel Gorman', 'Chunsang Jeong',
	'Michal Nazarewicz', 'Dave Hansen',
	'Jesse Barker', 'Kyungmin Park',
	'Ankita Garg', 'Andrew Morton',
	'KAMEZAWA Hiroyuki',
	benjamin.gaignard, Ludovic BARRE, vincent.guittot

On 10/11/2011 09:17 AM, Marek Szyprowski wrote:
> On Monday, October 10, 2011 2:08 PM Maxime Coquelin wrote:
>
>       During our stress tests, we encountered some problems :
>
>       1) Contiguous allocation lockup:
>           When system RAM is full of Anon pages, if we try to allocate a
> contiguous buffer greater than the min_free value, we face a
> dma_alloc_from_contiguous lockup.
>           The expected result would be dma_alloc_from_contiguous() to fail.
>           The problem is reproduced systematically on our side.
> Thanks for the report. Do you use Android's lowmemorykiller? I haven't
> tested CMA on Android kernel yet. I have no idea how it will interfere
> with Android patches.
>

The software used for this test (v16) is a generic 3.0 kernel and a 
minimal filesystem using Busybox.

With the v15 patchset, I also tested it with Android.
IIRC, sometimes the lowmemorykiller succeeded in freeing space and the 
contiguous allocation succeeded; sometimes we faced the lockup.

>>       2) Contiguous allocation fail:
>>           We have developed a small driver and a shell script to
>> allocate/release contiguous buffers.
>>           Sometimes, dma_alloc_from_contiguous() fails to allocate the
>> contiguous buffer (about once every 30 runs).
>>           We have 270MB Memory passed to the kernel in our configuration,
>> and the CMA pool is 90MB large.
>>           In this setup, the overall memory is either free or full of
>> reclaimable pages.
> Yeah. We also did such stress tests recently and faced this issue. I've
> spent some time investigating it but I have no solution yet.
>
> The problem is caused by a page, which is put in the CMA area. This page
> is movable, but it's address space provides no 'migratepage' method. In
> such case mm subsystem uses fallback_migrate_page() function. Sadly this
> function only returns -EAGAIN. The migration loops a few times over it
> and fails causing the fail in the allocation procedure.
>
> We are investing now which kernel code created/allocated such problematic
> pages and how to add real migration support for them.
>

Ok, thanks for pointing this out.

Regards,
Maxime



^ permalink raw reply	[flat|nested] 180+ messages in thread

* [Linaro-mm-sig] [PATCHv16 0/9] Contiguous Memory Allocator
@ 2011-10-11  7:30       ` Maxime Coquelin
  0 siblings, 0 replies; 180+ messages in thread
From: Maxime Coquelin @ 2011-10-11  7:30 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/11/2011 09:17 AM, Marek Szyprowski wrote:
> On Monday, October 10, 2011 2:08 PM Maxime Coquelin wrote:
>
>       During our stress tests, we encountered some problems :
>
>       1) Contiguous allocation lockup:
>           When system RAM is full of Anon pages, if we try to allocate a
> contiguous buffer greater than the min_free value, we face a
> dma_alloc_from_contiguous lockup.
>           The expected result would be dma_alloc_from_contiguous() to fail.
>           The problem is reproduced systematically on our side.
> Thanks for the report. Do you use Android's lowmemorykiller? I haven't
> tested CMA on Android kernel yet. I have no idea how it will interfere
> with Android patches.
>

The software used for this test (v16) is a generic 3.0 kernel and a 
minimal filesystem using Busybox.

With the v15 patchset, I also tested it with Android.
IIRC, sometimes the lowmemorykiller succeeded in freeing space and the 
contiguous allocation succeeded; sometimes we faced the lockup.

>>       2) Contiguous allocation fail:
>>           We have developed a small driver and a shell script to
>> allocate/release contiguous buffers.
>>           Sometimes, dma_alloc_from_contiguous() fails to allocate the
>> contiguous buffer (about once every 30 runs).
>>           We have 270MB Memory passed to the kernel in our configuration,
>> and the CMA pool is 90MB large.
>>           In this setup, the overall memory is either free or full of
>> reclaimable pages.
> Yeah. We also did such stress tests recently and faced this issue. I've
> spent some time investigating it but I have no solution yet.
>
> The problem is caused by a page, which is put in the CMA area. This page
> is movable, but it's address space provides no 'migratepage' method. In
> such case mm subsystem uses fallback_migrate_page() function. Sadly this
> function only returns -EAGAIN. The migration loops a few times over it
> and fails causing the fail in the allocation procedure.
>
> We are investing now which kernel code created/allocated such problematic
> pages and how to add real migration support for them.
>

Ok, thanks for pointing this out.

Regards,
Maxime

^ permalink raw reply	[flat|nested] 180+ messages in thread

* RE: [Linaro-mm-sig] [PATCHv16 0/9] Contiguous Memory Allocator
  2011-10-11  7:30       ` Maxime Coquelin
  (?)
@ 2011-10-11 10:50         ` Marek Szyprowski
  -1 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-11 10:50 UTC (permalink / raw)
  To: 'Maxime Coquelin'
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Daniel Walker', 'Russell King',
	'Arnd Bergmann', 'Jonathan Corbet',
	'Mel Gorman', 'Chunsang Jeong',
	'Michal Nazarewicz', 'Dave Hansen',
	'Jesse Barker', 'Kyungmin Park',
	'Ankita Garg', 'Andrew Morton',
	'KAMEZAWA Hiroyuki',
	benjamin.gaignard, 'Ludovic BARRE',
	vincent.guittot

Hello,

On Tuesday, October 11, 2011 9:30 AM Maxime Coquelin wrote:

> On 10/11/2011 09:17 AM, Marek Szyprowski wrote:
> > On Monday, October 10, 2011 2:08 PM Maxime Coquelin wrote:
> >
> >       During our stress tests, we encountered some problems :
> >
> >       1) Contiguous allocation lockup:
> >           When system RAM is full of Anon pages, if we try to allocate a
> > contiguous buffer greater than the min_free value, we face a
> > dma_alloc_from_contiguous lockup.
> >           The expected result would be dma_alloc_from_contiguous() to fail.
> >           The problem is reproduced systematically on our side.
> > Thanks for the report. Do you use Android's lowmemorykiller? I haven't
> > tested CMA on Android kernel yet. I have no idea how it will interfere
> > with Android patches.
> >
> 
> The software used for this test (v16) is a generic 3.0 Kernel and a
> minimal filesystem using Busybox.

I'm really surprised. Could you elaborate a bit on how to trigger this issue?
I've done several tests and never got a lockup. Allocation failed from time
to time, though.

> With v15 patchset, I also tested it with Android.
> IIRC, sometimes the lowmemorykiller succeed to get free space and the
> contiguous allocation succeed, sometimes we faced  the lockup.
> 
> >>       2) Contiguous allocation fail:
> >>           We have developed a small driver and a shell script to
> >> allocate/release contiguous buffers.
> >>           Sometimes, dma_alloc_from_contiguous() fails to allocate the
> >> contiguous buffer (about once every 30 runs).
> >>           We have 270MB Memory passed to the kernel in our configuration,
> >> and the CMA pool is 90MB large.
> >>           In this setup, the overall memory is either free or full of
> >> reclaimable pages.
> > Yeah. We also did such stress tests recently and faced this issue. I've
> > spent some time investigating it but I have no solution yet.
> >
> > The problem is caused by a page, which is put in the CMA area. This page
> > is movable, but it's address space provides no 'migratepage' method. In
> > such case mm subsystem uses fallback_migrate_page() function. Sadly this
> > function only returns -EAGAIN. The migration loops a few times over it
> > and fails causing the fail in the allocation procedure.
> >
> > We are investing now which kernel code created/allocated such problematic

s/investing/investigating

> > pages and how to add real migration support for them.
> >
> 
> Ok, thanks for pointing this out.

We found this issue very recently. I'm still surprised that we did not notice 
it during system testing.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center


^ permalink raw reply	[flat|nested] 180+ messages in thread

* RE: [Linaro-mm-sig] [PATCHv16 0/9] Contiguous Memory Allocator
@ 2011-10-11 10:50         ` Marek Szyprowski
  0 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-11 10:50 UTC (permalink / raw)
  To: 'Maxime Coquelin'
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Daniel Walker', 'Russell King',
	'Arnd Bergmann', 'Jonathan Corbet',
	'Mel Gorman', 'Chunsang Jeong',
	'Michal Nazarewicz', 'Dave Hansen',
	'Jesse Barker', 'Kyungmin Park',
	'Ankita Garg', 'Andrew Morton',
	'KAMEZAWA Hiroyuki',
	benjamin.gaignard, 'Ludovic BARRE',
	vincent.guittot

Hello,

On Tuesday, October 11, 2011 9:30 AM Maxime Coquelin wrote:

> On 10/11/2011 09:17 AM, Marek Szyprowski wrote:
> > On Monday, October 10, 2011 2:08 PM Maxime Coquelin wrote:
> >
> >       During our stress tests, we encountered some problems :
> >
> >       1) Contiguous allocation lockup:
> >           When system RAM is full of Anon pages, if we try to allocate a
> > contiguous buffer greater than the min_free value, we face a
> > dma_alloc_from_contiguous lockup.
> >           The expected result would be dma_alloc_from_contiguous() to fail.
> >           The problem is reproduced systematically on our side.
> > Thanks for the report. Do you use Android's lowmemorykiller? I haven't
> > tested CMA on Android kernel yet. I have no idea how it will interfere
> > with Android patches.
> >
> 
> The software used for this test (v16) is a generic 3.0 Kernel and a
> minimal filesystem using Busybox.

I'm really surprised. Could you elaborate a bit on how to trigger this issue?
I've done several tests and never got a lockup. Allocation failed from time
to time, though.

> With v15 patchset, I also tested it with Android.
> IIRC, sometimes the lowmemorykiller succeed to get free space and the
> contiguous allocation succeed, sometimes we faced  the lockup.
> 
> >>       2) Contiguous allocation fail:
> >>           We have developed a small driver and a shell script to
> >> allocate/release contiguous buffers.
> >>           Sometimes, dma_alloc_from_contiguous() fails to allocate the
> >> contiguous buffer (about once every 30 runs).
> >>           We have 270MB Memory passed to the kernel in our configuration,
> >> and the CMA pool is 90MB large.
> >>           In this setup, the overall memory is either free or full of
> >> reclaimable pages.
> > Yeah. We also did such stress tests recently and faced this issue. I've
> > spent some time investigating it but I have no solution yet.
> >
> > The problem is caused by a page, which is put in the CMA area. This page
> > is movable, but it's address space provides no 'migratepage' method. In
> > such case mm subsystem uses fallback_migrate_page() function. Sadly this
> > function only returns -EAGAIN. The migration loops a few times over it
> > and fails causing the fail in the allocation procedure.
> >
> > We are investing now which kernel code created/allocated such problematic

s/investing/investigating

> > pages and how to add real migration support for them.
> >
> 
> Ok, thanks for pointing this out.

We found this issue very recently. I'm still surprised that we did not notice 
it during system testing.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center


^ permalink raw reply	[flat|nested] 180+ messages in thread

* [Linaro-mm-sig] [PATCHv16 0/9] Contiguous Memory Allocator
@ 2011-10-11 10:50         ` Marek Szyprowski
  0 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-11 10:50 UTC (permalink / raw)
  To: linux-arm-kernel

Hello,

On Tuesday, October 11, 2011 9:30 AM Maxime Coquelin wrote:

> On 10/11/2011 09:17 AM, Marek Szyprowski wrote:
> > On Monday, October 10, 2011 2:08 PM Maxime Coquelin wrote:
> >
> >       During our stress tests, we encountered some problems :
> >
> >       1) Contiguous allocation lockup:
> >           When system RAM is full of Anon pages, if we try to allocate a
> > contiguous buffer greater than the min_free value, we face a
> > dma_alloc_from_contiguous lockup.
> >           The expected result would be dma_alloc_from_contiguous() to fail.
> >           The problem is reproduced systematically on our side.
> > Thanks for the report. Do you use Android's lowmemorykiller? I haven't
> > tested CMA on Android kernel yet. I have no idea how it will interfere
> > with Android patches.
> >
> 
> The software used for this test (v16) is a generic 3.0 Kernel and a
> minimal filesystem using Busybox.

I'm really surprised. Could you elaborate a bit on how to trigger this issue?
I've done several tests and never got a lockup. Allocation failed from time
to time, though.

> With v15 patchset, I also tested it with Android.
> IIRC, sometimes the lowmemorykiller succeed to get free space and the
> contiguous allocation succeed, sometimes we faced  the lockup.
> 
> >>       2) Contiguous allocation fail:
> >>           We have developed a small driver and a shell script to
> >> allocate/release contiguous buffers.
> >>           Sometimes, dma_alloc_from_contiguous() fails to allocate the
> >> contiguous buffer (about once every 30 runs).
> >>           We have 270MB Memory passed to the kernel in our configuration,
> >> and the CMA pool is 90MB large.
> >>           In this setup, the overall memory is either free or full of
> >> reclaimable pages.
> > Yeah. We also did such stress tests recently and faced this issue. I've
> > spent some time investigating it but I have no solution yet.
> >
> > The problem is caused by a page, which is put in the CMA area. This page
> > is movable, but it's address space provides no 'migratepage' method. In
> > such case mm subsystem uses fallback_migrate_page() function. Sadly this
> > function only returns -EAGAIN. The migration loops a few times over it
> > and fails causing the fail in the allocation procedure.
> >
> > We are investing now which kernel code created/allocated such problematic

s/investing/investigating

> > pages and how to add real migration support for them.
> >
> 
> Ok, thanks for pointing this out.

We found this issue very recently. I'm still surprised that we did not notice 
it during system testing.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Linaro-mm-sig] [PATCHv16 0/9] Contiguous Memory Allocator
  2011-10-11 10:50         ` Marek Szyprowski
@ 2011-10-11 11:25           ` Maxime Coquelin
  -1 siblings, 0 replies; 180+ messages in thread
From: Maxime Coquelin @ 2011-10-11 11:25 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Daniel Walker', 'Russell King',
	'Arnd Bergmann', 'Jonathan Corbet',
	'Mel Gorman', 'Chunsang Jeong',
	'Michal Nazarewicz', 'Dave Hansen',
	'Jesse Barker', 'Kyungmin Park',
	'Ankita Garg', 'Andrew Morton',
	'KAMEZAWA Hiroyuki',
	benjamin.gaignard, Ludovic BARRE, vincent.guittot

On 10/11/2011 12:50 PM, Marek Szyprowski wrote:
> Hello,
>
> On Tuesday, October 11, 2011 9:30 AM Maxime Coquelin wrote:
>
>> On 10/11/2011 09:17 AM, Marek Szyprowski wrote:
>>> On Monday, October 10, 2011 2:08 PM Maxime Coquelin wrote:
>>>
>>>        During our stress tests, we encountered some problems :
>>>
>>>        1) Contiguous allocation lockup:
>>>            When system RAM is full of Anon pages, if we try to allocate a
>>> contiguous buffer greater than the min_free value, we face a
>>> dma_alloc_from_contiguous lockup.
>>>            The expected result would be dma_alloc_from_contiguous() to fail.
>>>            The problem is reproduced systematically on our side.
>>> Thanks for the report. Do you use Android's lowmemorykiller? I haven't
>>> tested CMA on Android kernel yet. I have no idea how it will interfere
>>> with Android patches.
>>>
>> The software used for this test (v16) is a generic 3.0 Kernel and a
>> minimal filesystem using Busybox.
> I'm really surprised. Could you elaborate a bit how to trigger this issue?

At system startup, I drop caches (sync && echo 3 > 
/proc/sys/vm/drop_caches) and check how much memory is free.
For example, in my case, only 15MB is used out of the 270MB available on 
the system, so I have 255MB of free memory. Note that min_free is 4MB in 
my case.
In userspace, I allocate 230MB using malloc(); the free memory is now 25MB.
Finally, I ask for a contiguous allocation of 64MB using CMA, and the 
result is a lockup in dma_alloc_from_contiguous().
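
A small userspace helper is enough to reproduce the "RAM full of anon
pages" precondition described above. The program below is only a
hypothetical sketch (the default of 230 MiB is an assumption matching
the scenario above); it allocates and touches anonymous memory, then
waits:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	size_t mib = argc > 1 ? strtoul(argv[1], NULL, 0) : 230;
	size_t size = mib << 20;
	char *p = malloc(size);

	if (!p) {
		perror("malloc");
		return 1;
	}
	/* Touch every byte so the kernel really backs it with anon pages */
	memset(p, 0x5a, size);
	printf("holding %zu MiB of anonymous memory; Ctrl-C to release\n", mib);
	pause();
	return 0;
}

Compile with "gcc -o memhog memhog.c" and keep it running before issuing
the contiguous allocation.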

> I've did several tests and I never get a lockup. Allocation failed from time
> to time though.
When it succeeds, what is the behaviour on your side? Is the OOM killer triggered?

Regards,
Maxime


^ permalink raw reply	[flat|nested] 180+ messages in thread

* [Linaro-mm-sig] [PATCHv16 0/9] Contiguous Memory Allocator
@ 2011-10-11 11:25           ` Maxime Coquelin
  0 siblings, 0 replies; 180+ messages in thread
From: Maxime Coquelin @ 2011-10-11 11:25 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/11/2011 12:50 PM, Marek Szyprowski wrote:
> Hello,
>
> On Tuesday, October 11, 2011 9:30 AM Maxime Coquelin wrote:
>
>> On 10/11/2011 09:17 AM, Marek Szyprowski wrote:
>>> On Monday, October 10, 2011 2:08 PM Maxime Coquelin wrote:
>>>
>>>        During our stress tests, we encountered some problems :
>>>
>>>        1) Contiguous allocation lockup:
>>>            When system RAM is full of Anon pages, if we try to allocate a
>>> contiguous buffer greater than the min_free value, we face a
>>> dma_alloc_from_contiguous lockup.
>>>            The expected result would be dma_alloc_from_contiguous() to fail.
>>>            The problem is reproduced systematically on our side.
>>> Thanks for the report. Do you use Android's lowmemorykiller? I haven't
>>> tested CMA on Android kernel yet. I have no idea how it will interfere
>>> with Android patches.
>>>
>> The software used for this test (v16) is a generic 3.0 Kernel and a
>> minimal filesystem using Busybox.
> I'm really surprised. Could you elaborate a bit how to trigger this issue?

At system startup, I drop caches (sync && echo 3 > 
/proc/sys/vm/drop_caches) and check how much memory is free.
For example, in my case, only 15MB is used out of the 270MB available on 
the system, so I have 255MB of free memory. Note that min_free is 4MB in 
my case.
In userspace, I allocate 230MB using malloc(); the free memory is now 25MB.
Finally, I ask for a contiguous allocation of 64MB using CMA, and the 
result is a lockup in dma_alloc_from_contiguous().

> I've did several tests and I never get a lockup. Allocation failed from time
> to time though.
When it succeeds, what is the behaviour on your side? Is the OOM killer triggered?

Regards,
Maxime

^ permalink raw reply	[flat|nested] 180+ messages in thread

* RE: [Linaro-mm-sig] [PATCHv16 0/9] Contiguous Memory Allocator
  2011-10-11 11:25           ` Maxime Coquelin
  (?)
@ 2011-10-11 13:05             ` Marek Szyprowski
  -1 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-11 13:05 UTC (permalink / raw)
  To: 'Maxime Coquelin'
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Daniel Walker', 'Russell King',
	'Arnd Bergmann', 'Jonathan Corbet',
	'Mel Gorman', 'Chunsang Jeong',
	'Michal Nazarewicz', 'Dave Hansen',
	'Jesse Barker', 'Kyungmin Park',
	'Ankita Garg', 'Andrew Morton',
	'KAMEZAWA Hiroyuki',
	benjamin.gaignard, 'Ludovic BARRE',
	vincent.guittot

Hello,

On Tuesday, October 11, 2011 1:26 PM Maxime Coquelin wrote:

> On 10/11/2011 12:50 PM, Marek Szyprowski wrote:
> > Hello,
> >
> > On Tuesday, October 11, 2011 9:30 AM Maxime Coquelin wrote:
> >
> >> On 10/11/2011 09:17 AM, Marek Szyprowski wrote:
> >>> On Monday, October 10, 2011 2:08 PM Maxime Coquelin wrote:
> >>>
> >>>        During our stress tests, we encountered some problems :
> >>>
> >>>        1) Contiguous allocation lockup:
> >>>            When system RAM is full of Anon pages, if we try to allocate a
> >>> contiguous buffer greater than the min_free value, we face a
> >>> dma_alloc_from_contiguous lockup.
> >>>            The expected result would be dma_alloc_from_contiguous() to fail.
> >>>            The problem is reproduced systematically on our side.
> >>> Thanks for the report. Do you use Android's lowmemorykiller? I haven't
> >>> tested CMA on Android kernel yet. I have no idea how it will interfere
> >>> with Android patches.
> >>>
> >> The software used for this test (v16) is a generic 3.0 Kernel and a
> >> minimal filesystem using Busybox.
> > I'm really surprised. Could you elaborate a bit how to trigger this issue?
> 
> At system startup, I drop caches (sync && echo 3 >
> /proc/sys/vm/drop_caches) and check how much memory is free.
> For example, in my case, only 15MB is used on the 270MB available on the
> system, so I got 255MB of free memory. Note that the min_free is 4MB in
> my case.
> In userspace, I allocate 230MB using malloc(), the free memory is now 25MB.
> Finaly, I ask for a contiguous allocation of 64MB using CMA, the result
> is a lockup in dma_alloc_from_contiguous().

Thanks for the hint. I've managed to reproduce this issue. I will post a fix ASAP.

> > I've did several tests and I never get a lockup. Allocation failed from time
> > to time though.
> When it succeed, what is the behaviour on your side? Is the OOM triggered?

OOM was never triggered.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center


^ permalink raw reply	[flat|nested] 180+ messages in thread

* RE: [Linaro-mm-sig] [PATCHv16 0/9] Contiguous Memory Allocator
@ 2011-10-11 13:05             ` Marek Szyprowski
  0 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-11 13:05 UTC (permalink / raw)
  To: 'Maxime Coquelin'
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Daniel Walker', 'Russell King',
	'Arnd Bergmann', 'Jonathan Corbet',
	'Mel Gorman', 'Chunsang Jeong',
	'Michal Nazarewicz', 'Dave Hansen',
	'Jesse Barker', 'Kyungmin Park',
	'Ankita Garg', 'Andrew Morton',
	'KAMEZAWA Hiroyuki',
	benjamin.gaignard, 'Ludovic BARRE',
	vincent.guittot

Hello,

On Tuesday, October 11, 2011 1:26 PM Maxime Coquelin wrote:

> On 10/11/2011 12:50 PM, Marek Szyprowski wrote:
> > Hello,
> >
> > On Tuesday, October 11, 2011 9:30 AM Maxime Coquelin wrote:
> >
> >> On 10/11/2011 09:17 AM, Marek Szyprowski wrote:
> >>> On Monday, October 10, 2011 2:08 PM Maxime Coquelin wrote:
> >>>
> >>>        During our stress tests, we encountered some problems :
> >>>
> >>>        1) Contiguous allocation lockup:
> >>>            When system RAM is full of Anon pages, if we try to allocate a
> >>> contiguous buffer greater than the min_free value, we face a
> >>> dma_alloc_from_contiguous lockup.
> >>>            The expected result would be dma_alloc_from_contiguous() to fail.
> >>>            The problem is reproduced systematically on our side.
> >>> Thanks for the report. Do you use Android's lowmemorykiller? I haven't
> >>> tested CMA on Android kernel yet. I have no idea how it will interfere
> >>> with Android patches.
> >>>
> >> The software used for this test (v16) is a generic 3.0 Kernel and a
> >> minimal filesystem using Busybox.
> > I'm really surprised. Could you elaborate a bit how to trigger this issue?
> 
> At system startup, I drop caches (sync && echo 3 >
> /proc/sys/vm/drop_caches) and check how much memory is free.
> For example, in my case, only 15MB is used on the 270MB available on the
> system, so I got 255MB of free memory. Note that the min_free is 4MB in
> my case.
> In userspace, I allocate 230MB using malloc(), the free memory is now 25MB.
> Finaly, I ask for a contiguous allocation of 64MB using CMA, the result
> is a lockup in dma_alloc_from_contiguous().

Thanks for the hint. I've managed to reproduce this issue. I will post a fix ASAP.

> > I've did several tests and I never get a lockup. Allocation failed from time
> > to time though.
> When it succeed, what is the behaviour on your side? Is the OOM triggered?

OOM was never triggered.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCHv16 0/9] Contiguous Memory Allocator
  2011-10-10 22:56     ` Andrew Morton
  (?)
@ 2011-10-11 13:52       ` Arnd Bergmann
  -1 siblings, 0 replies; 180+ messages in thread
From: Arnd Bergmann @ 2011-10-11 13:52 UTC (permalink / raw)
  To: Andrew Morton, Paul McKenney
  Cc: Marek Szyprowski, linux-kernel, linux-arm-kernel, linux-media,
	linux-mm, linaro-mm-sig, Michal Nazarewicz, Kyungmin Park,
	Russell King, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen

On Tuesday 11 October 2011, Andrew Morton wrote:
> Russell's going to hate me, but...
> 
> I do know that he had substantial objections to at least earlier
> versions of this, and he is a guy who knows of what he speaks.
> 
> So I would want to get a nod from rmk on this work before proceeding. 
> If that nod isn't available then let's please identify the issues and
> see what we can do about them.

I'm pretty sure that Russell's concerns were almost entirely about the
ARM specific parts, which were extremely hard to figure out. The
most important technical concern back in July was that the patch
series at the time did not address the problem of conflicting pte
flags when we remap memory as uncached on ARMv6. He had a patch
to address this problem that was supposed to get merged in 3.1
and would have conflicted with the CMA patch set.

Things have changed a lot since then. Russell had to revert his
own patch because he found regressions using it on older machines.
However, the current CMA on ARM patch AFAICT reliably fixes this
problem now and does not cause the same regression on older machines.
The solution used now is the one we agreed on after sitting together
for a few hours with Russell, Marek, Paul McKenney and myself.

If there are still concerns over the ARM specific portion of
the patch series, I'm very confident that we can resolve these
now (I was much less so before that meeting).

What I would really want to hear from you is your opinion on
the architecture independent stuff. Obviously, ARM is the
most important consumer of the patch set, but I think the
code has its merit on other architectures as well and most of
them (maybe not parisc) should be about as simple as the x86
one that Marek posted now with v16.

	Arnd

^ permalink raw reply	[flat|nested] 180+ messages in thread

* [PATCH] fixup: mm: alloc_contig_range: increase min_free_kbytes during allocation
  2011-10-11  7:30       ` Maxime Coquelin
@ 2011-10-12 11:08         ` Marek Szyprowski
  -1 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-12 11:08 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: 'linux-kernel@vger.kernel.org',
	'linux-arm-kernel@lists.infradead.org',
	'linux-media@vger.kernel.org',
	'linux-mm@kvack.org',
	linaro-mm-sig, 'Daniel Walker', 'Russell King',
	'Arnd Bergmann', 'Jonathan Corbet',
	'Mel Gorman', 'Chunsang Jeong',
	'Michal Nazarewicz', 'Dave Hansen',
	'Jesse Barker', 'Kyungmin Park',
	'Ankita Garg', 'Andrew Morton',
	'KAMEZAWA Hiroyuki',
	benjamin.gaignard, Ludovic BARRE, vincent.guittot,
	Marek Szyprowski

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
 mm/page_alloc.c |   15 ++++++++++++---
 1 files changed, 12 insertions(+), 3 deletions(-)

Hello Maxime,

Please check if this patch fixes your lockup issue. It is a bit crude,
but it looks like, in a real low-memory situation, page allocation is a
very complex task which usually ends up waiting for io/fs and for free
pages that never actually arrive.

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center



diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 055aa4c..45473e9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5872,6 +5872,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 		       gfp_t flags, unsigned migratetype)
 {
 	unsigned long outer_start, outer_end;
+	unsigned int count = end - start;
 	int ret;
 
 	/*
@@ -5900,7 +5901,10 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	ret = __start_isolate_page_range(pfn_to_maxpage(start),
 					 pfn_to_maxpage_up(end), migratetype);
 	if (ret)
-		goto done;
+		return ret;
+
+	min_free_kbytes += count * PAGE_SIZE / 1024;
+	setup_per_zone_wmarks();
 
 	ret = __alloc_contig_migrate_range(start, end);
 	if (ret)
@@ -5922,8 +5926,10 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 
 	ret = 0;
 	while (!PageBuddy(pfn_to_page(start & (~0UL << ret))))
-		if (WARN_ON(++ret >= MAX_ORDER))
-			return -EINVAL;
+		if (WARN_ON(++ret >= MAX_ORDER)) {
+			ret = -EINVAL;
+			goto done;
+		}
 
 	outer_start = start & (~0UL << ret);
 	outer_end   = alloc_contig_freed_pages(outer_start, end, flags);
@@ -5936,6 +5942,9 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 
 	ret = 0;
 done:
+	min_free_kbytes -= count * PAGE_SIZE / 1024;
+	setup_per_zone_wmarks();
+
 	__undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end),
 				  migratetype);
 	return ret;
-- 
1.7.1.569.g6f426

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* Re: [PATCH] fixup: mm: alloc_contig_range: increase min_free_kbytes during allocation
  2011-10-12 11:08         ` Marek Szyprowski
@ 2011-10-12 13:01           ` Maxime Coquelin
  -1 siblings, 0 replies; 180+ messages in thread
From: Maxime Coquelin @ 2011-10-12 13:01 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: 'linux-kernel@vger.kernel.org',
	'linux-arm-kernel@lists.infradead.org',
	'linux-media@vger.kernel.org',
	'linux-mm@kvack.org',
	linaro-mm-sig, 'Daniel Walker', 'Russell King',
	'Arnd Bergmann', 'Jonathan Corbet',
	'Mel Gorman', 'Chunsang Jeong',
	'Michal Nazarewicz', 'Dave Hansen',
	'Jesse Barker', 'Kyungmin Park',
	'Ankita Garg', 'Andrew Morton',
	'KAMEZAWA Hiroyuki',
	benjamin.gaignard, Ludovic BARRE, vincent.guittot

Hello Marek,

On 10/12/2011 01:08 PM, Marek Szyprowski wrote:
> Signed-off-by: Marek Szyprowski<m.szyprowski@samsung.com>
> ---
>   mm/page_alloc.c |   15 ++++++++++++---
>   1 files changed, 12 insertions(+), 3 deletions(-)
>
> Hello Maxime,
>
> Please check if this patch fixes your lockup issue. It is a bit cruel,
> but it looks that in case of real low-memory situation page allocation
> is very complex task which usually ends in waiting for the io/fs and
> free pages that really don't arrive at all.
Thanks for the quick turnaround.
We just tested it and no longer face the lockup. Instead, the OOM killer
is triggered and the contiguous allocation succeeds.
I'm not familiar enough with page_alloc.c to spot any side effects
this patch could bring.

> Best regards
> --
> Marek Szyprowski
> Samsung Poland R&D Center
>
>
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 055aa4c..45473e9 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5872,6 +5872,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>   		       gfp_t flags, unsigned migratetype)
>   {
>   	unsigned long outer_start, outer_end;
> +	unsigned int count = end - start;
>   	int ret;
>
>   	/*
> @@ -5900,7 +5901,10 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>   	ret = __start_isolate_page_range(pfn_to_maxpage(start),
>   					 pfn_to_maxpage_up(end), migratetype);
>   	if (ret)
> -		goto done;
> +		return ret;
> +
> +	min_free_kbytes += count * PAGE_SIZE / 1024;
> +	setup_per_zone_wmarks();
>
>   	ret = __alloc_contig_migrate_range(start, end);
>   	if (ret)
> @@ -5922,8 +5926,10 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>
>   	ret = 0;
>   	while (!PageBuddy(pfn_to_page(start&  (~0UL<<  ret))))
> -		if (WARN_ON(++ret>= MAX_ORDER))
> -			return -EINVAL;
> +		if (WARN_ON(++ret>= MAX_ORDER)) {
> +			ret = -EINVAL;
> +			goto done;
> +		}
>
>   	outer_start = start&  (~0UL<<  ret);
>   	outer_end   = alloc_contig_freed_pages(outer_start, end, flags);
> @@ -5936,6 +5942,9 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>
>   	ret = 0;
>   done:
> +	min_free_kbytes -= count * PAGE_SIZE / 1024;
> +	setup_per_zone_wmarks();
> +
>   	__undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end),
>   				  migratetype);
>   	return ret;

Best regards,
Maxime

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Linaro-mm-sig] [PATCH 8/9] ARM: integrate CMA with DMA-mapping subsystem
  2011-10-06 13:54   ` Marek Szyprowski
  (?)
@ 2011-10-14  4:33     ` Subash Patel
  -1 siblings, 0 replies; 180+ messages in thread
From: Subash Patel @ 2011-10-14  4:33 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Daniel Walker, Russell King, Arnd Bergmann,
	Jonathan Corbet, Mel Gorman, Chunsang Jeong, Michal Nazarewicz,
	Dave Hansen, Jesse Barker, Kyungmin Park, Ankita Garg,
	Andrew Morton, KAMEZAWA Hiroyuki

Hi Marek,

As mentioned to you privately over IRC, the piece of code below breaks
booting of EXYNOS4:SMDKV310 with ZONE_DMA enabled.


On 10/06/2011 07:24 PM, Marek Szyprowski wrote:
...
> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> index fbdd12e..9c27fbd 100644
> --- a/arch/arm/mm/init.c
> +++ b/arch/arm/mm/init.c
> @@ -21,6 +21,7 @@
>   #include<linux/gfp.h>
>   #include<linux/memblock.h>
>   #include<linux/sort.h>
> +#include<linux/dma-contiguous.h>
>
>   #include<asm/mach-types.h>
>   #include<asm/prom.h>
> @@ -371,6 +372,13 @@ void __init arm_memblock_init(struct meminfo *mi, struct machine_desc *mdesc)
>   	if (mdesc->reserve)
>   		mdesc->reserve();
>
> +	/* reserve memory for DMA contigouos allocations */
> +#ifdef CONFIG_ZONE_DMA
> +	dma_contiguous_reserve(PHYS_OFFSET + mdesc->dma_zone_size - 1);
> +#else
> +	dma_contiguous_reserve(0);
> +#endif
> +
>   	memblock_analyze();
>   	memblock_dump_all();
>   }
Regards,
Subash

^ permalink raw reply	[flat|nested] 180+ messages in thread

* RE: [Linaro-mm-sig] [PATCH 8/9] ARM: integrate CMA with DMA-mapping subsystem
  2011-10-14  4:33     ` Subash Patel
  (?)
@ 2011-10-14  9:14       ` Marek Szyprowski
  -1 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-14  9:14 UTC (permalink / raw)
  To: 'Subash Patel'
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Daniel Walker', 'Russell King',
	'Arnd Bergmann', 'Jonathan Corbet',
	'Mel Gorman', 'Chunsang Jeong',
	'Michal Nazarewicz', 'Dave Hansen',
	'Jesse Barker', 'Kyungmin Park',
	'Ankita Garg', 'Andrew Morton',
	'KAMEZAWA Hiroyuki'

Hello,

On Friday, October 14, 2011 6:33 AM Subash Patel wrote:

> Hi Marek,
> 
> As mentioned to you privately over IRC, the piece of code below breaks
> booting of EXYNOS4:SMDKV310 with ZONE_DMA enabled.

Right, I missed the fact that ZONE_DMA can be enabled while the machine does
not provide a specific zone size. I will fix this in the next version. Thanks
for pointing out this bug!
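
A minimal sketch of the kind of guard this needs (hypothetical, for
illustration only; not the actual fix that will be posted):

	/* reserve memory for DMA contiguous allocations */
#ifdef CONFIG_ZONE_DMA
	/*
	 * A platform may select ZONE_DMA without filling in
	 * mdesc->dma_zone_size; fall back to an unrestricted
	 * reservation in that case.
	 */
	if (mdesc->dma_zone_size)
		dma_contiguous_reserve(PHYS_OFFSET + mdesc->dma_zone_size - 1);
	else
		dma_contiguous_reserve(0);
#else
	dma_contiguous_reserve(0);
#endif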

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center



^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCHv16 0/9] Contiguous Memory Allocator
  2011-10-11 13:52       ` Arnd Bergmann
  (?)
@ 2011-10-14 23:19         ` Andrew Morton
  -1 siblings, 0 replies; 180+ messages in thread
From: Andrew Morton @ 2011-10-14 23:19 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Paul McKenney, Marek Szyprowski, linux-kernel, linux-arm-kernel,
	linux-media, linux-mm, linaro-mm-sig, Michal Nazarewicz,
	Kyungmin Park, Russell King, KAMEZAWA Hiroyuki, Ankita Garg,
	Daniel Walker, Mel Gorman, Jesse Barker, Jonathan Corbet,
	Shariq Hasnain, Chunsang Jeong, Dave Hansen, Johannes Weiner

On Tue, 11 Oct 2011 15:52:04 +0200
Arnd Bergmann <arnd@arndb.de> wrote:

> On Tuesday 11 October 2011, Andrew Morton wrote:
> > Russell's going to hate me, but...
> > 
> > I do know that he had substantial objections to at least earlier
> > versions of this, and he is a guy who knows of what he speaks.
> > 
> > So I would want to get a nod from rmk on this work before proceeding. 
> > If that nod isn't available then let's please identify the issues and
> > see what we can do about them.
> 
> I'm pretty sure that Russell's concerns were almost entirely about the
> ARM specific parts, which were extremely hard to figure out. The
> most important technical concern back in July was that the patch
> series at the time did not address the problem of conflicting pte
> flags when we remap memory as uncached on ARMv6. He had a patch
> to address this problem that was supposed to get merged in 3.1
> and would have conflicted with the CMA patch set.
> 
> Things have changed a lot since then. Russell had to revert his
> own patch because he found regressions using it on older machines.
> However, the current CMA on ARM patch AFAICT reliably fixes this
> problem now and does not cause the same regression on older machines.
> The solution used now is the one we agreed on after sitting together
> for a few hours with Russell, Marek, Paul McKenney and myself.
> 
> If there are still concerns over the ARM specific portion of
> the patch series, I'm very confident that we can resolve these
> now (I was much less so before that meeting).
> 
> What I would really want to hear from you is your opinion on
> the architecture independent stuff. Obviously, ARM is the
> most important consumer of the patch set, but I think the
> code has its merit on other architectures as well and most of
> them (maybe not parisc) should be about as simple as the x86
> one that Marek posted now with v16.

Having an x86 implementation is good.  It would also be good to get
some x86 drivers using CMA asap, so the thing gets some runtime testing
from the masses.  Can we think of anything we can do here?

Regarding the core MM changes: Mel's the man for migration and
compaction.  I wouldn't want to proceed until he (and preferably
Johannes) have spent some quality time with the code.  I'm not seeing
their reviewed-by's or acked-by's and I don't have a good recollection
of their involvement?


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/9] mm: move some functions from memory_hotplug.c to page_isolation.c
  2011-10-06 13:54   ` Marek Szyprowski
  (?)
@ 2011-10-14 23:23     ` Andrew Morton
  -1 siblings, 0 replies; 180+ messages in thread
From: Andrew Morton @ 2011-10-14 23:23 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Russell King,
	KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker, Mel Gorman,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen

On Thu, 06 Oct 2011 15:54:41 +0200
Marek Szyprowski <m.szyprowski@samsung.com> wrote:

> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> Memory hotplug is a logic for making pages unused in the specified
> range of pfn. So, some of core logics can be used for other purpose
> as allocating a very large contigous memory block.
> 
> This patch moves some functions from mm/memory_hotplug.c to
> mm/page_isolation.c. This helps adding a function for large-alloc in
> page_isolation.c with memory-unplug technique.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> [m.nazarewicz: reworded commit message]
> Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> [m.szyprowski: rebased and updated to Linux v3.0-rc1]
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> CC: Michal Nazarewicz <mina86@mina86.com>
> Acked-by: Arnd Bergmann <arnd@arndb.de>
>
> ...
>
> +/*
> + * For migration.
> + */
> +
> +int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn);

This is a rather poor function name.  Given that we're now making it a
global identifier, perhaps we should give it a better name. 
pages_in_single_zone()?

> +unsigned long scan_lru_pages(unsigned long start, unsigned long end);
> +int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
>  
>
> ...
>
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -5,6 +5,9 @@
>  #include <linux/mm.h>
>  #include <linux/page-isolation.h>
>  #include <linux/pageblock-flags.h>
> +#include <linux/memcontrol.h>
> +#include <linux/migrate.h>
> +#include <linux/mm_inline.h>
>  #include "internal.h"
>  
>  static inline struct page *
> @@ -139,3 +142,114 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
>  	spin_unlock_irqrestore(&zone->lock, flags);
>  	return ret ? 0 : -EBUSY;
>  }
> +
> +
> +/*
> + * Confirm all pages in a range [start, end) is belongs to the same zone.

It would be good to fix up that sentence while we're touching it. 
"Confirm that all pages ...  belong to the same zone".

>
> ...
>

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 2/9] mm: alloc_contig_freed_pages() added
  2011-10-06 13:54   ` Marek Szyprowski
  (?)
@ 2011-10-14 23:29     ` Andrew Morton
  -1 siblings, 0 replies; 180+ messages in thread
From: Andrew Morton @ 2011-10-14 23:29 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Russell King,
	KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker, Mel Gorman,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen

On Thu, 06 Oct 2011 15:54:42 +0200
Marek Szyprowski <m.szyprowski@samsung.com> wrote:

> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> This commit introduces alloc_contig_freed_pages() function

The "freed" seems redundant to me.  Wouldn't "alloc_contig_pages" be a
better name?

> which allocates (ie. removes from buddy system) free pages
> in range.  Caller has to guarantee that all pages in range
> are in buddy system.
> 
> Along with this function, a free_contig_pages() function is
> provided which frees all (or a subset of) pages allocated
> with alloc_contig_free_pages().
> 
> Michal Nazarewicz has modified the function to make it easier
> to allocate not MAX_ORDER_NR_PAGES aligned pages by making it
> return pfn of one-past-the-last allocated page.
> 
>
> ...
>
> +#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
> +/*
> + * Both PFNs must be from the same zone!  If this function returns
> + * true, pfn_to_page(pfn1) + (pfn2 - pfn1) == pfn_to_page(pfn2).
> + */
> +static inline bool zone_pfn_same_memmap(unsigned long pfn1, unsigned long pfn2)
> +{
> +	return pfn_to_section_nr(pfn1) == pfn_to_section_nr(pfn2);
> +}
> +
> +#else
> +
> +#define zone_pfn_same_memmap(pfn1, pfn2) (true)

Do this in C, please.  It's nicer and can prevent unused-var warnings.
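
Something along these lines would do it; a sketch of the non-SPARSEMEM
fallback as a static inline instead of a macro:

static inline bool zone_pfn_same_memmap(unsigned long pfn1,
					unsigned long pfn2)
{
	/*
	 * Without SPARSEMEM (or with SPARSEMEM_VMEMMAP) the memmap
	 * covering a zone is virtually contiguous, so pfn_to_page()
	 * arithmetic is valid across the whole range.
	 */
	return true;
}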

> +#endif
> +
>
> ...
>
> +unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
> +				       gfp_t flag)
> +{
> +	unsigned long pfn = start, count;
> +	struct page *page;
> +	struct zone *zone;
> +	int order;
> +
> +	VM_BUG_ON(!pfn_valid(start));
> +	page = pfn_to_page(start);
> +	zone = page_zone(page);
> +
> +	spin_lock_irq(&zone->lock);
> +
> +	for (;;) {
> +		VM_BUG_ON(page_count(page) || !PageBuddy(page) ||
> +			  page_zone(page) != zone);
> +
> +		list_del(&page->lru);
> +		order = page_order(page);
> +		count = 1UL << order;
> +		zone->free_area[order].nr_free--;
> +		rmv_page_order(page);
> +		__mod_zone_page_state(zone, NR_FREE_PAGES, -(long)count);

__mod_zone_page_state() generally shouldn't be used - it bypasses the
per-cpu magazines and can introduce high lock contention.

That's hopefully not an issue on this callpath but it is still a red
flag.  I'd suggest at least the addition of a suitably apologetic code
comment here - we don't want people to naively copy this code.

Plus such a comment would let me know why this was done ;)

> +		pfn += count;
> +		if (pfn >= end)
> +			break;
> +		VM_BUG_ON(!pfn_valid(pfn));
> +
> +		if (zone_pfn_same_memmap(pfn - count, pfn))
> +			page += count;
> +		else
> +			page = pfn_to_page(pfn);
> +	}
> +
> +	spin_unlock_irq(&zone->lock);
> +
> +	/* After this, pages in the range can be freed one be one */
> +	count = pfn - start;
> +	pfn = start;
> +	for (page = pfn_to_page(pfn); count; --count) {
> +		prep_new_page(page, 0, flag);
> +		++pfn;
> +		if (likely(zone_pfn_same_memmap(pfn - 1, pfn)))
> +			++page;
> +		else
> +			page = pfn_to_page(pfn);
> +	}
> +
> +	return pfn;
> +}
> +
> +void free_contig_pages(unsigned long pfn, unsigned nr_pages)
> +{
> +	struct page *page = pfn_to_page(pfn);
> +
> +	while (nr_pages--) {
> +		__free_page(page);
> +		++pfn;
> +		if (likely(zone_pfn_same_memmap(pfn - 1, pfn)))
> +			++page;
> +		else
> +			page = pfn_to_page(pfn);
> +	}
> +}

You're sure these functions don't need EXPORT_SYMBOL()?  Maybe the
design is that only DMA core calls into here (if so, that's good).


>  #ifdef CONFIG_MEMORY_HOTREMOVE
>  /*
>   * All pages in the range must be isolated before calling this.
> -- 
> 1.7.1.569.g6f426

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 3/9] mm: alloc_contig_range() added
  2011-10-06 13:54   ` Marek Szyprowski
  (?)
@ 2011-10-14 23:35     ` Andrew Morton
  -1 siblings, 0 replies; 180+ messages in thread
From: Andrew Morton @ 2011-10-14 23:35 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Russell King,
	KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker, Mel Gorman,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen, Mel Gorman

On Thu, 06 Oct 2011 15:54:43 +0200
Marek Szyprowski <m.szyprowski@samsung.com> wrote:

> From: Michal Nazarewicz <m.nazarewicz@samsung.com>
> 
> This commit adds the alloc_contig_range() function which tries
> to allocate given range of pages.  It tries to migrate all
> already allocated pages that fall in the range thus freeing them.
> Once all pages in the range are freed they are removed from the
> buddy system thus allocated for the caller to use.
> 
> Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> [m.szyprowski: renamed some variables for easier code reading]
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> CC: Michal Nazarewicz <mina86@mina86.com>
> Acked-by: Arnd Bergmann <arnd@arndb.de>

Where-is: Mel Gorman <mel@csn.ul.ie>

> +#define MIGRATION_RETRY	5
> +static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
> +{
> +	int migration_failed = 0, ret;
> +	unsigned long pfn = start;
> +
> +	/*
> +	 * Some code "borrowed" from KAMEZAWA Hiroyuki's
> +	 * __alloc_contig_pages().
> +	 */
> +
> +	/* drop all pages in pagevec and pcp list */
> +	lru_add_drain_all();
> +	drain_all_pages();

These operations are sometimes wrong ;) Have you confirmed that we
really need to perform them here?  If so, a little comment explaining
why we're using them here would be good.
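
If they really are needed, maybe something like this (my guess at the
intent, based on the code below):

	/*
	 * Pages still sitting on per-cpu pagevecs are not yet on the LRU
	 * and pages on pcp free lists are not in the buddy allocator, so
	 * the LRU scan and the final isolation check below would miss
	 * them.  Flush both back before starting.
	 */
	lru_add_drain_all();
	drain_all_pages();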

> +	for (;;) {
> +		pfn = scan_lru_pages(pfn, end);
> +		if (!pfn || pfn >= end)
> +			break;
> +
> +		ret = do_migrate_range(pfn, end);
> +		if (!ret) {
> +			migration_failed = 0;
> +		} else if (ret != -EBUSY
> +			|| ++migration_failed >= MIGRATION_RETRY) {

Sigh, magic numbers.

Have you ever seen this retry loop actually expire in testing?

migrate_pages() tries ten times.  This code tries five times.  Is there
any science to all of this?

> +			return ret;
> +		} else {
> +			/* There are unstable pages.on pagevec. */
> +			lru_add_drain_all();
> +			/*
> +			 * there may be pages on pcplist before
> +			 * we mark the range as ISOLATED.
> +			 */
> +			drain_all_pages();
> +		}
> +		cond_resched();
> +	}
> +
> +	if (!migration_failed) {
> +		/* drop all pages in pagevec and pcp list */
> +		lru_add_drain_all();
> +		drain_all_pages();

hm.

> +	}
> +
> +	/* Make sure all pages are isolated */
> +	if (WARN_ON(test_pages_isolated(start, end)))
> +		return -EBUSY;
> +
> +	return 0;
> +}
> +
> +/**
> + * alloc_contig_range() -- tries to allocate given range of pages
> + * @start:	start PFN to allocate
> + * @end:	one-past-the-last PFN to allocate
> + * @flags:	flags passed to alloc_contig_freed_pages().
> + *
> + * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
> + * aligned, hovewer it's callers responsibility to guarantee that we

"however"

"however it is the caller's responsibility.."

> + * are the only thread that changes migrate type of pageblocks the
> + * pages fall in.
> + *
> + * Returns zero on success or negative error code.  On success all
> + * pages which PFN is in (start, end) are allocated for the caller and
> + * need to be freed with free_contig_pages().
> + */
>
> ...
>


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 3/9] mm: alloc_contig_range() added
@ 2011-10-14 23:35     ` Andrew Morton
  0 siblings, 0 replies; 180+ messages in thread
From: Andrew Morton @ 2011-10-14 23:35 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Russell King,
	KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker, Mel Gorman,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen

On Thu, 06 Oct 2011 15:54:43 +0200
Marek Szyprowski <m.szyprowski@samsung.com> wrote:

> From: Michal Nazarewicz <m.nazarewicz@samsung.com>
> 
> This commit adds the alloc_contig_range() function which tries
> to allocate given range of pages.  It tries to migrate all
> already allocated pages that fall in the range thus freeing them.
> Once all pages in the range are freed they are removed from the
> buddy system thus allocated for the caller to use.
> 
> Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> [m.szyprowski: renamed some variables for easier code reading]
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> CC: Michal Nazarewicz <mina86@mina86.com>
> Acked-by: Arnd Bergmann <arnd@arndb.de>

Where-is: Mel Gorman <mel@csn.ul.ie>

> +#define MIGRATION_RETRY	5
> +static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
> +{
> +	int migration_failed = 0, ret;
> +	unsigned long pfn = start;
> +
> +	/*
> +	 * Some code "borrowed" from KAMEZAWA Hiroyuki's
> +	 * __alloc_contig_pages().
> +	 */
> +
> +	/* drop all pages in pagevec and pcp list */
> +	lru_add_drain_all();
> +	drain_all_pages();

These operations are sometimes wrong ;) Have you confirmed that we
really need to perform them here?  If so, a little comment explaining
why we're using them here would be good.

> +	for (;;) {
> +		pfn = scan_lru_pages(pfn, end);
> +		if (!pfn || pfn >= end)
> +			break;
> +
> +		ret = do_migrate_range(pfn, end);
> +		if (!ret) {
> +			migration_failed = 0;
> +		} else if (ret != -EBUSY
> +			|| ++migration_failed >= MIGRATION_RETRY) {

Sigh, magic numbers.

Have you ever seen this retry loop actually expire in testing?

migrate_pages() tries ten times.  This code tries five times.  Is there
any science to all of this?

> +			return ret;
> +		} else {
> +			/* There are unstable pages.on pagevec. */
> +			lru_add_drain_all();
> +			/*
> +			 * there may be pages on pcplist before
> +			 * we mark the range as ISOLATED.
> +			 */
> +			drain_all_pages();
> +		}
> +		cond_resched();
> +	}
> +
> +	if (!migration_failed) {
> +		/* drop all pages in pagevec and pcp list */
> +		lru_add_drain_all();
> +		drain_all_pages();

hm.

> +	}
> +
> +	/* Make sure all pages are isolated */
> +	if (WARN_ON(test_pages_isolated(start, end)))
> +		return -EBUSY;
> +
> +	return 0;
> +}
> +
> +/**
> + * alloc_contig_range() -- tries to allocate given range of pages
> + * @start:	start PFN to allocate
> + * @end:	one-past-the-last PFN to allocate
> + * @flags:	flags passed to alloc_contig_freed_pages().
> + *
> + * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
> + * aligned, hovewer it's callers responsibility to guarantee that we

"however"

"however it is the caller's responsibility.."

> + * are the only thread that changes migrate type of pageblocks the
> + * pages fall in.
> + *
> + * Returns zero on success or negative error code.  On success all
> + * pages which PFN is in (start, end) are allocated for the caller and
> + * need to be freed with free_contig_pages().
> + */
>
> ...
>


^ permalink raw reply	[flat|nested] 180+ messages in thread

* [PATCH 3/9] mm: alloc_contig_range() added
@ 2011-10-14 23:35     ` Andrew Morton
  0 siblings, 0 replies; 180+ messages in thread
From: Andrew Morton @ 2011-10-14 23:35 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 06 Oct 2011 15:54:43 +0200
Marek Szyprowski <m.szyprowski@samsung.com> wrote:

> From: Michal Nazarewicz <m.nazarewicz@samsung.com>
> 
> This commit adds the alloc_contig_range() function which tries
> to allocate given range of pages.  It tries to migrate all
> already allocated pages that fall in the range thus freeing them.
> Once all pages in the range are freed they are removed from the
> buddy system thus allocated for the caller to use.
> 
> Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> [m.szyprowski: renamed some variables for easier code reading]
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> CC: Michal Nazarewicz <mina86@mina86.com>
> Acked-by: Arnd Bergmann <arnd@arndb.de>

Where-is: Mel Gorman <mel@csn.ul.ie>

> +#define MIGRATION_RETRY	5
> +static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
> +{
> +	int migration_failed = 0, ret;
> +	unsigned long pfn = start;
> +
> +	/*
> +	 * Some code "borrowed" from KAMEZAWA Hiroyuki's
> +	 * __alloc_contig_pages().
> +	 */
> +
> +	/* drop all pages in pagevec and pcp list */
> +	lru_add_drain_all();
> +	drain_all_pages();

These operations are sometimes wrong ;) Have you confirmed that we
really need to perform them here?  If so, a little comment explaining
why we're using them here would be good.

> +	for (;;) {
> +		pfn = scan_lru_pages(pfn, end);
> +		if (!pfn || pfn >= end)
> +			break;
> +
> +		ret = do_migrate_range(pfn, end);
> +		if (!ret) {
> +			migration_failed = 0;
> +		} else if (ret != -EBUSY
> +			|| ++migration_failed >= MIGRATION_RETRY) {

Sigh, magic numbers.

Have you ever seen this retry loop actually expire in testing?

migrate_pages() tries ten times.  This code tries five times.  Is there
any science to all of this?

> +			return ret;
> +		} else {
> +			/* There are unstable pages.on pagevec. */
> +			lru_add_drain_all();
> +			/*
> +			 * there may be pages on pcplist before
> +			 * we mark the range as ISOLATED.
> +			 */
> +			drain_all_pages();
> +		}
> +		cond_resched();
> +	}
> +
> +	if (!migration_failed) {
> +		/* drop all pages in pagevec and pcp list */
> +		lru_add_drain_all();
> +		drain_all_pages();

hm.

> +	}
> +
> +	/* Make sure all pages are isolated */
> +	if (WARN_ON(test_pages_isolated(start, end)))
> +		return -EBUSY;
> +
> +	return 0;
> +}
> +
> +/**
> + * alloc_contig_range() -- tries to allocate given range of pages
> + * @start:	start PFN to allocate
> + * @end:	one-past-the-last PFN to allocate
> + * @flags:	flags passed to alloc_contig_freed_pages().
> + *
> + * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
> + * aligned, hovewer it's callers responsibility to guarantee that we

"however"

"however it is the caller's responsibility.."

> + * are the only thread that changes migrate type of pageblocks the
> + * pages fall in.
> + *
> + * Returns zero on success or negative error code.  On success all
> + * pages which PFN is in (start, end) are allocated for the caller and
> + * need to be freed with free_contig_pages().
> + */
>
> ...
>

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 4/9] mm: MIGRATE_CMA migration type added
  2011-10-06 13:54   ` Marek Szyprowski
  (?)
@ 2011-10-14 23:38     ` Andrew Morton
  -1 siblings, 0 replies; 180+ messages in thread
From: Andrew Morton @ 2011-10-14 23:38 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Russell King,
	KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker, Mel Gorman,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen, Mel Gorman

On Thu, 06 Oct 2011 15:54:44 +0200
Marek Szyprowski <m.szyprowski@samsung.com> wrote:

> From: Michal Nazarewicz <m.nazarewicz@samsung.com>
> 
> The MIGRATE_CMA migration type has two main characteristics:
> (i) only movable pages can be allocated from MIGRATE_CMA
> pageblocks and (ii) page allocator will never change migration
> type of MIGRATE_CMA pageblocks.
> 
> This guarantees that page in a MIGRATE_CMA page block can
> always be migrated somewhere else (unless there's no memory left
> in the system).
> 
> It is designed to be used with Contiguous Memory Allocator
> (CMA) for allocating big chunks (eg. 10MiB) of physically
> contiguous memory.  Once driver requests contiguous memory,
> CMA will migrate pages from MIGRATE_CMA pageblocks.
> 
> To minimise number of migrations, MIGRATE_CMA migration type
> is the last type tried when page allocator falls back to other
> migration types then requested.
> 
>
> ...
>
> +#ifdef CONFIG_CMA_MIGRATE_TYPE
> +#  define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
> +#else
> +#  define is_migrate_cma(migratetype) false
> +#endif

Implement in C, please.
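
E.g. (untested):

#ifdef CONFIG_CMA_MIGRATE_TYPE
static inline bool is_migrate_cma(int migratetype)
{
	return unlikely(migratetype == MIGRATE_CMA);
}
#else
static inline bool is_migrate_cma(int migratetype)
{
	return false;
}
#endif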

>
> ...
>
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -115,6 +115,16 @@ static bool suitable_migration_target(struct page *page)
>  	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
>  		return false;
>  
> +	/* Keep MIGRATE_CMA alone as well. */
> +	/*
> +	 * XXX Revisit.  We currently cannot let compaction touch CMA
> +	 * pages since compaction insists on changing their migration
> +	 * type to MIGRATE_MOVABLE (see split_free_page() called from
> +	 * isolate_freepages_block() above).
> +	 */

Talk to us about this.

How serious is this shortcoming in practice?  What would a fix look
like?  Is anyone working on an implementation, or planning to do so?


> +	if (is_migrate_cma(migratetype))
> +		return false;
> +
>  	/* If the page is a large free page, then allow migration */
>  	if (PageBuddy(page) && page_order(page) >= pageblock_order)
>  		return true;
>
> ...
>
> +void __init init_cma_reserved_pageblock(struct page *page)
> +{
> +	struct page *p = page;
> +	unsigned i = pageblock_nr_pages;
> +
> +	prefetchw(p);
> +	do {
> +		if (--i)
> +			prefetchw(p + 1);
> +		__ClearPageReserved(p);
> +		set_page_count(p, 0);
> +	} while (++p, i);
> +
> +	set_page_refcounted(page);
> +	set_pageblock_migratetype(page, MIGRATE_CMA);
> +	__free_pages(page, pageblock_order);
> +	totalram_pages += pageblock_nr_pages;
> +}

I wonder if the prefetches do any good.  It doesn't seem very important
in an __init function.
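
Without them the loop would collapse to something like (untested):

	unsigned i = pageblock_nr_pages;

	do {
		__ClearPageReserved(p);
		set_page_count(p, 0);
	} while (++p, --i);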

> +#endif
>  
>
> ...
>


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 4/9] mm: MIGRATE_CMA migration type added
@ 2011-10-14 23:38     ` Andrew Morton
  0 siblings, 0 replies; 180+ messages in thread
From: Andrew Morton @ 2011-10-14 23:38 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Russell King,
	KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker, Mel Gorman,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen

On Thu, 06 Oct 2011 15:54:44 +0200
Marek Szyprowski <m.szyprowski@samsung.com> wrote:

> From: Michal Nazarewicz <m.nazarewicz@samsung.com>
> 
> The MIGRATE_CMA migration type has two main characteristics:
> (i) only movable pages can be allocated from MIGRATE_CMA
> pageblocks and (ii) page allocator will never change migration
> type of MIGRATE_CMA pageblocks.
> 
> This guarantees that page in a MIGRATE_CMA page block can
> always be migrated somewhere else (unless there's no memory left
> in the system).
> 
> It is designed to be used with Contiguous Memory Allocator
> (CMA) for allocating big chunks (eg. 10MiB) of physically
> contiguous memory.  Once driver requests contiguous memory,
> CMA will migrate pages from MIGRATE_CMA pageblocks.
> 
> To minimise number of migrations, MIGRATE_CMA migration type
> is the last type tried when page allocator falls back to other
> migration types then requested.
> 
>
> ...
>
> +#ifdef CONFIG_CMA_MIGRATE_TYPE
> +#  define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
> +#else
> +#  define is_migrate_cma(migratetype) false
> +#endif

Implement in C, please.

>
> ...
>
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -115,6 +115,16 @@ static bool suitable_migration_target(struct page *page)
>  	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
>  		return false;
>  
> +	/* Keep MIGRATE_CMA alone as well. */
> +	/*
> +	 * XXX Revisit.  We currently cannot let compaction touch CMA
> +	 * pages since compaction insists on changing their migration
> +	 * type to MIGRATE_MOVABLE (see split_free_page() called from
> +	 * isolate_freepages_block() above).
> +	 */

Talk to us about this.

How serious is this shortcoming in practice?  What would a fix look
like?  Is anyone working on an implementation, or planning to do so?


> +	if (is_migrate_cma(migratetype))
> +		return false;
> +
>  	/* If the page is a large free page, then allow migration */
>  	if (PageBuddy(page) && page_order(page) >= pageblock_order)
>  		return true;
>
> ...
>
> +void __init init_cma_reserved_pageblock(struct page *page)
> +{
> +	struct page *p = page;
> +	unsigned i = pageblock_nr_pages;
> +
> +	prefetchw(p);
> +	do {
> +		if (--i)
> +			prefetchw(p + 1);
> +		__ClearPageReserved(p);
> +		set_page_count(p, 0);
> +	} while (++p, i);
> +
> +	set_page_refcounted(page);
> +	set_pageblock_migratetype(page, MIGRATE_CMA);
> +	__free_pages(page, pageblock_order);
> +	totalram_pages += pageblock_nr_pages;
> +}

I wonder if the prefetches do any good.  It doesn't seem very important
in an __init function.

> +#endif
>  
>
> ...
>


^ permalink raw reply	[flat|nested] 180+ messages in thread

* [PATCH 4/9] mm: MIGRATE_CMA migration type added
@ 2011-10-14 23:38     ` Andrew Morton
  0 siblings, 0 replies; 180+ messages in thread
From: Andrew Morton @ 2011-10-14 23:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 06 Oct 2011 15:54:44 +0200
Marek Szyprowski <m.szyprowski@samsung.com> wrote:

> From: Michal Nazarewicz <m.nazarewicz@samsung.com>
> 
> The MIGRATE_CMA migration type has two main characteristics:
> (i) only movable pages can be allocated from MIGRATE_CMA
> pageblocks and (ii) page allocator will never change migration
> type of MIGRATE_CMA pageblocks.
> 
> This guarantees that page in a MIGRATE_CMA page block can
> always be migrated somewhere else (unless there's no memory left
> in the system).
> 
> It is designed to be used with Contiguous Memory Allocator
> (CMA) for allocating big chunks (eg. 10MiB) of physically
> contiguous memory.  Once driver requests contiguous memory,
> CMA will migrate pages from MIGRATE_CMA pageblocks.
> 
> To minimise number of migrations, MIGRATE_CMA migration type
> is the last type tried when page allocator falls back to other
> migration types then requested.
> 
>
> ...
>
> +#ifdef CONFIG_CMA_MIGRATE_TYPE
> +#  define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
> +#else
> +#  define is_migrate_cma(migratetype) false
> +#endif

Implement in C, please.

>
> ...
>
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -115,6 +115,16 @@ static bool suitable_migration_target(struct page *page)
>  	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
>  		return false;
>  
> +	/* Keep MIGRATE_CMA alone as well. */
> +	/*
> +	 * XXX Revisit.  We currently cannot let compaction touch CMA
> +	 * pages since compaction insists on changing their migration
> +	 * type to MIGRATE_MOVABLE (see split_free_page() called from
> +	 * isolate_freepages_block() above).
> +	 */

Talk to us about this.

How serious is this shortcoming in practice?  What would a fix look
like?  Is anyone working on an implementation, or planning to do so?


> +	if (is_migrate_cma(migratetype))
> +		return false;
> +
>  	/* If the page is a large free page, then allow migration */
>  	if (PageBuddy(page) && page_order(page) >= pageblock_order)
>  		return true;
>
> ...
>
> +void __init init_cma_reserved_pageblock(struct page *page)
> +{
> +	struct page *p = page;
> +	unsigned i = pageblock_nr_pages;
> +
> +	prefetchw(p);
> +	do {
> +		if (--i)
> +			prefetchw(p + 1);
> +		__ClearPageReserved(p);
> +		set_page_count(p, 0);
> +	} while (++p, i);
> +
> +	set_page_refcounted(page);
> +	set_pageblock_migratetype(page, MIGRATE_CMA);
> +	__free_pages(page, pageblock_order);
> +	totalram_pages += pageblock_nr_pages;
> +}

I wonder if the prefetches do any good.  It doesn't seem very important
in an __init function.

> +#endif
>  
>
> ...
>

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 6/9] drivers: add Contiguous Memory Allocator
  2011-10-06 13:54   ` Marek Szyprowski
  (?)
@ 2011-10-14 23:57     ` Andrew Morton
  -1 siblings, 0 replies; 180+ messages in thread
From: Andrew Morton @ 2011-10-14 23:57 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Russell King,
	KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker, Mel Gorman,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen

On Thu, 06 Oct 2011 15:54:46 +0200
Marek Szyprowski <m.szyprowski@samsung.com> wrote:

> The Contiguous Memory Allocator is a set of helper functions for DMA
> mapping framework that improves allocations of contiguous memory chunks.
> 
> CMA grabs memory on system boot, marks it with CMA_MIGRATE_TYPE and
> gives back to the system. Kernel is allowed to allocate movable pages
> within CMA's managed memory so that it can be used for example for page
> cache when DMA mapping do not use it. On dma_alloc_from_contiguous()
> request such pages are migrated out of CMA area to free required
> contiguous block and fulfill the request. This allows to allocate large
> contiguous chunks of memory at any time assuming that there is enough
> free memory available in the system.
> 
> This code is heavily based on earlier works by Michal Nazarewicz.
> 
>
> ...
>
> +#ifdef phys_to_pfn
> +/* nothing to do */
> +#elif defined __phys_to_pfn
> +#  define phys_to_pfn __phys_to_pfn
> +#elif defined __va
> +#  define phys_to_pfn(x) page_to_pfn(virt_to_page(__va(x)))
> +#else
> +#  error phys_to_pfn implementation needed
> +#endif

Yikes!

This hackery should not be here, please.  If we need a phys_to_pfn()
then let's write a proper one which lives in core MM and arch, then get
it suitably reviewed and integrated and then maintained.
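
A generic fallback would presumably just be the usual shift, e.g. in an
asm-generic header (a sketch, not part of this patch):

#ifndef phys_to_pfn
#define phys_to_pfn(paddr)	((unsigned long)((paddr) >> PAGE_SHIFT))
#endif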

> +struct cma {
> +	unsigned long	base_pfn;
> +	unsigned long	count;
> +	unsigned long	*bitmap;
> +};
> +
> +struct cma *dma_contiguous_default_area;
> +
> +#ifndef CONFIG_CMA_SIZE_ABSOLUTE
> +#define CONFIG_CMA_SIZE_ABSOLUTE 0
> +#endif
> +
> +#ifndef CONFIG_CMA_SIZE_PERCENTAGE
> +#define CONFIG_CMA_SIZE_PERCENTAGE 0
> +#endif

No, .c files should not #define CONFIG_ variables like this.

One approach is

#ifdef CONFIG_FOO
#define BAR CONFIG_FOO
#else
#define BAR 0
#endif

but that's merely cosmetic fluff.  A superior fix is to get the Kconfig
correct, so CONFIG_FOO cannot ever be undefined if we're compiling this
.c file.

> +static unsigned long size_abs = CONFIG_CMA_SIZE_ABSOLUTE * SZ_1M;
> +static unsigned long size_percent = CONFIG_CMA_SIZE_PERCENTAGE;
> +static long size_cmdline = -1;

Maybe a little documentation for these, explaining their role in
everything?
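
Perhaps along these lines (descriptions inferred from how the variables
are used further down):

/* default size of the global CMA area, from CONFIG_CMA_SIZE_ABSOLUTE (MiB) */
static unsigned long size_abs = CONFIG_CMA_SIZE_ABSOLUTE * SZ_1M;
/* CMA size as a percentage of total memory; scaled in dma_contiguous_reserve() */
static unsigned long size_percent = CONFIG_CMA_SIZE_PERCENTAGE;
/* "cma=" command line override, in bytes; -1 means "not specified" */
static long size_cmdline = -1;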

> +static int __init early_cma(char *p)
> +{
> +	pr_debug("%s(%s)\n", __func__, p);
> +	size_cmdline = memparse(p, &p);
> +	return 0;
> +}
> +early_param("cma", early_cma);

Did this get added to Documentation/kernel-parameters.txt?

> +static unsigned long __init __cma_early_get_total_pages(void)

The leading __ seems unnecessary for a static function.

> +{
> +	struct memblock_region *reg;
> +	unsigned long total_pages = 0;
> +
> +	/*
> +	 * We cannot use memblock_phys_mem_size() here, because
> +	 * memblock_analyze() has not been called yet.
> +	 */
> +	for_each_memblock(memory, reg)
> +		total_pages += memblock_region_memory_end_pfn(reg) -
> +			       memblock_region_memory_base_pfn(reg);
> +	return total_pages;
> +}
> +
> +/**
> + * dma_contiguous_reserve() - reserve area for contiguous memory handling
> + *
> + * This funtion reserves memory from early allocator. It should be
> + * called by arch specific code once the early allocator (memblock or bootmem)
> + * has been activated and all other subsystems have already allocated/reserved
> + * memory.
> + */

Forgot to document the argument.

> +void __init dma_contiguous_reserve(phys_addr_t limit)
> +{
> +	unsigned long selected_size = 0;
> +	unsigned long total_pages;
> +
> +	pr_debug("%s(limit %08lx)\n", __func__, (unsigned long)limit);
> +
> +	total_pages = __cma_early_get_total_pages();
> +	size_percent *= (total_pages << PAGE_SHIFT) / 100;
> +
> +	pr_debug("%s: total available: %ld MiB, size absolute: %ld MiB, size percentage: %ld MiB\n",
> +		 __func__, (total_pages << PAGE_SHIFT) / SZ_1M,
> +		size_abs / SZ_1M, size_percent / SZ_1M);
> +
> +#ifdef CONFIG_CMA_SIZE_SEL_ABSOLUTE
> +	selected_size = size_abs;
> +#elif defined(CONFIG_CMA_SIZE_SEL_PERCENTAGE)
> +	selected_size = size_percent;
> +#elif defined(CONFIG_CMA_SIZE_SEL_MIN)
> +	selected_size = min(size_abs, size_percent);
> +#elif defined(CONFIG_CMA_SIZE_SEL_MAX)
> +	selected_size = max(size_abs, size_percent);
> +#endif

geeze, what's all that stuff?

Whatever it's doing, it seems a bad idea to relegate these decisions to
Kconfig-time.  The vast majority of users don't have control of their
kernel configuration!  The code would be more flexible and generic if
this was done at runtime somehow.
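
One possible shape for a runtime knob (hypothetical - the "cma_size_sel"
parameter below is made up, not something this patch provides):

enum cma_sel { CMA_SEL_ABS, CMA_SEL_PCT, CMA_SEL_MIN, CMA_SEL_MAX };
static enum cma_sel size_sel __initdata = CMA_SEL_ABS;

static int __init early_cma_size_sel(char *p)
{
	if (!p)
		return -EINVAL;
	if (!strcmp(p, "percent"))
		size_sel = CMA_SEL_PCT;
	else if (!strcmp(p, "min"))
		size_sel = CMA_SEL_MIN;
	else if (!strcmp(p, "max"))
		size_sel = CMA_SEL_MAX;
	return 0;
}
early_param("cma_size_sel", early_cma_size_sel);

	/* ... and in dma_contiguous_reserve(): */
	switch (size_sel) {
	case CMA_SEL_PCT:	selected_size = size_percent;			break;
	case CMA_SEL_MIN:	selected_size = min(size_abs, size_percent);	break;
	case CMA_SEL_MAX:	selected_size = max(size_abs, size_percent);	break;
	default:		selected_size = size_abs;			break;
	}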

> +	if (size_cmdline != -1)
> +		selected_size = size_cmdline;
> +
> +	if (!selected_size)
> +		return;
> +
> +	pr_debug("%s: reserving %ld MiB for global area\n", __func__,
> +		 selected_size / SZ_1M);
> +
> +	dma_declare_contiguous(NULL, selected_size, 0, limit);
> +};
> +
>
> ...
>
> +static struct cma *__cma_create_area(unsigned long base_pfn,

s/__//?

> +				     unsigned long count)
> +{
> +	int bitmap_size = BITS_TO_LONGS(count) * sizeof(long);
> +	struct cma *cma;
> +
> +	pr_debug("%s(base %08lx, count %lx)\n", __func__, base_pfn, count);
> +
> +	cma = kmalloc(sizeof *cma, GFP_KERNEL);
> +	if (!cma)
> +		return ERR_PTR(-ENOMEM);
> +
> +	cma->base_pfn = base_pfn;
> +	cma->count = count;
> +	cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
> +
> +	if (!cma->bitmap)
> +		goto no_mem;
> +
> +	__cma_activate_area(base_pfn, count);
> +
> +	pr_debug("%s: returned %p\n", __func__, (void *)cma);
> +	return cma;
> +
> +no_mem:
> +	kfree(cma);
> +	return ERR_PTR(-ENOMEM);
> +}
>
> ...
>
> +int __init dma_declare_contiguous(struct device *dev, unsigned long size,
> +				  phys_addr_t base, phys_addr_t limit)
> +{
> +	struct cma_reserved *r = &cma_reserved[cma_reserved_count];
> +	unsigned long alignment;
> +
> +	pr_debug("%s(size %lx, base %08lx, limit %08lx)\n", __func__,
> +		 (unsigned long)size, (unsigned long)base,
> +		 (unsigned long)limit);
> +
> +	/* Sanity checks */
> +	if (cma_reserved_count == ARRAY_SIZE(cma_reserved))
> +		return -ENOSPC;

I think a loud printk() is appropriate if the kernel fails in this
manner.
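
Something like (untested):

	if (cma_reserved_count == ARRAY_SIZE(cma_reserved)) {
		pr_err("CMA: reserved area table full; cannot reserve %ld MiB for %s\n",
		       size / SZ_1M, dev ? dev_name(dev) : "the global area");
		return -ENOSPC;
	}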

> +	if (!size)
> +		return -EINVAL;
> +
> +	/* Sanitise input arguments */
> +	alignment = PAGE_SIZE << max(MAX_ORDER, pageblock_order);
> +	base = ALIGN(base, alignment);
> +	size = ALIGN(size, alignment);
> +	limit = ALIGN(limit, alignment);
> +
> +	/* Reserve memory */
> +	if (base) {
> +		if (memblock_is_region_reserved(base, size) ||
> +		    memblock_reserve(base, size) < 0) {
> +			base = -EBUSY;
> +			goto err;
> +		}
> +	} else {
> +		/*
> +		 * Use __memblock_alloc_base() since
> +		 * memblock_alloc_base() panic()s.
> +		 */
> +		phys_addr_t addr = __memblock_alloc_base(size, alignment, limit);
> +		if (!addr) {
> +			base = -ENOMEM;
> +			goto err;
> +		} else if (addr + size > ~(unsigned long)0) {
> +			memblock_free(addr, size);
> +			base = -EOVERFLOW;

EOVERFLOW is a numeric/float thing.  It seems inappropriate to use it
here.

> +			goto err;
> +		} else {
> +			base = addr;
> +		}
> +	}
> +
> +	/*
> +	 * Each reserved area must be initialised later, when more kernel
> +	 * subsystems (like slab allocator) are available.
> +	 */
> +	r->start = base;
> +	r->size = size;
> +	r->dev = dev;
> +	cma_reserved_count++;
> +	printk(KERN_INFO "CMA: reserved %ld MiB at %08lx\n", size / SZ_1M,
> +	       (unsigned long)base);
> +
> +	/*
> +	 * Architecture specific contiguous memory fixup.
> +	 */
> +	dma_contiguous_early_fixup(base, size);
> +	return 0;
> +err:
> +	printk(KERN_ERR "CMA: failed to reserve %ld MiB\n", size / SZ_1M);
> +	return base;
> +}
> +
>
> ...
>
> +static inline struct cma *get_dev_cma_area(struct device *dev)
> +{
> +	if (dev && dev->cma_area)
> +		return dev->cma_area;
> +	return dma_contiguous_default_area;
> +}
> +
> +static inline void set_dev_cma_area(struct device *dev, struct cma *cma)
> +{
> +	if (dev)
> +		dev->cma_area = cma;
> +	dma_contiguous_default_area = cma;
> +}

dev_[get|set]_cma_area() would be better names.
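
i.e. (same bodies, just the conventional dev_*() spelling):

static inline struct cma *dev_get_cma_area(struct device *dev)
{
	if (dev && dev->cma_area)
		return dev->cma_area;
	return dma_contiguous_default_area;
}

static inline void dev_set_cma_area(struct device *dev, struct cma *cma)
{
	if (dev)
		dev->cma_area = cma;
	dma_contiguous_default_area = cma;
}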

> +#endif
> +#endif
> +#endif
>
> ...
>
> +
> +#ifdef CONFIG_CMA
> +
> +#define MAX_CMA_AREAS	(8)

What are the implications of this decision?

Should it be in Kconfig?  Everything else is :)

>
> ...
>


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 6/9] drivers: add Contiguous Memory Allocator
@ 2011-10-14 23:57     ` Andrew Morton
  0 siblings, 0 replies; 180+ messages in thread
From: Andrew Morton @ 2011-10-14 23:57 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Russell King,
	KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker, Mel Gorman,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen

On Thu, 06 Oct 2011 15:54:46 +0200
Marek Szyprowski <m.szyprowski@samsung.com> wrote:

> The Contiguous Memory Allocator is a set of helper functions for DMA
> mapping framework that improves allocations of contiguous memory chunks.
> 
> CMA grabs memory on system boot, marks it with CMA_MIGRATE_TYPE and
> gives back to the system. Kernel is allowed to allocate movable pages
> within CMA's managed memory so that it can be used for example for page
> cache when DMA mapping do not use it. On dma_alloc_from_contiguous()
> request such pages are migrated out of CMA area to free required
> contiguous block and fulfill the request. This allows to allocate large
> contiguous chunks of memory at any time assuming that there is enough
> free memory available in the system.
> 
> This code is heavily based on earlier works by Michal Nazarewicz.
> 
>
> ...
>
> +#ifdef phys_to_pfn
> +/* nothing to do */
> +#elif defined __phys_to_pfn
> +#  define phys_to_pfn __phys_to_pfn
> +#elif defined __va
> +#  define phys_to_pfn(x) page_to_pfn(virt_to_page(__va(x)))
> +#else
> +#  error phys_to_pfn implementation needed
> +#endif

Yikes!

This hackery should not be here, please.  If we need a phys_to_pfn()
then let's write a proper one which lives in core MM and arch, then get
it suitably reviewed and integrated and then maintained.

> +struct cma {
> +	unsigned long	base_pfn;
> +	unsigned long	count;
> +	unsigned long	*bitmap;
> +};
> +
> +struct cma *dma_contiguous_default_area;
> +
> +#ifndef CONFIG_CMA_SIZE_ABSOLUTE
> +#define CONFIG_CMA_SIZE_ABSOLUTE 0
> +#endif
> +
> +#ifndef CONFIG_CMA_SIZE_PERCENTAGE
> +#define CONFIG_CMA_SIZE_PERCENTAGE 0
> +#endif

No, .c files should not #define CONFIG_ variables like this.

One approach is

#ifdef CONFIG_FOO
#define BAR CONFIG_FOO
#else
#define BAR 0
#endif

but that's merely cosmetic fluff.  A superior fix is to get the Kconfig
correct, so CONFIG_FOO cannot ever be undefined if we're compiling this
.c file.

> +static unsigned long size_abs = CONFIG_CMA_SIZE_ABSOLUTE * SZ_1M;
> +static unsigned long size_percent = CONFIG_CMA_SIZE_PERCENTAGE;
> +static long size_cmdline = -1;

Maybe a little documentation for these, explaining their role in
everything?

> +static int __init early_cma(char *p)
> +{
> +	pr_debug("%s(%s)\n", __func__, p);
> +	size_cmdline = memparse(p, &p);
> +	return 0;
> +}
> +early_param("cma", early_cma);

Did this get added to Documentation/kernel-parameters.txt?

> +static unsigned long __init __cma_early_get_total_pages(void)

The leading __ seems unnecessary for a static function.

> +{
> +	struct memblock_region *reg;
> +	unsigned long total_pages = 0;
> +
> +	/*
> +	 * We cannot use memblock_phys_mem_size() here, because
> +	 * memblock_analyze() has not been called yet.
> +	 */
> +	for_each_memblock(memory, reg)
> +		total_pages += memblock_region_memory_end_pfn(reg) -
> +			       memblock_region_memory_base_pfn(reg);
> +	return total_pages;
> +}
> +
> +/**
> + * dma_contiguous_reserve() - reserve area for contiguous memory handling
> + *
> + * This funtion reserves memory from early allocator. It should be
> + * called by arch specific code once the early allocator (memblock or bootmem)
> + * has been activated and all other subsystems have already allocated/reserved
> + * memory.
> + */

Forgot to document the argument.

> +void __init dma_contiguous_reserve(phys_addr_t limit)
> +{
> +	unsigned long selected_size = 0;
> +	unsigned long total_pages;
> +
> +	pr_debug("%s(limit %08lx)\n", __func__, (unsigned long)limit);
> +
> +	total_pages = __cma_early_get_total_pages();
> +	size_percent *= (total_pages << PAGE_SHIFT) / 100;
> +
> +	pr_debug("%s: total available: %ld MiB, size absolute: %ld MiB, size percentage: %ld MiB\n",
> +		 __func__, (total_pages << PAGE_SHIFT) / SZ_1M,
> +		size_abs / SZ_1M, size_percent / SZ_1M);
> +
> +#ifdef CONFIG_CMA_SIZE_SEL_ABSOLUTE
> +	selected_size = size_abs;
> +#elif defined(CONFIG_CMA_SIZE_SEL_PERCENTAGE)
> +	selected_size = size_percent;
> +#elif defined(CONFIG_CMA_SIZE_SEL_MIN)
> +	selected_size = min(size_abs, size_percent);
> +#elif defined(CONFIG_CMA_SIZE_SEL_MAX)
> +	selected_size = max(size_abs, size_percent);
> +#endif

geeze, what's all that stuff?

Whatever it's doing, it seems a bad idea to relegate these decisions to
Kconfig-time.  The vast majority of users don't have control of their
kernel configuration!  The code would be more flexible and generic if
this was done at runtime somehow.

> +	if (size_cmdline != -1)
> +		selected_size = size_cmdline;
> +
> +	if (!selected_size)
> +		return;
> +
> +	pr_debug("%s: reserving %ld MiB for global area\n", __func__,
> +		 selected_size / SZ_1M);
> +
> +	dma_declare_contiguous(NULL, selected_size, 0, limit);
> +};
> +
>
> ...
>
> +static struct cma *__cma_create_area(unsigned long base_pfn,

s/__//?

> +				     unsigned long count)
> +{
> +	int bitmap_size = BITS_TO_LONGS(count) * sizeof(long);
> +	struct cma *cma;
> +
> +	pr_debug("%s(base %08lx, count %lx)\n", __func__, base_pfn, count);
> +
> +	cma = kmalloc(sizeof *cma, GFP_KERNEL);
> +	if (!cma)
> +		return ERR_PTR(-ENOMEM);
> +
> +	cma->base_pfn = base_pfn;
> +	cma->count = count;
> +	cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
> +
> +	if (!cma->bitmap)
> +		goto no_mem;
> +
> +	__cma_activate_area(base_pfn, count);
> +
> +	pr_debug("%s: returned %p\n", __func__, (void *)cma);
> +	return cma;
> +
> +no_mem:
> +	kfree(cma);
> +	return ERR_PTR(-ENOMEM);
> +}
>
> ...
>
> +int __init dma_declare_contiguous(struct device *dev, unsigned long size,
> +				  phys_addr_t base, phys_addr_t limit)
> +{
> +	struct cma_reserved *r = &cma_reserved[cma_reserved_count];
> +	unsigned long alignment;
> +
> +	pr_debug("%s(size %lx, base %08lx, limit %08lx)\n", __func__,
> +		 (unsigned long)size, (unsigned long)base,
> +		 (unsigned long)limit);
> +
> +	/* Sanity checks */
> +	if (cma_reserved_count == ARRAY_SIZE(cma_reserved))
> +		return -ENOSPC;

I think a loud printk() is appropriate if the kernel fails in this
manner.

> +	if (!size)
> +		return -EINVAL;
> +
> +	/* Sanitise input arguments */
> +	alignment = PAGE_SIZE << max(MAX_ORDER, pageblock_order);
> +	base = ALIGN(base, alignment);
> +	size = ALIGN(size, alignment);
> +	limit = ALIGN(limit, alignment);
> +
> +	/* Reserve memory */
> +	if (base) {
> +		if (memblock_is_region_reserved(base, size) ||
> +		    memblock_reserve(base, size) < 0) {
> +			base = -EBUSY;
> +			goto err;
> +		}
> +	} else {
> +		/*
> +		 * Use __memblock_alloc_base() since
> +		 * memblock_alloc_base() panic()s.
> +		 */
> +		phys_addr_t addr = __memblock_alloc_base(size, alignment, limit);
> +		if (!addr) {
> +			base = -ENOMEM;
> +			goto err;
> +		} else if (addr + size > ~(unsigned long)0) {
> +			memblock_free(addr, size);
> +			base = -EOVERFLOW;

EOVERFLOW is a numeric/float thing.  It seems inappropriate to use it
here.

> +			goto err;
> +		} else {
> +			base = addr;
> +		}
> +	}
> +
> +	/*
> +	 * Each reserved area must be initialised later, when more kernel
> +	 * subsystems (like slab allocator) are available.
> +	 */
> +	r->start = base;
> +	r->size = size;
> +	r->dev = dev;
> +	cma_reserved_count++;
> +	printk(KERN_INFO "CMA: reserved %ld MiB at %08lx\n", size / SZ_1M,
> +	       (unsigned long)base);
> +
> +	/*
> +	 * Architecture specific contiguous memory fixup.
> +	 */
> +	dma_contiguous_early_fixup(base, size);
> +	return 0;
> +err:
> +	printk(KERN_ERR "CMA: failed to reserve %ld MiB\n", size / SZ_1M);
> +	return base;
> +}
> +
>
> ...
>
> +static inline struct cma *get_dev_cma_area(struct device *dev)
> +{
> +	if (dev && dev->cma_area)
> +		return dev->cma_area;
> +	return dma_contiguous_default_area;
> +}
> +
> +static inline void set_dev_cma_area(struct device *dev, struct cma *cma)
> +{
> +	if (dev)
> +		dev->cma_area = cma;
> +	dma_contiguous_default_area = cma;
> +}

dev_[get|set]_cma_area() would be better names.

> +#endif
> +#endif
> +#endif
>
> ...
>
> +
> +#ifdef CONFIG_CMA
> +
> +#define MAX_CMA_AREAS	(8)

What are the implications of this decision?

Should it be in Kconfig?  Everything else is :)

>
> ...
>


^ permalink raw reply	[flat|nested] 180+ messages in thread

* [PATCH 6/9] drivers: add Contiguous Memory Allocator
@ 2011-10-14 23:57     ` Andrew Morton
  0 siblings, 0 replies; 180+ messages in thread
From: Andrew Morton @ 2011-10-14 23:57 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 06 Oct 2011 15:54:46 +0200
Marek Szyprowski <m.szyprowski@samsung.com> wrote:

> The Contiguous Memory Allocator is a set of helper functions for DMA
> mapping framework that improves allocations of contiguous memory chunks.
> 
> CMA grabs memory on system boot, marks it with CMA_MIGRATE_TYPE and
> gives back to the system. Kernel is allowed to allocate movable pages
> within CMA's managed memory so that it can be used for example for page
> cache when DMA mapping do not use it. On dma_alloc_from_contiguous()
> request such pages are migrated out of CMA area to free required
> contiguous block and fulfill the request. This allows to allocate large
> contiguous chunks of memory at any time assuming that there is enough
> free memory available in the system.
> 
> This code is heavily based on earlier works by Michal Nazarewicz.
> 
>
> ...
>
> +#ifdef phys_to_pfn
> +/* nothing to do */
> +#elif defined __phys_to_pfn
> +#  define phys_to_pfn __phys_to_pfn
> +#elif defined __va
> +#  define phys_to_pfn(x) page_to_pfn(virt_to_page(__va(x)))
> +#else
> +#  error phys_to_pfn implementation needed
> +#endif

Yikes!

This hackery should not be here, please.  If we need a phys_to_pfn()
then let's write a proper one which lives in core MM and arch, then get
it suitably reviewed and integrated and then maintained.

> +struct cma {
> +	unsigned long	base_pfn;
> +	unsigned long	count;
> +	unsigned long	*bitmap;
> +};
> +
> +struct cma *dma_contiguous_default_area;
> +
> +#ifndef CONFIG_CMA_SIZE_ABSOLUTE
> +#define CONFIG_CMA_SIZE_ABSOLUTE 0
> +#endif
> +
> +#ifndef CONFIG_CMA_SIZE_PERCENTAGE
> +#define CONFIG_CMA_SIZE_PERCENTAGE 0
> +#endif

No, .c files should not #define CONFIG_ variables like this.

One approach is

#ifdef CONFIG_FOO
#define BAR CONFIG_FOO
#else
#define BAR 0
#endif

but that's merely cosmetic fluff.  A superior fix is to get the Kconfig
correct, so CONFIG_FOO cannot ever be undefined if we're compiling this
.c file.

> +static unsigned long size_abs = CONFIG_CMA_SIZE_ABSOLUTE * SZ_1M;
> +static unsigned long size_percent = CONFIG_CMA_SIZE_PERCENTAGE;
> +static long size_cmdline = -1;

Maybe a little documentation for these, explaining their role in
everything?

> +static int __init early_cma(char *p)
> +{
> +	pr_debug("%s(%s)\n", __func__, p);
> +	size_cmdline = memparse(p, &p);
> +	return 0;
> +}
> +early_param("cma", early_cma);

Did this get added to Documentation/kernel-parameters.txt?

> +static unsigned long __init __cma_early_get_total_pages(void)

The leading __ seems unnecessary for a static function.

> +{
> +	struct memblock_region *reg;
> +	unsigned long total_pages = 0;
> +
> +	/*
> +	 * We cannot use memblock_phys_mem_size() here, because
> +	 * memblock_analyze() has not been called yet.
> +	 */
> +	for_each_memblock(memory, reg)
> +		total_pages += memblock_region_memory_end_pfn(reg) -
> +			       memblock_region_memory_base_pfn(reg);
> +	return total_pages;
> +}
> +
> +/**
> + * dma_contiguous_reserve() - reserve area for contiguous memory handling
> + *
> + * This funtion reserves memory from early allocator. It should be
> + * called by arch specific code once the early allocator (memblock or bootmem)
> + * has been activated and all other subsystems have already allocated/reserved
> + * memory.
> + */

Forgot to document the argument.

> +void __init dma_contiguous_reserve(phys_addr_t limit)
> +{
> +	unsigned long selected_size = 0;
> +	unsigned long total_pages;
> +
> +	pr_debug("%s(limit %08lx)\n", __func__, (unsigned long)limit);
> +
> +	total_pages = __cma_early_get_total_pages();
> +	size_percent *= (total_pages << PAGE_SHIFT) / 100;
> +
> +	pr_debug("%s: total available: %ld MiB, size absolute: %ld MiB, size percentage: %ld MiB\n",
> +		 __func__, (total_pages << PAGE_SHIFT) / SZ_1M,
> +		size_abs / SZ_1M, size_percent / SZ_1M);
> +
> +#ifdef CONFIG_CMA_SIZE_SEL_ABSOLUTE
> +	selected_size = size_abs;
> +#elif defined(CONFIG_CMA_SIZE_SEL_PERCENTAGE)
> +	selected_size = size_percent;
> +#elif defined(CONFIG_CMA_SIZE_SEL_MIN)
> +	selected_size = min(size_abs, size_percent);
> +#elif defined(CONFIG_CMA_SIZE_SEL_MAX)
> +	selected_size = max(size_abs, size_percent);
> +#endif

geeze, what's all that stuff?

Whatever it's doing, it seems a bad idea to relegate these decisions to
Kconfig-time.  The vast majority of users don't have control of their
kernel configuration!  The code would be more flexible and generic if
this was done at runtime somehow.

> +	if (size_cmdline != -1)
> +		selected_size = size_cmdline;
> +
> +	if (!selected_size)
> +		return;
> +
> +	pr_debug("%s: reserving %ld MiB for global area\n", __func__,
> +		 selected_size / SZ_1M);
> +
> +	dma_declare_contiguous(NULL, selected_size, 0, limit);
> +};
> +
>
> ...
>
> +static struct cma *__cma_create_area(unsigned long base_pfn,

s/__//?

> +				     unsigned long count)
> +{
> +	int bitmap_size = BITS_TO_LONGS(count) * sizeof(long);
> +	struct cma *cma;
> +
> +	pr_debug("%s(base %08lx, count %lx)\n", __func__, base_pfn, count);
> +
> +	cma = kmalloc(sizeof *cma, GFP_KERNEL);
> +	if (!cma)
> +		return ERR_PTR(-ENOMEM);
> +
> +	cma->base_pfn = base_pfn;
> +	cma->count = count;
> +	cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
> +
> +	if (!cma->bitmap)
> +		goto no_mem;
> +
> +	__cma_activate_area(base_pfn, count);
> +
> +	pr_debug("%s: returned %p\n", __func__, (void *)cma);
> +	return cma;
> +
> +no_mem:
> +	kfree(cma);
> +	return ERR_PTR(-ENOMEM);
> +}
>
> ...
>
> +int __init dma_declare_contiguous(struct device *dev, unsigned long size,
> +				  phys_addr_t base, phys_addr_t limit)
> +{
> +	struct cma_reserved *r = &cma_reserved[cma_reserved_count];
> +	unsigned long alignment;
> +
> +	pr_debug("%s(size %lx, base %08lx, limit %08lx)\n", __func__,
> +		 (unsigned long)size, (unsigned long)base,
> +		 (unsigned long)limit);
> +
> +	/* Sanity checks */
> +	if (cma_reserved_count == ARRAY_SIZE(cma_reserved))
> +		return -ENOSPC;

I think a loud printk() is appropriate if the kernel fails in this
manner.

> +	if (!size)
> +		return -EINVAL;
> +
> +	/* Sanitise input arguments */
> +	alignment = PAGE_SIZE << max(MAX_ORDER, pageblock_order);
> +	base = ALIGN(base, alignment);
> +	size = ALIGN(size, alignment);
> +	limit = ALIGN(limit, alignment);
> +
> +	/* Reserve memory */
> +	if (base) {
> +		if (memblock_is_region_reserved(base, size) ||
> +		    memblock_reserve(base, size) < 0) {
> +			base = -EBUSY;
> +			goto err;
> +		}
> +	} else {
> +		/*
> +		 * Use __memblock_alloc_base() since
> +		 * memblock_alloc_base() panic()s.
> +		 */
> +		phys_addr_t addr = __memblock_alloc_base(size, alignment, limit);
> +		if (!addr) {
> +			base = -ENOMEM;
> +			goto err;
> +		} else if (addr + size > ~(unsigned long)0) {
> +			memblock_free(addr, size);
> +			base = -EOVERFLOW;

EOVERFLOW is a numeric/float thing.  It seems inappropriate to use it
here.

> +			goto err;
> +		} else {
> +			base = addr;
> +		}
> +	}
> +
> +	/*
> +	 * Each reserved area must be initialised later, when more kernel
> +	 * subsystems (like slab allocator) are available.
> +	 */
> +	r->start = base;
> +	r->size = size;
> +	r->dev = dev;
> +	cma_reserved_count++;
> +	printk(KERN_INFO "CMA: reserved %ld MiB at %08lx\n", size / SZ_1M,
> +	       (unsigned long)base);
> +
> +	/*
> +	 * Architecture specific contiguous memory fixup.
> +	 */
> +	dma_contiguous_early_fixup(base, size);
> +	return 0;
> +err:
> +	printk(KERN_ERR "CMA: failed to reserve %ld MiB\n", size / SZ_1M);
> +	return base;
> +}
> +
>
> ...
>
> +static inline struct cma *get_dev_cma_area(struct device *dev)
> +{
> +	if (dev && dev->cma_area)
> +		return dev->cma_area;
> +	return dma_contiguous_default_area;
> +}
> +
> +static inline void set_dev_cma_area(struct device *dev, struct cma *cma)
> +{
> +	if (dev)
> +		dev->cma_area = cma;
> +	dma_contiguous_default_area = cma;
> +}

dev_[get|set]_cma_area() would be better names.

> +#endif
> +#endif
> +#endif
>
> ...
>
> +
> +#ifdef CONFIG_CMA
> +
> +#define MAX_CMA_AREAS	(8)

What are the implications of this decision?

Should it be in Kconfig?  Everything else is :)

>
> ...
>

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/7] ARM: integrate CMA with DMA-mapping subsystem
  2011-10-06 13:54   ` Marek Szyprowski
  (?)
@ 2011-10-15  0:03     ` Andrew Morton
  -1 siblings, 0 replies; 180+ messages in thread
From: Andrew Morton @ 2011-10-15  0:03 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Russell King,
	KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker, Mel Gorman,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen

On Thu, 06 Oct 2011 15:54:47 +0200
Marek Szyprowski <m.szyprowski@samsung.com> wrote:

> This patch adds support for CMA to dma-mapping subsystem for ARM
> architecture. By default a global CMA area is used, but specific devices
> are allowed to have their private memory areas if required (they can be
> created with dma_declare_contiguous() function during board
> initialization).
> 
> Contiguous memory areas reserved for DMA are remapped with 2-level page
> tables on boot. Once a buffer is requested, a low memory kernel mapping
> is updated to to match requested memory access type.
> 
> GFP_ATOMIC allocations are performed from special pool which is created
> early during boot. This way remapping page attributes is not needed on
> allocation time.
> 
> CMA has been enabled unconditionally for ARMv6+ systems.
> 
>
> ...
>
> --- /dev/null
> +++ b/arch/arm/include/asm/dma-contiguous.h
> @@ -0,0 +1,33 @@
> +#ifndef ASMARM_DMA_CONTIGUOUS_H
> +#define ASMARM_DMA_CONTIGUOUS_H
> +
> +#ifdef __KERNEL__
> +
> +#include <linux/device.h>
> +#include <linux/dma-contiguous.h>
> +
> +#ifdef CONFIG_CMA
> +
> +#define MAX_CMA_AREAS	(8)

This was already defined in include/linux/dma-contiguous.h.  The
compiler didn't warn because it was defined to the same value.  Sort it
out, please?

>
> ...
>
> +static int __init early_coherent_pool(char *p)
> +{
> +	coherent_pool_size = memparse(p, &p);
> +	return 0;
> +}
> +early_param("coherent_pool", early_coherent_pool);

Is there user documentation for the new parameter?

>
> ...
>
> +struct dma_contiguous_early_reserve {
> +	phys_addr_t base;
> +	unsigned long size;
> +};
> +
> +static struct dma_contiguous_early_reserve
> +dma_mmu_remap[MAX_CMA_AREAS] __initdata;

Tab the continuation line to the right a bit.
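
i.e. something like:

static struct dma_contiguous_early_reserve
				dma_mmu_remap[MAX_CMA_AREAS] __initdata;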

> +
> +static int dma_mmu_remap_num __initdata;
>
> ...
>
> +static void *__alloc_from_pool(struct device *dev, size_t size,
> +			       struct page **ret_page)
> +{
> +	struct arm_vmregion *c;
> +	size_t align;
> +
> +	if (!coherent_head.vm_start) {
> +		printk(KERN_ERR "%s: coherent pool not initialised!\n",
> +		       __func__);
> +		dump_stack();
> +		return NULL;
> +	}
> +
> +	align = 1 << fls(size - 1);

Is there a roundup_pow_of_two() hiding in there?
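
For any non-zero size the open-coded expression should be equivalent to:

	/* roundup_pow_of_two() is in <linux/log2.h> */
	align = roundup_pow_of_two(size);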

> +	c = arm_vmregion_alloc(&coherent_head, align, size, 0);
> +	if (c) {
> +		void *ptr = (void *)c->vm_start;
> +		struct page *page = virt_to_page(ptr);
> +		*ret_page = page;
> +		return ptr;
> +	}
> +	return NULL;
> +}
>
> ...
>


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/7] ARM: integrate CMA with DMA-mapping subsystem
@ 2011-10-15  0:03     ` Andrew Morton
  0 siblings, 0 replies; 180+ messages in thread
From: Andrew Morton @ 2011-10-15  0:03 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Russell King,
	KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker, Mel Gorman,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen

On Thu, 06 Oct 2011 15:54:47 +0200
Marek Szyprowski <m.szyprowski@samsung.com> wrote:

> This patch adds support for CMA to dma-mapping subsystem for ARM
> architecture. By default a global CMA area is used, but specific devices
> are allowed to have their private memory areas if required (they can be
> created with dma_declare_contiguous() function during board
> initialization).
> 
> Contiguous memory areas reserved for DMA are remapped with 2-level page
> tables on boot. Once a buffer is requested, a low memory kernel mapping
> is updated to to match requested memory access type.
> 
> GFP_ATOMIC allocations are performed from special pool which is created
> early during boot. This way remapping page attributes is not needed on
> allocation time.
> 
> CMA has been enabled unconditionally for ARMv6+ systems.
> 
>
> ...
>
> --- /dev/null
> +++ b/arch/arm/include/asm/dma-contiguous.h
> @@ -0,0 +1,33 @@
> +#ifndef ASMARM_DMA_CONTIGUOUS_H
> +#define ASMARM_DMA_CONTIGUOUS_H
> +
> +#ifdef __KERNEL__
> +
> +#include <linux/device.h>
> +#include <linux/dma-contiguous.h>
> +
> +#ifdef CONFIG_CMA
> +
> +#define MAX_CMA_AREAS	(8)

This was already defined in include/linux/dma-contiguous.h.  The
compiler didn't warn because it was defined to the same value.  Sort it
out, please?

>
> ...
>
> +static int __init early_coherent_pool(char *p)
> +{
> +	coherent_pool_size = memparse(p, &p);
> +	return 0;
> +}
> +early_param("coherent_pool", early_coherent_pool);

Is there user documentation for the new parameter?

>
> ...
>
> +struct dma_contiguous_early_reserve {
> +	phys_addr_t base;
> +	unsigned long size;
> +};
> +
> +static struct dma_contiguous_early_reserve
> +dma_mmu_remap[MAX_CMA_AREAS] __initdata;

Tab the continuation line to the right a bit.

> +
> +static int dma_mmu_remap_num __initdata;
>
> ...
>
> +static void *__alloc_from_pool(struct device *dev, size_t size,
> +			       struct page **ret_page)
> +{
> +	struct arm_vmregion *c;
> +	size_t align;
> +
> +	if (!coherent_head.vm_start) {
> +		printk(KERN_ERR "%s: coherent pool not initialised!\n",
> +		       __func__);
> +		dump_stack();
> +		return NULL;
> +	}
> +
> +	align = 1 << fls(size - 1);

Is there a roundup_pow_of_two() hiding in there?

> +	c = arm_vmregion_alloc(&coherent_head, align, size, 0);
> +	if (c) {
> +		void *ptr = (void *)c->vm_start;
> +		struct page *page = virt_to_page(ptr);
> +		*ret_page = page;
> +		return ptr;
> +	}
> +	return NULL;
> +}
>
> ...
>


^ permalink raw reply	[flat|nested] 180+ messages in thread

* [PATCH 7/7] ARM: integrate CMA with DMA-mapping subsystem
@ 2011-10-15  0:03     ` Andrew Morton
  0 siblings, 0 replies; 180+ messages in thread
From: Andrew Morton @ 2011-10-15  0:03 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 06 Oct 2011 15:54:47 +0200
Marek Szyprowski <m.szyprowski@samsung.com> wrote:

> This patch adds support for CMA to dma-mapping subsystem for ARM
> architecture. By default a global CMA area is used, but specific devices
> are allowed to have their private memory areas if required (they can be
> created with dma_declare_contiguous() function during board
> initialization).
> 
> Contiguous memory areas reserved for DMA are remapped with 2-level page
> tables on boot. Once a buffer is requested, a low memory kernel mapping
> is updated to to match requested memory access type.
> 
> GFP_ATOMIC allocations are performed from special pool which is created
> early during boot. This way remapping page attributes is not needed on
> allocation time.
> 
> CMA has been enabled unconditionally for ARMv6+ systems.
> 
>
> ...
>
> --- /dev/null
> +++ b/arch/arm/include/asm/dma-contiguous.h
> @@ -0,0 +1,33 @@
> +#ifndef ASMARM_DMA_CONTIGUOUS_H
> +#define ASMARM_DMA_CONTIGUOUS_H
> +
> +#ifdef __KERNEL__
> +
> +#include <linux/device.h>
> +#include <linux/dma-contiguous.h>
> +
> +#ifdef CONFIG_CMA
> +
> +#define MAX_CMA_AREAS	(8)

This was already defined in include/linux/dma-contiguous.h.  The
compiler didn't warn because it was defined to the same value.  Sort it
out, please?

>
> ...
>
> +static int __init early_coherent_pool(char *p)
> +{
> +	coherent_pool_size = memparse(p, &p);
> +	return 0;
> +}
> +early_param("coherent_pool", early_coherent_pool);

Is there user documentation for the new parameter?

>
> ...
>
> +struct dma_contiguous_early_reserve {
> +	phys_addr_t base;
> +	unsigned long size;
> +};
> +
> +static struct dma_contiguous_early_reserve
> +dma_mmu_remap[MAX_CMA_AREAS] __initdata;

Tab the continuation line to the right a bit.

> +
> +static int dma_mmu_remap_num __initdata;
>
> ...
>
> +static void *__alloc_from_pool(struct device *dev, size_t size,
> +			       struct page **ret_page)
> +{
> +	struct arm_vmregion *c;
> +	size_t align;
> +
> +	if (!coherent_head.vm_start) {
> +		printk(KERN_ERR "%s: coherent pool not initialised!\n",
> +		       __func__);
> +		dump_stack();
> +		return NULL;
> +	}
> +
> +	align = 1 << fls(size - 1);

Is there a roundup_pow_of_two() hiding in there?

> +	c = arm_vmregion_alloc(&coherent_head, align, size, 0);
> +	if (c) {
> +		void *ptr = (void *)c->vm_start;
> +		struct page *page = virt_to_page(ptr);
> +		*ret_page = page;
> +		return ptr;
> +	}
> +	return NULL;
> +}
>
> ...
>

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCHv16 0/9] Contiguous Memory Allocator
  2011-10-14 23:19         ` Andrew Morton
  (?)
@ 2011-10-15 14:24           ` Arnd Bergmann
  -1 siblings, 0 replies; 180+ messages in thread
From: Arnd Bergmann @ 2011-10-15 14:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Paul McKenney, Marek Szyprowski, linux-kernel, linux-arm-kernel,
	linux-media, linux-mm, linaro-mm-sig, Michal Nazarewicz,
	Kyungmin Park, Russell King, KAMEZAWA Hiroyuki, Ankita Garg,
	Daniel Walker, Mel Gorman, Jesse Barker, Jonathan Corbet,
	Shariq Hasnain, Chunsang Jeong, Dave Hansen, Johannes Weiner

On Saturday 15 October 2011, Andrew Morton wrote:
> 
> On Tue, 11 Oct 2011 15:52:04 +0200
> Arnd Bergmann <arnd@arndb.de> wrote:
> > What I would really want to hear from you is your opinion on
> > the architecture independent stuff. Obviously, ARM is the
> > most important consumer of the patch set, but I think the
> > code has its merit on other architectures as well and most of
> > them (maybe not parisc) should be about as simple as the x86
> > one that Marek posted now with v16.
> 
> Having an x86 implementation is good.  It would also be good to get
> some x86 drivers using CMA asap, so the thing gets some runtime testing
> from the masses.  Can we think of anything we can do here?

With the current implementation, all drivers that use dma_alloc_coherent
automatically use CMA; there is no need to modify any driver. On
the other hand, nothing on x86 currently actually requires this feature
(otherwise it would be broken already), making it hard to test the
actual migration path.

The best test I can think of would be a network benchmark under memory
pressure, preferably one that uses large jumbo frames (64KB).

	Arnd

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 2/9] mm: alloc_contig_freed_pages() added
  2011-10-14 23:29     ` Andrew Morton
  (?)
@ 2011-10-16  8:01       ` Michal Nazarewicz
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Nazarewicz @ 2011-10-16  8:01 UTC (permalink / raw)
  To: Marek Szyprowski, Andrew Morton
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Kyungmin Park, Russell King, KAMEZAWA Hiroyuki,
	Ankita Garg, Daniel Walker, Mel Gorman, Arnd Bergmann,
	Jesse Barker, Jonathan Corbet, Shariq Hasnain, Chunsang Jeong,
	Dave Hansen

On Sat, 15 Oct 2011 01:29:33 +0200, Andrew Morton <akpm@linux-foundation.org> wrote:

> On Thu, 06 Oct 2011 15:54:42 +0200
> Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>
>> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>>
>> This commit introduces alloc_contig_freed_pages() function
>
> The "freed" seems redundant to me.  Wouldn't "alloc_contig_pages" be a
> better name?

The “freed” is there because the function operates on pages that are in
the buddy system, i.e. it is given a range of PFNs that are to be removed
from the buddy system.

There's also an alloc_contig_range() function (added by the next patch)
which frees pages in the given range and then calls
alloc_contig_freed_pages() to allocate them.

IMO, if there was an alloc_contig_pages() function, it would have to
be one level up (ie. it would figure out where to allocate memory and
then call alloc_contig_range()).  (That's really what CMA is doing).

Still, as I think of it now, maybe alloc_contig_free_range() would be
better?
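
For illustration, the "one level up" helper mentioned above might look
roughly like this; find_suitable_pfn_range() is hypothetical and the
signatures are simplified:

	static struct page *alloc_contig_pages(unsigned long nr_pages)
	{
		unsigned long start;

		/* some policy has to pick a free/movable PFN range first */
		start = find_suitable_pfn_range(nr_pages);	/* hypothetical */
		if (!start)
			return NULL;

		/* then the range-based allocator does the actual work */
		if (alloc_contig_range(start, start + nr_pages))
			return NULL;

		return pfn_to_page(start);
	}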

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 2/9] mm: alloc_contig_freed_pages() added
  2011-10-16  8:01       ` Michal Nazarewicz
  (?)
@ 2011-10-16  8:31         ` Andrew Morton
  -1 siblings, 0 replies; 180+ messages in thread
From: Andrew Morton @ 2011-10-16  8:31 UTC (permalink / raw)
  To: Michal Nazarewicz
  Cc: Marek Szyprowski, linux-kernel, linux-arm-kernel, linux-media,
	linux-mm, linaro-mm-sig, Kyungmin Park, Russell King,
	KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker, Mel Gorman,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen

On Sun, 16 Oct 2011 10:01:36 +0200 "Michal Nazarewicz" <mina86@mina86.com> wrote:

> Still, as I think of it now, maybe alloc_contig_free_range() would be
> better?

Nope.  Of *course* the pages were free.  Otherwise we couldn't
(re)allocate them.  I still think the "free" part is redundant.

What could be improved is the "alloc" part.  This really isn't an
allocation operation.  The pages are being removed from buddy then
moved into the free arena of a different memory manager from where they
will _later_ be "allocated".

So we should move away from the alloc/free naming altogether for this
operation and think up new terms.  How about "claim" and "release"? 
claim_contig_pages, claim_contig_range, release_contig_pages, etc?
Or we could use take/return.

Also, if we have no expectation that anything apart from CMA will use
these interfaces (?), the names could/should be prefixed with "cma_".


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 2/9] mm: alloc_contig_freed_pages() added
  2011-10-16  8:31         ` Andrew Morton
  (?)
@ 2011-10-16  9:39           ` Michal Nazarewicz
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Nazarewicz @ 2011-10-16  9:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Marek Szyprowski, linux-kernel, linux-arm-kernel, linux-media,
	linux-mm, linaro-mm-sig, Kyungmin Park, Russell King,
	KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker, Mel Gorman,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen

> On Sun, 16 Oct 2011 10:01:36 +0200 "Michal Nazarewicz" wrote:
>> Still, as I think of it now, maybe alloc_contig_free_range() would be
>> better?

On Sun, 16 Oct 2011 10:31:16 +0200, Andrew Morton wrote:
> Nope.  Of *course* the pages were free.  Otherwise we couldn't
> (re)allocate them.  I still think the "free" part is redundant.

Makes sense.

> What could be improved is the "alloc" part.  This really isn't an
> allocation operation.  The pages are being removed from buddy then
> moved into the free arena of a different memory manager from where they
> will _later_ be "allocated".

Not quite.  After alloc_contig_range() returns, the pages are passed to the
caller with no further processing.  I.e. the area is not later split into
several parts nor kept unused in CMA's pool.

alloc_contig_freed_pages() is a little different since it must be called on
a buddy page boundary and may return more than requested (because of the way
the buddy system merges buddies), so there is a little processing after it
returns (namely freeing of the excess pages).
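
A rough sketch of that call pattern, using the helpers from this patch set
(simplified, not the actual CMA code):

	unsigned long last;

	/* may hand back a PFN past 'end' because buddy merges blocks */
	last = alloc_contig_freed_pages(start, end, GFP_KERNEL);

	/* give back whatever was grabbed beyond the requested range */
	if (last > end)
		free_contig_pages(end, last - end);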

> So we should move away from the alloc/free naming altogether for this
> operation and think up new terms.  How about "claim" and "release"?
> claim_contig_pages, claim_contig_range, release_contig_pages, etc?
> Or we could use take/return.

Personally, I'm not convinced about changing the names of alloc_contig_range()
and free_contig_pages() but I see merit in changing alloc_contig_freed_pages()
to something else.

Since, at the moment, it's used only by alloc_contig_range(), I'd lean
towards removing it from page-isolation.h, marking it static and renaming
it to __alloc_contig_range().

> Also, if we have no expectation that anything apart from CMA will use
> these interfaces (?), the names could/should be prefixed with "cma_".

In Kamezawa's original patchset, he used those for a somewhat different
approach (IIRC, his patchset introduced a function that scanned memory
and tried to allocate contiguous memory where it could), so I can imagine that
someone will make use of those functions.  They may be used in any situation
where a range of pages is either free (i.e. in the buddy system) or movable
and one wants to allocate them for some reason.

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michal "mina86" Nazarewicz    (o o)
ooo +--<mina86@mina86.com>---<mina86@jabber.org>---ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 6/9] drivers: add Contiguous Memory Allocator
  2011-10-14 23:57     ` Andrew Morton
  (?)
@ 2011-10-16 10:08       ` Russell King - ARM Linux
  -1 siblings, 0 replies; 180+ messages in thread
From: Russell King - ARM Linux @ 2011-10-16 10:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Marek Szyprowski, linux-kernel, linux-arm-kernel, linux-media,
	linux-mm, linaro-mm-sig, Michal Nazarewicz, Kyungmin Park,
	KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker, Mel Gorman,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen

On Fri, Oct 14, 2011 at 04:57:30PM -0700, Andrew Morton wrote:
> On Thu, 06 Oct 2011 15:54:46 +0200
> Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> > +#ifdef phys_to_pfn
> > +/* nothing to do */
> > +#elif defined __phys_to_pfn
> > +#  define phys_to_pfn __phys_to_pfn
> > +#elif defined __va
> > +#  define phys_to_pfn(x) page_to_pfn(virt_to_page(__va(x)))
> > +#else
> > +#  error phys_to_pfn implementation needed
> > +#endif
> 
> Yikes!
> 
> This hackery should not be here, please.  If we need a phys_to_pfn()
> then let's write a proper one which lives in core MM and arch, then get
> it suitably reviewed and integrated and then maintained.

Another question is whether we have any arch where PFN != PHYS >> PAGE_SHIFT?
We've used __phys_to_pfn() to implement that on ARM (with a corresponding
__pfn_to_phys()).  Catalin recently added a cast to __phys_to_pfn() for
LPAE, which I don't think is required:

-#define        __phys_to_pfn(paddr)    ((paddr) >> PAGE_SHIFT)
+#define        __phys_to_pfn(paddr)    ((unsigned long)((paddr) >> PAGE_SHIFT))

since a phys_addr_t >> PAGE_SHIFT will be silently truncated if the passed
in physical address was 64-bit anyway.  (Note: we don't support > 32-bit
PFNs).

So, I'd suggest CMA should just use PFN_DOWN() and be done with it.
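
For completeness, PFN_DOWN() from <linux/pfn.h> is simply
((x) >> PAGE_SHIFT), so the whole #ifdef ladder above could collapse to
something like:

	#include <linux/pfn.h>

	unsigned long pfn = PFN_DOWN(phys);	/* phys is a phys_addr_t */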

^ permalink raw reply	[flat|nested] 180+ messages in thread

* RE: [PATCH 2/9] mm: alloc_contig_freed_pages() added
  2011-10-14 23:29     ` Andrew Morton
  (?)
@ 2011-10-17 12:21       ` Marek Szyprowski
  -1 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-10-17 12:21 UTC (permalink / raw)
  To: 'Andrew Morton'
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Michal Nazarewicz',
	'Kyungmin Park', 'Russell King',
	'KAMEZAWA Hiroyuki', 'Ankita Garg',
	'Daniel Walker', 'Mel Gorman',
	'Arnd Bergmann', 'Jesse Barker',
	'Jonathan Corbet', 'Shariq Hasnain',
	'Chunsang Jeong', 'Dave Hansen'

Hello Andrew,

Thanks for your comments. I will try to address them in the next round of
CMA patches.

On Saturday, October 15, 2011 1:30 AM Andrew Morton wrote:

(snipped)

> > +
> > +void free_contig_pages(unsigned long pfn, unsigned nr_pages)
> > +{
> > +	struct page *page = pfn_to_page(pfn);
> > +
> > +	while (nr_pages--) {
> > +		__free_page(page);
> > +		++pfn;
> > +		if (likely(zone_pfn_same_memmap(pfn - 1, pfn)))
> > +			++page;
> > +		else
> > +			page = pfn_to_page(pfn);
> > +	}
> > +}
> 
> You're sure these functions don't need EXPORT_SYMBOL()?  Maybe the
> design is that only DMA core calls into here (if so, that's good).

Drivers should not call it; it is intended to be used by low-level DMA
code. Do you think that a comment about the missing EXPORT_SYMBOL is
required?

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center



^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 2/9] mm: alloc_contig_freed_pages() added
  2011-10-17 12:21       ` Marek Szyprowski
  (?)
@ 2011-10-17 18:39         ` Andrew Morton
  -1 siblings, 0 replies; 180+ messages in thread
From: Andrew Morton @ 2011-10-17 18:39 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Michal Nazarewicz',
	'Kyungmin Park', 'Russell King',
	'KAMEZAWA Hiroyuki', 'Ankita Garg',
	'Daniel Walker', 'Mel Gorman',
	'Arnd Bergmann', 'Jesse Barker',
	'Jonathan Corbet', 'Shariq Hasnain',
	'Chunsang Jeong', 'Dave Hansen'

On Mon, 17 Oct 2011 14:21:07 +0200
Marek Szyprowski <m.szyprowski@samsung.com> wrote:

> > > +
> > > +void free_contig_pages(unsigned long pfn, unsigned nr_pages)
> > > +{
> > > +	struct page *page = pfn_to_page(pfn);
> > > +
> > > +	while (nr_pages--) {
> > > +		__free_page(page);
> > > +		++pfn;
> > > +		if (likely(zone_pfn_same_memmap(pfn - 1, pfn)))
> > > +			++page;
> > > +		else
> > > +			page = pfn_to_page(pfn);
> > > +	}
> > > +}
> > 
> > You're sure these functions don't need EXPORT_SYMBOL()?  Maybe the
> > design is that only DMA core calls into here (if so, that's good).
> 
> Drivers should not call it, it is intended to be used by low-level DMA
> code.

OK, thanks for checking.

> Do you think that a comment about missing EXPORT_SYMBOL is 
> required?

No.  If someone later wants to use these from a module then we can look
at their reasons and make a decision at that time.


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/9] mm: move some functions from memory_hotplug.c to page_isolation.c
  2011-10-06 13:54   ` Marek Szyprowski
  (?)
@ 2011-10-18 12:05     ` Mel Gorman
  -1 siblings, 0 replies; 180+ messages in thread
From: Mel Gorman @ 2011-10-18 12:05 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen

On Thu, Oct 06, 2011 at 03:54:41PM +0200, Marek Szyprowski wrote:
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> Memory hotplug is a logic for making pages unused in the specified
> range of pfn. So, some of core logics can be used for other purpose
> as allocating a very large contigous memory block.
> 
> This patch moves some functions from mm/memory_hotplug.c to
> mm/page_isolation.c. This helps adding a function for large-alloc in
> page_isolation.c with memory-unplug technique.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> [m.nazarewicz: reworded commit message]
> Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> [m.szyprowski: rebased and updated to Linux v3.0-rc1]
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> CC: Michal Nazarewicz <mina86@mina86.com>
> Acked-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  include/linux/page-isolation.h |    7 +++
>  mm/memory_hotplug.c            |  111 --------------------------------------
>  mm/page_isolation.c            |  114 ++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 121 insertions(+), 111 deletions(-)
> 
> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
> index 051c1b1..58cdbac 100644
> --- a/include/linux/page-isolation.h
> +++ b/include/linux/page-isolation.h
> @@ -33,5 +33,12 @@ test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
>  extern int set_migratetype_isolate(struct page *page);
>  extern void unset_migratetype_isolate(struct page *page);
>  
> +/*
> + * For migration.
> + */
> +
> +int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn);

bool

> +unsigned long scan_lru_pages(unsigned long start, unsigned long end);

Both function names are misleading.

> +int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
>  

scan_lru_pages as it is used by memory hotplug is also extremely
expensive. Unplugging memory is rare, so the performance is not a concern
there, but it may be for CMA.

I think it would have made more sense to either express this as an iterator
like for_each_lru_page_in_range() and use cursors, or to reuse the compaction
code. As it is, these functions are a bit rough. I'm biased, but this code
seems to have very similar responsibilities to the compaction.c code for
isolate_migratepages and how it handles migration. It also knows how to avoid
isolating so much memory as to put the system at risk of being livelocked,
how to isolate pages from the LRU in batches, etc.
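
A sketch of the iterator shape suggested above; for_each_lru_page_in_range()
does not exist in the tree, this is only an illustration of the idea:

	#define for_each_lru_page_in_range(page, pfn, start, end)	\
		for ((pfn) = (start); (pfn) < (end); (pfn)++)		\
			if (!pfn_valid(pfn) ||				\
			    !PageLRU((page) = pfn_to_page(pfn)))	\
				continue;				\
			else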

This is not a show-stopper as such, but personally I would prefer that
the memory hotplug code share code with compaction rather than CMA adding
a new dependency on memory hotplug.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 2/9] mm: alloc_contig_freed_pages() added
  2011-10-06 13:54   ` Marek Szyprowski
  (?)
@ 2011-10-18 12:21     ` Mel Gorman
  -1 siblings, 0 replies; 180+ messages in thread
From: Mel Gorman @ 2011-10-18 12:21 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen

At this point, I'm going to apologise for not reviewing this a long long
time ago.

On Thu, Oct 06, 2011 at 03:54:42PM +0200, Marek Szyprowski wrote:
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> This commit introduces alloc_contig_freed_pages() function
> which allocates (ie. removes from buddy system) free pages
> in range. Caller has to guarantee that all pages in range
> are in buddy system.
> 

Straight away, I'm wondering why you didn't use

mm/compaction.c#isolate_freepages()

It knows how to isolate pages within ranges. All its control information
is passed via struct compact_control, which I recognise may be awkward
for CMA, but compaction.c knows how to manage all the isolated pages and
pass them to migrate.c appropriately.

I haven't read all the patches yet but isolate_freepages() does break
everything up into order-0 pages. This may not be to your liking but it
would not be possible to change.

> Along with this function, a free_contig_pages() function is
> provided which frees all (or a subset of) pages allocated
> with alloc_contig_free_pages().
> 

mm/compaction.c#release_freepages()

> Michal Nazarewicz has modified the function to make it easier
> to allocate not MAX_ORDER_NR_PAGES aligned pages by making it
> return pfn of one-past-the-last allocated page.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
> [m.nazarewicz: added checks if all allocated pages comes from the
> same memory zone]
> Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> [m.szyprowski: fixed wrong condition in VM_BUG_ON assert]
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Acked-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  include/linux/mmzone.h         |   16 +++++++++
>  include/linux/page-isolation.h |    5 +++
>  mm/page_alloc.c                |   67 ++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 88 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index a2760bb..862a834 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1168,6 +1168,22 @@ static inline int memmap_valid_within(unsigned long pfn,
>  }
>  #endif /* CONFIG_ARCH_HAS_HOLES_MEMORYMODEL */
>  
> +#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
> +/*
> + * Both PFNs must be from the same zone!  If this function returns
> + * true, pfn_to_page(pfn1) + (pfn2 - pfn1) == pfn_to_page(pfn2).
> + */
> +static inline bool zone_pfn_same_memmap(unsigned long pfn1, unsigned long pfn2)
> +{
> +	return pfn_to_section_nr(pfn1) == pfn_to_section_nr(pfn2);
> +}
> +

Why do you care what section the page is in? The zone is important all
right, but not the section. Also, offhand I'm unsure if being in the
same section guarantees the same zone. Sections are ordinarily fully
populated (except on ARM but hey) but I can't remember anything
enforcing that zones be section-aligned.

Later I think I see that the intention was to reduce the use of
pfn_to_page(). You can do this in a more general fashion by checking the
zone boundaries and resolving the pfn->page every MAX_ORDER_NR_PAGES.
That will not be SPARSEMEM specific.
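
A sketch of that more general walk (illustration only):

	unsigned long pfn;
	struct page *page = NULL;

	for (pfn = start; pfn < end; pfn++) {
		/* memmap is contiguous within a MAX_ORDER-aligned block */
		if (pfn == start || !(pfn & (MAX_ORDER_NR_PAGES - 1)))
			page = pfn_to_page(pfn);
		else
			page++;
		/* ... operate on page ... */
	}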

> +#else
> +
> +#define zone_pfn_same_memmap(pfn1, pfn2) (true)
> +
> +#endif
> +
>  #endif /* !__GENERATING_BOUNDS.H */
>  #endif /* !__ASSEMBLY__ */
>  #endif /* _LINUX_MMZONE_H */
> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
> index 58cdbac..b9fc428 100644
> --- a/include/linux/page-isolation.h
> +++ b/include/linux/page-isolation.h
> @@ -33,6 +33,11 @@ test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
>  extern int set_migratetype_isolate(struct page *page);
>  extern void unset_migratetype_isolate(struct page *page);
>  
> +/* The below functions must be run on a range from a single zone. */
> +extern unsigned long alloc_contig_freed_pages(unsigned long start,
> +					      unsigned long end, gfp_t flag);
> +extern void free_contig_pages(unsigned long pfn, unsigned nr_pages);
> +
>  /*
>   * For migration.
>   */
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index bf4399a..fbfb920 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5706,6 +5706,73 @@ out:
>  	spin_unlock_irqrestore(&zone->lock, flags);
>  }
>  
> +unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
> +				       gfp_t flag)
> +{
> +	unsigned long pfn = start, count;
> +	struct page *page;
> +	struct zone *zone;
> +	int order;
> +
> +	VM_BUG_ON(!pfn_valid(start));

VM_BUG_ON seems very harsh here.  WARN_ON_ONCE and returning 0 to the
caller seems reasonable.
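
A sketch of that gentler failure mode:

	if (WARN_ON_ONCE(!pfn_valid(start)))
		return 0;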

> +	page = pfn_to_page(start);
> +	zone = page_zone(page);
> +
> +	spin_lock_irq(&zone->lock);
> +
> +	for (;;) {
> +		VM_BUG_ON(page_count(page) || !PageBuddy(page) ||
> +			  page_zone(page) != zone);
> +

Here you will hit the VM_BUG_ON with the zone lock held, leading to the
system halting very shortly.

> +		list_del(&page->lru);
> +		order = page_order(page);
> +		count = 1UL << order;
> +		zone->free_area[order].nr_free--;
> +		rmv_page_order(page);
> +		__mod_zone_page_state(zone, NR_FREE_PAGES, -(long)count);
> +

The callers need to check in advance if watermarks are sufficient for
this. In compaction, it happens in compaction_suitable() because it only
needed to be checked once. Your requirements might be different.
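
A sketch of such an up-front, caller-side check, with nr_pages being the size
of the requested range; the zone_watermark_ok() arguments here follow the
signature of that era and are meant purely as an illustration:

	/* don't start isolating if it would drain the zone below its low mark */
	if (!zone_watermark_ok(zone, 0,
			       low_wmark_pages(zone) + nr_pages, 0, 0))
		return -ENOMEM;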

> +		pfn += count;
> +		if (pfn >= end)
> +			break;
> +		VM_BUG_ON(!pfn_valid(pfn));
> +

On ARM, it's possible to encounter invalid pages. VM_BUG_ON is serious
overkill.

> +		if (zone_pfn_same_memmap(pfn - count, pfn))
> +			page += count;
> +		else
> +			page = pfn_to_page(pfn);
> +	}
> +
> +	spin_unlock_irq(&zone->lock);
> +
> +	/* After this, pages in the range can be freed one be one */
> +	count = pfn - start;
> +	pfn = start;
> +	for (page = pfn_to_page(pfn); count; --count) {
> +		prep_new_page(page, 0, flag);
> +		++pfn;
> +		if (likely(zone_pfn_same_memmap(pfn - 1, pfn)))
> +			++page;
> +		else
> +			page = pfn_to_page(pfn);
> +	}
> +

Here it looks like you have implemented something like split_free_page().

> +	return pfn;
> +}
> +
> +void free_contig_pages(unsigned long pfn, unsigned nr_pages)
> +{
> +	struct page *page = pfn_to_page(pfn);
> +
> +	while (nr_pages--) {
> +		__free_page(page);
> +		++pfn;
> +		if (likely(zone_pfn_same_memmap(pfn - 1, pfn)))
> +			++page;
> +		else
> +			page = pfn_to_page(pfn);
> +	}
> +}
> +
>  #ifdef CONFIG_MEMORY_HOTREMOVE
>  /*
>   * All pages in the range must be isolated before calling this.

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 3/9] mm: alloc_contig_range() added
  2011-10-06 13:54   ` Marek Szyprowski
  (?)
@ 2011-10-18 12:38     ` Mel Gorman
  -1 siblings, 0 replies; 180+ messages in thread
From: Mel Gorman @ 2011-10-18 12:38 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen

On Thu, Oct 06, 2011 at 03:54:43PM +0200, Marek Szyprowski wrote:
> From: Michal Nazarewicz <m.nazarewicz@samsung.com>
> 
> This commit adds the alloc_contig_range() function which tries
> to allocate given range of pages.  It tries to migrate all
> already allocated pages that fall in the range thus freeing them.
> Once all pages in the range are freed they are removed from the
> buddy system thus allocated for the caller to use.
> 
> Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> [m.szyprowski: renamed some variables for easier code reading]
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> CC: Michal Nazarewicz <mina86@mina86.com>
> Acked-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  include/linux/page-isolation.h |    2 +
>  mm/page_alloc.c                |  148 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 150 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
> index b9fc428..774ecec 100644
> --- a/include/linux/page-isolation.h
> +++ b/include/linux/page-isolation.h
> @@ -36,6 +36,8 @@ extern void unset_migratetype_isolate(struct page *page);
>  /* The below functions must be run on a range from a single zone. */
>  extern unsigned long alloc_contig_freed_pages(unsigned long start,
>  					      unsigned long end, gfp_t flag);
> +extern int alloc_contig_range(unsigned long start, unsigned long end,
> +			      gfp_t flags);
>  extern void free_contig_pages(unsigned long pfn, unsigned nr_pages);
>  
>  /*
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index fbfb920..8010854 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5773,6 +5773,154 @@ void free_contig_pages(unsigned long pfn, unsigned nr_pages)
>  	}
>  }
>  
> +static unsigned long pfn_to_maxpage(unsigned long pfn)
> +{
> +	return pfn & ~(MAX_ORDER_NR_PAGES - 1);
> +}
> +

pfn_to_maxpage is a very confusing name here. It would be preferable to
create a MAX_ORDER_MASK that you apply directly.

Maybe something like SECTION_ALIGN_UP and SECTION_ALIGN_DOWN.
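For instance (names purely illustrative):

	#define MAX_ORDER_MASK			(MAX_ORDER_NR_PAGES - 1)
	#define MAX_ORDER_ALIGN_DOWN(pfn)	((pfn) & ~MAX_ORDER_MASK)
	#define MAX_ORDER_ALIGN_UP(pfn)		ALIGN((pfn), MAX_ORDER_NR_PAGES)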

> +static unsigned long pfn_to_maxpage_up(unsigned long pfn)
> +{
> +	return ALIGN(pfn, MAX_ORDER_NR_PAGES);
> +}
> +
> +#define MIGRATION_RETRY	5
> +static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
> +{
> +	int migration_failed = 0, ret;
> +	unsigned long pfn = start;
> +
> +	/*
> +	 * Some code "borrowed" from KAMEZAWA Hiroyuki's
> +	 * __alloc_contig_pages().
> +	 */
> +

There is no need to put a comment like this here. Credit him in the
changelog.

> +	/* drop all pages in pagevec and pcp list */
> +	lru_add_drain_all();
> +	drain_all_pages();
> +

Very similar to migrate_prep(). drain_all_pages should not be required
at this point.

> +	for (;;) {
> +		pfn = scan_lru_pages(pfn, end);

scan_lru_pages() is inefficient, this is going to be costly.

> +		if (!pfn || pfn >= end)
> +			break;
> +
> +		ret = do_migrate_range(pfn, end);
> +		if (!ret) {
> +			migration_failed = 0;
> +		} else if (ret != -EBUSY
> +			|| ++migration_failed >= MIGRATION_RETRY) {
> +			return ret;
> +		} else {
> +			/* There are unstable pages.on pagevec. */
> +			lru_add_drain_all();
> +			/*
> +			 * there may be pages on pcplist before
> +			 * we mark the range as ISOLATED.
> +			 */
> +			drain_all_pages();
> +		}
> +		cond_resched();
> +	}
> +
> +	if (!migration_failed) {
> +		/* drop all pages in pagevec and pcp list */
> +		lru_add_drain_all();
> +		drain_all_pages();
> +	}
> +
> +	/* Make sure all pages are isolated */
> +	if (WARN_ON(test_pages_isolated(start, end)))
> +		return -EBUSY;
> +

In some respects, this is very similar to mm/compaction.c#compact_zone().
They could have shared significant code if you reworked compact_zone() to
work on arbitrary ranges of memory and expressed the whole-zone case as
operating on zone->zone_start_pfn to zone->zone_start_pfn + zone->spanned_pages.
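Roughly, I mean something with a shape like this (purely hypothetical
prototype, not an existing function):

	static int compact_range(struct zone *zone,
				 unsigned long start_pfn, unsigned long end_pfn);

with compact_zone() passing the zone boundaries and CMA passing its own range.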

> +	return 0;
> +}
> +
> +/**
> + * alloc_contig_range() -- tries to allocate given range of pages
> + * @start:	start PFN to allocate
> + * @end:	one-past-the-last PFN to allocate
> + * @flags:	flags passed to alloc_contig_freed_pages().
> + *
> + * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
> + * aligned, hovewer it's callers responsibility to guarantee that we
> + * are the only thread that changes migrate type of pageblocks the
> + * pages fall in.
> + *
> + * Returns zero on success or negative error code.  On success all
> + * pages which PFN is in (start, end) are allocated for the caller and
> + * need to be freed with free_contig_pages().
> + */
> +int alloc_contig_range(unsigned long start, unsigned long end,
> +		       gfp_t flags)
> +{
> +	unsigned long outer_start, outer_end;
> +	int ret;
> +
> +	/*
> +	 * What we do here is we mark all pageblocks in range as
> +	 * MIGRATE_ISOLATE.  Because of the way page allocator work, we
> +	 * align the range to MAX_ORDER pages so that page allocator
> +	 * won't try to merge buddies from different pageblocks and
> +	 * change MIGRATE_ISOLATE to some other migration type.
> +	 *

This part is new. compaction does not do this, nor does it need to. I can
see why it would be important for CMA though, and calling
start_isolate_page_range() is reasonable.

There are alignment constraints because the isolation works on pageblock
boundaries, but that is not a significant problem.

> +	 * Once the pageblocks are marked as MIGRATE_ISOLATE, we
> +	 * migrate the pages from an unaligned range (ie. pages that
> +	 * we are interested in).  This will put all the pages in
> +	 * range back to page allocator as MIGRATE_ISOLATE.
> +	 *
> +	 * When this is done, we take the pages in range from page
> +	 * allocator removing them from the buddy system.  This way
> +	 * page allocator will never consider using them.
> +	 *
> +	 * This lets us mark the pageblocks back as
> +	 * MIGRATE_CMA/MIGRATE_MOVABLE so that free pages in the
> +	 * MAX_ORDER aligned range but not in the unaligned, original
> +	 * range are put back to page allocator so that buddy can use
> +	 * them.
> +	 */
> +
> +	ret = start_isolate_page_range(pfn_to_maxpage(start),
> +				       pfn_to_maxpage_up(end));
> +	if (ret)
> +		goto done;
> +
> +	ret = __alloc_contig_migrate_range(start, end);
> +	if (ret)
> +		goto done;
> +
> +	/*
> +	 * Pages from [start, end) are within a MAX_ORDER_NR_PAGES
> +	 * aligned blocks that are marked as MIGRATE_ISOLATE.  What's
> +	 * more, all pages in [start, end) are free in page allocator.
> +	 * What we are going to do is to allocate all pages from
> +	 * [start, end) (that is remove them from page allocater).
> +	 *
> +	 * The only problem is that pages at the beginning and at the
> +	 * end of interesting range may be not aligned with pages that
> +	 * page allocator holds, ie. they can be part of higher order
> +	 * pages.  Because of this, we reserve the bigger range and
> +	 * once this is done free the pages we are not interested in.
> +	 */
> +
> +	ret = 0;
> +	while (!PageBuddy(pfn_to_page(start & (~0UL << ret))))
> +		if (WARN_ON(++ret >= MAX_ORDER))
> +			return -EINVAL;
> +
> +	outer_start = start & (~0UL << ret);
> +	outer_end   = alloc_contig_freed_pages(outer_start, end, flags);
> +
> +	/* Free head and tail (if any) */
> +	if (start != outer_start)
> +		free_contig_pages(outer_start, start - outer_start);
> +	if (end != outer_end)
> +		free_contig_pages(end, outer_end - end);
> +
> +	ret = 0;
> +done:
> +	undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end));
> +	return ret;
> +}
> +
>  #ifdef CONFIG_MEMORY_HOTREMOVE
>  /*
>   * All pages in the range must be isolated before calling this.

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 4/9] mm: MIGRATE_CMA migration type added
  2011-10-06 13:54   ` Marek Szyprowski
  (?)
@ 2011-10-18 13:08     ` Mel Gorman
  -1 siblings, 0 replies; 180+ messages in thread
From: Mel Gorman @ 2011-10-18 13:08 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen

On Thu, Oct 06, 2011 at 03:54:44PM +0200, Marek Szyprowski wrote:
> From: Michal Nazarewicz <m.nazarewicz@samsung.com>
> 
> The MIGRATE_CMA migration type has two main characteristics:
> (i) only movable pages can be allocated from MIGRATE_CMA
> pageblocks and (ii) page allocator will never change migration
> type of MIGRATE_CMA pageblocks.
> 
> This guarantees that page in a MIGRATE_CMA page block can
> always be migrated somewhere else (unless there's no memory left
> in the system).
> 

Or the count is permanently elevated by a device driver for some reason, or
the page is backed by a filesystem with a broken or unusable migrate_page()
function. This is unavoidable; I'm just pointing out that you can still have
migration failures, particularly if GFP_MOVABLE has been improperly used.

> It is designed to be used with Contiguous Memory Allocator
> (CMA) for allocating big chunks (eg. 10MiB) of physically
> contiguous memory.  Once driver requests contiguous memory,
> CMA will migrate pages from MIGRATE_CMA pageblocks.
> 
> To minimise number of migrations, MIGRATE_CMA migration type
> is the last type tried when page allocator falls back to other
> migration types then requested.
> 

It would be preferable if you could figure out how to reuse the
MIGRATE_RESERVE type for just the bitmap. Like MIGRATE_CMA, it does not
change type except when min_free_kbytes changes. However, it is
something that could be done in the future to keep the size of the
pageblock bitmap where it is now.

> Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> [m.szyprowski: cleaned up Kconfig, renamed some functions, removed ifdefs]
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> [m.nazarewicz: fixed incorrect handling of pages from ISOLATE page blocks]
> Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
> Acked-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  include/linux/mmzone.h         |   41 +++++++++++++++++----
>  include/linux/page-isolation.h |    1 +
>  mm/Kconfig                     |    8 ++++-
>  mm/compaction.c                |   10 +++++
>  mm/page_alloc.c                |   79 ++++++++++++++++++++++++++++++----------
>  5 files changed, 112 insertions(+), 27 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 862a834..cc34965 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -35,13 +35,35 @@
>   */
>  #define PAGE_ALLOC_COSTLY_ORDER 3
>  
> -#define MIGRATE_UNMOVABLE     0
> -#define MIGRATE_RECLAIMABLE   1
> -#define MIGRATE_MOVABLE       2
> -#define MIGRATE_PCPTYPES      3 /* the number of types on the pcp lists */
> -#define MIGRATE_RESERVE       3
> -#define MIGRATE_ISOLATE       4 /* can't allocate from here */
> -#define MIGRATE_TYPES         5
> +enum {
> +	MIGRATE_UNMOVABLE,
> +	MIGRATE_RECLAIMABLE,
> +	MIGRATE_MOVABLE,
> +	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
> +	MIGRATE_RESERVE = MIGRATE_PCPTYPES,
> +	/*
> +	 * MIGRATE_CMA migration type is designed to mimic the way
> +	 * ZONE_MOVABLE works.  Only movable pages can be allocated
> +	 * from MIGRATE_CMA pageblocks and page allocator never
> +	 * implicitly change migration type of MIGRATE_CMA pageblock.
> +	 *
> +	 * The way to use it is to change migratetype of a range of
> +	 * pageblocks to MIGRATE_CMA which can be done by
> +	 * __free_pageblock_cma() function.  What is important though
> +	 * is that a range of pageblocks must be aligned to
> +	 * MAX_ORDER_NR_PAGES should biggest page be bigger then
> +	 * a single pageblock.
> +	 */
> +	MIGRATE_CMA,

This does mean that MIGRATE_CMA also does not have a per-cpu list. I
don't know if that matters to you but all allocations using MIGRATE_CMA
will take the zone lock. I'm not sure this can be easily avoided because
if there is a per-CPU list for MIGRATE_CMA, it might use a new cache
line for it and incur a different set of performance problems.
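For reference, the pcp lists are only sized for the first MIGRATE_PCPTYPES
types, so anything placed after MIGRATE_PCPTYPES in the enum (as MIGRATE_CMA
is here) never gets one; the structure is roughly:

	struct per_cpu_pages {
		int count;	/* number of pages in the list */
		int high;	/* high watermark, emptying needed */
		int batch;	/* chunk size for buddy add/remove */

		/* Lists of pages, one per migrate type stored on the list */
		struct list_head lists[MIGRATE_PCPTYPES];
	};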

> +	MIGRATE_ISOLATE,	/* can't allocate from here */
> +	MIGRATE_TYPES
> +};
> +
> +#ifdef CONFIG_CMA_MIGRATE_TYPE
> +#  define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
> +#else
> +#  define is_migrate_cma(migratetype) false
> +#endif
>  
>  #define for_each_migratetype_order(order, type) \
>  	for (order = 0; order < MAX_ORDER; order++) \
> @@ -54,6 +76,11 @@ static inline int get_pageblock_migratetype(struct page *page)
>  	return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
>  }
>  
> +static inline bool is_pageblock_cma(struct page *page)
> +{
> +	return is_migrate_cma(get_pageblock_migratetype(page));
> +}
> +
>  struct free_area {
>  	struct list_head	free_list[MIGRATE_TYPES];
>  	unsigned long		nr_free;
> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
> index 774ecec..9b6aa8a 100644
> --- a/include/linux/page-isolation.h
> +++ b/include/linux/page-isolation.h
> @@ -48,4 +48,5 @@ int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn);
>  unsigned long scan_lru_pages(unsigned long start, unsigned long end);
>  int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
>  
> +extern void init_cma_reserved_pageblock(struct page *page);
>  #endif
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 10d7986..d067b84 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -192,7 +192,7 @@ config COMPACTION
>  config MIGRATION
>  	bool "Page migration"
>  	def_bool y
> -	depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION
> +	depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION || CMA_MIGRATE_TYPE
>  	help

CMA_MIGRATE_TYPE is an implementation detail of CMA. It makes more sense
for the Kconfig option to simply be called CMA.
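i.e. something like (sketch only):

config CMA
	bool
	help
	  This enables the Contiguous Memory Allocator, which allows drivers
	  to allocate large physically contiguous blocks of memory by
	  migrating movable pages out of specially marked pageblocks.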

>  	  Allows the migration of the physical location of pages of processes
>  	  while the virtual addresses are not changed. This is useful in
> @@ -201,6 +201,12 @@ config MIGRATION
>  	  pages as migration can relocate pages to satisfy a huge page
>  	  allocation instead of reclaiming.
>  
> +config CMA_MIGRATE_TYPE
> +	bool
> +	help
> +	  This enables the use the MIGRATE_CMA migrate type, which lets lets CMA
> +	  work on almost arbitrary memory range and not only inside ZONE_MOVABLE.
> +
>  config PHYS_ADDR_T_64BIT
>  	def_bool 64BIT || ARCH_PHYS_ADDR_T_64BIT
>  
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 97254e4..9cf6b2b 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -115,6 +115,16 @@ static bool suitable_migration_target(struct page *page)
>  	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
>  		return false;
>  
> +	/* Keep MIGRATE_CMA alone as well. */
> +	/*
> +	 * XXX Revisit.  We currently cannot let compaction touch CMA
> +	 * pages since compaction insists on changing their migration
> +	 * type to MIGRATE_MOVABLE (see split_free_page() called from
> +	 * isolate_freepages_block() above).
> +	 */
> +	if (is_migrate_cma(migratetype))
> +		return false;
> +

This is another reason why CMA and compaction should be using almost
identical code. It does mean that compact_control may need to be renamed
and gain flags to control things like the setting of pageblock flags, but
that would be preferable to having two almost-identical pieces of code.

>  	/* If the page is a large free page, then allow migration */
>  	if (PageBuddy(page) && page_order(page) >= pageblock_order)
>  		return true;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 8010854..6758b9a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -733,6 +733,29 @@ void __meminit __free_pages_bootmem(struct page *page, unsigned int order)
>  	}
>  }
>  
> +#ifdef CONFIG_CMA_MIGRATE_TYPE
> +/*
> + * Free whole pageblock and set it's migration type to MIGRATE_CMA.
> + */
> +void __init init_cma_reserved_pageblock(struct page *page)
> +{
> +	struct page *p = page;
> +	unsigned i = pageblock_nr_pages;
> +
> +	prefetchw(p);
> +	do {
> +		if (--i)
> +			prefetchw(p + 1);

There is no need to use prefetch here. It's a sequential read with a fixed
stride, and the hardware prefetcher on modern CPUs (well, on x86 anyway) has
no problem identifying this sort of pattern.
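The loop can then simply be (sketch, behaviour unchanged):

	do {
		__ClearPageReserved(p);
		set_page_count(p, 0);
	} while (++p, --i);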

> +		__ClearPageReserved(p);
> +		set_page_count(p, 0);
> +	} while (++p, i);
> +
> +	set_page_refcounted(page);
> +	set_pageblock_migratetype(page, MIGRATE_CMA);
> +	__free_pages(page, pageblock_order);
> +	totalram_pages += pageblock_nr_pages;
> +}
> +#endif
>  
>  /*
>   * The order of subdivision here is critical for the IO subsystem.
> @@ -841,11 +864,11 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
>   * This array describes the order lists are fallen back to when
>   * the free lists for the desirable migrate type are depleted
>   */
> -static int fallbacks[MIGRATE_TYPES][MIGRATE_TYPES-1] = {
> +static int fallbacks[MIGRATE_TYPES][4] = {
>  	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_RESERVE },
>  	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_RESERVE },
> -	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
> -	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE,     MIGRATE_RESERVE,   MIGRATE_RESERVE }, /* Never used */
> +	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_CMA    , MIGRATE_RESERVE },
> +	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE }, /* Never used */
>  };
>  
>  /*
> @@ -940,12 +963,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>  	/* Find the largest possible block of pages in the other list */
>  	for (current_order = MAX_ORDER-1; current_order >= order;
>  						--current_order) {
> -		for (i = 0; i < MIGRATE_TYPES - 1; i++) {
> +		for (i = 0; i < ARRAY_SIZE(fallbacks[0]); i++) {

I don't see why this change is necessary.

>  			migratetype = fallbacks[start_migratetype][i];
>  
>  			/* MIGRATE_RESERVE handled later if necessary */
>  			if (migratetype == MIGRATE_RESERVE)
> -				continue;
> +				break;
>  
>  			area = &(zone->free_area[current_order]);
>  			if (list_empty(&area->free_list[migratetype]))
> @@ -960,19 +983,29 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>  			 * pages to the preferred allocation list. If falling
>  			 * back for a reclaimable kernel allocation, be more
>  			 * aggressive about taking ownership of free pages
> +			 *
> +			 * On the other hand, never change migration
> +			 * type of MIGRATE_CMA pageblocks nor move CMA
> +			 * pages on different free lists. We don't
> +			 * want unmovable pages to be allocated from
> +			 * MIGRATE_CMA areas.
>  			 */
> -			if (unlikely(current_order >= (pageblock_order >> 1)) ||
> -					start_migratetype == MIGRATE_RECLAIMABLE ||
> -					page_group_by_mobility_disabled) {
> -				unsigned long pages;
> +			if (!is_pageblock_cma(page) &&
> +			    (unlikely(current_order >= pageblock_order / 2) ||
> +			     start_migratetype == MIGRATE_RECLAIMABLE ||
> +			     page_group_by_mobility_disabled)) {
> +				int pages;
>  				pages = move_freepages_block(zone, page,
> -								start_migratetype);
> +							     start_migratetype);
>  
> -				/* Claim the whole block if over half of it is free */
> +				/*
> +				 * Claim the whole block if over half
> +				 * of it is free
> +				 */
>  				if (pages >= (1 << (pageblock_order-1)) ||
> -						page_group_by_mobility_disabled)
> +				    page_group_by_mobility_disabled)
>  					set_pageblock_migratetype(page,
> -								start_migratetype);
> +							start_migratetype);
>  

I only glanced through this because I'm thinking at this point that
MIGRATE_CMA should be similar or identical to MIGRATE_RESERVE. That said, I
didn't spot anything obviously wrong, other than that some of the changes
only affect indentation.

>  				migratetype = start_migratetype;
>  			}
> @@ -982,11 +1015,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>  			rmv_page_order(page);
>  
>  			/* Take ownership for orders >= pageblock_order */
> -			if (current_order >= pageblock_order)
> +			if (current_order >= pageblock_order &&
> +			    !is_pageblock_cma(page))
>  				change_pageblock_range(page, current_order,
>  							start_migratetype);
>  
> -			expand(zone, page, order, current_order, area, migratetype);
> +			expand(zone, page, order, current_order, area,
> +			       is_migrate_cma(start_migratetype)
> +			     ? start_migratetype : migratetype);
>  
>  			trace_mm_page_alloc_extfrag(page, order, current_order,
>  				start_migratetype, migratetype);
> @@ -1058,7 +1094,10 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
>  			list_add(&page->lru, list);
>  		else
>  			list_add_tail(&page->lru, list);
> -		set_page_private(page, migratetype);
> +		if (is_pageblock_cma(page))
> +			set_page_private(page, MIGRATE_CMA);
> +		else
> +			set_page_private(page, migratetype);
>  		list = &page->lru;
>  	}
>  	__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
> @@ -1302,7 +1341,9 @@ int split_free_page(struct page *page)
>  	if (order >= pageblock_order - 1) {
>  		struct page *endpage = page + (1 << order) - 1;
>  		for (; page < endpage; page += pageblock_nr_pages)
> -			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
> +			if (!is_pageblock_cma(page))
> +				set_pageblock_migratetype(page,
> +							  MIGRATE_MOVABLE);
>  	}
>  
>  	return 1 << order;
> @@ -5592,8 +5633,8 @@ __count_immobile_pages(struct zone *zone, struct page *page, int count)
>  	 */
>  	if (zone_idx(zone) == ZONE_MOVABLE)
>  		return true;
> -
> -	if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE)
> +	if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE ||
> +	    is_pageblock_cma(page))
>  		return true;
>  
>  	pfn = page_to_pfn(page);
> -- 
> 1.7.1.569.g6f426
> 

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 4/9] mm: MIGRATE_CMA migration type added
@ 2011-10-18 13:08     ` Mel Gorman
  0 siblings, 0 replies; 180+ messages in thread
From: Mel Gorman @ 2011-10-18 13:08 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen

On Thu, Oct 06, 2011 at 03:54:44PM +0200, Marek Szyprowski wrote:
> From: Michal Nazarewicz <m.nazarewicz@samsung.com>
> 
> The MIGRATE_CMA migration type has two main characteristics:
> (i) only movable pages can be allocated from MIGRATE_CMA
> pageblocks and (ii) page allocator will never change migration
> type of MIGRATE_CMA pageblocks.
> 
> This guarantees that page in a MIGRATE_CMA page block can
> always be migrated somewhere else (unless there's no memory left
> in the system).
> 

Or the count is permanently elevated by a device driver for some reason, or
the page is backed by a filesystem with a broken or unusable migrate_page()
function. This is unavoidable; I'm just pointing out that you can still have
migration failures, particularly if GFP_MOVABLE has been improperly used.

> It is designed to be used with Contiguous Memory Allocator
> (CMA) for allocating big chunks (eg. 10MiB) of physically
> contiguous memory.  Once driver requests contiguous memory,
> CMA will migrate pages from MIGRATE_CMA pageblocks.
> 
> To minimise number of migrations, MIGRATE_CMA migration type
> is the last type tried when page allocator falls back to other
> migration types then requested.
> 

It would be preferable if you could figure out how to reuse the
MIGRATE_RESERVE type for just the bitmap. Like MIGRATE_CMA, it does not
change type except when min_free_kbytes changes. However, it is
something that could be done in the future to keep the size of the
pageblock bitmap where it is now.

> Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> [m.szyprowski: cleaned up Kconfig, renamed some functions, removed ifdefs]
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> [m.nazarewicz: fixed incorrect handling of pages from ISOLATE page blocks]
> Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
> Acked-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  include/linux/mmzone.h         |   41 +++++++++++++++++----
>  include/linux/page-isolation.h |    1 +
>  mm/Kconfig                     |    8 ++++-
>  mm/compaction.c                |   10 +++++
>  mm/page_alloc.c                |   79 ++++++++++++++++++++++++++++++----------
>  5 files changed, 112 insertions(+), 27 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 862a834..cc34965 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -35,13 +35,35 @@
>   */
>  #define PAGE_ALLOC_COSTLY_ORDER 3
>  
> -#define MIGRATE_UNMOVABLE     0
> -#define MIGRATE_RECLAIMABLE   1
> -#define MIGRATE_MOVABLE       2
> -#define MIGRATE_PCPTYPES      3 /* the number of types on the pcp lists */
> -#define MIGRATE_RESERVE       3
> -#define MIGRATE_ISOLATE       4 /* can't allocate from here */
> -#define MIGRATE_TYPES         5
> +enum {
> +	MIGRATE_UNMOVABLE,
> +	MIGRATE_RECLAIMABLE,
> +	MIGRATE_MOVABLE,
> +	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
> +	MIGRATE_RESERVE = MIGRATE_PCPTYPES,
> +	/*
> +	 * MIGRATE_CMA migration type is designed to mimic the way
> +	 * ZONE_MOVABLE works.  Only movable pages can be allocated
> +	 * from MIGRATE_CMA pageblocks and page allocator never
> +	 * implicitly change migration type of MIGRATE_CMA pageblock.
> +	 *
> +	 * The way to use it is to change migratetype of a range of
> +	 * pageblocks to MIGRATE_CMA which can be done by
> +	 * __free_pageblock_cma() function.  What is important though
> +	 * is that a range of pageblocks must be aligned to
> +	 * MAX_ORDER_NR_PAGES should biggest page be bigger then
> +	 * a single pageblock.
> +	 */
> +	MIGRATE_CMA,

This does mean that MIGRATE_CMA also does not have a per-cpu list. I
don't know if that matters to you but all allocations using MIGRATE_CMA
will take the zone lock. I'm not sure this can be easily avoided because
if there is a per-CPU list for MIGRATE_CMA, it might use a new cache
line for it and incur a different set of performance problems.
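
For reference, the per-cpu lists in question are sized by MIGRATE_PCPTYPES,
so MIGRATE_CMA (which sits above it in the enum) never gets one. Roughly,
from include/linux/mmzone.h of this era:

struct per_cpu_pages {
	int count;		/* number of pages in the list */
	int high;		/* high watermark, emptying needed */
	int batch;		/* chunk size for buddy add/remove */

	/* Lists of pages, one per migrate type stored on the pcp-lists */
	struct list_head lists[MIGRATE_PCPTYPES];
};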

> +	MIGRATE_ISOLATE,	/* can't allocate from here */
> +	MIGRATE_TYPES
> +};
> +
> +#ifdef CONFIG_CMA_MIGRATE_TYPE
> +#  define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
> +#else
> +#  define is_migrate_cma(migratetype) false
> +#endif
>  
>  #define for_each_migratetype_order(order, type) \
>  	for (order = 0; order < MAX_ORDER; order++) \
> @@ -54,6 +76,11 @@ static inline int get_pageblock_migratetype(struct page *page)
>  	return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
>  }
>  
> +static inline bool is_pageblock_cma(struct page *page)
> +{
> +	return is_migrate_cma(get_pageblock_migratetype(page));
> +}
> +
>  struct free_area {
>  	struct list_head	free_list[MIGRATE_TYPES];
>  	unsigned long		nr_free;
> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
> index 774ecec..9b6aa8a 100644
> --- a/include/linux/page-isolation.h
> +++ b/include/linux/page-isolation.h
> @@ -48,4 +48,5 @@ int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn);
>  unsigned long scan_lru_pages(unsigned long start, unsigned long end);
>  int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
>  
> +extern void init_cma_reserved_pageblock(struct page *page);
>  #endif
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 10d7986..d067b84 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -192,7 +192,7 @@ config COMPACTION
>  config MIGRATION
>  	bool "Page migration"
>  	def_bool y
> -	depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION
> +	depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION || CMA_MIGRATE_TYPE
>  	help

CMA_MIGRATE_TYPE is an implementation detail of CMA. It makes more sense
for the Kconfig option to be simply CMA.

>  	  Allows the migration of the physical location of pages of processes
>  	  while the virtual addresses are not changed. This is useful in
> @@ -201,6 +201,12 @@ config MIGRATION
>  	  pages as migration can relocate pages to satisfy a huge page
>  	  allocation instead of reclaiming.
>  
> +config CMA_MIGRATE_TYPE
> +	bool
> +	help
> +	  This enables the use of the MIGRATE_CMA migrate type, which lets CMA
> +	  work on an almost arbitrary memory range and not only inside ZONE_MOVABLE.
> +
>  config PHYS_ADDR_T_64BIT
>  	def_bool 64BIT || ARCH_PHYS_ADDR_T_64BIT
>  
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 97254e4..9cf6b2b 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -115,6 +115,16 @@ static bool suitable_migration_target(struct page *page)
>  	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
>  		return false;
>  
> +	/* Keep MIGRATE_CMA alone as well. */
> +	/*
> +	 * XXX Revisit.  We currently cannot let compaction touch CMA
> +	 * pages since compaction insists on changing their migration
> +	 * type to MIGRATE_MOVABLE (see split_free_page() called from
> +	 * isolate_freepages_block() above).
> +	 */
> +	if (is_migrate_cma(migratetype))
> +		return false;
> +

This is another reason why CMA and compaction should be using almost
identical code. It does mean that the compact_control may need to be
renamed and get flags to control things like the setting of pageblock
flags but it would be preferable to having two almost identical pieces
of code.

>  	/* If the page is a large free page, then allow migration */
>  	if (PageBuddy(page) && page_order(page) >= pageblock_order)
>  		return true;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 8010854..6758b9a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -733,6 +733,29 @@ void __meminit __free_pages_bootmem(struct page *page, unsigned int order)
>  	}
>  }
>  
> +#ifdef CONFIG_CMA_MIGRATE_TYPE
> +/*
> + * Free the whole pageblock and set its migration type to MIGRATE_CMA.
> + */
> +void __init init_cma_reserved_pageblock(struct page *page)
> +{
> +	struct page *p = page;
> +	unsigned i = pageblock_nr_pages;
> +
> +	prefetchw(p);
> +	do {
> +		if (--i)
> +			prefetchw(p + 1);

There is no need to use prefetch here. It's a sequential read with a fixed
stride. The hardware prefetchers on modern CPUs (well, for x86 anyway) have
no problem identifying this sort of pattern.
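
For illustration, without the prefetch hints the initialisation loop could
be as simple as the sketch below (behaviourally equivalent to the quoted
loop, untested):

	struct page *p = page;
	unsigned long i;

	for (i = 0; i < pageblock_nr_pages; i++, p++) {
		__ClearPageReserved(p);
		set_page_count(p, 0);
	}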

> +		__ClearPageReserved(p);
> +		set_page_count(p, 0);
> +	} while (++p, i);
> +
> +	set_page_refcounted(page);
> +	set_pageblock_migratetype(page, MIGRATE_CMA);
> +	__free_pages(page, pageblock_order);
> +	totalram_pages += pageblock_nr_pages;
> +}
> +#endif
>  
>  /*
>   * The order of subdivision here is critical for the IO subsystem.
> @@ -841,11 +864,11 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
>   * This array describes the order lists are fallen back to when
>   * the free lists for the desirable migrate type are depleted
>   */
> -static int fallbacks[MIGRATE_TYPES][MIGRATE_TYPES-1] = {
> +static int fallbacks[MIGRATE_TYPES][4] = {
>  	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_RESERVE },
>  	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_RESERVE },
> -	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
> -	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE,     MIGRATE_RESERVE,   MIGRATE_RESERVE }, /* Never used */
> +	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_CMA    , MIGRATE_RESERVE },
> +	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE }, /* Never used */
>  };
>  
>  /*
> @@ -940,12 +963,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>  	/* Find the largest possible block of pages in the other list */
>  	for (current_order = MAX_ORDER-1; current_order >= order;
>  						--current_order) {
> -		for (i = 0; i < MIGRATE_TYPES - 1; i++) {
> +		for (i = 0; i < ARRAY_SIZE(fallbacks[0]); i++) {

I don't see why this change is necessary.

>  			migratetype = fallbacks[start_migratetype][i];
>  
>  			/* MIGRATE_RESERVE handled later if necessary */
>  			if (migratetype == MIGRATE_RESERVE)
> -				continue;
> +				break;
>  
>  			area = &(zone->free_area[current_order]);
>  			if (list_empty(&area->free_list[migratetype]))
> @@ -960,19 +983,29 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>  			 * pages to the preferred allocation list. If falling
>  			 * back for a reclaimable kernel allocation, be more
>  			 * aggressive about taking ownership of free pages
> +			 *
> +			 * On the other hand, never change migration
> +			 * type of MIGRATE_CMA pageblocks nor move CMA
> +			 * pages on different free lists. We don't
> +			 * want unmovable pages to be allocated from
> +			 * MIGRATE_CMA areas.
>  			 */
> -			if (unlikely(current_order >= (pageblock_order >> 1)) ||
> -					start_migratetype == MIGRATE_RECLAIMABLE ||
> -					page_group_by_mobility_disabled) {
> -				unsigned long pages;
> +			if (!is_pageblock_cma(page) &&
> +			    (unlikely(current_order >= pageblock_order / 2) ||
> +			     start_migratetype == MIGRATE_RECLAIMABLE ||
> +			     page_group_by_mobility_disabled)) {
> +				int pages;
>  				pages = move_freepages_block(zone, page,
> -								start_migratetype);
> +							     start_migratetype);
>  
> -				/* Claim the whole block if over half of it is free */
> +				/*
> +				 * Claim the whole block if over half
> +				 * of it is free
> +				 */
>  				if (pages >= (1 << (pageblock_order-1)) ||
> -						page_group_by_mobility_disabled)
> +				    page_group_by_mobility_disabled)
>  					set_pageblock_migratetype(page,
> -								start_migratetype);
> +							start_migratetype);
>  

I only glanced through this because I'm thinking at this point that
MIGRATE_CMA should be like, or identical to, MIGRATE_RESERVE. That said,
I didn't spot anything obviously wrong, other than that some of the
changes only affect indentation.

>  				migratetype = start_migratetype;
>  			}
> @@ -982,11 +1015,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>  			rmv_page_order(page);
>  
>  			/* Take ownership for orders >= pageblock_order */
> -			if (current_order >= pageblock_order)
> +			if (current_order >= pageblock_order &&
> +			    !is_pageblock_cma(page))
>  				change_pageblock_range(page, current_order,
>  							start_migratetype);
>  
> -			expand(zone, page, order, current_order, area, migratetype);
> +			expand(zone, page, order, current_order, area,
> +			       is_migrate_cma(start_migratetype)
> +			     ? start_migratetype : migratetype);
>  
>  			trace_mm_page_alloc_extfrag(page, order, current_order,
>  				start_migratetype, migratetype);
> @@ -1058,7 +1094,10 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
>  			list_add(&page->lru, list);
>  		else
>  			list_add_tail(&page->lru, list);
> -		set_page_private(page, migratetype);
> +		if (is_pageblock_cma(page))
> +			set_page_private(page, MIGRATE_CMA);
> +		else
> +			set_page_private(page, migratetype);
>  		list = &page->lru;
>  	}
>  	__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
> @@ -1302,7 +1341,9 @@ int split_free_page(struct page *page)
>  	if (order >= pageblock_order - 1) {
>  		struct page *endpage = page + (1 << order) - 1;
>  		for (; page < endpage; page += pageblock_nr_pages)
> -			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
> +			if (!is_pageblock_cma(page))
> +				set_pageblock_migratetype(page,
> +							  MIGRATE_MOVABLE);
>  	}
>  
>  	return 1 << order;
> @@ -5592,8 +5633,8 @@ __count_immobile_pages(struct zone *zone, struct page *page, int count)
>  	 */
>  	if (zone_idx(zone) == ZONE_MOVABLE)
>  		return true;
> -
> -	if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE)
> +	if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE ||
> +	    is_pageblock_cma(page))
>  		return true;
>  
>  	pfn = page_to_pfn(page);
> -- 
> 1.7.1.569.g6f426
> 


^ permalink raw reply	[flat|nested] 180+ messages in thread


* Re: [PATCH 6/9] drivers: add Contiguous Memory Allocator
  2011-10-06 13:54   ` Marek Szyprowski
  (?)
@ 2011-10-18 13:43     ` Mel Gorman
  -1 siblings, 0 replies; 180+ messages in thread
From: Mel Gorman @ 2011-10-18 13:43 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen

On Thu, Oct 06, 2011 at 03:54:46PM +0200, Marek Szyprowski wrote:
> The Contiguous Memory Allocator is a set of helper functions for DMA
> mapping framework that improves allocations of contiguous memory chunks.
> 
> CMA grabs memory on system boot, marks it with CMA_MIGRATE_TYPE and
> gives it back to the system. The kernel is allowed to allocate movable
> pages within CMA's managed memory so that it can be used, for example,
> for page cache when the DMA mapping framework does not use it. On a
> dma_alloc_from_contiguous() request such pages are migrated out of the
> CMA area to free the required contiguous block and fulfill the request.
> This makes it possible to allocate large contiguous chunks of memory at
> any time, assuming that there is enough free memory available in the
> system.
> 
> This code is heavily based on earlier works by Michal Nazarewicz.
> 
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> CC: Michal Nazarewicz <mina86@mina86.com>
> ---
>  arch/Kconfig                         |    3 +
>  drivers/base/Kconfig                 |   79 +++++++
>  drivers/base/Makefile                |    1 +
>  drivers/base/dma-contiguous.c        |  386 ++++++++++++++++++++++++++++++++++
>  include/asm-generic/dma-contiguous.h |   27 +++
>  include/linux/device.h               |    4 +
>  include/linux/dma-contiguous.h       |  106 ++++++++++
>  7 files changed, 606 insertions(+), 0 deletions(-)
>  create mode 100644 drivers/base/dma-contiguous.c
>  create mode 100644 include/asm-generic/dma-contiguous.h
>  create mode 100644 include/linux/dma-contiguous.h
> 
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 4b0669c..a3b39a2 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -124,6 +124,9 @@ config HAVE_ARCH_TRACEHOOK
>  config HAVE_DMA_ATTRS
>  	bool
>  
> +config HAVE_DMA_CONTIGUOUS
> +	bool
> +
>  config USE_GENERIC_SMP_HELPERS
>  	bool
>  
> diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> index 21cf46f..a5e6d75 100644
> --- a/drivers/base/Kconfig
> +++ b/drivers/base/Kconfig
> @@ -174,4 +174,83 @@ config SYS_HYPERVISOR
>  
>  source "drivers/base/regmap/Kconfig"
>  
> +config CMA
> +	bool "Contiguous Memory Allocator (EXPERIMENTAL)"
> +	depends on HAVE_DMA_CONTIGUOUS && HAVE_MEMBLOCK && EXPERIMENTAL
> +	select MIGRATION
> +	select CMA_MIGRATE_TYPE
> +	help
> +	  This enables the Contiguous Memory Allocator which allows drivers
> +	  to allocate big physically-contiguous blocks of memory for use with
> +	  hardware components that support neither I/O mapping nor scatter-gather.
> +
> +	  For more information see <include/linux/dma-contiguous.h>.
> +	  If unsure, say "n".
> +
> +if CMA
> +
> +config CMA_DEBUG
> +	bool "CMA debug messages (DEVELOPEMENT)"

s/DEVELOPEMENT/DEVELOPMENT/

Should it be under DEBUG_KERNEL?

> +	help
> +	  Turns on debug messages in CMA.  This produces KERN_DEBUG
> +	  messages for every CMA call as well as various messages while
> +	  processing calls such as dma_alloc_from_contiguous().
> +	  This option does not affect warning and error messages.
> +
> +comment "Default contiguous memory area size:"
> +
> +config CMA_SIZE_ABSOLUTE
> +	int "Absolute size (in MiB)"
> +	depends on !CMA_SIZE_SEL_PERCENTAGE
> +	default 16
> +	help
> +	  Defines the size (in MiB) of the default memory area for Contiguous
> +	  Memory Allocator.
> +
> +config CMA_SIZE_PERCENTAGE
> +	int "Percentage of total memory"
> +	depends on !CMA_SIZE_SEL_ABSOLUTE
> +	default 10
> +	help
> +	  Defines the size of the default memory area for Contiguous Memory
> +	  Allocator as a percentage of the total memory in the system.
> +

Why is this not a kernel parameter rather than a config option?

Better yet, why not have drivers register how much CMA memory they are
interested in and then let the driver core figure out whether it can
allocate that much or not?
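
For anyone reading along, the sizing knobs in this version are the cma=
boot parameter and the per-device reservation hook; usage looks roughly
like this (my_platform_device is a placeholder, not something from the
patch):

	/* kernel command line: cma=64M overrides the default area size */

	/* board init code, while memblock/bootmem is still active */
	dma_declare_contiguous(&my_platform_device.dev, 16 * SZ_1M, 0, 0);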

> +choice
> +	prompt "Selected region size"
> +	default CMA_SIZE_SEL_ABSOLUTE
> +
> +config CMA_SIZE_SEL_ABSOLUTE
> +	bool "Use absolute value only"
> +
> +config CMA_SIZE_SEL_PERCENTAGE
> +	bool "Use percentage value only"
> +
> +config CMA_SIZE_SEL_MIN
> +	bool "Use lower value (minimum)"
> +
> +config CMA_SIZE_SEL_MAX
> +	bool "Use higher value (maximum)"
> +
> +endchoice
> +
> +config CMA_ALIGNMENT
> +	int "Maximum PAGE_SIZE order of alignment for contiguous buffers"
> +	range 4 9
> +	default 8
> +	help
> +	  DMA mapping framework by default aligns all buffers to the smallest
> +	  PAGE_SIZE order which is greater than or equal to the requested buffer
> +	  size. This works well for buffers up to a few hundred kilobytes, but
> +	  for larger buffers it is just a waste of memory. With this parameter you can
> +	  specify the maximum PAGE_SIZE order for contiguous buffers. Larger
> +	  buffers will be aligned only to this specified order. The order is
> +	  expressed as a power of two multiplied by the PAGE_SIZE.
> +
> +	  For example, if your system defaults to 4KiB pages, the order value
> +	  of 8 means that the buffers will be aligned up to 1MiB only.
> +
> +	  If unsure, leave the default value "8".
> +
> +endif
> +
>  endmenu
> diff --git a/drivers/base/Makefile b/drivers/base/Makefile
> index 99a375a..794546f 100644
> --- a/drivers/base/Makefile
> +++ b/drivers/base/Makefile
> @@ -5,6 +5,7 @@ obj-y			:= core.o sys.o bus.o dd.o syscore.o \
>  			   cpu.o firmware.o init.o map.o devres.o \
>  			   attribute_container.o transport_class.o
>  obj-$(CONFIG_DEVTMPFS)	+= devtmpfs.o
> +obj-$(CONFIG_CMA) += dma-contiguous.o
>  obj-y			+= power/
>  obj-$(CONFIG_HAS_DMA)	+= dma-mapping.o
>  obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
> diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
> new file mode 100644
> index 0000000..e54bb76
> --- /dev/null
> +++ b/drivers/base/dma-contiguous.c
> @@ -0,0 +1,386 @@
> +/*
> + * Contiguous Memory Allocator for DMA mapping framework
> + * Copyright (c) 2010-2011 by Samsung Electronics.
> + * Written by:
> + *	Marek Szyprowski <m.szyprowski@samsung.com>
> + *	Michal Nazarewicz <mina86@mina86.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation; either version 2 of the
> + * License or (at your optional) any later version of the license.
> + */
> +
> +#define pr_fmt(fmt) "cma: " fmt
> +
> +#ifdef CONFIG_CMA_DEBUG
> +#ifndef DEBUG
> +#  define DEBUG
> +#endif
> +#endif
> +
> +#include <asm/page.h>
> +#include <asm/dma-contiguous.h>
> +
> +#include <linux/memblock.h>
> +#include <linux/err.h>
> +#include <linux/mm.h>
> +#include <linux/mutex.h>
> +#include <linux/page-isolation.h>
> +#include <linux/slab.h>
> +#include <linux/swap.h>
> +#include <linux/mm_types.h>
> +#include <linux/dma-contiguous.h>
> +
> +#ifndef SZ_1M
> +#define SZ_1M (1 << 20)
> +#endif
> +
> +#ifdef phys_to_pfn
> +/* nothing to do */
> +#elif defined __phys_to_pfn
> +#  define phys_to_pfn __phys_to_pfn
> +#elif defined __va
> +#  define phys_to_pfn(x) page_to_pfn(virt_to_page(__va(x)))
> +#else
> +#  error phys_to_pfn implementation needed
> +#endif
> +

Parts of this are assuming that there is a linear mapping of virtual to
physical memory. I think this is always the case but it looks like
something that should be defined in asm-generic with an option for
architectures to override.
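
A minimal sketch of what that asm-generic default with an architecture
override could look like (placement and naming are illustrative only, and
it assumes the usual pfn == physical address >> PAGE_SHIFT relationship):

/* architectures with a non-linear mapping provide their own phys_to_pfn()
 * before this fallback is seen */
#ifndef phys_to_pfn
#define phys_to_pfn(paddr)	((unsigned long)((paddr) >> PAGE_SHIFT))
#endif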

> +struct cma {
> +	unsigned long	base_pfn;
> +	unsigned long	count;
> +	unsigned long	*bitmap;
> +};
> +
> +struct cma *dma_contiguous_default_area;
> +
> +#ifndef CONFIG_CMA_SIZE_ABSOLUTE
> +#define CONFIG_CMA_SIZE_ABSOLUTE 0
> +#endif
> +
> +#ifndef CONFIG_CMA_SIZE_PERCENTAGE
> +#define CONFIG_CMA_SIZE_PERCENTAGE 0
> +#endif
> +
> +static unsigned long size_abs = CONFIG_CMA_SIZE_ABSOLUTE * SZ_1M;

SIZE_ABSOLUTE is an odd name. It can't be a negative size. size_bytes
maybe.

> +static unsigned long size_percent = CONFIG_CMA_SIZE_PERCENTAGE;
> +static long size_cmdline = -1;
> +
> +static int __init early_cma(char *p)
> +{
> +	pr_debug("%s(%s)\n", __func__, p);
> +	size_cmdline = memparse(p, &p);
> +	return 0;
> +}
> +early_param("cma", early_cma);
> +
> +static unsigned long __init __cma_early_get_total_pages(void)
> +{
> +	struct memblock_region *reg;
> +	unsigned long total_pages = 0;
> +
> +	/*
> +	 * We cannot use memblock_phys_mem_size() here, because
> +	 * memblock_analyze() has not been called yet.
> +	 */
> +	for_each_memblock(memory, reg)
> +		total_pages += memblock_region_memory_end_pfn(reg) -
> +			       memblock_region_memory_base_pfn(reg);
> +	return total_pages;
> +}
> +

Is this being called too early? What prevents you from setting up the
CMA regions after the page allocator is brought up, for example? I
understand that there is a need for the memory to be coherent, so maybe
that is the obstacle.

> +/**
> + * dma_contiguous_reserve() - reserve area for contiguous memory handling
> + *
> + * This function reserves memory from the early allocator. It should be
> + * called by arch specific code once the early allocator (memblock or bootmem)
> + * has been activated and all other subsystems have already allocated/reserved
> + * memory.
> + */
> +void __init dma_contiguous_reserve(phys_addr_t limit)
> +{
> +	unsigned long selected_size = 0;
> +	unsigned long total_pages;
> +
> +	pr_debug("%s(limit %08lx)\n", __func__, (unsigned long)limit);
> +
> +	total_pages = __cma_early_get_total_pages();
> +	size_percent *= (total_pages << PAGE_SHIFT) / 100;
> +
> +	pr_debug("%s: total available: %ld MiB, size absolute: %ld MiB, size percentage: %ld MiB\n",
> +		 __func__, (total_pages << PAGE_SHIFT) / SZ_1M,
> +		size_abs / SZ_1M, size_percent / SZ_1M);
> +
> +#ifdef CONFIG_CMA_SIZE_SEL_ABSOLUTE
> +	selected_size = size_abs;
> +#elif defined(CONFIG_CMA_SIZE_SEL_PERCENTAGE)
> +	selected_size = size_percent;
> +#elif defined(CONFIG_CMA_SIZE_SEL_MIN)
> +	selected_size = min(size_abs, size_percent);
> +#elif defined(CONFIG_CMA_SIZE_SEL_MAX)
> +	selected_size = max(size_abs, size_percent);
> +#endif
> +

It seems very strange to do this at Kconfig time instead of via kernel
parameters.

> +	if (size_cmdline != -1)
> +		selected_size = size_cmdline;
> +
> +	if (!selected_size)
> +		return;
> +
> +	pr_debug("%s: reserving %ld MiB for global area\n", __func__,
> +		 selected_size / SZ_1M);
> +
> +	dma_declare_contiguous(NULL, selected_size, 0, limit);
> +};
> +
> +static DEFINE_MUTEX(cma_mutex);
> +
> +static void __cma_activate_area(unsigned long base_pfn, unsigned long count)
> +{
> +	unsigned long pfn = base_pfn;
> +	unsigned i = count >> pageblock_order;
> +	struct zone *zone;
> +
> +	VM_BUG_ON(!pfn_valid(pfn));

Again, VM_BUG_ON is an extreme reaction. WARN_ON_ONCE, return an error
code and fail gracefully.

> +	zone = page_zone(pfn_to_page(pfn));
> +
> +	do {
> +		unsigned j;
> +		base_pfn = pfn;
> +		for (j = pageblock_nr_pages; j; --j, pfn++) {

This is correct but does not look like any other PFN walker. There are
plenty of examples of where we walk PFN ranges. There is no requirement
to use the same pattern but it does make reviewing easier.

> +			VM_BUG_ON(!pfn_valid(pfn));
> +			VM_BUG_ON(page_zone(pfn_to_page(pfn)) != zone);
> +		}

In the field, this is a no-op as I would assume CONFIG_DEBUG_VM is not
set. This should be checked unconditionally and fail gracefully if necessary.
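
A sketch of the activation walk written in the more conventional style,
with the checks done unconditionally and failing gracefully instead of
using VM_BUG_ON() (return type changed to int so the caller can bail out;
untested):

static int __cma_activate_area(unsigned long base_pfn, unsigned long count)
{
	unsigned long pfn, end_pfn = base_pfn + count;
	struct zone *zone;

	if (WARN_ON_ONCE(!pfn_valid(base_pfn)))
		return -EINVAL;
	zone = page_zone(pfn_to_page(base_pfn));

	for (pfn = base_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
		unsigned long i;

		/* every page in the pageblock must be valid and in one zone */
		for (i = pfn; i < pfn + pageblock_nr_pages; i++) {
			if (WARN_ON_ONCE(!pfn_valid(i) ||
					 page_zone(pfn_to_page(i)) != zone))
				return -EINVAL;
		}

		init_cma_reserved_pageblock(pfn_to_page(pfn));
	}

	return 0;
}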

> +		init_cma_reserved_pageblock(pfn_to_page(base_pfn));
> +	} while (--i);
> +}
> +
> +static struct cma *__cma_create_area(unsigned long base_pfn,
> +				     unsigned long count)
> +{
> +	int bitmap_size = BITS_TO_LONGS(count) * sizeof(long);
> +	struct cma *cma;
> +
> +	pr_debug("%s(base %08lx, count %lx)\n", __func__, base_pfn, count);
> +
> +	cma = kmalloc(sizeof *cma, GFP_KERNEL);
> +	if (!cma)
> +		return ERR_PTR(-ENOMEM);
> +
> +	cma->base_pfn = base_pfn;
> +	cma->count = count;
> +	cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
> +
> +	if (!cma->bitmap)
> +		goto no_mem;
> +
> +	__cma_activate_area(base_pfn, count);
> +
> +	pr_debug("%s: returned %p\n", __func__, (void *)cma);
> +	return cma;
> +
> +no_mem:
> +	kfree(cma);
> +	return ERR_PTR(-ENOMEM);
> +}
> +
> +static struct cma_reserved {
> +	phys_addr_t start;
> +	unsigned long size;
> +	struct device *dev;
> +} cma_reserved[MAX_CMA_AREAS] __initdata;
> +static unsigned cma_reserved_count __initdata;
> +
> +static int __init __cma_init_reserved_areas(void)
> +{
> +	struct cma_reserved *r = cma_reserved;
> +	unsigned i = cma_reserved_count;
> +
> +	pr_debug("%s()\n", __func__);
> +
> +	for (; i; --i, ++r) {
> +		struct cma *cma;
> +		cma = __cma_create_area(phys_to_pfn(r->start),
> +					r->size >> PAGE_SHIFT);
> +		if (!IS_ERR(cma)) {
> +			if (r->dev)
> +				set_dev_cma_area(r->dev, cma);
> +			else
> +				dma_contiguous_default_area = cma;
> +		}
> +	}
> +	return 0;
> +}
> +core_initcall(__cma_init_reserved_areas);
> +
> +/**
> + * dma_declare_contiguous() - reserve area for contiguous memory handling
> + *			      for particular device
> + * @dev:   Pointer to device structure.
> + * @size:  Size of the reserved memory.
> + * @start: Start address of the reserved memory (optional, 0 for any).
> + * @limit: End address of the reserved memory (optional, 0 for any).
> + *
> + * This function reserves memory for the specified device. It should be
> + * called by board specific code while the early allocator (memblock or
> + * bootmem) is still active.
> + */
> +int __init dma_declare_contiguous(struct device *dev, unsigned long size,
> +				  phys_addr_t base, phys_addr_t limit)
> +{
> +	struct cma_reserved *r = &cma_reserved[cma_reserved_count];
> +	unsigned long alignment;
> +
> +	pr_debug("%s(size %lx, base %08lx, limit %08lx)\n", __func__,
> +		 (unsigned long)size, (unsigned long)base,
> +		 (unsigned long)limit);
> +
> +	/* Sanity checks */
> +	if (cma_reserved_count == ARRAY_SIZE(cma_reserved))
> +		return -ENOSPC;
> +
> +	if (!size)
> +		return -EINVAL;
> +
> +	/* Sanitise input arguments */
> +	alignment = PAGE_SIZE << max(MAX_ORDER, pageblock_order);
> +	base = ALIGN(base, alignment);
> +	size = ALIGN(size, alignment);
> +	limit = ALIGN(limit, alignment);
> +
> +	/* Reserve memory */
> +	if (base) {
> +		if (memblock_is_region_reserved(base, size) ||
> +		    memblock_reserve(base, size) < 0) {
> +			base = -EBUSY;
> +			goto err;
> +		}
> +	} else {
> +		/*
> +		 * Use __memblock_alloc_base() since
> +		 * memblock_alloc_base() panic()s.
> +		 */
> +		phys_addr_t addr = __memblock_alloc_base(size, alignment, limit);
> +		if (!addr) {
> +			base = -ENOMEM;
> +			goto err;
> +		} else if (addr + size > ~(unsigned long)0) {
> +			memblock_free(addr, size);
> +			base = -EOVERFLOW;
> +			goto err;
> +		} else {
> +			base = addr;
> +		}
> +	}
> +
> +	/*
> +	 * Each reserved area must be initialised later, when more kernel
> +	 * subsystems (like slab allocator) are available.
> +	 */
> +	r->start = base;
> +	r->size = size;
> +	r->dev = dev;
> +	cma_reserved_count++;
> +	printk(KERN_INFO "CMA: reserved %ld MiB at %08lx\n", size / SZ_1M,
> +	       (unsigned long)base);
> +
> +	/*
> +	 * Architecture specific contiguous memory fixup.
> +	 */
> +	dma_contiguous_early_fixup(base, size);
> +	return 0;
> +err:
> +	printk(KERN_ERR "CMA: failed to reserve %ld MiB\n", size / SZ_1M);
> +	return base;
> +}
> +
> +/**
> + * dma_alloc_from_contiguous() - allocate pages from contiguous area
> + * @dev:   Pointer to device for which the allocation is performed.
> + * @count: Requested number of pages.
> + * @align: Requested alignment of pages (in PAGE_SIZE order).
> + *
> + * This function allocates a memory buffer for the specified device. It uses
> + * device specific contiguous memory area if available or the default
> + * global one. Requires architecture specific get_dev_cma_area() helper
> + * function.
> + */
> +struct page *dma_alloc_from_contiguous(struct device *dev, int count,
> +				       unsigned int align)
> +{
> +	struct cma *cma = get_dev_cma_area(dev);
> +	unsigned long pfn, pageno;
> +	int ret;
> +
> +	if (!cma)
> +		return NULL;
> +
> +	if (align > CONFIG_CMA_ALIGNMENT)
> +		align = CONFIG_CMA_ALIGNMENT;
> +
> +	pr_debug("%s(cma %p, count %d, align %d)\n", __func__, (void *)cma,
> +		 count, align);
> +
> +	if (!count)
> +		return NULL;
> +
> +	mutex_lock(&cma_mutex);
> +
> +	pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count, 0, count,
> +					    (1 << align) - 1);
> +	if (pageno >= cma->count) {
> +		ret = -ENOMEM;
> +		goto error;
> +	}
> +	bitmap_set(cma->bitmap, pageno, count);
> +
> +	pfn = cma->base_pfn + pageno;
> +	ret = alloc_contig_range(pfn, pfn + count, 0, MIGRATE_CMA);
> +	if (ret)
> +		goto free;
> +

If alloc_contig_range returns failure, the bitmap is still set. It will
never be freed so now the area cannot be used for CMA allocations any
more.

> +	mutex_unlock(&cma_mutex);
> +
> +	pr_debug("%s(): returned %p\n", __func__, pfn_to_page(pfn));
> +	return pfn_to_page(pfn);
> +free:
> +	bitmap_clear(cma->bitmap, pageno, count);
> +error:
> +	mutex_unlock(&cma_mutex);
> +	return NULL;
> +}
> +
> +/**
> + * dma_release_from_contiguous() - release allocated pages
> + * @dev:   Pointer to device for which the pages were allocated.
> + * @pages: Allocated pages.
> + * @count: Number of allocated pages.
> + *
> + * This function releases memory allocated by dma_alloc_from_contiguous().
> + * It returns 0 when the provided pages don't belong to the contiguous
> + * area and 1 on success.
> + */
> +int dma_release_from_contiguous(struct device *dev, struct page *pages,
> +				int count)
> +{
> +	struct cma *cma = get_dev_cma_area(dev);
> +	unsigned long pfn;
> +
> +	if (!cma || !pages)
> +		return 0;
> +
> +	pr_debug("%s(page %p)\n", __func__, (void *)pages);
> +
> +	pfn = page_to_pfn(pages);
> +
> +	if (pfn < cma->base_pfn || pfn >= cma->base_pfn + cma->count)
> +		return 0;
> +
> +	mutex_lock(&cma_mutex);
> +
> +	bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
> +	free_contig_pages(pfn, count);
> +
> +	mutex_unlock(&cma_mutex);

It feels like the mutex could be a lot lighter here. If the bitmap is
protected by a spinlock, it would only need to be held while the bitmap
was being cleared. Free the contig pages outside the spinlock and clear
the bitmap afterwards.

It's not particularly important as the scalability of CMA is not
something to be concerned with at this point.
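
A rough sketch of that lighter locking, assuming a new spinlock field in
struct cma that guards only the bitmap (purely illustrative, not part of
the patch as posted):

int dma_release_from_contiguous(struct device *dev, struct page *pages,
				int count)
{
	struct cma *cma = get_dev_cma_area(dev);
	unsigned long pfn;

	if (!cma || !pages)
		return 0;

	pfn = page_to_pfn(pages);
	if (pfn < cma->base_pfn || pfn >= cma->base_pfn + cma->count)
		return 0;

	/* freeing the pages themselves does not need the bitmap lock */
	free_contig_pages(pfn, count);

	spin_lock(&cma->lock);	/* hypothetical field guarding cma->bitmap */
	bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
	spin_unlock(&cma->lock);

	return 1;
}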

> +	return 1;
> +}
> diff --git a/include/asm-generic/dma-contiguous.h b/include/asm-generic/dma-contiguous.h
> new file mode 100644
> index 0000000..8c76649
> --- /dev/null
> +++ b/include/asm-generic/dma-contiguous.h
> @@ -0,0 +1,27 @@
> +#ifndef ASM_DMA_CONTIGUOUS_H
> +#define ASM_DMA_CONTIGUOUS_H
> +
> +#ifdef __KERNEL__
> +
> +#include <linux/device.h>
> +#include <linux/dma-contiguous.h>
> +
> +#ifdef CONFIG_CMA
> +
> +static inline struct cma *get_dev_cma_area(struct device *dev)
> +{
> +	if (dev && dev->cma_area)
> +		return dev->cma_area;
> +	return dma_contiguous_default_area;
> +}
> +
> +static inline void set_dev_cma_area(struct device *dev, struct cma *cma)
> +{
> +	if (dev)
> +		dev->cma_area = cma;
> +	dma_contiguous_default_area = cma;
> +}
> +
> +#endif
> +#endif
> +#endif
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 8bab5c4..cc1e7f0 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -592,6 +592,10 @@ struct device {
>  
>  	struct dma_coherent_mem	*dma_mem; /* internal for coherent mem
>  					     override */
> +#ifdef CONFIG_CMA
> +	struct cma *cma_area;		/* contiguous memory area for dma
> +					   allocations */
> +#endif
>  	/* arch specific additions */
>  	struct dev_archdata	archdata;
>  
> diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h
> new file mode 100644
> index 0000000..7ca81c9
> --- /dev/null
> +++ b/include/linux/dma-contiguous.h
> @@ -0,0 +1,106 @@
> +#ifndef __LINUX_CMA_H
> +#define __LINUX_CMA_H
> +
> +/*
> + * Contiguous Memory Allocator for DMA mapping framework
> + * Copyright (c) 2010-2011 by Samsung Electronics.
> + * Written by:
> + *	Marek Szyprowski <m.szyprowski@samsung.com>
> + *	Michal Nazarewicz <mina86@mina86.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation; either version 2 of the
> + * License or (at your optional) any later version of the license.
> + */
> +
> +/*
> + * Contiguous Memory Allocator
> + *
> + *   The Contiguous Memory Allocator (CMA) makes it possible to
> + *   allocate big contiguous chunks of memory after the system has
> + *   booted.
> + *
> + * Why is it needed?
> + *
> + *   Various devices on embedded systems have no scatter-gather and/or
> + *   IO map support and require contiguous blocks of memory to
> + *   operate.  They include devices such as cameras, hardware video
> + *   coders, etc.
> + *
> + *   Such devices often require big memory buffers (a full HD frame
> + *   is, for instance, more than 2 megapixels large, i.e. more than 6
> + *   MB of memory), which makes mechanisms such as kmalloc() or
> + *   alloc_page() ineffective.
> + *
> + *   At the same time, a solution where a big memory region is
> + *   reserved for a device is suboptimal since often more memory is
> + *   reserved than strictly required and, moreover, the memory is
> + *   inaccessible to the page system even if device drivers don't use it.
> + *
> + *   CMA tries to solve this issue by operating on memory regions
> + *   where only movable pages can be allocated from.  This way, the
> + *   kernel can use the memory for page cache and, when a device driver
> + *   requests it, the allocated pages can be migrated.
> + *
> + * Driver usage
> + *
> + *   CMA should not be used by the device drivers directly. It is
> + *   only a helper framework for dma-mapping subsystem.
> + *
> + *   For more information, see kernel-docs in drivers/base/dma-contiguous.c
> + */
> +
> +#ifdef __KERNEL__
> +
> +struct cma;
> +struct page;
> +struct device;
> +
> +#ifdef CONFIG_CMA
> +
> +#define MAX_CMA_AREAS	(8)
> +
> +extern struct cma *dma_contiguous_default_area;
> +
> +void dma_contiguous_reserve(phys_addr_t addr_limit);
> +int dma_declare_contiguous(struct device *dev, unsigned long size,
> +			   phys_addr_t base, phys_addr_t limit);
> +
> +struct page *dma_alloc_from_contiguous(struct device *dev, int count,
> +				       unsigned int order);
> +int dma_release_from_contiguous(struct device *dev, struct page *pages,
> +				int count);
> +
> +#else
> +
> +#define MAX_CMA_AREAS	(0)
> +
> +static inline void dma_contiguous_reserve(phys_addr_t limit) { }
> +
> +static inline
> +int dma_declare_contiguous(struct device *dev, unsigned long size,
> +			   phys_addr_t base, phys_addr_t limit)
> +{
> +	return -ENOSYS;
> +}
> +
> +static inline
> +struct page *dma_alloc_from_contiguous(struct device *dev, int count,
> +				       unsigned int order)
> +{
> +	return NULL;
> +}
> +
> +static inline
> +int dma_release_from_contiguous(struct device *dev, struct page *pages,
> +				int count)
> +{
> +	return 0;
> +}
> +
> +#endif
> +
> +#endif
> +
> +#endif
> -- 
> 1.7.1.569.g6f426
> 

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 6/9] drivers: add Contiguous Memory Allocator
@ 2011-10-18 13:43     ` Mel Gorman
  0 siblings, 0 replies; 180+ messages in thread
From: Mel Gorman @ 2011-10-18 13:43 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen

On Thu, Oct 06, 2011 at 03:54:46PM +0200, Marek Szyprowski wrote:
> The Contiguous Memory Allocator is a set of helper functions for DMA
> mapping framework that improves allocations of contiguous memory chunks.
> 
> CMA grabs memory on system boot, marks it with CMA_MIGRATE_TYPE and
> gives back to the system. Kernel is allowed to allocate movable pages
> within CMA's managed memory so that it can be used for example for page
> cache when DMA mapping do not use it. On dma_alloc_from_contiguous()
> request such pages are migrated out of CMA area to free required
> contiguous block and fulfill the request. This allows to allocate large
> contiguous chunks of memory at any time assuming that there is enough
> free memory available in the system.
> 
> This code is heavily based on earlier works by Michal Nazarewicz.
> 
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> CC: Michal Nazarewicz <mina86@mina86.com>
> ---
>  arch/Kconfig                         |    3 +
>  drivers/base/Kconfig                 |   79 +++++++
>  drivers/base/Makefile                |    1 +
>  drivers/base/dma-contiguous.c        |  386 ++++++++++++++++++++++++++++++++++
>  include/asm-generic/dma-contiguous.h |   27 +++
>  include/linux/device.h               |    4 +
>  include/linux/dma-contiguous.h       |  106 ++++++++++
>  7 files changed, 606 insertions(+), 0 deletions(-)
>  create mode 100644 drivers/base/dma-contiguous.c
>  create mode 100644 include/asm-generic/dma-contiguous.h
>  create mode 100644 include/linux/dma-contiguous.h
> 
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 4b0669c..a3b39a2 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -124,6 +124,9 @@ config HAVE_ARCH_TRACEHOOK
>  config HAVE_DMA_ATTRS
>  	bool
>  
> +config HAVE_DMA_CONTIGUOUS
> +	bool
> +
>  config USE_GENERIC_SMP_HELPERS
>  	bool
>  
> diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> index 21cf46f..a5e6d75 100644
> --- a/drivers/base/Kconfig
> +++ b/drivers/base/Kconfig
> @@ -174,4 +174,83 @@ config SYS_HYPERVISOR
>  
>  source "drivers/base/regmap/Kconfig"
>  
> +config CMA
> +	bool "Contiguous Memory Allocator (EXPERIMENTAL)"
> +	depends on HAVE_DMA_CONTIGUOUS && HAVE_MEMBLOCK && EXPERIMENTAL
> +	select MIGRATION
> +	select CMA_MIGRATE_TYPE
> +	help
> +	  This enables the Contiguous Memory Allocator which allows drivers
> +	  to allocate big physically-contiguous blocks of memory for use with
> +	  hardware components that do not support I/O map nor scatter-gather.
> +
> +	  For more information see <include/linux/dma-contiguous.h>.
> +	  If unsure, say "n".
> +
> +if CMA
> +
> +config CMA_DEBUG
> +	bool "CMA debug messages (DEVELOPEMENT)"

s/DEVELOPEMENT/DEVELOPMENT/

Should it be under DEBUG_KERNEL?

> +	help
> +	  Turns on debug messages in CMA.  This produces KERN_DEBUG
> +	  messages for every CMA call as well as various messages while
> +	  processing calls such as dma_alloc_from_contiguous().
> +	  This option does not affect warning and error messages.
> +
> +comment "Default contiguous memory area size:"
> +
> +config CMA_SIZE_ABSOLUTE
> +	int "Absolute size (in MiB)"
> +	depends on !CMA_SIZE_SEL_PERCENTAGE
> +	default 16
> +	help
> +	  Defines the size (in MiB) of the default memory area for Contiguous
> +	  Memory Allocator.
> +
> +config CMA_SIZE_PERCENTAGE
> +	int "Percentage of total memory"
> +	depends on !CMA_SIZE_SEL_ABSOLUTE
> +	default 10
> +	help
> +	  Defines the size of the default memory area for Contiguous Memory
> +	  Allocator as a percentage of the total memory in the system.
> +

Why is this not a kernel parameter rather than a config option?

Better yet, why do drivers not register how much CMA memory they are
interested in and then the drive core figure out if it can allocate that
much or not?

> +choice
> +	prompt "Selected region size"
> +	default CMA_SIZE_SEL_ABSOLUTE
> +
> +config CMA_SIZE_SEL_ABSOLUTE
> +	bool "Use absolute value only"
> +
> +config CMA_SIZE_SEL_PERCENTAGE
> +	bool "Use percentage value only"
> +
> +config CMA_SIZE_SEL_MIN
> +	bool "Use lower value (minimum)"
> +
> +config CMA_SIZE_SEL_MAX
> +	bool "Use higher value (maximum)"
> +
> +endchoice
> +
> +config CMA_ALIGNMENT
> +	int "Maximum PAGE_SIZE order of alignment for contiguous buffers"
> +	range 4 9
> +	default 8
> +	help
> +	  DMA mapping framework by default aligns all buffers to the smallest
> +	  PAGE_SIZE order which is greater than or equal to the requested buffer
> +	  size. This works well for buffers up to a few hundreds kilobytes, but
> +	  for larger buffers it just a memory waste. With this parameter you can
> +	  specify the maximum PAGE_SIZE order for contiguous buffers. Larger
> +	  buffers will be aligned only to this specified order. The order is
> +	  expressed as a power of two multiplied by the PAGE_SIZE.
> +
> +	  For example, if your system defaults to 4KiB pages, the order value
> +	  of 8 means that the buffers will be aligned up to 1MiB only.
> +
> +	  If unsure, leave the default value "8".
> +
> +endif
> +
>  endmenu
> diff --git a/drivers/base/Makefile b/drivers/base/Makefile
> index 99a375a..794546f 100644
> --- a/drivers/base/Makefile
> +++ b/drivers/base/Makefile
> @@ -5,6 +5,7 @@ obj-y			:= core.o sys.o bus.o dd.o syscore.o \
>  			   cpu.o firmware.o init.o map.o devres.o \
>  			   attribute_container.o transport_class.o
>  obj-$(CONFIG_DEVTMPFS)	+= devtmpfs.o
> +obj-$(CONFIG_CMA) += dma-contiguous.o
>  obj-y			+= power/
>  obj-$(CONFIG_HAS_DMA)	+= dma-mapping.o
>  obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
> diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
> new file mode 100644
> index 0000000..e54bb76
> --- /dev/null
> +++ b/drivers/base/dma-contiguous.c
> @@ -0,0 +1,386 @@
> +/*
> + * Contiguous Memory Allocator for DMA mapping framework
> + * Copyright (c) 2010-2011 by Samsung Electronics.
> + * Written by:
> + *	Marek Szyprowski <m.szyprowski@samsung.com>
> + *	Michal Nazarewicz <mina86@mina86.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation; either version 2 of the
> + * License or (at your optional) any later version of the license.
> + */
> +
> +#define pr_fmt(fmt) "cma: " fmt
> +
> +#ifdef CONFIG_CMA_DEBUG
> +#ifndef DEBUG
> +#  define DEBUG
> +#endif
> +#endif
> +
> +#include <asm/page.h>
> +#include <asm/dma-contiguous.h>
> +
> +#include <linux/memblock.h>
> +#include <linux/err.h>
> +#include <linux/mm.h>
> +#include <linux/mutex.h>
> +#include <linux/page-isolation.h>
> +#include <linux/slab.h>
> +#include <linux/swap.h>
> +#include <linux/mm_types.h>
> +#include <linux/dma-contiguous.h>
> +
> +#ifndef SZ_1M
> +#define SZ_1M (1 << 20)
> +#endif
> +
> +#ifdef phys_to_pfn
> +/* nothing to do */
> +#elif defined __phys_to_pfn
> +#  define phys_to_pfn __phys_to_pfn
> +#elif defined __va
> +#  define phys_to_pfn(x) page_to_pfn(virt_to_page(__va(x)))
> +#else
> +#  error phys_to_pfn implementation needed
> +#endif
> +

Parts of this are assuming that there is a linear mapping of virtual to
physical memory. I think this is always the case but it looks like
something that should be defined in asm-generic with an option for
architectures to override.

> +struct cma {
> +	unsigned long	base_pfn;
> +	unsigned long	count;
> +	unsigned long	*bitmap;
> +};
> +
> +struct cma *dma_contiguous_default_area;
> +
> +#ifndef CONFIG_CMA_SIZE_ABSOLUTE
> +#define CONFIG_CMA_SIZE_ABSOLUTE 0
> +#endif
> +
> +#ifndef CONFIG_CMA_SIZE_PERCENTAGE
> +#define CONFIG_CMA_SIZE_PERCENTAGE 0
> +#endif
> +
> +static unsigned long size_abs = CONFIG_CMA_SIZE_ABSOLUTE * SZ_1M;

SIZE_ABSOLUTE is an odd name. It can't be a negative size. size_bytes
maybe.

> +static unsigned long size_percent = CONFIG_CMA_SIZE_PERCENTAGE;
> +static long size_cmdline = -1;
> +
> +static int __init early_cma(char *p)
> +{
> +	pr_debug("%s(%s)\n", __func__, p);
> +	size_cmdline = memparse(p, &p);
> +	return 0;
> +}
> +early_param("cma", early_cma);
> +
> +static unsigned long __init __cma_early_get_total_pages(void)
> +{
> +	struct memblock_region *reg;
> +	unsigned long total_pages = 0;
> +
> +	/*
> +	 * We cannot use memblock_phys_mem_size() here, because
> +	 * memblock_analyze() has not been called yet.
> +	 */
> +	for_each_memblock(memory, reg)
> +		total_pages += memblock_region_memory_end_pfn(reg) -
> +			       memblock_region_memory_base_pfn(reg);
> +	return total_pages;
> +}
> +

Is this being called too early? What prevents you from setting up the CMA
regions after the page allocator is brought up, for example? I understand
that there is a need for the memory to be coherent, so maybe that is the
obstacle.

> +/**
> + * dma_contiguous_reserve() - reserve area for contiguous memory handling
> + *
> + * This function reserves memory from the early allocator. It should be
> + * called by arch specific code once the early allocator (memblock or bootmem)
> + * has been activated and all other subsystems have already allocated/reserved
> + * memory.
> + */
> +void __init dma_contiguous_reserve(phys_addr_t limit)
> +{
> +	unsigned long selected_size = 0;
> +	unsigned long total_pages;
> +
> +	pr_debug("%s(limit %08lx)\n", __func__, (unsigned long)limit);
> +
> +	total_pages = __cma_early_get_total_pages();
> +	size_percent *= (total_pages << PAGE_SHIFT) / 100;
> +
> +	pr_debug("%s: total available: %ld MiB, size absolute: %ld MiB, size percentage: %ld MiB\n",
> +		 __func__, (total_pages << PAGE_SHIFT) / SZ_1M,
> +		size_abs / SZ_1M, size_percent / SZ_1M);
> +
> +#ifdef CONFIG_CMA_SIZE_SEL_ABSOLUTE
> +	selected_size = size_abs;
> +#elif defined(CONFIG_CMA_SIZE_SEL_PERCENTAGE)
> +	selected_size = size_percent;
> +#elif defined(CONFIG_CMA_SIZE_SEL_MIN)
> +	selected_size = min(size_abs, size_percent);
> +#elif defined(CONFIG_CMA_SIZE_SEL_MAX)
> +	selected_size = max(size_abs, size_percent);
> +#endif
> +

It seems very strange to do this at Kconfig time instead of via kernel
parameters.
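The existing "cma=" early_param in this patch could carry the policy as well,
e.g. (sketch only; the '%' suffix handling and the percent_cmdline variable
are made up here, they are not part of the posted patch):

static int __init early_cma(char *p)
{
	unsigned long long val = memparse(p, &p);

	if (*p == '%')			/* hypothetical "cma=10%" form */
		percent_cmdline = val;
	else
		size_cmdline = val;
	return 0;
}
early_param("cma", early_cma);

That would let the absolute/percentage/min/max choice be made at boot time
instead of being baked in at Kconfig time.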

> +	if (size_cmdline != -1)
> +		selected_size = size_cmdline;
> +
> +	if (!selected_size)
> +		return;
> +
> +	pr_debug("%s: reserving %ld MiB for global area\n", __func__,
> +		 selected_size / SZ_1M);
> +
> +	dma_declare_contiguous(NULL, selected_size, 0, limit);
> +};
> +
> +static DEFINE_MUTEX(cma_mutex);
> +
> +static void __cma_activate_area(unsigned long base_pfn, unsigned long count)
> +{
> +	unsigned long pfn = base_pfn;
> +	unsigned i = count >> pageblock_order;
> +	struct zone *zone;
> +
> +	VM_BUG_ON(!pfn_valid(pfn));

Again, VM_BUG_ON is an extreme reaction. WARN_ON_ONCE, return an error
code and fail gracefully.

> +	zone = page_zone(pfn_to_page(pfn));
> +
> +	do {
> +		unsigned j;
> +		base_pfn = pfn;
> +		for (j = pageblock_nr_pages; j; --j, pfn++) {

This is correct but does not look like any other PFN walker. There are
plenty of examples of where we walk PFN ranges. There is no requirement
to use the same pattern but it does make reviewing easier.
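For example, most walkers do something along these lines (illustrative only,
names made up):

	unsigned long pfn;

	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
		if (!pfn_valid(pfn))
			continue;
		/* operate on one pageblock starting at pfn_to_page(pfn) */
	}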

> +			VM_BUG_ON(!pfn_valid(pfn));
> +			VM_BUG_ON(page_zone(pfn_to_page(pfn)) != zone);
> +		}

In the field, this is a no-op as I would assume CONFIG_DEBUG_VM is not
set. This should be checked unconditionally and fail gracefully if necessary.
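Putting the two points above together, the whole function might end up looking
something like this -- only a sketch, untested, and the caller would of course
have to check the return value:

static int __cma_activate_area(unsigned long base_pfn, unsigned long count)
{
	unsigned long pfn = base_pfn;
	unsigned i = count >> pageblock_order;
	struct zone *zone;

	if (WARN_ON_ONCE(!pfn_valid(pfn)))
		return -EINVAL;

	zone = page_zone(pfn_to_page(pfn));

	do {
		unsigned j;
		base_pfn = pfn;
		for (j = pageblock_nr_pages; j; --j, pfn++) {
			/* checked unconditionally, not only under DEBUG_VM */
			if (WARN_ON_ONCE(!pfn_valid(pfn) ||
					 page_zone(pfn_to_page(pfn)) != zone))
				return -EINVAL;
		}
		init_cma_reserved_pageblock(pfn_to_page(base_pfn));
	} while (--i);

	return 0;
}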

> +		init_cma_reserved_pageblock(pfn_to_page(base_pfn));
> +	} while (--i);
> +}
> +
> +static struct cma *__cma_create_area(unsigned long base_pfn,
> +				     unsigned long count)
> +{
> +	int bitmap_size = BITS_TO_LONGS(count) * sizeof(long);
> +	struct cma *cma;
> +
> +	pr_debug("%s(base %08lx, count %lx)\n", __func__, base_pfn, count);
> +
> +	cma = kmalloc(sizeof *cma, GFP_KERNEL);
> +	if (!cma)
> +		return ERR_PTR(-ENOMEM);
> +
> +	cma->base_pfn = base_pfn;
> +	cma->count = count;
> +	cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
> +
> +	if (!cma->bitmap)
> +		goto no_mem;
> +
> +	__cma_activate_area(base_pfn, count);
> +
> +	pr_debug("%s: returned %p\n", __func__, (void *)cma);
> +	return cma;
> +
> +no_mem:
> +	kfree(cma);
> +	return ERR_PTR(-ENOMEM);
> +}
> +
> +static struct cma_reserved {
> +	phys_addr_t start;
> +	unsigned long size;
> +	struct device *dev;
> +} cma_reserved[MAX_CMA_AREAS] __initdata;
> +static unsigned cma_reserved_count __initdata;
> +
> +static int __init __cma_init_reserved_areas(void)
> +{
> +	struct cma_reserved *r = cma_reserved;
> +	unsigned i = cma_reserved_count;
> +
> +	pr_debug("%s()\n", __func__);
> +
> +	for (; i; --i, ++r) {
> +		struct cma *cma;
> +		cma = __cma_create_area(phys_to_pfn(r->start),
> +					r->size >> PAGE_SHIFT);
> +		if (!IS_ERR(cma)) {
> +			if (r->dev)
> +				set_dev_cma_area(r->dev, cma);
> +			else
> +				dma_contiguous_default_area = cma;
> +		}
> +	}
> +	return 0;
> +}
> +core_initcall(__cma_init_reserved_areas);
> +
> +/**
> + * dma_declare_contiguous() - reserve area for contiguous memory handling
> + *			      for particular device
> + * @dev:   Pointer to device structure.
> + * @size:  Size of the reserved memory.
> + * @start: Start address of the reserved memory (optional, 0 for any).
> + * @limit: End address of the reserved memory (optional, 0 for any).
> + *
> + * This function reserves memory for the specified device. It should be
> + * called by board specific code when the early allocator (memblock or
> + * bootmem) is still active.
> + */
> +int __init dma_declare_contiguous(struct device *dev, unsigned long size,
> +				  phys_addr_t base, phys_addr_t limit)
> +{
> +	struct cma_reserved *r = &cma_reserved[cma_reserved_count];
> +	unsigned long alignment;
> +
> +	pr_debug("%s(size %lx, base %08lx, limit %08lx)\n", __func__,
> +		 (unsigned long)size, (unsigned long)base,
> +		 (unsigned long)limit);
> +
> +	/* Sanity checks */
> +	if (cma_reserved_count == ARRAY_SIZE(cma_reserved))
> +		return -ENOSPC;
> +
> +	if (!size)
> +		return -EINVAL;
> +
> +	/* Sanitise input arguments */
> +	alignment = PAGE_SIZE << max(MAX_ORDER, pageblock_order);
> +	base = ALIGN(base, alignment);
> +	size = ALIGN(size, alignment);
> +	limit = ALIGN(limit, alignment);
> +
> +	/* Reserve memory */
> +	if (base) {
> +		if (memblock_is_region_reserved(base, size) ||
> +		    memblock_reserve(base, size) < 0) {
> +			base = -EBUSY;
> +			goto err;
> +		}
> +	} else {
> +		/*
> +		 * Use __memblock_alloc_base() since
> +		 * memblock_alloc_base() panic()s.
> +		 */
> +		phys_addr_t addr = __memblock_alloc_base(size, alignment, limit);
> +		if (!addr) {
> +			base = -ENOMEM;
> +			goto err;
> +		} else if (addr + size > ~(unsigned long)0) {
> +			memblock_free(addr, size);
> +			base = -EOVERFLOW;
> +			goto err;
> +		} else {
> +			base = addr;
> +		}
> +	}
> +
> +	/*
> +	 * Each reserved area must be initialised later, when more kernel
> +	 * subsystems (like slab allocator) are available.
> +	 */
> +	r->start = base;
> +	r->size = size;
> +	r->dev = dev;
> +	cma_reserved_count++;
> +	printk(KERN_INFO "CMA: reserved %ld MiB at %08lx\n", size / SZ_1M,
> +	       (unsigned long)base);
> +
> +	/*
> +	 * Architecture specific contiguous memory fixup.
> +	 */
> +	dma_contiguous_early_fixup(base, size);
> +	return 0;
> +err:
> +	printk(KERN_ERR "CMA: failed to reserve %ld MiB\n", size / SZ_1M);
> +	return base;
> +}
> +
> +/**
> + * dma_alloc_from_contiguous() - allocate pages from contiguous area
> + * @dev:   Pointer to device for which the allocation is performed.
> + * @count: Requested number of pages.
> + * @align: Requested alignment of pages (in PAGE_SIZE order).
> + *
> + * This function allocates a memory buffer for the specified device. It uses
> + * the device specific contiguous memory area if available or the default
> + * global one. Requires architecture specific get_dev_cma_area() helper
> + * function.
> + */
> +struct page *dma_alloc_from_contiguous(struct device *dev, int count,
> +				       unsigned int align)
> +{
> +	struct cma *cma = get_dev_cma_area(dev);
> +	unsigned long pfn, pageno;
> +	int ret;
> +
> +	if (!cma)
> +		return NULL;
> +
> +	if (align > CONFIG_CMA_ALIGNMENT)
> +		align = CONFIG_CMA_ALIGNMENT;
> +
> +	pr_debug("%s(cma %p, count %d, align %d)\n", __func__, (void *)cma,
> +		 count, align);
> +
> +	if (!count)
> +		return NULL;
> +
> +	mutex_lock(&cma_mutex);
> +
> +	pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count, 0, count,
> +					    (1 << align) - 1);
> +	if (pageno >= cma->count) {
> +		ret = -ENOMEM;
> +		goto error;
> +	}
> +	bitmap_set(cma->bitmap, pageno, count);
> +
> +	pfn = cma->base_pfn + pageno;
> +	ret = alloc_contig_range(pfn, pfn + count, 0, MIGRATE_CMA);
> +	if (ret)
> +		goto free;
> +

If alloc_contig_range returns failure, the bitmap is still set. It will
never be freed so now the area cannot be used for CMA allocations any
more.

> +	mutex_unlock(&cma_mutex);
> +
> +	pr_debug("%s(): returned %p\n", __func__, pfn_to_page(pfn));
> +	return pfn_to_page(pfn);
> +free:
> +	bitmap_clear(cma->bitmap, pageno, count);
> +error:
> +	mutex_unlock(&cma_mutex);
> +	return NULL;
> +}
> +
> +/**
> + * dma_release_from_contiguous() - release allocated pages
> + * @dev:   Pointer to device for which the pages were allocated.
> + * @pages: Allocated pages.
> + * @count: Number of allocated pages.
> + *
> + * This function releases memory allocated by dma_alloc_from_contiguous().
> + * It returns 0 when the provided pages do not belong to the contiguous area
> + * and 1 on success.
> + */
> +int dma_release_from_contiguous(struct device *dev, struct page *pages,
> +				int count)
> +{
> +	struct cma *cma = get_dev_cma_area(dev);
> +	unsigned long pfn;
> +
> +	if (!cma || !pages)
> +		return 0;
> +
> +	pr_debug("%s(page %p)\n", __func__, (void *)pages);
> +
> +	pfn = page_to_pfn(pages);
> +
> +	if (pfn < cma->base_pfn || pfn >= cma->base_pfn + cma->count)
> +		return 0;
> +
> +	mutex_lock(&cma_mutex);
> +
> +	bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
> +	free_contig_pages(pfn, count);
> +
> +	mutex_unlock(&cma_mutex);

It feels like the locking could be a lot lighter here. If the bitmap is
protected by a spinlock, it would only need to be held while the bitmap
is being cleared: free the contig pages outside the spinlock and clear
the bitmap afterwards.
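Roughly like this, i.e. (sketch only -- it assumes adding a spinlock to
struct cma, which this patch does not have):

int dma_release_from_contiguous(struct device *dev, struct page *pages,
				int count)
{
	struct cma *cma = get_dev_cma_area(dev);
	unsigned long pfn;

	if (!cma || !pages)
		return 0;

	pfn = page_to_pfn(pages);
	if (pfn < cma->base_pfn || pfn >= cma->base_pfn + cma->count)
		return 0;

	/* give the pages back first, no lock needed for that */
	free_contig_pages(pfn, count);

	spin_lock(&cma->lock);		/* hypothetical per-area lock */
	bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
	spin_unlock(&cma->lock);

	return 1;
}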

It's not particularly important as the scalability of CMA is not
something to be concerned with at this point.

> +	return 1;
> +}
> diff --git a/include/asm-generic/dma-contiguous.h b/include/asm-generic/dma-contiguous.h
> new file mode 100644
> index 0000000..8c76649
> --- /dev/null
> +++ b/include/asm-generic/dma-contiguous.h
> @@ -0,0 +1,27 @@
> +#ifndef ASM_DMA_CONTIGUOUS_H
> +#define ASM_DMA_CONTIGUOUS_H
> +
> +#ifdef __KERNEL__
> +
> +#include <linux/device.h>
> +#include <linux/dma-contiguous.h>
> +
> +#ifdef CONFIG_CMA
> +
> +static inline struct cma *get_dev_cma_area(struct device *dev)
> +{
> +	if (dev && dev->cma_area)
> +		return dev->cma_area;
> +	return dma_contiguous_default_area;
> +}
> +
> +static inline void set_dev_cma_area(struct device *dev, struct cma *cma)
> +{
> +	if (dev)
> +		dev->cma_area = cma;
> +	dma_contiguous_default_area = cma;
> +}
> +
> +#endif
> +#endif
> +#endif
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 8bab5c4..cc1e7f0 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -592,6 +592,10 @@ struct device {
>  
>  	struct dma_coherent_mem	*dma_mem; /* internal for coherent mem
>  					     override */
> +#ifdef CONFIG_CMA
> +	struct cma *cma_area;		/* contiguous memory area for dma
> +					   allocations */
> +#endif
>  	/* arch specific additions */
>  	struct dev_archdata	archdata;
>  
> diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h
> new file mode 100644
> index 0000000..7ca81c9
> --- /dev/null
> +++ b/include/linux/dma-contiguous.h
> @@ -0,0 +1,106 @@
> +#ifndef __LINUX_CMA_H
> +#define __LINUX_CMA_H
> +
> +/*
> + * Contiguous Memory Allocator for DMA mapping framework
> + * Copyright (c) 2010-2011 by Samsung Electronics.
> + * Written by:
> + *	Marek Szyprowski <m.szyprowski@samsung.com>
> + *	Michal Nazarewicz <mina86@mina86.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation; either version 2 of the
> + * License or (at your option) any later version of the license.
> + */
> +
> +/*
> + * Contiguous Memory Allocator
> + *
> + *   The Contiguous Memory Allocator (CMA) makes it possible to
> + *   allocate big contiguous chunks of memory after the system has
> + *   booted.
> + *
> + * Why is it needed?
> + *
> + *   Various devices on embedded systems have no scatter-gather and/or
> + *   IO map support and require contiguous blocks of memory to
> + *   operate.  They include devices such as cameras, hardware video
> + *   coders, etc.
> + *
> + *   Such devices often require big memory buffers (a full HD frame
> + *   is, for instance, more than 2 megapixels, i.e. more than 6
> + *   MB of memory), which makes mechanisms such as kmalloc() or
> + *   alloc_page() ineffective.
> + *
> + *   At the same time, a solution where a big memory region is
> + *   reserved for a device is suboptimal since often more memory is
> + *   reserved than strictly required and, moreover, the memory is
> + *   inaccessible to the page allocator even if device drivers don't use it.
> + *
> + *   CMA tries to solve this issue by operating on memory regions
> + *   where only movable pages can be allocated from.  This way, the kernel
> + *   can use the memory for pagecache and, when a device driver requests
> + *   it, the allocated pages can be migrated.
> + *
> + * Driver usage
> + *
> + *   CMA should not be used by the device drivers directly. It is
> + *   only a helper framework for dma-mapping subsystem.
> + *
> + *   For more information, see kernel-docs in drivers/base/dma-contiguous.c
> + */
> +
> +#ifdef __KERNEL__
> +
> +struct cma;
> +struct page;
> +struct device;
> +
> +#ifdef CONFIG_CMA
> +
> +#define MAX_CMA_AREAS	(8)
> +
> +extern struct cma *dma_contiguous_default_area;
> +
> +void dma_contiguous_reserve(phys_addr_t addr_limit);
> +int dma_declare_contiguous(struct device *dev, unsigned long size,
> +			   phys_addr_t base, phys_addr_t limit);
> +
> +struct page *dma_alloc_from_contiguous(struct device *dev, int count,
> +				       unsigned int order);
> +int dma_release_from_contiguous(struct device *dev, struct page *pages,
> +				int count);
> +
> +#else
> +
> +#define MAX_CMA_AREAS	(0)
> +
> +static inline void dma_contiguous_reserve(phys_addr_t limit) { }
> +
> +static inline
> +int dma_declare_contiguous(struct device *dev, unsigned long size,
> +			   phys_addr_t base, phys_addr_t limit)
> +{
> +	return -ENOSYS;
> +}
> +
> +static inline
> +struct page *dma_alloc_from_contiguous(struct device *dev, int count,
> +				       unsigned int order)
> +{
> +	return NULL;
> +}
> +
> +static inline
> +int dma_release_from_contiguous(struct device *dev, struct page *pages,
> +				int count)
> +{
> +	return 0;
> +}
> +
> +#endif
> +
> +#endif
> +
> +#endif
> -- 
> 1.7.1.569.g6f426
> 


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 2/9] mm: alloc_contig_freed_pages() added
  2011-10-18 12:21     ` Mel Gorman
  (?)
@ 2011-10-18 17:26       ` Michal Nazarewicz
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Nazarewicz @ 2011-10-18 17:26 UTC (permalink / raw)
  To: Marek Szyprowski, Mel Gorman
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Kyungmin Park, Russell King, Andrew Morton,
	KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker, Arnd Bergmann,
	Jesse Barker, Jonathan Corbet, Shariq Hasnain, Chunsang Jeong,
	Dave Hansen

On Tue, 18 Oct 2011 05:21:09 -0700, Mel Gorman <mel@csn.ul.ie> wrote:

> At this point, I'm going to apologise for not reviewing this a long long
> time ago.
>
> On Thu, Oct 06, 2011 at 03:54:42PM +0200, Marek Szyprowski wrote:
>> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>>
>> This commit introduces alloc_contig_freed_pages() function
>> which allocates (ie. removes from buddy system) free pages
>> in range. Caller has to guarantee that all pages in range
>> are in buddy system.
>>
>
> Straight away, I'm wondering why you didn't use
>
> mm/compaction.c#isolate_freepages()
>
> It knows how to isolate pages within ranges. All its control information
> is passed via struct compact_control() which I recognise may be awkward
> for CMA but compaction.c know how to manage all the isolated pages and
> pass them to migrate.c appropriately.

It is something to consider.  At first glance, I see that isolate_freepages()
seems to operate on pageblocks, which is not desired for CMA.

> I haven't read all the patches yet but isolate_freepages() does break
> everything up into order-0 pages. This may not be to your liking but it
> would not be possible to change.

Splitting everything into order-0 pages is desired behaviour.

>> Along with this function, a free_contig_pages() function is
>> provided which frees all (or a subset of) pages allocated
>> with alloc_contig_free_pages().

> mm/compaction.c#release_freepages()

It sort of does the same thing, but release_freepages() assumes that the pages
being freed are not contiguous and that they need to be on the lru list.
With free_contig_pages(), we can assume all pages are contiguous.
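Conceptually it boils down to something like the following (a sketch of the
idea, not the exact code from the patch):

void free_contig_pages(unsigned long pfn, unsigned long nr_pages)
{
	/* every page in the range is a separate order-0 allocation */
	for (; nr_pages; --nr_pages, ++pfn)
		__free_page(pfn_to_page(pfn));
}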

>> +#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
>> +/*
>> + * Both PFNs must be from the same zone!  If this function returns
>> + * true, pfn_to_page(pfn1) + (pfn2 - pfn1) == pfn_to_page(pfn2).
>> + */
>> +static inline bool zone_pfn_same_memmap(unsigned long pfn1, unsigned long pfn2)
>> +{
>> +	return pfn_to_section_nr(pfn1) == pfn_to_section_nr(pfn2);
>> +}
>> +
>
> Why do you care what section the page is in? The zone is important all
> right, but not the section. Also, offhand I'm unsure if being in the
> same section guarantees the same zone. sections are ordinarily fully
> populated (except on ARM but hey) but I can't remember anything
> enforcing that zones be section-aligned.
>
> Later I think I see that the intention was to reduce the use of
> pfn_to_page().

That is correct.

> You can do this in a more general fashion by checking the
> zone boundaries and resolving the pfn->page every MAX_ORDER_NR_PAGES.
> That will not be SPARSEMEM specific.

I've tried doing stuff that way but it ended up with much more code.

Dave suggested the above function to check if pointer arithmetic is valid.

Please see also <https://lkml.org/lkml/2011/9/21/220>.

>
>> +#else
>> +
>> +#define zone_pfn_same_memmap(pfn1, pfn2) (true)
>> +
>> +#endif
>> +
>>  #endif /* !__GENERATING_BOUNDS.H */
>>  #endif /* !__ASSEMBLY__ */
>>  #endif /* _LINUX_MMZONE_H */


>> @@ -5706,6 +5706,73 @@ out:
>>  	spin_unlock_irqrestore(&zone->lock, flags);
>>  }
>>
>> +unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
>> +				       gfp_t flag)
>> +{
>> +	unsigned long pfn = start, count;
>> +	struct page *page;
>> +	struct zone *zone;
>> +	int order;
>> +
>> +	VM_BUG_ON(!pfn_valid(start));
>
> VM_BUG_ON seems very harsh here. WARN_ON_ONCE and returning 0 to the
> caller sees reasonable.
>
>> +	page = pfn_to_page(start);
>> +	zone = page_zone(page);
>> +
>> +	spin_lock_irq(&zone->lock);
>> +
>> +	for (;;) {
>> +		VM_BUG_ON(page_count(page) || !PageBuddy(page) ||
>> +			  page_zone(page) != zone);
>> +
>
> Here you will VM_BUG_ON with the zone lock held leading to system
> halting very shortly.
>
>> +		list_del(&page->lru);
>> +		order = page_order(page);
>> +		count = 1UL << order;
>> +		zone->free_area[order].nr_free--;
>> +		rmv_page_order(page);
>> +		__mod_zone_page_state(zone, NR_FREE_PAGES, -(long)count);
>> +
>
> The callers need to check in advance if watermarks are sufficient for
> this. In compaction, it happens in compaction_suitable() because it only
> needed to be checked once. Your requirements might be different.
>
>> +		pfn += count;
>> +		if (pfn >= end)
>> +			break;
>> +		VM_BUG_ON(!pfn_valid(pfn));
>> +
>
> On ARM, it's possible to encounter invalid pages. VM_BUG_ON is serious
> overkill.
>
>> +		if (zone_pfn_same_memmap(pfn - count, pfn))
>> +			page += count;
>> +		else
>> +			page = pfn_to_page(pfn);
>> +	}
>> +
>> +	spin_unlock_irq(&zone->lock);
>> +
>> +	/* After this, pages in the range can be freed one by one */
>> +	count = pfn - start;
>> +	pfn = start;
>> +	for (page = pfn_to_page(pfn); count; --count) {
>> +		prep_new_page(page, 0, flag);
>> +		++pfn;
>> +		if (likely(zone_pfn_same_memmap(pfn - 1, pfn)))
>> +			++page;
>> +		else
>> +			page = pfn_to_page(pfn);
>> +	}
>> +
>
> Here it looks like you have implemented something like split_free_page().

split_free_page() takes a single page, removes it from the buddy system, and
finally splits it.  alloc_contig_freed_pages() takes a range of pages, removes
them from the buddy system, and finally splits them.  Because it works on a
range, it is made into a separate function.

>> +	return pfn;
>> +}

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 2/9] mm: alloc_contig_freed_pages() added
@ 2011-10-18 17:26       ` Michal Nazarewicz
  0 siblings, 0 replies; 180+ messages in thread
From: Michal Nazarewicz @ 2011-10-18 17:26 UTC (permalink / raw)
  To: Marek Szyprowski, Mel Gorman
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Kyungmin Park, Russell King, Andrew Morton,
	KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker, Arnd Bergmann,
	Jesse Barker, Jonathan Corbet, Shariq Hasnain, Chunsang Jeong,
	Dave Hansen

On Tue, 18 Oct 2011 05:21:09 -0700, Mel Gorman <mel@csn.ul.ie> wrote:

> At this point, I'm going to apologise for not reviewing this a long long
> time ago.
>
> On Thu, Oct 06, 2011 at 03:54:42PM +0200, Marek Szyprowski wrote:
>> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>>
>> This commit introduces alloc_contig_freed_pages() function
>> which allocates (ie. removes from buddy system) free pages
>> in range. Caller has to guarantee that all pages in range
>> are in buddy system.
>>
>
> Straight away, I'm wondering why you didn't use
>
> mm/compaction.c#isolate_freepages()
>
> It knows how to isolate pages within ranges. All its control information
> is passed via struct compact_control() which I recognise may be awkward
> for CMA but compaction.c know how to manage all the isolated pages and
> pass them to migrate.c appropriately.

It is something to consider.  At first glance, I see that isolate_freepages
seem to operate on pageblocks which is not desired for CMA.

> I haven't read all the patches yet but isolate_freepages() does break
> everything up into order-0 pages. This may not be to your liking but it
> would not be possible to change.

Splitting everything into order-0 pages is desired behaviour.

>> Along with this function, a free_contig_pages() function is
>> provided which frees all (or a subset of) pages allocated
>> with alloc_contig_free_pages().

> mm/compaction.c#release_freepages()

It sort of does the same thing but release_freepages() assumes that pages
that are being freed are not-continuous and they need to be on the lru list.
With free_contig_pages(), we can assume all pages are continuous.

>> +#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
>> +/*
>> + * Both PFNs must be from the same zone!  If this function returns
>> + * true, pfn_to_page(pfn1) + (pfn2 - pfn1) == pfn_to_page(pfn2).
>> + */
>> +static inline bool zone_pfn_same_memmap(unsigned long pfn1, unsigned long pfn2)
>> +{
>> +	return pfn_to_section_nr(pfn1) == pfn_to_section_nr(pfn2);
>> +}
>> +
>
> Why do you care what section the page is in? The zone is important all
> right, but not the section. Also, offhand I'm unsure if being in the
> same section guarantees the same zone. sections are ordinarily fully
> populated (except on ARM but hey) but I can't remember anything
> enforcing that zones be section-aligned.
>
> Later I think I see that the intention was to reduce the use of
> pfn_to_page().

That is correct.

> You can do this in a more general fashion by checking the
> zone boundaries and resolving the pfn->page every MAX_ORDER_NR_PAGES.
> That will not be SPARSEMEM specific.

I've tried doing stuff that way but it ended up with much more code.

Dave suggested the above function to check if pointer arithmetic is valid.

Please see also <https://lkml.org/lkml/2011/9/21/220>.

>
>> +#else
>> +
>> +#define zone_pfn_same_memmap(pfn1, pfn2) (true)
>> +
>> +#endif
>> +
>>  #endif /* !__GENERATING_BOUNDS.H */
>>  #endif /* !__ASSEMBLY__ */
>>  #endif /* _LINUX_MMZONE_H */


>> @@ -5706,6 +5706,73 @@ out:
>>  	spin_unlock_irqrestore(&zone->lock, flags);
>>  }
>>
>> +unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
>> +				       gfp_t flag)
>> +{
>> +	unsigned long pfn = start, count;
>> +	struct page *page;
>> +	struct zone *zone;
>> +	int order;
>> +
>> +	VM_BUG_ON(!pfn_valid(start));
>
> VM_BUG_ON seems very harsh here. WARN_ON_ONCE and returning 0 to the
> caller sees reasonable.
>
>> +	page = pfn_to_page(start);
>> +	zone = page_zone(page);
>> +
>> +	spin_lock_irq(&zone->lock);
>> +
>> +	for (;;) {
>> +		VM_BUG_ON(page_count(page) || !PageBuddy(page) ||
>> +			  page_zone(page) != zone);
>> +
>
> Here you will VM_BUG_ON with the zone lock held leading to system
> halting very shortly.
>
>> +		list_del(&page->lru);
>> +		order = page_order(page);
>> +		count = 1UL << order;
>> +		zone->free_area[order].nr_free--;
>> +		rmv_page_order(page);
>> +		__mod_zone_page_state(zone, NR_FREE_PAGES, -(long)count);
>> +
>
> The callers need to check in advance if watermarks are sufficient for
> this. In compaction, it happens in compaction_suitable() because it only
> needed to be checked once. Your requirements might be different.
>
>> +		pfn += count;
>> +		if (pfn >= end)
>> +			break;
>> +		VM_BUG_ON(!pfn_valid(pfn));
>> +
>
> On ARM, it's possible to encounter invalid pages. VM_BUG_ON is serious
> overkill.
>
>> +		if (zone_pfn_same_memmap(pfn - count, pfn))
>> +			page += count;
>> +		else
>> +			page = pfn_to_page(pfn);
>> +	}
>> +
>> +	spin_unlock_irq(&zone->lock);
>> +
>> +	/* After this, pages in the range can be freed one be one */
>> +	count = pfn - start;
>> +	pfn = start;
>> +	for (page = pfn_to_page(pfn); count; --count) {
>> +		prep_new_page(page, 0, flag);
>> +		++pfn;
>> +		if (likely(zone_pfn_same_memmap(pfn - 1, pfn)))
>> +			++page;
>> +		else
>> +			page = pfn_to_page(pfn);
>> +	}
>> +
>
> Here it looks like you have implemented something like split_free_page().

split_free_page() takes a single page, removes it from buddy system, and finally
splits it.  alloc_contig_freed_pages() takes a range of pages, removes them from
buddy system, and finally splits them.  Because it works on a range, it is
made into a separate function.

>> +	return pfn;
>> +}

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 180+ messages in thread

* [PATCH 2/9] mm: alloc_contig_freed_pages() added
@ 2011-10-18 17:26       ` Michal Nazarewicz
  0 siblings, 0 replies; 180+ messages in thread
From: Michal Nazarewicz @ 2011-10-18 17:26 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 18 Oct 2011 05:21:09 -0700, Mel Gorman <mel@csn.ul.ie> wrote:

> At this point, I'm going to apologise for not reviewing this a long long
> time ago.
>
> On Thu, Oct 06, 2011 at 03:54:42PM +0200, Marek Szyprowski wrote:
>> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>>
>> This commit introduces alloc_contig_freed_pages() function
>> which allocates (ie. removes from buddy system) free pages
>> in range. Caller has to guarantee that all pages in range
>> are in buddy system.
>>
>
> Straight away, I'm wondering why you didn't use
>
> mm/compaction.c#isolate_freepages()
>
> It knows how to isolate pages within ranges. All its control information
> is passed via struct compact_control() which I recognise may be awkward
> for CMA but compaction.c know how to manage all the isolated pages and
> pass them to migrate.c appropriately.

It is something to consider.  At first glance, I see that isolate_freepages
seem to operate on pageblocks which is not desired for CMA.

> I haven't read all the patches yet but isolate_freepages() does break
> everything up into order-0 pages. This may not be to your liking but it
> would not be possible to change.

Splitting everything into order-0 pages is desired behaviour.

>> Along with this function, a free_contig_pages() function is
>> provided which frees all (or a subset of) pages allocated
>> with alloc_contig_free_pages().

> mm/compaction.c#release_freepages()

It sort of does the same thing but release_freepages() assumes that pages
that are being freed are not-continuous and they need to be on the lru list.
With free_contig_pages(), we can assume all pages are continuous.

>> +#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
>> +/*
>> + * Both PFNs must be from the same zone!  If this function returns
>> + * true, pfn_to_page(pfn1) + (pfn2 - pfn1) == pfn_to_page(pfn2).
>> + */
>> +static inline bool zone_pfn_same_memmap(unsigned long pfn1, unsigned long pfn2)
>> +{
>> +	return pfn_to_section_nr(pfn1) == pfn_to_section_nr(pfn2);
>> +}
>> +
>
> Why do you care what section the page is in? The zone is important all
> right, but not the section. Also, offhand I'm unsure if being in the
> same section guarantees the same zone. sections are ordinarily fully
> populated (except on ARM but hey) but I can't remember anything
> enforcing that zones be section-aligned.
>
> Later I think I see that the intention was to reduce the use of
> pfn_to_page().

That is correct.

> You can do this in a more general fashion by checking the
> zone boundaries and resolving the pfn->page every MAX_ORDER_NR_PAGES.
> That will not be SPARSEMEM specific.

I've tried doing stuff that way but it ended up with much more code.

Dave suggested the above function to check if pointer arithmetic is valid.

Please see also <https://lkml.org/lkml/2011/9/21/220>.

>
>> +#else
>> +
>> +#define zone_pfn_same_memmap(pfn1, pfn2) (true)
>> +
>> +#endif
>> +
>>  #endif /* !__GENERATING_BOUNDS.H */
>>  #endif /* !__ASSEMBLY__ */
>>  #endif /* _LINUX_MMZONE_H */


>> @@ -5706,6 +5706,73 @@ out:
>>  	spin_unlock_irqrestore(&zone->lock, flags);
>>  }
>>
>> +unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
>> +				       gfp_t flag)
>> +{
>> +	unsigned long pfn = start, count;
>> +	struct page *page;
>> +	struct zone *zone;
>> +	int order;
>> +
>> +	VM_BUG_ON(!pfn_valid(start));
>
> VM_BUG_ON seems very harsh here. WARN_ON_ONCE and returning 0 to the
> caller sees reasonable.
>
>> +	page = pfn_to_page(start);
>> +	zone = page_zone(page);
>> +
>> +	spin_lock_irq(&zone->lock);
>> +
>> +	for (;;) {
>> +		VM_BUG_ON(page_count(page) || !PageBuddy(page) ||
>> +			  page_zone(page) != zone);
>> +
>
> Here you will VM_BUG_ON with the zone lock held leading to system
> halting very shortly.
>
>> +		list_del(&page->lru);
>> +		order = page_order(page);
>> +		count = 1UL << order;
>> +		zone->free_area[order].nr_free--;
>> +		rmv_page_order(page);
>> +		__mod_zone_page_state(zone, NR_FREE_PAGES, -(long)count);
>> +
>
> The callers need to check in advance if watermarks are sufficient for
> this. In compaction, it happens in compaction_suitable() because it only
> needed to be checked once. Your requirements might be different.
>
>> +		pfn += count;
>> +		if (pfn >= end)
>> +			break;
>> +		VM_BUG_ON(!pfn_valid(pfn));
>> +
>
> On ARM, it's possible to encounter invalid pages. VM_BUG_ON is serious
> overkill.
>
>> +		if (zone_pfn_same_memmap(pfn - count, pfn))
>> +			page += count;
>> +		else
>> +			page = pfn_to_page(pfn);
>> +	}
>> +
>> +	spin_unlock_irq(&zone->lock);
>> +
>> +	/* After this, pages in the range can be freed one be one */
>> +	count = pfn - start;
>> +	pfn = start;
>> +	for (page = pfn_to_page(pfn); count; --count) {
>> +		prep_new_page(page, 0, flag);
>> +		++pfn;
>> +		if (likely(zone_pfn_same_memmap(pfn - 1, pfn)))
>> +			++page;
>> +		else
>> +			page = pfn_to_page(pfn);
>> +	}
>> +
>
> Here it looks like you have implemented something like split_free_page().

split_free_page() takes a single page, removes it from buddy system, and finally
splits it.  alloc_contig_freed_pages() takes a range of pages, removes them from
buddy system, and finally splits them.  Because it works on a range, it is
made into a separate function.
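
To make the calling convention concrete, here is a rough usage sketch
(example_grab_range() is a made-up caller and GFP_KERNEL is just a plausible
flag; this is not code from the patch set):

#include <linux/gfp.h>
#include <linux/page-isolation.h>

/* Sketch only: take a PFN range that is already entirely in the buddy
 * system, use it, then give it back. */
static int example_grab_range(unsigned long start, unsigned long end)
{
	unsigned long last;

	/* Removes the whole range from the buddy system and splits it
	 * into order-0 pages; returns one past the last PFN taken. */
	last = alloc_contig_freed_pages(start, end, GFP_KERNEL);

	/* ... hand the pages to the device here ... */

	/* The pages are now ordinary order-0 pages, so they can be
	 * returned as a batch (or one by one). */
	free_contig_pages(start, last - start);
	return 0;
}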

>> +	return pfn;
>> +}

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 2/9] mm: alloc_contig_freed_pages() added
  2011-10-18 17:26       ` Michal Nazarewicz
  (?)
@ 2011-10-18 17:48         ` Dave Hansen
  -1 siblings, 0 replies; 180+ messages in thread
From: Dave Hansen @ 2011-10-18 17:48 UTC (permalink / raw)
  To: Michal Nazarewicz
  Cc: Marek Szyprowski, Mel Gorman, linux-kernel, linux-arm-kernel,
	linux-media, linux-mm, linaro-mm-sig, Kyungmin Park,
	Russell King, Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg,
	Daniel Walker, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Shariq Hasnain, Chunsang Jeong

On Tue, 2011-10-18 at 10:26 -0700, Michal Nazarewicz wrote:
> > You can do this in a more general fashion by checking the
> > zone boundaries and resolving the pfn->page every MAX_ORDER_NR_PAGES.
> > That will not be SPARSEMEM specific.
> 
> I've tried doing stuff that way but it ended up with much more code.

I guess instead of:

>> +static inline bool zone_pfn_same_memmap(unsigned long pfn1, unsigned long pfn2)
>> +{
>> +    return pfn_to_section_nr(pfn1) == pfn_to_section_nr(pfn2);
>> +}

You could do:

static inline bool zone_pfn_same_maxorder(unsigned long pfn1, unsigned long pfn2)
{
	unsigned long mask = MAX_ORDER_NR_PAGES-1;
	return (pfn1 & ~mask) == (pfn2 & ~mask);
}

I think that works.  Should be the same code you have now, basically.

-- Dave


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 2/9] mm: alloc_contig_freed_pages() added
  2011-10-18 17:48         ` Dave Hansen
  (?)
@ 2011-10-18 18:00           ` Michal Nazarewicz
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Nazarewicz @ 2011-10-18 18:00 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Marek Szyprowski, Mel Gorman, linux-kernel, linux-arm-kernel,
	linux-media, linux-mm, linaro-mm-sig, Kyungmin Park,
	Russell King, Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg,
	Daniel Walker, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Shariq Hasnain, Chunsang Jeong

On Tue, 18 Oct 2011 10:48:46 -0700, Dave Hansen <dave@linux.vnet.ibm.com> wrote:

> On Tue, 2011-10-18 at 10:26 -0700, Michal Nazarewicz wrote:
>> > You can do this in a more general fashion by checking the
>> > zone boundaries and resolving the pfn->page every MAX_ORDER_NR_PAGES.
>> > That will not be SPARSEMEM specific.
>>
>> I've tried doing stuff that way but it ended up with much more code.
>
> I guess instead of:
>
>>> +static inline bool zone_pfn_same_memmap(unsigned long pfn1, unsigned long pfn2)
>>> +{
>>> +    return pfn_to_section_nr(pfn1) == pfn_to_section_nr(pfn2);
>>> +}
>
> You could do:
>
> static inline bool zone_pfn_same_maxorder(unsigned long pfn1, unsigned long pfn2)
> {
> 	unsigned long mask = MAX_ORDER_NR_PAGES-1;
> 	return (pfn1 & ~mask) == (pfn2 & ~mask);
> }
>
> I think that works.  Should be the same code you have now, basically.

Makes sense.  It'd require calling pfn_to_page() every MAX_ORDER_NR_PAGES even
in memory models that have linear mapping of struct page, but I guess that's
not that bad.
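
Roughly like this, to illustrate (example_next_page() is a made-up name;
the real loop would keep its existing locking):

#include <linux/mm.h>
#include <linux/mmzone.h>

/* Sketch only: re-resolve the struct page pointer whenever the PFN
 * crosses into a different MAX_ORDER-aligned block; inside one block
 * plain pointer arithmetic is valid in every memory model. */
static struct page *example_next_page(struct page *page,
				      unsigned long prev_pfn,
				      unsigned long pfn)
{
	unsigned long mask = MAX_ORDER_NR_PAGES - 1;

	if ((prev_pfn & ~mask) == (pfn & ~mask))
		return page + (pfn - prev_pfn);
	return pfn_to_page(pfn);
}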

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 2/9] mm: alloc_contig_freed_pages() added
  2011-10-18 17:26       ` Michal Nazarewicz
  (?)
@ 2011-10-21 10:06         ` Mel Gorman
  -1 siblings, 0 replies; 180+ messages in thread
From: Mel Gorman @ 2011-10-21 10:06 UTC (permalink / raw)
  To: Michal Nazarewicz
  Cc: Marek Szyprowski, linux-kernel, linux-arm-kernel, linux-media,
	linux-mm, linaro-mm-sig, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen

On Tue, Oct 18, 2011 at 10:26:37AM -0700, Michal Nazarewicz wrote:
> On Tue, 18 Oct 2011 05:21:09 -0700, Mel Gorman <mel@csn.ul.ie> wrote:
> 
> >At this point, I'm going to apologise for not reviewing this a long long
> >time ago.
> >
> >On Thu, Oct 06, 2011 at 03:54:42PM +0200, Marek Szyprowski wrote:
> >>From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> >>
> >>This commit introduces alloc_contig_freed_pages() function
> >>which allocates (ie. removes from buddy system) free pages
> >>in range. Caller has to guarantee that all pages in range
> >>are in buddy system.
> >>
> >
> >Straight away, I'm wondering why you didn't use
> >
> >mm/compaction.c#isolate_freepages()
> >
> >It knows how to isolate pages within ranges. All its control information
> >is passed via struct compact_control() which I recognise may be awkward
> >for CMA but compaction.c knows how to manage all the isolated pages and
> >pass them to migrate.c appropriately.
> 
> It is something to consider.  At first glance, I see that isolate_freepages
> seems to operate on pageblocks which is not desired for CMA.
> 

isolate_freepages_block operates on a range of pages that happens to be
hard-coded to be a pageblock because that was the requirement. It calculates
end_pfn and it is possible to make that a function parameter.
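
Something along these lines, I assume (sketch only: the four-argument
isolate_freepages_block() is the hypothetical extension being discussed,
and isolate_range_of_block() is a made-up wrapper):

#include <linux/kernel.h>
#include <linux/mmzone.h>

/* Hypothetical: the block scanner takes an explicit [start, end) range. */
unsigned long isolate_freepages_block(struct zone *zone,
				      unsigned long start_pfn,
				      unsigned long end_pfn,
				      struct list_head *freelist);

/* Sketch of the caller side: compute end_pfn once per iteration. */
static unsigned long isolate_range_of_block(struct zone *zone,
					    unsigned long pfn,
					    struct list_head *freelist)
{
	unsigned long zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
	unsigned long end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn);

	return isolate_freepages_block(zone, pfn, end_pfn, freelist);
}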

> >I haven't read all the patches yet but isolate_freepages() does break
> >everything up into order-0 pages. This may not be to your liking but it
> >would not be possible to change.
> 
> Splitting everything into order-0 pages is desired behaviour.
> 

Great.

> >>Along with this function, a free_contig_pages() function is
> >>provided which frees all (or a subset of) pages allocated
> >>with alloc_contig_free_pages().
> 
> >mm/compaction.c#release_freepages()
> 
> It sort of does the same thing but release_freepages() assumes that pages
> that are being freed are non-contiguous and they need to be on the lru list.
> With free_contig_pages(), we can assume all pages are contiguous.
> 

Ok, I jumped the gun here. release_freepages() may not be a perfect fit.
release_freepages() is also used when finishing compaction, whereas what
is required here is a real free function.

> >You can do this in a more general fashion by checking the
> >zone boundaries and resolving the pfn->page every MAX_ORDER_NR_PAGES.
> >That will not be SPARSEMEM specific.
> 
> I've tried doing stuff that way but it ended up with much more code.
> 
> Dave suggested the above function to check if pointer arithmetic is valid.
> 
> Please see also <https://lkml.org/lkml/2011/9/21/220>.
> 

Ok, I'm still not fully convinced but I confess I'm not thinking about this
particular function too deeply because I am expecting the problem would
go away if compaction and CMA shared common code for freeing contiguous
regions via page migration.

> >> <SNIP>
> >>+		if (zone_pfn_same_memmap(pfn - count, pfn))
> >>+			page += count;
> >>+		else
> >>+			page = pfn_to_page(pfn);
> >>+	}
> >>+
> >>+	spin_unlock_irq(&zone->lock);
> >>+
> >>+	/* After this, pages in the range can be freed one by one */
> >>+	count = pfn - start;
> >>+	pfn = start;
> >>+	for (page = pfn_to_page(pfn); count; --count) {
> >>+		prep_new_page(page, 0, flag);
> >>+		++pfn;
> >>+		if (likely(zone_pfn_same_memmap(pfn - 1, pfn)))
> >>+			++page;
> >>+		else
> >>+			page = pfn_to_page(pfn);
> >>+	}
> >>+
> >
> >Here it looks like you have implemented something like split_free_page().
> 
> split_free_page() takes a single page, removes it from buddy system, and finally
> splits it. 

I'm referring to just this chunk.

split_free_page takes a page, checks the watermarks and performs similar
operations to prep_new_page(). There should be no need to introduce a
new similar function. split_free_page() does affect the pageblock
migratetype and that is undesirable but that part could be taken out and
moved to compaction.c if necessary.

On the watermarks thing, CMA does not pay much attention to them. I have
a strong feeling that it is easy to deadlock a system by using CMA while
under memory pressure. Compaction had the same problem early in its
development FWIW. This is partially why I'd prefer to see CMA and
compaction sharing as much code as possible because compaction gets
continual testing.
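
For reference, the watermark check in question looks roughly like this
(modelled on what split_free_page() does in the current tree;
example_can_take_pages() is a made-up wrapper, and CMA would need
something equivalent before pulling a large range out of the buddy lists):

#include <linux/mmzone.h>

/* Sketch only: refuse to take 2^order pages if that would push the
 * zone below its low watermark, mirroring split_free_page(). */
static bool example_can_take_pages(struct zone *zone, unsigned int order)
{
	unsigned long watermark = low_wmark_pages(zone) + (1UL << order);

	return zone_watermark_ok(zone, 0, watermark, 0, 0);
}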

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 2/9] mm: alloc_contig_freed_pages() added
  2011-10-21 10:06         ` Mel Gorman
  (?)
@ 2011-10-24  1:00           ` Michal Nazarewicz
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Nazarewicz @ 2011-10-24  1:00 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Marek Szyprowski, linux-kernel, linux-arm-kernel, linux-media,
	linux-mm, linaro-mm-sig, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen

On Fri, 21 Oct 2011 03:06:24 -0700, Mel Gorman <mel@csn.ul.ie> wrote:

> On Tue, Oct 18, 2011 at 10:26:37AM -0700, Michal Nazarewicz wrote:
>> On Tue, 18 Oct 2011 05:21:09 -0700, Mel Gorman <mel@csn.ul.ie> wrote:
>>
>> >At this point, I'm going to apologise for not reviewing this a long long
>> >time ago.
>> >
>> >On Thu, Oct 06, 2011 at 03:54:42PM +0200, Marek Szyprowski wrote:
>> >>From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> >>
>> >>This commit introduces alloc_contig_freed_pages() function
>> >>which allocates (ie. removes from buddy system) free pages
>> >>in range. Caller has to guarantee that all pages in range
>> >>are in buddy system.
>> >>
>> >
>> >Straight away, I'm wondering why you didn't use
>> >
>> >mm/compaction.c#isolate_freepages()
>> >
>> >It knows how to isolate pages within ranges. All its control information
>> >is passed via struct compact_control() which I recognise may be awkward
>> >for CMA but compaction.c know how to manage all the isolated pages and
>> >pass them to migrate.c appropriately.
>>
>> It is something to consider.  At first glance, I see that isolate_freepages
>> seems to operate on pageblocks which is not desired for CMA.
>>
>
> isolate_freepages_block operates on a range of pages that happens to be
> hard-coded to be a pageblock because that was the requirements. It calculates
> end_pfn and it is possible to make that a function parameter.

Yes, this seems doable.  I'll try and rewrite the patches to use it.

The biggest difference is in how CMA and compaction treat pages which are not
free.  CMA treats them as an error while compaction just skips them.  This is
solvable with an extra argument, though.
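
A rough sketch of what that argument could look like (hypothetical code,
ignores locking and the actual isolation work; example_scan_free() is a
made-up name):

#include <linux/mm.h>
#include <linux/mmzone.h>

/* Sketch only: one scanner, two policies for pages that are not free. */
static unsigned long example_scan_free(unsigned long start, unsigned long end,
				       bool strict)
{
	unsigned long pfn, nr_free = 0;

	for (pfn = start; pfn < end; pfn++) {
		if (!pfn_valid_within(pfn) ||
		    !PageBuddy(pfn_to_page(pfn))) {
			if (strict)
				return 0;	/* CMA: any hole is an error */
			continue;		/* compaction: skip and go on */
		}
		nr_free++;
	}
	return nr_free;
}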

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 2/9] mm: alloc_contig_freed_pages() added
  2011-10-18 12:21     ` Mel Gorman
  (?)
@ 2011-10-24  4:05       ` Michal Nazarewicz
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Nazarewicz @ 2011-10-24  4:05 UTC (permalink / raw)
  To: Marek Szyprowski, Mel Gorman
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Kyungmin Park, Russell King, Andrew Morton,
	KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker, Arnd Bergmann,
	Jesse Barker, Jonathan Corbet, Shariq Hasnain, Chunsang Jeong,
	Dave Hansen

> On Thu, Oct 06, 2011 at 03:54:42PM +0200, Marek Szyprowski wrote:
>> This commit introduces alloc_contig_freed_pages() function
>> which allocates (ie. removes from buddy system) free pages
>> in range. Caller has to guarantee that all pages in range
>> are in buddy system.

On Tue, 18 Oct 2011 05:21:09 -0700, Mel Gorman <mel@csn.ul.ie> wrote:
> Straight away, I'm wondering why you didn't use
> mm/compaction.c#isolate_freepages()

Does the below look like a step in the right direction?

It basically moves isolate_freepages_block() to page_alloc.c (changing
its name to isolate_freepages_range()) and changes it so that, depending
on its arguments, it treats holes (either an invalid PFN or a non-free
page) as errors, so that CMA can use it.

It also accepts a range rather than just assuming a single pageblock,
thus the change moves the range calculation in compaction.c from
isolate_freepages_block() up to isolate_freepages().

The change also modifies split_free_page() so that it does not try to
change a pageblock's migrate type if the current migrate type is ISOLATE
or CMA.

---
 include/linux/mm.h             |    1 -
 include/linux/page-isolation.h |    4 +-
 mm/compaction.c                |   73 +++--------------------
 mm/internal.h                  |    5 ++
 mm/page_alloc.c                |  128 +++++++++++++++++++++++++---------------
 5 files changed, 95 insertions(+), 116 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index fd599f4..98c99c4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -435,7 +435,6 @@ void put_page(struct page *page);
 void put_pages_list(struct list_head *pages);
 
 void split_page(struct page *page, unsigned int order);
-int split_free_page(struct page *page);
 
 /*
  * Compound pages have a destructor function.  Provide a
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 003c52f..6becc74 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -48,10 +48,8 @@ static inline void unset_migratetype_isolate(struct page *page)
 }
 
 /* The below functions must be run on a range from a single zone. */
-extern unsigned long alloc_contig_freed_pages(unsigned long start,
-					      unsigned long end, gfp_t flag);
 extern int alloc_contig_range(unsigned long start, unsigned long end,
-			      gfp_t flags, unsigned migratetype);
+			      unsigned migratetype);
 extern void free_contig_pages(unsigned long pfn, unsigned nr_pages);
 
 /*
diff --git a/mm/compaction.c b/mm/compaction.c
index 9e5cc59..685a19e 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -58,77 +58,15 @@ static unsigned long release_freepages(struct list_head *freelist)
 	return count;
 }
 
-/* Isolate free pages onto a private freelist. Must hold zone->lock */
-static unsigned long isolate_freepages_block(struct zone *zone,
-				unsigned long blockpfn,
-				struct list_head *freelist)
-{
-	unsigned long zone_end_pfn, end_pfn;
-	int nr_scanned = 0, total_isolated = 0;
-	struct page *cursor;
-
-	/* Get the last PFN we should scan for free pages at */
-	zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
-	end_pfn = min(blockpfn + pageblock_nr_pages, zone_end_pfn);
-
-	/* Find the first usable PFN in the block to initialse page cursor */
-	for (; blockpfn < end_pfn; blockpfn++) {
-		if (pfn_valid_within(blockpfn))
-			break;
-	}
-	cursor = pfn_to_page(blockpfn);
-
-	/* Isolate free pages. This assumes the block is valid */
-	for (; blockpfn < end_pfn; blockpfn++, cursor++) {
-		int isolated, i;
-		struct page *page = cursor;
-
-		if (!pfn_valid_within(blockpfn))
-			continue;
-		nr_scanned++;
-
-		if (!PageBuddy(page))
-			continue;
-
-		/* Found a free page, break it into order-0 pages */
-		isolated = split_free_page(page);
-		total_isolated += isolated;
-		for (i = 0; i < isolated; i++) {
-			list_add(&page->lru, freelist);
-			page++;
-		}
-
-		/* If a page was split, advance to the end of it */
-		if (isolated) {
-			blockpfn += isolated - 1;
-			cursor += isolated - 1;
-		}
-	}
-
-	trace_mm_compaction_isolate_freepages(nr_scanned, total_isolated);
-	return total_isolated;
-}
-
 /* Returns true if the page is within a block suitable for migration to */
 static bool suitable_migration_target(struct page *page)
 {
-
 	int migratetype = get_pageblock_migratetype(page);
 
 	/* Don't interfere with memory hot-remove or the min_free_kbytes blocks */
 	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
 		return false;
 
-	/* Keep MIGRATE_CMA alone as well. */
-	/*
-	 * XXX Revisit.  We currently cannot let compaction touch CMA
-	 * pages since compaction insists on changing their migration
-	 * type to MIGRATE_MOVABLE (see split_free_page() called from
-	 * isolate_freepages_block() above).
-	 */
-	if (is_migrate_cma(migratetype))
-		return false;
-
 	/* If the page is a large free page, then allow migration */
 	if (PageBuddy(page) && page_order(page) >= pageblock_order)
 		return true;
@@ -149,7 +87,7 @@ static void isolate_freepages(struct zone *zone,
 				struct compact_control *cc)
 {
 	struct page *page;
-	unsigned long high_pfn, low_pfn, pfn;
+	unsigned long high_pfn, low_pfn, pfn, zone_end_pfn, end_pfn;
 	unsigned long flags;
 	int nr_freepages = cc->nr_freepages;
 	struct list_head *freelist = &cc->freepages;
@@ -169,6 +107,8 @@ static void isolate_freepages(struct zone *zone,
 	 */
 	high_pfn = min(low_pfn, pfn);
 
+	zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
+
 	/*
 	 * Isolate free pages until enough are available to migrate the
 	 * pages on cc->migratepages. We stop searching if the migrate
@@ -176,7 +116,7 @@ static void isolate_freepages(struct zone *zone,
 	 */
 	for (; pfn > low_pfn && cc->nr_migratepages > nr_freepages;
 					pfn -= pageblock_nr_pages) {
-		unsigned long isolated;
+		unsigned isolated, scanned;
 
 		if (!pfn_valid(pfn))
 			continue;
@@ -205,7 +145,10 @@ static void isolate_freepages(struct zone *zone,
 		isolated = 0;
 		spin_lock_irqsave(&zone->lock, flags);
 		if (suitable_migration_target(page)) {
-			isolated = isolate_freepages_block(zone, pfn, freelist);
+			end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn);
+			isolated = isolate_freepages_range(zone, pfn,
+					end_pfn, freelist, &scanned);
+			trace_mm_compaction_isolate_freepages(scanned, isolated);
 			nr_freepages += isolated;
 		}
 		spin_unlock_irqrestore(&zone->lock, flags);
diff --git a/mm/internal.h b/mm/internal.h
index d071d380..4a9bb3f 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -263,3 +263,8 @@ extern u64 hwpoison_filter_flags_mask;
 extern u64 hwpoison_filter_flags_value;
 extern u64 hwpoison_filter_memcg;
 extern u32 hwpoison_filter_enable;
+
+unsigned isolate_freepages_range(struct zone *zone,
+				 unsigned long start, unsigned long end,
+				 struct list_head *freelist,
+				 unsigned *scannedp);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index df69706..adf3f34 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1300,10 +1300,11 @@ void split_page(struct page *page, unsigned int order)
  * Note: this is probably too low level an operation for use in drivers.
  * Please consult with lkml before using this in your driver.
  */
-int split_free_page(struct page *page)
+static unsigned split_free_page(struct page *page)
 {
 	unsigned int order;
 	unsigned long watermark;
+	struct page *endpage;
 	struct zone *zone;
 
 	BUG_ON(!PageBuddy(page));
@@ -1326,14 +1327,18 @@ int split_free_page(struct page *page)
 	set_page_refcounted(page);
 	split_page(page, order);
 
-	if (order >= pageblock_order - 1) {
-		struct page *endpage = page + (1 << order) - 1;
-		for (; page < endpage; page += pageblock_nr_pages)
-			if (!is_pageblock_cma(page))
-				set_pageblock_migratetype(page,
-							  MIGRATE_MOVABLE);
+	if (order < pageblock_order - 1)
+		goto done;
+
+	endpage = page + (1 << order) - 1;
+	for (; page < endpage; page += pageblock_nr_pages) {
+		int mt = get_pageblock_migratetype(page);
+		/* Don't change CMA nor ISOLATE */
+		if (!is_migrate_cma(mt) && mt != MIGRATE_ISOLATE)
+			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
 	}
 
+done:
 	return 1 << order;
 }
 
@@ -5723,57 +5728,76 @@ out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
 
-unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
-				       gfp_t flag)
+/**
+ * isolate_freepages_range() - isolate free pages, must hold zone->lock.
+ * @zone:	Zone pages are in.
+ * @start:	The first PFN to start isolating.
+ * @end:	The one-past-last PFN.
+ * @freelist:	A list to save isolated pages to.
+ * @scannedp:	Optional pointer where to save number of scanned pages.
+ *
+ * If @freelist is not provided, holes in the range (either non-free
+ * pages or invalid PFNs) are considered an error; the function undoes
+ * its actions and returns zero.
+ *
+ * If @freelist is provided, the function simply skips non-free and
+ * missing pages and puts only the isolated ones on the list.  It will
+ * also call trace_mm_compaction_isolate_freepages() at the end.
+ *
+ * Returns the number of isolated pages.  This may be more than
+ * end - start if end fell in the middle of a free page.
+ */
+unsigned isolate_freepages_range(struct zone *zone,
+				 unsigned long start, unsigned long end,
+				 struct list_head *freelist, unsigned *scannedp)
 {
-	unsigned long pfn = start, count;
+	unsigned nr_scanned = 0, total_isolated = 0;
+	unsigned long pfn = start;
 	struct page *page;
-	struct zone *zone;
-	int order;
 
 	VM_BUG_ON(!pfn_valid(start));
-	page = pfn_to_page(start);
-	zone = page_zone(page);
 
-	spin_lock_irq(&zone->lock);
+	/* Isolate free pages. This assumes the block is valid */
+	page = pfn_to_page(pfn);
+	while (pfn < end) {
+		unsigned isolated = 1;
 
-	for (;;) {
-		VM_BUG_ON(page_count(page) || !PageBuddy(page) ||
-			  page_zone(page) != zone);
+		VM_BUG_ON(page_zone(page) != zone);
 
-		list_del(&page->lru);
-		order = page_order(page);
-		count = 1UL << order;
-		zone->free_area[order].nr_free--;
-		rmv_page_order(page);
-		__mod_zone_page_state(zone, NR_FREE_PAGES, -(long)count);
+		if (!pfn_valid_within(pfn))
+			goto skip;
+		++nr_scanned;
 
-		pfn += count;
-		if (pfn >= end)
-			break;
-		VM_BUG_ON(!pfn_valid(pfn));
-
-		if (zone_pfn_same_memmap(pfn - count, pfn))
-			page += count;
-		else
-			page = pfn_to_page(pfn);
-	}
+		if (!PageBuddy(page)) {
+skip:
+			if (freelist)
+				goto next;
+			for (; start < pfn; ++start)
+				__free_page(pfn_to_page(start));
+			return 0;
+		}
 
-	spin_unlock_irq(&zone->lock);
+		/* Found a free page, break it into order-0 pages */
+		isolated = split_free_page(page);
+		total_isolated += isolated;
+		if (freelist) {
+			struct page *p = page;
+			unsigned i = isolated;
+			for (; i--; ++p)
+				list_add(&p->lru, freelist);
+		}
 
-	/* After this, pages in the range can be freed one by one */
-	count = pfn - start;
-	pfn = start;
-	for (page = pfn_to_page(pfn); count; --count) {
-		prep_new_page(page, 0, flag);
-		++pfn;
-		if (likely(zone_pfn_same_memmap(pfn - 1, pfn)))
-			++page;
+next:		/* Advance to the next page */
+		pfn += isolated;
+		if (zone_pfn_same_memmap(pfn - isolated, pfn))
+			page += isolated;
 		else
 			page = pfn_to_page(pfn);
 	}
 
-	return pfn;
+	if (scannedp)
+		*scannedp = nr_scanned;
+	return total_isolated;
 }
 
 static unsigned long pfn_to_maxpage(unsigned long pfn)
@@ -5837,7 +5861,6 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
  * alloc_contig_range() -- tries to allocate given range of pages
  * @start:	start PFN to allocate
  * @end:	one-past-the-last PFN to allocate
- * @flags:	flags passed to alloc_contig_freed_pages().
  * @migratetype:	migratetype of the underlaying pageblocks (either
  *			#MIGRATE_MOVABLE or #MIGRATE_CMA).  All pageblocks
  *			in range must have the same migratetype and it must
@@ -5853,9 +5876,10 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
  * need to be freed with free_contig_pages().
  */
 int alloc_contig_range(unsigned long start, unsigned long end,
-		       gfp_t flags, unsigned migratetype)
+		       unsigned migratetype)
 {
 	unsigned long outer_start, outer_end;
+	struct zone *zone;
 	int ret;
 
 	/*
@@ -5910,7 +5934,17 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 			return -EINVAL;
 
 	outer_start = start & (~0UL << ret);
-	outer_end   = alloc_contig_freed_pages(outer_start, end, flags);
+
+	zone = page_zone(pfn_to_page(outer_start));
+	spin_lock_irq(&zone->lock);
+	outer_end = isolate_freepages_range(zone, outer_start, end, NULL, NULL);
+	spin_unlock_irq(&zone->lock);
+
+	if (!outer_end) {
+		ret = -EBUSY;
+		goto done;
+	}
+	outer_end += outer_start;
 
 	/* Free head and tail (if any) */
 	if (start != outer_start)
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* Re: [PATCH 2/9] mm: alloc_contig_freed_pages() added
@ 2011-10-24  4:05       ` Michal Nazarewicz
  0 siblings, 0 replies; 180+ messages in thread
From: Michal Nazarewicz @ 2011-10-24  4:05 UTC (permalink / raw)
  To: Marek Szyprowski, Mel Gorman
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Kyungmin Park, Russell King, Andrew Morton,
	KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker, Arnd Bergmann,
	Jesse Barker, Jonathan Corbet, Shariq Hasnain, Chunsang Jeong,
	Dave Hansen

> On Thu, Oct 06, 2011 at 03:54:42PM +0200, Marek Szyprowski wrote:
>> This commit introduces alloc_contig_freed_pages() function
>> which allocates (ie. removes from buddy system) free pages
>> in range. Caller has to guarantee that all pages in range
>> are in buddy system.

On Tue, 18 Oct 2011 05:21:09 -0700, Mel Gorman <mel@csn.ul.ie> wrote:
> Straight away, I'm wondering why you didn't use
> mm/compaction.c#isolate_freepages()

Does the below look like a step in the right direction?

It basically moves isolate_freepages_block() to page_alloc.c (changing
it name to isolate_freepages_range()) and changes it so that depending
on arguments it treats holes (either invalid PFN or non-free page) as
errors so that CMA can use it.

It also accepts a range rather then just assuming a single pageblock
thus the change moves range calculation in compaction.c from
isolate_freepages_block() up to isolate_freepages().

The change also modifies spilt_free_page() so that it does not try to
change pageblock's migrate type if current migrate type is ISOLATE or
CMA.

---
 include/linux/mm.h             |    1 -
 include/linux/page-isolation.h |    4 +-
 mm/compaction.c                |   73 +++--------------------
 mm/internal.h                  |    5 ++
 mm/page_alloc.c                |  128 +++++++++++++++++++++++++---------------
 5 files changed, 95 insertions(+), 116 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index fd599f4..98c99c4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -435,7 +435,6 @@ void put_page(struct page *page);
 void put_pages_list(struct list_head *pages);
 
 void split_page(struct page *page, unsigned int order);
-int split_free_page(struct page *page);
 
 /*
  * Compound pages have a destructor function.  Provide a
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 003c52f..6becc74 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -48,10 +48,8 @@ static inline void unset_migratetype_isolate(struct page *page)
 }
 
 /* The below functions must be run on a range from a single zone. */
-extern unsigned long alloc_contig_freed_pages(unsigned long start,
-					      unsigned long end, gfp_t flag);
 extern int alloc_contig_range(unsigned long start, unsigned long end,
-			      gfp_t flags, unsigned migratetype);
+			      unsigned migratetype);
 extern void free_contig_pages(unsigned long pfn, unsigned nr_pages);
 
 /*
diff --git a/mm/compaction.c b/mm/compaction.c
index 9e5cc59..685a19e 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -58,77 +58,15 @@ static unsigned long release_freepages(struct list_head *freelist)
 	return count;
 }
 
-/* Isolate free pages onto a private freelist. Must hold zone->lock */
-static unsigned long isolate_freepages_block(struct zone *zone,
-				unsigned long blockpfn,
-				struct list_head *freelist)
-{
-	unsigned long zone_end_pfn, end_pfn;
-	int nr_scanned = 0, total_isolated = 0;
-	struct page *cursor;
-
-	/* Get the last PFN we should scan for free pages at */
-	zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
-	end_pfn = min(blockpfn + pageblock_nr_pages, zone_end_pfn);
-
-	/* Find the first usable PFN in the block to initialse page cursor */
-	for (; blockpfn < end_pfn; blockpfn++) {
-		if (pfn_valid_within(blockpfn))
-			break;
-	}
-	cursor = pfn_to_page(blockpfn);
-
-	/* Isolate free pages. This assumes the block is valid */
-	for (; blockpfn < end_pfn; blockpfn++, cursor++) {
-		int isolated, i;
-		struct page *page = cursor;
-
-		if (!pfn_valid_within(blockpfn))
-			continue;
-		nr_scanned++;
-
-		if (!PageBuddy(page))
-			continue;
-
-		/* Found a free page, break it into order-0 pages */
-		isolated = split_free_page(page);
-		total_isolated += isolated;
-		for (i = 0; i < isolated; i++) {
-			list_add(&page->lru, freelist);
-			page++;
-		}
-
-		/* If a page was split, advance to the end of it */
-		if (isolated) {
-			blockpfn += isolated - 1;
-			cursor += isolated - 1;
-		}
-	}
-
-	trace_mm_compaction_isolate_freepages(nr_scanned, total_isolated);
-	return total_isolated;
-}
-
 /* Returns true if the page is within a block suitable for migration to */
 static bool suitable_migration_target(struct page *page)
 {
-
 	int migratetype = get_pageblock_migratetype(page);
 
 	/* Don't interfere with memory hot-remove or the min_free_kbytes blocks */
 	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
 		return false;
 
-	/* Keep MIGRATE_CMA alone as well. */
-	/*
-	 * XXX Revisit.  We currently cannot let compaction touch CMA
-	 * pages since compaction insists on changing their migration
-	 * type to MIGRATE_MOVABLE (see split_free_page() called from
-	 * isolate_freepages_block() above).
-	 */
-	if (is_migrate_cma(migratetype))
-		return false;
-
 	/* If the page is a large free page, then allow migration */
 	if (PageBuddy(page) && page_order(page) >= pageblock_order)
 		return true;
@@ -149,7 +87,7 @@ static void isolate_freepages(struct zone *zone,
 				struct compact_control *cc)
 {
 	struct page *page;
-	unsigned long high_pfn, low_pfn, pfn;
+	unsigned long high_pfn, low_pfn, pfn, zone_end_pfn, end_pfn;
 	unsigned long flags;
 	int nr_freepages = cc->nr_freepages;
 	struct list_head *freelist = &cc->freepages;
@@ -169,6 +107,8 @@ static void isolate_freepages(struct zone *zone,
 	 */
 	high_pfn = min(low_pfn, pfn);
 
+	zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
+
 	/*
 	 * Isolate free pages until enough are available to migrate the
 	 * pages on cc->migratepages. We stop searching if the migrate
@@ -176,7 +116,7 @@ static void isolate_freepages(struct zone *zone,
 	 */
 	for (; pfn > low_pfn && cc->nr_migratepages > nr_freepages;
 					pfn -= pageblock_nr_pages) {
-		unsigned long isolated;
+		unsigned isolated, scanned;
 
 		if (!pfn_valid(pfn))
 			continue;
@@ -205,7 +145,10 @@ static void isolate_freepages(struct zone *zone,
 		isolated = 0;
 		spin_lock_irqsave(&zone->lock, flags);
 		if (suitable_migration_target(page)) {
-			isolated = isolate_freepages_block(zone, pfn, freelist);
+			end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn);
+			isolated = isolate_freepages_range(zone, pfn,
+					end_pfn, freelist, &scanned);
+			trace_mm_compaction_isolate_freepages(scanned, isolated);
 			nr_freepages += isolated;
 		}
 		spin_unlock_irqrestore(&zone->lock, flags);
diff --git a/mm/internal.h b/mm/internal.h
index d071d380..4a9bb3f 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -263,3 +263,8 @@ extern u64 hwpoison_filter_flags_mask;
 extern u64 hwpoison_filter_flags_value;
 extern u64 hwpoison_filter_memcg;
 extern u32 hwpoison_filter_enable;
+
+unsigned isolate_freepages_range(struct zone *zone,
+				 unsigned long start, unsigned long end,
+				 struct list_head *freelist,
+				 unsigned *scannedp);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index df69706..adf3f34 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1300,10 +1300,11 @@ void split_page(struct page *page, unsigned int order)
  * Note: this is probably too low level an operation for use in drivers.
  * Please consult with lkml before using this in your driver.
  */
-int split_free_page(struct page *page)
+static unsigned split_free_page(struct page *page)
 {
 	unsigned int order;
 	unsigned long watermark;
+	struct page *endpage;
 	struct zone *zone;
 
 	BUG_ON(!PageBuddy(page));
@@ -1326,14 +1327,18 @@ int split_free_page(struct page *page)
 	set_page_refcounted(page);
 	split_page(page, order);
 
-	if (order >= pageblock_order - 1) {
-		struct page *endpage = page + (1 << order) - 1;
-		for (; page < endpage; page += pageblock_nr_pages)
-			if (!is_pageblock_cma(page))
-				set_pageblock_migratetype(page,
-							  MIGRATE_MOVABLE);
+	if (order < pageblock_order - 1)
+		goto done;
+
+	endpage = page + (1 << order) - 1;
+	for (; page < endpage; page += pageblock_nr_pages) {
+		int mt = get_pageblock_migratetype(page);
+		/* Don't change CMA nor ISOLATE */
+		if (!is_migrate_cma(mt) && mt != MIGRATE_ISOLATE)
+			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
 	}
 
+done:
 	return 1 << order;
 }
 
@@ -5723,57 +5728,76 @@ out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
 
-unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
-				       gfp_t flag)
+/**
+ * isolate_freepages_range() - isolate free pages, must hold zone->lock.
+ * @zone:	Zone pages are in.
+ * @start:	The first PFN to start isolating.
+ * @end:	The one-past-last PFN.
+ * @freelist:	A list to save isolated pages to.
+ * @scannedp:	Optional pointer where to save number of scanned pages.
+ *
+ * If @freelist is not provided, holes in range (either non-free pages
+ * or invalid PFN) are considered an error and function undos its
+ * actions and returns zero.
+ *
+ * If @freelist is provided, function will simply skip non-free and
+ * missing pages and put only the ones isolated on the list.  It will
+ * also call trace_mm_compaction_isolate_freepages() at the end.
+ *
+ * Returns number of isolated pages.  This may be more then end-start
+ * if end fell in a middle of a free page.
+ */
+unsigned isolate_freepages_range(struct zone *zone,
+				 unsigned long start, unsigned long end,
+				 struct list_head *freelist, unsigned *scannedp)
 {
-	unsigned long pfn = start, count;
+	unsigned nr_scanned = 0, total_isolated = 0;
+	unsigned long pfn = start;
 	struct page *page;
-	struct zone *zone;
-	int order;
 
 	VM_BUG_ON(!pfn_valid(start));
-	page = pfn_to_page(start);
-	zone = page_zone(page);
 
-	spin_lock_irq(&zone->lock);
+	/* Isolate free pages. This assumes the block is valid */
+	page = pfn_to_page(pfn);
+	while (pfn < end) {
+		unsigned isolated = 1;
 
-	for (;;) {
-		VM_BUG_ON(!page_count(page) || !PageBuddy(page) ||
-			  page_zone(page) != zone);
+		VM_BUG_ON(page_zone(page) != zone);
 
-		list_del(&page->lru);
-		order = page_order(page);
-		count = 1UL << order;
-		zone->free_area[order].nr_free--;
-		rmv_page_order(page);
-		__mod_zone_page_state(zone, NR_FREE_PAGES, -(long)count);
+		if (!pfn_valid_within(blockpfn))
+			goto skip;
+		++nr_scanned;
 
-		pfn += count;
-		if (pfn >= end)
-			break;
-		VM_BUG_ON(!pfn_valid(pfn));
-
-		if (zone_pfn_same_memmap(pfn - count, pfn))
-			page += count;
-		else
-			page = pfn_to_page(pfn);
-	}
+		if (!PageBuddy(page)) {
+skip:
+			if (freelist)
+				goto next;
+			for (; start < pfn; ++start)
+				__free_page(pfn_to_page(pfn));
+			return 0;
+		}
 
-	spin_unlock_irq(&zone->lock);
+		/* Found a free page, break it into order-0 pages */
+		isolated = split_free_page(page);
+		total_isolated += isolated;
+		if (freelist) {
+			struct page *p = page;
+			unsigned i = isolated;
+			for (; i--; ++p)
+				list_add(&p->lru, freelist);
+		}
 
-	/* After this, pages in the range can be freed one be one */
-	count = pfn - start;
-	pfn = start;
-	for (page = pfn_to_page(pfn); count; --count) {
-		prep_new_page(page, 0, flag);
-		++pfn;
-		if (likely(zone_pfn_same_memmap(pfn - 1, pfn)))
-			++page;
+next:		/* Advance to the next page */
+		pfn += isolated;
+		if (zone_pfn_same_memmap(pfn - isolated, pfn))
+			page += isolated;
 		else
 			page = pfn_to_page(pfn);
 	}
 
-	return pfn;
+	if (scannedp)
+		*scannedp = nr_scanned;
+	return total_isolated;
 }
 
 static unsigned long pfn_to_maxpage(unsigned long pfn)
@@ -5837,7 +5861,6 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
  * alloc_contig_range() -- tries to allocate given range of pages
  * @start:	start PFN to allocate
  * @end:	one-past-the-last PFN to allocate
- * @flags:	flags passed to alloc_contig_freed_pages().
  * @migratetype:	migratetype of the underlaying pageblocks (either
  *			#MIGRATE_MOVABLE or #MIGRATE_CMA).  All pageblocks
  *			in range must have the same migratetype and it must
@@ -5853,9 +5876,10 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
  * need to be freed with free_contig_pages().
  */
 int alloc_contig_range(unsigned long start, unsigned long end,
-		       gfp_t flags, unsigned migratetype)
+		       unsigned migratetype)
 {
 	unsigned long outer_start, outer_end;
+	struct zone *zone;
 	int ret;
 
 	/*
@@ -5910,7 +5934,17 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 			return -EINVAL;
 
 	outer_start = start & (~0UL << ret);
-	outer_end   = alloc_contig_freed_pages(outer_start, end, flags);
+
+	zone = page_zone(pfn_to_page(outer_start));
+	spin_lock_irq(&zone->lock);
+	outer_end = isolate_freepages_range(zone, outer_start, end, NULL, NULL);
+	spin_unlock_irq(&zone->lock);
+
+	if (!outer_end) {
+		ret = -EBUSY;
+		goto done;
+	}
+	outer_end += outer_start;
 
 	/* Free head and tail (if any) */
 	if (start != outer_start)
-- 
1.7.3.1

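For clarity, a minimal sketch of the strict calling mode described in the
kernel-doc above (no freelist, zone->lock held, any hole aborts the whole
isolation).  grab_free_range() is a hypothetical wrapper added here for
illustration; everything it calls comes from the patch:

static unsigned long grab_free_range(unsigned long start, unsigned long end)
{
	struct zone *zone = page_zone(pfn_to_page(start));
	unsigned long isolated;

	spin_lock_irq(&zone->lock);
	/* NULL freelist: returns 0 and undoes everything on any hole */
	isolated = isolate_freepages_range(zone, start, end, NULL, NULL);
	spin_unlock_irq(&zone->lock);

	/* on success the caller now owns 'isolated' order-0 pages */
	return isolated;
}

The freelist mode is the compaction path: pass &cc->freepages plus a
&scanned counter and the function skips holes instead of failing.
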
^ permalink raw reply related	[flat|nested] 180+ messages in thread

* Re: [PATCH 4/9] mm: MIGRATE_CMA migration type added
  2011-10-18 13:08     ` Mel Gorman
  (?)
@ 2011-10-24 19:32       ` Michal Nazarewicz
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Nazarewicz @ 2011-10-24 19:32 UTC (permalink / raw)
  To: Marek Szyprowski, Mel Gorman
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Kyungmin Park, Russell King, Andrew Morton,
	KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker, Arnd Bergmann,
	Jesse Barker, Jonathan Corbet, Shariq Hasnain, Chunsang Jeong,
	Dave Hansen

> On Thu, Oct 06, 2011 at 03:54:44PM +0200, Marek Szyprowski wrote:
>> The MIGRATE_CMA migration type has two main characteristics:
>> (i) only movable pages can be allocated from MIGRATE_CMA
>> pageblocks and (ii) page allocator will never change migration
>> type of MIGRATE_CMA pageblocks.
>>
>> This guarantees that a page in a MIGRATE_CMA page block can
>> always be migrated somewhere else (unless there's no memory left
>> in the system).

On Tue, 18 Oct 2011 06:08:26 -0700, Mel Gorman <mel@csn.ul.ie> wrote:
> Or the count is permanently elevated by a device driver for some reason or if
> the page is backed by a filesystem with a broken or unusable migrate_page()
> function. This is unavoidable, I'm just pointing out that you can still have
> migration failures, particularly if GFP_MOVABLE has been improperly used.

CMA does not handle that well right now.  I guess it's something to think about
once the rest is nice and working.

>> It is designed to be used with Contiguous Memory Allocator
>> (CMA) for allocating big chunks (eg. 10MiB) of physically
>> contiguous memory.  Once driver requests contiguous memory,
>> CMA will migrate pages from MIGRATE_CMA pageblocks.
>>
>> To minimise number of migrations, MIGRATE_CMA migration type
>> is the last type tried when page allocator falls back to other
>> migration types than requested.

> It would be preferable if you could figure out how to reuse the
> MIGRATE_RESERVE type for just the bitmap.

I'm not entirely sure of what you mean here.

> Like MIGRATE_CMA, it does not
> change type except when min_free_kbytes changes. However, it is
> something that could be done in the future to keep the size of the
> pageblock bitmap where it is now.


>> +enum {
>> +	MIGRATE_UNMOVABLE,
>> +	MIGRATE_RECLAIMABLE,
>> +	MIGRATE_MOVABLE,
>> +	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
>> +	MIGRATE_RESERVE = MIGRATE_PCPTYPES,
>> +	/*
>> +	 * MIGRATE_CMA migration type is designed to mimic the way
>> +	 * ZONE_MOVABLE works.  Only movable pages can be allocated
>> +	 * from MIGRATE_CMA pageblocks and page allocator never
>> +	 * implicitly change migration type of MIGRATE_CMA pageblock.
>> +	 *
>> +	 * The way to use it is to change migratetype of a range of
>> +	 * pageblocks to MIGRATE_CMA which can be done by
>> +	 * __free_pageblock_cma() function.  What is important though
>> +	 * is that a range of pageblocks must be aligned to
>> +	 * MAX_ORDER_NR_PAGES should biggest page be bigger then
>> +	 * a single pageblock.
>> +	 */
>> +	MIGRATE_CMA,

> This does mean that MIGRATE_CMA also does not have a per-cpu list.
> I don't know if that matters to you but all allocations using
> MIGRATE_CMA will take the zone lock.

This is sort of an artefact of my misunderstanding of pcp lists in the
past.  I'll have to re-evaluate the decision not to include CMA on pcp
list.

Still, I think that CMA not being on pcp lists should not be a problem
for us.  At least we can try and get CMA running and then consider adding
CMA to pcp lists.

> I'm not sure this can be easily avoided because
> if there is a per-CPU list for MIGRATE_CMA, it might use a new cache
> line for it and incur a different set of performance problems.

>> +	MIGRATE_ISOLATE,	/* can't allocate from here */
>> +	MIGRATE_TYPES
>> +};

>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 97254e4..9cf6b2b 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -115,6 +115,16 @@ static bool suitable_migration_target(struct page *page)
>>  	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
>>  		return false;
>>
>> +	/* Keep MIGRATE_CMA alone as well. */
>> +	/*
>> +	 * XXX Revisit.  We currently cannot let compaction touch CMA
>> +	 * pages since compaction insists on changing their migration
>> +	 * type to MIGRATE_MOVABLE (see split_free_page() called from
>> +	 * isolate_freepages_block() above).
>> +	 */
>> +	if (is_migrate_cma(migratetype))
>> +		return false;
>> +
>
> This is another reason why CMA and compaction should be using almost
> identical code. It does mean that the compact_control may need to be
> renamed and get flags to control things like the setting of pageblock
> flags but it would be preferable to having two almost identical pieces
> of code.

I've addressed it in my other mail where I've changed the split_free_page()
to not touch CMA and ISOLATE pageblocks.  I think that this change should
make the above comment no longer accurate and the check unnecessary.
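
For illustration, here is a standalone toy sketch of that idea (made-up
types and names, not the actual split_free_page() code): the pageblock's
migrate type is only forced to MOVABLE when it is neither ISOLATE nor CMA.

#include <stdio.h>
#include <stdbool.h>

/* Toy stand-ins for the pageblock migrate types. */
enum migratetype { MT_MOVABLE, MT_RECLAIMABLE, MT_ISOLATE, MT_CMA };

struct pageblock { enum migratetype mt; };

static bool is_isolate_or_cma(enum migratetype mt)
{
	return mt == MT_ISOLATE || mt == MT_CMA;
}

/*
 * Mimics the guard described above: force the block to MOVABLE only
 * when it is not an ISOLATE or CMA block, so CMA and isolated ranges
 * keep their migrate type while free pages are split.
 */
static void split_free_pageblock(struct pageblock *pb)
{
	if (!is_isolate_or_cma(pb->mt))
		pb->mt = MT_MOVABLE;
}

int main(void)
{
	struct pageblock cma = { MT_CMA }, rec = { MT_RECLAIMABLE };

	split_free_pageblock(&cma);
	split_free_pageblock(&rec);
	printf("cma stays %d, reclaimable becomes %d\n", cma.mt, rec.mt);
	return 0;
}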

>>  	/* If the page is a large free page, then allow migration */
>>  	if (PageBuddy(page) && page_order(page) >= pageblock_order)
>>  		return true;

>> @@ -940,12 +963,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>>  	/* Find the largest possible block of pages in the other list */
>>  	for (current_order = MAX_ORDER-1; current_order >= order;
>>  						--current_order) {
>> -		for (i = 0; i < MIGRATE_TYPES - 1; i++) {
>> +		for (i = 0; i < ARRAY_SIZE(fallbacks[0]); i++) {
>
> I don't see why this change is necessary.

It replaces a magic number with a value that is calculated from the
array.  This makes it resistant to changes in the definition of the
fallbacks array.  I think this is a reasonable change.
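
For illustration, a standalone sketch (the enum and table below are
simplified stand-ins, not the real page_alloc.c definitions) of why a
bound derived from the array is more robust than a hard-coded
MIGRATE_TYPES - 1:

#include <stdio.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Simplified stand-in for the migrate types and the fallback table. */
enum { UNMOVABLE, RECLAIMABLE, MOVABLE, RESERVE, CMA, TYPES };

static const int fallbacks[TYPES][3] = {
	[UNMOVABLE]   = { RECLAIMABLE, MOVABLE,     RESERVE },
	[RECLAIMABLE] = { UNMOVABLE,   MOVABLE,     RESERVE },
	[MOVABLE]     = { CMA,         RECLAIMABLE, UNMOVABLE },
	[RESERVE]     = { RESERVE,     RESERVE,     RESERVE },
};

int main(void)
{
	unsigned i;

	/*
	 * The loop bound is taken from the table itself, so adding or
	 * removing a fallback column never leaves a stale magic number.
	 */
	for (i = 0; i < ARRAY_SIZE(fallbacks[0]); i++)
		printf("MOVABLE fallback %u: %d\n", i, fallbacks[MOVABLE][i]);
	return 0;
}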

>>  			migratetype = fallbacks[start_migratetype][i];
>>
>>  			/* MIGRATE_RESERVE handled later if necessary */
>>  			if (migratetype == MIGRATE_RESERVE)
>> -				continue;
>> +				break;
>>
>>  			area = &(zone->free_area[current_order]);
>>  			if (list_empty(&area->free_list[migratetype]))

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 6/9] drivers: add Contiguous Memory Allocator
  2011-10-18 13:43     ` Mel Gorman
  (?)
@ 2011-10-24 19:39       ` Michal Nazarewicz
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Nazarewicz @ 2011-10-24 19:39 UTC (permalink / raw)
  To: Marek Szyprowski, Mel Gorman
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Kyungmin Park, Russell King, Andrew Morton,
	KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker, Arnd Bergmann,
	Jesse Barker, Jonathan Corbet, Shariq Hasnain, Chunsang Jeong,
	Dave Hansen

> On Thu, Oct 06, 2011 at 03:54:46PM +0200, Marek Szyprowski wrote:
>> +static unsigned long __init __cma_early_get_total_pages(void)
>> +{
>> +	struct memblock_region *reg;
>> +	unsigned long total_pages = 0;
>> +
>> +	/*
>> +	 * We cannot use memblock_phys_mem_size() here, because
>> +	 * memblock_analyze() has not been called yet.
>> +	 */
>> +	for_each_memblock(memory, reg)
>> +		total_pages += memblock_region_memory_end_pfn(reg) -
>> +			       memblock_region_memory_base_pfn(reg);
>> +	return total_pages;
>> +}
>> +

On Tue, 18 Oct 2011 06:43:21 -0700, Mel Gorman <mel@csn.ul.ie> wrote:
> Is this being called too early yet? What prevents you setting up the CMA
> regions after the page allocator is brought up for example? I understand
> that there is a need for the memory to be coherent so maybe that is the
> obstacle.

Another reason is that we want to be sure that we can get the given range of
pages.  After the page allocator is set up, someone could allocate a non-movable
page from the range that interests us, and that wouldn't be nice for us.

>> +struct page *dma_alloc_from_contiguous(struct device *dev, int count,
>> +				       unsigned int align)
>> +{
>> +	struct cma *cma = get_dev_cma_area(dev);
>> +	unsigned long pfn, pageno;
>> +	int ret;
>> +
>> +	if (!cma)
>> +		return NULL;
>> +
>> +	if (align > CONFIG_CMA_ALIGNMENT)
>> +		align = CONFIG_CMA_ALIGNMENT;
>> +
>> +	pr_debug("%s(cma %p, count %d, align %d)\n", __func__, (void *)cma,
>> +		 count, align);
>> +
>> +	if (!count)
>> +		return NULL;
>> +
>> +	mutex_lock(&cma_mutex);
>> +
>> +	pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count, 0, count,
>> +					    (1 << align) - 1);
>> +	if (pageno >= cma->count) {
>> +		ret = -ENOMEM;
>> +		goto error;
>> +	}
>> +	bitmap_set(cma->bitmap, pageno, count);
>> +
>> +	pfn = cma->base_pfn + pageno;
>> +	ret = alloc_contig_range(pfn, pfn + count, 0, MIGRATE_CMA);
>> +	if (ret)
>> +		goto free;
>> +

> If alloc_contig_range returns failure, the bitmap is still set. It will
> never be freed so now the area cannot be used for CMA allocations any
> more.

The bitmap is cleared at the “free:” label.

>> +	mutex_unlock(&cma_mutex);
>> +
>> +	pr_debug("%s(): returned %p\n", __func__, pfn_to_page(pfn));
>> +	return pfn_to_page(pfn);
>> +free:
>> +	bitmap_clear(cma->bitmap, pageno, count);
>> +error:
>> +	mutex_unlock(&cma_mutex);
>> +	return NULL;
>> +}


>> +int dma_release_from_contiguous(struct device *dev, struct page *pages,
>> +				int count)
>> +{
>> +	struct cma *cma = get_dev_cma_area(dev);
>> +	unsigned long pfn;
>> +
>> +	if (!cma || !pages)
>> +		return 0;
>> +
>> +	pr_debug("%s(page %p)\n", __func__, (void *)pages);
>> +
>> +	pfn = page_to_pfn(pages);
>> +
>> +	if (pfn < cma->base_pfn || pfn >= cma->base_pfn + cma->count)
>> +		return 0;
>> +
>> +	mutex_lock(&cma_mutex);
>> +
>> +	bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
>> +	free_contig_pages(pfn, count);
>> +
>> +	mutex_unlock(&cma_mutex);
>
> It feels like the mutex could be a lot lighter here. If the bitmap is
> protected by a spinlock, it would only need to be held while the bitmap
> was being cleared. free the contig pages outside the spinlock and clear
> the bitmap afterwards.
>
> It's not particularly important as the scalability of CMA is not
> something to be concerned with at this point.

The mutex is also used to protect the core operations, i.e. isolating pages
and such.  This is because two CMA calls may want to work on the same
pageblock, and we have to prevent that from happening.

We could add a spinlock for protecting the bitmap, but we would still
need the mutex for the other uses.
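
For illustration, a rough standalone sketch of that split (userspace
pthread locks and made-up names, nothing like the actual
dma-contiguous.c code): on the release path the bitmap gets its own
light lock, while the heavier mutex still serialises the page-level
work.

#include <pthread.h>
#include <stdio.h>

#define NR_PAGES 64

/* Made-up stand-in for the CMA area state (not the real struct cma). */
static char bitmap[NR_PAGES];              /* 1 = allocated            */
static pthread_spinlock_t bitmap_lock;     /* protects only the bitmap */
static pthread_mutex_t core_lock = PTHREAD_MUTEX_INITIALIZER;
					   /* serialises isolate/free  */

static void free_pages_slow(unsigned long pfn, unsigned long count)
{
	/* placeholder for the real free_contig_pages(pfn, count) */
	(void)pfn; (void)count;
}

/*
 * Release path following the suggestion: do the slow page freeing under
 * the mutex (two CMA calls must not touch the same pageblock), then take
 * the light spinlock only for the bitmap update.
 */
static void cma_release(unsigned long pfn, unsigned long count)
{
	pthread_mutex_lock(&core_lock);
	free_pages_slow(pfn, count);
	pthread_mutex_unlock(&core_lock);

	pthread_spin_lock(&bitmap_lock);
	while (count--)
		bitmap[pfn + count] = 0;
	pthread_spin_unlock(&bitmap_lock);
}

int main(void)
{
	pthread_spin_init(&bitmap_lock, PTHREAD_PROCESS_PRIVATE);
	bitmap[10] = bitmap[11] = 1;
	cma_release(10, 2);
	printf("bit 10 now %d, bit 11 now %d\n", bitmap[10], bitmap[11]);
	return 0;
}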

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 4/9] mm: MIGRATE_CMA migration type added
  2011-10-24 19:32       ` Michal Nazarewicz
  (?)
@ 2011-10-27  9:10         ` Michal Nazarewicz
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Nazarewicz @ 2011-10-27  9:10 UTC (permalink / raw)
  To: Marek Szyprowski, Mel Gorman, Michal Nazarewicz
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Kyungmin Park, Russell King, Andrew Morton,
	KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker, Arnd Bergmann,
	Jesse Barker, Jonathan Corbet, Shariq Hasnain, Chunsang Jeong,
	Dave Hansen

> On Tue, 18 Oct 2011 06:08:26 -0700, Mel Gorman <mel@csn.ul.ie> wrote:
>> This does mean that MIGRATE_CMA also does not have a per-cpu list.
>> I don't know if that matters to you but all allocations using
>> MIGRATE_CMA will take the zone lock.

On Mon, 24 Oct 2011 21:32:45 +0200, Michal Nazarewicz <mina86@mina86.com> wrote:
> This is sort of an artefact of my misunderstanding of pcp lists in the
> past.  I'll have to re-evaluate the decision not to include CMA on pcp
> list.

Actually, sorry.  My comment above is somewhat invalid.

CMA does not need to be on a pcp list because CMA pages are never allocated
via standard kmalloc() and friends.  Because of the fallbacks in rmqueue_bulk()
the CMA pages end up being added to a pcp list of the MOVABLE type, and so when
kmalloc() allocates a MOVABLE page it can end up grabbing a CMA page.

So it's quite OK that CMA does not have its own pcp list, as the list would
not be used anyway.
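
As a toy illustration of that fallback behaviour (made-up names and
freelists, not rmqueue_bulk() itself): the allocator asks for a MOVABLE
page, falls back to the CMA freelist, and files the page on the pcp list
of the type it asked for, so a separate CMA pcp list would never be
filled.

#include <stdio.h>

enum mt { MT_MOVABLE, MT_CMA, MT_TYPES };

/* Toy per-type free counts and per-cpu (pcp) list counts. */
static int freelist[MT_TYPES] = { [MT_MOVABLE] = 0, [MT_CMA] = 4 };
static int pcplist[MT_TYPES];

/* Take one page of the requested type, falling back MOVABLE -> CMA. */
static int rmqueue_toy(enum mt requested)
{
	enum mt mt = requested;

	if (!freelist[mt] && mt == MT_MOVABLE && freelist[MT_CMA])
		mt = MT_CMA;                 /* fallback */
	if (!freelist[mt])
		return -1;
	freelist[mt]--;
	return mt;                           /* type the page came from */
}

int main(void)
{
	int i;

	/* Bulk-fill the MOVABLE pcp list, the way rmqueue_bulk() would. */
	for (i = 0; i < 4; i++)
		if (rmqueue_toy(MT_MOVABLE) >= 0)
			pcplist[MT_MOVABLE]++;   /* filed under MOVABLE */

	printf("MOVABLE pcp: %d, CMA pcp: %d\n",
	       pcplist[MT_MOVABLE], pcplist[MT_CMA]);
	return 0;
}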

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 2/9] mm: alloc_contig_freed_pages() added
  2011-10-24  4:05       ` Michal Nazarewicz
  (?)
@ 2011-11-01 15:04         ` Mel Gorman
  -1 siblings, 0 replies; 180+ messages in thread
From: Mel Gorman @ 2011-11-01 15:04 UTC (permalink / raw)
  To: Michal Nazarewicz
  Cc: Marek Szyprowski, linux-kernel, linux-arm-kernel, linux-media,
	linux-mm, linaro-mm-sig, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen

On Sun, Oct 23, 2011 at 09:05:05PM -0700, Michal Nazarewicz wrote:
> > On Thu, Oct 06, 2011 at 03:54:42PM +0200, Marek Szyprowski wrote:
> >> This commit introduces alloc_contig_freed_pages() function
> >> which allocates (ie. removes from buddy system) free pages
> >> in range. Caller has to guarantee that all pages in range
> >> are in buddy system.
> 
> On Tue, 18 Oct 2011 05:21:09 -0700, Mel Gorman <mel@csn.ul.ie> wrote:
> > Straight away, I'm wondering why you didn't use
> > mm/compaction.c#isolate_freepages()
> 
> Does the below look like a step in the right direction?
> 
> It basically moves isolate_freepages_block() to page_alloc.c (changing

For the purposes of review, have a separate patch for moving
isolate_freepages_block to another file that does not alter the
function in any way. When the function is updated in a follow-on patch,
it'll be far easier to see what has changed.

page_isolation.c may also be a better fit than page_alloc.c

As it is, the patch for isolate_freepages_block is almost impossible
to read because it's getting munged with existing code that is already
in page_alloc.c.  About all I caught from it is that scannedp does
not have a type. It defaults to unsigned int but it's unnecessarily
obscure.

> its name to isolate_freepages_range()) and changes it so that depending
> on arguments it treats holes (either invalid PFN or non-free page) as
> errors so that CMA can use it.
> 

I haven't actually read the function because it's too badly mixed with
page_alloc.c code but this description fits what I'm looking for.

> It also accepts a range rather than just assuming a single pageblock
> thus the change moves range calculation in compaction.c from
> isolate_freepages_block() up to isolate_freepages().
> 
> The change also modifies split_free_page() so that it does not try to
> change pageblock's migrate type if current migrate type is ISOLATE or
> CMA.
> 

This is fine. Later, the flags that determine what happens to pageblocks
may be placed in compact_control.

> ---
>  include/linux/mm.h             |    1 -
>  include/linux/page-isolation.h |    4 +-
>  mm/compaction.c                |   73 +++--------------------
>  mm/internal.h                  |    5 ++
>  mm/page_alloc.c                |  128 +++++++++++++++++++++++++---------------
>  5 files changed, 95 insertions(+), 116 deletions(-)
> 

I confess I didn't read closely because of the mess in page_alloc.c but
the intent seems fine. Hopefully there will be a new version of CMA
posted that will be easier to review.

Thanks

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 2/9] mm: alloc_contig_freed_pages() added
  2011-11-01 15:04         ` Mel Gorman
  (?)
@ 2011-11-01 18:06           ` Michal Nazarewicz
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Nazarewicz @ 2011-11-01 18:06 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Marek Szyprowski, linux-kernel, linux-arm-kernel, linux-media,
	linux-mm, linaro-mm-sig, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen

On Tue, 01 Nov 2011 16:04:48 +0100, Mel Gorman <mel@csn.ul.ie> wrote:
> For the purposes of review, have a separate patch for moving
> isolate_freepages_block to another file that does not alter the
> function in any way. When the function is updated in a follow-on patch,
> it'll be far easier to see what has changed.

Will do.

> page_isolation.c may also be a better fit than page_alloc.c

Since isolate_freepages_block() is the only user of split_free_page(),
would it make sense to move split_free_page() to page_isolation.c as
well?  I sort of like the idea of making it static and removing it from
the header file.

> I confess I didn't read closely because of the mess in page_alloc.c but
> the intent seems fine.

No worries.  I just needed a quick comment on whether I'm headed in the
right direction. :)

> Hopefully there will be a new version of CMA posted that will be easier
> to review.

I'll try to create the code no later than the weekend, so hopefully
the new version will be sent next week.

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 2/9] mm: alloc_contig_freed_pages() added
  2011-11-01 18:06           ` Michal Nazarewicz
  (?)
@ 2011-11-01 18:47             ` Mel Gorman
  -1 siblings, 0 replies; 180+ messages in thread
From: Mel Gorman @ 2011-11-01 18:47 UTC (permalink / raw)
  To: Michal Nazarewicz
  Cc: Marek Szyprowski, linux-kernel, linux-arm-kernel, linux-media,
	linux-mm, linaro-mm-sig, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen

On Tue, Nov 01, 2011 at 07:06:56PM +0100, Michal Nazarewicz wrote:
> >page_isolation.c may also be a better fit than page_alloc.c
> 
> Since isolate_freepages_block() is the only user of split_free_page(),
> would it make sense to move split_free_page() to page_isolation.c as
> well?  I sort of like the idea of making it static and removing from
> header file.
> 

I see no problem with that. It'll be separate from split_page() but that
is not earth shattering.

Thanks.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 180+ messages in thread

* RE: [PATCH 6/9] drivers: add Contiguous Memory Allocator
  2011-10-18 13:43     ` Mel Gorman
  (?)
@ 2011-11-04 10:41       ` Marek Szyprowski
  -1 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-11-04 10:41 UTC (permalink / raw)
  To: 'Mel Gorman'
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Michal Nazarewicz',
	'Kyungmin Park', 'Russell King',
	'Andrew Morton', 'KAMEZAWA Hiroyuki',
	'Ankita Garg', 'Daniel Walker',
	'Arnd Bergmann', 'Jesse Barker',
	'Jonathan Corbet', 'Shariq Hasnain',
	'Chunsang Jeong', 'Dave Hansen'

Hello,

On Tuesday, October 18, 2011 3:43 PM Mel Gorman wrote:

> On Thu, Oct 06, 2011 at 03:54:46PM +0200, Marek Szyprowski wrote:
> > The Contiguous Memory Allocator is a set of helper functions for DMA
> > mapping framework that improves allocations of contiguous memory chunks.
> >
> > CMA grabs memory on system boot, marks it with CMA_MIGRATE_TYPE and
> > gives back to the system. Kernel is allowed to allocate movable pages
> > within CMA's managed memory so that it can be used for example for page
> > cache when DMA mapping do not use it. On dma_alloc_from_contiguous()
> > request such pages are migrated out of CMA area to free required
> > contiguous block and fulfill the request. This allows to allocate large
> > contiguous chunks of memory at any time assuming that there is enough
> > free memory available in the system.
> >
> > This code is heavily based on earlier works by Michal Nazarewicz.
> >
> > Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> > Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> > CC: Michal Nazarewicz <mina86@mina86.com>
> > ---
> >  arch/Kconfig                         |    3 +
> >  drivers/base/Kconfig                 |   79 +++++++
> >  drivers/base/Makefile                |    1 +
> >  drivers/base/dma-contiguous.c        |  386 ++++++++++++++++++++++++++++++++++
> >  include/asm-generic/dma-contiguous.h |   27 +++
> >  include/linux/device.h               |    4 +
> >  include/linux/dma-contiguous.h       |  106 ++++++++++
> >  7 files changed, 606 insertions(+), 0 deletions(-)
> >  create mode 100644 drivers/base/dma-contiguous.c
> >  create mode 100644 include/asm-generic/dma-contiguous.h
> >  create mode 100644 include/linux/dma-contiguous.h
> >
> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index 4b0669c..a3b39a2 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -124,6 +124,9 @@ config HAVE_ARCH_TRACEHOOK
> >  config HAVE_DMA_ATTRS
> >  	bool
> >
> > +config HAVE_DMA_CONTIGUOUS
> > +	bool
> > +
> >  config USE_GENERIC_SMP_HELPERS
> >  	bool
> >
> > diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> > index 21cf46f..a5e6d75 100644
> > --- a/drivers/base/Kconfig
> > +++ b/drivers/base/Kconfig
> > @@ -174,4 +174,83 @@ config SYS_HYPERVISOR
> >
> >  source "drivers/base/regmap/Kconfig"
> >
> > +config CMA
> > +	bool "Contiguous Memory Allocator (EXPERIMENTAL)"
> > +	depends on HAVE_DMA_CONTIGUOUS && HAVE_MEMBLOCK && EXPERIMENTAL
> > +	select MIGRATION
> > +	select CMA_MIGRATE_TYPE
> > +	help
> > +	  This enables the Contiguous Memory Allocator which allows drivers
> > +	  to allocate big physically-contiguous blocks of memory for use with
> > +	  hardware components that do not support I/O map nor scatter-gather.
> > +
> > +	  For more information see <include/linux/dma-contiguous.h>.
> > +	  If unsure, say "n".
> > +
> > +if CMA
> > +
> > +config CMA_DEBUG
> > +	bool "CMA debug messages (DEVELOPEMENT)"
> 
> s/DEVELOPEMENT/DEVELOPMENT/
> 
> Should it be under DEBUG_KERNEL?
> 
> > +	help
> > +	  Turns on debug messages in CMA.  This produces KERN_DEBUG
> > +	  messages for every CMA call as well as various messages while
> > +	  processing calls such as dma_alloc_from_contiguous().
> > +	  This option does not affect warning and error messages.
> > +
> > +comment "Default contiguous memory area size:"
> > +
> > +config CMA_SIZE_ABSOLUTE
> > +	int "Absolute size (in MiB)"
> > +	depends on !CMA_SIZE_SEL_PERCENTAGE
> > +	default 16
> > +	help
> > +	  Defines the size (in MiB) of the default memory area for Contiguous
> > +	  Memory Allocator.
> > +
> > +config CMA_SIZE_PERCENTAGE
> > +	int "Percentage of total memory"
> > +	depends on !CMA_SIZE_SEL_ABSOLUTE
> > +	default 10
> > +	help
> > +	  Defines the size of the default memory area for Contiguous Memory
> > +	  Allocator as a percentage of the total memory in the system.
> > +
> 
> Why is this not a kernel parameter rather than a config option?

There is also a kernel parameter ("cma=") for the CMA area size which
overrides the value from .config.
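
For illustration, a toy userspace sketch of that precedence (the sizes and
the helper below are invented; in the kernel the same decision is made in
dma_contiguous_reserve(), quoted further down):

#include <stdio.h>

/* Toy stand-ins for the values dma_contiguous_reserve() works with. */
static unsigned long size_abs = 16UL << 20;	/* CONFIG_CMA_SIZE_ABSOLUTE, in bytes */
static long size_cmdline = -1;			/* filled in by the "cma=" early param */

static unsigned long select_cma_size(void)
{
	unsigned long selected = size_abs;	/* CMA_SIZE_SEL_ABSOLUTE default */

	/* A size given on the kernel command line always wins. */
	if (size_cmdline != -1)
		selected = (unsigned long)size_cmdline;

	return selected;
}

int main(void)
{
	printf("from .config:  %lu MiB\n", select_cma_size() >> 20);

	size_cmdline = 64L << 20;		/* as if booted with cma=64M */
	printf("with cma=64M:  %lu MiB\n", select_cma_size() >> 20);
	return 0;
}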
 
> Better yet, why do drivers not register how much CMA memory they are
> interested in and then the driver core figure out if it can allocate that
> much or not?

The CMA area is reserved very early during the boot process, even before the
buddy allocator gets initialized. At that point no device driver has been
probed yet. Such an early reservation is required to make sure that enough
contiguous memory can be gathered and to perform some MMU-related fixups that
are required on ARM to avoid page aliasing for dma_alloc_coherent() memory.
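
As a concrete (hypothetical) example of such an early reservation, board code
could declare a per-device area with dma_declare_contiguous() while memblock
is still active; the device and size below are made up for the sketch:

#include <linux/device.h>
#include <linux/dma-contiguous.h>
#include <linux/init.h>
#include <linux/printk.h>

/* Hypothetical board file: carve out 8 MiB for one multimedia device while
 * the early (memblock) allocator is still active, i.e. before the buddy
 * allocator is up and before any driver has been probed. */
static struct device example_mm_dev;	/* placeholder for a real platform device */

static void __init example_board_reserve(void)
{
	/* size = 8 MiB, base = 0 (anywhere), limit = 0 (no upper bound) */
	if (dma_declare_contiguous(&example_mm_dev, 8UL << 20, 0, 0))
		pr_warn("example: private CMA reservation failed\n");
}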

> > +choice
> > +	prompt "Selected region size"
> > +	default CMA_SIZE_SEL_ABSOLUTE
> > +
> > +config CMA_SIZE_SEL_ABSOLUTE
> > +	bool "Use absolute value only"
> > +
> > +config CMA_SIZE_SEL_PERCENTAGE
> > +	bool "Use percentage value only"
> > +
> > +config CMA_SIZE_SEL_MIN
> > +	bool "Use lower value (minimum)"
> > +
> > +config CMA_SIZE_SEL_MAX
> > +	bool "Use higher value (maximum)"
> > +
> > +endchoice
> > +
> > +config CMA_ALIGNMENT
> > +	int "Maximum PAGE_SIZE order of alignment for contiguous buffers"
> > +	range 4 9
> > +	default 8
> > +	help
> > +	  DMA mapping framework by default aligns all buffers to the smallest
> > +	  PAGE_SIZE order which is greater than or equal to the requested buffer
> > +	  size. This works well for buffers up to a few hundreds kilobytes, but
> > +	  for larger buffers it just a memory waste. With this parameter you can
> > +	  specify the maximum PAGE_SIZE order for contiguous buffers. Larger
> > +	  buffers will be aligned only to this specified order. The order is
> > +	  expressed as a power of two multiplied by the PAGE_SIZE.
> > +
> > +	  For example, if your system defaults to 4KiB pages, the order value
> > +	  of 8 means that the buffers will be aligned up to 1MiB only.
> > +
> > +	  If unsure, leave the default value "8".
> > +
> > +endif
> > +
> >  endmenu
> > diff --git a/drivers/base/Makefile b/drivers/base/Makefile
> > index 99a375a..794546f 100644
> > --- a/drivers/base/Makefile
> > +++ b/drivers/base/Makefile
> > @@ -5,6 +5,7 @@ obj-y			:= core.o sys.o bus.o dd.o syscore.o \
> >  			   cpu.o firmware.o init.o map.o devres.o \
> >  			   attribute_container.o transport_class.o
> >  obj-$(CONFIG_DEVTMPFS)	+= devtmpfs.o
> > +obj-$(CONFIG_CMA) += dma-contiguous.o
> >  obj-y			+= power/
> >  obj-$(CONFIG_HAS_DMA)	+= dma-mapping.o
> >  obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
> > diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
> > new file mode 100644
> > index 0000000..e54bb76
> > --- /dev/null
> > +++ b/drivers/base/dma-contiguous.c
> > @@ -0,0 +1,386 @@
> > +/*
> > + * Contiguous Memory Allocator for DMA mapping framework
> > + * Copyright (c) 2010-2011 by Samsung Electronics.
> > + * Written by:
> > + *	Marek Szyprowski <m.szyprowski@samsung.com>
> > + *	Michal Nazarewicz <mina86@mina86.com>
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License as
> > + * published by the Free Software Foundation; either version 2 of the
> > + * License or (at your optional) any later version of the license.
> > + */
> > +
> > +#define pr_fmt(fmt) "cma: " fmt
> > +
> > +#ifdef CONFIG_CMA_DEBUG
> > +#ifndef DEBUG
> > +#  define DEBUG
> > +#endif
> > +#endif
> > +
> > +#include <asm/page.h>
> > +#include <asm/dma-contiguous.h>
> > +
> > +#include <linux/memblock.h>
> > +#include <linux/err.h>
> > +#include <linux/mm.h>
> > +#include <linux/mutex.h>
> > +#include <linux/page-isolation.h>
> > +#include <linux/slab.h>
> > +#include <linux/swap.h>
> > +#include <linux/mm_types.h>
> > +#include <linux/dma-contiguous.h>
> > +
> > +#ifndef SZ_1M
> > +#define SZ_1M (1 << 20)
> > +#endif
> > +
> > +#ifdef phys_to_pfn
> > +/* nothing to do */
> > +#elif defined __phys_to_pfn
> > +#  define phys_to_pfn __phys_to_pfn
> > +#elif defined __va
> > +#  define phys_to_pfn(x) page_to_pfn(virt_to_page(__va(x)))
> > +#else
> > +#  error phys_to_pfn implementation needed
> > +#endif
> > +
> 
> Parts of this are assuming that there is a linear mapping of virtual to
> physical memory. I think this is always the case but it looks like
> something that should be defined in asm-generic with an option for
> architectures to override.
> 
> > +struct cma {
> > +	unsigned long	base_pfn;
> > +	unsigned long	count;
> > +	unsigned long	*bitmap;
> > +};
> > +
> > +struct cma *dma_contiguous_default_area;
> > +
> > +#ifndef CONFIG_CMA_SIZE_ABSOLUTE
> > +#define CONFIG_CMA_SIZE_ABSOLUTE 0
> > +#endif
> > +
> > +#ifndef CONFIG_CMA_SIZE_PERCENTAGE
> > +#define CONFIG_CMA_SIZE_PERCENTAGE 0
> > +#endif
> > +
> > +static unsigned long size_abs = CONFIG_CMA_SIZE_ABSOLUTE * SZ_1M;
> 
> SIZE_ABSOLUTE is an odd name. It can't be a negative size. size_bytes
> maybe.
> 
> > +static unsigned long size_percent = CONFIG_CMA_SIZE_PERCENTAGE;
> > +static long size_cmdline = -1;
> > +
> > +static int __init early_cma(char *p)
> > +{
> > +	pr_debug("%s(%s)\n", __func__, p);
> > +	size_cmdline = memparse(p, &p);
> > +	return 0;
> > +}
> > +early_param("cma", early_cma);
> > +
> > +static unsigned long __init __cma_early_get_total_pages(void)
> > +{
> > +	struct memblock_region *reg;
> > +	unsigned long total_pages = 0;
> > +
> > +	/*
> > +	 * We cannot use memblock_phys_mem_size() here, because
> > +	 * memblock_analyze() has not been called yet.
> > +	 */
> > +	for_each_memblock(memory, reg)
> > +		total_pages += memblock_region_memory_end_pfn(reg) -
> > +			       memblock_region_memory_base_pfn(reg);
> > +	return total_pages;
> > +}
> > +
> 
> Is this being called too early? What prevents you from setting up the CMA
> regions after the page allocator is brought up, for example? I understand
> that there is a need for the memory to be coherent so maybe that is the
> obstacle.

Right now we assume that CMA areas can be created only during early boot with
the memblock allocator. The code that converts memory on the fly into a CMA
region can be added later (if required).
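
For readers new to the thread, a rough sketch of where this sits in early
boot (the function name and the dma_limit argument are placeholders, not the
actual ARM glue from patches 8-9):

#include <linux/dma-contiguous.h>
#include <linux/init.h>
#include <linux/types.h>

/* Placeholder for the arch's early memory setup; dma_limit stands in for
 * whatever DMA addressing limit the platform needs to honour. */
static void __init example_arch_memblock_init(phys_addr_t dma_limit)
{
	/* ... earlier memblock reservations: kernel image, initrd, ... */

	/* Last step: carve out the default CMA area from memblock, below
	 * the platform's DMA limit (0 means "no limit"). */
	dma_contiguous_reserve(dma_limit);
}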
 
> > +/**
> > + * dma_contiguous_reserve() - reserve area for contiguous memory handling
> > + *
> > + * This funtion reserves memory from early allocator. It should be
> > + * called by arch specific code once the early allocator (memblock or bootmem)
> > + * has been activated and all other subsystems have already allocated/reserved
> > + * memory.
> > + */
> > +void __init dma_contiguous_reserve(phys_addr_t limit)
> > +{
> > +	unsigned long selected_size = 0;
> > +	unsigned long total_pages;
> > +
> > +	pr_debug("%s(limit %08lx)\n", __func__, (unsigned long)limit);
> > +
> > +	total_pages = __cma_early_get_total_pages();
> > +	size_percent *= (total_pages << PAGE_SHIFT) / 100;
> > +
> > +	pr_debug("%s: total available: %ld MiB, size absolute: %ld MiB, size percentage: %ld MiB\n",
> > +		 __func__, (total_pages << PAGE_SHIFT) / SZ_1M,
> > +		size_abs / SZ_1M, size_percent / SZ_1M);
> > +
> > +#ifdef CONFIG_CMA_SIZE_SEL_ABSOLUTE
> > +	selected_size = size_abs;
> > +#elif defined(CONFIG_CMA_SIZE_SEL_PERCENTAGE)
> > +	selected_size = size_percent;
> > +#elif defined(CONFIG_CMA_SIZE_SEL_MIN)
> > +	selected_size = min(size_abs, size_percent);
> > +#elif defined(CONFIG_CMA_SIZE_SEL_MAX)
> > +	selected_size = max(size_abs, size_percent);
> > +#endif
> > +
> 
> It seems very strange to do this at Kconfig time instead of via kernel
> parameters.
> 
> > +	if (size_cmdline != -1)
> > +		selected_size = size_cmdline;
> > +
> > +	if (!selected_size)
> > +		return;
> > +
> > +	pr_debug("%s: reserving %ld MiB for global area\n", __func__,
> > +		 selected_size / SZ_1M);
> > +
> > +	dma_declare_contiguous(NULL, selected_size, 0, limit);
> > +};
> > +
> > +static DEFINE_MUTEX(cma_mutex);
> > +
> > +static void __cma_activate_area(unsigned long base_pfn, unsigned long count)
> > +{
> > +	unsigned long pfn = base_pfn;
> > +	unsigned i = count >> pageblock_order;
> > +	struct zone *zone;
> > +
> > +	VM_BUG_ON(!pfn_valid(pfn));
> 
> Again, VM_BUG_ON is an extreme reaction. WARN_ON_ONCE, return an error
> code and fail gracefully.
> 
> > +	zone = page_zone(pfn_to_page(pfn));
> > +
> > +	do {
> > +		unsigned j;
> > +		base_pfn = pfn;
> > +		for (j = pageblock_nr_pages; j; --j, pfn++) {
> 
> This is correct but does not look like any other PFN walker. There are
> plenty of examples of where we walk PFN ranges. There is no requirement
> to use the same pattern but it does make reviewing easier.
> 
> > +			VM_BUG_ON(!pfn_valid(pfn));
> > +			VM_BUG_ON(page_zone(pfn_to_page(pfn)) != zone);
> > +		}
> 
> In the field, this is a no-op as I would assume CONFIG_DEBUG_VM is not
> set. This should be checked unconditionally and fail gracefully if necessary.
> 
> > +		init_cma_reserved_pageblock(pfn_to_page(base_pfn));
> > +	} while (--i);
> > +}
> > +
> > +static struct cma *__cma_create_area(unsigned long base_pfn,
> > +				     unsigned long count)
> > +{
> > +	int bitmap_size = BITS_TO_LONGS(count) * sizeof(long);
> > +	struct cma *cma;
> > +
> > +	pr_debug("%s(base %08lx, count %lx)\n", __func__, base_pfn, count);
> > +
> > +	cma = kmalloc(sizeof *cma, GFP_KERNEL);
> > +	if (!cma)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	cma->base_pfn = base_pfn;
> > +	cma->count = count;
> > +	cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
> > +
> > +	if (!cma->bitmap)
> > +		goto no_mem;
> > +
> > +	__cma_activate_area(base_pfn, count);
> > +
> > +	pr_debug("%s: returned %p\n", __func__, (void *)cma);
> > +	return cma;
> > +
> > +no_mem:
> > +	kfree(cma);
> > +	return ERR_PTR(-ENOMEM);
> > +}
> > +
> > +static struct cma_reserved {
> > +	phys_addr_t start;
> > +	unsigned long size;
> > +	struct device *dev;
> > +} cma_reserved[MAX_CMA_AREAS] __initdata;
> > +static unsigned cma_reserved_count __initdata;
> > +
> > +static int __init __cma_init_reserved_areas(void)
> > +{
> > +	struct cma_reserved *r = cma_reserved;
> > +	unsigned i = cma_reserved_count;
> > +
> > +	pr_debug("%s()\n", __func__);
> > +
> > +	for (; i; --i, ++r) {
> > +		struct cma *cma;
> > +		cma = __cma_create_area(phys_to_pfn(r->start),
> > +					r->size >> PAGE_SHIFT);
> > +		if (!IS_ERR(cma)) {
> > +			if (r->dev)
> > +				set_dev_cma_area(r->dev, cma);
> > +			else
> > +				dma_contiguous_default_area = cma;
> > +		}
> > +	}
> > +	return 0;
> > +}
> > +core_initcall(__cma_init_reserved_areas);
> > +
> > +/**
> > + * dma_declare_contiguous() - reserve area for contiguous memory handling
> > + *			      for particular device
> > + * @dev:   Pointer to device structure.
> > + * @size:  Size of the reserved memory.
> > + * @start: Start address of the reserved memory (optional, 0 for any).
> > + * @limit: End address of the reserved memory (optional, 0 for any).
> > + *
> > + * This funtion reserves memory for specified device. It should be
> > + * called by board specific code when early allocator (memblock or bootmem)
> > + * is still activate.
> > + */
> > +int __init dma_declare_contiguous(struct device *dev, unsigned long size,
> > +				  phys_addr_t base, phys_addr_t limit)
> > +{
> > +	struct cma_reserved *r = &cma_reserved[cma_reserved_count];
> > +	unsigned long alignment;
> > +
> > +	pr_debug("%s(size %lx, base %08lx, limit %08lx)\n", __func__,
> > +		 (unsigned long)size, (unsigned long)base,
> > +		 (unsigned long)limit);
> > +
> > +	/* Sanity checks */
> > +	if (cma_reserved_count == ARRAY_SIZE(cma_reserved))
> > +		return -ENOSPC;
> > +
> > +	if (!size)
> > +		return -EINVAL;
> > +
> > +	/* Sanitise input arguments */
> > +	alignment = PAGE_SIZE << max(MAX_ORDER, pageblock_order);
> > +	base = ALIGN(base, alignment);
> > +	size = ALIGN(size, alignment);
> > +	limit = ALIGN(limit, alignment);
> > +
> > +	/* Reserve memory */
> > +	if (base) {
> > +		if (memblock_is_region_reserved(base, size) ||
> > +		    memblock_reserve(base, size) < 0) {
> > +			base = -EBUSY;
> > +			goto err;
> > +		}
> > +	} else {
> > +		/*
> > +		 * Use __memblock_alloc_base() since
> > +		 * memblock_alloc_base() panic()s.
> > +		 */
> > +		phys_addr_t addr = __memblock_alloc_base(size, alignment, limit);
> > +		if (!addr) {
> > +			base = -ENOMEM;
> > +			goto err;
> > +		} else if (addr + size > ~(unsigned long)0) {
> > +			memblock_free(addr, size);
> > +			base = -EOVERFLOW;
> > +			goto err;
> > +		} else {
> > +			base = addr;
> > +		}
> > +	}
> > +
> > +	/*
> > +	 * Each reserved area must be initialised later, when more kernel
> > +	 * subsystems (like slab allocator) are available.
> > +	 */
> > +	r->start = base;
> > +	r->size = size;
> > +	r->dev = dev;
> > +	cma_reserved_count++;
> > +	printk(KERN_INFO "CMA: reserved %ld MiB at %08lx\n", size / SZ_1M,
> > +	       (unsigned long)base);
> > +
> > +	/*
> > +	 * Architecture specific contiguous memory fixup.
> > +	 */
> > +	dma_contiguous_early_fixup(base, size);
> > +	return 0;
> > +err:
> > +	printk(KERN_ERR "CMA: failed to reserve %ld MiB\n", size / SZ_1M);
> > +	return base;
> > +}
> > +
> > +/**
> > + * dma_alloc_from_contiguous() - allocate pages from contiguous area
> > + * @dev:   Pointer to device for which the allocation is performed.
> > + * @count: Requested number of pages.
> > + * @align: Requested alignment of pages (in PAGE_SIZE order).
> > + *
> > + * This funtion allocates memory buffer for specified device. It uses
> > + * device specific contiguous memory area if available or the default
> > + * global one. Requires architecture specific get_dev_cma_area() helper
> > + * function.
> > + */
> > +struct page *dma_alloc_from_contiguous(struct device *dev, int count,
> > +				       unsigned int align)
> > +{
> > +	struct cma *cma = get_dev_cma_area(dev);
> > +	unsigned long pfn, pageno;
> > +	int ret;
> > +
> > +	if (!cma)
> > +		return NULL;
> > +
> > +	if (align > CONFIG_CMA_ALIGNMENT)
> > +		align = CONFIG_CMA_ALIGNMENT;
> > +
> > +	pr_debug("%s(cma %p, count %d, align %d)\n", __func__, (void *)cma,
> > +		 count, align);
> > +
> > +	if (!count)
> > +		return NULL;
> > +
> > +	mutex_lock(&cma_mutex);
> > +
> > +	pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count, 0, count,
> > +					    (1 << align) - 1);
> > +	if (pageno >= cma->count) {
> > +		ret = -ENOMEM;
> > +		goto error;
> > +	}
> > +	bitmap_set(cma->bitmap, pageno, count);
> > +
> > +	pfn = cma->base_pfn + pageno;
> > +	ret = alloc_contig_range(pfn, pfn + count, 0, MIGRATE_CMA);
> > +	if (ret)
> > +		goto free;
> > +
> 
> If alloc_contig_range returns failure, the bitmap is still set. It will
> never be freed so now the area cannot be used for CMA allocations any
> more.

There is a bitmap_clear() call just after the free: label, so the bitmap is
updated correctly.
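
To make that rollback ordering easy to see outside the full listing, here is
a toy userspace mock of it (the bitmap, the always-failing allocator and all
names are invented for the example):

#include <errno.h>
#include <stdio.h>
#include <string.h>

#define AREA_PAGES 64
static unsigned char bitmap[AREA_PAGES];	/* one byte per page, for simplicity */

/* Stand-in for alloc_contig_range(): pretend the range is busy. */
static int fake_alloc_contig_range(unsigned long start, unsigned long end)
{
	(void)start; (void)end;
	return -EBUSY;
}

/* Mirrors the ordering in dma_alloc_from_contiguous(): claim the bits first,
 * then roll them back if the underlying range allocation fails. */
static int claim(unsigned long pageno, unsigned long count)
{
	int ret;

	memset(bitmap + pageno, 1, count);		/* bitmap_set() */
	ret = fake_alloc_contig_range(pageno, pageno + count);
	if (ret)
		memset(bitmap + pageno, 0, count);	/* bitmap_clear() on failure */
	return ret;
}

int main(void)
{
	int ret = claim(0, 8);

	printf("claim returned %d, bit 0 afterwards: %d (rolled back)\n",
	       ret, bitmap[0]);
	return 0;
}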

> > +	mutex_unlock(&cma_mutex);
> > +
> > +	pr_debug("%s(): returned %p\n", __func__, pfn_to_page(pfn));
> > +	return pfn_to_page(pfn);
> > +free:
> > +	bitmap_clear(cma->bitmap, pageno, count);
> > +error:
> > +	mutex_unlock(&cma_mutex);
> > +	return NULL;
> > +}
> > +
> > +/**
> > + * dma_release_from_contiguous() - release allocated pages
> > + * @dev:   Pointer to device for which the pages were allocated.
> > + * @pages: Allocated pages.
> > + * @count: Number of allocated pages.
> > + *
> > + * This funtion releases memory allocated by dma_alloc_from_contiguous().
> > + * It return 0 when provided pages doen't belongs to contiguous area and
> > + * 1 on success.
> > + */
> > +int dma_release_from_contiguous(struct device *dev, struct page *pages,
> > +				int count)
> > +{
> > +	struct cma *cma = get_dev_cma_area(dev);
> > +	unsigned long pfn;
> > +
> > +	if (!cma || !pages)
> > +		return 0;
> > +
> > +	pr_debug("%s(page %p)\n", __func__, (void *)pages);
> > +
> > +	pfn = page_to_pfn(pages);
> > +
> > +	if (pfn < cma->base_pfn || pfn >= cma->base_pfn + cma->count)
> > +		return 0;
> > +
> > +	mutex_lock(&cma_mutex);
> > +
> > +	bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
> > +	free_contig_pages(pfn, count);
> > +
> > +	mutex_unlock(&cma_mutex);
> 
> It feels like the mutex could be a lot lighter here. If the bitmap is
> protected by a spinlock, it would only need to be held while the bitmap
> was being cleared. free the contig pages outside the spinlock and clear
> the bitmap afterwards.
> 
> It's not particularly important as the scalability of CMA is not
> something to be concerned with at this point.

This mutex also serializes CMA allocations, so only one alloc_contig_range()
call is processed at a time. This is done to serialize the isolation of page
blocks that is performed inside alloc_contig_range().
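
For completeness, a hedged sketch of the caller-side pattern that this mutex
serializes, using the two entry points quoted above; the device pointer, page
count and alignment order are made up:

#include <linux/device.h>
#include <linux/dma-contiguous.h>
#include <linux/errno.h>
#include <linux/mm.h>
#include <linux/printk.h>

/* Hypothetical caller: grab 256 pages (1 MiB with 4 KiB pages), aligned to
 * a 2^4-page boundary, for 'dev'; later hand them back. */
static int example_cma_roundtrip(struct device *dev)
{
	struct page *pages;

	pages = dma_alloc_from_contiguous(dev, 256, 4);
	if (!pages)
		return -ENOMEM;

	/* ... map the pages and let the device use the buffer ... */

	if (!dma_release_from_contiguous(dev, pages, 256))
		pr_warn("example: pages were not from a CMA area\n");

	return 0;
}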

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center


^ permalink raw reply	[flat|nested] 180+ messages in thread

* RE: [PATCH 6/9] drivers: add Contiguous Memory Allocator
@ 2011-11-04 10:41       ` Marek Szyprowski
  0 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-11-04 10:41 UTC (permalink / raw)
  To: 'Mel Gorman'
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Michal Nazarewicz',
	'Kyungmin Park', 'Russell King',
	'Andrew Morton', 'KAMEZAWA Hiroyuki',
	'Ankita Garg', 'Daniel Walker',
	'Arnd Bergmann', 'Jesse Barker',
	'Jonathan Corbet', 'Shariq Hasnain',
	'Chunsang Jeong', 'Dave Hansen'

Hello,

On Tuesday, October 18, 2011 3:43 PM Mel Gorman wrote:

> On Thu, Oct 06, 2011 at 03:54:46PM +0200, Marek Szyprowski wrote:
> > The Contiguous Memory Allocator is a set of helper functions for DMA
> > mapping framework that improves allocations of contiguous memory chunks.
> >
> > CMA grabs memory on system boot, marks it with CMA_MIGRATE_TYPE and
> > gives back to the system. Kernel is allowed to allocate movable pages
> > within CMA's managed memory so that it can be used for example for page
> > cache when DMA mapping do not use it. On dma_alloc_from_contiguous()
> > request such pages are migrated out of CMA area to free required
> > contiguous block and fulfill the request. This allows to allocate large
> > contiguous chunks of memory at any time assuming that there is enough
> > free memory available in the system.
> >
> > This code is heavily based on earlier works by Michal Nazarewicz.
> >
> > Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> > Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> > CC: Michal Nazarewicz <mina86@mina86.com>
> > ---
> >  arch/Kconfig                         |    3 +
> >  drivers/base/Kconfig                 |   79 +++++++
> >  drivers/base/Makefile                |    1 +
> >  drivers/base/dma-contiguous.c        |  386 ++++++++++++++++++++++++++++++++++
> >  include/asm-generic/dma-contiguous.h |   27 +++
> >  include/linux/device.h               |    4 +
> >  include/linux/dma-contiguous.h       |  106 ++++++++++
> >  7 files changed, 606 insertions(+), 0 deletions(-)
> >  create mode 100644 drivers/base/dma-contiguous.c
> >  create mode 100644 include/asm-generic/dma-contiguous.h
> >  create mode 100644 include/linux/dma-contiguous.h
> >
> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index 4b0669c..a3b39a2 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -124,6 +124,9 @@ config HAVE_ARCH_TRACEHOOK
> >  config HAVE_DMA_ATTRS
> >  	bool
> >
> > +config HAVE_DMA_CONTIGUOUS
> > +	bool
> > +
> >  config USE_GENERIC_SMP_HELPERS
> >  	bool
> >
> > diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> > index 21cf46f..a5e6d75 100644
> > --- a/drivers/base/Kconfig
> > +++ b/drivers/base/Kconfig
> > @@ -174,4 +174,83 @@ config SYS_HYPERVISOR
> >
> >  source "drivers/base/regmap/Kconfig"
> >
> > +config CMA
> > +	bool "Contiguous Memory Allocator (EXPERIMENTAL)"
> > +	depends on HAVE_DMA_CONTIGUOUS && HAVE_MEMBLOCK && EXPERIMENTAL
> > +	select MIGRATION
> > +	select CMA_MIGRATE_TYPE
> > +	help
> > +	  This enables the Contiguous Memory Allocator which allows drivers
> > +	  to allocate big physically-contiguous blocks of memory for use with
> > +	  hardware components that do not support I/O map nor scatter-gather.
> > +
> > +	  For more information see <include/linux/dma-contiguous.h>.
> > +	  If unsure, say "n".
> > +
> > +if CMA
> > +
> > +config CMA_DEBUG
> > +	bool "CMA debug messages (DEVELOPEMENT)"
> 
> s/DEVELOPEMENT/DEVELOPMENT/
> 
> Should it be under DEBUG_KERNEL?
> 
> > +	help
> > +	  Turns on debug messages in CMA.  This produces KERN_DEBUG
> > +	  messages for every CMA call as well as various messages while
> > +	  processing calls such as dma_alloc_from_contiguous().
> > +	  This option does not affect warning and error messages.
> > +
> > +comment "Default contiguous memory area size:"
> > +
> > +config CMA_SIZE_ABSOLUTE
> > +	int "Absolute size (in MiB)"
> > +	depends on !CMA_SIZE_SEL_PERCENTAGE
> > +	default 16
> > +	help
> > +	  Defines the size (in MiB) of the default memory area for Contiguous
> > +	  Memory Allocator.
> > +
> > +config CMA_SIZE_PERCENTAGE
> > +	int "Percentage of total memory"
> > +	depends on !CMA_SIZE_SEL_ABSOLUTE
> > +	default 10
> > +	help
> > +	  Defines the size of the default memory area for Contiguous Memory
> > +	  Allocator as a percentage of the total memory in the system.
> > +
> 
> Why is this not a kernel parameter rather than a config option?

There is also a kernel parameter for CMA area size which overrides the value
from .config.
 
> Better yet, why do drivers not register how much CMA memory they are
> interested in and then the drive core figure out if it can allocate that
> much or not?

CMA area is reserved very early during boot process, even before the buddy 
allocator gets initialized. That time no device driver has been probed yet.
Such early reservation is required to be sure that enough contiguous memory
can be gathered and to perform some MMU related fixups that are required
on ARM to avoid page aliasing for dma_alloc_coherent() memory.

> > +choice
> > +	prompt "Selected region size"
> > +	default CMA_SIZE_SEL_ABSOLUTE
> > +
> > +config CMA_SIZE_SEL_ABSOLUTE
> > +	bool "Use absolute value only"
> > +
> > +config CMA_SIZE_SEL_PERCENTAGE
> > +	bool "Use percentage value only"
> > +
> > +config CMA_SIZE_SEL_MIN
> > +	bool "Use lower value (minimum)"
> > +
> > +config CMA_SIZE_SEL_MAX
> > +	bool "Use higher value (maximum)"
> > +
> > +endchoice
> > +
> > +config CMA_ALIGNMENT
> > +	int "Maximum PAGE_SIZE order of alignment for contiguous buffers"
> > +	range 4 9
> > +	default 8
> > +	help
> > +	  DMA mapping framework by default aligns all buffers to the smallest
> > +	  PAGE_SIZE order which is greater than or equal to the requested buffer
> > +	  size. This works well for buffers up to a few hundreds kilobytes, but
> > +	  for larger buffers it just a memory waste. With this parameter you can
> > +	  specify the maximum PAGE_SIZE order for contiguous buffers. Larger
> > +	  buffers will be aligned only to this specified order. The order is
> > +	  expressed as a power of two multiplied by the PAGE_SIZE.
> > +
> > +	  For example, if your system defaults to 4KiB pages, the order value
> > +	  of 8 means that the buffers will be aligned up to 1MiB only.
> > +
> > +	  If unsure, leave the default value "8".
> > +
> > +endif
> > +
> >  endmenu
> > diff --git a/drivers/base/Makefile b/drivers/base/Makefile
> > index 99a375a..794546f 100644
> > --- a/drivers/base/Makefile
> > +++ b/drivers/base/Makefile
> > @@ -5,6 +5,7 @@ obj-y			:= core.o sys.o bus.o dd.o syscore.o \
> >  			   cpu.o firmware.o init.o map.o devres.o \
> >  			   attribute_container.o transport_class.o
> >  obj-$(CONFIG_DEVTMPFS)	+= devtmpfs.o
> > +obj-$(CONFIG_CMA) += dma-contiguous.o
> >  obj-y			+= power/
> >  obj-$(CONFIG_HAS_DMA)	+= dma-mapping.o
> >  obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
> > diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
> > new file mode 100644
> > index 0000000..e54bb76
> > --- /dev/null
> > +++ b/drivers/base/dma-contiguous.c
> > @@ -0,0 +1,386 @@
> > +/*
> > + * Contiguous Memory Allocator for DMA mapping framework
> > + * Copyright (c) 2010-2011 by Samsung Electronics.
> > + * Written by:
> > + *	Marek Szyprowski <m.szyprowski@samsung.com>
> > + *	Michal Nazarewicz <mina86@mina86.com>
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License as
> > + * published by the Free Software Foundation; either version 2 of the
> > + * License or (at your optional) any later version of the license.
> > + */
> > +
> > +#define pr_fmt(fmt) "cma: " fmt
> > +
> > +#ifdef CONFIG_CMA_DEBUG
> > +#ifndef DEBUG
> > +#  define DEBUG
> > +#endif
> > +#endif
> > +
> > +#include <asm/page.h>
> > +#include <asm/dma-contiguous.h>
> > +
> > +#include <linux/memblock.h>
> > +#include <linux/err.h>
> > +#include <linux/mm.h>
> > +#include <linux/mutex.h>
> > +#include <linux/page-isolation.h>
> > +#include <linux/slab.h>
> > +#include <linux/swap.h>
> > +#include <linux/mm_types.h>
> > +#include <linux/dma-contiguous.h>
> > +
> > +#ifndef SZ_1M
> > +#define SZ_1M (1 << 20)
> > +#endif
> > +
> > +#ifdef phys_to_pfn
> > +/* nothing to do */
> > +#elif defined __phys_to_pfn
> > +#  define phys_to_pfn __phys_to_pfn
> > +#elif defined __va
> > +#  define phys_to_pfn(x) page_to_pfn(virt_to_page(__va(x)))
> > +#else
> > +#  error phys_to_pfn implementation needed
> > +#endif
> > +
> 
> Parts of this are assuming that there is a linear mapping of virtual to
> physical memory. I think this is always the case but it looks like
> something that should be defined in asm-generic with an option for
> architectures to override.
> 
> > +struct cma {
> > +	unsigned long	base_pfn;
> > +	unsigned long	count;
> > +	unsigned long	*bitmap;
> > +};
> > +
> > +struct cma *dma_contiguous_default_area;
> > +
> > +#ifndef CONFIG_CMA_SIZE_ABSOLUTE
> > +#define CONFIG_CMA_SIZE_ABSOLUTE 0
> > +#endif
> > +
> > +#ifndef CONFIG_CMA_SIZE_PERCENTAGE
> > +#define CONFIG_CMA_SIZE_PERCENTAGE 0
> > +#endif
> > +
> > +static unsigned long size_abs = CONFIG_CMA_SIZE_ABSOLUTE * SZ_1M;
> 
> SIZE_ABSOLUTE is an odd name. It can't be a negative size. size_bytes
> maybe.
> 
> > +static unsigned long size_percent = CONFIG_CMA_SIZE_PERCENTAGE;
> > +static long size_cmdline = -1;
> > +
> > +static int __init early_cma(char *p)
> > +{
> > +	pr_debug("%s(%s)\n", __func__, p);
> > +	size_cmdline = memparse(p, &p);
> > +	return 0;
> > +}
> > +early_param("cma", early_cma);
> > +
> > +static unsigned long __init __cma_early_get_total_pages(void)
> > +{
> > +	struct memblock_region *reg;
> > +	unsigned long total_pages = 0;
> > +
> > +	/*
> > +	 * We cannot use memblock_phys_mem_size() here, because
> > +	 * memblock_analyze() has not been called yet.
> > +	 */
> > +	for_each_memblock(memory, reg)
> > +		total_pages += memblock_region_memory_end_pfn(reg) -
> > +			       memblock_region_memory_base_pfn(reg);
> > +	return total_pages;
> > +}
> > +
> 
> Is this being called too early yet? What prevents you seeing up the CMA
> regions after the page allocator is brought up for example? I understand
> that there is a need for the memory to be coherent so maybe that is the
> obstacle.

Right now we assume that CMA areas can be created only during early boot with 
memblock allocator. The code that converts memory on-fly into CMA region
can be added later (if required).
 
> > +/**
> > + * dma_contiguous_reserve() - reserve area for contiguous memory handling
> > + *
> > + * This funtion reserves memory from early allocator. It should be
> > + * called by arch specific code once the early allocator (memblock or bootmem)
> > + * has been activated and all other subsystems have already allocated/reserved
> > + * memory.
> > + */
> > +void __init dma_contiguous_reserve(phys_addr_t limit)
> > +{
> > +	unsigned long selected_size = 0;
> > +	unsigned long total_pages;
> > +
> > +	pr_debug("%s(limit %08lx)\n", __func__, (unsigned long)limit);
> > +
> > +	total_pages = __cma_early_get_total_pages();
> > +	size_percent *= (total_pages << PAGE_SHIFT) / 100;
> > +
> > +	pr_debug("%s: total available: %ld MiB, size absolute: %ld MiB, size percentage: %ld
> MiB\n",
> > +		 __func__, (total_pages << PAGE_SHIFT) / SZ_1M,
> > +		size_abs / SZ_1M, size_percent / SZ_1M);
> > +
> > +#ifdef CONFIG_CMA_SIZE_SEL_ABSOLUTE
> > +	selected_size = size_abs;
> > +#elif defined(CONFIG_CMA_SIZE_SEL_PERCENTAGE)
> > +	selected_size = size_percent;
> > +#elif defined(CONFIG_CMA_SIZE_SEL_MIN)
> > +	selected_size = min(size_abs, size_percent);
> > +#elif defined(CONFIG_CMA_SIZE_SEL_MAX)
> > +	selected_size = max(size_abs, size_percent);
> > +#endif
> > +
> 
> It seems very strange to do this at Kconfig time instead of via kernel
> parameters.
> 
> > +	if (size_cmdline != -1)
> > +		selected_size = size_cmdline;
> > +
> > +	if (!selected_size)
> > +		return;
> > +
> > +	pr_debug("%s: reserving %ld MiB for global area\n", __func__,
> > +		 selected_size / SZ_1M);
> > +
> > +	dma_declare_contiguous(NULL, selected_size, 0, limit);
> > +};
> > +
> > +static DEFINE_MUTEX(cma_mutex);
> > +
> > +static void __cma_activate_area(unsigned long base_pfn, unsigned long count)
> > +{
> > +	unsigned long pfn = base_pfn;
> > +	unsigned i = count >> pageblock_order;
> > +	struct zone *zone;
> > +
> > +	VM_BUG_ON(!pfn_valid(pfn));
> 
> Again, VM_BUG_ON is an extreme reaction. WARN_ON_ONCE, return an error
> code and fail gracefully.
> 
> > +	zone = page_zone(pfn_to_page(pfn));
> > +
> > +	do {
> > +		unsigned j;
> > +		base_pfn = pfn;
> > +		for (j = pageblock_nr_pages; j; --j, pfn++) {
> 
> This is correct but does not look like any other PFN walker. There are
> plenty of examples of where we walk PFN ranges. There is no requirement
> to use the same pattern but it does make reviewing easier.
> 
> > +			VM_BUG_ON(!pfn_valid(pfn));
> > +			VM_BUG_ON(page_zone(pfn_to_page(pfn)) != zone);
> > +		}
> 
> In the field, this is a no-op as I would assume CONFIG_DEBUG_VM is not
> set. This should be checked unconditionally and fail gracefully if necessary.
> 
> > +		init_cma_reserved_pageblock(pfn_to_page(base_pfn));
> > +	} while (--i);
> > +}
> > +
> > +static struct cma *__cma_create_area(unsigned long base_pfn,
> > +				     unsigned long count)
> > +{
> > +	int bitmap_size = BITS_TO_LONGS(count) * sizeof(long);
> > +	struct cma *cma;
> > +
> > +	pr_debug("%s(base %08lx, count %lx)\n", __func__, base_pfn, count);
> > +
> > +	cma = kmalloc(sizeof *cma, GFP_KERNEL);
> > +	if (!cma)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	cma->base_pfn = base_pfn;
> > +	cma->count = count;
> > +	cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
> > +
> > +	if (!cma->bitmap)
> > +		goto no_mem;
> > +
> > +	__cma_activate_area(base_pfn, count);
> > +
> > +	pr_debug("%s: returned %p\n", __func__, (void *)cma);
> > +	return cma;
> > +
> > +no_mem:
> > +	kfree(cma);
> > +	return ERR_PTR(-ENOMEM);
> > +}
> > +
> > +static struct cma_reserved {
> > +	phys_addr_t start;
> > +	unsigned long size;
> > +	struct device *dev;
> > +} cma_reserved[MAX_CMA_AREAS] __initdata;
> > +static unsigned cma_reserved_count __initdata;
> > +
> > +static int __init __cma_init_reserved_areas(void)
> > +{
> > +	struct cma_reserved *r = cma_reserved;
> > +	unsigned i = cma_reserved_count;
> > +
> > +	pr_debug("%s()\n", __func__);
> > +
> > +	for (; i; --i, ++r) {
> > +		struct cma *cma;
> > +		cma = __cma_create_area(phys_to_pfn(r->start),
> > +					r->size >> PAGE_SHIFT);
> > +		if (!IS_ERR(cma)) {
> > +			if (r->dev)
> > +				set_dev_cma_area(r->dev, cma);
> > +			else
> > +				dma_contiguous_default_area = cma;
> > +		}
> > +	}
> > +	return 0;
> > +}
> > +core_initcall(__cma_init_reserved_areas);
> > +
> > +/**
> > + * dma_declare_contiguous() - reserve area for contiguous memory handling
> > + *			      for particular device
> > + * @dev:   Pointer to device structure.
> > + * @size:  Size of the reserved memory.
> > + * @start: Start address of the reserved memory (optional, 0 for any).
> > + * @limit: End address of the reserved memory (optional, 0 for any).
> > + *
> > + * This funtion reserves memory for specified device. It should be
> > + * called by board specific code when early allocator (memblock or bootmem)
> > + * is still activate.
> > + */
> > +int __init dma_declare_contiguous(struct device *dev, unsigned long size,
> > +				  phys_addr_t base, phys_addr_t limit)
> > +{
> > +	struct cma_reserved *r = &cma_reserved[cma_reserved_count];
> > +	unsigned long alignment;
> > +
> > +	pr_debug("%s(size %lx, base %08lx, limit %08lx)\n", __func__,
> > +		 (unsigned long)size, (unsigned long)base,
> > +		 (unsigned long)limit);
> > +
> > +	/* Sanity checks */
> > +	if (cma_reserved_count == ARRAY_SIZE(cma_reserved))
> > +		return -ENOSPC;
> > +
> > +	if (!size)
> > +		return -EINVAL;
> > +
> > +	/* Sanitise input arguments */
> > +	alignment = PAGE_SIZE << max(MAX_ORDER, pageblock_order);
> > +	base = ALIGN(base, alignment);
> > +	size = ALIGN(size, alignment);
> > +	limit = ALIGN(limit, alignment);
> > +
> > +	/* Reserve memory */
> > +	if (base) {
> > +		if (memblock_is_region_reserved(base, size) ||
> > +		    memblock_reserve(base, size) < 0) {
> > +			base = -EBUSY;
> > +			goto err;
> > +		}
> > +	} else {
> > +		/*
> > +		 * Use __memblock_alloc_base() since
> > +		 * memblock_alloc_base() panic()s.
> > +		 */
> > +		phys_addr_t addr = __memblock_alloc_base(size, alignment, limit);
> > +		if (!addr) {
> > +			base = -ENOMEM;
> > +			goto err;
> > +		} else if (addr + size > ~(unsigned long)0) {
> > +			memblock_free(addr, size);
> > +			base = -EOVERFLOW;
> > +			goto err;
> > +		} else {
> > +			base = addr;
> > +		}
> > +	}
> > +
> > +	/*
> > +	 * Each reserved area must be initialised later, when more kernel
> > +	 * subsystems (like slab allocator) are available.
> > +	 */
> > +	r->start = base;
> > +	r->size = size;
> > +	r->dev = dev;
> > +	cma_reserved_count++;
> > +	printk(KERN_INFO "CMA: reserved %ld MiB at %08lx\n", size / SZ_1M,
> > +	       (unsigned long)base);
> > +
> > +	/*
> > +	 * Architecture specific contiguous memory fixup.
> > +	 */
> > +	dma_contiguous_early_fixup(base, size);
> > +	return 0;
> > +err:
> > +	printk(KERN_ERR "CMA: failed to reserve %ld MiB\n", size / SZ_1M);
> > +	return base;
> > +}
> > +
> > +/**
> > + * dma_alloc_from_contiguous() - allocate pages from contiguous area
> > + * @dev:   Pointer to device for which the allocation is performed.
> > + * @count: Requested number of pages.
> > + * @align: Requested alignment of pages (in PAGE_SIZE order).
> > + *
> > + * This funtion allocates memory buffer for specified device. It uses
> > + * device specific contiguous memory area if available or the default
> > + * global one. Requires architecture specific get_dev_cma_area() helper
> > + * function.
> > + */
> > +struct page *dma_alloc_from_contiguous(struct device *dev, int count,
> > +				       unsigned int align)
> > +{
> > +	struct cma *cma = get_dev_cma_area(dev);
> > +	unsigned long pfn, pageno;
> > +	int ret;
> > +
> > +	if (!cma)
> > +		return NULL;
> > +
> > +	if (align > CONFIG_CMA_ALIGNMENT)
> > +		align = CONFIG_CMA_ALIGNMENT;
> > +
> > +	pr_debug("%s(cma %p, count %d, align %d)\n", __func__, (void *)cma,
> > +		 count, align);
> > +
> > +	if (!count)
> > +		return NULL;
> > +
> > +	mutex_lock(&cma_mutex);
> > +
> > +	pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count, 0, count,
> > +					    (1 << align) - 1);
> > +	if (pageno >= cma->count) {
> > +		ret = -ENOMEM;
> > +		goto error;
> > +	}
> > +	bitmap_set(cma->bitmap, pageno, count);
> > +
> > +	pfn = cma->base_pfn + pageno;
> > +	ret = alloc_contig_range(pfn, pfn + count, 0, MIGRATE_CMA);
> > +	if (ret)
> > +		goto free;
> > +
> 
> If alloc_contig_range returns failure, the bitmap is still set. It will
> never be freed so now the area cannot be used for CMA allocations any
> more.

There is bitmap_clear() call just after free: label, so the bitmap is updated
correctly.

> > +	mutex_unlock(&cma_mutex);
> > +
> > +	pr_debug("%s(): returned %p\n", __func__, pfn_to_page(pfn));
> > +	return pfn_to_page(pfn);
> > +free:
> > +	bitmap_clear(cma->bitmap, pageno, count);
> > +error:
> > +	mutex_unlock(&cma_mutex);
> > +	return NULL;
> > +}
> > +
> > +/**
> > + * dma_release_from_contiguous() - release allocated pages
> > + * @dev:   Pointer to device for which the pages were allocated.
> > + * @pages: Allocated pages.
> > + * @count: Number of allocated pages.
> > + *
> > + * This funtion releases memory allocated by dma_alloc_from_contiguous().
> > + * It return 0 when provided pages doen't belongs to contiguous area and
> > + * 1 on success.
> > + */
> > +int dma_release_from_contiguous(struct device *dev, struct page *pages,
> > +				int count)
> > +{
> > +	struct cma *cma = get_dev_cma_area(dev);
> > +	unsigned long pfn;
> > +
> > +	if (!cma || !pages)
> > +		return 0;
> > +
> > +	pr_debug("%s(page %p)\n", __func__, (void *)pages);
> > +
> > +	pfn = page_to_pfn(pages);
> > +
> > +	if (pfn < cma->base_pfn || pfn >= cma->base_pfn + cma->count)
> > +		return 0;
> > +
> > +	mutex_lock(&cma_mutex);
> > +
> > +	bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
> > +	free_contig_pages(pfn, count);
> > +
> > +	mutex_unlock(&cma_mutex);
> 
> It feels like the mutex could be a lot lighter here. If the bitmap is
> protected by a spinlock, it would only need to be held while the bitmap
> was being cleared. free the contig pages outside the spinlock and clear
> the bitmap afterwards.
> 
> It's not particularly important as the scalability of CMA is not
> something to be concerned with at this point.

This mutex also serializes cma allocations, so there is only one alloc_contig_range()
call processed at once. This is done to serialize isolation of page blocks that is
performed inside alloc_contig_range().

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 180+ messages in thread

* [PATCH 6/9] drivers: add Contiguous Memory Allocator
@ 2011-11-04 10:41       ` Marek Szyprowski
  0 siblings, 0 replies; 180+ messages in thread
From: Marek Szyprowski @ 2011-11-04 10:41 UTC (permalink / raw)
  To: linux-arm-kernel

Hello,

On Tuesday, October 18, 2011 3:43 PM Mel Gorman wrote:

> On Thu, Oct 06, 2011 at 03:54:46PM +0200, Marek Szyprowski wrote:
> > The Contiguous Memory Allocator is a set of helper functions for DMA
> > mapping framework that improves allocations of contiguous memory chunks.
> >
> > CMA grabs memory on system boot, marks it with CMA_MIGRATE_TYPE and
> > gives back to the system. Kernel is allowed to allocate movable pages
> > within CMA's managed memory so that it can be used for example for page
> > cache when DMA mapping do not use it. On dma_alloc_from_contiguous()
> > request such pages are migrated out of CMA area to free required
> > contiguous block and fulfill the request. This allows to allocate large
> > contiguous chunks of memory at any time assuming that there is enough
> > free memory available in the system.
> >
> > This code is heavily based on earlier works by Michal Nazarewicz.
> >
> > Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> > Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> > CC: Michal Nazarewicz <mina86@mina86.com>
> > ---
> >  arch/Kconfig                         |    3 +
> >  drivers/base/Kconfig                 |   79 +++++++
> >  drivers/base/Makefile                |    1 +
> >  drivers/base/dma-contiguous.c        |  386 ++++++++++++++++++++++++++++++++++
> >  include/asm-generic/dma-contiguous.h |   27 +++
> >  include/linux/device.h               |    4 +
> >  include/linux/dma-contiguous.h       |  106 ++++++++++
> >  7 files changed, 606 insertions(+), 0 deletions(-)
> >  create mode 100644 drivers/base/dma-contiguous.c
> >  create mode 100644 include/asm-generic/dma-contiguous.h
> >  create mode 100644 include/linux/dma-contiguous.h
> >
> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index 4b0669c..a3b39a2 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -124,6 +124,9 @@ config HAVE_ARCH_TRACEHOOK
> >  config HAVE_DMA_ATTRS
> >  	bool
> >
> > +config HAVE_DMA_CONTIGUOUS
> > +	bool
> > +
> >  config USE_GENERIC_SMP_HELPERS
> >  	bool
> >
> > diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> > index 21cf46f..a5e6d75 100644
> > --- a/drivers/base/Kconfig
> > +++ b/drivers/base/Kconfig
> > @@ -174,4 +174,83 @@ config SYS_HYPERVISOR
> >
> >  source "drivers/base/regmap/Kconfig"
> >
> > +config CMA
> > +	bool "Contiguous Memory Allocator (EXPERIMENTAL)"
> > +	depends on HAVE_DMA_CONTIGUOUS && HAVE_MEMBLOCK && EXPERIMENTAL
> > +	select MIGRATION
> > +	select CMA_MIGRATE_TYPE
> > +	help
> > +	  This enables the Contiguous Memory Allocator which allows drivers
> > +	  to allocate big physically-contiguous blocks of memory for use with
> > +	  hardware components that do not support I/O map nor scatter-gather.
> > +
> > +	  For more information see <include/linux/dma-contiguous.h>.
> > +	  If unsure, say "n".
> > +
> > +if CMA
> > +
> > +config CMA_DEBUG
> > +	bool "CMA debug messages (DEVELOPEMENT)"
> 
> s/DEVELOPEMENT/DEVELOPMENT/
> 
> Should it be under DEBUG_KERNEL?
> 
> > +	help
> > +	  Turns on debug messages in CMA.  This produces KERN_DEBUG
> > +	  messages for every CMA call as well as various messages while
> > +	  processing calls such as dma_alloc_from_contiguous().
> > +	  This option does not affect warning and error messages.
> > +
> > +comment "Default contiguous memory area size:"
> > +
> > +config CMA_SIZE_ABSOLUTE
> > +	int "Absolute size (in MiB)"
> > +	depends on !CMA_SIZE_SEL_PERCENTAGE
> > +	default 16
> > +	help
> > +	  Defines the size (in MiB) of the default memory area for Contiguous
> > +	  Memory Allocator.
> > +
> > +config CMA_SIZE_PERCENTAGE
> > +	int "Percentage of total memory"
> > +	depends on !CMA_SIZE_SEL_ABSOLUTE
> > +	default 10
> > +	help
> > +	  Defines the size of the default memory area for Contiguous Memory
> > +	  Allocator as a percentage of the total memory in the system.
> > +
> 
> Why is this not a kernel parameter rather than a config option?

There is also a kernel parameter for CMA area size which overrides the value
from .config.
 
> Better yet, why do drivers not register how much CMA memory they are
> interested in and then the drive core figure out if it can allocate that
> much or not?

CMA area is reserved very early during boot process, even before the buddy 
allocator gets initialized. That time no device driver has been probed yet.
Such early reservation is required to be sure that enough contiguous memory
can be gathered and to perform some MMU related fixups that are required
on ARM to avoid page aliasing for dma_alloc_coherent() memory.

> > +choice
> > +	prompt "Selected region size"
> > +	default CMA_SIZE_SEL_ABSOLUTE
> > +
> > +config CMA_SIZE_SEL_ABSOLUTE
> > +	bool "Use absolute value only"
> > +
> > +config CMA_SIZE_SEL_PERCENTAGE
> > +	bool "Use percentage value only"
> > +
> > +config CMA_SIZE_SEL_MIN
> > +	bool "Use lower value (minimum)"
> > +
> > +config CMA_SIZE_SEL_MAX
> > +	bool "Use higher value (maximum)"
> > +
> > +endchoice
> > +
> > +config CMA_ALIGNMENT
> > +	int "Maximum PAGE_SIZE order of alignment for contiguous buffers"
> > +	range 4 9
> > +	default 8
> > +	help
> > +	  DMA mapping framework by default aligns all buffers to the smallest
> > +	  PAGE_SIZE order which is greater than or equal to the requested buffer
> > +	  size. This works well for buffers up to a few hundreds kilobytes, but
> > +	  for larger buffers it just a memory waste. With this parameter you can
> > +	  specify the maximum PAGE_SIZE order for contiguous buffers. Larger
> > +	  buffers will be aligned only to this specified order. The order is
> > +	  expressed as a power of two multiplied by the PAGE_SIZE.
> > +
> > +	  For example, if your system defaults to 4KiB pages, the order value
> > +	  of 8 means that the buffers will be aligned up to 1MiB only.
> > +
> > +	  If unsure, leave the default value "8".
> > +
> > +endif
> > +
> >  endmenu
> > diff --git a/drivers/base/Makefile b/drivers/base/Makefile
> > index 99a375a..794546f 100644
> > --- a/drivers/base/Makefile
> > +++ b/drivers/base/Makefile
> > @@ -5,6 +5,7 @@ obj-y			:= core.o sys.o bus.o dd.o syscore.o \
> >  			   cpu.o firmware.o init.o map.o devres.o \
> >  			   attribute_container.o transport_class.o
> >  obj-$(CONFIG_DEVTMPFS)	+= devtmpfs.o
> > +obj-$(CONFIG_CMA) += dma-contiguous.o
> >  obj-y			+= power/
> >  obj-$(CONFIG_HAS_DMA)	+= dma-mapping.o
> >  obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
> > diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
> > new file mode 100644
> > index 0000000..e54bb76
> > --- /dev/null
> > +++ b/drivers/base/dma-contiguous.c
> > @@ -0,0 +1,386 @@
> > +/*
> > + * Contiguous Memory Allocator for DMA mapping framework
> > + * Copyright (c) 2010-2011 by Samsung Electronics.
> > + * Written by:
> > + *	Marek Szyprowski <m.szyprowski@samsung.com>
> > + *	Michal Nazarewicz <mina86@mina86.com>
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License as
> > + * published by the Free Software Foundation; either version 2 of the
> > + * License or (at your optional) any later version of the license.
> > + */
> > +
> > +#define pr_fmt(fmt) "cma: " fmt
> > +
> > +#ifdef CONFIG_CMA_DEBUG
> > +#ifndef DEBUG
> > +#  define DEBUG
> > +#endif
> > +#endif
> > +
> > +#include <asm/page.h>
> > +#include <asm/dma-contiguous.h>
> > +
> > +#include <linux/memblock.h>
> > +#include <linux/err.h>
> > +#include <linux/mm.h>
> > +#include <linux/mutex.h>
> > +#include <linux/page-isolation.h>
> > +#include <linux/slab.h>
> > +#include <linux/swap.h>
> > +#include <linux/mm_types.h>
> > +#include <linux/dma-contiguous.h>
> > +
> > +#ifndef SZ_1M
> > +#define SZ_1M (1 << 20)
> > +#endif
> > +
> > +#ifdef phys_to_pfn
> > +/* nothing to do */
> > +#elif defined __phys_to_pfn
> > +#  define phys_to_pfn __phys_to_pfn
> > +#elif defined __va
> > +#  define phys_to_pfn(x) page_to_pfn(virt_to_page(__va(x)))
> > +#else
> > +#  error phys_to_pfn implementation needed
> > +#endif
> > +
> 
> Parts of this are assuming that there is a linear mapping of virtual to
> physical memory. I think this is always the case but it looks like
> something that should be defined in asm-generic with an option for
> architectures to override.
> 
> > +struct cma {
> > +	unsigned long	base_pfn;
> > +	unsigned long	count;
> > +	unsigned long	*bitmap;
> > +};
> > +
> > +struct cma *dma_contiguous_default_area;
> > +
> > +#ifndef CONFIG_CMA_SIZE_ABSOLUTE
> > +#define CONFIG_CMA_SIZE_ABSOLUTE 0
> > +#endif
> > +
> > +#ifndef CONFIG_CMA_SIZE_PERCENTAGE
> > +#define CONFIG_CMA_SIZE_PERCENTAGE 0
> > +#endif
> > +
> > +static unsigned long size_abs = CONFIG_CMA_SIZE_ABSOLUTE * SZ_1M;
> 
> SIZE_ABSOLUTE is an odd name. It can't be a negative size. size_bytes
> maybe.
> 
> > +static unsigned long size_percent = CONFIG_CMA_SIZE_PERCENTAGE;
> > +static long size_cmdline = -1;
> > +
> > +static int __init early_cma(char *p)
> > +{
> > +	pr_debug("%s(%s)\n", __func__, p);
> > +	size_cmdline = memparse(p, &p);
> > +	return 0;
> > +}
> > +early_param("cma", early_cma);
> > +
> > +static unsigned long __init __cma_early_get_total_pages(void)
> > +{
> > +	struct memblock_region *reg;
> > +	unsigned long total_pages = 0;
> > +
> > +	/*
> > +	 * We cannot use memblock_phys_mem_size() here, because
> > +	 * memblock_analyze() has not been called yet.
> > +	 */
> > +	for_each_memblock(memory, reg)
> > +		total_pages += memblock_region_memory_end_pfn(reg) -
> > +			       memblock_region_memory_base_pfn(reg);
> > +	return total_pages;
> > +}
> > +
> 
> Is this being called too early yet? What prevents you seeing up the CMA
> regions after the page allocator is brought up for example? I understand
> that there is a need for the memory to be coherent so maybe that is the
> obstacle.

Right now we assume that CMA areas can be created only during early boot with 
memblock allocator. The code that converts memory on-fly into CMA region
can be added later (if required).
 
> > +/**
> > + * dma_contiguous_reserve() - reserve area for contiguous memory handling
> > + *
> > + * This funtion reserves memory from early allocator. It should be
> > + * called by arch specific code once the early allocator (memblock or bootmem)
> > + * has been activated and all other subsystems have already allocated/reserved
> > + * memory.
> > + */
> > +void __init dma_contiguous_reserve(phys_addr_t limit)
> > +{
> > +	unsigned long selected_size = 0;
> > +	unsigned long total_pages;
> > +
> > +	pr_debug("%s(limit %08lx)\n", __func__, (unsigned long)limit);
> > +
> > +	total_pages = __cma_early_get_total_pages();
> > +	size_percent *= (total_pages << PAGE_SHIFT) / 100;
> > +
> > +	pr_debug("%s: total available: %ld MiB, size absolute: %ld MiB, size percentage: %ld
> MiB\n",
> > +		 __func__, (total_pages << PAGE_SHIFT) / SZ_1M,
> > +		size_abs / SZ_1M, size_percent / SZ_1M);
> > +
> > +#ifdef CONFIG_CMA_SIZE_SEL_ABSOLUTE
> > +	selected_size = size_abs;
> > +#elif defined(CONFIG_CMA_SIZE_SEL_PERCENTAGE)
> > +	selected_size = size_percent;
> > +#elif defined(CONFIG_CMA_SIZE_SEL_MIN)
> > +	selected_size = min(size_abs, size_percent);
> > +#elif defined(CONFIG_CMA_SIZE_SEL_MAX)
> > +	selected_size = max(size_abs, size_percent);
> > +#endif
> > +
> 
> It seems very strange to do this at Kconfig time instead of via kernel
> parameters.
> 
> > +	if (size_cmdline != -1)
> > +		selected_size = size_cmdline;
> > +
> > +	if (!selected_size)
> > +		return;
> > +
> > +	pr_debug("%s: reserving %ld MiB for global area\n", __func__,
> > +		 selected_size / SZ_1M);
> > +
> > +	dma_declare_contiguous(NULL, selected_size, 0, limit);
> > +};
> > +
> > +static DEFINE_MUTEX(cma_mutex);
> > +
> > +static void __cma_activate_area(unsigned long base_pfn, unsigned long count)
> > +{
> > +	unsigned long pfn = base_pfn;
> > +	unsigned i = count >> pageblock_order;
> > +	struct zone *zone;
> > +
> > +	VM_BUG_ON(!pfn_valid(pfn));
> 
> Again, VM_BUG_ON is an extreme reaction. WARN_ON_ONCE, return an error
> code and fail gracefully.
> 
> > +	zone = page_zone(pfn_to_page(pfn));
> > +
> > +	do {
> > +		unsigned j;
> > +		base_pfn = pfn;
> > +		for (j = pageblock_nr_pages; j; --j, pfn++) {
> 
> This is correct but does not look like any other PFN walker. There are
> plenty of examples of where we walk PFN ranges. There is no requirement
> to use the same pattern but it does make reviewing easier.
> 
> > +			VM_BUG_ON(!pfn_valid(pfn));
> > +			VM_BUG_ON(page_zone(pfn_to_page(pfn)) != zone);
> > +		}
> 
> In the field, this is a no-op as I would assume CONFIG_DEBUG_VM is not
> set. This should be checked unconditionally and fail gracefully if necessary.
> 
> > +		init_cma_reserved_pageblock(pfn_to_page(base_pfn));
> > +	} while (--i);
> > +}
> > +
> > +static struct cma *__cma_create_area(unsigned long base_pfn,
> > +				     unsigned long count)
> > +{
> > +	int bitmap_size = BITS_TO_LONGS(count) * sizeof(long);
> > +	struct cma *cma;
> > +
> > +	pr_debug("%s(base %08lx, count %lx)\n", __func__, base_pfn, count);
> > +
> > +	cma = kmalloc(sizeof *cma, GFP_KERNEL);
> > +	if (!cma)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	cma->base_pfn = base_pfn;
> > +	cma->count = count;
> > +	cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
> > +
> > +	if (!cma->bitmap)
> > +		goto no_mem;
> > +
> > +	__cma_activate_area(base_pfn, count);
> > +
> > +	pr_debug("%s: returned %p\n", __func__, (void *)cma);
> > +	return cma;
> > +
> > +no_mem:
> > +	kfree(cma);
> > +	return ERR_PTR(-ENOMEM);
> > +}
> > +
> > +static struct cma_reserved {
> > +	phys_addr_t start;
> > +	unsigned long size;
> > +	struct device *dev;
> > +} cma_reserved[MAX_CMA_AREAS] __initdata;
> > +static unsigned cma_reserved_count __initdata;
> > +
> > +static int __init __cma_init_reserved_areas(void)
> > +{
> > +	struct cma_reserved *r = cma_reserved;
> > +	unsigned i = cma_reserved_count;
> > +
> > +	pr_debug("%s()\n", __func__);
> > +
> > +	for (; i; --i, ++r) {
> > +		struct cma *cma;
> > +		cma = __cma_create_area(phys_to_pfn(r->start),
> > +					r->size >> PAGE_SHIFT);
> > +		if (!IS_ERR(cma)) {
> > +			if (r->dev)
> > +				set_dev_cma_area(r->dev, cma);
> > +			else
> > +				dma_contiguous_default_area = cma;
> > +		}
> > +	}
> > +	return 0;
> > +}
> > +core_initcall(__cma_init_reserved_areas);
> > +
> > +/**
> > + * dma_declare_contiguous() - reserve area for contiguous memory handling
> > + *			      for particular device
> > + * @dev:   Pointer to device structure.
> > + * @size:  Size of the reserved memory.
> > + * @start: Start address of the reserved memory (optional, 0 for any).
> > + * @limit: End address of the reserved memory (optional, 0 for any).
> > + *
> > + * This funtion reserves memory for specified device. It should be
> > + * called by board specific code when early allocator (memblock or bootmem)
> > + * is still activate.
> > + */
> > +int __init dma_declare_contiguous(struct device *dev, unsigned long size,
> > +				  phys_addr_t base, phys_addr_t limit)
> > +{
> > +	struct cma_reserved *r = &cma_reserved[cma_reserved_count];
> > +	unsigned long alignment;
> > +
> > +	pr_debug("%s(size %lx, base %08lx, limit %08lx)\n", __func__,
> > +		 (unsigned long)size, (unsigned long)base,
> > +		 (unsigned long)limit);
> > +
> > +	/* Sanity checks */
> > +	if (cma_reserved_count == ARRAY_SIZE(cma_reserved))
> > +		return -ENOSPC;
> > +
> > +	if (!size)
> > +		return -EINVAL;
> > +
> > +	/* Sanitise input arguments */
> > +	alignment = PAGE_SIZE << max(MAX_ORDER, pageblock_order);
> > +	base = ALIGN(base, alignment);
> > +	size = ALIGN(size, alignment);
> > +	limit = ALIGN(limit, alignment);
> > +
> > +	/* Reserve memory */
> > +	if (base) {
> > +		if (memblock_is_region_reserved(base, size) ||
> > +		    memblock_reserve(base, size) < 0) {
> > +			base = -EBUSY;
> > +			goto err;
> > +		}
> > +	} else {
> > +		/*
> > +		 * Use __memblock_alloc_base() since
> > +		 * memblock_alloc_base() panic()s.
> > +		 */
> > +		phys_addr_t addr = __memblock_alloc_base(size, alignment, limit);
> > +		if (!addr) {
> > +			base = -ENOMEM;
> > +			goto err;
> > +		} else if (addr + size > ~(unsigned long)0) {
> > +			memblock_free(addr, size);
> > +			base = -EOVERFLOW;
> > +			goto err;
> > +		} else {
> > +			base = addr;
> > +		}
> > +	}
> > +
> > +	/*
> > +	 * Each reserved area must be initialised later, when more kernel
> > +	 * subsystems (like slab allocator) are available.
> > +	 */
> > +	r->start = base;
> > +	r->size = size;
> > +	r->dev = dev;
> > +	cma_reserved_count++;
> > +	printk(KERN_INFO "CMA: reserved %ld MiB at %08lx\n", size / SZ_1M,
> > +	       (unsigned long)base);
> > +
> > +	/*
> > +	 * Architecture specific contiguous memory fixup.
> > +	 */
> > +	dma_contiguous_early_fixup(base, size);
> > +	return 0;
> > +err:
> > +	printk(KERN_ERR "CMA: failed to reserve %ld MiB\n", size / SZ_1M);
> > +	return base;
> > +}
> > +
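As a usage illustration, early board code could reserve a device private
area like this; the foo_device platform device, the 16 MiB size and the
256 MiB limit are made-up values for the sketch, not something taken from
the patches. The call has to run while memblock is still active, e.g. from
the machine's reserve callback.

static struct platform_device foo_device;	/* hypothetical device */

static void __init foo_reserve_cma(void)
{
	/* 16 MiB private area for foo_device, restricted to the low 256 MiB. */
	dma_declare_contiguous(&foo_device.dev, 16 * SZ_1M, 0, SZ_256M);
}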
> > +/**
> > + * dma_alloc_from_contiguous() - allocate pages from contiguous area
> > + * @dev:   Pointer to device for which the allocation is performed.
> > + * @count: Requested number of pages.
> > + * @align: Requested alignment of pages (in PAGE_SIZE order).
> > + *
> > + * This function allocates a memory buffer for the specified device. It
> > + * uses the device specific contiguous memory area if available, or the
> > + * default global one. Requires the architecture specific
> > + * get_dev_cma_area() helper function.
> > + */
> > +struct page *dma_alloc_from_contiguous(struct device *dev, int count,
> > +				       unsigned int align)
> > +{
> > +	struct cma *cma = get_dev_cma_area(dev);
> > +	unsigned long pfn, pageno;
> > +	int ret;
> > +
> > +	if (!cma)
> > +		return NULL;
> > +
> > +	if (align > CONFIG_CMA_ALIGNMENT)
> > +		align = CONFIG_CMA_ALIGNMENT;
> > +
> > +	pr_debug("%s(cma %p, count %d, align %d)\n", __func__, (void *)cma,
> > +		 count, align);
> > +
> > +	if (!count)
> > +		return NULL;
> > +
> > +	mutex_lock(&cma_mutex);
> > +
> > +	pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count, 0, count,
> > +					    (1 << align) - 1);
> > +	if (pageno >= cma->count) {
> > +		ret = -ENOMEM;
> > +		goto error;
> > +	}
> > +	bitmap_set(cma->bitmap, pageno, count);
> > +
> > +	pfn = cma->base_pfn + pageno;
> > +	ret = alloc_contig_range(pfn, pfn + count, 0, MIGRATE_CMA);
> > +	if (ret)
> > +		goto free;
> > +
> 
> If alloc_contig_range returns failure, the bitmap is still set. It will
> never be freed so now the area cannot be used for CMA allocations any
> more.

There is a bitmap_clear() call just after the free: label, so the bitmap is
updated correctly.

> > +	mutex_unlock(&cma_mutex);
> > +
> > +	pr_debug("%s(): returned %p\n", __func__, pfn_to_page(pfn));
> > +	return pfn_to_page(pfn);
> > +free:
> > +	bitmap_clear(cma->bitmap, pageno, count);
> > +error:
> > +	mutex_unlock(&cma_mutex);
> > +	return NULL;
> > +}
> > +
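For reference, a caller (for instance an architecture's coherent allocator
backend) would pair this helper with dma_release_from_contiguous(), which is
quoted further down, roughly as below. The foo_ names, the physical-address
handle and the missing update of the kernel mapping are simplifications for
the sketch, not the actual ARM code from patch 8/9.

static void *foo_alloc_coherent(struct device *dev, size_t size,
				dma_addr_t *handle, gfp_t gfp)
{
	int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
	struct page *page;

	page = dma_alloc_from_contiguous(dev, count, get_order(size));
	if (!page)
		return NULL;

	*handle = page_to_phys(page);
	/* A real implementation would also fix up the kernel mapping here. */
	return page_address(page);
}

static void foo_free_coherent(struct device *dev, size_t size, void *vaddr,
			      dma_addr_t handle)
{
	dma_release_from_contiguous(dev, virt_to_page(vaddr),
				    PAGE_ALIGN(size) >> PAGE_SHIFT);
}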
> > +/**
> > + * dma_release_from_contiguous() - release allocated pages
> > + * @dev:   Pointer to device for which the pages were allocated.
> > + * @pages: Allocated pages.
> > + * @count: Number of allocated pages.
> > + *
> > + * This function releases memory allocated by dma_alloc_from_contiguous().
> > + * It returns 0 when the provided pages do not belong to the contiguous
> > + * area and 1 on success.
> > + */
> > +int dma_release_from_contiguous(struct device *dev, struct page *pages,
> > +				int count)
> > +{
> > +	struct cma *cma = get_dev_cma_area(dev);
> > +	unsigned long pfn;
> > +
> > +	if (!cma || !pages)
> > +		return 0;
> > +
> > +	pr_debug("%s(page %p)\n", __func__, (void *)pages);
> > +
> > +	pfn = page_to_pfn(pages);
> > +
> > +	if (pfn < cma->base_pfn || pfn >= cma->base_pfn + cma->count)
> > +		return 0;
> > +
> > +	mutex_lock(&cma_mutex);
> > +
> > +	bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
> > +	free_contig_pages(pfn, count);
> > +
> > +	mutex_unlock(&cma_mutex);
> 
> It feels like the mutex could be a lot lighter here. If the bitmap is
> protected by a spinlock, it would only need to be held while the bitmap
> was being cleared. free the contig pages outside the spinlock and clear
> the bitmap afterwards.
> 
> It's not particularly important as the scalability of CMA is not
> something to be concerned with at this point.

This mutex also serializes CMA allocations, so only one alloc_contig_range()
call is processed at a time. This is done to serialize the isolation of page
blocks that is performed inside alloc_contig_range().
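
Just to illustrate the split Mel describes for the release path (sketch
only; it assumes a spinlock_t lock field were added to struct cma, and as
explained above the allocation path would still need the mutex):

	/* Give the pages back without holding any lock ... */
	free_contig_pages(pfn, count);

	/* ... and take the (hypothetical) spinlock only to clear the bitmap. */
	spin_lock(&cma->lock);
	bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
	spin_unlock(&cma->lock);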

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center

Thread overview: 180+ messages
2011-10-06 13:54 [PATCHv16 0/9] Contiguous Memory Allocator Marek Szyprowski
2011-10-06 13:54 ` [PATCH 1/9] mm: move some functions from memory_hotplug.c to page_isolation.c Marek Szyprowski
2011-10-14 23:23   ` Andrew Morton
2011-10-18 12:05   ` Mel Gorman
2011-10-06 13:54 ` [PATCH 2/9] mm: alloc_contig_freed_pages() added Marek Szyprowski
2011-10-14 23:29   ` Andrew Morton
2011-10-16  8:01     ` Michal Nazarewicz
2011-10-16  8:31       ` Andrew Morton
2011-10-16  9:39         ` Michal Nazarewicz
2011-10-17 12:21     ` Marek Szyprowski
2011-10-17 18:39       ` Andrew Morton
2011-10-18 12:21   ` Mel Gorman
2011-10-18 17:26     ` Michal Nazarewicz
2011-10-18 17:48       ` Dave Hansen
2011-10-18 18:00         ` Michal Nazarewicz
2011-10-21 10:06       ` Mel Gorman
2011-10-24  1:00         ` Michal Nazarewicz
2011-10-24  4:05     ` Michal Nazarewicz
2011-11-01 15:04       ` Mel Gorman
2011-11-01 18:06         ` Michal Nazarewicz
2011-11-01 18:47           ` Mel Gorman
2011-10-06 13:54 ` [PATCH 3/9] mm: alloc_contig_range() added Marek Szyprowski
2011-10-14 23:35   ` Andrew Morton
2011-10-18 12:38   ` Mel Gorman
2011-10-06 13:54 ` [PATCH 4/9] mm: MIGRATE_CMA migration type added Marek Szyprowski
2011-10-14 23:38   ` Andrew Morton
2011-10-18 13:08   ` Mel Gorman
2011-10-24 19:32     ` Michal Nazarewicz
2011-10-27  9:10       ` Michal Nazarewicz
2011-10-06 13:54 ` [PATCH 5/9] mm: MIGRATE_CMA isolation functions added Marek Szyprowski
2011-10-06 13:54 ` [PATCH 6/9] drivers: add Contiguous Memory Allocator Marek Szyprowski
2011-10-14 23:57   ` Andrew Morton
2011-10-16 10:08     ` Russell King - ARM Linux
2011-10-18 13:43   ` Mel Gorman
2011-10-24 19:39     ` Michal Nazarewicz
2011-11-04 10:41     ` Marek Szyprowski
2011-10-06 13:54 ` [PATCH 7/7] ARM: integrate CMA with DMA-mapping subsystem Marek Szyprowski
2011-10-06 14:18   ` Marek Szyprowski
2011-10-15  0:03   ` Andrew Morton
2011-10-06 13:54 ` [PATCH 7/9] X86: " Marek Szyprowski
2011-10-06 13:54 ` [PATCH 8/9] ARM: " Marek Szyprowski
2011-10-14  4:33   ` [Linaro-mm-sig] " Subash Patel
2011-10-14  9:14     ` Marek Szyprowski
2011-10-06 13:54 ` [PATCH 9/9] ARM: Samsung: use CMA for 2 memory banks for s5p-mfc device Marek Szyprowski
2011-10-07 16:27 ` [PATCHv16 0/9] Contiguous Memory Allocator Arnd Bergmann
2011-10-10  6:58   ` [Linaro-mm-sig] " Ohad Ben-Cohen
2011-10-10 12:02     ` Clark, Rob
2011-10-10 22:56   ` Andrew Morton
2011-10-11  6:57     ` Marek Szyprowski
2011-10-11 13:52     ` Arnd Bergmann
2011-10-14 23:19       ` Andrew Morton
2011-10-15 14:24         ` Arnd Bergmann
2011-10-10 12:07 ` [Linaro-mm-sig] " Maxime Coquelin
2011-10-11  7:17   ` Marek Szyprowski
2011-10-11  7:30     ` Maxime Coquelin
2011-10-11 10:50       ` Marek Szyprowski
2011-10-11 11:25         ` Maxime Coquelin
2011-10-11 13:05           ` Marek Szyprowski
2011-10-12 11:08       ` [PATCH] fixup: mm: alloc_contig_range: increase min_free_kbytes during allocation Marek Szyprowski
2011-10-12 13:01         ` Maxime Coquelin
