* [PATCHv11 0/8] Contiguous Memory Allocator
@ 2011-07-05  7:41 ` Marek Szyprowski
  0 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-07-05  7:41 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Chunsang Jeong

Hello everyone,

This is yet another round of Contiguous Memory Allocator patches. I hope
that I've managed to resolve all the items discussed during the Memory
Management summit at the Linaro Meeting in Budapest and raised later on
the mailing lists. The goal is to integrate it as tightly as possible
with other kernel subsystems (like memory management and dma-mapping)
and finally merge it to mainline.

The previous version introduced integration with the DMA-mapping
subsystem for the ARM architecture. In this version I've cleaned it up
even more and prepared it for easier integration on architectures other
than ARM. I've also rebased all the code onto the latest v3.0-rc6 kernel.

A few words for those who see CMA for the first time:

   The Contiguous Memory Allocator (CMA) makes it possible for device
   drivers to allocate big contiguous chunks of memory after the system
   has booted. 

   The main difference from similar frameworks is that CMA allows the
   memory region reserved for big chunk allocations to be transparently
   reused as system memory, so no memory is wasted when no big chunk is
   allocated. Once an allocation request is issued, the framework
   migrates system pages to create the required big chunk of physically
   contiguous memory.

   For more information you can refer to the LWN article at
   http://lwn.net/Articles/447405/ and to the links to previous
   versions of the CMA framework below.

   The CMA framework was initially developed by Michal Nazarewicz at
   Samsung Poland R&D Center. Since version 9 I've taken over the
   development, because Michal has left the company.

The current version of CMA is a set of helper functions for the
DMA-mapping framework that handles allocation of contiguous memory
blocks. The differences between this patchset and Kamezawa's
alloc_contig_pages() are:

1. alloc_contig_pages() requires MAX_ORDER alignment of allocations,
   which may be unsuitable for embedded systems where only a few MiBs
   are required.

   Dropping the alignment requirement means that several threads might
   try to access the same pageblock/page.  To prevent this from
   happening, CMA uses a mutex so that only one allocating/releasing
   function may run at a time.

2. CMA may use its own migratetype (MIGRATE_CMA) which behaves
   similarly to ZONE_MOVABLE but can be put in arbitrary places.

   This is required for us since we need to define two disjoint memory
   ranges inside system RAM, i.e. in two memory banks (not to be
   confused with nodes).

3. alloc_contig_pages() scans memory in search of a range that could
   be migrated.  CMA, on the other hand, maintains its own allocator
   to decide where to allocate memory for device drivers and then
   tries to migrate pages out of that range if needed (see the sketch
   after this list).  This is not strictly required, but I somehow
   feel it might be faster.
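
To illustrate points 1 and 3 above, here is a simplified, hypothetical
sketch of the allocation path: a single mutex serializes allocations,
a bitmap (direct bitmap_* calls, per the v11 changelog) tracks which
pages of the reserved area are in use, and alloc_contig_range()
migrates whatever movable pages currently occupy the chosen range.
Names and signatures follow the variant that later appeared in
mainline; the exact code in this patchset may differ.

    #include <linux/bitmap.h>
    #include <linux/gfp.h>
    #include <linux/mm.h>
    #include <linux/mutex.h>

    /* Hypothetical, simplified CMA context: one bit per page. */
    struct cma_area {
            unsigned long   base_pfn;   /* first pfn of the reserved range */
            unsigned long   count;      /* number of pages in the range */
            unsigned long   *bitmap;    /* set bit = page handed out */
    };

    static DEFINE_MUTEX(cma_mutex);

    static struct page *cma_alloc_pages(struct cma_area *cma,
                                        unsigned long count,
                                        unsigned long align_mask)
    {
            struct page *page = NULL;
            unsigned long pageno, pfn;

            mutex_lock(&cma_mutex);
            pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count, 0,
                                                count, align_mask);
            if (pageno < cma->count) {
                    pfn = cma->base_pfn + pageno;
                    /* Migrate away whatever movable pages occupy the range. */
                    if (alloc_contig_range(pfn, pfn + count, MIGRATE_CMA) == 0) {
                            bitmap_set(cma->bitmap, pageno, count);
                            page = pfn_to_page(pfn);
                    }
            }
            mutex_unlock(&cma_mutex);
            return page;
    }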

The integration with the ARM DMA-mapping subsystem is quite
straightforward. Once a CMA context is available, the alloc_pages()
call can be replaced by a dma_alloc_from_contiguous() call, as
sketched below.
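
As a rough illustration of that replacement (not the actual patch), a
dma_alloc_coherent() backend could switch its page allocation roughly
as follows.  The dma_alloc_from_contiguous()/dma_release_from_contiguous()
signatures shown are the ones that eventually landed in mainline, so
treat the details as an approximation of this patchset:

    #include <linux/device.h>
    #include <linux/dma-contiguous.h>
    #include <linux/gfp.h>
    #include <linux/mm.h>

    static struct page *__alloc_buffer(struct device *dev, size_t size,
                                       gfp_t gfp)
    {
            unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;

            /* Before: contiguous pages straight from the buddy allocator. */
            /* return alloc_pages(gfp, get_order(size)); */

            /* After: take the pages from the device's (or global) CMA area. */
            return dma_alloc_from_contiguous(dev, count, get_order(size));
    }

    static void __free_buffer(struct device *dev, struct page *page,
                              size_t size)
    {
            unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;

            /* Fall back to the normal free path for non-CMA pages. */
            if (!dma_release_from_contiguous(dev, page, count))
                    __free_pages(page, get_order(size));
    }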

The current version has been tested on a Samsung S5PC110-based Goni
machine with the s5p-fimc V4L2 driver. The driver itself uses the
videobuf2 dma-contig memory allocator, which in turn relies on
dma_alloc_coherent() from the DMA-mapping subsystem. By integrating CMA
with DMA-mapping we managed to get this driver working with CMA without
a single change required in the driver or the videobuf2-dma-contig
allocator.

TODO:
- resolve double-mapping issues with ARMv6+ and coherent memory

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center


Links to previous versions of the patchset:
v10: <http://www.spinics.net/lists/linux-mm/msg20761.html>
 v9: <http://article.gmane.org/gmane.linux.kernel.mm/60787>
 v8: <http://article.gmane.org/gmane.linux.kernel.mm/56855>
 v7: <http://article.gmane.org/gmane.linux.kernel.mm/55626>
 v6: <http://article.gmane.org/gmane.linux.kernel.mm/55626>
 v5: (intentionally left out as CMA v5 was identical to CMA v4)
 v4: <http://article.gmane.org/gmane.linux.kernel.mm/52010>
 v3: <http://article.gmane.org/gmane.linux.kernel.mm/51573>
 v2: <http://article.gmane.org/gmane.linux.kernel.mm/50986>
 v1: <http://article.gmane.org/gmane.linux.kernel.mm/50669>


Changelog:

v11:
    1. Removed genalloc usage and replaced it with direct calls to
       bitmap_* functions, and dropped patches that are no longer
       needed (genalloc extensions)

    2. Moved all contiguous area management code from mm/cma.c
       to drivers/base/dma-contiguous.c

    3. Renamed cm_alloc/free to dma_alloc/release_from_contiguous

    4. Introduced global, system wide (default) contiguous area
       configured with kernel config and kernel cmdline parameters

    5. Simplified initialization to just one function:
       dma_declare_contiguous()

    6. Added example of device private memory contiguous area

v10:
    1. Rebased onto 3.0-rc2 and resolved all conflicts

    2. Simplified CMA to be just a pure memory allocator, for use
       with platform/bus specific subsystems, like dma-mapping.
       Removed all device specific functions and calls.

    3. Integrated with ARM DMA-mapping subsystem.

    4. Code cleanup here and there.

    5. Removed private context support.

v9: 1. Rebased onto 2.6.39-rc1 and resolved all conflicts

    2. Fixed a bunch of nasty bugs that happened when the allocation
       failed (mainly kernel oops due to NULL ptr dereference).

    3. Introduced testing code: cma-regions compatibility layer and
       videobuf2-cma memory allocator module.

v8: 1. The alloc_contig_range() function has now been separated from
       CMA and put in mm/page_alloc.c.  This function tries to
       migrate all LRU pages in the specified range and then allocate
       the range using alloc_contig_freed_pages().

    2. Support for MIGRATE_CMA has been separated from the CMA code.
       I have not tested if CMA works with ZONE_MOVABLE but I see no
       reason why it shouldn't.

    3. I have added a @private argument when creating CMA contexts so
       that one can reserve memory and not share it with the rest of
       the system.  This way, CMA acts only as an allocation algorithm.

v7: 1. A lot of functionality that handled driver->allocator_context
       mapping has been removed from the patchset.  This is not to say
       that this code is not needed; it's just not worth posting
       everything in one patchset.

       Currently, CMA is "just" an allocator.  It uses its own
       migratetype (MIGRATE_CMA) for defining ranges of pageblocks
       which behave just like ZONE_MOVABLE but, unlike the latter, can
       be put in arbitrary places.

    2. The migration code that was introduced in the previous version
       actually started working.


v6: 1. Most importantly, v6 introduces support for memory migration.
       The implementation is not yet complete though.

       Migration support means that when CMA is not using memory
       reserved for it, the page allocator can allocate pages from it.
       When CMA wants to use the memory, the pages have to be moved
       and/or evicted so as to make room for CMA.

       To make this possible, it must be guaranteed that only movable
       and reclaimable pages are allocated in CMA-controlled regions.
       This is done by introducing a MIGRATE_CMA migrate type that
       guarantees exactly that.

       Some of the migration code is "borrowed" from Kamezawa
       Hiroyuki's alloc_contig_pages() implementation.  The main
       difference is that, thanks to the MIGRATE_CMA migrate type, CMA
       assumes that the memory it controls is always movable or
       reclaimable, so it makes allocation decisions regardless of
       whether some pages are actually allocated, and migrates them if
       needed.

       The most interesting patches from the patchset that implement
       the functionality are:

         09/13: mm: alloc_contig_free_pages() added
         10/13: mm: MIGRATE_CMA migration type added
         11/13: mm: MIGRATE_CMA isolation functions added
         12/13: mm: cma: Migration support added [wip]

       Currently, the kernel panics in some situations, which I am
       trying to investigate.

    2. cma_pin() and cma_unpin() functions have been added (after
       a conversation with Johan Mossberg).  The idea is that whenever
       the hardware does not use the memory (no transaction is in
       progress) the chunk can be moved around.  This would allow
       defragmentation to be implemented if desired.  No
       defragmentation algorithm is provided at this time.

    3. Sysfs support has been replaced with debugfs.  I always felt
       unsure about the sysfs interface, and when Greg KH pointed it
       out I finally got around to rewriting it to use debugfs.


v5: (intentionally left out as CMA v5 was identical to CMA v4)


v4: 1. The "asterisk" flag has been removed in favour of requiring
       that platform will provide a "*=<regions>" rule in the map
       attribute.

    2. The terminology has been changed slightly renaming "kind" to
       "type" of memory.  In the previous revisions, the documentation
       indicated that device drivers define memory kinds and now,

v3: 1. The command line parameters have been removed (and moved to
       a separate patch, the fourth one).  As a consequence, the
       cma_set_defaults() function has been changed -- it no longer
       accepts a string with a list of regions but an array of
       regions.

    2. The "asterisk" attribute has been removed.  Now, each region
       has an "asterisk" flag which lets one specify whether this
       region should be considered an "asterisk" region.

    3. SysFS support has been moved to a separate patch (the third one
       in the series) and now also includes a list of regions.

v2: 1. The "cma_map" command line have been removed.  In exchange,
       a SysFS entry has been created under kernel/mm/contiguous.

       The intended way of specifying the attributes is
       a cma_set_defaults() function called by platform initialisation
       code.  "regions" attribute (the string specified by "cma"
       command line parameter) can be overwritten with command line
       parameter; the other attributes can be changed during run-time
       using the SysFS entries.

    2. The behaviour of the "map" attribute has been modified
       slightly.  Currently, if no rule matches a given device, it is
       assigned the regions specified by the "asterisk" attribute,
       which is by default built from the region names given in the
       "regions" attribute.

    3. Devices can register private regions as well as regions that
       can be shared but are not reserved using standard CMA
       mechanisms.  A private region has no name and can be accessed
       only by devices that have the pointer to it.

    4. The way allocators are registered has changed.  Currently,
       a cma_allocator_register() function is used for that purpose.
       Moreover, allocators are attached to regions the first time
       memory is registered from the region or when the allocator is
       registered, which means that allocators can be dynamic modules
       that are loaded after the kernel has booted (of course, it
       won't be possible to allocate a chunk of memory from a region
       if its allocator is not loaded).

    5. Index of new functions:

    +static inline dma_addr_t __must_check
    +cma_alloc_from(const char *regions, size_t size,
    +               dma_addr_t alignment)

    +static inline int
    +cma_info_about(struct cma_info *info, const char *regions)

    +int __must_check cma_region_register(struct cma_region *reg);

    +dma_addr_t __must_check
    +cma_alloc_from_region(struct cma_region *reg,
    +                      size_t size, dma_addr_t alignment);

    +static inline dma_addr_t __must_check
    +cma_alloc_from(const char *regions,
    +               size_t size, dma_addr_t alignment);

    +int cma_allocator_register(struct cma_allocator *alloc);


Patches in this patchset:

  mm: move some functions from memory_hotplug.c to page_isolation.c
  mm: alloc_contig_freed_pages() added

    Code "stolen" from Kamezawa.  The first patch just moves code
    around and the second provide function for "allocates" already
    freed memory.

  mm: alloc_contig_range() added

    This is what Kamezawa asked for: a function that tries to migrate
    all pages from a given range and then uses
    alloc_contig_freed_pages() (defined by the previous commit) to
    allocate those pages.

  mm: MIGRATE_CMA migration type added
  mm: MIGRATE_CMA isolation functions added

    Introduction of the new migratetype and support for it in CMA.
    MIGRATE_CMA works similarly to ZONE_MOVABLE except that almost any
    memory range can be marked as one.

  mm: cma: Contiguous Memory Allocator added

    The core CMA code.  It manages CMA contexts and performs memory
    allocations.

  ARM: integrate CMA with dma-mapping subsystem

    The main client of the CMA framework.  CMA serves as an
    alloc_pages() replacement.

  ARM: S5PV210: example of CMA private area for FIMC device on Goni board

    Example of platform/board specific code that creates a CMA context
    and assigns it to a particular device (a rough sketch follows
    below).
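
    A hypothetical sketch of what such board code might look like; the
    device pointer, size and placement below are made up for
    illustration, and dma_declare_contiguous() is the single
    initialization entry point mentioned in the v11 changelog, shown
    here with the signature it later had in mainline:

    #include <linux/device.h>
    #include <linux/dma-contiguous.h>
    #include <linux/init.h>
    #include <linux/platform_device.h>

    /* Hypothetical platform device standing in for the board's FIMC device. */
    extern struct platform_device example_fimc_device;

    static void __init example_board_reserve(void)
    {
            /*
             * Reserve a private 32 MiB contiguous area for this device;
             * base and limit of 0 let the allocator place it anywhere.
             */
            dma_declare_contiguous(&example_fimc_device.dev, 32 << 20, 0, 0);
    }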


Patch summary:

KAMEZAWA Hiroyuki (2):
  mm: move some functions from memory_hotplug.c to page_isolation.c
  mm: alloc_contig_freed_pages() added

Marek Szyprowski (3):
  drivers: add Contiguous Memory Allocator
  ARM: integrate CMA with dma-mapping subsystem
  ARM: S5PV210: example of CMA private area for FIMC device on Goni
    board

Michal Nazarewicz (3):
  mm: alloc_contig_range() added
  mm: MIGRATE_CMA migration type added
  mm: MIGRATE_CMA isolation functions added

 arch/arm/Kconfig                   |    1 +
 arch/arm/include/asm/device.h      |    3 +
 arch/arm/include/asm/dma-mapping.h |   20 ++
 arch/arm/mach-s5pv210/Kconfig      |    1 +
 arch/arm/mach-s5pv210/mach-goni.c  |    7 +
 arch/arm/mm/dma-mapping.c          |   51 ++++--
 arch/arm/mm/init.c                 |    3 +
 drivers/base/Kconfig               |   77 ++++++++
 drivers/base/Makefile              |    1 +
 drivers/base/dma-contiguous.c      |  367 ++++++++++++++++++++++++++++++++++++
 include/linux/dma-contiguous.h     |  104 ++++++++++
 include/linux/mmzone.h             |   43 ++++-
 include/linux/page-isolation.h     |   54 ++++--
 mm/Kconfig                         |    8 +-
 mm/compaction.c                    |   10 +
 mm/memory_hotplug.c                |  111 -----------
 mm/page_alloc.c                    |  293 ++++++++++++++++++++++++++---
 mm/page_isolation.c                |  130 ++++++++++++-
 18 files changed, 1112 insertions(+), 172 deletions(-)
 create mode 100644 drivers/base/dma-contiguous.c
 create mode 100644 include/linux/dma-contiguous.h

-- 
1.7.1.569.g6f426

^ permalink raw reply	[flat|nested] 183+ messages in thread

* [PATCH 1/8] mm: move some functions from memory_hotplug.c to page_isolation.c
@ 2011-07-05  7:41   ` Marek Szyprowski
  -1 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-07-05  7:41 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Chunsang Jeong

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Memory hotplug contains logic for making pages unused in a specified
range of pfns, so some of its core logic can be reused for other
purposes, such as allocating a very large contiguous memory block.

This patch moves some functions from mm/memory_hotplug.c to
mm/page_isolation.c. This helps with adding a function for large
allocations in page_isolation.c using the memory-unplug technique.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
[m.nazarewicz: reworded commit message]
Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
[m.szyprowski: rebased and updated to Linux v3.0-rc1]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 include/linux/page-isolation.h |    7 +++
 mm/memory_hotplug.c            |  111 --------------------------------------
 mm/page_isolation.c            |  115 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 122 insertions(+), 111 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 051c1b1..58cdbac 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -33,5 +33,12 @@ test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
 extern int set_migratetype_isolate(struct page *page);
 extern void unset_migratetype_isolate(struct page *page);
 
+/*
+ * For migration.
+ */
+
+int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn);
+unsigned long scan_lru_pages(unsigned long start, unsigned long end);
+int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
 
 #endif
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c46887b..c32ca23 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -645,117 +645,6 @@ int is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
 }
 
 /*
- * Confirm all pages in a range [start, end) is belongs to the same zone.
- */
-static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
-{
-	unsigned long pfn;
-	struct zone *zone = NULL;
-	struct page *page;
-	int i;
-	for (pfn = start_pfn;
-	     pfn < end_pfn;
-	     pfn += MAX_ORDER_NR_PAGES) {
-		i = 0;
-		/* This is just a CONFIG_HOLES_IN_ZONE check.*/
-		while ((i < MAX_ORDER_NR_PAGES) && !pfn_valid_within(pfn + i))
-			i++;
-		if (i == MAX_ORDER_NR_PAGES)
-			continue;
-		page = pfn_to_page(pfn + i);
-		if (zone && page_zone(page) != zone)
-			return 0;
-		zone = page_zone(page);
-	}
-	return 1;
-}
-
-/*
- * Scanning pfn is much easier than scanning lru list.
- * Scan pfn from start to end and Find LRU page.
- */
-static unsigned long scan_lru_pages(unsigned long start, unsigned long end)
-{
-	unsigned long pfn;
-	struct page *page;
-	for (pfn = start; pfn < end; pfn++) {
-		if (pfn_valid(pfn)) {
-			page = pfn_to_page(pfn);
-			if (PageLRU(page))
-				return pfn;
-		}
-	}
-	return 0;
-}
-
-static struct page *
-hotremove_migrate_alloc(struct page *page, unsigned long private, int **x)
-{
-	/* This should be improooooved!! */
-	return alloc_page(GFP_HIGHUSER_MOVABLE);
-}
-
-#define NR_OFFLINE_AT_ONCE_PAGES	(256)
-static int
-do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
-{
-	unsigned long pfn;
-	struct page *page;
-	int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
-	int not_managed = 0;
-	int ret = 0;
-	LIST_HEAD(source);
-
-	for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
-		if (!pfn_valid(pfn))
-			continue;
-		page = pfn_to_page(pfn);
-		if (!get_page_unless_zero(page))
-			continue;
-		/*
-		 * We can skip free pages. And we can only deal with pages on
-		 * LRU.
-		 */
-		ret = isolate_lru_page(page);
-		if (!ret) { /* Success */
-			put_page(page);
-			list_add_tail(&page->lru, &source);
-			move_pages--;
-			inc_zone_page_state(page, NR_ISOLATED_ANON +
-					    page_is_file_cache(page));
-
-		} else {
-#ifdef CONFIG_DEBUG_VM
-			printk(KERN_ALERT "removing pfn %lx from LRU failed\n",
-			       pfn);
-			dump_page(page);
-#endif
-			put_page(page);
-			/* Because we don't have big zone->lock. we should
-			   check this again here. */
-			if (page_count(page)) {
-				not_managed++;
-				ret = -EBUSY;
-				break;
-			}
-		}
-	}
-	if (!list_empty(&source)) {
-		if (not_managed) {
-			putback_lru_pages(&source);
-			goto out;
-		}
-		/* this function returns # of failed pages */
-		ret = migrate_pages(&source, hotremove_migrate_alloc, 0,
-								true, true);
-		if (ret)
-			putback_lru_pages(&source);
-	}
-out:
-	return ret;
-}
-
-/*
  * remove from free_area[] and mark all as Reserved.
  */
 static int
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 4ae42bb..15b41ec 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -5,6 +5,9 @@
 #include <linux/mm.h>
 #include <linux/page-isolation.h>
 #include <linux/pageblock-flags.h>
+#include <linux/memcontrol.h>
+#include <linux/migrate.h>
+#include <linux/mm_inline.h>
 #include "internal.h"
 
 static inline struct page *
@@ -139,3 +142,115 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
 	spin_unlock_irqrestore(&zone->lock, flags);
 	return ret ? 0 : -EBUSY;
 }
+
+
+/*
+ * Confirm all pages in a range [start, end) is belongs to the same zone.
+ */
+int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long pfn;
+	struct zone *zone = NULL;
+	struct page *page;
+	int i;
+	for (pfn = start_pfn;
+	     pfn < end_pfn;
+	     pfn += MAX_ORDER_NR_PAGES) {
+		i = 0;
+		/* This is just a CONFIG_HOLES_IN_ZONE check.*/
+		while ((i < MAX_ORDER_NR_PAGES) && !pfn_valid_within(pfn + i))
+			i++;
+		if (i == MAX_ORDER_NR_PAGES)
+			continue;
+		page = pfn_to_page(pfn + i);
+		if (zone && page_zone(page) != zone)
+			return 0;
+		zone = page_zone(page);
+	}
+	return 1;
+}
+
+/*
+ * Scanning pfn is much easier than scanning lru list.
+ * Scan pfn from start to end and Find LRU page.
+ */
+unsigned long scan_lru_pages(unsigned long start, unsigned long end)
+{
+	unsigned long pfn;
+	struct page *page;
+	for (pfn = start; pfn < end; pfn++) {
+		if (pfn_valid(pfn)) {
+			page = pfn_to_page(pfn);
+			if (PageLRU(page))
+				return pfn;
+		}
+	}
+	return 0;
+}
+
+struct page *
+hotremove_migrate_alloc(struct page *page, unsigned long private, int **x)
+{
+	/* This should be improooooved!! */
+	return alloc_page(GFP_HIGHUSER_MOVABLE);
+}
+
+#define NR_OFFLINE_AT_ONCE_PAGES	(256)
+int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long pfn;
+	struct page *page;
+	int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
+	int not_managed = 0;
+	int ret = 0;
+	LIST_HEAD(source);
+
+	for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
+		if (!pfn_valid(pfn))
+			continue;
+		page = pfn_to_page(pfn);
+		if (!get_page_unless_zero(page))
+			continue;
+		/*
+		 * We can skip free pages. And we can only deal with pages on
+		 * LRU.
+		 */
+		ret = isolate_lru_page(page);
+		if (!ret) { /* Success */
+			put_page(page);
+			list_add_tail(&page->lru, &source);
+			move_pages--;
+			inc_zone_page_state(page, NR_ISOLATED_ANON +
+					    page_is_file_cache(page));
+
+		} else {
+#ifdef CONFIG_DEBUG_VM
+			printk(KERN_ALERT "removing pfn %lx from LRU failed\n",
+			       pfn);
+			dump_page(page);
+#endif
+			put_page(page);
+			/* Because we don't have big zone->lock. we should
+			   check this again here. */
+			if (page_count(page)) {
+				not_managed++;
+				ret = -EBUSY;
+				break;
+			}
+		}
+	}
+	if (!list_empty(&source)) {
+		if (not_managed) {
+			putback_lru_pages(&source);
+			goto out;
+		}
+		/* this function returns # of failed pages */
+		ret = migrate_pages(&source, hotremove_migrate_alloc, 0,
+								true, true);
+		if (ret)
+			putback_lru_pages(&source);
+	}
+out:
+	return ret;
+}
+
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 1/8] mm: move some functions from memory_hotplug.c to page_isolation.c
@ 2011-07-05  7:41   ` Marek Szyprowski
  0 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-07-05  7:41 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Chunsang Jeong

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Memory hotplug contains logic for making pages unused in a specified
range of pfns, so some of its core logic can be reused for other
purposes, such as allocating a very large contiguous memory block.

This patch moves some functions from mm/memory_hotplug.c to
mm/page_isolation.c. This helps with adding a function for large
allocations in page_isolation.c using the memory-unplug technique.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
[m.nazarewicz: reworded commit message]
Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
[m.szyprowski: rebased and updated to Linux v3.0-rc1]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 include/linux/page-isolation.h |    7 +++
 mm/memory_hotplug.c            |  111 --------------------------------------
 mm/page_isolation.c            |  115 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 122 insertions(+), 111 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 051c1b1..58cdbac 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -33,5 +33,12 @@ test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
 extern int set_migratetype_isolate(struct page *page);
 extern void unset_migratetype_isolate(struct page *page);
 
+/*
+ * For migration.
+ */
+
+int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn);
+unsigned long scan_lru_pages(unsigned long start, unsigned long end);
+int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
 
 #endif
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c46887b..c32ca23 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -645,117 +645,6 @@ int is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
 }
 
 /*
- * Confirm all pages in a range [start, end) is belongs to the same zone.
- */
-static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
-{
-	unsigned long pfn;
-	struct zone *zone = NULL;
-	struct page *page;
-	int i;
-	for (pfn = start_pfn;
-	     pfn < end_pfn;
-	     pfn += MAX_ORDER_NR_PAGES) {
-		i = 0;
-		/* This is just a CONFIG_HOLES_IN_ZONE check.*/
-		while ((i < MAX_ORDER_NR_PAGES) && !pfn_valid_within(pfn + i))
-			i++;
-		if (i == MAX_ORDER_NR_PAGES)
-			continue;
-		page = pfn_to_page(pfn + i);
-		if (zone && page_zone(page) != zone)
-			return 0;
-		zone = page_zone(page);
-	}
-	return 1;
-}
-
-/*
- * Scanning pfn is much easier than scanning lru list.
- * Scan pfn from start to end and Find LRU page.
- */
-static unsigned long scan_lru_pages(unsigned long start, unsigned long end)
-{
-	unsigned long pfn;
-	struct page *page;
-	for (pfn = start; pfn < end; pfn++) {
-		if (pfn_valid(pfn)) {
-			page = pfn_to_page(pfn);
-			if (PageLRU(page))
-				return pfn;
-		}
-	}
-	return 0;
-}
-
-static struct page *
-hotremove_migrate_alloc(struct page *page, unsigned long private, int **x)
-{
-	/* This should be improooooved!! */
-	return alloc_page(GFP_HIGHUSER_MOVABLE);
-}
-
-#define NR_OFFLINE_AT_ONCE_PAGES	(256)
-static int
-do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
-{
-	unsigned long pfn;
-	struct page *page;
-	int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
-	int not_managed = 0;
-	int ret = 0;
-	LIST_HEAD(source);
-
-	for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
-		if (!pfn_valid(pfn))
-			continue;
-		page = pfn_to_page(pfn);
-		if (!get_page_unless_zero(page))
-			continue;
-		/*
-		 * We can skip free pages. And we can only deal with pages on
-		 * LRU.
-		 */
-		ret = isolate_lru_page(page);
-		if (!ret) { /* Success */
-			put_page(page);
-			list_add_tail(&page->lru, &source);
-			move_pages--;
-			inc_zone_page_state(page, NR_ISOLATED_ANON +
-					    page_is_file_cache(page));
-
-		} else {
-#ifdef CONFIG_DEBUG_VM
-			printk(KERN_ALERT "removing pfn %lx from LRU failed\n",
-			       pfn);
-			dump_page(page);
-#endif
-			put_page(page);
-			/* Because we don't have big zone->lock. we should
-			   check this again here. */
-			if (page_count(page)) {
-				not_managed++;
-				ret = -EBUSY;
-				break;
-			}
-		}
-	}
-	if (!list_empty(&source)) {
-		if (not_managed) {
-			putback_lru_pages(&source);
-			goto out;
-		}
-		/* this function returns # of failed pages */
-		ret = migrate_pages(&source, hotremove_migrate_alloc, 0,
-								true, true);
-		if (ret)
-			putback_lru_pages(&source);
-	}
-out:
-	return ret;
-}
-
-/*
  * remove from free_area[] and mark all as Reserved.
  */
 static int
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 4ae42bb..15b41ec 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -5,6 +5,9 @@
 #include <linux/mm.h>
 #include <linux/page-isolation.h>
 #include <linux/pageblock-flags.h>
+#include <linux/memcontrol.h>
+#include <linux/migrate.h>
+#include <linux/mm_inline.h>
 #include "internal.h"
 
 static inline struct page *
@@ -139,3 +142,115 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
 	spin_unlock_irqrestore(&zone->lock, flags);
 	return ret ? 0 : -EBUSY;
 }
+
+
+/*
+ * Confirm that all pages in the range [start, end) belong to the same zone.
+ */
+int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long pfn;
+	struct zone *zone = NULL;
+	struct page *page;
+	int i;
+	for (pfn = start_pfn;
+	     pfn < end_pfn;
+	     pfn += MAX_ORDER_NR_PAGES) {
+		i = 0;
+		/* This is just a CONFIG_HOLES_IN_ZONE check.*/
+		while ((i < MAX_ORDER_NR_PAGES) && !pfn_valid_within(pfn + i))
+			i++;
+		if (i == MAX_ORDER_NR_PAGES)
+			continue;
+		page = pfn_to_page(pfn + i);
+		if (zone && page_zone(page) != zone)
+			return 0;
+		zone = page_zone(page);
+	}
+	return 1;
+}
+
+/*
+ * Scanning pfn is much easier than scanning lru list.
+ * Scan pfn from start to end and Find LRU page.
+ */
+unsigned long scan_lru_pages(unsigned long start, unsigned long end)
+{
+	unsigned long pfn;
+	struct page *page;
+	for (pfn = start; pfn < end; pfn++) {
+		if (pfn_valid(pfn)) {
+			page = pfn_to_page(pfn);
+			if (PageLRU(page))
+				return pfn;
+		}
+	}
+	return 0;
+}
+
+struct page *
+hotremove_migrate_alloc(struct page *page, unsigned long private, int **x)
+{
+	/* This should be improooooved!! */
+	return alloc_page(GFP_HIGHUSER_MOVABLE);
+}
+
+#define NR_OFFLINE_AT_ONCE_PAGES	(256)
+int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long pfn;
+	struct page *page;
+	int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
+	int not_managed = 0;
+	int ret = 0;
+	LIST_HEAD(source);
+
+	for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
+		if (!pfn_valid(pfn))
+			continue;
+		page = pfn_to_page(pfn);
+		if (!get_page_unless_zero(page))
+			continue;
+		/*
+		 * We can skip free pages. And we can only deal with pages on
+		 * LRU.
+		 */
+		ret = isolate_lru_page(page);
+		if (!ret) { /* Success */
+			put_page(page);
+			list_add_tail(&page->lru, &source);
+			move_pages--;
+			inc_zone_page_state(page, NR_ISOLATED_ANON +
+					    page_is_file_cache(page));
+
+		} else {
+#ifdef CONFIG_DEBUG_VM
+			printk(KERN_ALERT "removing pfn %lx from LRU failed\n",
+			       pfn);
+			dump_page(page);
+#endif
+			put_page(page);
+			/* Because we don't have big zone->lock. we should
+			   check this again here. */
+			if (page_count(page)) {
+				not_managed++;
+				ret = -EBUSY;
+				break;
+			}
+		}
+	}
+	if (!list_empty(&source)) {
+		if (not_managed) {
+			putback_lru_pages(&source);
+			goto out;
+		}
+		/* this function returns # of failed pages */
+		ret = migrate_pages(&source, hotremove_migrate_alloc, 0,
+								true, true);
+		if (ret)
+			putback_lru_pages(&source);
+	}
+out:
+	return ret;
+}
+
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 1/8] mm: move some functions from memory_hotplug.c to page_isolation.c
@ 2011-07-05  7:41   ` Marek Szyprowski
  0 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-07-05  7:41 UTC (permalink / raw)
  To: linux-arm-kernel

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Memory hotplug contains logic for making pages unused in a specified
range of pfns.  Some of this core logic can be reused for other
purposes, such as allocating a very large contiguous memory block.

This patch moves some functions from mm/memory_hotplug.c to
mm/page_isolation.c.  This makes it easier to add a large-allocation
function to page_isolation.c that reuses the memory-unplug technique.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
[m.nazarewicz: reworded commit message]
Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
[m.szyprowski: rebased and updated to Linux v3.0-rc1]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 include/linux/page-isolation.h |    7 +++
 mm/memory_hotplug.c            |  111 --------------------------------------
 mm/page_isolation.c            |  115 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 122 insertions(+), 111 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 051c1b1..58cdbac 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -33,5 +33,12 @@ test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
 extern int set_migratetype_isolate(struct page *page);
 extern void unset_migratetype_isolate(struct page *page);
 
+/*
+ * For migration.
+ */
+
+int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn);
+unsigned long scan_lru_pages(unsigned long start, unsigned long end);
+int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
 
 #endif
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c46887b..c32ca23 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -645,117 +645,6 @@ int is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
 }
 
 /*
- * Confirm all pages in a range [start, end) is belongs to the same zone.
- */
-static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
-{
-	unsigned long pfn;
-	struct zone *zone = NULL;
-	struct page *page;
-	int i;
-	for (pfn = start_pfn;
-	     pfn < end_pfn;
-	     pfn += MAX_ORDER_NR_PAGES) {
-		i = 0;
-		/* This is just a CONFIG_HOLES_IN_ZONE check.*/
-		while ((i < MAX_ORDER_NR_PAGES) && !pfn_valid_within(pfn + i))
-			i++;
-		if (i == MAX_ORDER_NR_PAGES)
-			continue;
-		page = pfn_to_page(pfn + i);
-		if (zone && page_zone(page) != zone)
-			return 0;
-		zone = page_zone(page);
-	}
-	return 1;
-}
-
-/*
- * Scanning pfn is much easier than scanning lru list.
- * Scan pfn from start to end and Find LRU page.
- */
-static unsigned long scan_lru_pages(unsigned long start, unsigned long end)
-{
-	unsigned long pfn;
-	struct page *page;
-	for (pfn = start; pfn < end; pfn++) {
-		if (pfn_valid(pfn)) {
-			page = pfn_to_page(pfn);
-			if (PageLRU(page))
-				return pfn;
-		}
-	}
-	return 0;
-}
-
-static struct page *
-hotremove_migrate_alloc(struct page *page, unsigned long private, int **x)
-{
-	/* This should be improooooved!! */
-	return alloc_page(GFP_HIGHUSER_MOVABLE);
-}
-
-#define NR_OFFLINE_AT_ONCE_PAGES	(256)
-static int
-do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
-{
-	unsigned long pfn;
-	struct page *page;
-	int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
-	int not_managed = 0;
-	int ret = 0;
-	LIST_HEAD(source);
-
-	for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
-		if (!pfn_valid(pfn))
-			continue;
-		page = pfn_to_page(pfn);
-		if (!get_page_unless_zero(page))
-			continue;
-		/*
-		 * We can skip free pages. And we can only deal with pages on
-		 * LRU.
-		 */
-		ret = isolate_lru_page(page);
-		if (!ret) { /* Success */
-			put_page(page);
-			list_add_tail(&page->lru, &source);
-			move_pages--;
-			inc_zone_page_state(page, NR_ISOLATED_ANON +
-					    page_is_file_cache(page));
-
-		} else {
-#ifdef CONFIG_DEBUG_VM
-			printk(KERN_ALERT "removing pfn %lx from LRU failed\n",
-			       pfn);
-			dump_page(page);
-#endif
-			put_page(page);
-			/* Because we don't have big zone->lock. we should
-			   check this again here. */
-			if (page_count(page)) {
-				not_managed++;
-				ret = -EBUSY;
-				break;
-			}
-		}
-	}
-	if (!list_empty(&source)) {
-		if (not_managed) {
-			putback_lru_pages(&source);
-			goto out;
-		}
-		/* this function returns # of failed pages */
-		ret = migrate_pages(&source, hotremove_migrate_alloc, 0,
-								true, true);
-		if (ret)
-			putback_lru_pages(&source);
-	}
-out:
-	return ret;
-}
-
-/*
  * remove from free_area[] and mark all as Reserved.
  */
 static int
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 4ae42bb..15b41ec 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -5,6 +5,9 @@
 #include <linux/mm.h>
 #include <linux/page-isolation.h>
 #include <linux/pageblock-flags.h>
+#include <linux/memcontrol.h>
+#include <linux/migrate.h>
+#include <linux/mm_inline.h>
 #include "internal.h"
 
 static inline struct page *
@@ -139,3 +142,115 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
 	spin_unlock_irqrestore(&zone->lock, flags);
 	return ret ? 0 : -EBUSY;
 }
+
+
+/*
+ * Confirm that all pages in the range [start, end) belong to the same zone.
+ */
+int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long pfn;
+	struct zone *zone = NULL;
+	struct page *page;
+	int i;
+	for (pfn = start_pfn;
+	     pfn < end_pfn;
+	     pfn += MAX_ORDER_NR_PAGES) {
+		i = 0;
+		/* This is just a CONFIG_HOLES_IN_ZONE check.*/
+		while ((i < MAX_ORDER_NR_PAGES) && !pfn_valid_within(pfn + i))
+			i++;
+		if (i == MAX_ORDER_NR_PAGES)
+			continue;
+		page = pfn_to_page(pfn + i);
+		if (zone && page_zone(page) != zone)
+			return 0;
+		zone = page_zone(page);
+	}
+	return 1;
+}
+
+/*
+ * Scanning pfn is much easier than scanning lru list.
+ * Scan pfn from start to end and Find LRU page.
+ */
+unsigned long scan_lru_pages(unsigned long start, unsigned long end)
+{
+	unsigned long pfn;
+	struct page *page;
+	for (pfn = start; pfn < end; pfn++) {
+		if (pfn_valid(pfn)) {
+			page = pfn_to_page(pfn);
+			if (PageLRU(page))
+				return pfn;
+		}
+	}
+	return 0;
+}
+
+struct page *
+hotremove_migrate_alloc(struct page *page, unsigned long private, int **x)
+{
+	/* This should be improooooved!! */
+	return alloc_page(GFP_HIGHUSER_MOVABLE);
+}
+
+#define NR_OFFLINE_AT_ONCE_PAGES	(256)
+int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long pfn;
+	struct page *page;
+	int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
+	int not_managed = 0;
+	int ret = 0;
+	LIST_HEAD(source);
+
+	for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
+		if (!pfn_valid(pfn))
+			continue;
+		page = pfn_to_page(pfn);
+		if (!get_page_unless_zero(page))
+			continue;
+		/*
+		 * We can skip free pages. And we can only deal with pages on
+		 * LRU.
+		 */
+		ret = isolate_lru_page(page);
+		if (!ret) { /* Success */
+			put_page(page);
+			list_add_tail(&page->lru, &source);
+			move_pages--;
+			inc_zone_page_state(page, NR_ISOLATED_ANON +
+					    page_is_file_cache(page));
+
+		} else {
+#ifdef CONFIG_DEBUG_VM
+			printk(KERN_ALERT "removing pfn %lx from LRU failed\n",
+			       pfn);
+			dump_page(page);
+#endif
+			put_page(page);
+			/* Because we don't have big zone->lock. we should
+			   check this again here. */
+			if (page_count(page)) {
+				not_managed++;
+				ret = -EBUSY;
+				break;
+			}
+		}
+	}
+	if (!list_empty(&source)) {
+		if (not_managed) {
+			putback_lru_pages(&source);
+			goto out;
+		}
+		/* this function returns # of failed pages */
+		ret = migrate_pages(&source, hotremove_migrate_alloc, 0,
+								true, true);
+		if (ret)
+			putback_lru_pages(&source);
+	}
+out:
+	return ret;
+}
+
-- 
1.7.1.569.g6f426

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 2/8] mm: alloc_contig_freed_pages() added
  2011-07-05  7:41 ` Marek Szyprowski
  (?)
@ 2011-07-05  7:41   ` Marek Szyprowski
  -1 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-07-05  7:41 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Chunsang Jeong

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

This commit introduces the alloc_contig_freed_pages() function,
which allocates (i.e. removes from the buddy system) free pages
in a range.  The caller has to guarantee that all pages in the
range are in the buddy system.

Along with this function, a free_contig_pages() function is
provided which frees all (or a subset of) pages allocated
with alloc_contig_freed_pages().

Michal Nazarewicz has modified the function to make it easier
to allocate ranges that are not MAX_ORDER_NR_PAGES aligned by
making it return the pfn of one-past-the-last allocated page.
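
As a rough usage sketch (not part of this patch): assuming the caller has
already made every page in [start, end) free in the buddy system, with
'start' at the head of a buddy block, it could take the range and trim the
unwanted tail like this (grab_free_range() and GFP_KERNEL are only examples):

	#include <linux/gfp.h>
	#include <linux/mm.h>
	#include <linux/page-isolation.h>

	static unsigned long grab_free_range(unsigned long start, unsigned long end)
	{
		/* May allocate past 'end', up to the next buddy-block boundary. */
		unsigned long outer_end = alloc_contig_freed_pages(start, end,
								   GFP_KERNEL);

		/* Return the unwanted tail to the buddy system. */
		if (outer_end != end)
			free_contig_pages(pfn_to_page(end), outer_end - end);

		return end - start;	/* pages now owned by the caller */
	}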

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 include/linux/page-isolation.h |    3 ++
 mm/page_alloc.c                |   44 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 58cdbac..f1417ed 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -32,6 +32,9 @@ test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
  */
 extern int set_migratetype_isolate(struct page *page);
 extern void unset_migratetype_isolate(struct page *page);
+extern unsigned long alloc_contig_freed_pages(unsigned long start,
+					      unsigned long end, gfp_t flag);
+extern void free_contig_pages(struct page *page, int nr_pages);
 
 /*
  * For migration.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4e8985a..00e9b24 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5600,6 +5600,50 @@ out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
 
+unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
+				       gfp_t flag)
+{
+	unsigned long pfn = start, count;
+	struct page *page;
+	struct zone *zone;
+	int order;
+
+	VM_BUG_ON(!pfn_valid(start));
+	zone = page_zone(pfn_to_page(start));
+
+	spin_lock_irq(&zone->lock);
+
+	page = pfn_to_page(pfn);
+	for (;;) {
+		VM_BUG_ON(page_count(page) || !PageBuddy(page));
+		list_del(&page->lru);
+		order = page_order(page);
+		zone->free_area[order].nr_free--;
+		rmv_page_order(page);
+		__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
+		pfn  += 1 << order;
+		if (pfn >= end)
+			break;
+		VM_BUG_ON(!pfn_valid(pfn));
+		page += 1 << order;
+	}
+
+	spin_unlock_irq(&zone->lock);
+
+	/* After this, pages in the range can be freed one by one */
+	page = pfn_to_page(start);
+	for (count = pfn - start; count; --count, ++page)
+		prep_new_page(page, 0, flag);
+
+	return pfn;
+}
+
+void free_contig_pages(struct page *page, int nr_pages)
+{
+	for (; nr_pages; --nr_pages, ++page)
+		__free_page(page);
+}
+
 #ifdef CONFIG_MEMORY_HOTREMOVE
 /*
  * All pages in the range must be isolated before calling this.
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 2/8] mm: alloc_contig_freed_pages() added
@ 2011-07-05  7:41   ` Marek Szyprowski
  0 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-07-05  7:41 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Chunsang Jeong

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

This commit introduces the alloc_contig_freed_pages() function,
which allocates (i.e. removes from the buddy system) free pages
in a range.  The caller has to guarantee that all pages in the
range are in the buddy system.

Along with this function, a free_contig_pages() function is
provided which frees all (or a subset of) pages allocated
with alloc_contig_freed_pages().

Michal Nazarewicz has modified the function to make it easier
to allocate ranges that are not MAX_ORDER_NR_PAGES aligned by
making it return the pfn of one-past-the-last allocated page.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 include/linux/page-isolation.h |    3 ++
 mm/page_alloc.c                |   44 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 58cdbac..f1417ed 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -32,6 +32,9 @@ test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
  */
 extern int set_migratetype_isolate(struct page *page);
 extern void unset_migratetype_isolate(struct page *page);
+extern unsigned long alloc_contig_freed_pages(unsigned long start,
+					      unsigned long end, gfp_t flag);
+extern void free_contig_pages(struct page *page, int nr_pages);
 
 /*
  * For migration.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4e8985a..00e9b24 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5600,6 +5600,50 @@ out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
 
+unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
+				       gfp_t flag)
+{
+	unsigned long pfn = start, count;
+	struct page *page;
+	struct zone *zone;
+	int order;
+
+	VM_BUG_ON(!pfn_valid(start));
+	zone = page_zone(pfn_to_page(start));
+
+	spin_lock_irq(&zone->lock);
+
+	page = pfn_to_page(pfn);
+	for (;;) {
+		VM_BUG_ON(page_count(page) || !PageBuddy(page));
+		list_del(&page->lru);
+		order = page_order(page);
+		zone->free_area[order].nr_free--;
+		rmv_page_order(page);
+		__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
+		pfn  += 1 << order;
+		if (pfn >= end)
+			break;
+		VM_BUG_ON(!pfn_valid(pfn));
+		page += 1 << order;
+	}
+
+	spin_unlock_irq(&zone->lock);
+
+	/* After this, pages in the range can be freed one by one */
+	page = pfn_to_page(start);
+	for (count = pfn - start; count; --count, ++page)
+		prep_new_page(page, 0, flag);
+
+	return pfn;
+}
+
+void free_contig_pages(struct page *page, int nr_pages)
+{
+	for (; nr_pages; --nr_pages, ++page)
+		__free_page(page);
+}
+
 #ifdef CONFIG_MEMORY_HOTREMOVE
 /*
  * All pages in the range must be isolated before calling this.
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 2/8] mm: alloc_contig_freed_pages() added
@ 2011-07-05  7:41   ` Marek Szyprowski
  0 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-07-05  7:41 UTC (permalink / raw)
  To: linux-arm-kernel

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

This commit introduces the alloc_contig_freed_pages() function,
which allocates (i.e. removes from the buddy system) free pages
in a range.  The caller has to guarantee that all pages in the
range are in the buddy system.

Along with this function, a free_contig_pages() function is
provided which frees all (or a subset of) pages allocated
with alloc_contig_freed_pages().

Michal Nazarewicz has modified the function to make it easier
to allocate ranges that are not MAX_ORDER_NR_PAGES aligned by
making it return the pfn of one-past-the-last allocated page.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 include/linux/page-isolation.h |    3 ++
 mm/page_alloc.c                |   44 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 58cdbac..f1417ed 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -32,6 +32,9 @@ test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
  */
 extern int set_migratetype_isolate(struct page *page);
 extern void unset_migratetype_isolate(struct page *page);
+extern unsigned long alloc_contig_freed_pages(unsigned long start,
+					      unsigned long end, gfp_t flag);
+extern void free_contig_pages(struct page *page, int nr_pages);
 
 /*
  * For migration.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4e8985a..00e9b24 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5600,6 +5600,50 @@ out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
 
+unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
+				       gfp_t flag)
+{
+	unsigned long pfn = start, count;
+	struct page *page;
+	struct zone *zone;
+	int order;
+
+	VM_BUG_ON(!pfn_valid(start));
+	zone = page_zone(pfn_to_page(start));
+
+	spin_lock_irq(&zone->lock);
+
+	page = pfn_to_page(pfn);
+	for (;;) {
+		VM_BUG_ON(page_count(page) || !PageBuddy(page));
+		list_del(&page->lru);
+		order = page_order(page);
+		zone->free_area[order].nr_free--;
+		rmv_page_order(page);
+		__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
+		pfn  += 1 << order;
+		if (pfn >= end)
+			break;
+		VM_BUG_ON(!pfn_valid(pfn));
+		page += 1 << order;
+	}
+
+	spin_unlock_irq(&zone->lock);
+
+	/* After this, pages in the range can be freed one by one */
+	page = pfn_to_page(start);
+	for (count = pfn - start; count; --count, ++page)
+		prep_new_page(page, 0, flag);
+
+	return pfn;
+}
+
+void free_contig_pages(struct page *page, int nr_pages)
+{
+	for (; nr_pages; --nr_pages, ++page)
+		__free_page(page);
+}
+
 #ifdef CONFIG_MEMORY_HOTREMOVE
 /*
  * All pages in the range must be isolated before calling this.
-- 
1.7.1.569.g6f426

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 3/8] mm: alloc_contig_range() added
  2011-07-05  7:41 ` Marek Szyprowski
  (?)
@ 2011-07-05  7:41   ` Marek Szyprowski
  -1 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-07-05  7:41 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Chunsang Jeong

From: Michal Nazarewicz <m.nazarewicz@samsung.com>

This commit adds the alloc_contig_range() function, which tries
to allocate a given range of pages.  It tries to migrate all
already-allocated pages that fall in the range, thus freeing them.
Once all pages in the range are freed, they are removed from the
buddy system and are thus allocated for the caller to use.
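
A rough, driver-style sketch (not part of this patch) of the intended calling
convention; grab_contig() is a made-up helper and GFP_KERNEL is only an
example flag:

	#include <linux/gfp.h>
	#include <linux/mm.h>
	#include <linux/page-isolation.h>

	/* Try to take nr_pages physically contiguous pages starting at base_pfn. */
	static struct page *grab_contig(unsigned long base_pfn, unsigned long nr_pages)
	{
		if (alloc_contig_range(base_pfn, base_pfn + nr_pages, GFP_KERNEL))
			return NULL;
		return pfn_to_page(base_pfn);
	}

	/* Release the range later with: free_contig_pages(page, nr_pages); */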

Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
[m.szyprowski: renamed some variables for easier code reading]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 include/linux/page-isolation.h |    2 +
 mm/page_alloc.c                |  144 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 146 insertions(+), 0 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index f1417ed..c5d1a7c 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -34,6 +34,8 @@ extern int set_migratetype_isolate(struct page *page);
 extern void unset_migratetype_isolate(struct page *page);
 extern unsigned long alloc_contig_freed_pages(unsigned long start,
 					      unsigned long end, gfp_t flag);
+extern int alloc_contig_range(unsigned long start, unsigned long end,
+			      gfp_t flags);
 extern void free_contig_pages(struct page *page, int nr_pages);
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 00e9b24..2cea044 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5638,6 +5638,150 @@ unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
 	return pfn;
 }
 
+static unsigned long pfn_to_maxpage(unsigned long pfn)
+{
+	return pfn & ~(MAX_ORDER_NR_PAGES - 1);
+}
+
+static unsigned long pfn_to_maxpage_up(unsigned long pfn)
+{
+	return ALIGN(pfn, MAX_ORDER_NR_PAGES);
+}
+
+#define MIGRATION_RETRY	5
+static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
+{
+	int migration_failed = 0, ret;
+	unsigned long pfn = start;
+
+	/*
+	 * Some code "borrowed" from KAMEZAWA Hiroyuki's
+	 * __alloc_contig_pages().
+	 */
+
+	for (;;) {
+		pfn = scan_lru_pages(pfn, end);
+		if (!pfn || pfn >= end)
+			break;
+
+		ret = do_migrate_range(pfn, end);
+		if (!ret) {
+			migration_failed = 0;
+		} else if (ret != -EBUSY
+			|| ++migration_failed >= MIGRATION_RETRY) {
+			return ret;
+		} else {
+			/* There are unstable pages on pagevec. */
+			lru_add_drain_all();
+			/*
+			 * there may be pages on pcplist before
+			 * we mark the range as ISOLATED.
+			 */
+			drain_all_pages();
+		}
+		cond_resched();
+	}
+
+	if (!migration_failed) {
+		/* drop all pages in pagevec and pcp list */
+		lru_add_drain_all();
+		drain_all_pages();
+	}
+
+	/* Make sure all pages are isolated */
+	if (WARN_ON(test_pages_isolated(start, end)))
+		return -EBUSY;
+
+	return 0;
+}
+
+/**
+ * alloc_contig_range() -- tries to allocate given range of pages
+ * @start:	start PFN to allocate
+ * @end:	one-past-the-last PFN to allocate
+ * @flags:	flags passed to alloc_contig_freed_pages().
+ *
+ * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
+ * aligned, however it is the caller's responsibility to guarantee that we
+ * are the only thread that changes migrate type of pageblocks the
+ * pages fall in.
+ *
+ * Returns zero on success or negative error code.  On success all
+ * pages which PFN is in (start, end) are allocated for the caller and
+ * need to be freed with free_contig_pages().
+ */
+int alloc_contig_range(unsigned long start, unsigned long end,
+		       gfp_t flags)
+{
+	unsigned long outer_start, outer_end;
+	int ret;
+
+	/*
+	 * What we do here is we mark all pageblocks in range as
+	 * MIGRATE_ISOLATE.  Because of the way page allocator work, we
+	 * align the range to MAX_ORDER pages so that page allocator
+	 * won't try to merge buddies from different pageblocks and
+	 * change MIGRATE_ISOLATE to some other migration type.
+	 *
+	 * Once the pageblocks are marked as MIGRATE_ISOLATE, we
+	 * migrate the pages from an unaligned range (ie. pages that
+	 * we are interested in).  This will put all the pages in
+	 * range back to page allocator as MIGRATE_ISOLATE.
+	 *
+	 * When this is done, we take the pages in range from page
+	 * allocator removing them from the buddy system.  This way
+	 * page allocator will never consider using them.
+	 *
+	 * This lets us mark the pageblocks back as
+	 * MIGRATE_CMA/MIGRATE_MOVABLE so that free pages in the
+	 * MAX_ORDER aligned range but not in the unaligned, original
+	 * range are put back to page allocator so that buddy can use
+	 * them.
+	 */
+
+	ret = start_isolate_page_range(pfn_to_maxpage(start),
+				       pfn_to_maxpage_up(end));
+	if (ret)
+		goto done;
+
+	ret = __alloc_contig_migrate_range(start, end);
+	if (ret)
+		goto done;
+
+	/*
+	 * Pages from [start, end) are within a MAX_ORDER_NR_PAGES
+	 * aligned blocks that are marked as MIGRATE_ISOLATE.  What's
+	 * more, all pages in [start, end) are free in page allocator.
+	 * What we are going to do is to allocate all pages from
+	 * [start, end) (that is, remove them from the page allocator).
+	 *
+	 * The only problem is that pages at the beginning and at the
+	 * end of interesting range may be not aligned with pages that
+	 * page allocator holds, ie. they can be part of higher order
+	 * pages.  Because of this, we reserve the bigger range and
+	 * once this is done free the pages we are not interested in.
+	 */
+
+	ret = 0;
+	while (!PageBuddy(pfn_to_page(start & (~0UL << ret))))
+		if (WARN_ON(++ret >= MAX_ORDER))
+			return -EINVAL;
+
+	outer_start = start & (~0UL << ret);
+	outer_end   = alloc_contig_freed_pages(outer_start, end, flags);
+
+	/* Free head and tail (if any) */
+	if (start != outer_start)
+		free_contig_pages(pfn_to_page(outer_start), start - outer_start);
+	if (end != outer_end)
+		free_contig_pages(pfn_to_page(end), outer_end - end);
+
+	ret = 0;
+done:
+	undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end));
+	return ret;
+}
+
 void free_contig_pages(struct page *page, int nr_pages)
 {
 	for (; nr_pages; --nr_pages, ++page)
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 3/8] mm: alloc_contig_range() added
@ 2011-07-05  7:41   ` Marek Szyprowski
  0 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-07-05  7:41 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Chunsang Jeong

From: Michal Nazarewicz <m.nazarewicz@samsung.com>

This commit adds the alloc_contig_range() function, which tries
to allocate a given range of pages.  It tries to migrate all
already-allocated pages that fall in the range, thus freeing them.
Once all pages in the range are freed, they are removed from the
buddy system and are thus allocated for the caller to use.

Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
[m.szyprowski: renamed some variables for easier code reading]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 include/linux/page-isolation.h |    2 +
 mm/page_alloc.c                |  144 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 146 insertions(+), 0 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index f1417ed..c5d1a7c 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -34,6 +34,8 @@ extern int set_migratetype_isolate(struct page *page);
 extern void unset_migratetype_isolate(struct page *page);
 extern unsigned long alloc_contig_freed_pages(unsigned long start,
 					      unsigned long end, gfp_t flag);
+extern int alloc_contig_range(unsigned long start, unsigned long end,
+			      gfp_t flags);
 extern void free_contig_pages(struct page *page, int nr_pages);
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 00e9b24..2cea044 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5638,6 +5638,150 @@ unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
 	return pfn;
 }
 
+static unsigned long pfn_to_maxpage(unsigned long pfn)
+{
+	return pfn & ~(MAX_ORDER_NR_PAGES - 1);
+}
+
+static unsigned long pfn_to_maxpage_up(unsigned long pfn)
+{
+	return ALIGN(pfn, MAX_ORDER_NR_PAGES);
+}
+
+#define MIGRATION_RETRY	5
+static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
+{
+	int migration_failed = 0, ret;
+	unsigned long pfn = start;
+
+	/*
+	 * Some code "borrowed" from KAMEZAWA Hiroyuki's
+	 * __alloc_contig_pages().
+	 */
+
+	for (;;) {
+		pfn = scan_lru_pages(pfn, end);
+		if (!pfn || pfn >= end)
+			break;
+
+		ret = do_migrate_range(pfn, end);
+		if (!ret) {
+			migration_failed = 0;
+		} else if (ret != -EBUSY
+			|| ++migration_failed >= MIGRATION_RETRY) {
+			return ret;
+		} else {
+			/* There are unstable pages on pagevec. */
+			lru_add_drain_all();
+			/*
+			 * there may be pages on pcplist before
+			 * we mark the range as ISOLATED.
+			 */
+			drain_all_pages();
+		}
+		cond_resched();
+	}
+
+	if (!migration_failed) {
+		/* drop all pages in pagevec and pcp list */
+		lru_add_drain_all();
+		drain_all_pages();
+	}
+
+	/* Make sure all pages are isolated */
+	if (WARN_ON(test_pages_isolated(start, end)))
+		return -EBUSY;
+
+	return 0;
+}
+
+/**
+ * alloc_contig_range() -- tries to allocate given range of pages
+ * @start:	start PFN to allocate
+ * @end:	one-past-the-last PFN to allocate
+ * @flags:	flags passed to alloc_contig_freed_pages().
+ *
+ * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
+ * aligned, however it is the caller's responsibility to guarantee that we
+ * are the only thread that changes migrate type of pageblocks the
+ * pages fall in.
+ *
+ * Returns zero on success or negative error code.  On success all
+ * pages which PFN is in (start, end) are allocated for the caller and
+ * need to be freed with free_contig_pages().
+ */
+int alloc_contig_range(unsigned long start, unsigned long end,
+		       gfp_t flags)
+{
+	unsigned long outer_start, outer_end;
+	int ret;
+
+	/*
+	 * What we do here is we mark all pageblocks in range as
+	 * MIGRATE_ISOLATE.  Because of the way page allocator work, we
+	 * align the range to MAX_ORDER pages so that page allocator
+	 * won't try to merge buddies from different pageblocks and
+	 * change MIGRATE_ISOLATE to some other migration type.
+	 *
+	 * Once the pageblocks are marked as MIGRATE_ISOLATE, we
+	 * migrate the pages from an unaligned range (ie. pages that
+	 * we are interested in).  This will put all the pages in
+	 * range back to page allocator as MIGRATE_ISOLATE.
+	 *
+	 * When this is done, we take the pages in range from page
+	 * allocator removing them from the buddy system.  This way
+	 * page allocator will never consider using them.
+	 *
+	 * This lets us mark the pageblocks back as
+	 * MIGRATE_CMA/MIGRATE_MOVABLE so that free pages in the
+	 * MAX_ORDER aligned range but not in the unaligned, original
+	 * range are put back to page allocator so that buddy can use
+	 * them.
+	 */
+
+	ret = start_isolate_page_range(pfn_to_maxpage(start),
+				       pfn_to_maxpage_up(end));
+	if (ret)
+		goto done;
+
+	ret = __alloc_contig_migrate_range(start, end);
+	if (ret)
+		goto done;
+
+	/*
+	 * Pages from [start, end) are within a MAX_ORDER_NR_PAGES
+	 * aligned blocks that are marked as MIGRATE_ISOLATE.  What's
+	 * more, all pages in [start, end) are free in page allocator.
+	 * What we are going to do is to allocate all pages from
+	 * [start, end) (that is, remove them from the page allocator).
+	 *
+	 * The only problem is that pages at the beginning and at the
+	 * end of interesting range may be not aligned with pages that
+	 * page allocator holds, ie. they can be part of higher order
+	 * pages.  Because of this, we reserve the bigger range and
+	 * once this is done free the pages we are not interested in.
+	 */
+
+	ret = 0;
+	while (!PageBuddy(pfn_to_page(start & (~0UL << ret))))
+		if (WARN_ON(++ret >= MAX_ORDER))
+			return -EINVAL;
+
+	outer_start = start & (~0UL << ret);
+	outer_end   = alloc_contig_freed_pages(outer_start, end, flags);
+
+	/* Free head and tail (if any) */
+	if (start != outer_start)
+		free_contig_pages(pfn_to_page(outer_start), start - outer_start);
+	if (end != outer_end)
+		free_contig_pages(pfn_to_page(end), outer_end - end);
+
+	ret = 0;
+done:
+	undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end));
+	return ret;
+}
+
 void free_contig_pages(struct page *page, int nr_pages)
 {
 	for (; nr_pages; --nr_pages, ++page)
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 3/8] mm: alloc_contig_range() added
@ 2011-07-05  7:41   ` Marek Szyprowski
  0 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-07-05  7:41 UTC (permalink / raw)
  To: linux-arm-kernel

From: Michal Nazarewicz <m.nazarewicz@samsung.com>

This commit adds the alloc_contig_range() function, which tries
to allocate a given range of pages.  It tries to migrate all
already-allocated pages that fall in the range, thus freeing them.
Once all pages in the range are freed, they are removed from the
buddy system and are thus allocated for the caller to use.

Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
[m.szyprowski: renamed some variables for easier code reading]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 include/linux/page-isolation.h |    2 +
 mm/page_alloc.c                |  144 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 146 insertions(+), 0 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index f1417ed..c5d1a7c 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -34,6 +34,8 @@ extern int set_migratetype_isolate(struct page *page);
 extern void unset_migratetype_isolate(struct page *page);
 extern unsigned long alloc_contig_freed_pages(unsigned long start,
 					      unsigned long end, gfp_t flag);
+extern int alloc_contig_range(unsigned long start, unsigned long end,
+			      gfp_t flags);
 extern void free_contig_pages(struct page *page, int nr_pages);
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 00e9b24..2cea044 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5638,6 +5638,150 @@ unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
 	return pfn;
 }
 
+static unsigned long pfn_to_maxpage(unsigned long pfn)
+{
+	return pfn & ~(MAX_ORDER_NR_PAGES - 1);
+}
+
+static unsigned long pfn_to_maxpage_up(unsigned long pfn)
+{
+	return ALIGN(pfn, MAX_ORDER_NR_PAGES);
+}
+
+#define MIGRATION_RETRY	5
+static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
+{
+	int migration_failed = 0, ret;
+	unsigned long pfn = start;
+
+	/*
+	 * Some code "borrowed" from KAMEZAWA Hiroyuki's
+	 * __alloc_contig_pages().
+	 */
+
+	for (;;) {
+		pfn = scan_lru_pages(pfn, end);
+		if (!pfn || pfn >= end)
+			break;
+
+		ret = do_migrate_range(pfn, end);
+		if (!ret) {
+			migration_failed = 0;
+		} else if (ret != -EBUSY
+			|| ++migration_failed >= MIGRATION_RETRY) {
+			return ret;
+		} else {
+			/* There are unstable pages on pagevec. */
+			lru_add_drain_all();
+			/*
+			 * there may be pages on pcplist before
+			 * we mark the range as ISOLATED.
+			 */
+			drain_all_pages();
+		}
+		cond_resched();
+	}
+
+	if (!migration_failed) {
+		/* drop all pages in pagevec and pcp list */
+		lru_add_drain_all();
+		drain_all_pages();
+	}
+
+	/* Make sure all pages are isolated */
+	if (WARN_ON(test_pages_isolated(start, end)))
+		return -EBUSY;
+
+	return 0;
+}
+
+/**
+ * alloc_contig_range() -- tries to allocate given range of pages
+ * @start:	start PFN to allocate
+ * @end:	one-past-the-last PFN to allocate
+ * @flags:	flags passed to alloc_contig_freed_pages().
+ *
+ * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
+ * aligned, however it is the caller's responsibility to guarantee that we
+ * are the only thread that changes migrate type of pageblocks the
+ * pages fall in.
+ *
+ * Returns zero on success or negative error code.  On success all
+ * pages which PFN is in (start, end) are allocated for the caller and
+ * need to be freed with free_contig_pages().
+ */
+int alloc_contig_range(unsigned long start, unsigned long end,
+		       gfp_t flags)
+{
+	unsigned long outer_start, outer_end;
+	int ret;
+
+	/*
+	 * What we do here is we mark all pageblocks in range as
+	 * MIGRATE_ISOLATE.  Because of the way page allocator work, we
+	 * align the range to MAX_ORDER pages so that page allocator
+	 * won't try to merge buddies from different pageblocks and
+	 * change MIGRATE_ISOLATE to some other migration type.
+	 *
+	 * Once the pageblocks are marked as MIGRATE_ISOLATE, we
+	 * migrate the pages from an unaligned range (ie. pages that
+	 * we are interested in).  This will put all the pages in
+	 * range back to page allocator as MIGRATE_ISOLATE.
+	 *
+	 * When this is done, we take the pages in range from page
+	 * allocator removing them from the buddy system.  This way
+	 * page allocator will never consider using them.
+	 *
+	 * This lets us mark the pageblocks back as
+	 * MIGRATE_CMA/MIGRATE_MOVABLE so that free pages in the
+	 * MAX_ORDER aligned range but not in the unaligned, original
+	 * range are put back to page allocator so that buddy can use
+	 * them.
+	 */
+
+	ret = start_isolate_page_range(pfn_to_maxpage(start),
+				       pfn_to_maxpage_up(end));
+	if (ret)
+		goto done;
+
+	ret = __alloc_contig_migrate_range(start, end);
+	if (ret)
+		goto done;
+
+	/*
+	 * Pages from [start, end) are within a MAX_ORDER_NR_PAGES
+	 * aligned blocks that are marked as MIGRATE_ISOLATE.  What's
+	 * more, all pages in [start, end) are free in page allocator.
+	 * What we are going to do is to allocate all pages from
+	 * [start, end) (that is, remove them from the page allocator).
+	 *
+	 * The only problem is that pages at the beginning and at the
+	 * end of interesting range may be not aligned with pages that
+	 * page allocator holds, ie. they can be part of higher order
+	 * pages.  Because of this, we reserve the bigger range and
+	 * once this is done free the pages we are not interested in.
+	 */
+
+	ret = 0;
+	while (!PageBuddy(pfn_to_page(start & (~0UL << ret))))
+		if (WARN_ON(++ret >= MAX_ORDER))
+			return -EINVAL;
+
+	outer_start = start & (~0UL << ret);
+	outer_end   = alloc_contig_freed_pages(outer_start, end, flags);
+
+	/* Free head and tail (if any) */
+	if (start != outer_start)
+		free_contig_pages(pfn_to_page(outer_start), start - outer_start);
+	if (end != outer_end)
+		free_contig_pages(pfn_to_page(end), outer_end - end);
+
+	ret = 0;
+done:
+	undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end));
+	return ret;
+}
+
 void free_contig_pages(struct page *page, int nr_pages)
 {
 	for (; nr_pages; --nr_pages, ++page)
-- 
1.7.1.569.g6f426

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 4/8] mm: MIGRATE_CMA migration type added
  2011-07-05  7:41 ` Marek Szyprowski
  (?)
@ 2011-07-05  7:41   ` Marek Szyprowski
  -1 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-07-05  7:41 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Chunsang Jeong

From: Michal Nazarewicz <m.nazarewicz@samsung.com>

The MIGRATE_CMA migration type has two main characteristics:
(i) only movable pages can be allocated from MIGRATE_CMA
pageblocks and (ii) page allocator will never change migration
type of MIGRATE_CMA pageblocks.

This guarantees that a page in a MIGRATE_CMA pageblock can
always be migrated somewhere else (unless there's no memory left
in the system).

It is designed to be used with Contiguous Memory Allocator
(CMA) for allocating big chunks (eg. 10MiB) of physically
contiguous memory. Once driver requests contiguous memory,
CMA will migrate pages from MIGRATE_CMA pageblocks.

To minimise the number of migrations, the MIGRATE_CMA migration type
is the last type tried when the page allocator falls back to migration
types other than the requested one.
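
As a rough sketch (not part of this patch), a boot-time CMA setup is expected
to hand a reserved, pageblock-aligned region over to the allocator as
MIGRATE_CMA roughly like this; cma_activate_region() is a made-up name:

	#include <linux/mm.h>
	#include <linux/page-isolation.h>

	static void __init cma_activate_region(unsigned long base_pfn,
					       unsigned long nr_pages)
	{
		unsigned long pfn;

		/* One call per pageblock; each call frees the block as MIGRATE_CMA. */
		for (pfn = base_pfn; pfn < base_pfn + nr_pages;
		     pfn += pageblock_nr_pages)
			init_cma_reserved_pageblock(pfn_to_page(pfn));
	}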

Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
[m.szyprowski: cleaned up Kconfig, renamed some functions]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>

cma migrate fixup
---
 include/linux/mmzone.h         |   43 +++++++++++++++---
 include/linux/page-isolation.h |    4 ++
 mm/Kconfig                     |    8 +++-
 mm/compaction.c                |   10 ++++
 mm/page_alloc.c                |   94 ++++++++++++++++++++++++++++++++--------
 5 files changed, 132 insertions(+), 27 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 9f7c3eb..126014d 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -35,13 +35,37 @@
  */
 #define PAGE_ALLOC_COSTLY_ORDER 3
 
-#define MIGRATE_UNMOVABLE     0
-#define MIGRATE_RECLAIMABLE   1
-#define MIGRATE_MOVABLE       2
-#define MIGRATE_PCPTYPES      3 /* the number of types on the pcp lists */
-#define MIGRATE_RESERVE       3
-#define MIGRATE_ISOLATE       4 /* can't allocate from here */
-#define MIGRATE_TYPES         5
+enum {
+	MIGRATE_UNMOVABLE,
+	MIGRATE_RECLAIMABLE,
+	MIGRATE_MOVABLE,
+	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
+	MIGRATE_RESERVE = MIGRATE_PCPTYPES,
+#ifdef CONFIG_CMA_MIGRATE_TYPE
+	/*
+	 * MIGRATE_CMA migration type is designed to mimic the way
+	 * ZONE_MOVABLE works.  Only movable pages can be allocated
+	 * from MIGRATE_CMA pageblocks and page allocator never
+	 * implicitly change migration type of MIGRATE_CMA pageblock.
+	 *
+	 * The way to use it is to change migratetype of a range of
+	 * pageblocks to MIGRATE_CMA which can be done by
+	 * __free_pageblock_cma() function.  What is important though
+	 * is that a range of pageblocks must be aligned to
+	 * MAX_ORDER_NR_PAGES should the biggest page be bigger than
+	 * a single pageblock.
+	 */
+	MIGRATE_CMA,
+#endif
+	MIGRATE_ISOLATE,	/* can't allocate from here */
+	MIGRATE_TYPES
+};
+
+#ifdef CONFIG_CMA_MIGRATE_TYPE
+#  define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
+#else
+#  define is_migrate_cma(migratetype) false
+#endif
 
 #define for_each_migratetype_order(order, type) \
 	for (order = 0; order < MAX_ORDER; order++) \
@@ -54,6 +78,11 @@ static inline int get_pageblock_migratetype(struct page *page)
 	return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
 }
 
+static inline bool is_pageblock_cma(struct page *page)
+{
+	return is_migrate_cma(get_pageblock_migratetype(page));
+}
+
 struct free_area {
 	struct list_head	free_list[MIGRATE_TYPES];
 	unsigned long		nr_free;
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index c5d1a7c..014ebb5 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -46,4 +46,8 @@ int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn);
 unsigned long scan_lru_pages(unsigned long start, unsigned long end);
 int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
 
+#ifdef CONFIG_CMA_MIGRATE_TYPE
+extern void init_cma_reserved_pageblock(struct page *page);
+#endif
+
 #endif
diff --git a/mm/Kconfig b/mm/Kconfig
index 8ca47a5..6ffedd8 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -189,7 +189,7 @@ config COMPACTION
 config MIGRATION
 	bool "Page migration"
 	def_bool y
-	depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION
+	depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION || CMA_MIGRATE_TYPE
 	help
 	  Allows the migration of the physical location of pages of processes
 	  while the virtual addresses are not changed. This is useful in
@@ -198,6 +198,12 @@ config MIGRATION
 	  pages as migration can relocate pages to satisfy a huge page
 	  allocation instead of reclaiming.
 
+config CMA_MIGRATE_TYPE
+	bool
+	help
+	  This enables the use of the MIGRATE_CMA migrate type, which lets CMA
+	  work on an almost arbitrary memory range and not only inside ZONE_MOVABLE.
+
 config PHYS_ADDR_T_64BIT
 	def_bool 64BIT || ARCH_PHYS_ADDR_T_64BIT
 
diff --git a/mm/compaction.c b/mm/compaction.c
index 6cc604b..9e5cc59 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -119,6 +119,16 @@ static bool suitable_migration_target(struct page *page)
 	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
 		return false;
 
+	/* Keep MIGRATE_CMA alone as well. */
+	/*
+	 * XXX Revisit.  We currently cannot let compaction touch CMA
+	 * pages since compaction insists on changing their migration
+	 * type to MIGRATE_MOVABLE (see split_free_page() called from
+	 * isolate_freepages_block() above).
+	 */
+	if (is_migrate_cma(migratetype))
+		return false;
+
 	/* If the page is a large free page, then allow migration */
 	if (PageBuddy(page) && page_order(page) >= pageblock_order)
 		return true;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2cea044..1353a0c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -719,6 +719,31 @@ void __meminit __free_pages_bootmem(struct page *page, unsigned int order)
 	}
 }
 
+#ifdef CONFIG_CMA_MIGRATE_TYPE
+
+/*
+ * Free the whole pageblock and set its migration type to MIGRATE_CMA.
+ */
+void __init init_cma_reserved_pageblock(struct page *page)
+{
+	struct page *p = page;
+	unsigned i = pageblock_nr_pages;
+
+	prefetchw(p);
+	do {
+		if (--i)
+			prefetchw(p + 1);
+		__ClearPageReserved(p);
+		set_page_count(p, 0);
+	} while (++p, i);
+
+	set_page_refcounted(page);
+	set_pageblock_migratetype(page, MIGRATE_CMA);
+	__free_pages(page, pageblock_order);
+	++totalram_pages;
+}
+
+#endif
 
 /*
  * The order of subdivision here is critical for the IO subsystem.
@@ -827,11 +852,15 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
  * This array describes the order lists are fallen back to when
  * the free lists for the desirable migrate type are depleted
  */
-static int fallbacks[MIGRATE_TYPES][MIGRATE_TYPES-1] = {
+static int fallbacks[MIGRATE_TYPES][4] = {
 	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_RESERVE },
 	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_RESERVE },
+#ifdef CONFIG_CMA_MIGRATE_TYPE
+	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_CMA    , MIGRATE_RESERVE },
+#else
 	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
-	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE,     MIGRATE_RESERVE,   MIGRATE_RESERVE }, /* Never used */
+#endif
+	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE }, /* Never used */
 };
 
 /*
@@ -926,12 +955,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 	/* Find the largest possible block of pages in the other list */
 	for (current_order = MAX_ORDER-1; current_order >= order;
 						--current_order) {
-		for (i = 0; i < MIGRATE_TYPES - 1; i++) {
+		for (i = 0; i < ARRAY_SIZE(fallbacks[0]); i++) {
 			migratetype = fallbacks[start_migratetype][i];
 
 			/* MIGRATE_RESERVE handled later if necessary */
 			if (migratetype == MIGRATE_RESERVE)
-				continue;
+				break;
 
 			area = &(zone->free_area[current_order]);
 			if (list_empty(&area->free_list[migratetype]))
@@ -946,19 +975,29 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 			 * pages to the preferred allocation list. If falling
 			 * back for a reclaimable kernel allocation, be more
 			 * aggressive about taking ownership of free pages
+			 *
+			 * On the other hand, never change migration
+			 * type of MIGRATE_CMA pageblocks nor move CMA
+			 * pages on different free lists. We don't
+			 * want unmovable pages to be allocated from
+			 * MIGRATE_CMA areas.
 			 */
-			if (unlikely(current_order >= (pageblock_order >> 1)) ||
-					start_migratetype == MIGRATE_RECLAIMABLE ||
-					page_group_by_mobility_disabled) {
-				unsigned long pages;
+			if (!is_pageblock_cma(page) &&
+			    (unlikely(current_order >= pageblock_order / 2) ||
+			     start_migratetype == MIGRATE_RECLAIMABLE ||
+			     page_group_by_mobility_disabled)) {
+				int pages;
 				pages = move_freepages_block(zone, page,
-								start_migratetype);
+							     start_migratetype);
 
-				/* Claim the whole block if over half of it is free */
+				/*
+				 * Claim the whole block if over half
+				 * of it is free
+				 */
 				if (pages >= (1 << (pageblock_order-1)) ||
-						page_group_by_mobility_disabled)
+				    page_group_by_mobility_disabled)
 					set_pageblock_migratetype(page,
-								start_migratetype);
+							start_migratetype);
 
 				migratetype = start_migratetype;
 			}
@@ -968,11 +1007,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 			rmv_page_order(page);
 
 			/* Take ownership for orders >= pageblock_order */
-			if (current_order >= pageblock_order)
+			if (current_order >= pageblock_order &&
+			    !is_pageblock_cma(page))
 				change_pageblock_range(page, current_order,
 							start_migratetype);
 
-			expand(zone, page, order, current_order, area, migratetype);
+			expand(zone, page, order, current_order, area,
+			       is_migrate_cma(start_migratetype)
+			     ? start_migratetype : migratetype);
 
 			trace_mm_page_alloc_extfrag(page, order, current_order,
 				start_migratetype, migratetype);
@@ -1044,7 +1086,12 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 			list_add(&page->lru, list);
 		else
 			list_add_tail(&page->lru, list);
-		set_page_private(page, migratetype);
+#ifdef CONFIG_CMA_MIGRATE_TYPE
+		if (is_pageblock_cma(page))
+			set_page_private(page, MIGRATE_CMA);
+		else
+#endif
+			set_page_private(page, migratetype);
 		list = &page->lru;
 	}
 	__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
@@ -1185,9 +1232,16 @@ void free_hot_cold_page(struct page *page, int cold)
 	 * offlined but treat RESERVE as movable pages so we can get those
 	 * areas back if necessary. Otherwise, we may have to free
 	 * excessively into the page allocator
+	 *
+	 * Still, do not change migration type of MIGRATE_CMA pages (if
+	 * they'd be recorded as MIGRATE_MOVABLE an unmovable page could
+	 * be allocated from MIGRATE_CMA block and we don't want to allow
+	 * that).  In this respect, treat MIGRATE_CMA like
+	 * MIGRATE_ISOLATE.
 	 */
 	if (migratetype >= MIGRATE_PCPTYPES) {
-		if (unlikely(migratetype == MIGRATE_ISOLATE)) {
+		if (unlikely(migratetype == MIGRATE_ISOLATE
+			  || is_migrate_cma(migratetype))) {
 			free_one_page(zone, page, 0, migratetype);
 			goto out;
 		}
@@ -1276,7 +1330,9 @@ int split_free_page(struct page *page)
 	if (order >= pageblock_order - 1) {
 		struct page *endpage = page + (1 << order) - 1;
 		for (; page < endpage; page += pageblock_nr_pages)
-			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+			if (!is_pageblock_cma(page))
+				set_pageblock_migratetype(page,
+							  MIGRATE_MOVABLE);
 	}
 
 	return 1 << order;
@@ -5486,8 +5542,8 @@ __count_immobile_pages(struct zone *zone, struct page *page, int count)
 	 */
 	if (zone_idx(zone) == ZONE_MOVABLE)
 		return true;
-
-	if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE)
+	if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE ||
+	    is_pageblock_cma(page))
 		return true;
 
 	pfn = page_to_pfn(page);
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 4/8] mm: MIGRATE_CMA migration type added
@ 2011-07-05  7:41   ` Marek Szyprowski
  0 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-07-05  7:41 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Chunsang Jeong

From: Michal Nazarewicz <m.nazarewicz@samsung.com>

The MIGRATE_CMA migration type has two main characteristics:
(i) only movable pages can be allocated from MIGRATE_CMA
pageblocks and (ii) the page allocator will never change the
migration type of MIGRATE_CMA pageblocks.

This guarantees that a page in a MIGRATE_CMA pageblock can
always be migrated somewhere else (unless there's no memory left
in the system).

It is designed to be used with the Contiguous Memory Allocator
(CMA) for allocating big chunks (eg. 10MiB) of physically
contiguous memory. Once a driver requests contiguous memory,
CMA will migrate pages from MIGRATE_CMA pageblocks.

To minimise the number of migrations, the MIGRATE_CMA migration
type is the last type tried when the page allocator falls back
to migration types other than the requested one.
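
A minimal sketch of the resulting policy (not part of the patch; the
helper below is made up purely for illustration): a pageblock marked
MIGRATE_CMA may only serve allocations that were requested as movable,
and its migrate type is never rewritten by the allocator.

	/* Hypothetical helper, for illustration only. */
	static bool cma_block_usable_for(struct page *page, int start_migratetype)
	{
		if (!is_pageblock_cma(page))
			return true;	/* not a CMA pageblock, no restriction */

		/* CMA pageblocks may only back movable allocations. */
		return start_migratetype == MIGRATE_MOVABLE;
	}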

Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
[m.szyprowski: cleaned up Kconfig, renamed some functions]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>

cma migrate fixup
---
 include/linux/mmzone.h         |   43 +++++++++++++++---
 include/linux/page-isolation.h |    4 ++
 mm/Kconfig                     |    8 +++-
 mm/compaction.c                |   10 ++++
 mm/page_alloc.c                |   94 ++++++++++++++++++++++++++++++++--------
 5 files changed, 132 insertions(+), 27 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 9f7c3eb..126014d 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -35,13 +35,37 @@
  */
 #define PAGE_ALLOC_COSTLY_ORDER 3
 
-#define MIGRATE_UNMOVABLE     0
-#define MIGRATE_RECLAIMABLE   1
-#define MIGRATE_MOVABLE       2
-#define MIGRATE_PCPTYPES      3 /* the number of types on the pcp lists */
-#define MIGRATE_RESERVE       3
-#define MIGRATE_ISOLATE       4 /* can't allocate from here */
-#define MIGRATE_TYPES         5
+enum {
+	MIGRATE_UNMOVABLE,
+	MIGRATE_RECLAIMABLE,
+	MIGRATE_MOVABLE,
+	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
+	MIGRATE_RESERVE = MIGRATE_PCPTYPES,
+#ifdef CONFIG_CMA_MIGRATE_TYPE
+	/*
+	 * MIGRATE_CMA migration type is designed to mimic the way
+	 * ZONE_MOVABLE works.  Only movable pages can be allocated
+	 * from MIGRATE_CMA pageblocks and the page allocator never
+	 * implicitly changes the migration type of MIGRATE_CMA pageblocks.
+	 *
+	 * The way to use it is to change migratetype of a range of
+	 * pageblocks to MIGRATE_CMA which can be done by
+	 * __free_pageblock_cma() function.  What is important though
+	 * is that a range of pageblocks must be aligned to
+	 * MAX_ORDER_NR_PAGES should the biggest page be bigger than
+	 * a single pageblock.
+	 */
+	MIGRATE_CMA,
+#endif
+	MIGRATE_ISOLATE,	/* can't allocate from here */
+	MIGRATE_TYPES
+};
+
+#ifdef CONFIG_CMA_MIGRATE_TYPE
+#  define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
+#else
+#  define is_migrate_cma(migratetype) false
+#endif
 
 #define for_each_migratetype_order(order, type) \
 	for (order = 0; order < MAX_ORDER; order++) \
@@ -54,6 +78,11 @@ static inline int get_pageblock_migratetype(struct page *page)
 	return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
 }
 
+static inline bool is_pageblock_cma(struct page *page)
+{
+	return is_migrate_cma(get_pageblock_migratetype(page));
+}
+
 struct free_area {
 	struct list_head	free_list[MIGRATE_TYPES];
 	unsigned long		nr_free;
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index c5d1a7c..014ebb5 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -46,4 +46,8 @@ int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn);
 unsigned long scan_lru_pages(unsigned long start, unsigned long end);
 int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
 
+#ifdef CONFIG_CMA_MIGRATE_TYPE
+extern void init_cma_reserved_pageblock(struct page *page);
+#endif
+
 #endif
diff --git a/mm/Kconfig b/mm/Kconfig
index 8ca47a5..6ffedd8 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -189,7 +189,7 @@ config COMPACTION
 config MIGRATION
 	bool "Page migration"
 	def_bool y
-	depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION
+	depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION || CMA_MIGRATE_TYPE
 	help
 	  Allows the migration of the physical location of pages of processes
 	  while the virtual addresses are not changed. This is useful in
@@ -198,6 +198,12 @@ config MIGRATION
 	  pages as migration can relocate pages to satisfy a huge page
 	  allocation instead of reclaiming.
 
+config CMA_MIGRATE_TYPE
+	bool
+	help
+	  This enables the use of the MIGRATE_CMA migrate type, which lets CMA
+	  work on almost arbitrary memory ranges and not only inside ZONE_MOVABLE.
+
 config PHYS_ADDR_T_64BIT
 	def_bool 64BIT || ARCH_PHYS_ADDR_T_64BIT
 
diff --git a/mm/compaction.c b/mm/compaction.c
index 6cc604b..9e5cc59 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -119,6 +119,16 @@ static bool suitable_migration_target(struct page *page)
 	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
 		return false;
 
+	/* Keep MIGRATE_CMA alone as well. */
+	/*
+	 * XXX Revisit.  We currently cannot let compaction touch CMA
+	 * pages since compaction insists on changing their migration
+	 * type to MIGRATE_MOVABLE (see split_free_page() called from
+	 * isolate_freepages_block() above).
+	 */
+	if (is_migrate_cma(migratetype))
+		return false;
+
 	/* If the page is a large free page, then allow migration */
 	if (PageBuddy(page) && page_order(page) >= pageblock_order)
 		return true;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2cea044..1353a0c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -719,6 +719,31 @@ void __meminit __free_pages_bootmem(struct page *page, unsigned int order)
 	}
 }
 
+#ifdef CONFIG_CMA_MIGRATE_TYPE
+
+/*
+ * Free the whole pageblock and set its migration type to MIGRATE_CMA.
+ */
+void __init init_cma_reserved_pageblock(struct page *page)
+{
+	struct page *p = page;
+	unsigned i = pageblock_nr_pages;
+
+	prefetchw(p);
+	do {
+		if (--i)
+			prefetchw(p + 1);
+		__ClearPageReserved(p);
+		set_page_count(p, 0);
+	} while (++p, i);
+
+	set_page_refcounted(page);
+	set_pageblock_migratetype(page, MIGRATE_CMA);
+	__free_pages(page, pageblock_order);
+	++totalram_pages;
+}
+
+#endif
 
 /*
  * The order of subdivision here is critical for the IO subsystem.
@@ -827,11 +852,15 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
  * This array describes the order lists are fallen back to when
  * the free lists for the desirable migrate type are depleted
  */
-static int fallbacks[MIGRATE_TYPES][MIGRATE_TYPES-1] = {
+static int fallbacks[MIGRATE_TYPES][4] = {
 	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_RESERVE },
 	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_RESERVE },
+#ifdef CONFIG_CMA_MIGRATE_TYPE
+	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_CMA    , MIGRATE_RESERVE },
+#else
 	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
-	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE,     MIGRATE_RESERVE,   MIGRATE_RESERVE }, /* Never used */
+#endif
+	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE }, /* Never used */
 };
 
 /*
@@ -926,12 +955,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 	/* Find the largest possible block of pages in the other list */
 	for (current_order = MAX_ORDER-1; current_order >= order;
 						--current_order) {
-		for (i = 0; i < MIGRATE_TYPES - 1; i++) {
+		for (i = 0; i < ARRAY_SIZE(fallbacks[0]); i++) {
 			migratetype = fallbacks[start_migratetype][i];
 
 			/* MIGRATE_RESERVE handled later if necessary */
 			if (migratetype == MIGRATE_RESERVE)
-				continue;
+				break;
 
 			area = &(zone->free_area[current_order]);
 			if (list_empty(&area->free_list[migratetype]))
@@ -946,19 +975,29 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 			 * pages to the preferred allocation list. If falling
 			 * back for a reclaimable kernel allocation, be more
 			 * aggressive about taking ownership of free pages
+			 *
+			 * On the other hand, never change migration
+			 * type of MIGRATE_CMA pageblocks nor move CMA
+			 * pages to different free lists. We don't
+			 * want unmovable pages to be allocated from
+			 * MIGRATE_CMA areas.
 			 */
-			if (unlikely(current_order >= (pageblock_order >> 1)) ||
-					start_migratetype == MIGRATE_RECLAIMABLE ||
-					page_group_by_mobility_disabled) {
-				unsigned long pages;
+			if (!is_pageblock_cma(page) &&
+			    (unlikely(current_order >= pageblock_order / 2) ||
+			     start_migratetype == MIGRATE_RECLAIMABLE ||
+			     page_group_by_mobility_disabled)) {
+				int pages;
 				pages = move_freepages_block(zone, page,
-								start_migratetype);
+							     start_migratetype);
 
-				/* Claim the whole block if over half of it is free */
+				/*
+				 * Claim the whole block if over half
+				 * of it is free
+				 */
 				if (pages >= (1 << (pageblock_order-1)) ||
-						page_group_by_mobility_disabled)
+				    page_group_by_mobility_disabled)
 					set_pageblock_migratetype(page,
-								start_migratetype);
+							start_migratetype);
 
 				migratetype = start_migratetype;
 			}
@@ -968,11 +1007,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 			rmv_page_order(page);
 
 			/* Take ownership for orders >= pageblock_order */
-			if (current_order >= pageblock_order)
+			if (current_order >= pageblock_order &&
+			    !is_pageblock_cma(page))
 				change_pageblock_range(page, current_order,
 							start_migratetype);
 
-			expand(zone, page, order, current_order, area, migratetype);
+			expand(zone, page, order, current_order, area,
+			       is_migrate_cma(start_migratetype)
+			     ? start_migratetype : migratetype);
 
 			trace_mm_page_alloc_extfrag(page, order, current_order,
 				start_migratetype, migratetype);
@@ -1044,7 +1086,12 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 			list_add(&page->lru, list);
 		else
 			list_add_tail(&page->lru, list);
-		set_page_private(page, migratetype);
+#ifdef CONFIG_CMA_MIGRATE_TYPE
+		if (is_pageblock_cma(page))
+			set_page_private(page, MIGRATE_CMA);
+		else
+#endif
+			set_page_private(page, migratetype);
 		list = &page->lru;
 	}
 	__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
@@ -1185,9 +1232,16 @@ void free_hot_cold_page(struct page *page, int cold)
 	 * offlined but treat RESERVE as movable pages so we can get those
 	 * areas back if necessary. Otherwise, we may have to free
 	 * excessively into the page allocator
+	 *
+	 * Still, do not change migration type of MIGRATE_CMA pages (if
+	 * they'd be recorded as MIGRATE_MOVABLE an unmovable page could
+	 * be allocated from MIGRATE_CMA block and we don't want to allow
+	 * that).  In this respect, treat MIGRATE_CMA like
+	 * MIGRATE_ISOLATE.
 	 */
 	if (migratetype >= MIGRATE_PCPTYPES) {
-		if (unlikely(migratetype == MIGRATE_ISOLATE)) {
+		if (unlikely(migratetype == MIGRATE_ISOLATE
+			  || is_migrate_cma(migratetype))) {
 			free_one_page(zone, page, 0, migratetype);
 			goto out;
 		}
@@ -1276,7 +1330,9 @@ int split_free_page(struct page *page)
 	if (order >= pageblock_order - 1) {
 		struct page *endpage = page + (1 << order) - 1;
 		for (; page < endpage; page += pageblock_nr_pages)
-			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+			if (!is_pageblock_cma(page))
+				set_pageblock_migratetype(page,
+							  MIGRATE_MOVABLE);
 	}
 
 	return 1 << order;
@@ -5486,8 +5542,8 @@ __count_immobile_pages(struct zone *zone, struct page *page, int count)
 	 */
 	if (zone_idx(zone) == ZONE_MOVABLE)
 		return true;
-
-	if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE)
+	if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE ||
+	    is_pageblock_cma(page))
 		return true;
 
 	pfn = page_to_pfn(page);
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 4/8] mm: MIGRATE_CMA migration type added
@ 2011-07-05  7:41   ` Marek Szyprowski
  0 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-07-05  7:41 UTC (permalink / raw)
  To: linux-arm-kernel

From: Michal Nazarewicz <m.nazarewicz@samsung.com>

The MIGRATE_CMA migration type has two main characteristics:
(i) only movable pages can be allocated from MIGRATE_CMA
pageblocks and (ii) the page allocator will never change the
migration type of MIGRATE_CMA pageblocks.

This guarantees that a page in a MIGRATE_CMA pageblock can
always be migrated somewhere else (unless there's no memory left
in the system).

It is designed to be used with the Contiguous Memory Allocator
(CMA) for allocating big chunks (eg. 10MiB) of physically
contiguous memory. Once a driver requests contiguous memory,
CMA will migrate pages from MIGRATE_CMA pageblocks.

To minimise the number of migrations, the MIGRATE_CMA migration
type is the last type tried when the page allocator falls back
to migration types other than the requested one.

Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
[m.szyprowski: cleaned up Kconfig, renamed some functions]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>

cma migrate fixup
---
 include/linux/mmzone.h         |   43 +++++++++++++++---
 include/linux/page-isolation.h |    4 ++
 mm/Kconfig                     |    8 +++-
 mm/compaction.c                |   10 ++++
 mm/page_alloc.c                |   94 ++++++++++++++++++++++++++++++++--------
 5 files changed, 132 insertions(+), 27 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 9f7c3eb..126014d 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -35,13 +35,37 @@
  */
 #define PAGE_ALLOC_COSTLY_ORDER 3
 
-#define MIGRATE_UNMOVABLE     0
-#define MIGRATE_RECLAIMABLE   1
-#define MIGRATE_MOVABLE       2
-#define MIGRATE_PCPTYPES      3 /* the number of types on the pcp lists */
-#define MIGRATE_RESERVE       3
-#define MIGRATE_ISOLATE       4 /* can't allocate from here */
-#define MIGRATE_TYPES         5
+enum {
+	MIGRATE_UNMOVABLE,
+	MIGRATE_RECLAIMABLE,
+	MIGRATE_MOVABLE,
+	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
+	MIGRATE_RESERVE = MIGRATE_PCPTYPES,
+#ifdef CONFIG_CMA_MIGRATE_TYPE
+	/*
+	 * MIGRATE_CMA migration type is designed to mimic the way
+	 * ZONE_MOVABLE works.  Only movable pages can be allocated
+	 * from MIGRATE_CMA pageblocks and the page allocator never
+	 * implicitly changes the migration type of MIGRATE_CMA pageblocks.
+	 *
+	 * The way to use it is to change migratetype of a range of
+	 * pageblocks to MIGRATE_CMA which can be done by
+	 * __free_pageblock_cma() function.  What is important though
+	 * is that a range of pageblocks must be aligned to
+	 * MAX_ORDER_NR_PAGES should the biggest page be bigger than
+	 * a single pageblock.
+	 */
+	MIGRATE_CMA,
+#endif
+	MIGRATE_ISOLATE,	/* can't allocate from here */
+	MIGRATE_TYPES
+};
+
+#ifdef CONFIG_CMA_MIGRATE_TYPE
+#  define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
+#else
+#  define is_migrate_cma(migratetype) false
+#endif
 
 #define for_each_migratetype_order(order, type) \
 	for (order = 0; order < MAX_ORDER; order++) \
@@ -54,6 +78,11 @@ static inline int get_pageblock_migratetype(struct page *page)
 	return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
 }
 
+static inline bool is_pageblock_cma(struct page *page)
+{
+	return is_migrate_cma(get_pageblock_migratetype(page));
+}
+
 struct free_area {
 	struct list_head	free_list[MIGRATE_TYPES];
 	unsigned long		nr_free;
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index c5d1a7c..014ebb5 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -46,4 +46,8 @@ int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn);
 unsigned long scan_lru_pages(unsigned long start, unsigned long end);
 int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
 
+#ifdef CONFIG_CMA_MIGRATE_TYPE
+extern void init_cma_reserved_pageblock(struct page *page);
+#endif
+
 #endif
diff --git a/mm/Kconfig b/mm/Kconfig
index 8ca47a5..6ffedd8 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -189,7 +189,7 @@ config COMPACTION
 config MIGRATION
 	bool "Page migration"
 	def_bool y
-	depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION
+	depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION || CMA_MIGRATE_TYPE
 	help
 	  Allows the migration of the physical location of pages of processes
 	  while the virtual addresses are not changed. This is useful in
@@ -198,6 +198,12 @@ config MIGRATION
 	  pages as migration can relocate pages to satisfy a huge page
 	  allocation instead of reclaiming.
 
+config CMA_MIGRATE_TYPE
+	bool
+	help
+	  This enables the use of the MIGRATE_CMA migrate type, which lets CMA
+	  work on almost arbitrary memory ranges and not only inside ZONE_MOVABLE.
+
 config PHYS_ADDR_T_64BIT
 	def_bool 64BIT || ARCH_PHYS_ADDR_T_64BIT
 
diff --git a/mm/compaction.c b/mm/compaction.c
index 6cc604b..9e5cc59 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -119,6 +119,16 @@ static bool suitable_migration_target(struct page *page)
 	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
 		return false;
 
+	/* Keep MIGRATE_CMA alone as well. */
+	/*
+	 * XXX Revisit.  We currently cannot let compaction touch CMA
+	 * pages since compaction insists on changing their migration
+	 * type to MIGRATE_MOVABLE (see split_free_page() called from
+	 * isolate_freepages_block() above).
+	 */
+	if (is_migrate_cma(migratetype))
+		return false;
+
 	/* If the page is a large free page, then allow migration */
 	if (PageBuddy(page) && page_order(page) >= pageblock_order)
 		return true;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2cea044..1353a0c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -719,6 +719,31 @@ void __meminit __free_pages_bootmem(struct page *page, unsigned int order)
 	}
 }
 
+#ifdef CONFIG_CMA_MIGRATE_TYPE
+
+/*
+ * Free the whole pageblock and set its migration type to MIGRATE_CMA.
+ */
+void __init init_cma_reserved_pageblock(struct page *page)
+{
+	struct page *p = page;
+	unsigned i = pageblock_nr_pages;
+
+	prefetchw(p);
+	do {
+		if (--i)
+			prefetchw(p + 1);
+		__ClearPageReserved(p);
+		set_page_count(p, 0);
+	} while (++p, i);
+
+	set_page_refcounted(page);
+	set_pageblock_migratetype(page, MIGRATE_CMA);
+	__free_pages(page, pageblock_order);
+	++totalram_pages;
+}
+
+#endif
 
 /*
  * The order of subdivision here is critical for the IO subsystem.
@@ -827,11 +852,15 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
  * This array describes the order lists are fallen back to when
  * the free lists for the desirable migrate type are depleted
  */
-static int fallbacks[MIGRATE_TYPES][MIGRATE_TYPES-1] = {
+static int fallbacks[MIGRATE_TYPES][4] = {
 	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_RESERVE },
 	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_RESERVE },
+#ifdef CONFIG_CMA_MIGRATE_TYPE
+	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_CMA    , MIGRATE_RESERVE },
+#else
 	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
-	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE,     MIGRATE_RESERVE,   MIGRATE_RESERVE }, /* Never used */
+#endif
+	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE }, /* Never used */
 };
 
 /*
@@ -926,12 +955,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 	/* Find the largest possible block of pages in the other list */
 	for (current_order = MAX_ORDER-1; current_order >= order;
 						--current_order) {
-		for (i = 0; i < MIGRATE_TYPES - 1; i++) {
+		for (i = 0; i < ARRAY_SIZE(fallbacks[0]); i++) {
 			migratetype = fallbacks[start_migratetype][i];
 
 			/* MIGRATE_RESERVE handled later if necessary */
 			if (migratetype == MIGRATE_RESERVE)
-				continue;
+				break;
 
 			area = &(zone->free_area[current_order]);
 			if (list_empty(&area->free_list[migratetype]))
@@ -946,19 +975,29 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 			 * pages to the preferred allocation list. If falling
 			 * back for a reclaimable kernel allocation, be more
 			 * aggressive about taking ownership of free pages
+			 *
+			 * On the other hand, never change migration
+			 * pages to different free lists. We don't
+			 * pages on different free lists. We don't
+			 * want unmovable pages to be allocated from
+			 * MIGRATE_CMA areas.
 			 */
-			if (unlikely(current_order >= (pageblock_order >> 1)) ||
-					start_migratetype == MIGRATE_RECLAIMABLE ||
-					page_group_by_mobility_disabled) {
-				unsigned long pages;
+			if (!is_pageblock_cma(page) &&
+			    (unlikely(current_order >= pageblock_order / 2) ||
+			     start_migratetype == MIGRATE_RECLAIMABLE ||
+			     page_group_by_mobility_disabled)) {
+				int pages;
 				pages = move_freepages_block(zone, page,
-								start_migratetype);
+							     start_migratetype);
 
-				/* Claim the whole block if over half of it is free */
+				/*
+				 * Claim the whole block if over half
+				 * of it is free
+				 */
 				if (pages >= (1 << (pageblock_order-1)) ||
-						page_group_by_mobility_disabled)
+				    page_group_by_mobility_disabled)
 					set_pageblock_migratetype(page,
-								start_migratetype);
+							start_migratetype);
 
 				migratetype = start_migratetype;
 			}
@@ -968,11 +1007,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 			rmv_page_order(page);
 
 			/* Take ownership for orders >= pageblock_order */
-			if (current_order >= pageblock_order)
+			if (current_order >= pageblock_order &&
+			    !is_pageblock_cma(page))
 				change_pageblock_range(page, current_order,
 							start_migratetype);
 
-			expand(zone, page, order, current_order, area, migratetype);
+			expand(zone, page, order, current_order, area,
+			       is_migrate_cma(start_migratetype)
+			     ? start_migratetype : migratetype);
 
 			trace_mm_page_alloc_extfrag(page, order, current_order,
 				start_migratetype, migratetype);
@@ -1044,7 +1086,12 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 			list_add(&page->lru, list);
 		else
 			list_add_tail(&page->lru, list);
-		set_page_private(page, migratetype);
+#ifdef CONFIG_CMA_MIGRATE_TYPE
+		if (is_pageblock_cma(page))
+			set_page_private(page, MIGRATE_CMA);
+		else
+#endif
+			set_page_private(page, migratetype);
 		list = &page->lru;
 	}
 	__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
@@ -1185,9 +1232,16 @@ void free_hot_cold_page(struct page *page, int cold)
 	 * offlined but treat RESERVE as movable pages so we can get those
 	 * areas back if necessary. Otherwise, we may have to free
 	 * excessively into the page allocator
+	 *
+	 * Still, do not change migration type of MIGRATE_CMA pages (if
+	 * they'd be recorded as MIGRATE_MOVABLE an unmovable page could
+	 * be allocated from MIGRATE_CMA block and we don't want to allow
+	 * that).  In this respect, treat MIGRATE_CMA like
+	 * MIGRATE_ISOLATE.
 	 */
 	if (migratetype >= MIGRATE_PCPTYPES) {
-		if (unlikely(migratetype == MIGRATE_ISOLATE)) {
+		if (unlikely(migratetype == MIGRATE_ISOLATE
+			  || is_migrate_cma(migratetype))) {
 			free_one_page(zone, page, 0, migratetype);
 			goto out;
 		}
@@ -1276,7 +1330,9 @@ int split_free_page(struct page *page)
 	if (order >= pageblock_order - 1) {
 		struct page *endpage = page + (1 << order) - 1;
 		for (; page < endpage; page += pageblock_nr_pages)
-			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+			if (!is_pageblock_cma(page))
+				set_pageblock_migratetype(page,
+							  MIGRATE_MOVABLE);
 	}
 
 	return 1 << order;
@@ -5486,8 +5542,8 @@ __count_immobile_pages(struct zone *zone, struct page *page, int count)
 	 */
 	if (zone_idx(zone) == ZONE_MOVABLE)
 		return true;
-
-	if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE)
+	if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE ||
+	    is_pageblock_cma(page))
 		return true;
 
 	pfn = page_to_pfn(page);
-- 
1.7.1.569.g6f426

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 5/8] mm: MIGRATE_CMA isolation functions added
  2011-07-05  7:41 ` Marek Szyprowski
  (?)
@ 2011-07-05  7:41   ` Marek Szyprowski
  -1 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-07-05  7:41 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Chunsang Jeong

From: Michal Nazarewicz <m.nazarewicz@samsung.com>

This commit changes various functions that switch the migrate type
of pages and pageblocks between MIGRATE_ISOLATE and MIGRATE_MOVABLE
in such a way that they can also work with the MIGRATE_CMA migrate
type.
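
A hedged usage sketch, not taken from this patch (error handling and
pfn bookkeeping are simplified): a CMA-style caller passes MIGRATE_CMA
so that the isolated pageblocks are restored to MIGRATE_CMA once the
isolation is undone, while the existing MIGRATE_MOVABLE behaviour stays
available through the unchanged wrapper functions.

	static int cma_range_example(unsigned long start_pfn, int nr_pages)
	{
		int ret;

		/* Isolate and allocate [start_pfn, start_pfn + nr_pages). */
		ret = alloc_contig_range(start_pfn, start_pfn + nr_pages,
					 GFP_KERNEL, MIGRATE_CMA);
		if (ret)
			return ret;

		/* ... use the contiguous pages ... */

		free_contig_pages(pfn_to_page(start_pfn), nr_pages);
		return 0;
	}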

Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 include/linux/page-isolation.h |   40 +++++++++++++++++++++++++++-------------
 mm/page_alloc.c                |   19 ++++++++++++-------
 mm/page_isolation.c            |   15 ++++++++-------
 3 files changed, 47 insertions(+), 27 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 014ebb5..96e287d 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -3,39 +3,53 @@
 
 /*
  * Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE.
- * If specified range includes migrate types other than MOVABLE,
+ * If specified range includes migrate types other than MOVABLE or CMA,
  * this will fail with -EBUSY.
  *
  * For isolating all pages in the range finally, the caller have to
  * free all pages in the range. test_page_isolated() can be used for
  * test it.
  */
-extern int
-start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn);
+int __start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			       unsigned migratetype);
+
+static inline int
+start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+	return __start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+}
+
+int __undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			      unsigned migratetype);
 
 /*
  * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
  * target range is [start_pfn, end_pfn)
  */
-extern int
-undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn);
+static inline int
+undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+	return __undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+}
 
 /*
- * test all pages in [start_pfn, end_pfn)are isolated or not.
+ * Test all pages in [start_pfn, end_pfn) are isolated or not.
  */
-extern int
-test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
+int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
 
 /*
- * Internal funcs.Changes pageblock's migrate type.
- * Please use make_pagetype_isolated()/make_pagetype_movable().
+ * Internal functions. Changes pageblock's migrate type.
  */
-extern int set_migratetype_isolate(struct page *page);
-extern void unset_migratetype_isolate(struct page *page);
+int set_migratetype_isolate(struct page *page);
+void __unset_migratetype_isolate(struct page *page, unsigned migratetype);
+static inline void unset_migratetype_isolate(struct page *page)
+{
+	__unset_migratetype_isolate(page, MIGRATE_MOVABLE);
+}
 extern unsigned long alloc_contig_freed_pages(unsigned long start,
 					      unsigned long end, gfp_t flag);
 extern int alloc_contig_range(unsigned long start, unsigned long end,
-			      gfp_t flags);
+			      gfp_t flags, unsigned migratetype);
 extern void free_contig_pages(struct page *page, int nr_pages);
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1353a0c..a936a75 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5642,7 +5642,7 @@ out:
 	return ret;
 }
 
-void unset_migratetype_isolate(struct page *page)
+void __unset_migratetype_isolate(struct page *page, unsigned migratetype)
 {
 	struct zone *zone;
 	unsigned long flags;
@@ -5650,8 +5650,8 @@ void unset_migratetype_isolate(struct page *page)
 	spin_lock_irqsave(&zone->lock, flags);
 	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
 		goto out;
-	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
-	move_freepages_block(zone, page, MIGRATE_MOVABLE);
+	set_pageblock_migratetype(page, migratetype);
+	move_freepages_block(zone, page, migratetype);
 out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
@@ -5756,6 +5756,10 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
  * @start:	start PFN to allocate
  * @end:	one-past-the-last PFN to allocate
  * @flags:	flags passed to alloc_contig_freed_pages().
+ * @migratetype:	migratetype of the underlying pageblocks (either
+ *			#MIGRATE_MOVABLE or #MIGRATE_CMA).  All pageblocks
+ *			in range must have the same migratetype and it must
+ *			be either of the two.
  *
  * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
  * aligned, hovewer it's callers responsibility to guarantee that we
@@ -5767,7 +5771,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
  * need to be freed with free_contig_pages().
  */
 int alloc_contig_range(unsigned long start, unsigned long end,
-		       gfp_t flags)
+		       gfp_t flags, unsigned migratetype)
 {
 	unsigned long outer_start, outer_end;
 	int ret;
@@ -5795,8 +5799,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	 * them.
 	 */
 
-	ret = start_isolate_page_range(pfn_to_maxpage(start),
-				       pfn_to_maxpage_up(end));
+	ret = __start_isolate_page_range(pfn_to_maxpage(start),
+					 pfn_to_maxpage_up(end), migratetype);
 	if (ret)
 		goto done;
 
@@ -5834,7 +5838,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 
 	ret = 0;
 done:
-	undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end));
+	__undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end),
+				  migratetype);
 	return ret;
 }
 
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 15b41ec..f8beab5 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -23,10 +23,11 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
 }
 
 /*
- * start_isolate_page_range() -- make page-allocation-type of range of pages
+ * __start_isolate_page_range() -- make page-allocation-type of range of pages
  * to be MIGRATE_ISOLATE.
  * @start_pfn: The lower PFN of the range to be isolated.
  * @end_pfn: The upper PFN of the range to be isolated.
+ * @migratetype: migrate type to set in error recovery.
  *
  * Making page-allocation-type to be MIGRATE_ISOLATE means free pages in
  * the range will never be allocated. Any free pages and pages freed in the
@@ -35,8 +36,8 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * start_pfn/end_pfn must be aligned to pageblock_order.
  * Returns 0 on success and -EBUSY if any part of range cannot be isolated.
  */
-int
-start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+int __start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			       unsigned migratetype)
 {
 	unsigned long pfn;
 	unsigned long undo_pfn;
@@ -59,7 +60,7 @@ undo:
 	for (pfn = start_pfn;
 	     pfn < undo_pfn;
 	     pfn += pageblock_nr_pages)
-		unset_migratetype_isolate(pfn_to_page(pfn));
+		__unset_migratetype_isolate(pfn_to_page(pfn), migratetype);
 
 	return -EBUSY;
 }
@@ -67,8 +68,8 @@ undo:
 /*
  * Make isolated pages available again.
  */
-int
-undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+int __undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			      unsigned migratetype)
 {
 	unsigned long pfn;
 	struct page *page;
@@ -80,7 +81,7 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
 		page = __first_valid_page(pfn, pageblock_nr_pages);
 		if (!page || get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
 			continue;
-		unset_migratetype_isolate(page);
+		__unset_migratetype_isolate(page, migratetype);
 	}
 	return 0;
 }
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 5/8] mm: MIGRATE_CMA isolation functions added
@ 2011-07-05  7:41   ` Marek Szyprowski
  0 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-07-05  7:41 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Chunsang Jeong

From: Michal Nazarewicz <m.nazarewicz@samsung.com>

This commit changes various functions that switch the migrate type
of pages and pageblocks between MIGRATE_ISOLATE and MIGRATE_MOVABLE
in such a way that they can also work with the MIGRATE_CMA migrate
type.

Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 include/linux/page-isolation.h |   40 +++++++++++++++++++++++++++-------------
 mm/page_alloc.c                |   19 ++++++++++++-------
 mm/page_isolation.c            |   15 ++++++++-------
 3 files changed, 47 insertions(+), 27 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 014ebb5..96e287d 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -3,39 +3,53 @@
 
 /*
  * Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE.
- * If specified range includes migrate types other than MOVABLE,
+ * If specified range includes migrate types other than MOVABLE or CMA,
  * this will fail with -EBUSY.
  *
  * For isolating all pages in the range finally, the caller have to
  * free all pages in the range. test_page_isolated() can be used for
  * test it.
  */
-extern int
-start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn);
+int __start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			       unsigned migratetype);
+
+static inline int
+start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+	return __start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+}
+
+int __undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			      unsigned migratetype);
 
 /*
  * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
  * target range is [start_pfn, end_pfn)
  */
-extern int
-undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn);
+static inline int
+undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+	return __undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+}
 
 /*
- * test all pages in [start_pfn, end_pfn)are isolated or not.
+ * Test all pages in [start_pfn, end_pfn) are isolated or not.
  */
-extern int
-test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
+int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
 
 /*
- * Internal funcs.Changes pageblock's migrate type.
- * Please use make_pagetype_isolated()/make_pagetype_movable().
+ * Internal functions. Changes pageblock's migrate type.
  */
-extern int set_migratetype_isolate(struct page *page);
-extern void unset_migratetype_isolate(struct page *page);
+int set_migratetype_isolate(struct page *page);
+void __unset_migratetype_isolate(struct page *page, unsigned migratetype);
+static inline void unset_migratetype_isolate(struct page *page)
+{
+	__unset_migratetype_isolate(page, MIGRATE_MOVABLE);
+}
 extern unsigned long alloc_contig_freed_pages(unsigned long start,
 					      unsigned long end, gfp_t flag);
 extern int alloc_contig_range(unsigned long start, unsigned long end,
-			      gfp_t flags);
+			      gfp_t flags, unsigned migratetype);
 extern void free_contig_pages(struct page *page, int nr_pages);
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1353a0c..a936a75 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5642,7 +5642,7 @@ out:
 	return ret;
 }
 
-void unset_migratetype_isolate(struct page *page)
+void __unset_migratetype_isolate(struct page *page, unsigned migratetype)
 {
 	struct zone *zone;
 	unsigned long flags;
@@ -5650,8 +5650,8 @@ void unset_migratetype_isolate(struct page *page)
 	spin_lock_irqsave(&zone->lock, flags);
 	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
 		goto out;
-	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
-	move_freepages_block(zone, page, MIGRATE_MOVABLE);
+	set_pageblock_migratetype(page, migratetype);
+	move_freepages_block(zone, page, migratetype);
 out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
@@ -5756,6 +5756,10 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
  * @start:	start PFN to allocate
  * @end:	one-past-the-last PFN to allocate
  * @flags:	flags passed to alloc_contig_freed_pages().
+ * @migratetype:	migratetype of the underlying pageblocks (either
+ *			#MIGRATE_MOVABLE or #MIGRATE_CMA).  All pageblocks
+ *			in range must have the same migratetype and it must
+ *			be either of the two.
  *
  * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
  * aligned, hovewer it's callers responsibility to guarantee that we
@@ -5767,7 +5771,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
  * need to be freed with free_contig_pages().
  */
 int alloc_contig_range(unsigned long start, unsigned long end,
-		       gfp_t flags)
+		       gfp_t flags, unsigned migratetype)
 {
 	unsigned long outer_start, outer_end;
 	int ret;
@@ -5795,8 +5799,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	 * them.
 	 */
 
-	ret = start_isolate_page_range(pfn_to_maxpage(start),
-				       pfn_to_maxpage_up(end));
+	ret = __start_isolate_page_range(pfn_to_maxpage(start),
+					 pfn_to_maxpage_up(end), migratetype);
 	if (ret)
 		goto done;
 
@@ -5834,7 +5838,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 
 	ret = 0;
 done:
-	undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end));
+	__undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end),
+				  migratetype);
 	return ret;
 }
 
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 15b41ec..f8beab5 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -23,10 +23,11 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
 }
 
 /*
- * start_isolate_page_range() -- make page-allocation-type of range of pages
+ * __start_isolate_page_range() -- make page-allocation-type of range of pages
  * to be MIGRATE_ISOLATE.
  * @start_pfn: The lower PFN of the range to be isolated.
  * @end_pfn: The upper PFN of the range to be isolated.
+ * @migratetype: migrate type to set in error recovery.
  *
  * Making page-allocation-type to be MIGRATE_ISOLATE means free pages in
  * the range will never be allocated. Any free pages and pages freed in the
@@ -35,8 +36,8 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * start_pfn/end_pfn must be aligned to pageblock_order.
  * Returns 0 on success and -EBUSY if any part of range cannot be isolated.
  */
-int
-start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+int __start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			       unsigned migratetype)
 {
 	unsigned long pfn;
 	unsigned long undo_pfn;
@@ -59,7 +60,7 @@ undo:
 	for (pfn = start_pfn;
 	     pfn < undo_pfn;
 	     pfn += pageblock_nr_pages)
-		unset_migratetype_isolate(pfn_to_page(pfn));
+		__unset_migratetype_isolate(pfn_to_page(pfn), migratetype);
 
 	return -EBUSY;
 }
@@ -67,8 +68,8 @@ undo:
 /*
  * Make isolated pages available again.
  */
-int
-undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+int __undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			      unsigned migratetype)
 {
 	unsigned long pfn;
 	struct page *page;
@@ -80,7 +81,7 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
 		page = __first_valid_page(pfn, pageblock_nr_pages);
 		if (!page || get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
 			continue;
-		unset_migratetype_isolate(page);
+		__unset_migratetype_isolate(page, migratetype);
 	}
 	return 0;
 }
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 5/8] mm: MIGRATE_CMA isolation functions added
@ 2011-07-05  7:41   ` Marek Szyprowski
  0 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-07-05  7:41 UTC (permalink / raw)
  To: linux-arm-kernel

From: Michal Nazarewicz <m.nazarewicz@samsung.com>

This commit changes various functions that switch the migrate type
of pages and pageblocks between MIGRATE_ISOLATE and MIGRATE_MOVABLE
in such a way that they can also work with the MIGRATE_CMA migrate
type.

Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 include/linux/page-isolation.h |   40 +++++++++++++++++++++++++++-------------
 mm/page_alloc.c                |   19 ++++++++++++-------
 mm/page_isolation.c            |   15 ++++++++-------
 3 files changed, 47 insertions(+), 27 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 014ebb5..96e287d 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -3,39 +3,53 @@
 
 /*
  * Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE.
- * If specified range includes migrate types other than MOVABLE,
+ * If specified range includes migrate types other than MOVABLE or CMA,
  * this will fail with -EBUSY.
  *
  * For isolating all pages in the range finally, the caller have to
  * free all pages in the range. test_page_isolated() can be used for
  * test it.
  */
-extern int
-start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn);
+int __start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			       unsigned migratetype);
+
+static inline int
+start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+	return __start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+}
+
+int __undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			      unsigned migratetype);
 
 /*
  * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
  * target range is [start_pfn, end_pfn)
  */
-extern int
-undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn);
+static inline int
+undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+	return __undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+}
 
 /*
- * test all pages in [start_pfn, end_pfn)are isolated or not.
+ * Test all pages in [start_pfn, end_pfn) are isolated or not.
  */
-extern int
-test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
+int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
 
 /*
- * Internal funcs.Changes pageblock's migrate type.
- * Please use make_pagetype_isolated()/make_pagetype_movable().
+ * Internal functions. Changes pageblock's migrate type.
  */
-extern int set_migratetype_isolate(struct page *page);
-extern void unset_migratetype_isolate(struct page *page);
+int set_migratetype_isolate(struct page *page);
+void __unset_migratetype_isolate(struct page *page, unsigned migratetype);
+static inline void unset_migratetype_isolate(struct page *page)
+{
+	__unset_migratetype_isolate(page, MIGRATE_MOVABLE);
+}
 extern unsigned long alloc_contig_freed_pages(unsigned long start,
 					      unsigned long end, gfp_t flag);
 extern int alloc_contig_range(unsigned long start, unsigned long end,
-			      gfp_t flags);
+			      gfp_t flags, unsigned migratetype);
 extern void free_contig_pages(struct page *page, int nr_pages);
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1353a0c..a936a75 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5642,7 +5642,7 @@ out:
 	return ret;
 }
 
-void unset_migratetype_isolate(struct page *page)
+void __unset_migratetype_isolate(struct page *page, unsigned migratetype)
 {
 	struct zone *zone;
 	unsigned long flags;
@@ -5650,8 +5650,8 @@ void unset_migratetype_isolate(struct page *page)
 	spin_lock_irqsave(&zone->lock, flags);
 	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
 		goto out;
-	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
-	move_freepages_block(zone, page, MIGRATE_MOVABLE);
+	set_pageblock_migratetype(page, migratetype);
+	move_freepages_block(zone, page, migratetype);
 out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
@@ -5756,6 +5756,10 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
  * @start:	start PFN to allocate
  * @end:	one-past-the-last PFN to allocate
  * @flags:	flags passed to alloc_contig_freed_pages().
+ * @migratetype:	migratetype of the underlying pageblocks (either
+ *			#MIGRATE_MOVABLE or #MIGRATE_CMA).  All pageblocks
+ *			in range must have the same migratetype and it must
+ *			be either of the two.
  *
  * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
  * aligned, hovewer it's callers responsibility to guarantee that we
@@ -5767,7 +5771,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
  * need to be freed with free_contig_pages().
  */
 int alloc_contig_range(unsigned long start, unsigned long end,
-		       gfp_t flags)
+		       gfp_t flags, unsigned migratetype)
 {
 	unsigned long outer_start, outer_end;
 	int ret;
@@ -5795,8 +5799,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	 * them.
 	 */
 
-	ret = start_isolate_page_range(pfn_to_maxpage(start),
-				       pfn_to_maxpage_up(end));
+	ret = __start_isolate_page_range(pfn_to_maxpage(start),
+					 pfn_to_maxpage_up(end), migratetype);
 	if (ret)
 		goto done;
 
@@ -5834,7 +5838,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 
 	ret = 0;
 done:
-	undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end));
+	__undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end),
+				  migratetype);
 	return ret;
 }
 
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 15b41ec..f8beab5 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -23,10 +23,11 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
 }
 
 /*
- * start_isolate_page_range() -- make page-allocation-type of range of pages
+ * __start_isolate_page_range() -- make page-allocation-type of range of pages
  * to be MIGRATE_ISOLATE.
  * @start_pfn: The lower PFN of the range to be isolated.
  * @end_pfn: The upper PFN of the range to be isolated.
+ * @migratetype: migrate type to set in error recovery.
  *
  * Making page-allocation-type to be MIGRATE_ISOLATE means free pages in
  * the range will never be allocated. Any free pages and pages freed in the
@@ -35,8 +36,8 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * start_pfn/end_pfn must be aligned to pageblock_order.
  * Returns 0 on success and -EBUSY if any part of range cannot be isolated.
  */
-int
-start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+int __start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			       unsigned migratetype)
 {
 	unsigned long pfn;
 	unsigned long undo_pfn;
@@ -59,7 +60,7 @@ undo:
 	for (pfn = start_pfn;
 	     pfn < undo_pfn;
 	     pfn += pageblock_nr_pages)
-		unset_migratetype_isolate(pfn_to_page(pfn));
+		__unset_migratetype_isolate(pfn_to_page(pfn), migratetype);
 
 	return -EBUSY;
 }
@@ -67,8 +68,8 @@ undo:
 /*
  * Make isolated pages available again.
  */
-int
-undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+int __undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			      unsigned migratetype)
 {
 	unsigned long pfn;
 	struct page *page;
@@ -80,7 +81,7 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
 		page = __first_valid_page(pfn, pageblock_nr_pages);
 		if (!page || get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
 			continue;
-		unset_migratetype_isolate(page);
+		__unset_migratetype_isolate(page, migratetype);
 	}
 	return 0;
 }
-- 
1.7.1.569.g6f426

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-07-05  7:41 ` Marek Szyprowski
  (?)
@ 2011-07-05  7:41   ` Marek Szyprowski
  -1 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-07-05  7:41 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Chunsang Jeong

The Contiguous Memory Allocator is a set of helper functions for the
DMA mapping framework that improves allocation of contiguous memory
chunks.

CMA grabs memory on system boot, marks it with CMA_MIGRATE_TYPE and
gives it back to the system. The kernel is allowed to allocate movable
pages within CMA's managed memory so that it can be used, for example,
for page cache when the DMA mapping framework does not use it. On a
dma_alloc_from_contiguous() request such pages are migrated out of the
CMA area to free the required contiguous block and fulfill the request.
This allows large contiguous chunks of memory to be allocated at any
time, provided that there is enough free memory available in the
system.

This code is heavily based on earlier work by Michal Nazarewicz.
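
A hypothetical driver-side sketch of the call flow described above.
The exact prototypes live in the new include/linux/dma-contiguous.h;
the (dev, page count, alignment order) argument order used below is an
assumption made only for this illustration.

	static struct page *example_grab_buffer(struct device *dev, size_t size)
	{
		int count = PAGE_ALIGN(size) >> PAGE_SHIFT;

		/* Movable pages are migrated out of the CMA area on demand. */
		return dma_alloc_from_contiguous(dev, count, get_order(size));
	}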

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 drivers/base/Kconfig           |   77 +++++++++
 drivers/base/Makefile          |    1 +
 drivers/base/dma-contiguous.c  |  367 ++++++++++++++++++++++++++++++++++++++++
 include/linux/dma-contiguous.h |  104 +++++++++++
 4 files changed, 549 insertions(+), 0 deletions(-)
 create mode 100644 drivers/base/dma-contiguous.c
 create mode 100644 include/linux/dma-contiguous.h

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index d57e8d0..95ae1a7 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -168,4 +168,81 @@ config SYS_HYPERVISOR
 	bool
 	default n
 
+config CMA
+	bool "Contiguous Memory Allocator"
+	depends on HAVE_DMA_CONTIGUOUS && HAVE_MEMBLOCK
+	select MIGRATION
+	select CMA_MIGRATE_TYPE
+	help
+	  This enables the Contiguous Memory Allocator which allows drivers
+	  to allocate big physically-contiguous blocks of memory for use with
+	  hardware components that do not support I/O maps or scatter-gather.
+
+	  For more information see <include/linux/dma-contiguous.h>.
+	  If unsure, say "n".
+
+if CMA
+
+config CMA_DEBUG
+	bool "CMA debug messages (DEVELOPMENT)"
+	help
+	  Turns on debug messages in CMA.  This produces KERN_DEBUG
+	  messages for every CMA call as well as various messages while
+	  processing calls such as dma_alloc_from_contiguous().
+	  This option does not affect warning and error messages.
+
+comment "Default contiguous memory area size:"
+
+config CMA_SIZE_ABSOLUTE
+	int "Absolute size (in MiB)"
+	default 16
+	help
+	  Defines the size (in MiB) of the default memory area for Contiguous
+	  Memory Allocator.
+
+config CMA_SIZE_PERCENTAGE
+	int "Percentage of total memory"
+	default 10
+	help
+	  Defines the size of the default memory area for Contiguous Memory
+	  Allocator as a percentage of the total memory in the system.
+
+choice
+	prompt "Selected region size"
+	default CMA_SIZE_SEL_ABSOLUTE
+
+config CMA_SIZE_SEL_ABSOLUTE
+	bool "Use absolute value only"
+
+config CMA_SIZE_SEL_PERCENTAGE
+	bool "Use percentage value only"
+
+config CMA_SIZE_SEL_MIN
+	bool "Use lower value (minimum)"
+
+config CMA_SIZE_SEL_MAX
+	bool "Use higher value (maximum)"
+
+endchoice
+
+config CMA_ALIGNMENT
+	int "Maximum PAGE_SIZE order of alignment for contiguous buffers"
+	range 4 9
+	default 8
+	help
+	  DMA mapping framework by default aligns all buffers to the smallest
+	  PAGE_SIZE order which is greater than or equal to the requested buffer
+	  size. This works well for buffers up to a few hundred kilobytes, but
+	  for larger buffers it is just a waste of memory. With this parameter you can
+	  specify the maximum PAGE_SIZE order for contiguous buffers. Larger
+	  buffers will be aligned only to this specified order. The order is
+	  expressed as a power of two multiplied by the PAGE_SIZE.
+
+	  For example, if your system defaults to 4KiB pages, the order value
+	  of 8 means that the buffers will be aligned up to 1MiB only.
+
+	  If unsure, leave the default value "8".
+
+endif
+
 endmenu
diff --git a/drivers/base/Makefile b/drivers/base/Makefile
index 4c5701c..be6aab4 100644
--- a/drivers/base/Makefile
+++ b/drivers/base/Makefile
@@ -5,6 +5,7 @@ obj-y			:= core.o sys.o bus.o dd.o syscore.o \
 			   cpu.o firmware.o init.o map.o devres.o \
 			   attribute_container.o transport_class.o
 obj-$(CONFIG_DEVTMPFS)	+= devtmpfs.o
+obj-$(CONFIG_CMA) += dma-contiguous.o
 obj-y			+= power/
 obj-$(CONFIG_HAS_DMA)	+= dma-mapping.o
 obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
new file mode 100644
index 0000000..707b901
--- /dev/null
+++ b/drivers/base/dma-contiguous.c
@@ -0,0 +1,367 @@
+/*
+ * Contiguous Memory Allocator for DMA mapping framework
+ * Copyright (c) 2010-2011 by Samsung Electronics.
+ * Written by:
+ *	Marek Szyprowski <m.szyprowski@samsung.com>
+ *	Michal Nazarewicz <mina86@mina86.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt) "cma: " fmt
+
+#ifdef CONFIG_CMA_DEBUG
+#ifndef DEBUG
+#  define DEBUG
+#endif
+#endif
+
+#include <asm/page.h>
+#include <asm/sizes.h>
+
+#include <linux/memblock.h>
+#include <linux/err.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/page-isolation.h>
+#include <linux/slab.h>
+#include <linux/swap.h>
+#include <linux/mm_types.h>
+#include <linux/dma-mapping.h>
+#include <linux/dma-contiguous.h>
+
+struct cma {
+	unsigned long	base_pfn;
+	unsigned long	count;
+	unsigned long	*bitmap;
+};
+
+struct cma *dma_contiguous_default_area;
+
+static unsigned long size_abs = CONFIG_CMA_SIZE_ABSOLUTE * SZ_1M;
+static unsigned long size_percent = CONFIG_CMA_SIZE_PERCENTAGE;
+static long size_cmdline = -1;
+
+static int __init early_cma(char *p)
+{
+	pr_debug("%s(%s)\n", __func__, p);
+	size_cmdline = memparse(p, &p);
+	return 0;
+}
+early_param("cma", early_cma);
+
+/**
+ * dma_contiguous_reserve() - reserve area for contiguous memory handling
+ *
+ * This function reserves memory from the memblock subsystem. It should be
+ * called by arch-specific code once the memblock allocator has been activated
+ * and all other subsystems have already allocated/reserved memory.
+ */
+void __init dma_contiguous_reserve(void)
+{
+	struct memblock_region *reg;
+	unsigned long selected_size = 0;
+	unsigned long total_pages = 0;
+
+	pr_debug("%s()\n", __func__);
+
+	/*
+	 * We cannot use memblock_phys_mem_size() here, because
+	 * memblock_analyze() has not been called yet.
+	 */
+	for_each_memblock(memory, reg)
+		total_pages += memblock_region_memory_end_pfn(reg) -
+			       memblock_region_memory_base_pfn(reg);
+
+	size_percent *= (total_pages << PAGE_SHIFT) / 100;
+
+	pr_debug("%s: available phys mem: %ld MiB\n", __func__,
+		 (total_pages << PAGE_SHIFT) / SZ_1M);
+
+#ifdef CONFIG_CMA_SIZE_SEL_ABSOLUTE
+	selected_size = size_abs;
+#endif
+#ifdef CONFIG_CMA_SIZE_SEL_PERCENTAGE
+	selected_size = size_percent;
+#endif
+#ifdef CONFIG_CMA_SIZE_SEL_MIN
+	selected_size = min(size_abs, size_percent);
+#endif
+#ifdef CONFIG_CMA_SIZE_SEL_MAX
+	selected_size = max(size_abs, size_percent);
+#endif
+
+	if (size_cmdline != -1)
+		selected_size = size_cmdline;
+
+	if (!selected_size)
+		return;
+
+	pr_debug("%s: reserving %ld MiB for global area\n", __func__,
+		 selected_size / SZ_1M);
+
+	dma_declare_contiguous(NULL, selected_size, 0);
+};
+
+static DEFINE_MUTEX(cma_mutex);
+
+#ifdef CONFIG_DEBUG_VM
+
+static int __cma_activate_area(unsigned long base_pfn, unsigned long count)
+{
+	unsigned long pfn = base_pfn;
+	unsigned i = count;
+	struct zone *zone;
+
+	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
+
+	VM_BUG_ON(!pfn_valid(pfn));
+	zone = page_zone(pfn_to_page(pfn));
+
+	do {
+		VM_BUG_ON(!pfn_valid(pfn));
+		VM_BUG_ON(page_zone(pfn_to_page(pfn)) != zone);
+		if (!(pfn & (pageblock_nr_pages - 1)))
+			init_cma_reserved_pageblock(pfn_to_page(pfn));
+		++pfn;
+	} while (--i);
+
+	return 0;
+}
+
+#else
+
+static int __cma_activate_area(unsigned long base_pfn, unsigned long count)
+{
+	unsigned i = count >> pageblock_order;
+	struct page *p = pfn_to_page(base_pfn);
+
+	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
+
+	do {
+		init_cma_reserved_pageblock(p);
+		p += pageblock_nr_pages;
+	} while (--i);
+
+	return 0;
+}
+
+#endif
+
+static struct cma *__cma_create_area(unsigned long base_pfn,
+				     unsigned long count)
+{
+	int bitmap_size = BITS_TO_LONGS(count) * sizeof(long);
+	struct cma *cma;
+
+	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
+
+	cma = kmalloc(sizeof *cma, GFP_KERNEL);
+	if (!cma)
+		return ERR_PTR(-ENOMEM);
+
+	cma->base_pfn = base_pfn;
+	cma->count = count;
+	cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
+
+	if (!cma->bitmap)
+		goto no_mem;
+
+	__cma_activate_area(base_pfn, count);
+
+	pr_debug("%s: returning <%p>\n", __func__, (void *)cma);
+	return cma;
+
+no_mem:
+	kfree(cma);
+	return ERR_PTR(-ENOMEM);
+}
+
+static struct cma_reserved {
+	unsigned long start;
+	unsigned long size;
+	struct device *dev;
+} cma_reserved[8] __initdata;
+static unsigned cma_reserved_count __initdata;
+
+static int __init __cma_init_reserved_areas(void)
+{
+	struct cma_reserved *r = cma_reserved;
+	unsigned i = cma_reserved_count;
+
+	pr_debug("%s()\n", __func__);
+
+	for (; i; --i, ++r) {
+		struct cma *cma;
+		cma = __cma_create_area(page_to_pfn(phys_to_page(r->start)),
+					r->size >> PAGE_SHIFT);
+		if (!IS_ERR(cma)) {
+			pr_debug("%s: created area %p\n", __func__, cma);
+			if (r->dev)
+				set_dev_cma_area(r->dev, cma);
+			else
+				dma_contiguous_default_area = cma;
+		}
+	}
+	return 0;
+}
+core_initcall(__cma_init_reserved_areas);
+
+/**
+ * dma_declare_contiguous() - reserve area for contiguous memory handling
+ *			      for particular device
+ * @dev:   Pointer to device structure.
+ * @size:  Size of the reserved memory.
+ * @start: Start address of the reserved memory (optional, 0 for any).
+ *
+ * This function reserves memory for the specified device. It should be
+ * called by board-specific code once the memblock allocator has been activated
+ * and all other subsystems have already allocated/reserved memory.
+ */
+int __init dma_declare_contiguous(struct device *dev, unsigned long size,
+				  phys_addr_t start)
+{
+	struct cma_reserved *r = &cma_reserved[cma_reserved_count];
+	unsigned long alignment;
+
+	pr_debug("%s(%p+%p)\n", __func__, (void *)start, (void *)size);
+
+	/* Sanity checks */
+	if (cma_reserved_count == ARRAY_SIZE(cma_reserved))
+		return -ENOSPC;
+
+	if (!size)
+		return -EINVAL;
+
+	/* Sanitise input arguments */
+	alignment = PAGE_SIZE << (MAX_ORDER + 1);
+	start = ALIGN(start, alignment);
+	size  = ALIGN(size , alignment);
+
+	/* Reserve memory */
+	if (start) {
+		if (memblock_is_region_reserved(start, size) ||
+		    memblock_reserve(start, size) < 0)
+			return -EBUSY;
+	} else {
+		/*
+		 * Use __memblock_alloc_base() since
+		 * memblock_alloc_base() panic()s.
+		 */
+		u64 addr = __memblock_alloc_base(size, alignment, 0);
+		if (!addr) {
+			return -ENOMEM;
+		} else if (addr + size > ~(unsigned long)0) {
+			memblock_free(addr, size);
+			return -EOVERFLOW;
+		} else {
+			start = addr;
+		}
+	}
+
+	/*
+	 * Each reserved area must be initialised later, when more kernel
+	 * subsystems (like slab allocator) are available.
+	 */
+	r->start = start;
+	r->size = size;
+	r->dev = dev;
+	cma_reserved_count++;
+	printk(KERN_INFO "%s: reserved %ld MiB area at 0x%p\n", __func__,
+	       size / SZ_1M, (void *)start);
+	return 0;
+}
+
+/**
+ * dma_alloc_from_contiguous() - allocate pages from contiguous area
+ * @dev:   Pointer to device for which the allocation is performed.
+ * @count: Requested number of pages.
+ * @align: Requested alignment of pages (in PAGE_SIZE order).
+ *
+ * This function allocates a memory buffer for the specified device. It uses
+ * device specific contiguous memory area if available or the default
+ * global one. Requires architecture specific get_dev_cma_area() helper
+ * function.
+ */
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int align)
+{
+	struct cma *cma = get_dev_cma_area(dev);
+	unsigned long pfn, pageno;
+	int ret;
+
+	if (!cma)
+		return NULL;
+
+	if (align > CONFIG_CMA_ALIGNMENT)
+		align = CONFIG_CMA_ALIGNMENT;
+
+	pr_debug("%s(<%p>, %d/%d)\n", __func__, (void *)cma, count, align);
+
+	if (!count)
+		return NULL;
+
+	mutex_lock(&cma_mutex);
+
+	pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count, 0, count,
+					    align);
+	if (pageno >= cma->count) {
+		ret = -ENOMEM;
+		goto error;
+	}
+	bitmap_set(cma->bitmap, pageno, count);
+
+	pfn = cma->base_pfn + pageno;
+	ret = alloc_contig_range(pfn, pfn + count, 0, MIGRATE_CMA);
+	if (ret)
+		goto free;
+
+	mutex_unlock(&cma_mutex);
+
+	pr_debug("%s(): returning [%ld]\n", __func__, pfn);
+	return pfn_to_page(pfn);
+free:
+	bitmap_clear(cma->bitmap, pageno, count);
+error:
+	mutex_unlock(&cma_mutex);
+	return NULL;
+}
+
+/**
+ * dma_release_from_contiguous() - release allocated pages
+ * @dev:   Pointer to device for which the pages were allocated.
+ * @pages: Allocated pages.
+ * @count: Number of allocated pages.
+ *
+ * This function releases memory allocated by dma_alloc_from_contiguous().
+ * It returns 0 when the provided pages do not belong to the contiguous
+ * memory area and 1 after the pages have been released back to it.
+ */
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count)
+{
+	struct cma *cma = get_dev_cma_area(dev);
+	unsigned long pfn;
+
+	if (!cma || !pages)
+		return 0;
+
+	pr_debug("%s([%p])\n", __func__, (void *)pages);
+
+	pfn = page_to_pfn(pages);
+
+	if (pfn < cma->base_pfn || pfn >= cma->base_pfn + cma->count)
+		return 0;
+
+	mutex_lock(&cma_mutex);
+
+	bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
+	free_contig_pages(pages, count);
+
+	mutex_unlock(&cma_mutex);
+	return 1;
+}
diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h
new file mode 100644
index 0000000..98312c9
--- /dev/null
+++ b/include/linux/dma-contiguous.h
@@ -0,0 +1,104 @@
+#ifndef __LINUX_CMA_H
+#define __LINUX_CMA_H
+
+/*
+ * Contiguous Memory Allocator for DMA mapping framework
+ * Copyright (c) 2010-2011 by Samsung Electronics.
+ * Written by:
+ *	Marek Szyprowski <m.szyprowski@samsung.com>
+ *	Michal Nazarewicz <mina86@mina86.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version.
+ */
+
+/*
+ * Contiguous Memory Allocator
+ *
+ *   The Contiguous Memory Allocator (CMA) makes it possible to
+ *   allocate big contiguous chunks of memory after the system has
+ *   booted.
+ *
+ * Why is it needed?
+ *
+ *   Various devices on embedded systems have no scatter-gather and/or
+ *   I/O map support and require contiguous blocks of memory to
+ *   operate.  They include devices such as cameras, hardware video
+ *   coders, etc.
+ *
+ *   Such devices often require big memory buffers (a full HD frame
+ *   is, for instance, more than 2 megapixels large, i.e. more than 6
+ *   MB of memory), which makes mechanisms such as kmalloc() or
+ *   alloc_page() ineffective.
+ *
+ *   At the same time, a solution where a big memory region is
+ *   reserved for a device is suboptimal since often more memory is
+ *   reserved than strictly required and, moreover, the memory is
+ *   inaccessible to the page allocator even if device drivers don't use it.
+ *
+ *   CMA tries to solve this issue by operating on memory regions
+ *   where only movable pages can be allocated from.  This way, the kernel
+ *   can use the memory for pagecache and, when a device driver requests
+ *   it, the allocated pages can be migrated.
+ *
+ * Driver usage
+ *
+ *   CMA should not be used by the device drivers directly. It is
+ *   only a helper framework for dma-mapping subsystem.
+ *
+ *   For more information, see kernel-docs in drivers/base/dma-contiguous.c
+ */
+
+#ifdef __KERNEL__
+
+struct cma;
+struct page;
+struct device;
+
+#ifdef CONFIG_CMA
+
+extern struct cma *dma_contiguous_default_area;
+
+void dma_contiguous_reserve(void);
+int dma_declare_contiguous(struct device *dev, unsigned long size,
+			   phys_addr_t base);
+
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int order);
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count);
+
+#else
+
+#define dma_contiguous_default_area NULL
+
+static inline void dma_contiguous_reserve(void) { }
+
+static inline
+int dma_declare_contiguous(struct device *dev, unsigned long size,
+			   unsigned long base)
+{
+	return -EINVAL;
+}
+
+static inline
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int order)
+{
+	return NULL;
+}
+
+static inline
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count)
+{
+	return 0;
+}
+
+#endif
+
+#endif
+
+#endif
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 6/8] drivers: add Contiguous Memory Allocator
@ 2011-07-05  7:41   ` Marek Szyprowski
  0 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-07-05  7:41 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Chunsang Jeong

The Contiguous Memory Allocator is a set of helper functions for the DMA
mapping framework that improves allocation of contiguous memory chunks.

CMA grabs memory on system boot, marks it with CMA_MIGRATE_TYPE and
gives it back to the system. The kernel is allowed to allocate movable
pages within CMA's managed memory, so it can be used, for example, for
page cache when the DMA mapping framework does not need it. On a
dma_alloc_from_contiguous() request such pages are migrated out of the
CMA area to free the required contiguous block and fulfill the request.
This makes it possible to allocate large contiguous chunks of memory at
any time, assuming that enough free memory is available in the system.
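
Not part of the patch itself, but as a rough illustration, a minimal
sketch of how architecture dma-mapping glue is expected to call the two
new helpers (the function names and sizes below are made up):

/* hypothetical sketch -- not included in this patch */
#include <linux/kernel.h>
#include <linux/device.h>
#include <linux/dma-contiguous.h>

static struct page *example_get_buffer(struct device *dev)
{
	/* 64 pages (256 KiB with 4 KiB pages), aligned to 2^4 pages */
	return dma_alloc_from_contiguous(dev, 64, 4);
}

static void example_put_buffer(struct device *dev, struct page *pages)
{
	/* returns 0 when the pages were not taken from a CMA area */
	if (!dma_release_from_contiguous(dev, pages, 64))
		printk(KERN_WARNING "example: pages not from CMA\n");
}
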

This code is heavily based on earlier works by Michal Nazarewicz.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 drivers/base/Kconfig           |   77 +++++++++
 drivers/base/Makefile          |    1 +
 drivers/base/dma-contiguous.c  |  367 ++++++++++++++++++++++++++++++++++++++++
 include/linux/dma-contiguous.h |  104 +++++++++++
 4 files changed, 549 insertions(+), 0 deletions(-)
 create mode 100644 drivers/base/dma-contiguous.c
 create mode 100644 include/linux/dma-contiguous.h

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index d57e8d0..95ae1a7 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -168,4 +168,81 @@ config SYS_HYPERVISOR
 	bool
 	default n
 
+config CMA
+	bool "Contiguous Memory Allocator"
+	depends on HAVE_DMA_CONTIGUOUS && HAVE_MEMBLOCK
+	select MIGRATION
+	select CMA_MIGRATE_TYPE
+	help
+	  This enables the Contiguous Memory Allocator which allows drivers
+	  to allocate big physically-contiguous blocks of memory for use with
+	  hardware components that do not support I/O mapping or scatter-gather.
+
+	  For more information see <include/linux/dma-contiguous.h>.
+	  If unsure, say "n".
+
+if CMA
+
+config CMA_DEBUG
+	bool "CMA debug messages (DEVELOPMENT)"
+	help
+	  Turns on debug messages in CMA.  This produces KERN_DEBUG
+	  messages for every CMA call as well as various messages while
+	  processing calls such as dma_alloc_from_contiguous().
+	  This option does not affect warning and error messages.
+
+comment "Default contiguous memory area size:"
+
+config CMA_SIZE_ABSOLUTE
+	int "Absolute size (in MiB)"
+	default 16
+	help
+	  Defines the size (in MiB) of the default memory area for Contiguous
+	  Memory Allocator.
+
+config CMA_SIZE_PERCENTAGE
+	int "Percentage of total memory"
+	default 10
+	help
+	  Defines the size of the default memory area for Contiguous Memory
+	  Allocator as a percentage of the total memory in the system.
+
+choice
+	prompt "Selected region size"
+	default CMA_SIZE_SEL_ABSOLUTE
+
+config CMA_SIZE_SEL_ABSOLUTE
+	bool "Use absolute value only"
+
+config CMA_SIZE_SEL_PERCENTAGE
+	bool "Use percentage value only"
+
+config CMA_SIZE_SEL_MIN
+	bool "Use lower value (minimum)"
+
+config CMA_SIZE_SEL_MAX
+	bool "Use higher value (maximum)"
+
+endchoice
+
+config CMA_ALIGNMENT
+	int "Maximum PAGE_SIZE order of alignment for contiguous buffers"
+	range 4 9
+	default 8
+	help
+	  DMA mapping framework by default aligns all buffers to the smallest
+	  PAGE_SIZE order which is greater than or equal to the requested buffer
+	  size. This works well for buffers up to a few hundred kilobytes, but
+	  for larger buffers it is just a waste of memory. With this parameter you can
+	  specify the maximum PAGE_SIZE order for contiguous buffers. Larger
+	  buffers will be aligned only to this specified order. The order is
+	  expressed as a power of two multiplied by the PAGE_SIZE.
+
+	  For example, if your system defaults to 4KiB pages, the order value
+	  of 8 means that the buffers will be aligned up to 1MiB only.
+
+	  If unsure, leave the default value "8".
+
+endif
+
 endmenu
diff --git a/drivers/base/Makefile b/drivers/base/Makefile
index 4c5701c..be6aab4 100644
--- a/drivers/base/Makefile
+++ b/drivers/base/Makefile
@@ -5,6 +5,7 @@ obj-y			:= core.o sys.o bus.o dd.o syscore.o \
 			   cpu.o firmware.o init.o map.o devres.o \
 			   attribute_container.o transport_class.o
 obj-$(CONFIG_DEVTMPFS)	+= devtmpfs.o
+obj-$(CONFIG_CMA) += dma-contiguous.o
 obj-y			+= power/
 obj-$(CONFIG_HAS_DMA)	+= dma-mapping.o
 obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
new file mode 100644
index 0000000..707b901
--- /dev/null
+++ b/drivers/base/dma-contiguous.c
@@ -0,0 +1,367 @@
+/*
+ * Contiguous Memory Allocator for DMA mapping framework
+ * Copyright (c) 2010-2011 by Samsung Electronics.
+ * Written by:
+ *	Marek Szyprowski <m.szyprowski@samsung.com>
+ *	Michal Nazarewicz <mina86@mina86.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt) "cma: " fmt
+
+#ifdef CONFIG_CMA_DEBUG
+#ifndef DEBUG
+#  define DEBUG
+#endif
+#endif
+
+#include <asm/page.h>
+#include <asm/sizes.h>
+
+#include <linux/memblock.h>
+#include <linux/err.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/page-isolation.h>
+#include <linux/slab.h>
+#include <linux/swap.h>
+#include <linux/mm_types.h>
+#include <linux/dma-mapping.h>
+#include <linux/dma-contiguous.h>
+
+struct cma {
+	unsigned long	base_pfn;
+	unsigned long	count;
+	unsigned long	*bitmap;
+};
+
+struct cma *dma_contiguous_default_area;
+
+static unsigned long size_abs = CONFIG_CMA_SIZE_ABSOLUTE * SZ_1M;
+static unsigned long size_percent = CONFIG_CMA_SIZE_PERCENTAGE;
+static long size_cmdline = -1;
+
+static int __init early_cma(char *p)
+{
+	pr_debug("%s(%s)\n", __func__, p);
+	size_cmdline = memparse(p, &p);
+	return 0;
+}
+early_param("cma", early_cma);
+
+/**
+ * dma_contiguous_reserve() - reserve area for contiguous memory handling
+ *
+ * This function reserves memory from the memblock subsystem. It should be
+ * called by arch-specific code once the memblock allocator has been activated
+ * and all other subsystems have already allocated/reserved memory.
+ */
+void __init dma_contiguous_reserve(void)
+{
+	struct memblock_region *reg;
+	unsigned long selected_size = 0;
+	unsigned long total_pages = 0;
+
+	pr_debug("%s()\n", __func__);
+
+	/*
+	 * We cannot use memblock_phys_mem_size() here, because
+	 * memblock_analyze() has not been called yet.
+	 */
+	for_each_memblock(memory, reg)
+		total_pages += memblock_region_memory_end_pfn(reg) -
+			       memblock_region_memory_base_pfn(reg);
+
+	size_percent *= (total_pages << PAGE_SHIFT) / 100;
+
+	pr_debug("%s: available phys mem: %ld MiB\n", __func__,
+		 (total_pages << PAGE_SHIFT) / SZ_1M);
+
+#ifdef CONFIG_CMA_SIZE_SEL_ABSOLUTE
+	selected_size = size_abs;
+#endif
+#ifdef CONFIG_CMA_SIZE_SEL_PERCENTAGE
+	selected_size = size_percent;
+#endif
+#ifdef CONFIG_CMA_SIZE_SEL_MIN
+	selected_size = min(size_abs, size_percent);
+#endif
+#ifdef CONFIG_CMA_SIZE_SEL_MAX
+	selected_size = max(size_abs, size_percent);
+#endif
+
+	if (size_cmdline != -1)
+		selected_size = size_cmdline;
+
+	if (!selected_size)
+		return;
+
+	pr_debug("%s: reserving %ld MiB for global area\n", __func__,
+		 selected_size / SZ_1M);
+
+	dma_declare_contiguous(NULL, selected_size, 0);
+};
+
+static DEFINE_MUTEX(cma_mutex);
+
+#ifdef CONFIG_DEBUG_VM
+
+static int __cma_activate_area(unsigned long base_pfn, unsigned long count)
+{
+	unsigned long pfn = base_pfn;
+	unsigned i = count;
+	struct zone *zone;
+
+	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
+
+	VM_BUG_ON(!pfn_valid(pfn));
+	zone = page_zone(pfn_to_page(pfn));
+
+	do {
+		VM_BUG_ON(!pfn_valid(pfn));
+		VM_BUG_ON(page_zone(pfn_to_page(pfn)) != zone);
+		if (!(pfn & (pageblock_nr_pages - 1)))
+			init_cma_reserved_pageblock(pfn_to_page(pfn));
+		++pfn;
+	} while (--i);
+
+	return 0;
+}
+
+#else
+
+static int __cma_activate_area(unsigned long base_pfn, unsigned long count)
+{
+	unsigned i = count >> pageblock_order;
+	struct page *p = pfn_to_page(base_pfn);
+
+	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
+
+	do {
+		init_cma_reserved_pageblock(p);
+		p += pageblock_nr_pages;
+	} while (--i);
+
+	return 0;
+}
+
+#endif
+
+static struct cma *__cma_create_area(unsigned long base_pfn,
+				     unsigned long count)
+{
+	int bitmap_size = BITS_TO_LONGS(count) * sizeof(long);
+	struct cma *cma;
+
+	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
+
+	cma = kmalloc(sizeof *cma, GFP_KERNEL);
+	if (!cma)
+		return ERR_PTR(-ENOMEM);
+
+	cma->base_pfn = base_pfn;
+	cma->count = count;
+	cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
+
+	if (!cma->bitmap)
+		goto no_mem;
+
+	__cma_activate_area(base_pfn, count);
+
+	pr_debug("%s: returning <%p>\n", __func__, (void *)cma);
+	return cma;
+
+no_mem:
+	kfree(cma);
+	return ERR_PTR(-ENOMEM);
+}
+
+static struct cma_reserved {
+	unsigned long start;
+	unsigned long size;
+	struct device *dev;
+} cma_reserved[8] __initdata;
+static unsigned cma_reserved_count __initdata;
+
+static int __init __cma_init_reserved_areas(void)
+{
+	struct cma_reserved *r = cma_reserved;
+	unsigned i = cma_reserved_count;
+
+	pr_debug("%s()\n", __func__);
+
+	for (; i; --i, ++r) {
+		struct cma *cma;
+		cma = __cma_create_area(page_to_pfn(phys_to_page(r->start)),
+					r->size >> PAGE_SHIFT);
+		if (!IS_ERR(cma)) {
+			pr_debug("%s: created area %p\n", __func__, cma);
+			if (r->dev)
+				set_dev_cma_area(r->dev, cma);
+			else
+				dma_contiguous_default_area = cma;
+		}
+	}
+	return 0;
+}
+core_initcall(__cma_init_reserved_areas);
+
+/**
+ * dma_declare_contiguous() - reserve area for contiguous memory handling
+ *			      for particular device
+ * @dev:   Pointer to device structure.
+ * @size:  Size of the reserved memory.
+ * @start: Start address of the reserved memory (optional, 0 for any).
+ *
+ * This function reserves memory for the specified device. It should be
+ * called by board-specific code once the memblock allocator has been activated
+ * and all other subsystems have already allocated/reserved memory.
+ */
+int __init dma_declare_contiguous(struct device *dev, unsigned long size,
+				  phys_addr_t start)
+{
+	struct cma_reserved *r = &cma_reserved[cma_reserved_count];
+	unsigned long alignment;
+
+	pr_debug("%s(%p+%p)\n", __func__, (void *)start, (void *)size);
+
+	/* Sanity checks */
+	if (cma_reserved_count == ARRAY_SIZE(cma_reserved))
+		return -ENOSPC;
+
+	if (!size)
+		return -EINVAL;
+
+	/* Sanitise input arguments */
+	alignment = PAGE_SIZE << (MAX_ORDER + 1);
+	start = ALIGN(start, alignment);
+	size  = ALIGN(size , alignment);
+
+	/* Reserve memory */
+	if (start) {
+		if (memblock_is_region_reserved(start, size) ||
+		    memblock_reserve(start, size) < 0)
+			return -EBUSY;
+	} else {
+		/*
+		 * Use __memblock_alloc_base() since
+		 * memblock_alloc_base() panic()s.
+		 */
+		u64 addr = __memblock_alloc_base(size, alignment, 0);
+		if (!addr) {
+			return -ENOMEM;
+		} else if (addr + size > ~(unsigned long)0) {
+			memblock_free(addr, size);
+			return -EOVERFLOW;
+		} else {
+			start = addr;
+		}
+	}
+
+	/*
+	 * Each reserved area must be initialised later, when more kernel
+	 * subsystems (like slab allocator) are available.
+	 */
+	r->start = start;
+	r->size = size;
+	r->dev = dev;
+	cma_reserved_count++;
+	printk(KERN_INFO "%s: reserved %ld MiB area at 0x%p\n", __func__,
+	       size / SZ_1M, (void *)start);
+	return 0;
+}
+
+/**
+ * dma_alloc_from_contiguous() - allocate pages from contiguous area
+ * @dev:   Pointer to device for which the allocation is performed.
+ * @count: Requested number of pages.
+ * @align: Requested alignment of pages (in PAGE_SIZE order).
+ *
+ * This function allocates a memory buffer for the specified device. It uses
+ * device specific contiguous memory area if available or the default
+ * global one. Requires architecture specific get_dev_cma_area() helper
+ * function.
+ */
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int align)
+{
+	struct cma *cma = get_dev_cma_area(dev);
+	unsigned long pfn, pageno;
+	int ret;
+
+	if (!cma)
+		return NULL;
+
+	if (align > CONFIG_CMA_ALIGNMENT)
+		align = CONFIG_CMA_ALIGNMENT;
+
+	pr_debug("%s(<%p>, %d/%d)\n", __func__, (void *)cma, count, align);
+
+	if (!count)
+		return NULL;
+
+	mutex_lock(&cma_mutex);
+
+	pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count, 0, count,
+					    align);
+	if (pageno >= cma->count) {
+		ret = -ENOMEM;
+		goto error;
+	}
+	bitmap_set(cma->bitmap, pageno, count);
+
+	pfn = cma->base_pfn + pageno;
+	ret = alloc_contig_range(pfn, pfn + count, 0, MIGRATE_CMA);
+	if (ret)
+		goto free;
+
+	mutex_unlock(&cma_mutex);
+
+	pr_debug("%s(): returning [%ld]\n", __func__, pfn);
+	return pfn_to_page(pfn);
+free:
+	bitmap_clear(cma->bitmap, pageno, count);
+error:
+	mutex_unlock(&cma_mutex);
+	return NULL;
+}
+
+/**
+ * dma_release_from_contiguous() - release allocated pages
+ * @dev:   Pointer to device for which the pages were allocated.
+ * @pages: Allocated pages.
+ * @count: Number of allocated pages.
+ *
+ * This function releases memory allocated by dma_alloc_from_contiguous().
+ * It returns 0 when the provided pages do not belong to the contiguous
+ * memory area and 1 after the pages have been released back to it.
+ */
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count)
+{
+	struct cma *cma = get_dev_cma_area(dev);
+	unsigned long pfn;
+
+	if (!cma || !pages)
+		return 0;
+
+	pr_debug("%s([%p])\n", __func__, (void *)pages);
+
+	pfn = page_to_pfn(pages);
+
+	if (pfn < cma->base_pfn || pfn >= cma->base_pfn + cma->count)
+		return 0;
+
+	mutex_lock(&cma_mutex);
+
+	bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
+	free_contig_pages(pages, count);
+
+	mutex_unlock(&cma_mutex);
+	return 1;
+}
diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h
new file mode 100644
index 0000000..98312c9
--- /dev/null
+++ b/include/linux/dma-contiguous.h
@@ -0,0 +1,104 @@
+#ifndef __LINUX_CMA_H
+#define __LINUX_CMA_H
+
+/*
+ * Contiguous Memory Allocator for DMA mapping framework
+ * Copyright (c) 2010-2011 by Samsung Electronics.
+ * Written by:
+ *	Marek Szyprowski <m.szyprowski@samsung.com>
+ *	Michal Nazarewicz <mina86@mina86.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version.
+ */
+
+/*
+ * Contiguous Memory Allocator
+ *
+ *   The Contiguous Memory Allocator (CMA) makes it possible to
+ *   allocate big contiguous chunks of memory after the system has
+ *   booted.
+ *
+ * Why is it needed?
+ *
+ *   Various devices on embedded systems have no scatter-gather and/or
+ *   I/O map support and require contiguous blocks of memory to
+ *   operate.  They include devices such as cameras, hardware video
+ *   coders, etc.
+ *
+ *   Such devices often require big memory buffers (a full HD frame
+ *   is, for instance, more than 2 megapixels large, i.e. more than 6
+ *   MB of memory), which makes mechanisms such as kmalloc() or
+ *   alloc_page() ineffective.
+ *
+ *   At the same time, a solution where a big memory region is
+ *   reserved for a device is suboptimal since often more memory is
+ *   reserved than strictly required and, moreover, the memory is
+ *   inaccessible to the page allocator even if device drivers don't use it.
+ *
+ *   CMA tries to solve this issue by operating on memory regions
+ *   where only movable pages can be allocated from.  This way, the kernel
+ *   can use the memory for pagecache and, when a device driver requests
+ *   it, the allocated pages can be migrated.
+ *
+ * Driver usage
+ *
+ *   CMA should not be used by the device drivers directly. It is
+ *   only a helper framework for dma-mapping subsystem.
+ *
+ *   For more information, see kernel-docs in drivers/base/dma-contiguous.c
+ */
+
+#ifdef __KERNEL__
+
+struct cma;
+struct page;
+struct device;
+
+#ifdef CONFIG_CMA
+
+extern struct cma *dma_contiguous_default_area;
+
+void dma_contiguous_reserve(void);
+int dma_declare_contiguous(struct device *dev, unsigned long size,
+			   phys_addr_t base);
+
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int order);
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count);
+
+#else
+
+#define dma_contiguous_default_area NULL
+
+static inline void dma_contiguous_reserve(void) { }
+
+static inline
+int dma_declare_contiguous(struct device *dev, unsigned long size,
+			   unsigned long base)
+{
+	return -EINVAL;
+}
+
+static inline
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int order)
+{
+	return NULL;
+}
+
+static inline
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count)
+{
+	return 0;
+}
+
+#endif
+
+#endif
+
+#endif
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 6/8] drivers: add Contiguous Memory Allocator
@ 2011-07-05  7:41   ` Marek Szyprowski
  0 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-07-05  7:41 UTC (permalink / raw)
  To: linux-arm-kernel

The Contiguous Memory Allocator is a set of helper functions for the DMA
mapping framework that improves allocation of contiguous memory chunks.

CMA grabs memory on system boot, marks it with CMA_MIGRATE_TYPE and
gives it back to the system. The kernel is allowed to allocate movable
pages within CMA's managed memory, so it can be used, for example, for
page cache when the DMA mapping framework does not need it. On a
dma_alloc_from_contiguous() request such pages are migrated out of the
CMA area to free the required contiguous block and fulfill the request.
This makes it possible to allocate large contiguous chunks of memory at
any time, assuming that enough free memory is available in the system.
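
Not part of the patch, but as a rough illustration of the boot-time side,
an architecture is expected to hook the reservation in along these lines
(the hook name example_arch_reserve() is made up; the ARM wiring is done
in a later patch of this series):

/* hypothetical sketch -- illustration only */
#include <linux/init.h>
#include <linux/dma-contiguous.h>

void __init example_arch_reserve(void)
{
	/*
	 * Runs after memblock is populated but before the buddy
	 * allocator takes over; the default area size comes from
	 * the CONFIG_CMA_SIZE_* options or the "cma=" parameter.
	 */
	dma_contiguous_reserve();
}
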

This code is heavily based on earlier works by Michal Nazarewicz.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 drivers/base/Kconfig           |   77 +++++++++
 drivers/base/Makefile          |    1 +
 drivers/base/dma-contiguous.c  |  367 ++++++++++++++++++++++++++++++++++++++++
 include/linux/dma-contiguous.h |  104 +++++++++++
 4 files changed, 549 insertions(+), 0 deletions(-)
 create mode 100644 drivers/base/dma-contiguous.c
 create mode 100644 include/linux/dma-contiguous.h

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index d57e8d0..95ae1a7 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -168,4 +168,81 @@ config SYS_HYPERVISOR
 	bool
 	default n
 
+config CMA
+	bool "Contiguous Memory Allocator"
+	depends on HAVE_DMA_CONTIGUOUS && HAVE_MEMBLOCK
+	select MIGRATION
+	select CMA_MIGRATE_TYPE
+	help
+	  This enables the Contiguous Memory Allocator which allows drivers
+	  to allocate big physically-contiguous blocks of memory for use with
+	  hardware components that do not support I/O mapping or scatter-gather.
+
+	  For more information see <include/linux/dma-contiguous.h>.
+	  If unsure, say "n".
+
+if CMA
+
+config CMA_DEBUG
+	bool "CMA debug messages (DEVELOPMENT)"
+	help
+	  Turns on debug messages in CMA.  This produces KERN_DEBUG
+	  messages for every CMA call as well as various messages while
+	  processing calls such as dma_alloc_from_contiguous().
+	  This option does not affect warning and error messages.
+
+comment "Default contiguous memory area size:"
+
+config CMA_SIZE_ABSOLUTE
+	int "Absolute size (in MiB)"
+	default 16
+	help
+	  Defines the size (in MiB) of the default memory area for Contiguous
+	  Memory Allocator.
+
+config CMA_SIZE_PERCENTAGE
+	int "Percentage of total memory"
+	default 10
+	help
+	  Defines the size of the default memory area for Contiguous Memory
+	  Allocator as a percentage of the total memory in the system.
+
+choice
+	prompt "Selected region size"
+	default CMA_SIZE_SEL_ABSOLUTE
+
+config CMA_SIZE_SEL_ABSOLUTE
+	bool "Use absolute value only"
+
+config CMA_SIZE_SEL_PERCENTAGE
+	bool "Use percentage value only"
+
+config CMA_SIZE_SEL_MIN
+	bool "Use lower value (minimum)"
+
+config CMA_SIZE_SEL_MAX
+	bool "Use higher value (maximum)"
+
+endchoice
+
+config CMA_ALIGNMENT
+	int "Maximum PAGE_SIZE order of alignment for contiguous buffers"
+	range 4 9
+	default 8
+	help
+	  DMA mapping framework by default aligns all buffers to the smallest
+	  PAGE_SIZE order which is greater than or equal to the requested buffer
+	  size. This works well for buffers up to a few hundred kilobytes, but
+	  for larger buffers it is just a waste of memory. With this parameter you can
+	  specify the maximum PAGE_SIZE order for contiguous buffers. Larger
+	  buffers will be aligned only to this specified order. The order is
+	  expressed as a power of two multiplied by the PAGE_SIZE.
+
+	  For example, if your system defaults to 4KiB pages, the order value
+	  of 8 means that the buffers will be aligned up to 1MiB only.
+
+	  If unsure, leave the default value "8".
+
+endif
+
 endmenu
diff --git a/drivers/base/Makefile b/drivers/base/Makefile
index 4c5701c..be6aab4 100644
--- a/drivers/base/Makefile
+++ b/drivers/base/Makefile
@@ -5,6 +5,7 @@ obj-y			:= core.o sys.o bus.o dd.o syscore.o \
 			   cpu.o firmware.o init.o map.o devres.o \
 			   attribute_container.o transport_class.o
 obj-$(CONFIG_DEVTMPFS)	+= devtmpfs.o
+obj-$(CONFIG_CMA) += dma-contiguous.o
 obj-y			+= power/
 obj-$(CONFIG_HAS_DMA)	+= dma-mapping.o
 obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
new file mode 100644
index 0000000..707b901
--- /dev/null
+++ b/drivers/base/dma-contiguous.c
@@ -0,0 +1,367 @@
+/*
+ * Contiguous Memory Allocator for DMA mapping framework
+ * Copyright (c) 2010-2011 by Samsung Electronics.
+ * Written by:
+ *	Marek Szyprowski <m.szyprowski@samsung.com>
+ *	Michal Nazarewicz <mina86@mina86.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt) "cma: " fmt
+
+#ifdef CONFIG_CMA_DEBUG
+#ifndef DEBUG
+#  define DEBUG
+#endif
+#endif
+
+#include <asm/page.h>
+#include <asm/sizes.h>
+
+#include <linux/memblock.h>
+#include <linux/err.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/page-isolation.h>
+#include <linux/slab.h>
+#include <linux/swap.h>
+#include <linux/mm_types.h>
+#include <linux/dma-mapping.h>
+#include <linux/dma-contiguous.h>
+
+struct cma {
+	unsigned long	base_pfn;
+	unsigned long	count;
+	unsigned long	*bitmap;
+};
+
+struct cma *dma_contiguous_default_area;
+
+static unsigned long size_abs = CONFIG_CMA_SIZE_ABSOLUTE * SZ_1M;
+static unsigned long size_percent = CONFIG_CMA_SIZE_PERCENTAGE;
+static long size_cmdline = -1;
+
+static int __init early_cma(char *p)
+{
+	pr_debug("%s(%s)\n", __func__, p);
+	size_cmdline = memparse(p, &p);
+	return 0;
+}
+early_param("cma", early_cma);
+
+/**
+ * dma_contiguous_reserve() - reserve area for contiguous memory handling
+ *
+ * This function reserves memory from the memblock subsystem. It should be
+ * called by arch-specific code once the memblock allocator has been activated
+ * and all other subsystems have already allocated/reserved memory.
+ */
+void __init dma_contiguous_reserve(void)
+{
+	struct memblock_region *reg;
+	unsigned long selected_size = 0;
+	unsigned long total_pages = 0;
+
+	pr_debug("%s()\n", __func__);
+
+	/*
+	 * We cannot use memblock_phys_mem_size() here, because
+	 * memblock_analyze() has not been called yet.
+	 */
+	for_each_memblock(memory, reg)
+		total_pages += memblock_region_memory_end_pfn(reg) -
+			       memblock_region_memory_base_pfn(reg);
+
+	size_percent *= (total_pages << PAGE_SHIFT) / 100;
+
+	pr_debug("%s: available phys mem: %ld MiB\n", __func__,
+		 (total_pages << PAGE_SHIFT) / SZ_1M);
+
+#ifdef CONFIG_CMA_SIZE_SEL_ABSOLUTE
+	selected_size = size_abs;
+#endif
+#ifdef CONFIG_CMA_SIZE_SEL_PERCENTAGE
+	selected_size = size_percent;
+#endif
+#ifdef CONFIG_CMA_SIZE_SEL_MIN
+	selected_size = min(size_abs, size_percent);
+#endif
+#ifdef CONFIG_CMA_SIZE_SEL_MAX
+	selected_size = max(size_abs, size_percent);
+#endif
+
+	if (size_cmdline != -1)
+		selected_size = size_cmdline;
+
+	if (!selected_size)
+		return;
+
+	pr_debug("%s: reserving %ld MiB for global area\n", __func__,
+		 selected_size / SZ_1M);
+
+	dma_declare_contiguous(NULL, selected_size, 0);
+};
+
+static DEFINE_MUTEX(cma_mutex);
+
+#ifdef CONFIG_DEBUG_VM
+
+static int __cma_activate_area(unsigned long base_pfn, unsigned long count)
+{
+	unsigned long pfn = base_pfn;
+	unsigned i = count;
+	struct zone *zone;
+
+	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
+
+	VM_BUG_ON(!pfn_valid(pfn));
+	zone = page_zone(pfn_to_page(pfn));
+
+	do {
+		VM_BUG_ON(!pfn_valid(pfn));
+		VM_BUG_ON(page_zone(pfn_to_page(pfn)) != zone);
+		if (!(pfn & (pageblock_nr_pages - 1)))
+			init_cma_reserved_pageblock(pfn_to_page(pfn));
+		++pfn;
+	} while (--i);
+
+	return 0;
+}
+
+#else
+
+static int __cma_activate_area(unsigned long base_pfn, unsigned long count)
+{
+	unsigned i = count >> pageblock_order;
+	struct page *p = pfn_to_page(base_pfn);
+
+	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
+
+	do {
+		init_cma_reserved_pageblock(p);
+		p += pageblock_nr_pages;
+	} while (--i);
+
+	return 0;
+}
+
+#endif
+
+static struct cma *__cma_create_area(unsigned long base_pfn,
+				     unsigned long count)
+{
+	int bitmap_size = BITS_TO_LONGS(count) * sizeof(long);
+	struct cma *cma;
+
+	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
+
+	cma = kmalloc(sizeof *cma, GFP_KERNEL);
+	if (!cma)
+		return ERR_PTR(-ENOMEM);
+
+	cma->base_pfn = base_pfn;
+	cma->count = count;
+	cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
+
+	if (!cma->bitmap)
+		goto no_mem;
+
+	__cma_activate_area(base_pfn, count);
+
+	pr_debug("%s: returning <%p>\n", __func__, (void *)cma);
+	return cma;
+
+no_mem:
+	kfree(cma);
+	return ERR_PTR(-ENOMEM);
+}
+
+static struct cma_reserved {
+	unsigned long start;
+	unsigned long size;
+	struct device *dev;
+} cma_reserved[8] __initdata;
+static unsigned cma_reserved_count __initdata;
+
+static int __init __cma_init_reserved_areas(void)
+{
+	struct cma_reserved *r = cma_reserved;
+	unsigned i = cma_reserved_count;
+
+	pr_debug("%s()\n", __func__);
+
+	for (; i; --i, ++r) {
+		struct cma *cma;
+		cma = __cma_create_area(page_to_pfn(phys_to_page(r->start)),
+					r->size >> PAGE_SHIFT);
+		if (!IS_ERR(cma)) {
+			pr_debug("%s: created area %p\n", __func__, cma);
+			if (r->dev)
+				set_dev_cma_area(r->dev, cma);
+			else
+				dma_contiguous_default_area = cma;
+		}
+	}
+	return 0;
+}
+core_initcall(__cma_init_reserved_areas);
+
+/**
+ * dma_declare_contiguous() - reserve area for contiguous memory handling
+ *			      for particular device
+ * @dev:   Pointer to device structure.
+ * @size:  Size of the reserved memory.
+ * @start: Start address of the reserved memory (optional, 0 for any).
+ *
+ * This function reserves memory for the specified device. It should be
+ * called by board-specific code once the memblock allocator has been activated
+ * and all other subsystems have already allocated/reserved memory.
+ */
+int __init dma_declare_contiguous(struct device *dev, unsigned long size,
+				  phys_addr_t start)
+{
+	struct cma_reserved *r = &cma_reserved[cma_reserved_count];
+	unsigned long alignment;
+
+	pr_debug("%s(%p+%p)\n", __func__, (void *)start, (void *)size);
+
+	/* Sanity checks */
+	if (cma_reserved_count == ARRAY_SIZE(cma_reserved))
+		return -ENOSPC;
+
+	if (!size)
+		return -EINVAL;
+
+	/* Sanitise input arguments */
+	alignment = PAGE_SIZE << (MAX_ORDER + 1);
+	start = ALIGN(start, alignment);
+	size  = ALIGN(size , alignment);
+
+	/* Reserve memory */
+	if (start) {
+		if (memblock_is_region_reserved(start, size) ||
+		    memblock_reserve(start, size) < 0)
+			return -EBUSY;
+	} else {
+		/*
+		 * Use __memblock_alloc_base() since
+		 * memblock_alloc_base() panic()s.
+		 */
+		u64 addr = __memblock_alloc_base(size, alignment, 0);
+		if (!addr) {
+			return -ENOMEM;
+		} else if (addr + size > ~(unsigned long)0) {
+			memblock_free(addr, size);
+			return -EOVERFLOW;
+		} else {
+			start = addr;
+		}
+	}
+
+	/*
+	 * Each reserved area must be initialised later, when more kernel
+	 * subsystems (like slab allocator) are available.
+	 */
+	r->start = start;
+	r->size = size;
+	r->dev = dev;
+	cma_reserved_count++;
+	printk(KERN_INFO "%s: reserved %ld MiB area at 0x%p\n", __func__,
+	       size / SZ_1M, (void *)start);
+	return 0;
+}
+
+/**
+ * dma_alloc_from_contiguous() - allocate pages from contiguous area
+ * @dev:   Pointer to device for which the allocation is performed.
+ * @count: Requested number of pages.
+ * @align: Requested alignment of pages (in PAGE_SIZE order).
+ *
+ * This function allocates a memory buffer for the specified device. It uses
+ * device specific contiguous memory area if available or the default
+ * global one. Requires architecture specific get_dev_cma_area() helper
+ * function.
+ */
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int align)
+{
+	struct cma *cma = get_dev_cma_area(dev);
+	unsigned long pfn, pageno;
+	int ret;
+
+	if (!cma)
+		return NULL;
+
+	if (align > CONFIG_CMA_ALIGNMENT)
+		align = CONFIG_CMA_ALIGNMENT;
+
+	pr_debug("%s(<%p>, %d/%d)\n", __func__, (void *)cma, count, align);
+
+	if (!count)
+		return NULL;
+
+	mutex_lock(&cma_mutex);
+
+	pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count, 0, count,
+					    align);
+	if (pageno >= cma->count) {
+		ret = -ENOMEM;
+		goto error;
+	}
+	bitmap_set(cma->bitmap, pageno, count);
+
+	pfn = cma->base_pfn + pageno;
+	ret = alloc_contig_range(pfn, pfn + count, 0, MIGRATE_CMA);
+	if (ret)
+		goto free;
+
+	mutex_unlock(&cma_mutex);
+
+	pr_debug("%s(): returning [%ld]\n", __func__, pfn);
+	return pfn_to_page(pfn);
+free:
+	bitmap_clear(cma->bitmap, pageno, count);
+error:
+	mutex_unlock(&cma_mutex);
+	return NULL;
+}
+
+/**
+ * dma_release_from_contiguous() - release allocated pages
+ * @dev:   Pointer to device for which the pages were allocated.
+ * @pages: Allocated pages.
+ * @count: Number of allocated pages.
+ *
+ * This function releases memory allocated by dma_alloc_from_contiguous().
+ * It returns 0 when the provided pages do not belong to the contiguous
+ * memory area and 1 after the pages have been released back to it.
+ */
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count)
+{
+	struct cma *cma = get_dev_cma_area(dev);
+	unsigned long pfn;
+
+	if (!cma || !pages)
+		return 0;
+
+	pr_debug("%s([%p])\n", __func__, (void *)pages);
+
+	pfn = page_to_pfn(pages);
+
+	if (pfn < cma->base_pfn || pfn >= cma->base_pfn + cma->count)
+		return 0;
+
+	mutex_lock(&cma_mutex);
+
+	bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
+	free_contig_pages(pages, count);
+
+	mutex_unlock(&cma_mutex);
+	return 1;
+}
diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h
new file mode 100644
index 0000000..98312c9
--- /dev/null
+++ b/include/linux/dma-contiguous.h
@@ -0,0 +1,104 @@
+#ifndef __LINUX_CMA_H
+#define __LINUX_CMA_H
+
+/*
+ * Contiguous Memory Allocator for DMA mapping framework
+ * Copyright (c) 2010-2011 by Samsung Electronics.
+ * Written by:
+ *	Marek Szyprowski <m.szyprowski@samsung.com>
+ *	Michal Nazarewicz <mina86@mina86.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version.
+ */
+
+/*
+ * Contiguous Memory Allocator
+ *
+ *   The Contiguous Memory Allocator (CMA) makes it possible to
+ *   allocate big contiguous chunks of memory after the system has
+ *   booted.
+ *
+ * Why is it needed?
+ *
+ *   Various devices on embedded systems have no scatter-gather and/or
+ *   I/O map support and require contiguous blocks of memory to
+ *   operate.  They include devices such as cameras, hardware video
+ *   coders, etc.
+ *
+ *   Such devices often require big memory buffers (a full HD frame
+ *   is, for instance, more than 2 megapixels large, i.e. more than 6
+ *   MB of memory), which makes mechanisms such as kmalloc() or
+ *   alloc_page() ineffective.
+ *
+ *   At the same time, a solution where a big memory region is
+ *   reserved for a device is suboptimal since often more memory is
+ *   reserved than strictly required and, moreover, the memory is
+ *   inaccessible to the page allocator even if device drivers don't use it.
+ *
+ *   CMA tries to solve this issue by operating on memory regions
+ *   where only movable pages can be allocated from.  This way, the kernel
+ *   can use the memory for pagecache and, when a device driver requests
+ *   it, the allocated pages can be migrated.
+ *
+ * Driver usage
+ *
+ *   CMA should not be used by the device drivers directly. It is
+ *   only a helper framework for dma-mapping subsystem.
+ *
+ *   For more information, see kernel-docs in drivers/base/dma-contiguous.c
+ */
+
+#ifdef __KERNEL__
+
+struct cma;
+struct page;
+struct device;
+
+#ifdef CONFIG_CMA
+
+extern struct cma *dma_contiguous_default_area;
+
+void dma_contiguous_reserve(void);
+int dma_declare_contiguous(struct device *dev, unsigned long size,
+			   phys_addr_t base);
+
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int order);
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count);
+
+#else
+
+#define dma_contiguous_default_area NULL
+
+static inline void dma_contiguous_reserve(void) { }
+
+static inline
+int dma_declare_contiguous(struct device *dev, unsigned long size,
+			   unsigned long base)
+{
+	return -EINVAL;
+}
+
+static inline
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int order)
+{
+	return NULL;
+}
+
+static inline
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count)
+{
+	return 0;
+}
+
+#endif
+
+#endif
+
+#endif
-- 
1.7.1.569.g6f426

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 7/8] ARM: integrate CMA with dma-mapping subsystem
  2011-07-05  7:41 ` Marek Szyprowski
  (?)
@ 2011-07-05  7:41   ` Marek Szyprowski
  -1 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-07-05  7:41 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Chunsang Jeong

This patch adds CMA support to the dma-mapping subsystem for the ARM
architecture. By default a global CMA area is used, but specific devices
may have their own private memory areas if required (these can be
created with the dma_declare_contiguous() function during board
initialization).
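
As a quick illustration (not part of this patch; the device name and
size below are made up), board code would create such a private area
from its ->reserve() callback roughly like this:

/* hypothetical board code -- illustration only */
#include <linux/init.h>
#include <linux/platform_device.h>
#include <linux/dma-contiguous.h>
#include <asm/sizes.h>

static struct platform_device example_camera_device;

static void __init example_board_reserve(void)
{
	/* a private 16 MiB CMA area for the camera, placed anywhere */
	if (dma_declare_contiguous(&example_camera_device.dev,
				   16 * SZ_1M, 0))
		printk(KERN_WARNING "example: CMA reservation failed\n");
}
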

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
 arch/arm/Kconfig                   |    1 +
 arch/arm/include/asm/device.h      |    3 ++
 arch/arm/include/asm/dma-mapping.h |   20 ++++++++++++++
 arch/arm/mm/dma-mapping.c          |   51 +++++++++++++++++++++++++++--------
 arch/arm/mm/init.c                 |    3 ++
 5 files changed, 66 insertions(+), 12 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 9adc278..3cca8cc 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -3,6 +3,7 @@ config ARM
 	default y
 	select HAVE_AOUT
 	select HAVE_DMA_API_DEBUG
+	select HAVE_DMA_CONTIGUOUS
 	select HAVE_IDE
 	select HAVE_MEMBLOCK
 	select RTC_LIB
diff --git a/arch/arm/include/asm/device.h b/arch/arm/include/asm/device.h
index 9f390ce..942913e 100644
--- a/arch/arm/include/asm/device.h
+++ b/arch/arm/include/asm/device.h
@@ -10,6 +10,9 @@ struct dev_archdata {
 #ifdef CONFIG_DMABOUNCE
 	struct dmabounce_device_info *dmabounce;
 #endif
+#ifdef CONFIG_CMA
+	struct cma *cma_area;
+#endif
 };
 
 struct pdev_archdata {
diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 4fff837..a3e1e48c 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -6,6 +6,7 @@
 #include <linux/mm_types.h>
 #include <linux/scatterlist.h>
 #include <linux/dma-debug.h>
+#include <linux/dma-contiguous.h>
 
 #include <asm-generic/dma-coherent.h>
 #include <asm/memory.h>
@@ -14,6 +15,25 @@
 #error Please update to __arch_pfn_to_dma
 #endif
 
+#ifdef CONFIG_CMA
+static inline struct cma *get_dev_cma_area(struct device *dev)
+{
+	if (dev->archdata.cma_area)
+		return dev->archdata.cma_area;
+	return dma_contiguous_default_area;
+}
+
+static inline void set_dev_cma_area(struct device *dev, struct cma *cma)
+{
+	dev->archdata.cma_area = cma;
+}
+#else
+static inline struct cma *get_dev_cma_area(struct device *dev)
+{
+	return NULL;
+}
+#endif
+
 /*
  * dma_to_pfn/pfn_to_dma/dma_to_virt/virt_to_dma are architecture private
  * functions used internally by the DMA-mapping API to provide DMA
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 82a093c..1d4e916 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -17,6 +17,7 @@
 #include <linux/init.h>
 #include <linux/device.h>
 #include <linux/dma-mapping.h>
+#include <linux/dma-contiguous.h>
 #include <linux/highmem.h>
 
 #include <asm/memory.h>
@@ -52,16 +53,35 @@ static u64 get_coherent_dma_mask(struct device *dev)
 	return mask;
 }
 
+
+static struct page *__alloc_system_pages(size_t count, unsigned int order, gfp_t gfp)
+{
+	struct page *page, *p, *e;
+
+	page = alloc_pages(gfp, order);
+	if (!page)
+		return NULL;
+
+	/*
+	 * Now split the huge page and free the excess pages
+	 */
+	split_page(page, order);
+	for (p = page + count, e = page + (1 << order); p < e; p++)
+		__free_page(p);
+	return page;
+}
+
 /*
  * Allocate a DMA buffer for 'dev' of size 'size' using the
  * specified gfp mask.  Note that 'size' must be page aligned.
  */
 static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gfp)
 {
-	unsigned long order = get_order(size);
-	struct page *page, *p, *e;
+	struct page *page;
+	size_t count = size >> PAGE_SHIFT;
 	void *ptr;
 	u64 mask = get_coherent_dma_mask(dev);
+	unsigned long order = get_order(count << PAGE_SHIFT);
 
 #ifdef CONFIG_DMA_API_DEBUG
 	u64 limit = (mask + 1) & ~mask;
@@ -78,16 +98,19 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
 	if (mask < 0xffffffffULL)
 		gfp |= GFP_DMA;
 
-	page = alloc_pages(gfp, order);
-	if (!page)
-		return NULL;
+	/*
+	 * First, try to allocate memory from contiguous area
+	 */
+	page = dma_alloc_from_contiguous(dev, count, order);
 
 	/*
-	 * Now split the huge page and free the excess pages
+	 * Fallback if contiguous alloc fails or is not available
 	 */
-	split_page(page, order);
-	for (p = page + (size >> PAGE_SHIFT), e = page + (1 << order); p < e; p++)
-		__free_page(p);
+	if (!page)
+		page = __alloc_system_pages(count, order, gfp);
+
+	if (!page)
+		return NULL;
 
 	/*
 	 * Ensure that the allocated pages are zeroed, and that any data
@@ -104,9 +127,13 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
 /*
  * Free a DMA buffer.  'size' must be page aligned.
  */
-static void __dma_free_buffer(struct page *page, size_t size)
+static void __dma_free_buffer(struct device *dev, struct page *page, size_t size)
 {
-	struct page *e = page + (size >> PAGE_SHIFT);
+	size_t count = size >> PAGE_SHIFT;
+	struct page *e = page + count;
+
+	if (dma_release_from_contiguous(dev, page, count))
+		return;
 
 	while (page < e) {
 		__free_page(page);
@@ -416,7 +443,7 @@ void dma_free_coherent(struct device *dev, size_t size, void *cpu_addr, dma_addr
 	if (!arch_is_coherent())
 		__dma_free_remap(cpu_addr, size);
 
-	__dma_free_buffer(pfn_to_page(dma_to_pfn(dev, handle)), size);
+	__dma_free_buffer(dev, pfn_to_page(dma_to_pfn(dev, handle)), size);
 }
 EXPORT_SYMBOL(dma_free_coherent);
 
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index c19571c..b2dfdeb 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -20,6 +20,7 @@
 #include <linux/gfp.h>
 #include <linux/memblock.h>
 #include <linux/sort.h>
+#include <linux/dma-contiguous.h>
 
 #include <asm/mach-types.h>
 #include <asm/prom.h>
@@ -358,6 +359,8 @@ void __init arm_memblock_init(struct meminfo *mi, struct machine_desc *mdesc)
 	if (mdesc->reserve)
 		mdesc->reserve();
 
+	dma_contiguous_reserve();
+
 	memblock_analyze();
 	memblock_dump_all();
 }
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 7/8] ARM: integrate CMA with dma-mapping subsystem
@ 2011-07-05  7:41   ` Marek Szyprowski
  0 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-07-05  7:41 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Chunsang Jeong

This patch adds CMA support to the dma-mapping subsystem for the ARM
architecture. By default a global CMA area is used, but specific devices
may have their own private memory areas if required (these can be
created with the dma_declare_contiguous() function during board
initialization).
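
Not part of the patch, just to show the driver-visible effect (the
function name and size below are made up): existing drivers keep calling
the standard API and transparently get CMA-backed memory:

/* hypothetical driver snippet -- illustration only */
#include <linux/dma-mapping.h>

static void *example_alloc_framebuffer(struct device *dev, dma_addr_t *dma)
{
	/*
	 * dma_alloc_coherent() now tries the device's (or the global)
	 * CMA area first and falls back to alloc_pages() as before.
	 */
	return dma_alloc_coherent(dev, 2 * 1024 * 1024, dma, GFP_KERNEL);
}
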

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
 arch/arm/Kconfig                   |    1 +
 arch/arm/include/asm/device.h      |    3 ++
 arch/arm/include/asm/dma-mapping.h |   20 ++++++++++++++
 arch/arm/mm/dma-mapping.c          |   51 +++++++++++++++++++++++++++--------
 arch/arm/mm/init.c                 |    3 ++
 5 files changed, 66 insertions(+), 12 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 9adc278..3cca8cc 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -3,6 +3,7 @@ config ARM
 	default y
 	select HAVE_AOUT
 	select HAVE_DMA_API_DEBUG
+	select HAVE_DMA_CONTIGUOUS
 	select HAVE_IDE
 	select HAVE_MEMBLOCK
 	select RTC_LIB
diff --git a/arch/arm/include/asm/device.h b/arch/arm/include/asm/device.h
index 9f390ce..942913e 100644
--- a/arch/arm/include/asm/device.h
+++ b/arch/arm/include/asm/device.h
@@ -10,6 +10,9 @@ struct dev_archdata {
 #ifdef CONFIG_DMABOUNCE
 	struct dmabounce_device_info *dmabounce;
 #endif
+#ifdef CONFIG_CMA
+	struct cma *cma_area;
+#endif
 };
 
 struct pdev_archdata {
diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 4fff837..a3e1e48c 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -6,6 +6,7 @@
 #include <linux/mm_types.h>
 #include <linux/scatterlist.h>
 #include <linux/dma-debug.h>
+#include <linux/dma-contiguous.h>
 
 #include <asm-generic/dma-coherent.h>
 #include <asm/memory.h>
@@ -14,6 +15,25 @@
 #error Please update to __arch_pfn_to_dma
 #endif
 
+#ifdef CONFIG_CMA
+static inline struct cma *get_dev_cma_area(struct device *dev)
+{
+	if (dev->archdata.cma_area)
+		return dev->archdata.cma_area;
+	return dma_contiguous_default_area;
+}
+
+static inline void set_dev_cma_area(struct device *dev, struct cma *cma)
+{
+	dev->archdata.cma_area = cma;
+}
+#else
+static inline struct cma *get_dev_cma_area(struct device *dev)
+{
+	return NULL;
+}
+#endif
+
 /*
  * dma_to_pfn/pfn_to_dma/dma_to_virt/virt_to_dma are architecture private
  * functions used internally by the DMA-mapping API to provide DMA
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 82a093c..1d4e916 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -17,6 +17,7 @@
 #include <linux/init.h>
 #include <linux/device.h>
 #include <linux/dma-mapping.h>
+#include <linux/dma-contiguous.h>
 #include <linux/highmem.h>
 
 #include <asm/memory.h>
@@ -52,16 +53,35 @@ static u64 get_coherent_dma_mask(struct device *dev)
 	return mask;
 }
 
+
+static struct page *__alloc_system_pages(size_t count, unsigned int order, gfp_t gfp)
+{
+	struct page *page, *p, *e;
+
+	page = alloc_pages(gfp, order);
+	if (!page)
+		return NULL;
+
+	/*
+	 * Now split the huge page and free the excess pages
+	 */
+	split_page(page, order);
+	for (p = page + count, e = page + (1 << order); p < e; p++)
+		__free_page(p);
+	return page;
+}
+
 /*
  * Allocate a DMA buffer for 'dev' of size 'size' using the
  * specified gfp mask.  Note that 'size' must be page aligned.
  */
 static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gfp)
 {
-	unsigned long order = get_order(size);
-	struct page *page, *p, *e;
+	struct page *page;
+	size_t count = size >> PAGE_SHIFT;
 	void *ptr;
 	u64 mask = get_coherent_dma_mask(dev);
+	unsigned long order = get_order(count << PAGE_SHIFT);
 
 #ifdef CONFIG_DMA_API_DEBUG
 	u64 limit = (mask + 1) & ~mask;
@@ -78,16 +98,19 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
 	if (mask < 0xffffffffULL)
 		gfp |= GFP_DMA;
 
-	page = alloc_pages(gfp, order);
-	if (!page)
-		return NULL;
+	/*
+	 * First, try to allocate memory from contiguous area
+	 */
+	page = dma_alloc_from_contiguous(dev, count, order);
 
 	/*
-	 * Now split the huge page and free the excess pages
+	 * Fallback if contiguous alloc fails or is not available
 	 */
-	split_page(page, order);
-	for (p = page + (size >> PAGE_SHIFT), e = page + (1 << order); p < e; p++)
-		__free_page(p);
+	if (!page)
+		page = __alloc_system_pages(count, order, gfp);
+
+	if (!page)
+		return NULL;
 
 	/*
 	 * Ensure that the allocated pages are zeroed, and that any data
@@ -104,9 +127,13 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
 /*
  * Free a DMA buffer.  'size' must be page aligned.
  */
-static void __dma_free_buffer(struct page *page, size_t size)
+static void __dma_free_buffer(struct device *dev, struct page *page, size_t size)
 {
-	struct page *e = page + (size >> PAGE_SHIFT);
+	size_t count = size >> PAGE_SHIFT;
+	struct page *e = page + count;
+
+	if (dma_release_from_contiguous(dev, page, count))
+		return;
 
 	while (page < e) {
 		__free_page(page);
@@ -416,7 +443,7 @@ void dma_free_coherent(struct device *dev, size_t size, void *cpu_addr, dma_addr
 	if (!arch_is_coherent())
 		__dma_free_remap(cpu_addr, size);
 
-	__dma_free_buffer(pfn_to_page(dma_to_pfn(dev, handle)), size);
+	__dma_free_buffer(dev, pfn_to_page(dma_to_pfn(dev, handle)), size);
 }
 EXPORT_SYMBOL(dma_free_coherent);
 
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index c19571c..b2dfdeb 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -20,6 +20,7 @@
 #include <linux/gfp.h>
 #include <linux/memblock.h>
 #include <linux/sort.h>
+#include <linux/dma-contiguous.h>
 
 #include <asm/mach-types.h>
 #include <asm/prom.h>
@@ -358,6 +359,8 @@ void __init arm_memblock_init(struct meminfo *mi, struct machine_desc *mdesc)
 	if (mdesc->reserve)
 		mdesc->reserve();
 
+	dma_contiguous_reserve();
+
 	memblock_analyze();
 	memblock_dump_all();
 }
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 8/8] ARM: S5PV210: example of CMA private area for FIMC device on Goni board
  2011-07-05  7:41 ` Marek Szyprowski
  (?)
@ 2011-07-05  7:41   ` Marek Szyprowski
  -1 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-07-05  7:41 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Chunsang Jeong

This patch is an example of how a device-private CMA area can be
activated. It creates one CMA region and assigns it to the first s5p-fimc
device on the Samsung Goni S5PC110 board.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
 arch/arm/mach-s5pv210/Kconfig     |    1 +
 arch/arm/mach-s5pv210/mach-goni.c |    8 ++++++++
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/arm/mach-s5pv210/Kconfig b/arch/arm/mach-s5pv210/Kconfig
index 37b5a97..c09a92c 100644
--- a/arch/arm/mach-s5pv210/Kconfig
+++ b/arch/arm/mach-s5pv210/Kconfig
@@ -64,6 +64,7 @@ menu "S5PC110 Machines"
 config MACH_AQUILA
 	bool "Aquila"
 	select CPU_S5PV210
+	select CMA
 	select S3C_DEV_FB
 	select S5P_DEV_FIMC0
 	select S5P_DEV_FIMC1
diff --git a/arch/arm/mach-s5pv210/mach-goni.c b/arch/arm/mach-s5pv210/mach-goni.c
index 31d5aa7..d9e565d 100644
--- a/arch/arm/mach-s5pv210/mach-goni.c
+++ b/arch/arm/mach-s5pv210/mach-goni.c
@@ -26,6 +26,7 @@
 #include <linux/input.h>
 #include <linux/gpio.h>
 #include <linux/interrupt.h>
+#include <linux/dma-contiguous.h>
 
 #include <asm/mach/arch.h>
 #include <asm/mach/map.h>
@@ -886,6 +887,12 @@ static void __init goni_machine_init(void)
 	platform_add_devices(goni_devices, ARRAY_SIZE(goni_devices));
 }
 
+static void __init goni_reserve(void)
+{
+	/* Create private 16MiB contiguous memory area for s5p-fimc.0 device */
+	dma_declare_contiguous(&s5p_device_fimc0.dev, 16*SZ_1M, 0);
+}
+
 MACHINE_START(GONI, "GONI")
 	/* Maintainers: Kyungmin Park <kyungmin.park@samsung.com> */
 	.boot_params	= S5P_PA_SDRAM + 0x100,
@@ -893,4 +900,5 @@ MACHINE_START(GONI, "GONI")
 	.map_io		= goni_map_io,
 	.init_machine	= goni_machine_init,
 	.timer		= &s5p_timer,
+	.reserve	= goni_reserve,
 MACHINE_END
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* RE: [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-07-05  7:41   ` Marek Szyprowski
  (?)
@ 2011-07-05 10:24     ` Marek Szyprowski
  -1 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-07-05 10:24 UTC (permalink / raw)
  To: Marek Szyprowski, linux-kernel, linux-arm-kernel, linux-media,
	linux-mm, linaro-mm-sig
  Cc: 'Michal Nazarewicz', 'Kyungmin Park',
	'Andrew Morton', 'KAMEZAWA Hiroyuki',
	'Ankita Garg', 'Daniel Walker',
	'Mel Gorman', 'Arnd Bergmann',
	'Jesse Barker', 'Jonathan Corbet',
	'Chunsang Jeong'

Hello,

On Tuesday, July 05, 2011 9:42 AM Marek Szyprowski wrote:

> The Contiguous Memory Allocator is a set of helper functions for DMA
> mapping framework that improves allocations of contiguous memory chunks.
> 
> CMA grabs memory on system boot, marks it with CMA_MIGRATE_TYPE and
> gives back to the system. Kernel is allowed to allocate movable pages
> within CMA's managed memory so that it can be used for example for page
> cache when DMA mapping do not use it. On dma_alloc_from_contiguous()
> request such pages are migrated out of CMA area to free required
> contiguous block and fulfill the request. This allows to allocate large
> contiguous chunks of memory at any time assuming that there is enough
> free memory available in the system.
> 
> This code is heavily based on earlier works by Michal Nazarewicz.
> 
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> CC: Michal Nazarewicz <mina86@mina86.com>
> ---
>  drivers/base/Kconfig           |   77 +++++++++
>  drivers/base/Makefile          |    1 +
>  drivers/base/dma-contiguous.c  |  367
> ++++++++++++++++++++++++++++++++++++++++
>  include/linux/dma-contiguous.h |  104 +++++++++++
>  4 files changed, 549 insertions(+), 0 deletions(-)
>  create mode 100644 drivers/base/dma-contiguous.c
>  create mode 100644 include/linux/dma-contiguous.h
> 
> diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> index d57e8d0..95ae1a7 100644
> --- a/drivers/base/Kconfig
> +++ b/drivers/base/Kconfig
> @@ -168,4 +168,81 @@ config SYS_HYPERVISOR
>  	bool
>  	default n
> 
> +config CMA
> +	bool "Contiguous Memory Allocator"
> +	depends HAVE_DMA_CONTIGUOUS && HAVE_MEMBLOCK

The above line should obviously be "depends on HAVE_DMA_CONTIGUOUS &&
HAVE_MEMBLOCK".
I'm sorry for posting a broken version.
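
For reference, the corrected stanza, exactly as it appears in the resent
version of this patch later in the thread, reads:

	config CMA
		bool "Contiguous Memory Allocator"
		depends on HAVE_DMA_CONTIGUOUS && HAVE_MEMBLOCK
		select MIGRATION
		select CMA_MIGRATE_TYPE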

(snipped)

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center




^ permalink raw reply	[flat|nested] 183+ messages in thread

* [PATCH 6/8 RESEND] drivers: add Contiguous Memory Allocator
  2011-07-05  7:41   ` Marek Szyprowski
  (?)
@ 2011-07-05 11:02     ` Marek Szyprowski
  -1 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-07-05 11:02 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Chunsang Jeong

The Contiguous Memory Allocator is a set of helper functions for the DMA
mapping framework that improves allocation of contiguous memory chunks.

CMA grabs memory on system boot, marks it with CMA_MIGRATE_TYPE and gives
it back to the system. The kernel is allowed to allocate movable pages
within CMA's managed memory, so it can be used, for example, for page
cache when the DMA mapping framework does not need it. On a
dma_alloc_from_contiguous() request such pages are migrated out of the
CMA area to free the required contiguous block and fulfill the request.
This makes it possible to allocate large contiguous chunks of memory at
any time, provided that there is enough free memory in the system.
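
As a rough sketch only (the actual wiring for ARM is done in patch 7/8;
'dev', 'count', 'order' and 'gfp' stand for whatever the architecture
code already computed, and error handling is trimmed), an architecture's
DMA allocation path is expected to use the two new calls like this:

	struct page *page, *p;

	/* try the device specific (or global) CMA area first ... */
	page = dma_alloc_from_contiguous(dev, count, order);
	if (!page) {
		/* ... and fall back to the normal page allocator,
		 * splitting the high-order page and freeing the excess */
		page = alloc_pages(gfp, order);
		if (!page)
			return NULL;
		split_page(page, order);
		for (p = page + count; p < page + (1 << order); p++)
			__free_page(p);
	}

	/* on the free path, pages that were not handed out by CMA go
	 * back to the page allocator one by one */
	if (!dma_release_from_contiguous(dev, page, count)) {
		for (p = page; p < page + count; p++)
			__free_page(p);
	}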

This code is heavily based on earlier works by Michal Nazarewicz.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 arch/Kconfig                   |    3 +
 drivers/base/Kconfig           |   77 +++++++++
 drivers/base/Makefile          |    1 +
 drivers/base/dma-contiguous.c  |  367 ++++++++++++++++++++++++++++++++++++++++
 include/linux/dma-contiguous.h |  104 +++++++++++
 5 files changed, 552 insertions(+), 0 deletions(-)
 create mode 100644 drivers/base/dma-contiguous.c
 create mode 100644 include/linux/dma-contiguous.h

diff --git a/arch/Kconfig b/arch/Kconfig
index 26b0e23..228d761 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -124,6 +124,9 @@ config HAVE_ARCH_TRACEHOOK
 config HAVE_DMA_ATTRS
 	bool
 
+config HAVE_DMA_CONTIGUOUS
+	bool
+
 config USE_GENERIC_SMP_HELPERS
 	bool
 
diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index d57e8d0..c690d05 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -168,4 +168,81 @@ config SYS_HYPERVISOR
 	bool
 	default n
 
+config CMA
+	bool "Contiguous Memory Allocator"
+	depends on HAVE_DMA_CONTIGUOUS && HAVE_MEMBLOCK
+	select MIGRATION
+	select CMA_MIGRATE_TYPE
+	help
+	  This enables the Contiguous Memory Allocator which allows drivers
+	  to allocate big physically-contiguous blocks of memory for use with
+	  hardware components that do not support I/O map nor scatter-gather.
+
+	  For more information see <include/linux/dma-contiguous.h>.
+	  If unsure, say "n".
+
+if CMA
+
+config CMA_DEBUG
+	bool "CMA debug messages (DEVELOPEMENT)"
+	help
+	  Turns on debug messages in CMA.  This produces KERN_DEBUG
+	  messages for every CMA call as well as various messages while
+	  processing calls such as dma_alloc_from_contiguous().
+	  This option does not affect warning and error messages.
+
+comment "Default contiguous memory area size:"
+
+config CMA_SIZE_ABSOLUTE
+	int "Absolute size (in MiB)"
+	default 16
+	help
+	  Defines the size (in MiB) of the default memory area for Contiguous
+	  Memory Allocator.
+
+config CMA_SIZE_PERCENTAGE
+	int "Percentage of total memory"
+	default 10
+	help
+	  Defines the size of the default memory area for Contiguous Memory
+	  Allocator as a percentage of the total memory in the system.
+
+choice
+	prompt "Selected region size"
+	default CMA_SIZE_SEL_ABSOLUTE
+
+config CMA_SIZE_SEL_ABSOLUTE
+	bool "Use absolute value only"
+
+config CMA_SIZE_SEL_PERCENTAGE
+	bool "Use percentage value only"
+
+config CMA_SIZE_SEL_MIN
+	bool "Use lower value (minimum)"
+
+config CMA_SIZE_SEL_MAX
+	bool "Use higher value (maximum)"
+
+endchoice
+
+config CMA_ALIGNMENT
+	int "Maximum PAGE_SIZE order of alignment for contiguous buffers"
+	range 4 9
+	default 8
+	help
+	  DMA mapping framework by default aligns all buffers to the smallest
+	  PAGE_SIZE order which is greater than or equal to the requested buffer
+	  size. This works well for buffers up to a few hundred kilobytes, but
+	  for larger buffers it is just a waste of memory. With this parameter you can
+	  specify the maximum PAGE_SIZE order for contiguous buffers. Larger
+	  buffers will be aligned only to this specified order. The order is
+	  expressed as a power of two multiplied by the PAGE_SIZE.
+
+	  For example, if your system defaults to 4KiB pages, the order value
+	  of 8 means that the buffers will be aligned up to 1MiB only.
+
+	  If unsure, leave the default value "8".
+
+endif
+
 endmenu
diff --git a/drivers/base/Makefile b/drivers/base/Makefile
index 4c5701c..be6aab4 100644
--- a/drivers/base/Makefile
+++ b/drivers/base/Makefile
@@ -5,6 +5,7 @@ obj-y			:= core.o sys.o bus.o dd.o syscore.o \
 			   cpu.o firmware.o init.o map.o devres.o \
 			   attribute_container.o transport_class.o
 obj-$(CONFIG_DEVTMPFS)	+= devtmpfs.o
+obj-$(CONFIG_CMA) += dma-contiguous.o
 obj-y			+= power/
 obj-$(CONFIG_HAS_DMA)	+= dma-mapping.o
 obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
new file mode 100644
index 0000000..707b901
--- /dev/null
+++ b/drivers/base/dma-contiguous.c
@@ -0,0 +1,367 @@
+/*
+ * Contiguous Memory Allocator for DMA mapping framework
+ * Copyright (c) 2010-2011 by Samsung Electronics.
+ * Written by:
+ *	Marek Szyprowski <m.szyprowski@samsung.com>
+ *	Michal Nazarewicz <mina86@mina86.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License or (at your optional) any later version of the license.
+ */
+
+#define pr_fmt(fmt) "cma: " fmt
+
+#ifdef CONFIG_CMA_DEBUG
+#ifndef DEBUG
+#  define DEBUG
+#endif
+#endif
+
+#include <asm/page.h>
+#include <asm/sizes.h>
+
+#include <linux/memblock.h>
+#include <linux/err.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/page-isolation.h>
+#include <linux/slab.h>
+#include <linux/swap.h>
+#include <linux/mm_types.h>
+#include <linux/dma-mapping.h>
+#include <linux/dma-contiguous.h>
+
+struct cma {
+	unsigned long	base_pfn;
+	unsigned long	count;
+	unsigned long	*bitmap;
+};
+
+struct cma *dma_contiguous_default_area;
+
+static unsigned long size_abs = CONFIG_CMA_SIZE_ABSOLUTE * SZ_1M;
+static unsigned long size_percent = CONFIG_CMA_SIZE_PERCENTAGE;
+static long size_cmdline = -1;
+
+static int __init early_cma(char *p)
+{
+	pr_debug("%s(%s)\n", __func__, p);
+	size_cmdline = memparse(p, &p);
+	return 0;
+}
+early_param("cma", early_cma);
+
+/**
+ * dma_contiguous_reserve() - reserve area for contiguous memory handling
+ *
+ * This function reserves memory from the memblock subsystem. It should be
+ * called by arch specific code once a memblock allocator has been activated
+ * and all other subsystems have already allocated/reserved memory.
+ */
+void __init dma_contiguous_reserve(void)
+{
+	struct memblock_region *reg;
+	unsigned long selected_size = 0;
+	unsigned long total_pages = 0;
+
+	pr_debug("%s()\n", __func__);
+
+	/*
+	 * We cannot use memblock_phys_mem_size() here, because
+	 * memblock_analyze() has not been called yet.
+	 */
+	for_each_memblock(memory, reg)
+		total_pages += memblock_region_memory_end_pfn(reg) -
+			       memblock_region_memory_base_pfn(reg);
+
+	size_percent *= (total_pages << PAGE_SHIFT) / 100;
+
+	pr_debug("%s: available phys mem: %ld MiB\n", __func__,
+		 (total_pages << PAGE_SHIFT) / SZ_1M);
+
+#ifdef CONFIG_CMA_SIZE_SEL_ABSOLUTE
+	selected_size = size_abs;
+#endif
+#ifdef CONFIG_CMA_SIZE_SEL_PERCENTAGE
+	selected_size = size_percent;
+#endif
+#ifdef CONFIG_CMA_SIZE_SEL_MIN
+	selected_size = min(size_abs, size_percent);
+#endif
+#ifdef CONFIG_CMA_SIZE_SEL_MAX
+	selected_size = max(size_abs, size_percent);
+#endif
+
+	if (size_cmdline != -1)
+		selected_size = size_cmdline;
+
+	if (!selected_size)
+		return;
+
+	pr_debug("%s: reserving %ld MiB for global area\n", __func__,
+		 selected_size / SZ_1M);
+
+	dma_declare_contiguous(NULL, selected_size, 0);
+};
+
+static DEFINE_MUTEX(cma_mutex);
+
+#ifdef CONFIG_DEBUG_VM
+
+static int __cma_activate_area(unsigned long base_pfn, unsigned long count)
+{
+	unsigned long pfn = base_pfn;
+	unsigned i = count;
+	struct zone *zone;
+
+	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
+
+	VM_BUG_ON(!pfn_valid(pfn));
+	zone = page_zone(pfn_to_page(pfn));
+
+	do {
+		VM_BUG_ON(!pfn_valid(pfn));
+		VM_BUG_ON(page_zone(pfn_to_page(pfn)) != zone);
+		if (!(pfn & (pageblock_nr_pages - 1)))
+			init_cma_reserved_pageblock(pfn_to_page(pfn));
+		++pfn;
+	} while (--i);
+
+	return 0;
+}
+
+#else
+
+static int __cma_activate_area(unsigned long base_pfn, unsigned long count)
+{
+	unsigned i = count >> pageblock_order;
+	struct page *p = pfn_to_page(base_pfn);
+
+	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
+
+	do {
+		init_cma_reserved_pageblock(p);
+		p += pageblock_nr_pages;
+	} while (--i);
+
+	return 0;
+}
+
+#endif
+
+static struct cma *__cma_create_area(unsigned long base_pfn,
+				     unsigned long count)
+{
+	int bitmap_size = BITS_TO_LONGS(count) * sizeof(long);
+	struct cma *cma;
+
+	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
+
+	cma = kmalloc(sizeof *cma, GFP_KERNEL);
+	if (!cma)
+		return ERR_PTR(-ENOMEM);
+
+	cma->base_pfn = base_pfn;
+	cma->count = count;
+	cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
+
+	if (!cma->bitmap)
+		goto no_mem;
+
+	__cma_activate_area(base_pfn, count);
+
+	pr_debug("%s: returning <%p>\n", __func__, (void *)cma);
+	return cma;
+
+no_mem:
+	kfree(cma);
+	return ERR_PTR(-ENOMEM);
+}
+
+static struct cma_reserved {
+	unsigned long start;
+	unsigned long size;
+	struct device *dev;
+} cma_reserved[8] __initdata;
+static unsigned cma_reserved_count __initdata;
+
+static int __init __cma_init_reserved_areas(void)
+{
+	struct cma_reserved *r = cma_reserved;
+	unsigned i = cma_reserved_count;
+
+	pr_debug("%s()\n", __func__);
+
+	for (; i; --i, ++r) {
+		struct cma *cma;
+		cma = __cma_create_area(page_to_pfn(phys_to_page(r->start)),
+					r->size >> PAGE_SHIFT);
+		if (!IS_ERR(cma)) {
+			pr_debug("%s: created area %p\n", __func__, cma);
+			if (r->dev)
+				set_dev_cma_area(r->dev, cma);
+			else
+				dma_contiguous_default_area = cma;
+		}
+	}
+	return 0;
+}
+core_initcall(__cma_init_reserved_areas);
+
+/**
+ * dma_declare_contiguous() - reserve area for contiguous memory handling
+ *			      for particular device
+ * @dev:   Pointer to device structure.
+ * @size:  Size of the reserved memory.
+ * @start: Start address of the reserved memory (optional, 0 for any).
+ *
+ * This function reserves memory for the specified device. It should be
+ * called by board specific code once a memblock allocator has been activated
+ * and all other subsystems have already allocated/reserved memory.
+ */
+int __init dma_declare_contiguous(struct device *dev, unsigned long size,
+				  phys_addr_t start)
+{
+	struct cma_reserved *r = &cma_reserved[cma_reserved_count];
+	unsigned long alignment;
+
+	pr_debug("%s(%p+%p)\n", __func__, (void *)start, (void *)size);
+
+	/* Sanity checks */
+	if (cma_reserved_count == ARRAY_SIZE(cma_reserved))
+		return -ENOSPC;
+
+	if (!size)
+		return -EINVAL;
+
+	/* Sanitise input arguments */
+	alignment = PAGE_SIZE << (MAX_ORDER + 1);
+	start = ALIGN(start, alignment);
+	size  = ALIGN(size , alignment);
+
+	/* Reserve memory */
+	if (start) {
+		if (memblock_is_region_reserved(start, size) ||
+		    memblock_reserve(start, size) < 0)
+			return -EBUSY;
+	} else {
+		/*
+		 * Use __memblock_alloc_base() since
+		 * memblock_alloc_base() panic()s.
+		 */
+		u64 addr = __memblock_alloc_base(size, alignment, 0);
+		if (!addr) {
+			return -ENOMEM;
+		} else if (addr + size > ~(unsigned long)0) {
+			memblock_free(addr, size);
+			return -EOVERFLOW;
+		} else {
+			start = addr;
+		}
+	}
+
+	/*
+	 * Each reserved area must be initialised later, when more kernel
+	 * subsystems (like slab allocator) are available.
+	 */
+	r->start = start;
+	r->size = size;
+	r->dev = dev;
+	cma_reserved_count++;
+	printk(KERN_INFO "%s: reserved %ld MiB area at 0x%p\n", __func__,
+	       size / SZ_1M, (void *)start);
+	return 0;
+}
+
+/**
+ * dma_alloc_from_contiguous() - allocate pages from contiguous area
+ * @dev:   Pointer to device for which the allocation is performed.
+ * @count: Requested number of pages.
+ * @align: Requested alignment of pages (in PAGE_SIZE order).
+ *
+ * This function allocates a memory buffer for the specified device. It uses
+ * device specific contiguous memory area if available or the default
+ * global one. Requires architecture specific get_dev_cma_area() helper
+ * function.
+ */
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int align)
+{
+	struct cma *cma = get_dev_cma_area(dev);
+	unsigned long pfn, pageno;
+	int ret;
+
+	if (!cma)
+		return NULL;
+
+	if (align > CONFIG_CMA_ALIGNMENT)
+		align = CONFIG_CMA_ALIGNMENT;
+
+	pr_debug("%s(<%p>, %d/%d)\n", __func__, (void *)cma, count, align);
+
+	if (!count)
+		return NULL;
+
+	mutex_lock(&cma_mutex);
+
+	pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count, 0, count,
+					    align);
+	if (pageno >= cma->count) {
+		ret = -ENOMEM;
+		goto error;
+	}
+	bitmap_set(cma->bitmap, pageno, count);
+
+	pfn = cma->base_pfn + pageno;
+	ret = alloc_contig_range(pfn, pfn + count, 0, MIGRATE_CMA);
+	if (ret)
+		goto free;
+
+	mutex_unlock(&cma_mutex);
+
+	pr_debug("%s(): returning [%ld]\n", __func__, pfn);
+	return pfn_to_page(pfn);
+free:
+	bitmap_clear(cma->bitmap, pageno, count);
+error:
+	mutex_unlock(&cma_mutex);
+	return NULL;
+}
+
+/**
+ * dma_release_from_contiguous() - release allocated pages
+ * @dev:   Pointer to device for which the pages were allocated.
+ * @pages: Allocated pages.
+ * @count: Number of allocated pages.
+ *
+ * This function releases memory allocated by dma_alloc_from_contiguous().
+ * It returns 0 when the provided pages do not belong to the contiguous
+ * area and 1 on success.
+ */
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count)
+{
+	struct cma *cma = get_dev_cma_area(dev);
+	unsigned long pfn;
+
+	if (!cma || !pages)
+		return 0;
+
+	pr_debug("%s([%p])\n", __func__, (void *)pages);
+
+	pfn = page_to_pfn(pages);
+
+	if (pfn < cma->base_pfn || pfn >= cma->base_pfn + cma->count)
+		return 0;
+
+	mutex_lock(&cma_mutex);
+
+	bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
+	free_contig_pages(pages, count);
+
+	mutex_unlock(&cma_mutex);
+	return 1;
+}
diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h
new file mode 100644
index 0000000..98312c9
--- /dev/null
+++ b/include/linux/dma-contiguous.h
@@ -0,0 +1,104 @@
+#ifndef __LINUX_CMA_H
+#define __LINUX_CMA_H
+
+/*
+ * Contiguous Memory Allocator for DMA mapping framework
+ * Copyright (c) 2010-2011 by Samsung Electronics.
+ * Written by:
+ *	Marek Szyprowski <m.szyprowski@samsung.com>
+ *	Michal Nazarewicz <mina86@mina86.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License or (at your optional) any later version of the license.
+ */
+
+/*
+ * Contiguous Memory Allocator
+ *
+ *   The Contiguous Memory Allocator (CMA) makes it possible to
+ *   allocate big contiguous chunks of memory after the system has
+ *   booted.
+ *
+ * Why is it needed?
+ *
+ *   Various devices on embedded systems have no scatter-gather and/or
+ *   IO map support and require contiguous blocks of memory to
+ *   operate.  They include devices such as cameras, hardware video
+ *   coders, etc.
+ *
+ *   Such devices often require big memory buffers (a full HD frame
+ *   is, for instance, more than 2 megapixels large, i.e. more than 6
+ *   MB of memory), which makes mechanisms such as kmalloc() or
+ *   alloc_page() ineffective.
+ *
+ *   At the same time, a solution where a big memory region is
+ *   reserved for a device is suboptimal since often more memory is
+ *   reserved than strictly required and, moreover, the memory is
+ *   inaccessible to the page system even if device drivers don't use it.
+ *
+ *   CMA tries to solve this issue by operating on memory regions
+ *   where only movable pages can be allocated from.  This way, kernel
+ *   can use the memory for pagecache and when device driver requests
+ *   it, allocated pages can be migrated.
+ *
+ * Driver usage
+ *
+ *   CMA should not be used by the device drivers directly. It is
+ *   only a helper framework for dma-mapping subsystem.
+ *
+ *   For more information, see kernel-docs in drivers/base/dma-contiguous.c
+ */
+
+#ifdef __KERNEL__
+
+struct cma;
+struct page;
+struct device;
+
+#ifdef CONFIG_CMA
+
+extern struct cma *dma_contiguous_default_area;
+
+void dma_contiguous_reserve(void);
+int dma_declare_contiguous(struct device *dev, unsigned long size,
+			   phys_addr_t base);
+
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int order);
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count);
+
+#else
+
+#define dma_contiguous_default_area NULL
+
+static inline void dma_contiguous_reserve(void) { }
+
+static inline
+int dma_declare_contiguous(struct device *dev, unsigned long size,
+			   unsigned long base)
+{
+	return -EINVAL;
+}
+
+static inline
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int order)
+{
+	return NULL;
+}
+
+static inline
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count)
+{
+	return 0;
+}
+
+#endif
+
+#endif
+
+#endif
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 6/8 RESEND] drivers: add Contiguous Memory Allocator
@ 2011-07-05 11:02     ` Marek Szyprowski
  0 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-07-05 11:02 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Chunsang Jeong

The Contiguous Memory Allocator is a set of helper functions for the DMA
mapping framework that improves allocation of contiguous memory chunks.

CMA grabs memory on system boot, marks it with CMA_MIGRATE_TYPE and gives
it back to the system. The kernel is allowed to allocate movable pages
within CMA's managed memory, so it can be used, for example, for page
cache when the DMA mapping framework does not need it. On a
dma_alloc_from_contiguous() request such pages are migrated out of the
CMA area to free the required contiguous block and fulfill the request.
This makes it possible to allocate large contiguous chunks of memory at
any time, provided that there is enough free memory in the system.

This code is heavily based on earlier works by Michal Nazarewicz.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 arch/Kconfig                   |    3 +
 drivers/base/Kconfig           |   77 +++++++++
 drivers/base/Makefile          |    1 +
 drivers/base/dma-contiguous.c  |  367 ++++++++++++++++++++++++++++++++++++++++
 include/linux/dma-contiguous.h |  104 +++++++++++
 5 files changed, 552 insertions(+), 0 deletions(-)
 create mode 100644 drivers/base/dma-contiguous.c
 create mode 100644 include/linux/dma-contiguous.h

diff --git a/arch/Kconfig b/arch/Kconfig
index 26b0e23..228d761 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -124,6 +124,9 @@ config HAVE_ARCH_TRACEHOOK
 config HAVE_DMA_ATTRS
 	bool
 
+config HAVE_DMA_CONTIGUOUS
+	bool
+
 config USE_GENERIC_SMP_HELPERS
 	bool
 
diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index d57e8d0..c690d05 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -168,4 +168,81 @@ config SYS_HYPERVISOR
 	bool
 	default n
 
+config CMA
+	bool "Contiguous Memory Allocator"
+	depends on HAVE_DMA_CONTIGUOUS && HAVE_MEMBLOCK
+	select MIGRATION
+	select CMA_MIGRATE_TYPE
+	help
+	  This enables the Contiguous Memory Allocator which allows drivers
+	  to allocate big physically-contiguous blocks of memory for use with
+	  hardware components that do not support I/O map nor scatter-gather.
+
+	  For more information see <include/linux/dma-contiguous.h>.
+	  If unsure, say "n".
+
+if CMA
+
+config CMA_DEBUG
+	bool "CMA debug messages (DEVELOPEMENT)"
+	help
+	  Turns on debug messages in CMA.  This produces KERN_DEBUG
+	  messages for every CMA call as well as various messages while
+	  processing calls such as dma_alloc_from_contiguous().
+	  This option does not affect warning and error messages.
+
+comment "Default contiguous memory area size:"
+
+config CMA_SIZE_ABSOLUTE
+	int "Absolute size (in MiB)"
+	default 16
+	help
+	  Defines the size (in MiB) of the default memory area for Contiguous
+	  Memory Allocator.
+
+config CMA_SIZE_PERCENTAGE
+	int "Percentage of total memory"
+	default 10
+	help
+	  Defines the size of the default memory area for Contiguous Memory
+	  Allocator as a percentage of the total memory in the system.
+
+choice
+	prompt "Selected region size"
+	default CMA_SIZE_SEL_ABSOLUTE
+
+config CMA_SIZE_SEL_ABSOLUTE
+	bool "Use absolute value only"
+
+config CMA_SIZE_SEL_PERCENTAGE
+	bool "Use percentage value only"
+
+config CMA_SIZE_SEL_MIN
+	bool "Use lower value (minimum)"
+
+config CMA_SIZE_SEL_MAX
+	bool "Use higher value (maximum)"
+
+endchoice
+
+config CMA_ALIGNMENT
+	int "Maximum PAGE_SIZE order of alignment for contiguous buffers"
+	range 4 9
+	default 8
+	help
+	  DMA mapping framework by default aligns all buffers to the smallest
+	  PAGE_SIZE order which is greater than or equal to the requested buffer
+	  size. This works well for buffers up to a few hundred kilobytes, but
+	  for larger buffers it is just a waste of memory. With this parameter you can
+	  specify the maximum PAGE_SIZE order for contiguous buffers. Larger
+	  buffers will be aligned only to this specified order. The order is
+	  expressed as a power of two multiplied by the PAGE_SIZE.
+
+	  For example, if your system defaults to 4KiB pages, the order value
+	  of 8 means that the buffers will be aligned up to 1MiB only.
+
+	  If unsure, leave the default value "8".
+
+endif
+
 endmenu
diff --git a/drivers/base/Makefile b/drivers/base/Makefile
index 4c5701c..be6aab4 100644
--- a/drivers/base/Makefile
+++ b/drivers/base/Makefile
@@ -5,6 +5,7 @@ obj-y			:= core.o sys.o bus.o dd.o syscore.o \
 			   cpu.o firmware.o init.o map.o devres.o \
 			   attribute_container.o transport_class.o
 obj-$(CONFIG_DEVTMPFS)	+= devtmpfs.o
+obj-$(CONFIG_CMA) += dma-contiguous.o
 obj-y			+= power/
 obj-$(CONFIG_HAS_DMA)	+= dma-mapping.o
 obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
new file mode 100644
index 0000000..707b901
--- /dev/null
+++ b/drivers/base/dma-contiguous.c
@@ -0,0 +1,367 @@
+/*
+ * Contiguous Memory Allocator for DMA mapping framework
+ * Copyright (c) 2010-2011 by Samsung Electronics.
+ * Written by:
+ *	Marek Szyprowski <m.szyprowski@samsung.com>
+ *	Michal Nazarewicz <mina86@mina86.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License or (at your optional) any later version of the license.
+ */
+
+#define pr_fmt(fmt) "cma: " fmt
+
+#ifdef CONFIG_CMA_DEBUG
+#ifndef DEBUG
+#  define DEBUG
+#endif
+#endif
+
+#include <asm/page.h>
+#include <asm/sizes.h>
+
+#include <linux/memblock.h>
+#include <linux/err.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/page-isolation.h>
+#include <linux/slab.h>
+#include <linux/swap.h>
+#include <linux/mm_types.h>
+#include <linux/dma-mapping.h>
+#include <linux/dma-contiguous.h>
+
+struct cma {
+	unsigned long	base_pfn;
+	unsigned long	count;
+	unsigned long	*bitmap;
+};
+
+struct cma *dma_contiguous_default_area;
+
+static unsigned long size_abs = CONFIG_CMA_SIZE_ABSOLUTE * SZ_1M;
+static unsigned long size_percent = CONFIG_CMA_SIZE_PERCENTAGE;
+static long size_cmdline = -1;
+
+static int __init early_cma(char *p)
+{
+	pr_debug("%s(%s)\n", __func__, p);
+	size_cmdline = memparse(p, &p);
+	return 0;
+}
+early_param("cma", early_cma);
+
+/**
+ * dma_contiguous_reserve() - reserve area for contiguous memory handling
+ *
+ * This function reserves memory from the memblock subsystem. It should be
+ * called by arch specific code once a memblock allocator has been activated
+ * and all other subsystems have already allocated/reserved memory.
+ */
+void __init dma_contiguous_reserve(void)
+{
+	struct memblock_region *reg;
+	unsigned long selected_size = 0;
+	unsigned long total_pages = 0;
+
+	pr_debug("%s()\n", __func__);
+
+	/*
+	 * We cannot use memblock_phys_mem_size() here, because
+	 * memblock_analyze() has not been called yet.
+	 */
+	for_each_memblock(memory, reg)
+		total_pages += memblock_region_memory_end_pfn(reg) -
+			       memblock_region_memory_base_pfn(reg);
+
+	size_percent *= (total_pages << PAGE_SHIFT) / 100;
+
+	pr_debug("%s: available phys mem: %ld MiB\n", __func__,
+		 (total_pages << PAGE_SHIFT) / SZ_1M);
+
+#ifdef CONFIG_CMA_SIZE_SEL_ABSOLUTE
+	selected_size = size_abs;
+#endif
+#ifdef CONFIG_CMA_SIZE_SEL_PERCENTAGE
+	selected_size = size_percent;
+#endif
+#ifdef CONFIG_CMA_SIZE_SEL_MIN
+	selected_size = min(size_abs, size_percent);
+#endif
+#ifdef CONFIG_CMA_SIZE_SEL_MAX
+	selected_size = max(size_abs, size_percent);
+#endif
+
+	if (size_cmdline != -1)
+		selected_size = size_cmdline;
+
+	if (!selected_size)
+		return;
+
+	pr_debug("%s: reserving %ld MiB for global area\n", __func__,
+		 selected_size / SZ_1M);
+
+	dma_declare_contiguous(NULL, selected_size, 0);
+};
+
+static DEFINE_MUTEX(cma_mutex);
+
+#ifdef CONFIG_DEBUG_VM
+
+static int __cma_activate_area(unsigned long base_pfn, unsigned long count)
+{
+	unsigned long pfn = base_pfn;
+	unsigned i = count;
+	struct zone *zone;
+
+	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
+
+	VM_BUG_ON(!pfn_valid(pfn));
+	zone = page_zone(pfn_to_page(pfn));
+
+	do {
+		VM_BUG_ON(!pfn_valid(pfn));
+		VM_BUG_ON(page_zone(pfn_to_page(pfn)) != zone);
+		if (!(pfn & (pageblock_nr_pages - 1)))
+			init_cma_reserved_pageblock(pfn_to_page(pfn));
+		++pfn;
+	} while (--i);
+
+	return 0;
+}
+
+#else
+
+static int __cma_activate_area(unsigned long base_pfn, unsigned long count)
+{
+	unsigned i = count >> pageblock_order;
+	struct page *p = pfn_to_page(base_pfn);
+
+	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
+
+	do {
+		init_cma_reserved_pageblock(p);
+		p += pageblock_nr_pages;
+	} while (--i);
+
+	return 0;
+}
+
+#endif
+
+static struct cma *__cma_create_area(unsigned long base_pfn,
+				     unsigned long count)
+{
+	int bitmap_size = BITS_TO_LONGS(count) * sizeof(long);
+	struct cma *cma;
+
+	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
+
+	cma = kmalloc(sizeof *cma, GFP_KERNEL);
+	if (!cma)
+		return ERR_PTR(-ENOMEM);
+
+	cma->base_pfn = base_pfn;
+	cma->count = count;
+	cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
+
+	if (!cma->bitmap)
+		goto no_mem;
+
+	__cma_activate_area(base_pfn, count);
+
+	pr_debug("%s: returning <%p>\n", __func__, (void *)cma);
+	return cma;
+
+no_mem:
+	kfree(cma);
+	return ERR_PTR(-ENOMEM);
+}
+
+static struct cma_reserved {
+	unsigned long start;
+	unsigned long size;
+	struct device *dev;
+} cma_reserved[8] __initdata;
+static unsigned cma_reserved_count __initdata;
+
+static int __init __cma_init_reserved_areas(void)
+{
+	struct cma_reserved *r = cma_reserved;
+	unsigned i = cma_reserved_count;
+
+	pr_debug("%s()\n", __func__);
+
+	for (; i; --i, ++r) {
+		struct cma *cma;
+		cma = __cma_create_area(page_to_pfn(phys_to_page(r->start)),
+					r->size >> PAGE_SHIFT);
+		if (!IS_ERR(cma)) {
+			pr_debug("%s: created area %p\n", __func__, cma);
+			if (r->dev)
+				set_dev_cma_area(r->dev, cma);
+			else
+				dma_contiguous_default_area = cma;
+		}
+	}
+	return 0;
+}
+core_initcall(__cma_init_reserved_areas);
+
+/**
+ * dma_declare_contiguous() - reserve area for contiguous memory handling
+ *			      for particular device
+ * @dev:   Pointer to device structure.
+ * @size:  Size of the reserved memory.
+ * @start: Start address of the reserved memory (optional, 0 for any).
+ *
+ * This function reserves memory for the specified device. It should be
+ * called by board specific code once a memblock allocator has been activated
+ * and all other subsystems have already allocated/reserved memory.
+ */
+int __init dma_declare_contiguous(struct device *dev, unsigned long size,
+				  phys_addr_t start)
+{
+	struct cma_reserved *r = &cma_reserved[cma_reserved_count];
+	unsigned long alignment;
+
+	pr_debug("%s(%p+%p)\n", __func__, (void *)start, (void *)size);
+
+	/* Sanity checks */
+	if (cma_reserved_count == ARRAY_SIZE(cma_reserved))
+		return -ENOSPC;
+
+	if (!size)
+		return -EINVAL;
+
+	/* Sanitise input arguments */
+	alignment = PAGE_SIZE << (MAX_ORDER + 1);
+	start = ALIGN(start, alignment);
+	size  = ALIGN(size , alignment);
+
+	/* Reserve memory */
+	if (start) {
+		if (memblock_is_region_reserved(start, size) ||
+		    memblock_reserve(start, size) < 0)
+			return -EBUSY;
+	} else {
+		/*
+		 * Use __memblock_alloc_base() since
+		 * memblock_alloc_base() panic()s.
+		 */
+		u64 addr = __memblock_alloc_base(size, alignment, 0);
+		if (!addr) {
+			return -ENOMEM;
+		} else if (addr + size > ~(unsigned long)0) {
+			memblock_free(addr, size);
+			return -EOVERFLOW;
+		} else {
+			start = addr;
+		}
+	}
+
+	/*
+	 * Each reserved area must be initialised later, when more kernel
+	 * subsystems (like slab allocator) are available.
+	 */
+	r->start = start;
+	r->size = size;
+	r->dev = dev;
+	cma_reserved_count++;
+	printk(KERN_INFO "%s: reserved %ld MiB area at 0x%p\n", __func__,
+	       size / SZ_1M, (void *)start);
+	return 0;
+}
+
+/**
+ * dma_alloc_from_contiguous() - allocate pages from contiguous area
+ * @dev:   Pointer to device for which the allocation is performed.
+ * @count: Requested number of pages.
+ * @align: Requested alignment of pages (in PAGE_SIZE order).
+ *
+ * This function allocates a memory buffer for the specified device. It uses
+ * device specific contiguous memory area if available or the default
+ * global one. Requires architecture specific get_dev_cma_area() helper
+ * function.
+ */
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int align)
+{
+	struct cma *cma = get_dev_cma_area(dev);
+	unsigned long pfn, pageno;
+	int ret;
+
+	if (!cma)
+		return NULL;
+
+	if (align > CONFIG_CMA_ALIGNMENT)
+		align = CONFIG_CMA_ALIGNMENT;
+
+	pr_debug("%s(<%p>, %d/%d)\n", __func__, (void *)cma, count, align);
+
+	if (!count)
+		return NULL;
+
+	mutex_lock(&cma_mutex);
+
+	pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count, 0, count,
+					    align);
+	if (pageno >= cma->count) {
+		ret = -ENOMEM;
+		goto error;
+	}
+	bitmap_set(cma->bitmap, pageno, count);
+
+	pfn = cma->base_pfn + pageno;
+	ret = alloc_contig_range(pfn, pfn + count, 0, MIGRATE_CMA);
+	if (ret)
+		goto free;
+
+	mutex_unlock(&cma_mutex);
+
+	pr_debug("%s(): returning [%ld]\n", __func__, pfn);
+	return pfn_to_page(pfn);
+free:
+	bitmap_clear(cma->bitmap, pageno, count);
+error:
+	mutex_unlock(&cma_mutex);
+	return NULL;
+}
+
+/**
+ * dma_release_from_contiguous() - release allocated pages
+ * @dev:   Pointer to device for which the pages were allocated.
+ * @pages: Allocated pages.
+ * @count: Number of allocated pages.
+ *
+ * This function releases memory allocated by dma_alloc_from_contiguous().
+ * It returns 1 when the pages were released back to the contiguous area
+ * and 0 when the given pages do not belong to it.
+ */
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count)
+{
+	struct cma *cma = get_dev_cma_area(dev);
+	unsigned long pfn;
+
+	if (!cma || !pages)
+		return 0;
+
+	pr_debug("%s([%p])\n", __func__, (void *)pages);
+
+	pfn = page_to_pfn(pages);
+
+	if (pfn < cma->base_pfn || pfn >= cma->base_pfn + cma->count)
+		return 0;
+
+	mutex_lock(&cma_mutex);
+
+	bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
+	free_contig_pages(pages, count);
+
+	mutex_unlock(&cma_mutex);
+	return 1;
+}
diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h
new file mode 100644
index 0000000..98312c9
--- /dev/null
+++ b/include/linux/dma-contiguous.h
@@ -0,0 +1,104 @@
+#ifndef __LINUX_CMA_H
+#define __LINUX_CMA_H
+
+/*
+ * Contiguous Memory Allocator for DMA mapping framework
+ * Copyright (c) 2010-2011 by Samsung Electronics.
+ * Written by:
+ *	Marek Szyprowski <m.szyprowski@samsung.com>
+ *	Michal Nazarewicz <mina86@mina86.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version.
+ */
+
+/*
+ * Contiguous Memory Allocator
+ *
+ *   The Contiguous Memory Allocator (CMA) makes it possible to
+ *   allocate big contiguous chunks of memory after the system has
+ *   booted.
+ *
+ * Why is it needed?
+ *
+ *   Various devices on embedded systems have no scatter-gather and/or
+ *   IO map support and require contiguous blocks of memory to
+ *   operate.  They include devices such as cameras, hardware video
+ *   coders, etc.
+ *
+ *   Such devices often require big memory buffers (a full HD frame
+ *   is, for instance, more than 2 megapixels in size, i.e. more than 6
+ *   MB of memory), which makes mechanisms such as kmalloc() or
+ *   alloc_page() ineffective.
+ *
+ *   At the same time, a solution where a big memory region is
+ *   reserved for a device is suboptimal since often more memory is
+ *   reserved than strictly required and, moreover, the memory is
+ *   inaccessible to the page allocator even if device drivers don't use it.
+ *
+ *   CMA tries to solve this issue by operating on memory regions
+ *   where only movable pages can be allocated from.  This way, the
+ *   kernel can use the memory for pagecache and, when a device driver
+ *   requests it, the allocated pages can be migrated away.
+ *
+ * Driver usage
+ *
+ *   CMA should not be used by device drivers directly.  It is
+ *   only a helper framework for the dma-mapping subsystem.
+ *
+ *   For more information, see kernel-docs in drivers/base/dma-contiguous.c
+ */
+
+#ifdef __KERNEL__
+
+struct cma;
+struct page;
+struct device;
+
+#ifdef CONFIG_CMA
+
+extern struct cma *dma_contiguous_default_area;
+
+void dma_contiguous_reserve(void);
+int dma_declare_contiguous(struct device *dev, unsigned long size,
+			   phys_addr_t base);
+
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int order);
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count);
+
+#else
+
+#define dma_contiguous_default_area NULL
+
+static inline void dma_contiguous_reserve(void) { }
+
+static inline
+int dma_declare_contiguous(struct device *dev, unsigned long size,
+			   phys_addr_t base)
+{
+	return -EINVAL;
+}
+
+static inline
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int order)
+{
+	return NULL;
+}
+
+static inline
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count)
+{
+	return 0;
+}
+
+#endif
+
+#endif
+
+#endif
-- 
1.7.1.569.g6f426

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 6/8 RESEND] drivers: add Contiguous Memory Allocator
@ 2011-07-05 11:02     ` Marek Szyprowski
  0 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-07-05 11:02 UTC (permalink / raw)
  To: linux-arm-kernel

The Contiguous Memory Allocator is a set of helper functions for the DMA
mapping framework that improve allocation of contiguous memory chunks.

CMA grabs memory on system boot, marks it with CMA_MIGRATE_TYPE and gives
it back to the system. The kernel is allowed to allocate movable pages
within CMA's managed memory, so it can be used, for example, for page
cache when the DMA mapping framework does not need it. On a
dma_alloc_from_contiguous() request such pages are migrated out of the
CMA area to free up the required contiguous block and fulfill the request.
This makes it possible to allocate large contiguous chunks of memory at
any time, provided that there is enough free memory in the system.
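
A minimal usage sketch follows (not part of this patch; example_dev and
example_board_reserve() are made-up names, and the real reservation hook
lives in the architecture integration patches of this series). It uses only
the interfaces declared in <linux/dma-contiguous.h> below:

	/* board/arch setup code, run once the memblock allocator is up */
	static struct device *example_dev;	/* hypothetical DMA-hungry device */

	void __init example_board_reserve(void)
	{
		dma_contiguous_reserve();		/* default global area */
		dma_declare_contiguous(example_dev, 8 * SZ_1M, 0); /* per-device area */
	}

	/* later, from the dma-mapping backend, for a 1 MiB buffer */
	static int example_alloc_one_mib(struct device *dev)
	{
		/* 256 pages == 1 MiB with 4 KiB pages; the alignment order (8)
		 * is capped by CONFIG_CMA_ALIGNMENT */
		struct page *pages = dma_alloc_from_contiguous(dev, 256, 8);

		if (!pages)
			return -ENOMEM;
		/* ... map the pages and point the hardware at the buffer ... */
		dma_release_from_contiguous(dev, pages, 256);
		return 0;
	}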

This code is heavily based on earlier works by Michal Nazarewicz.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 arch/Kconfig                   |    3 +
 drivers/base/Kconfig           |   77 +++++++++
 drivers/base/Makefile          |    1 +
 drivers/base/dma-contiguous.c  |  367 ++++++++++++++++++++++++++++++++++++++++
 include/linux/dma-contiguous.h |  104 +++++++++++
 5 files changed, 552 insertions(+), 0 deletions(-)
 create mode 100644 drivers/base/dma-contiguous.c
 create mode 100644 include/linux/dma-contiguous.h

diff --git a/arch/Kconfig b/arch/Kconfig
index 26b0e23..228d761 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -124,6 +124,9 @@ config HAVE_ARCH_TRACEHOOK
 config HAVE_DMA_ATTRS
 	bool
 
+config HAVE_DMA_CONTIGUOUS
+	bool
+
 config USE_GENERIC_SMP_HELPERS
 	bool
 
diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index d57e8d0..c690d05 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -168,4 +168,81 @@ config SYS_HYPERVISOR
 	bool
 	default n
 
+config CMA
+	bool "Contiguous Memory Allocator"
+	depends on HAVE_DMA_CONTIGUOUS && HAVE_MEMBLOCK
+	select MIGRATION
+	select CMA_MIGRATE_TYPE
+	help
+	  This enables the Contiguous Memory Allocator which allows drivers
+	  to allocate big physically-contiguous blocks of memory for use with
+	  hardware components that support neither I/O mapping nor scatter-gather.
+
+	  For more information see <include/linux/dma-contiguous.h>.
+	  If unsure, say "n".
+
+if CMA
+
+config CMA_DEBUG
+	bool "CMA debug messages (DEVELOPEMENT)"
+	help
+	  Turns on debug messages in CMA.  This produces KERN_DEBUG
+	  messages for every CMA call as well as various messages while
+	  processing calls such as dma_alloc_from_contiguous().
+	  This option does not affect warning and error messages.
+
+comment "Default contiguous memory area size:"
+
+config CMA_SIZE_ABSOLUTE
+	int "Absolute size (in MiB)"
+	default 16
+	help
+	  Defines the size (in MiB) of the default memory area for Contiguous
+	  Memory Allocator.
+
+config CMA_SIZE_PERCENTAGE
+	int "Percentage of total memory"
+	default 10
+	help
+	  Defines the size of the default memory area for Contiguous Memory
+	  Allocator as a percentage of the total memory in the system.
+
+choice
+	prompt "Selected region size"
+	default CMA_SIZE_SEL_ABSOLUTE
+
+config CMA_SIZE_SEL_ABSOLUTE
+	bool "Use absolute value only"
+
+config CMA_SIZE_SEL_PERCENTAGE
+	bool "Use percentage value only"
+
+config CMA_SIZE_SEL_MIN
+	bool "Use lower value (minimum)"
+
+config CMA_SIZE_SEL_MAX
+	bool "Use higher value (maximum)"
+
+endchoice
+
+config CMA_ALIGNMENT
+	int "Maximum PAGE_SIZE order of alignment for contiguous buffers"
+	range 4 9
+	default 8
+	help
+	  The DMA mapping framework by default aligns all buffers to the smallest
+	  PAGE_SIZE order which is greater than or equal to the requested buffer
+	  size. This works well for buffers up to a few hundred kilobytes, but
+	  for larger buffers it is just a waste of memory. With this parameter you can
+	  specify the maximum PAGE_SIZE order for contiguous buffers. Larger
+	  buffers will be aligned only to this specified order. The order is
+	  expressed as a power of two multiplied by the PAGE_SIZE.
+
+	  For example, if your system defaults to 4KiB pages, the order value
+	  of 8 means that the buffers will be aligned up to 1MiB only.
+
+	  If unsure, leave the default value "8".
+
+endif
+
 endmenu
diff --git a/drivers/base/Makefile b/drivers/base/Makefile
index 4c5701c..be6aab4 100644
--- a/drivers/base/Makefile
+++ b/drivers/base/Makefile
@@ -5,6 +5,7 @@ obj-y			:= core.o sys.o bus.o dd.o syscore.o \
 			   cpu.o firmware.o init.o map.o devres.o \
 			   attribute_container.o transport_class.o
 obj-$(CONFIG_DEVTMPFS)	+= devtmpfs.o
+obj-$(CONFIG_CMA) += dma-contiguous.o
 obj-y			+= power/
 obj-$(CONFIG_HAS_DMA)	+= dma-mapping.o
 obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
new file mode 100644
index 0000000..707b901
--- /dev/null
+++ b/drivers/base/dma-contiguous.c
@@ -0,0 +1,367 @@
+/*
+ * Contiguous Memory Allocator for DMA mapping framework
+ * Copyright (c) 2010-2011 by Samsung Electronics.
+ * Written by:
+ *	Marek Szyprowski <m.szyprowski@samsung.com>
+ *	Michal Nazarewicz <mina86@mina86.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt) "cma: " fmt
+
+#ifdef CONFIG_CMA_DEBUG
+#ifndef DEBUG
+#  define DEBUG
+#endif
+#endif
+
+#include <asm/page.h>
+#include <asm/sizes.h>
+
+#include <linux/memblock.h>
+#include <linux/err.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/page-isolation.h>
+#include <linux/slab.h>
+#include <linux/swap.h>
+#include <linux/mm_types.h>
+#include <linux/dma-mapping.h>
+#include <linux/dma-contiguous.h>
+
+struct cma {
+	unsigned long	base_pfn;
+	unsigned long	count;
+	unsigned long	*bitmap;
+};
+
+struct cma *dma_contiguous_default_area;
+
+static unsigned long size_abs = CONFIG_CMA_SIZE_ABSOLUTE * SZ_1M;
+static unsigned long size_percent = CONFIG_CMA_SIZE_PERCENTAGE;
+static long size_cmdline = -1;
+
+static int __init early_cma(char *p)
+{
+	pr_debug("%s(%s)\n", __func__, p);
+	size_cmdline = memparse(p, &p);
+	return 0;
+}
+early_param("cma", early_cma);
+
+/**
+ * dma_contiguous_reserve() - reserve area for contiguous memory handling
+ *
+ * This function reserves memory from the memblock subsystem. It should be
+ * called by arch-specific code once the memblock allocator has been activated
+ * and all other subsystems have already allocated/reserved memory.
+ */
+void __init dma_contiguous_reserve(void)
+{
+	struct memblock_region *reg;
+	unsigned long selected_size = 0;
+	unsigned long total_pages = 0;
+
+	pr_debug("%s()\n", __func__);
+
+	/*
+	 * We cannot use memblock_phys_mem_size() here, because
+	 * memblock_analyze() has not been called yet.
+	 */
+	for_each_memblock(memory, reg)
+		total_pages += memblock_region_memory_end_pfn(reg) -
+			       memblock_region_memory_base_pfn(reg);
+
+	size_percent *= (total_pages << PAGE_SHIFT) / 100;
+
+	pr_debug("%s: available phys mem: %ld MiB\n", __func__,
+		 (total_pages << PAGE_SHIFT) / SZ_1M);
+
+#ifdef CONFIG_CMA_SIZE_SEL_ABSOLUTE
+	selected_size = size_abs;
+#endif
+#ifdef CONFIG_CMA_SIZE_SEL_PERCENTAGE
+	selected_size = size_percent;
+#endif
+#ifdef CONFIG_CMA_SIZE_SEL_MIN
+	selected_size = min(size_abs, size_percent);
+#endif
+#ifdef CONFIG_CMA_SIZE_SEL_MAX
+	selected_size = max(size_abs, size_percent);
+#endif
+
+	if (size_cmdline != -1)
+		selected_size = size_cmdline;
+
+	if (!selected_size)
+		return;
+
+	pr_debug("%s: reserving %ld MiB for global area\n", __func__,
+		 selected_size / SZ_1M);
+
+	dma_declare_contiguous(NULL, selected_size, 0);
+}
+
+static DEFINE_MUTEX(cma_mutex);
+
+#ifdef CONFIG_DEBUG_VM
+
+static int __cma_activate_area(unsigned long base_pfn, unsigned long count)
+{
+	unsigned long pfn = base_pfn;
+	unsigned i = count;
+	struct zone *zone;
+
+	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
+
+	VM_BUG_ON(!pfn_valid(pfn));
+	zone = page_zone(pfn_to_page(pfn));
+
+	do {
+		VM_BUG_ON(!pfn_valid(pfn));
+		VM_BUG_ON(page_zone(pfn_to_page(pfn)) != zone);
+		if (!(pfn & (pageblock_nr_pages - 1)))
+			init_cma_reserved_pageblock(pfn_to_page(pfn));
+		++pfn;
+	} while (--i);
+
+	return 0;
+}
+
+#else
+
+static int __cma_activate_area(unsigned long base_pfn, unsigned long count)
+{
+	unsigned i = count >> pageblock_order;
+	struct page *p = pfn_to_page(base_pfn);
+
+	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
+
+	do {
+		init_cma_reserved_pageblock(p);
+		p += pageblock_nr_pages;
+	} while (--i);
+
+	return 0;
+}
+
+#endif
+
+static struct cma *__cma_create_area(unsigned long base_pfn,
+				     unsigned long count)
+{
+	int bitmap_size = BITS_TO_LONGS(count) * sizeof(long);
+	struct cma *cma;
+
+	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
+
+	cma = kmalloc(sizeof *cma, GFP_KERNEL);
+	if (!cma)
+		return ERR_PTR(-ENOMEM);
+
+	cma->base_pfn = base_pfn;
+	cma->count = count;
+	cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
+
+	if (!cma->bitmap)
+		goto no_mem;
+
+	__cma_activate_area(base_pfn, count);
+
+	pr_debug("%s: returning <%p>\n", __func__, (void *)cma);
+	return cma;
+
+no_mem:
+	kfree(cma);
+	return ERR_PTR(-ENOMEM);
+}
+
+static struct cma_reserved {
+	unsigned long start;
+	unsigned long size;
+	struct device *dev;
+} cma_reserved[8] __initdata;
+static unsigned cma_reserved_count __initdata;
+
+static int __init __cma_init_reserved_areas(void)
+{
+	struct cma_reserved *r = cma_reserved;
+	unsigned i = cma_reserved_count;
+
+	pr_debug("%s()\n", __func__);
+
+	for (; i; --i, ++r) {
+		struct cma *cma;
+		cma = __cma_create_area(page_to_pfn(phys_to_page(r->start)),
+					r->size >> PAGE_SHIFT);
+		if (!IS_ERR(cma)) {
+			pr_debug("%s: created area %p\n", __func__, cma);
+			if (r->dev)
+				set_dev_cma_area(r->dev, cma);
+			else
+				dma_contiguous_default_area = cma;
+		}
+	}
+	return 0;
+}
+core_initcall(__cma_init_reserved_areas);
+
+/**
+ * dma_declare_contiguous() - reserve area for contiguous memory handling
+ *			      for particular device
+ * @dev:   Pointer to device structure.
+ * @size:  Size of the reserved memory.
+ * @start: Start address of the reserved memory (optional, 0 for any).
+ *
+ * This function reserves memory for the specified device. It should be
+ * called by board-specific code once the memblock allocator has been activated
+ * and all other subsystems have already allocated/reserved memory.
+ */
+int __init dma_declare_contiguous(struct device *dev, unsigned long size,
+				  phys_addr_t start)
+{
+	struct cma_reserved *r = &cma_reserved[cma_reserved_count];
+	unsigned long alignment;
+
+	pr_debug("%s(%p+%p)\n", __func__, (void *)start, (void *)size);
+
+	/* Sanity checks */
+	if (cma_reserved_count == ARRAY_SIZE(cma_reserved))
+		return -ENOSPC;
+
+	if (!size)
+		return -EINVAL;
+
+	/* Sanitise input arguments */
+	alignment = PAGE_SIZE << (MAX_ORDER + 1);
+	start = ALIGN(start, alignment);
+	size  = ALIGN(size , alignment);
+
+	/* Reserve memory */
+	if (start) {
+		if (memblock_is_region_reserved(start, size) ||
+		    memblock_reserve(start, size) < 0)
+			return -EBUSY;
+	} else {
+		/*
+		 * Use __memblock_alloc_base() since
+		 * memblock_alloc_base() panic()s.
+		 */
+		u64 addr = __memblock_alloc_base(size, alignment, 0);
+		if (!addr) {
+			return -ENOMEM;
+		} else if (addr + size > ~(unsigned long)0) {
+			memblock_free(addr, size);
+			return -EOVERFLOW;
+		} else {
+			start = addr;
+		}
+	}
+
+	/*
+	 * Each reserved area must be initialised later, when more kernel
+	 * subsystems (like slab allocator) are available.
+	 */
+	r->start = start;
+	r->size = size;
+	r->dev = dev;
+	cma_reserved_count++;
+	printk(KERN_INFO "%s: reserved %ld MiB area at 0x%p\n", __func__,
+	       size / SZ_1M, (void *)start);
+	return 0;
+}
+
+/**
+ * dma_alloc_from_contiguous() - allocate pages from contiguous area
+ * @dev:   Pointer to device for which the allocation is performed.
+ * @count: Requested number of pages.
+ * @align: Requested alignment of pages (in PAGE_SIZE order).
+ *
+ * This function allocates a memory buffer for the specified device. It uses
+ * a device-specific contiguous memory area if available, or the default
+ * global one. Requires an architecture-specific get_dev_cma_area() helper
+ * function.
+ */
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int align)
+{
+	struct cma *cma = get_dev_cma_area(dev);
+	unsigned long pfn, pageno;
+	int ret;
+
+	if (!cma)
+		return NULL;
+
+	if (align > CONFIG_CMA_ALIGNMENT)
+		align = CONFIG_CMA_ALIGNMENT;
+
+	pr_debug("%s(<%p>, %d/%d)\n", __func__, (void *)cma, count, align);
+
+	if (!count)
+		return NULL;
+
+	mutex_lock(&cma_mutex);
+
+	pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count, 0, count,
+					    align);
+	if (pageno >= cma->count) {
+		ret = -ENOMEM;
+		goto error;
+	}
+	bitmap_set(cma->bitmap, pageno, count);
+
+	pfn = cma->base_pfn + pageno;
+	ret = alloc_contig_range(pfn, pfn + count, 0, MIGRATE_CMA);
+	if (ret)
+		goto free;
+
+	mutex_unlock(&cma_mutex);
+
+	pr_debug("%s(): returning [%ld]\n", __func__, pfn);
+	return pfn_to_page(pfn);
+free:
+	bitmap_clear(cma->bitmap, pageno, count);
+error:
+	mutex_unlock(&cma_mutex);
+	return NULL;
+}
+
+/**
+ * dma_release_from_contiguous() - release allocated pages
+ * @dev:   Pointer to device for which the pages were allocated.
+ * @pages: Allocated pages.
+ * @count: Number of allocated pages.
+ *
+ * This function releases memory allocated by dma_alloc_from_contiguous().
+ * It returns 1 when the pages were released back to the contiguous area
+ * and 0 when the given pages do not belong to it.
+ */
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count)
+{
+	struct cma *cma = get_dev_cma_area(dev);
+	unsigned long pfn;
+
+	if (!cma || !pages)
+		return 0;
+
+	pr_debug("%s([%p])\n", __func__, (void *)pages);
+
+	pfn = page_to_pfn(pages);
+
+	if (pfn < cma->base_pfn || pfn >= cma->base_pfn + cma->count)
+		return 0;
+
+	mutex_lock(&cma_mutex);
+
+	bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
+	free_contig_pages(pages, count);
+
+	mutex_unlock(&cma_mutex);
+	return 1;
+}
diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h
new file mode 100644
index 0000000..98312c9
--- /dev/null
+++ b/include/linux/dma-contiguous.h
@@ -0,0 +1,104 @@
+#ifndef __LINUX_CMA_H
+#define __LINUX_CMA_H
+
+/*
+ * Contiguous Memory Allocator for DMA mapping framework
+ * Copyright (c) 2010-2011 by Samsung Electronics.
+ * Written by:
+ *	Marek Szyprowski <m.szyprowski@samsung.com>
+ *	Michal Nazarewicz <mina86@mina86.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version.
+ */
+
+/*
+ * Contiguous Memory Allocator
+ *
+ *   The Contiguous Memory Allocator (CMA) makes it possible to
+ *   allocate big contiguous chunks of memory after the system has
+ *   booted.
+ *
+ * Why is it needed?
+ *
+ *   Various devices on embedded systems have no scatter-gather and/or
+ *   IO map support and require contiguous blocks of memory to
+ *   operate.  They include devices such as cameras, hardware video
+ *   coders, etc.
+ *
+ *   Such devices often require big memory buffers (a full HD frame
+ *   is, for instance, more than 2 megapixels in size, i.e. more than 6
+ *   MB of memory), which makes mechanisms such as kmalloc() or
+ *   alloc_page() ineffective.
+ *
+ *   At the same time, a solution where a big memory region is
+ *   reserved for a device is suboptimal since often more memory is
+ *   reserved than strictly required and, moreover, the memory is
+ *   inaccessible to the page allocator even if device drivers don't use it.
+ *
+ *   CMA tries to solve this issue by operating on memory regions
+ *   where only movable pages can be allocated from.  This way, the
+ *   kernel can use the memory for pagecache and, when a device driver
+ *   requests it, the allocated pages can be migrated away.
+ *
+ * Driver usage
+ *
+ *   CMA should not be used by device drivers directly.  It is
+ *   only a helper framework for the dma-mapping subsystem.
+ *
+ *   For more information, see kernel-docs in drivers/base/dma-contiguous.c
+ */
+
+#ifdef __KERNEL__
+
+struct cma;
+struct page;
+struct device;
+
+#ifdef CONFIG_CMA
+
+extern struct cma *dma_contiguous_default_area;
+
+void dma_contiguous_reserve(void);
+int dma_declare_contiguous(struct device *dev, unsigned long size,
+			   phys_addr_t base);
+
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int order);
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count);
+
+#else
+
+#define dma_contiguous_default_area NULL
+
+static inline void dma_contiguous_reserve(void) { }
+
+static inline
+int dma_declare_contiguous(struct device *dev, unsigned long size,
+			   phys_addr_t base)
+{
+	return -EINVAL;
+}
+
+static inline
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int order)
+{
+	return NULL;
+}
+
+static inline
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count)
+{
+	return 0;
+}
+
+#endif
+
+#endif
+
+#endif
-- 
1.7.1.569.g6f426

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* Re: [PATCH 1/8] mm: move some functions from memory_hotplug.c to page_isolation.c
  2011-07-05  7:41   ` Marek Szyprowski
  (?)
@ 2011-07-05 11:27     ` Arnd Bergmann
  -1 siblings, 0 replies; 183+ messages in thread
From: Arnd Bergmann @ 2011-07-05 11:27 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Andrew Morton,
	KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker, Mel Gorman,
	Jesse Barker, Jonathan Corbet, Chunsang Jeong

On Tuesday 05 July 2011, Marek Szyprowski wrote:
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> Memory hotplug is a logic for making pages unused in the specified
> range of pfn. So, some of core logics can be used for other purpose
> as allocating a very large contiguous memory block.
> 
> This patch moves some functions from mm/memory_hotplug.c to
> mm/page_isolation.c. This helps adding a function for large-alloc in
> page_isolation.c with memory-unplug technique.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> [m.nazarewicz: reworded commit message]
> Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> [m.szyprowski: rebased and updated to Linux v3.0-rc1]
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> CC: Michal Nazarewicz <mina86@mina86.com>

Acked-by: Arnd Bergmann <arnd@arndb.de>

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 2/8] mm: alloc_contig_freed_pages() added
  2011-07-05  7:41   ` Marek Szyprowski
  (?)
@ 2011-07-05 11:30     ` Arnd Bergmann
  -1 siblings, 0 replies; 183+ messages in thread
From: Arnd Bergmann @ 2011-07-05 11:30 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Andrew Morton,
	KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker, Mel Gorman,
	Jesse Barker, Jonathan Corbet, Chunsang Jeong

On Tuesday 05 July 2011, Marek Szyprowski wrote:
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> This commit introduces alloc_contig_freed_pages() function
> which allocates (ie. removes from buddy system) free pages
> in range.  Caller has to guarantee that all pages in range
> are in buddy system.
> 
> Along with this function, a free_contig_pages() function is
> provided which frees all (or a subset of) pages allocated
> with alloc_contig_freed_pages().
> 
> Michal Nazarewicz has modified the function to make it easier
> to allocate not MAX_ORDER_NR_PAGES aligned pages by making it
> return pfn of one-past-the-last allocated page.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> CC: Michal Nazarewicz <mina86@mina86.com>

Acked-by: Arnd Bergmann <arnd@arndb.de>

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 3/8] mm: alloc_contig_range() added
  2011-07-05  7:41   ` Marek Szyprowski
  (?)
@ 2011-07-05 11:31     ` Arnd Bergmann
  -1 siblings, 0 replies; 183+ messages in thread
From: Arnd Bergmann @ 2011-07-05 11:31 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Andrew Morton,
	KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker, Mel Gorman,
	Jesse Barker, Jonathan Corbet, Chunsang Jeong

On Tuesday 05 July 2011, Marek Szyprowski wrote:
> From: Michal Nazarewicz <m.nazarewicz@samsung.com>
> 
> This commit adds the alloc_contig_range() function which tries
> to allecate given range of pages.  It tries to migrate all
> already allocated pages that fall in the range thus freeing them.
> Once all pages in the range are freed they are removed from the
> buddy system thus allocated for the caller to use.
> 
> Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> [m.szyprowski: renamed some variables for easier code reading]
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> CC: Michal Nazarewicz <mina86@mina86.com>

Acked-by: Arnd Bergmann <arnd@arndb.de>
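
(For context: patch 6/8 later in this thread ends up driving this interface
roughly as follows; the snippet is condensed from that patch, not new code:)

	pfn = cma->base_pfn + pageno;
	ret = alloc_contig_range(pfn, pfn + count, 0, MIGRATE_CMA);
	if (ret)
		goto free;	/* roll back the CMA bitmap reservation */
	...
	/* and on release: */
	free_contig_pages(pages, count);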

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-07-05  7:41   ` Marek Szyprowski
  (?)
@ 2011-07-05 11:33     ` Russell King - ARM Linux
  -1 siblings, 0 replies; 183+ messages in thread
From: Russell King - ARM Linux @ 2011-07-05 11:33 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Daniel Walker, Arnd Bergmann, Jonathan Corbet,
	Mel Gorman, Chunsang Jeong, Michal Nazarewicz, Jesse Barker,
	Kyungmin Park, Ankita Garg, Andrew Morton, KAMEZAWA Hiroyuki

On Tue, Jul 05, 2011 at 09:41:48AM +0200, Marek Szyprowski wrote:
> The Contiguous Memory Allocator is a set of helper functions for DMA
> mapping framework that improves allocations of contiguous memory chunks.
> 
> CMA grabs memory on system boot, marks it with CMA_MIGRATE_TYPE and
> gives back to the system. Kernel is allowed to allocate movable pages
> within CMA's managed memory so that it can be used for example for page
> cache when DMA mapping do not use it. On dma_alloc_from_contiguous()
> request such pages are migrated out of CMA area to free required
> contiguous block and fulfill the request. This allows to allocate large
> contiguous chunks of memory at any time assuming that there is enough
> free memory available in the system.
> 
> This code is heavily based on earlier works by Michal Nazarewicz.

And how are you addressing the technical concerns about aliasing of
cache attributes which I keep bringing up with this and you keep
ignoring and telling me that I'm standing in your way.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 4/8] mm: MIGRATE_CMA migration type added
  2011-07-05  7:41   ` Marek Szyprowski
  (?)
@ 2011-07-05 11:44     ` Arnd Bergmann
  -1 siblings, 0 replies; 183+ messages in thread
From: Arnd Bergmann @ 2011-07-05 11:44 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Andrew Morton,
	KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker, Mel Gorman,
	Jesse Barker, Jonathan Corbet, Chunsang Jeong

On Tuesday 05 July 2011, Marek Szyprowski wrote:
> From: Michal Nazarewicz <m.nazarewicz@samsung.com>
> 
> The MIGRATE_CMA migration type has two main characteristics:
> (i) only movable pages can be allocated from MIGRATE_CMA
> pageblocks and (ii) page allocator will never change migration
> type of MIGRATE_CMA pageblocks.
> 
> This guarantees that page in a MIGRATE_CMA page block can
> always be migrated somewhere else (unless there's no memory left
> in the system).
> 
> It is designed to be used with Contiguous Memory Allocator
> (CMA) for allocating big chunks (eg. 10MiB) of physically
> contiguous memory. Once driver requests contiguous memory,
> CMA will migrate pages from MIGRATE_CMA pageblocks.
> 
> To minimise number of migrations, MIGRATE_CMA migration type
> is the last type tried when page allocator falls back to other
> migration types than requested.
> 
> Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> [m.szyprowski: cleaned up Kconfig, renamed some functions]
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> CC: Michal Nazarewicz <mina86@mina86.com>

Acked-by: Arnd Bergmann <arnd@arndb.de>, but I noticed a few things:

> cma migrate fixup

This text doesn't belong here.

> +enum {
> +	MIGRATE_UNMOVABLE,
> +	MIGRATE_RECLAIMABLE,
> +	MIGRATE_MOVABLE,
> +	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
> +	MIGRATE_RESERVE = MIGRATE_PCPTYPES,
> +#ifdef CONFIG_CMA_MIGRATE_TYPE
> +	/*
> +	 * MIGRATE_CMA migration type is designed to mimic the way
> +	 * ZONE_MOVABLE works.  Only movable pages can be allocated
> +	 * from MIGRATE_CMA pageblocks and page allocator never
> +	 * implicitly change migration type of MIGRATE_CMA pageblock.
> +	 *
> +	 * The way to use it is to change migratetype of a range of
> +	 * pageblocks to MIGRATE_CMA which can be done by
> +	 * __free_pageblock_cma() function.  What is important though
> +	 * is that a range of pageblocks must be aligned to
> +	 * MAX_ORDER_NR_PAGES should the biggest page be bigger than
> +	 * a single pageblock.
> +	 */
> +	MIGRATE_CMA,
> +#endif
> +	MIGRATE_ISOLATE,	/* can't allocate from here */
> +	MIGRATE_TYPES
> +};

It's not clear to me why you need this #ifdef. Does it hurt if the
migration type is defined but not used?

> @@ -198,6 +198,12 @@ config MIGRATION
>  	  pages as migration can relocate pages to satisfy a huge page
>  	  allocation instead of reclaiming.
>  
> +config CMA_MIGRATE_TYPE
> +	bool
> +	help
> +	  This enables the use of the MIGRATE_CMA migrate type, which lets CMA
> +	  work on almost arbitrary memory range and not only inside ZONE_MOVABLE.
> +
>  config PHYS_ADDR_T_64BIT
>  	def_bool 64BIT || ARCH_PHYS_ADDR_T_64BIT

This is currently only selected on ARM with your patch set. 

> diff --git a/mm/compaction.c b/mm/compaction.c
> index 6cc604b..9e5cc59 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -119,6 +119,16 @@ static bool suitable_migration_target(struct page *page)
>  	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
>  		return false;
>  
> +	/* Keep MIGRATE_CMA alone as well. */
> +	/*
> +	 * XXX Revisit.  We currently cannot let compaction touch CMA
> +	 * pages since compaction insists on changing their migration
> +	 * type to MIGRATE_MOVABLE (see split_free_page() called from
> +	 * isolate_freepages_block() above).
> +	 */
> +	if (is_migrate_cma(migratetype))
> +		return false;
> +
>  	/* If the page is a large free page, then allow migration */
>  	if (PageBuddy(page) && page_order(page) >= pageblock_order)
>  		return true;

Do you plan to address this before merging the patch set, or is
it harmless enough to get in this way?

>  /*
>   * The order of subdivision here is critical for the IO subsystem.
> @@ -827,11 +852,15 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
>   * This array describes the order lists are fallen back to when
>   * the free lists for the desirable migrate type are depleted
>   */
> -static int fallbacks[MIGRATE_TYPES][MIGRATE_TYPES-1] = {
> +static int fallbacks[MIGRATE_TYPES][4] = {
>  	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_RESERVE },
>  	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_RESERVE },
> +#ifdef CONFIG_CMA_MIGRATE_TYPE
> +	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_CMA    , MIGRATE_RESERVE },
> +#else
>  	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
> -	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE,     MIGRATE_RESERVE,   MIGRATE_RESERVE }, /* Never used */
> +#endif
> +	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE }, /* Never used */
>  };
>  
>  /*
> @@ -1044,7 +1086,12 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
>  			list_add(&page->lru, list);
>  		else
>  			list_add_tail(&page->lru, list);
> -		set_page_private(page, migratetype);
> +#ifdef CONFIG_CMA_MIGRATE_TYPE
> +		if (is_pageblock_cma(page))
> +			set_page_private(page, MIGRATE_CMA);
> +		else
> +#endif
> +			set_page_private(page, migratetype);
>  		list = &page->lru;
>  	}

I guess if you can get rid of the first #ifdef I mentioned above, these two can be
removed as well, without causing any run-time overhead.

	Arnd
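
(Illustration only, not from the patch set: one possible shape of that
suggestion is to keep MIGRATE_CMA in the enum unconditionally and confine the
config dependence to a small helper, so the CMA-only branches constant-fold
away when the option is off:)

	enum {
		MIGRATE_UNMOVABLE,
		MIGRATE_RECLAIMABLE,
		MIGRATE_MOVABLE,
		MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
		MIGRATE_RESERVE = MIGRATE_PCPTYPES,
		MIGRATE_CMA,		/* always defined, only populated with CONFIG_CMA_MIGRATE_TYPE=y */
		MIGRATE_ISOLATE,	/* can't allocate from here */
		MIGRATE_TYPES
	};

	#ifdef CONFIG_CMA_MIGRATE_TYPE
	static inline bool is_migrate_cma(int migratetype)
	{
		return migratetype == MIGRATE_CMA;
	}
	#else
	static inline bool is_migrate_cma(int migratetype)
	{
		return false;	/* lets the compiler drop the CMA-only branches */
	}
	#endif

With that, the rmqueue_bulk() hunk quoted above could lose its #ifdef too:

		set_page_private(page, is_pageblock_cma(page) ?
				       MIGRATE_CMA : migratetype);

(is_pageblock_cma() would need a matching stub returning false when the
option is disabled.)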

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 4/8] mm: MIGRATE_CMA migration type added
@ 2011-07-05 11:44     ` Arnd Bergmann
  0 siblings, 0 replies; 183+ messages in thread
From: Arnd Bergmann @ 2011-07-05 11:44 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Andrew Morton,
	KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker, Mel Gorman,
	Jesse Barker, Jonathan Corbet, Chunsang Jeong

On Tuesday 05 July 2011, Marek Szyprowski wrote:
> From: Michal Nazarewicz <m.nazarewicz@samsung.com>
> 
> The MIGRATE_CMA migration type has two main characteristics:
> (i) only movable pages can be allocated from MIGRATE_CMA
> pageblocks and (ii) page allocator will never change migration
> type of MIGRATE_CMA pageblocks.
> 
> This guarantees that page in a MIGRATE_CMA page block can
> always be migrated somewhere else (unless there's no memory left
> in the system).
> 
> It is designed to be used with Contiguous Memory Allocator
> (CMA) for allocating big chunks (eg. 10MiB) of physically
> contiguous memory. Once driver requests contiguous memory,
> CMA will migrate pages from MIGRATE_CMA pageblocks.
> 
> To minimise number of migrations, MIGRATE_CMA migration type
> is the last type tried when page allocator falls back to other
> migration types then requested.
> 
> Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> [m.szyprowski: cleaned up Kconfig, renamed some functions]
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> CC: Michal Nazarewicz <mina86@mina86.com>

Acked-by: Arnd Bergmann <arnd@arndb.de>, but I noticed a few things:

> cma migrate fixup

This text doesn't belong here.

> +enum {
> +	MIGRATE_UNMOVABLE,
> +	MIGRATE_RECLAIMABLE,
> +	MIGRATE_MOVABLE,
> +	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
> +	MIGRATE_RESERVE = MIGRATE_PCPTYPES,
> +#ifdef CONFIG_CMA_MIGRATE_TYPE
> +	/*
> +	 * MIGRATE_CMA migration type is designed to mimic the way
> +	 * ZONE_MOVABLE works.  Only movable pages can be allocated
> +	 * from MIGRATE_CMA pageblocks and page allocator never
> +	 * implicitly change migration type of MIGRATE_CMA pageblock.
> +	 *
> +	 * The way to use it is to change migratetype of a range of
> +	 * pageblocks to MIGRATE_CMA which can be done by
> +	 * __free_pageblock_cma() function.  What is important though
> +	 * is that a range of pageblocks must be aligned to
> +	 * MAX_ORDER_NR_PAGES should biggest page be bigger then
> +	 * a single pageblock.
> +	 */
> +	MIGRATE_CMA,
> +#endif
> +	MIGRATE_ISOLATE,	/* can't allocate from here */
> +	MIGRATE_TYPES
> +};

It's not clear to me why you need this #ifdef. Does it hurt if the
migration type is defined but not used?

> @@ -198,6 +198,12 @@ config MIGRATION
>  	  pages as migration can relocate pages to satisfy a huge page
>  	  allocation instead of reclaiming.
>  
> +config CMA_MIGRATE_TYPE
> +	bool
> +	help
> +	  This enables the use the MIGRATE_CMA migrate type, which lets lets CMA
> +	  work on almost arbitrary memory range and not only inside ZONE_MOVABLE.
> +
>  config PHYS_ADDR_T_64BIT
>  	def_bool 64BIT || ARCH_PHYS_ADDR_T_64BIT

This is currently only selected on ARM with your patch set. 

> diff --git a/mm/compaction.c b/mm/compaction.c
> index 6cc604b..9e5cc59 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -119,6 +119,16 @@ static bool suitable_migration_target(struct page *page)
>  	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
>  		return false;
>  
> +	/* Keep MIGRATE_CMA alone as well. */
> +	/*
> +	 * XXX Revisit.  We currently cannot let compaction touch CMA
> +	 * pages since compaction insists on changing their migration
> +	 * type to MIGRATE_MOVABLE (see split_free_page() called from
> +	 * isolate_freepages_block() above).
> +	 */
> +	if (is_migrate_cma(migratetype))
> +		return false;
> +
>  	/* If the page is a large free page, then allow migration */
>  	if (PageBuddy(page) && page_order(page) >= pageblock_order)
>  		return true;

Do you plan to fix address this before merging the patch set, or is
it harmless enough to get in this way?

>  /*
>   * The order of subdivision here is critical for the IO subsystem.
> @@ -827,11 +852,15 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
>   * This array describes the order lists are fallen back to when
>   * the free lists for the desirable migrate type are depleted
>   */
> -static int fallbacks[MIGRATE_TYPES][MIGRATE_TYPES-1] = {
> +static int fallbacks[MIGRATE_TYPES][4] = {
>  	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_RESERVE },
>  	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_RESERVE },
> +#ifdef CONFIG_CMA_MIGRATE_TYPE
> +	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_CMA    , MIGRATE_RESERVE },
> +#else
>  	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
> -	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE,     MIGRATE_RESERVE,   MIGRATE_RESERVE }, /* Never used */
> +#endif
> +	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE }, /* Never used */
>  };
>  
>  /*
> @@ -1044,7 +1086,12 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
>  			list_add(&page->lru, list);
>  		else
>  			list_add_tail(&page->lru, list);
> -		set_page_private(page, migratetype);
> +#ifdef CONFIG_CMA_MIGRATE_TYPE
> +		if (is_pageblock_cma(page))
> +			set_page_private(page, MIGRATE_CMA);
> +		else
> +#endif
> +			set_page_private(page, migratetype);
>  		list = &page->lru;
>  	}

I guess if you can get rid of the first #ifdef I mentioned above, these two can be
removed as well, without causing any run-time overhead.

	Arnd

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 183+ messages in thread

* [PATCH 4/8] mm: MIGRATE_CMA migration type added
@ 2011-07-05 11:44     ` Arnd Bergmann
  0 siblings, 0 replies; 183+ messages in thread
From: Arnd Bergmann @ 2011-07-05 11:44 UTC (permalink / raw)
  To: linux-arm-kernel

On Tuesday 05 July 2011, Marek Szyprowski wrote:
> From: Michal Nazarewicz <m.nazarewicz@samsung.com>
> 
> The MIGRATE_CMA migration type has two main characteristics:
> (i) only movable pages can be allocated from MIGRATE_CMA
> pageblocks and (ii) page allocator will never change migration
> type of MIGRATE_CMA pageblocks.
> 
> This guarantees that page in a MIGRATE_CMA page block can
> always be migrated somewhere else (unless there's no memory left
> in the system).
> 
> It is designed to be used with Contiguous Memory Allocator
> (CMA) for allocating big chunks (eg. 10MiB) of physically
> contiguous memory. Once driver requests contiguous memory,
> CMA will migrate pages from MIGRATE_CMA pageblocks.
> 
> To minimise number of migrations, MIGRATE_CMA migration type
> is the last type tried when page allocator falls back to other
> migration types then requested.
> 
> Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> [m.szyprowski: cleaned up Kconfig, renamed some functions]
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> CC: Michal Nazarewicz <mina86@mina86.com>

Acked-by: Arnd Bergmann <arnd@arndb.de>, but I noticed a few things:

> cma migrate fixup

This text doesn't belong here.

> +enum {
> +	MIGRATE_UNMOVABLE,
> +	MIGRATE_RECLAIMABLE,
> +	MIGRATE_MOVABLE,
> +	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
> +	MIGRATE_RESERVE = MIGRATE_PCPTYPES,
> +#ifdef CONFIG_CMA_MIGRATE_TYPE
> +	/*
> +	 * MIGRATE_CMA migration type is designed to mimic the way
> +	 * ZONE_MOVABLE works.  Only movable pages can be allocated
> +	 * from MIGRATE_CMA pageblocks and page allocator never
> +	 * implicitly change migration type of MIGRATE_CMA pageblock.
> +	 *
> +	 * The way to use it is to change migratetype of a range of
> +	 * pageblocks to MIGRATE_CMA which can be done by
> +	 * __free_pageblock_cma() function.  What is important though
> +	 * is that a range of pageblocks must be aligned to
> +	 * MAX_ORDER_NR_PAGES should biggest page be bigger then
> +	 * a single pageblock.
> +	 */
> +	MIGRATE_CMA,
> +#endif
> +	MIGRATE_ISOLATE,	/* can't allocate from here */
> +	MIGRATE_TYPES
> +};

It's not clear to me why you need this #ifdef. Does it hurt if the
migration type is defined but not used?

> @@ -198,6 +198,12 @@ config MIGRATION
>  	  pages as migration can relocate pages to satisfy a huge page
>  	  allocation instead of reclaiming.
>  
> +config CMA_MIGRATE_TYPE
> +	bool
> +	help
> +	  This enables the use the MIGRATE_CMA migrate type, which lets lets CMA
> +	  work on almost arbitrary memory range and not only inside ZONE_MOVABLE.
> +
>  config PHYS_ADDR_T_64BIT
>  	def_bool 64BIT || ARCH_PHYS_ADDR_T_64BIT

This is currently only selected on ARM with your patch set. 

> diff --git a/mm/compaction.c b/mm/compaction.c
> index 6cc604b..9e5cc59 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -119,6 +119,16 @@ static bool suitable_migration_target(struct page *page)
>  	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
>  		return false;
>  
> +	/* Keep MIGRATE_CMA alone as well. */
> +	/*
> +	 * XXX Revisit.  We currently cannot let compaction touch CMA
> +	 * pages since compaction insists on changing their migration
> +	 * type to MIGRATE_MOVABLE (see split_free_page() called from
> +	 * isolate_freepages_block() above).
> +	 */
> +	if (is_migrate_cma(migratetype))
> +		return false;
> +
>  	/* If the page is a large free page, then allow migration */
>  	if (PageBuddy(page) && page_order(page) >= pageblock_order)
>  		return true;

Do you plan to address this before merging the patch set, or is
it harmless enough to get in this way?

>  /*
>   * The order of subdivision here is critical for the IO subsystem.
> @@ -827,11 +852,15 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
>   * This array describes the order lists are fallen back to when
>   * the free lists for the desirable migrate type are depleted
>   */
> -static int fallbacks[MIGRATE_TYPES][MIGRATE_TYPES-1] = {
> +static int fallbacks[MIGRATE_TYPES][4] = {
>  	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_RESERVE },
>  	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_RESERVE },
> +#ifdef CONFIG_CMA_MIGRATE_TYPE
> +	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_CMA    , MIGRATE_RESERVE },
> +#else
>  	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
> -	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE,     MIGRATE_RESERVE,   MIGRATE_RESERVE }, /* Never used */
> +#endif
> +	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE }, /* Never used */
>  };
>  
>  /*
> @@ -1044,7 +1086,12 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
>  			list_add(&page->lru, list);
>  		else
>  			list_add_tail(&page->lru, list);
> -		set_page_private(page, migratetype);
> +#ifdef CONFIG_CMA_MIGRATE_TYPE
> +		if (is_pageblock_cma(page))
> +			set_page_private(page, MIGRATE_CMA);
> +		else
> +#endif
> +			set_page_private(page, migratetype);
>  		list = &page->lru;
>  	}

I guess if you can get rid of the first #ifdef I mentioned above, these two can be
removed as well, without causing any run-time overhead.
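
For illustration, the rmqueue_bulk() hunk could then collapse to
something like this (sketch only, assuming is_pageblock_cma() evaluates
to a constant false when CMA is not configured):

	set_page_private(page, is_pageblock_cma(page) ?
			       MIGRATE_CMA : migratetype);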

	Arnd

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 5/8] mm: MIGRATE_CMA isolation functions added
  2011-07-05  7:41   ` Marek Szyprowski
  (?)
@ 2011-07-05 11:45     ` Arnd Bergmann
  -1 siblings, 0 replies; 183+ messages in thread
From: Arnd Bergmann @ 2011-07-05 11:45 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Andrew Morton,
	KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker, Mel Gorman,
	Jesse Barker, Jonathan Corbet, Chunsang Jeong

On Tuesday 05 July 2011, Marek Szyprowski wrote:
> From: Michal Nazarewicz <m.nazarewicz@samsung.com>
> 
> This commit changes various functions that change pages and
> pageblocks migrate type between MIGRATE_ISOLATE and
> MIGRATE_MOVABLE in such a way as to allow to work with
> MIGRATE_CMA migrate type.
> 
> Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> CC: Michal Nazarewicz <mina86@mina86.com>

Acked-by: Arnd Bergmann <arnd@arndb.de>

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 6/8 RESEND] drivers: add Contiguous Memory Allocator
  2011-07-05 11:02     ` Marek Szyprowski
  (?)
@ 2011-07-05 11:50       ` Arnd Bergmann
  -1 siblings, 0 replies; 183+ messages in thread
From: Arnd Bergmann @ 2011-07-05 11:50 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Andrew Morton,
	KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker, Mel Gorman,
	Jesse Barker, Jonathan Corbet, Chunsang Jeong

On Tuesday 05 July 2011, Marek Szyprowski wrote:
> The Contiguous Memory Allocator is a set of helper functions for DMA
> mapping framework that improves allocations of contiguous memory chunks.
> 
> CMA grabs memory on system boot, marks it with CMA_MIGRATE_TYPE and
> gives back to the system. Kernel is allowed to allocate movable pages
> within CMA's managed memory so that it can be used for example for page
> cache when DMA mapping do not use it. On dma_alloc_from_contiguous()
> request such pages are migrated out of CMA area to free required
> contiguous block and fulfill the request. This allows to allocate large
> contiguous chunks of memory at any time assuming that there is enough
> free memory available in the system.
> 
> This code is heavily based on earlier works by Michal Nazarewicz.
> 
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> CC: Michal Nazarewicz <mina86@mina86.com>

Reviewed-by: Arnd Bergmann <arnd@arndb.de>, but I noticed two
one-character mistakes:

> +if CMA
> +
> +config CMA_DEBUG
> +	bool "CMA debug messages (DEVELOPEMENT)"

s/DEVELOPEMENT/DEVELOPMENT/

> diff --git a/drivers/base/Makefile b/drivers/base/Makefile
> index 4c5701c..be6aab4 100644
> --- a/drivers/base/Makefile
> +++ b/drivers/base/Makefile
> @@ -5,6 +5,7 @@ obj-y			:= core.o sys.o bus.o dd.o syscore.o \
>  			   cpu.o firmware.o init.o map.o devres.o \
>  			   attribute_container.o transport_class.o
>  obj-$(CONFIG_DEVTMPFS)	+= devtmpfs.o
> +obj-$(CONFIG_CMA) += dma-contiguous.o
>  obj-y			+= power/
>  obj-$(CONFIG_HAS_DMA)	+= dma-mapping.o

Please add another tab to indent the line in the same way as the others.

	Arnd

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 7/8] ARM: integrate CMA with dma-mapping subsystem
  2011-07-05  7:41   ` Marek Szyprowski
  (?)
@ 2011-07-05 11:50     ` Arnd Bergmann
  -1 siblings, 0 replies; 183+ messages in thread
From: Arnd Bergmann @ 2011-07-05 11:50 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Andrew Morton,
	KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker, Mel Gorman,
	Jesse Barker, Jonathan Corbet, Chunsang Jeong

On Tuesday 05 July 2011, Marek Szyprowski wrote:
> This patch adds support for CMA to dma-mapping subsystem for ARM
> architecture. By default a global CMA area is used, but specific devices
> are allowed to have their private memory areas if required (they can be
> created with dma_declare_contiguous() function during board
> initialization).
> 
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>

Reviewed-by: Arnd Bergmann <arnd@arndb.de>

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 8/8] ARM: S5PV210: example of CMA private area for FIMC device on Goni board
  2011-07-05  7:41   ` Marek Szyprowski
  (?)
@ 2011-07-05 11:51     ` Arnd Bergmann
  -1 siblings, 0 replies; 183+ messages in thread
From: Arnd Bergmann @ 2011-07-05 11:51 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Andrew Morton,
	KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker, Mel Gorman,
	Jesse Barker, Jonathan Corbet, Chunsang Jeong

On Tuesday 05 July 2011, Marek Szyprowski wrote:
> 
> This patch is an example how device private CMA area can be activated.
> It creates one CMA region and assigns it to the first s5p-fimc device on
> Samsung Goni S5PC110 board.
> 
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>

Reviewed-by: Arnd Bergmann <arnd@arndb.de>
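
For readers following the series, a rough sketch of the kind of
board-file reservation this patch performs for the first FIMC device
(the size, device pointer and exact dma_declare_contiguous() signature
are assumptions here, not copied from the patch):

/* sketch: reserve a private CMA area for s5p-fimc.0 during board init */
static void __init goni_reserve(void)
{
	/* 32 MiB is an arbitrary example size */
	if (dma_declare_contiguous(&s5p_device_fimc0.dev, 32 * SZ_1M, 0, 0))
		pr_err("s5p-fimc.0: failed to reserve CMA area\n");
}

On ARM such a function would typically be hooked up through the board's
MACHINE_START .reserve callback so that the area is carved out early
enough during boot.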

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCHv11 0/8] Contiguous Memory Allocator
  2011-07-05  7:41 ` Marek Szyprowski
  (?)
@ 2011-07-05 12:07   ` Arnd Bergmann
  -1 siblings, 0 replies; 183+ messages in thread
From: Arnd Bergmann @ 2011-07-05 12:07 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Andrew Morton,
	KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker, Mel Gorman,
	Jesse Barker, Jonathan Corbet, Chunsang Jeong

On Tuesday 05 July 2011, Marek Szyprowski wrote:
> This is yet another round of Contiguous Memory Allocator patches. I hope
> that I've managed to resolve all the items discussed during the Memory
> Management summit at Linaro Meeting in Budapest and pointed later on
> mailing lists. The goal is to integrate it as tight as possible with
> other kernel subsystems (like memory management and dma-mapping) and
> finally merge to mainline.

You have certainly addressed all of my concerns, this looks really good now!

Andrew, can you add this to your -mm tree? What's your opinion on the
current state, do you think this is ready for merging in 3.1 or would
you want to have more reviews from core memory management people?

My reviews were mostly on the driver and platform API side, and I think
we're fine there now, but I don't really understand the impact this has
on the mm side.

	Arnd

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 4/8] mm: MIGRATE_CMA migration type added
  2011-07-05 11:44     ` Arnd Bergmann
  (?)
@ 2011-07-05 12:27       ` Russell King - ARM Linux
  -1 siblings, 0 replies; 183+ messages in thread
From: Russell King - ARM Linux @ 2011-07-05 12:27 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Marek Szyprowski, Ankita Garg, Daniel Walker, Jesse Barker,
	Mel Gorman, Chunsang Jeong, Jonathan Corbet, linux-kernel,
	Michal Nazarewicz, linaro-mm-sig, linux-mm, Kyungmin Park,
	KAMEZAWA Hiroyuki, Andrew Morton, linux-arm-kernel, linux-media

On Tue, Jul 05, 2011 at 01:44:31PM +0200, Arnd Bergmann wrote:
> > @@ -198,6 +198,12 @@ config MIGRATION
> >  	  pages as migration can relocate pages to satisfy a huge page
> >  	  allocation instead of reclaiming.
> >  
> > +config CMA_MIGRATE_TYPE
> > +	bool
> > +	help
> > +	  This enables the use the MIGRATE_CMA migrate type, which lets lets CMA
> > +	  work on almost arbitrary memory range and not only inside ZONE_MOVABLE.
> > +
> >  config PHYS_ADDR_T_64BIT
> >  	def_bool 64BIT || ARCH_PHYS_ADDR_T_64BIT
> 
> This is currently only selected on ARM with your patch set. 

That's because CMA is targeted at solving the "we need massive contiguous
DMA areas" problem on ARM SoCs.

And it does this without addressing the technical architecture problems
surrounding multiple aliasing mappings with differing attributes which
actually make it unsuitable for use on ARM.  This is not the first time
I've pointed that out, and I'm now at the point of basically ignoring
this CMA work because I'm tired of constantly pointing this out.

My silence on this subject must not be taken as placid acceptance of the
approach, but revulsion at seemingly being constantly ignored and having
these patches pushed time and time again with nothing really changing on
that issue.

It will be a sad day if these patches make their way into mainline without
that being addressed, and will show contempt for architecture maintainers
if it does.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-07-05 11:33     ` Russell King - ARM Linux
  (?)
@ 2011-07-05 12:27       ` Arnd Bergmann
  -1 siblings, 0 replies; 183+ messages in thread
From: Arnd Bergmann @ 2011-07-05 12:27 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Marek Szyprowski, linux-kernel, linux-arm-kernel, linux-media,
	linux-mm, linaro-mm-sig, Daniel Walker, Jonathan Corbet,
	Mel Gorman, Chunsang Jeong, Michal Nazarewicz, Jesse Barker,
	Kyungmin Park, Ankita Garg, Andrew Morton, KAMEZAWA Hiroyuki,
	Jesse Barker

On Tuesday 05 July 2011, Russell King - ARM Linux wrote:
> On Tue, Jul 05, 2011 at 09:41:48AM +0200, Marek Szyprowski wrote:
> > The Contiguous Memory Allocator is a set of helper functions for DMA
> > mapping framework that improves allocations of contiguous memory chunks.
> > 
> > CMA grabs memory on system boot, marks it with CMA_MIGRATE_TYPE and
> > gives back to the system. Kernel is allowed to allocate movable pages
> > within CMA's managed memory so that it can be used for example for page
> > cache when DMA mapping do not use it. On dma_alloc_from_contiguous()
> > request such pages are migrated out of CMA area to free required
> > contiguous block and fulfill the request. This allows to allocate large
> > contiguous chunks of memory at any time assuming that there is enough
> > free memory available in the system.
> > 
> > This code is heavily based on earlier works by Michal Nazarewicz.
> 
> And how are you addressing the technical concerns about aliasing of
> cache attributes which I keep bringing up with this and you keep
> ignoring and telling me that I'm standing in your way.

This is of course an important issue, and it's the one item listed as
TODO in the introductory mail that was sent.

It's also a preexisting problem as far as I can tell, and it needs
to be solved in __dma_alloc for both cases, dma_alloc_from_contiguous
and __alloc_system_pages as introduced in patch 7.

We've discussed this back and forth, and it always comes down to
one of two ugly solutions:

1. Put all of the MIGRATE_CMA pages into highmem and change
__alloc_system_pages so it also allocates only from highmem pages.
The consequences of this are that we always need to build kernels
with highmem enabled and that we have less lowmem on systems that
are already small, both of which can be fairly expensive unless
you have lots of highmem already.

2. Add logic to unmap pages from the linear mapping, which is
very expensive because it forces the use of small pages in the
linear mapping (or in parts of it), and possibly means walking
all page tables to remove the PTEs on alloc and put them back
in on free.

I believe that Chunsang Jeong from Linaro is planning to
implement both variants and post them for review, so we can
decide which one to merge, or even to merge both and make
it a configuration option. See also
https://blueprints.launchpad.net/linaro-mm-sig/+spec/engr-mm-dma-mapping-2011.07

I don't think we need to make merging the CMA patches dependent on
the other patches; it's clear that both need to be solved, and
they are independent enough.

	Arnd

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCHv11 0/8] Contiguous Memory Allocator
  2011-07-05 12:07   ` Arnd Bergmann
  (?)
@ 2011-07-05 12:28     ` Russell King - ARM Linux
  -1 siblings, 0 replies; 183+ messages in thread
From: Russell King - ARM Linux @ 2011-07-05 12:28 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Marek Szyprowski, Ankita Garg, Daniel Walker, Jesse Barker,
	Mel Gorman, Chunsang Jeong, Jonathan Corbet, linux-kernel,
	Michal Nazarewicz, linaro-mm-sig, linux-mm, Kyungmin Park,
	KAMEZAWA Hiroyuki, Andrew Morton, linux-arm-kernel, linux-media

On Tue, Jul 05, 2011 at 02:07:17PM +0200, Arnd Bergmann wrote:
> On Tuesday 05 July 2011, Marek Szyprowski wrote:
> > This is yet another round of Contiguous Memory Allocator patches. I hope
> > that I've managed to resolve all the items discussed during the Memory
> > Management summit at Linaro Meeting in Budapest and pointed later on
> > mailing lists. The goal is to integrate it as tight as possible with
> > other kernel subsystems (like memory management and dma-mapping) and
> > finally merge to mainline.
> 
> You have certainly addressed all of my concerns, this looks really good now!
> 
> Andrew, can you add this to your -mm tree? What's your opinion on the
> current state, do you think this is ready for merging in 3.1 or would
> you want to have more reviews from core memory management people?

See my other mails.  It is not ready for mainline.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-07-05 12:27       ` Arnd Bergmann
  (?)
@ 2011-07-05 12:30         ` Russell King - ARM Linux
  -1 siblings, 0 replies; 183+ messages in thread
From: Russell King - ARM Linux @ 2011-07-05 12:30 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Marek Szyprowski, linux-kernel, linux-arm-kernel, linux-media,
	linux-mm, linaro-mm-sig, Daniel Walker, Jonathan Corbet,
	Mel Gorman, Chunsang Jeong, Michal Nazarewicz, Jesse Barker,
	Kyungmin Park, Ankita Garg, Andrew Morton, KAMEZAWA Hiroyuki

On Tue, Jul 05, 2011 at 02:27:44PM +0200, Arnd Bergmann wrote:
> It's also a preexisting problem as far as I can tell, and it needs
> to be solved in __dma_alloc for both cases, dma_alloc_from_contiguous
> and __alloc_system_pages as introduced in patch 7.

Which is now resolved in linux-next, and has been through this cycle
as previously discussed.

It's taken some time because the guy who tested the patch for me said
he'd review other platforms but never did, so I've just about given up
waiting and stuffed it in ready for the 3.1 merge window irrespective
of anything else.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-07-05 12:30         ` Russell King - ARM Linux
  (?)
@ 2011-07-05 13:58           ` Arnd Bergmann
  -1 siblings, 0 replies; 183+ messages in thread
From: Arnd Bergmann @ 2011-07-05 13:58 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Russell King - ARM Linux, Daniel Walker, Jonathan Corbet,
	Mel Gorman, Chunsang Jeong, Jesse Barker, KAMEZAWA Hiroyuki,
	linux-kernel, Michal Nazarewicz, linaro-mm-sig, linux-mm,
	Kyungmin Park, Ankita Garg, Andrew Morton, Marek Szyprowski,
	linux-media

On Tuesday 05 July 2011, Russell King - ARM Linux wrote:
> On Tue, Jul 05, 2011 at 02:27:44PM +0200, Arnd Bergmann wrote:
> > It's also a preexisting problem as far as I can tell, and it needs
> > to be solved in __dma_alloc for both cases, dma_alloc_from_contiguous
> > and __alloc_system_pages as introduced in patch 7.
> 
> Which is now resolved in linux-next, and has been through this cycle
> as previously discussed.
> 
> It's taken some time because the guy who tested the patch for me said
> he'd review other platforms but never did, so I've just about given up
> waiting and stuffed it in ready for the 3.1 merge window irrespective
> of anything else.

Ah, sorry I missed that patch on the mailing list, found it now in
your for-next branch.

If I'm reading your "ARM: DMA: steal memory for DMA coherent mappings"
correctly, the idea is to have a per-platform compile-time amount
of memory that is reserved purely for coherent allocations and
taken out of the buddy allocator, right?

As you say, this solves the problem for the non-CMA case, and does
not apply to CMA because the entire point of CMA is not to remove
the pages from the buddy allocator in order to preserve memory.

So with your patch getting merged, patch 7/8 obviously has both a
conflict and introduces a regression against the fix you did.
Consequently that patch needs to be redone in a way that fits on
top of your patch and avoids the double-mapping problem.

What about the rest? As I mentioned in private, adding invasive features
to core code is obviously not nice if it can be avoided, but my feeling
is that we can no longer claim that there is no need for this with so
much hardware relying on large contiguous memory ranges for DMA.
The patches have come a long way since the first version, especially
regarding the device driver interface and I think they are about as
good as it gets in that regard.

I do understand that without patch 7, there isn't a single architecture
using the feature, which is somewhat silly, but I'm also convinced
that other architectures will start using it, and that a solution for the
double mapping in the ways I mentioned in my previous mail is going
to happen. Probably not in 3.1 then, but we could put the patches into
-mm anyway until we get there.

	Arnd

^ permalink raw reply	[flat|nested] 183+ messages in thread

* RE: [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-07-05 11:33     ` Russell King - ARM Linux
  (?)
@ 2011-07-06 13:58       ` Marek Szyprowski
  -1 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-07-06 13:58 UTC (permalink / raw)
  To: 'Russell King - ARM Linux'
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Daniel Walker', 'Arnd Bergmann',
	'Jonathan Corbet', 'Mel Gorman',
	'Chunsang Jeong', 'Michal Nazarewicz',
	'Jesse Barker', 'Kyungmin Park',
	'Ankita Garg', 'Andrew Morton',
	'KAMEZAWA Hiroyuki',
	Marek Szyprowski

Hello,

On Tuesday, July 05, 2011 1:34 PM Russell King - ARM Linux wrote:

> On Tue, Jul 05, 2011 at 09:41:48AM +0200, Marek Szyprowski wrote:
> > The Contiguous Memory Allocator is a set of helper functions for DMA
> > mapping framework that improves allocations of contiguous memory chunks.
> >
> > CMA grabs memory on system boot, marks it with CMA_MIGRATE_TYPE and
> > gives back to the system. Kernel is allowed to allocate movable pages
> > within CMA's managed memory so that it can be used for example for page
> > cache when DMA mapping do not use it. On dma_alloc_from_contiguous()
> > request such pages are migrated out of CMA area to free required
> > contiguous block and fulfill the request. This allows to allocate large
> > contiguous chunks of memory at any time assuming that there is enough
> > free memory available in the system.
> >
> > This code is heavily based on earlier works by Michal Nazarewicz.
> 
> And how are you addressing the technical concerns about aliasing of
> cache attributes which I keep bringing up with this and you keep
> ignoring and telling me that I'm standing in your way.

I'm perfectly aware of the issues with aliasing of cache attributes.

My idea is to change the low-memory linear mapping for all CMA areas at boot
time to use 2-level page tables (4KiB mappings instead of super-section
mappings). This way the page properties for a single page in a CMA area can
be changed/updated at any time to match the required coherent/writecombine
attributes. The linear mapping can even be removed completely if we want to
create it elsewhere in the address space.

The only problem that might need to be resolved is GFP_ATOMIC allocation
(updating page properties probably requires some locking), but it can be
served from a special area which is created at boot without a low-memory
mapping at all. No sane driver will call dma_alloc_coherent(GFP_ATOMIC)
for large buffers anyway.
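
To make the driver-facing side concrete, here is a minimal sketch of the
kind of allocation CMA is meant to back. The device, buffer size and error
handling are hypothetical; the point is only that a large coherent buffer
is requested from process context with GFP_KERNEL, never GFP_ATOMIC:

    /* Hypothetical driver code, not part of this patchset. */
    #include <linux/dma-mapping.h>
    #include <linux/errno.h>

    static void *codec_buf;
    static dma_addr_t codec_dma;

    static int codec_alloc_framebuf(struct device *dev)
    {
            /* 8 MiB, physically contiguous, zeroed, DMA-able */
            codec_buf = dma_alloc_coherent(dev, 8 << 20, &codec_dma,
                                           GFP_KERNEL);
            return codec_buf ? 0 : -ENOMEM;
    }

    static void codec_free_framebuf(struct device *dev)
    {
            dma_free_coherent(dev, 8 << 20, codec_buf, codec_dma);
    }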

CMA limits quite well the memory area from which coherent pages are taken,
so the change in the linear mapping method should have no significant
impact on system performance.

I haven't implemented such a solution yet, because it is really hard to
handle all the issues at the same time, and creating the allocator was
just the first step.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center




^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-07-06 13:58       ` Marek Szyprowski
  (?)
@ 2011-07-06 14:09         ` Arnd Bergmann
  -1 siblings, 0 replies; 183+ messages in thread
From: Arnd Bergmann @ 2011-07-06 14:09 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: 'Russell King - ARM Linux',
	linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Daniel Walker', 'Jonathan Corbet',
	'Mel Gorman', 'Chunsang Jeong',
	'Michal Nazarewicz', 'Jesse Barker',
	'Kyungmin Park', 'Ankita Garg',
	'Andrew Morton', 'KAMEZAWA Hiroyuki'

On Wednesday 06 July 2011, Marek Szyprowski wrote:
> The only problem that might need to be resolved is GFP_ATOMIC allocation
> (updating page properties probably requires some locking), but it can be
> served from a special area which is created on boot without low-memory
> mapping at all. None sane driver will call dma_alloc_coherent(GFP_ATOMIC)
> for large buffers anyway.

Would it be easier to start with a version that only allocates from
memory without a low-memory mapping?

This would be similar to the approach that Russell's fix for the regular
dma_alloc_coherent has taken, except that you also need to allow the
memory to be used as highmem user pages.

Maybe you can simply adapt the default location of the contiguous memory
area like this (a rough sketch of the decision follows below):
- make CONFIG_CMA depend on CONFIG_HIGHMEM on ARM, at compile time
- if ZONE_HIGHMEM exists during boot, put the CMA area in there
- otherwise, put the CMA area at the top end of lowmem, and change
  the zone sizes so ZONE_HIGHMEM stretches over all of the CMA memory.
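
As a purely illustrative sketch of that decision (the helper name and the
way the lowmem boundary is passed in are assumptions, not code from this
patchset):

    /* Illustrative only: pick a physical base for the CMA area at boot. */
    #include <linux/init.h>
    #include <linux/types.h>

    static phys_addr_t __init cma_pick_base(phys_addr_t cma_size,
                                            phys_addr_t mem_end,
                                            phys_addr_t lowmem_limit)
    {
            if (mem_end > lowmem_limit)
                    /* ZONE_HIGHMEM exists: place the CMA area inside it */
                    return mem_end - cma_size;

            /*
             * No highmem yet: take the top of lowmem and later shrink the
             * lowmem zone so ZONE_HIGHMEM stretches over the CMA area.
             */
            return lowmem_limit - cma_size;
    }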

	Arnd

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-07-06 14:09         ` Arnd Bergmann
  (?)
@ 2011-07-06 14:23           ` Russell King - ARM Linux
  -1 siblings, 0 replies; 183+ messages in thread
From: Russell King - ARM Linux @ 2011-07-06 14:23 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Marek Szyprowski, linux-kernel, linux-arm-kernel, linux-media,
	linux-mm, linaro-mm-sig, 'Daniel Walker',
	'Jonathan Corbet', 'Mel Gorman',
	'Chunsang Jeong', 'Michal Nazarewicz',
	'Jesse Barker', 'Kyungmin Park',
	'Ankita Garg', 'Andrew Morton',
	'KAMEZAWA Hiroyuki'

On Wed, Jul 06, 2011 at 04:09:29PM +0200, Arnd Bergmann wrote:
> Maybe you can simply adapt the default location of the contiguous memory
> are like this:
> - make CONFIG_CMA depend on CONFIG_HIGHMEM on ARM, at compile time
> - if ZONE_HIGHMEM exist during boot, put the CMA area in there
> - otherwise, put the CMA area at the top end of lowmem, and change
>   the zone sizes so ZONE_HIGHMEM stretches over all of the CMA memory.

One of the requirements of the allocator is that the returned memory
should be zero'd (because it can be exposed to userspace via ALSA
and frame buffers.)

Zeroing the memory from all the contexts which dma_alloc_coherent
is called from is a trivial matter if it's in lowmem, but highmem is
harder.

Another issue is that when a platform has restricted DMA regions,
they typically don't fall into the highmem zone.  As the dmabounce
code allocates from the DMA coherent allocator to provide it with
guaranteed DMA-able memory, that would be rather inconvenient.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [Linaro-mm-sig] [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-07-06 14:23           ` Russell King - ARM Linux
  (?)
@ 2011-07-06 14:37             ` Nicolas Pitre
  -1 siblings, 0 replies; 183+ messages in thread
From: Nicolas Pitre @ 2011-07-06 14:37 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Arnd Bergmann, 'Daniel Walker', 'Jonathan Corbet',
	'Mel Gorman', 'Chunsang Jeong',
	'Jesse Barker', 'KAMEZAWA Hiroyuki',
	linux-kernel, 'Michal Nazarewicz',
	linaro-mm-sig, linux-mm, 'Kyungmin Park',
	'Ankita Garg', 'Andrew Morton',
	linux-arm-kernel, linux-media

On Wed, 6 Jul 2011, Russell King - ARM Linux wrote:

> Another issue is that when a platform has restricted DMA regions,
> they typically don't fall into the highmem zone.  As the dmabounce
> code allocates from the DMA coherent allocator to provide it with
> guaranteed DMA-able memory, that would be rather inconvenient.

Do we encounter this in practice, i.e. do the platforms that require large
contiguous allocations and motivate this work have such DMA restrictions?


Nicolas

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-07-06 14:23           ` Russell King - ARM Linux
  (?)
@ 2011-07-06 14:51             ` Arnd Bergmann
  -1 siblings, 0 replies; 183+ messages in thread
From: Arnd Bergmann @ 2011-07-06 14:51 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Russell King - ARM Linux, 'Daniel Walker',
	'Jonathan Corbet', 'Mel Gorman',
	'Chunsang Jeong', 'Jesse Barker',
	'KAMEZAWA Hiroyuki',
	linux-kernel, 'Michal Nazarewicz',
	linaro-mm-sig, linux-mm, 'Kyungmin Park',
	'Ankita Garg', 'Andrew Morton',
	Marek Szyprowski, linux-media

On Wednesday 06 July 2011, Russell King - ARM Linux wrote:
> On Wed, Jul 06, 2011 at 04:09:29PM +0200, Arnd Bergmann wrote:
> > Maybe you can simply adapt the default location of the contiguous memory
> > are like this:
> > - make CONFIG_CMA depend on CONFIG_HIGHMEM on ARM, at compile time
> > - if ZONE_HIGHMEM exist during boot, put the CMA area in there
> > - otherwise, put the CMA area at the top end of lowmem, and change
> >   the zone sizes so ZONE_HIGHMEM stretches over all of the CMA memory.
> 
> One of the requirements of the allocator is that the returned memory
> should be zero'd (because it can be exposed to userspace via ALSA
> and frame buffers.)
> 
> Zeroing the memory from all the contexts which dma_alloc_coherent
> is called from is a trivial matter if its in lowmem, but highmem is
> harder.

I don't see how. The pages get allocated from an unmapped area of
memory, mapped into the kernel address space as uncached or WC
and then cleared. This should be the same for lowmem and highmem
pages.

What am I missing?
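
A minimal sketch of the sequence described above, assuming the pages are
not (or no longer) part of the cacheable linear map; Russell questions
this ordering below, so treat it purely as an illustration of the idea:

    #include <linux/vmalloc.h>
    #include <linux/mm.h>
    #include <linux/string.h>
    #include <asm/pgtable.h>

    /* Map a set of pages writecombined, then clear them via that mapping. */
    static void *map_wc_and_clear(struct page **pages, unsigned int count)
    {
            void *virt = vmap(pages, count, VM_MAP,
                              pgprot_writecombine(PAGE_KERNEL));

            if (virt)
                    memset(virt, 0, (size_t)count * PAGE_SIZE);
            return virt;
    }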

> Another issue is that when a platform has restricted DMA regions,
> they typically don't fall into the highmem zone.  As the dmabounce
> code allocates from the DMA coherent allocator to provide it with
> guaranteed DMA-able memory, that would be rather inconvenient.

True. The dmabounce code would consequently have to allocate
the memory through an internal function that avoids the
contiguous allocation area and goes straight to ZONE_DMA memory
as it does today.

	Arnd

^ permalink raw reply	[flat|nested] 183+ messages in thread

* RE: [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-07-06 14:09         ` Arnd Bergmann
  (?)
@ 2011-07-06 14:56           ` Marek Szyprowski
  -1 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-07-06 14:56 UTC (permalink / raw)
  To: 'Arnd Bergmann'
  Cc: 'Russell King - ARM Linux',
	linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Daniel Walker', 'Jonathan Corbet',
	'Mel Gorman', 'Chunsang Jeong',
	'Michal Nazarewicz', 'Jesse Barker',
	'Kyungmin Park', 'Ankita Garg',
	'Andrew Morton', 'KAMEZAWA Hiroyuki'

Hello,

On Wednesday, July 06, 2011 4:09 PM Arnd Bergmann wrote:

> On Wednesday 06 July 2011, Marek Szyprowski wrote:
> > The only problem that might need to be resolved is GFP_ATOMIC allocation
> > (updating page properties probably requires some locking), but it can be
> > served from a special area which is created on boot without low-memory
> > mapping at all. None sane driver will call dma_alloc_coherent(GFP_ATOMIC)
> > for large buffers anyway.
> 
> Would it be easier to start with a version that only allocated from memory
> without a low-memory mapping at first?
>
> This would be similar to the approach that Russell's fix for the regular
> dma_alloc_coherent has taken, except that you need to also allow the memory
> to be used as highmem user pages.
> 
> Maybe you can simply adapt the default location of the contiguous memory
> are like this:
> - make CONFIG_CMA depend on CONFIG_HIGHMEM on ARM, at compile time
> - if ZONE_HIGHMEM exist during boot, put the CMA area in there
> - otherwise, put the CMA area at the top end of lowmem, and change
>   the zone sizes so ZONE_HIGHMEM stretches over all of the CMA memory.

This will not solve our problems. We also need CMA to create at least one
device-private area that is guaranteed to be in low memory (for the video codec).

I will rewrite the ARM dma-mapping & CMA integration patch based on the latest
ARM for-next patches and add a proof-of-concept of the solution presented in my
previous mail (two-level page tables and unmapping pages from low memory).

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center





^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [Linaro-mm-sig] [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-07-06 14:37             ` Nicolas Pitre
  (?)
@ 2011-07-06 14:59               ` Arnd Bergmann
  -1 siblings, 0 replies; 183+ messages in thread
From: Arnd Bergmann @ 2011-07-06 14:59 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Nicolas Pitre, Russell King - ARM Linux, 'Daniel Walker',
	'Jonathan Corbet', 'Mel Gorman',
	'Chunsang Jeong',
	linux-kernel, 'Michal Nazarewicz',
	linaro-mm-sig, 'Jesse Barker', 'Kyungmin Park',
	'Ankita Garg', 'Andrew Morton',
	linux-mm, 'KAMEZAWA Hiroyuki',
	linux-media

On Wednesday 06 July 2011, Nicolas Pitre wrote:
> On Wed, 6 Jul 2011, Russell King - ARM Linux wrote:
> 
> > Another issue is that when a platform has restricted DMA regions,
> > they typically don't fall into the highmem zone.  As the dmabounce
> > code allocates from the DMA coherent allocator to provide it with
> > guaranteed DMA-able memory, that would be rather inconvenient.
> 
> Do we encounter this in practice i.e. do those platforms requiring large 
> contiguous allocations motivating this work have such DMA restrictions?

You can probably find one or two of those, but we don't have to optimize
for that case. I would at least expect the maximum allocation size
to be smaller than the DMA limit for these, and would consequently mandate
that they define a sufficiently large CONSISTENT_DMA_SIZE for such crazy
devices, or possibly add a hack to unmap some low memory and call
dma_declare_coherent_memory() for the device.
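
For illustration, a sketch of that second option using the
dma_declare_coherent_memory() interface as it exists today; the device,
address and size are placeholders and the low-memory unmapping step is
left out:

    #include <linux/dma-mapping.h>
    #include <linux/errno.h>
    #include <linux/init.h>

    /* Hypothetical board code: give one device its own coherent pool. */
    static int __init crazy_dev_reserve_pool(struct device *dev)
    {
            /* 2 MiB pool at physical 0x32000000 (made-up address) */
            if (!dma_declare_coherent_memory(dev, 0x32000000, 0x32000000,
                                             2 << 20, DMA_MEMORY_MAP))
                    return -ENOMEM;

            /* dma_alloc_coherent() on this device is now served from it */
            return 0;
    }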

	Arnd

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-07-06 14:56           ` Marek Szyprowski
  (?)
@ 2011-07-06 15:37             ` Russell King - ARM Linux
  -1 siblings, 0 replies; 183+ messages in thread
From: Russell King - ARM Linux @ 2011-07-06 15:37 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: 'Arnd Bergmann',
	linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Daniel Walker', 'Jonathan Corbet',
	'Mel Gorman', 'Chunsang Jeong',
	'Michal Nazarewicz', 'Jesse Barker',
	'Kyungmin Park', 'Ankita Garg',
	'Andrew Morton', 'KAMEZAWA Hiroyuki'

On Wed, Jul 06, 2011 at 04:56:23PM +0200, Marek Szyprowski wrote:
> This will not solve our problems. We need CMA also to create at least one
> device private area that for sure will be in low memory (video codec).

You make these statements but you don't say why.  Can you please
explain why the video codec needs low memory - does it have a
restricted number of memory address bits which it can manipulate?

^ permalink raw reply	[flat|nested] 183+ messages in thread

* RE: [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-07-06 15:37             ` Russell King - ARM Linux
  (?)
@ 2011-07-06 15:47               ` Marek Szyprowski
  -1 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-07-06 15:47 UTC (permalink / raw)
  To: 'Russell King - ARM Linux'
  Cc: 'Arnd Bergmann',
	linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Daniel Walker', 'Jonathan Corbet',
	'Mel Gorman', 'Chunsang Jeong',
	'Michal Nazarewicz', 'Jesse Barker',
	'Kyungmin Park', 'Ankita Garg',
	'Andrew Morton', 'KAMEZAWA Hiroyuki'

Hello,

On Wednesday, July 06, 2011 5:37 PM Russell King - ARM Linux wrote:

> On Wed, Jul 06, 2011 at 04:56:23PM +0200, Marek Szyprowski wrote:
> > This will not solve our problems. We need CMA also to create at least one
> > device private area that for sure will be in low memory (video codec).
> 
> You make these statements but you don't say why.  Can you please
> explain why the video codec needs low memory - does it have a
> restricted number of memory address bits which it can manipulate?

Nope, it only needs to place some types of memory buffers in the first bank
(effectively in the 0x30000000-0x34ffffff area) and the others in the second
bank (0x40000000-0x57ffffff area). The values are given for the Samsung GONI
board.
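
As a sketch of what that means for board setup (the helper is assumed to
be the dma_declare_contiguous() call from this patchset's DMA integration;
the device pointers and sizes are placeholders and the exact signature may
differ in v11):

    #include <linux/device.h>
    #include <linux/dma-contiguous.h>
    #include <linux/init.h>

    extern struct device goni_mfc_left_dev, goni_mfc_right_dev; /* placeholders */

    static void __init goni_reserve_cma(void)
    {
            /* bank 0: 0x30000000-0x34ffffff */
            dma_declare_contiguous(&goni_mfc_left_dev, 16 << 20,
                                   0x30000000, 0x35000000);
            /* bank 1: 0x40000000-0x57ffffff */
            dma_declare_contiguous(&goni_mfc_right_dev, 16 << 20,
                                   0x40000000, 0x58000000);
    }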

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center



^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-07-06 14:51             ` Arnd Bergmann
  (?)
@ 2011-07-06 15:48               ` Russell King - ARM Linux
  -1 siblings, 0 replies; 183+ messages in thread
From: Russell King - ARM Linux @ 2011-07-06 15:48 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arm-kernel, 'Daniel Walker',
	'Jonathan Corbet', 'Mel Gorman',
	'Chunsang Jeong', 'Jesse Barker',
	'KAMEZAWA Hiroyuki',
	linux-kernel, 'Michal Nazarewicz',
	linaro-mm-sig, linux-mm, 'Kyungmin Park',
	'Ankita Garg', 'Andrew Morton',
	Marek Szyprowski, linux-media

On Wed, Jul 06, 2011 at 04:51:49PM +0200, Arnd Bergmann wrote:
> On Wednesday 06 July 2011, Russell King - ARM Linux wrote:
> > On Wed, Jul 06, 2011 at 04:09:29PM +0200, Arnd Bergmann wrote:
> > > Maybe you can simply adapt the default location of the contiguous memory
> > > are like this:
> > > - make CONFIG_CMA depend on CONFIG_HIGHMEM on ARM, at compile time
> > > - if ZONE_HIGHMEM exist during boot, put the CMA area in there
> > > - otherwise, put the CMA area at the top end of lowmem, and change
> > >   the zone sizes so ZONE_HIGHMEM stretches over all of the CMA memory.
> > 
> > One of the requirements of the allocator is that the returned memory
> > should be zero'd (because it can be exposed to userspace via ALSA
> > and frame buffers.)
> > 
> > Zeroing the memory from all the contexts which dma_alloc_coherent
> > is called from is a trivial matter if its in lowmem, but highmem is
> > harder.
> 
> I don't see how. The pages get allocated from an unmapped area
> or memory, mapped into the kernel address space as uncached or wc
> and then cleared. This should be the same for lowmem or highmem
> pages.

You don't want to clear them via their uncached or WC mapping, but via
their cached mapping _before_ they get their alternative mapping, and then
flush any cached data out of that mapping - from both the L1 and L2 caches.

For lowmem pages, that's easy.  For highmem pages, they need to be
individually kmap'd to zero them etc.  (alloc_pages() warns on
GFP_HIGHMEM + GFP_ZERO from atomic contexts - and dma_alloc_coherent
must be callable from such contexts.)

That may be easier now that we don't have the explicit indices for
kmap_atomic, but at that time it wasn't easily possible.
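
A minimal sketch of that per-page zeroing, not the patchset's code; outer
(L2) cache maintenance and error handling are omitted:

    #include <linux/highmem.h>
    #include <linux/mm.h>

    static void zero_pages_cached(struct page *page, int count)
    {
            int i;

            for (i = 0; i < count; i++) {
                    void *virt = kmap_atomic(&page[i]); /* low or high pages */

                    clear_page(virt);
                    kunmap_atomic(virt);
                    flush_dcache_page(&page[i]);        /* push zeros out of L1 */
            }
    }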

> > Another issue is that when a platform has restricted DMA regions,
> > they typically don't fall into the highmem zone.  As the dmabounce
> > code allocates from the DMA coherent allocator to provide it with
> > guaranteed DMA-able memory, that would be rather inconvenient.
> 
> True. The dmabounce code would consequently have to allocate
> the memory through an internal function that avoids the
> contiguous allocation area and goes straight to ZONE_DMA memory
> as it does today.

CMA's whole purpose for existing is to provide _dma-able_ contiguous
memory for things like cameras and suchlike found on crippled non-
scatter-gather hardware.  If that memory is not DMA-able, what's the
point?

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-07-06 15:48               ` Russell King - ARM Linux
  (?)
@ 2011-07-06 16:05                 ` Christoph Lameter
  -1 siblings, 0 replies; 183+ messages in thread
From: Christoph Lameter @ 2011-07-06 16:05 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Arnd Bergmann, linux-arm-kernel, 'Daniel Walker',
	'Jonathan Corbet', 'Mel Gorman',
	'Chunsang Jeong', 'Jesse Barker',
	'KAMEZAWA Hiroyuki',
	linux-kernel, 'Michal Nazarewicz',
	linaro-mm-sig, linux-mm, 'Kyungmin Park',
	'Ankita Garg', 'Andrew Morton',
	Marek Szyprowski, linux-media

On Wed, 6 Jul 2011, Russell King - ARM Linux wrote:

> > > they typically don't fall into the highmem zone.  As the dmabounce
> > > code allocates from the DMA coherent allocator to provide it with
> > > guaranteed DMA-able memory, that would be rather inconvenient.
> >
> > True. The dmabounce code would consequently have to allocate
> > the memory through an internal function that avoids the
> > contiguous allocation area and goes straight to ZONE_DMA memory
> > as it does today.
>
> CMA's whole purpose for existing is to provide _dma-able_ contiguous
> memory for things like cameras and such like found on crippled non-
> scatter-gather hardware.  If that memory is not DMA-able what's the
> point?

ZONE_DMA is a zone for memory for legacy (crippled) devices that cannot DMA
into all of memory (and so is ZONE_DMA32). Memory from ZONE_NORMAL can be
used for DMA as well, and a fully capable device would be expected to
handle any memory in the system for DMA transfers.

"guaranteed" dmaable memory? DMA abilities are device specific. Well maybe
you can call ZONE_DMA memory to be guaranteed if you guarantee that any
device must at mininum be able to perform DMA into ZONE_DMA memory. But
there may not be much of that memory around so you would want to limit
the use of that scarce resource.
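
For illustration (a hypothetical helper, not from this patchset): a fully
capable device can take pages from anywhere, and only a restricted one
needs to dip into the scarce ZONE_DMA pool:

    #include <linux/dma-mapping.h>
    #include <linux/gfp.h>

    static struct page *get_pages_for_dev(struct device *dev, unsigned int order)
    {
            if (dev->coherent_dma_mask < DMA_BIT_MASK(32))
                    return alloc_pages(GFP_KERNEL | GFP_DMA, order); /* ZONE_DMA */

            return alloc_pages(GFP_KERNEL, order); /* any memory is fine */
    }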


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-07-06 16:05                 ` Christoph Lameter
  (?)
@ 2011-07-06 16:09                   ` Michal Nazarewicz
  -1 siblings, 0 replies; 183+ messages in thread
From: Michal Nazarewicz @ 2011-07-06 16:09 UTC (permalink / raw)
  To: Russell King - ARM Linux, Christoph Lameter
  Cc: Arnd Bergmann, linux-arm-kernel, 'Daniel Walker',
	'Jonathan Corbet', 'Mel Gorman',
	'Chunsang Jeong', 'Jesse Barker',
	'KAMEZAWA Hiroyuki',
	linux-kernel, linaro-mm-sig, linux-mm, 'Kyungmin Park',
	'Ankita Garg', 'Andrew Morton',
	Marek Szyprowski, linux-media

On Wed, 06 Jul 2011 18:05:00 +0200, Christoph Lameter <cl@linux.com> wrote:
> ZONE_DMA is a zone for memory of legacy (crippled) devices that cannot  
> DMA into all of memory (and so is ZONE_DMA32).  Memory from ZONE_NORMAL
> can be used for DMA as well and a fully capable device would be expected
> to handle any memory in the system for DMA transfers.
>
> "guaranteed" dmaable memory? DMA abilities are device specific. Well  
> maybe you can call ZONE_DMA memory to be guaranteed if you guarantee
> that any device must at mininum be able to perform DMA into ZONE_DMA
> memory. But there may not be much of that memory around so you would
> want to limit the use of that scarce resource.

As pointed out in Marek's other mail, this reasoning does not help here.
In the case of the video codec on various Samsung devices (and, judging
from other threads, this is not limited to Samsung), the codec needs
separate buffers located in separate memory banks.
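
For illustration, this is roughly what that requirement turns into at board
level with this patchset (hedged sketch: the base addresses, sizes and codec
sub-device names are made up, and the dma_declare_contiguous() signature is
assumed from the posted patches):

#include <linux/dma-contiguous.h>
#include <linux/init.h>

/* Assumed codec sub-devices, one per memory port. */
extern struct device s5p_mfc_left_dev;
extern struct device s5p_mfc_right_dev;

static void __init reserve_codec_cma_regions(void)
{
        /* One CMA region per physical memory bank (illustrative addresses). */
        dma_declare_contiguous(&s5p_mfc_left_dev,  16 * 1024 * 1024,
                               0x40000000, 0);
        dma_declare_contiguous(&s5p_mfc_right_dev, 16 * 1024 * 1024,
                               0x50000000, 0);
}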

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michal "mina86" Nazarewicz    (o o)
ooo +-----<email/xmpp: mnazarewicz@google.com>-----ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-07-06 16:09                   ` Michal Nazarewicz
  (?)
@ 2011-07-06 16:19                     ` Christoph Lameter
  -1 siblings, 0 replies; 183+ messages in thread
From: Christoph Lameter @ 2011-07-06 16:19 UTC (permalink / raw)
  To: Michal Nazarewicz
  Cc: Russell King - ARM Linux, Arnd Bergmann, linux-arm-kernel,
	'Daniel Walker', 'Jonathan Corbet',
	'Mel Gorman', 'Chunsang Jeong',
	'Jesse Barker', 'KAMEZAWA Hiroyuki',
	linux-kernel, linaro-mm-sig, linux-mm, 'Kyungmin Park',
	'Ankita Garg', 'Andrew Morton',
	Marek Szyprowski, linux-media, Andi Kleen

On Wed, 6 Jul 2011, Michal Nazarewicz wrote:

> On Wed, 06 Jul 2011 18:05:00 +0200, Christoph Lameter <cl@linux.com> wrote:
> > ZONE_DMA is a zone for memory of legacy (crippled) devices that cannot DMA
> > into all of memory (and so is ZONE_DMA32).  Memory from ZONE_NORMAL
> > can be used for DMA as well and a fully capable device would be expected
> > to handle any memory in the system for DMA transfers.
> >
> > "guaranteed" dmaable memory? DMA abilities are device specific. Well maybe
> > you can call ZONE_DMA memory to be guaranteed if you guarantee
> > that any device must at mininum be able to perform DMA into ZONE_DMA
> > memory. But there may not be much of that memory around so you would
> > want to limit the use of that scarce resource.
>
> As pointed in Marek's other mail, this reasoning is not helping in any
> way.  In case of video codec on various Samsung devices (and from some
> other threads this is not limited to Samsung), the codec needs separate
> buffers in separate memory banks.

What I described is the basic memory architecture of Linux. I am not that
familiar with ARM or the issue discussed here; I only got involved because
ZONE_DMA was mentioned, and the nature of ZONE_DMA is often misunderstood.

The allocation of the memory banks for the Samsung devices has to fit
somehow into one of these zones. It's probably best to put the memory banks
into ZONE_NORMAL and not have any dependency on ZONE_DMA at all.


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-07-06 15:48               ` Russell King - ARM Linux
  (?)
@ 2011-07-06 16:31                 ` Arnd Bergmann
  -1 siblings, 0 replies; 183+ messages in thread
From: Arnd Bergmann @ 2011-07-06 16:31 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: linux-arm-kernel, 'Daniel Walker',
	'Jonathan Corbet', 'Mel Gorman',
	'Chunsang Jeong', 'Jesse Barker',
	'KAMEZAWA Hiroyuki',
	linux-kernel, 'Michal Nazarewicz',
	linaro-mm-sig, linux-mm, 'Kyungmin Park',
	'Ankita Garg', 'Andrew Morton',
	Marek Szyprowski, linux-media

On Wednesday 06 July 2011, Russell King - ARM Linux wrote:
> On Wed, Jul 06, 2011 at 04:51:49PM +0200, Arnd Bergmann wrote:
> > On Wednesday 06 July 2011, Russell King - ARM Linux wrote:
> > 
> > I don't see how. The pages get allocated from an unmapped area
> > or memory, mapped into the kernel address space as uncached or wc
> > and then cleared. This should be the same for lowmem or highmem
> > pages.
> 
> You don't want to clear them via their uncached or WC mapping, but via
> their cached mapping _before_ they get their alternative mapping, and
> flush any cached out of that mapping - both L1 and L2 caches.

But there can't be any other mapping, which is the whole point of the
exercise of using highmem here.
Quoting from the new dma_alloc_area() function:

        c = arm_vmregion_alloc(&area->vm, align, size,
                            gfp & ~(__GFP_DMA | __GFP_HIGHMEM));
        if (!c)
                return NULL;
        memset((void *)c->vm_start, 0, size);

area->vm here points to an uncached location, which means that
we already zero the data through the uncached mapping. I don't
see how it's getting worse than it is already.

> > > Another issue is that when a platform has restricted DMA regions,
> > > they typically don't fall into the highmem zone.  As the dmabounce
> > > code allocates from the DMA coherent allocator to provide it with
> > > guaranteed DMA-able memory, that would be rather inconvenient.
> > 
> > True. The dmabounce code would consequently have to allocate
> > the memory through an internal function that avoids the
> > contiguous allocation area and goes straight to ZONE_DMA memory
> > as it does today.
> 
> CMA's whole purpose for existing is to provide _dma-able_ contiguous
> memory for things like cameras and such like found on crippled non-
> scatter-gather hardware.  If that memory is not DMA-able what's the
> point?

I mean not any ZONE_DMA memory, but the memory backing coherent_areas[],
which is by definition DMA-able from any device and is what is currently
being used for the purpose.

	Arnd

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-07-06 16:05                 ` Christoph Lameter
  (?)
@ 2011-07-06 17:02                   ` Russell King - ARM Linux
  -1 siblings, 0 replies; 183+ messages in thread
From: Russell King - ARM Linux @ 2011-07-06 17:02 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Arnd Bergmann, linux-arm-kernel, 'Daniel Walker',
	'Jonathan Corbet', 'Mel Gorman',
	'Chunsang Jeong', 'Jesse Barker',
	'KAMEZAWA Hiroyuki',
	linux-kernel, 'Michal Nazarewicz',
	linaro-mm-sig, linux-mm, 'Kyungmin Park',
	'Ankita Garg', 'Andrew Morton',
	Marek Szyprowski, linux-media

On Wed, Jul 06, 2011 at 11:05:00AM -0500, Christoph Lameter wrote:
> On Wed, 6 Jul 2011, Russell King - ARM Linux wrote:
> 
> > > > they typically don't fall into the highmem zone.  As the dmabounce
> > > > code allocates from the DMA coherent allocator to provide it with
> > > > guaranteed DMA-able memory, that would be rather inconvenient.
> > >
> > > True. The dmabounce code would consequently have to allocate
> > > the memory through an internal function that avoids the
> > > contiguous allocation area and goes straight to ZONE_DMA memory
> > > as it does today.
> >
> > CMA's whole purpose for existing is to provide _dma-able_ contiguous
> > memory for things like cameras and such like found on crippled non-
> > scatter-gather hardware.  If that memory is not DMA-able what's the
> > point?
> 
> ZONE_DMA is a zone for memory of legacy (crippled) devices that cannot DMA
> into all of memory (and so is ZONE_DMA32). Memory from ZONE_NORMAL can be
> used for DMA as well and a fully capable device would be expected to
> handle any memory in the system for DMA transfers.
> 
> "guaranteed" dmaable memory? DMA abilities are device specific. Well maybe
> you can call ZONE_DMA memory to be guaranteed if you guarantee that any
> device must at mininum be able to perform DMA into ZONE_DMA memory. But
> there may not be much of that memory around so you would want to limit
> the use of that scarce resource.

Precisely, which is what ZONE_DMA is all about.  I *have* been a Linux
kernel hacker for the last 18 years and do know these things, especially
as ARM has had various issues with DMA memory limitations over those
years - and I have successfully had platforms working reliably given those
limitations and ZONE_DMA.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-07-06 16:19                     ` Christoph Lameter
  (?)
@ 2011-07-06 17:15                       ` Russell King - ARM Linux
  -1 siblings, 0 replies; 183+ messages in thread
From: Russell King - ARM Linux @ 2011-07-06 17:15 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Michal Nazarewicz, Arnd Bergmann, linux-arm-kernel,
	'Daniel Walker', 'Jonathan Corbet',
	'Mel Gorman', 'Chunsang Jeong',
	'Jesse Barker', 'KAMEZAWA Hiroyuki',
	linux-kernel, linaro-mm-sig, linux-mm, 'Kyungmin Park',
	'Ankita Garg', 'Andrew Morton',
	Marek Szyprowski, linux-media, Andi Kleen

On Wed, Jul 06, 2011 at 11:19:00AM -0500, Christoph Lameter wrote:
> What I described is the basic memory architecture of Linux. I am not that
> familiar with ARM and the issue discussed here. Only got involved because
> ZONE_DMA was mentioned. The nature of ZONE_DMA is often misunderstood.
> 
> The allocation of the memory banks for the Samsung devices has to fit
> somehow into one of these zones. Its probably best to put the memory banks
> into ZONE_NORMAL and not have any dependency on ZONE_DMA at all.

Let me teach you about the ARM memory management on Linux.

First, let's go over the structure of zones in Linux.  There are three
zones - ZONE_DMA, ZONE_NORMAL and ZONE_HIGHMEM.  These zones are filled
in that order.  So, ZONE_DMA starts at zero.  Following on from ZONE_DMA
is ZONE_NORMAL memory, and lastly ZONE_HIGHMEM.

At boot, we pass all memory over to the kernel as follows:

1. If there is no DMA zone, then we pass all low memory over as ZONE_NORMAL.

2. If there is a DMA zone, by default we pass all low memory as ZONE_DMA.
   This is required so drivers which use GFP_DMA can work.

   Platforms with restricted DMA requirements can modify that layout to
   move memory from ZONE_DMA into ZONE_NORMAL, thereby restricting the
   upper address which the kernel allocators will give for GFP_DMA
   allocations.

3. In either case, any high memory is passed as ZONE_HIGHMEM if configured
   (or that memory is truncated if not).

So, when we have (e.g.) a platform where only the _even_ MBs of memory are
DMA-able, we have a 1MB DMA zone at the beginning of system memory, and
everything else in ZONE_NORMAL.  This means GFP_DMA will return either
memory from the first 1MB or fail if it can't.  This is the behaviour we
desire.

Normal allocations will come from ZONE_NORMAL _first_ and then try ZONE_DMA
if there's no other alternative.  This is the same desired behaviour as
x86.

So, ARM is no different from x86, with the exception that the 16MB DMA
zone due to ISA ends up being different sizes on ARM depending on our
restrictions.
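
As a hedged illustration of those rules (not code from the patchset; it only
shows which zone each request may be satisfied from):

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/printk.h>

static void zone_fallback_demo(void)
{
        /* Restricted request: satisfied from ZONE_DMA or fails outright. */
        struct page *low = alloc_pages(GFP_DMA | __GFP_NOWARN, 0);

        /* Unrestricted request: ZONE_NORMAL first, ZONE_DMA only once
         * ZONE_NORMAL is exhausted. */
        struct page *any = alloc_pages(GFP_KERNEL, 0);

        if (!low)
                pr_info("ZONE_DMA exhausted, GFP_DMA allocation failed\n");

        if (low)
                __free_pages(low, 0);
        if (any)
                __free_pages(any, 0);
}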

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-07-06 17:15                       ` Russell King - ARM Linux
  (?)
@ 2011-07-06 19:03                         ` Christoph Lameter
  -1 siblings, 0 replies; 183+ messages in thread
From: Christoph Lameter @ 2011-07-06 19:03 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Michal Nazarewicz, Arnd Bergmann, linux-arm-kernel,
	'Daniel Walker', 'Jonathan Corbet',
	'Mel Gorman', 'Chunsang Jeong',
	'Jesse Barker', 'KAMEZAWA Hiroyuki',
	linux-kernel, linaro-mm-sig, linux-mm, 'Kyungmin Park',
	'Ankita Garg', 'Andrew Morton',
	Marek Szyprowski, linux-media, Andi Kleen

On Wed, 6 Jul 2011, Russell King - ARM Linux wrote:

> So, ARM is no different from x86, with the exception that the 16MB DMA
> zone due to ISA ends up being different sizes on ARM depending on our
> restrictions.

Sounds good. Thank you.


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-07-06 16:31                 ` Arnd Bergmann
  (?)
@ 2011-07-06 19:10                   ` Nicolas Pitre
  -1 siblings, 0 replies; 183+ messages in thread
From: Nicolas Pitre @ 2011-07-06 19:10 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Russell King - ARM Linux, linux-arm-kernel,
	'Daniel Walker', 'Jonathan Corbet',
	'Mel Gorman', 'Chunsang Jeong',
	'Jesse Barker', 'KAMEZAWA Hiroyuki',
	lkml, 'Michal Nazarewicz',
	linaro-mm-sig, linux-mm, 'Kyungmin Park',
	'Ankita Garg', 'Andrew Morton',
	Marek Szyprowski, linux-media

On Wed, 6 Jul 2011, Arnd Bergmann wrote:

> On Wednesday 06 July 2011, Russell King - ARM Linux wrote:
> > On Wed, Jul 06, 2011 at 04:51:49PM +0200, Arnd Bergmann wrote:
> > > On Wednesday 06 July 2011, Russell King - ARM Linux wrote:
> > > 
> > > I don't see how. The pages get allocated from an unmapped area
> > > or memory, mapped into the kernel address space as uncached or wc
> > > and then cleared. This should be the same for lowmem or highmem
> > > pages.
> > 
> > You don't want to clear them via their uncached or WC mapping, but via
> > their cached mapping _before_ they get their alternative mapping, and
> > flush any cached out of that mapping - both L1 and L2 caches.
> 
> But there can't be any other mapping, which is the whole point of
> the exercise to use highmem.
> Quoting from the new dma_alloc_area() function:
> 
>         c = arm_vmregion_alloc(&area->vm, align, size,
>                             gfp & ~(__GFP_DMA | __GFP_HIGHMEM));
>         if (!c)
>                 return NULL;
>         memset((void *)c->vm_start, 0, size);
> 
> area->vm here points to an uncached location, which means that
> we already zero the data through the uncached mapping. I don't
> see how it's getting worse than it is already.

If you get a highmem page, because the cache is VIPT, that page might 
still be cached even if it wasn't mapped.  With a VIVT cache we must 
flush the cache whenever a highmem page is unmapped.  There is no such 
restriction with VIPT, i.e. ARMv6 and above.  Therefore, to make sure the
highmem page you get doesn't have cache lines associated with it, you must
first map it cacheable, then perform cache invalidation on it, and
eventually remap it as non-cacheable.  This is necessary because,
unfortunately, there is no way to perform L1 cache maintenance using
physical addresses.  See commit 7e5a69e83b for an example of what
this entails (fortunately commit 3e4d3af501 made things much easier and 
therefore commit 39af22a79 greatly simplified things).
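
A rough sketch of that sequence (hedged: modelled on what the commits above
do, not a drop-in implementation; dmac_flush_range() and outer_flush_range()
are the ARM L1/L2 maintenance helpers):

#include <linux/highmem.h>
#include <asm/cacheflush.h>
#include <asm/outercache.h>

/* Make sure a (possibly highmem) page carries no stale cache lines
 * before it is handed out through a non-cacheable mapping. */
static void purge_page_cache_lines(struct page *page)
{
        phys_addr_t phys = page_to_phys(page);
        void *virt = kmap(page);        /* temporary cacheable mapping */

        dmac_flush_range(virt, virt + PAGE_SIZE);   /* L1, by virtual address */
        kunmap(page);

        outer_flush_range(phys, phys + PAGE_SIZE);  /* L2, by physical address */
}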



Nicolas

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [Linaro-mm-sig] [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-07-06 19:10                   ` Nicolas Pitre
  (?)
@ 2011-07-06 20:23                     ` Arnd Bergmann
  -1 siblings, 0 replies; 183+ messages in thread
From: Arnd Bergmann @ 2011-07-06 20:23 UTC (permalink / raw)
  To: linaro-mm-sig
  Cc: Nicolas Pitre, 'Daniel Walker',
	Russell King - ARM Linux, linux-media, 'Jonathan Corbet',
	'Mel Gorman', 'Chunsang Jeong',
	lkml, 'Michal Nazarewicz', 'Jesse Barker',
	'Kyungmin Park', 'KAMEZAWA Hiroyuki',
	'Andrew Morton',
	linux-mm, linux-arm-kernel, 'Ankita Garg'

On Wednesday 06 July 2011 21:10:07 Nicolas Pitre wrote:
> If you get a highmem page, because the cache is VIPT, that page might 
> still be cached even if it wasn't mapped.  With a VIVT cache we must 
> flush the cache whenever a highmem page is unmapped.  There is no such 
> restriction with VIPT i.e. ARMv6 and above.  Therefore to make sure the 
> highmem page you get doesn't have cache lines associated to it, you must 
> first map it cacheable, then perform cache invalidation on it, and 
> eventually remap it as non-cacheable.  This is necessary because there 
> is no way to perform cache maintenance on L1 cache using physical 
> addresses unfortunately.  See commit 7e5a69e83b for an example of what 
> this entails (fortunately commit 3e4d3af501 made things much easier and 
> therefore commit 39af22a79 greatly simplified things).

Ok, thanks for the explanation. This definitely makes the highmem approach
much harder to get right, and slower. Let's hope then that Marek's approach
of using small pages for the contiguous memory region and changing their
attributes on the fly works out better than this.

	Arnd

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCHv11 0/8] Contiguous Memory Allocator
  2011-07-05 12:07   ` Arnd Bergmann
  (?)
@ 2011-07-06 22:11     ` Andrew Morton
  -1 siblings, 0 replies; 183+ messages in thread
From: Andrew Morton @ 2011-07-06 22:11 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Marek Szyprowski, linux-kernel, linux-arm-kernel, linux-media,
	linux-mm, linaro-mm-sig, Michal Nazarewicz, Kyungmin Park,
	KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker, Mel Gorman,
	Jesse Barker, Jonathan Corbet, Chunsang Jeong

On Tue, 5 Jul 2011 14:07:17 +0200
Arnd Bergmann <arnd@arndb.de> wrote:

> On Tuesday 05 July 2011, Marek Szyprowski wrote:
> > This is yet another round of Contiguous Memory Allocator patches. I hope
> > that I've managed to resolve all the items discussed during the Memory
> > Management summit at Linaro Meeting in Budapest and pointed later on
> > mailing lists. The goal is to integrate it as tight as possible with
> > other kernel subsystems (like memory management and dma-mapping) and
> > finally merge to mainline.
> 
> You have certainly addressed all of my concerns, this looks really good now!
> 
> Andrew, can you add this to your -mm tree? What's your opinion on the
> current state, do you think this is ready for merging in 3.1 or would
> you want to have more reviews from core memory management people?
> 
> My reviews were mostly on the driver and platform API side, and I think
> we're fine there now, but I don't really understand the impacts this has
> in mm.

I could review it and put it in there on a preliminary basis for some
runtime testing.  But the question in my mind is how different will the
code be after the problems which rmk has identified have been fixed?

If "not very different" then that effort and testing will have been
worthwhile.

If "very different" or "unworkable" then it was all for naught.

So.  Do we have a feeling for the magnitude of the changes which will
be needed to fix these things up?


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [Linaro-mm-sig] [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-07-06 20:23                     ` Arnd Bergmann
@ 2011-07-07  5:29                       ` Nicolas Pitre
  -1 siblings, 0 replies; 183+ messages in thread
From: Nicolas Pitre @ 2011-07-07  5:29 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linaro-mm-sig, 'Daniel Walker',
	Russell King - ARM Linux, linux-media, 'Jonathan Corbet',
	'Mel Gorman', 'Chunsang Jeong',
	lkml, 'Michal Nazarewicz', 'Jesse Barker',
	'Kyungmin Park', 'KAMEZAWA Hiroyuki',
	'Andrew Morton',
	linux-mm, linux-arm-kernel, 'Ankita Garg'

On Wed, 6 Jul 2011, Arnd Bergmann wrote:

> On Wednesday 06 July 2011 21:10:07 Nicolas Pitre wrote:
> > If you get a highmem page, because the cache is VIPT, that page might 
> > still be cached even if it wasn't mapped.  With a VIVT cache we must 
> > flush the cache whenever a highmem page is unmapped.  There is no such 
> > restriction with VIPT i.e. ARMv6 and above.  Therefore to make sure the 
> > highmem page you get doesn't have cache lines associated to it, you must 
> > first map it cacheable, then perform cache invalidation on it, and 
> > eventually remap it as non-cacheable.  This is necessary because there 
> > is no way to perform cache maintenance on L1 cache using physical 
> > addresses unfortunately.  See commit 7e5a69e83b for an example of what 
> > this entails (fortunately commit 3e4d3af501 made things much easier and 
> > therefore commit 39af22a79 greatly simplified things).
> 
> Ok, thanks for the explanation. This definitely makes the highmem approach
> much harder to get right, and slower. Let's hope then that Marek's approach
> of using small pages for the contiguous memory region and changing their
> attributes on the fly works out better than this.

I would say that both approaches have fairly equivalent complexity.
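
For illustration only, a minimal sketch of the sequence described in the
quote above (map cacheable, invalidate, then hand out a non-cacheable
mapping), assuming the ARM-internal dmac_flush_range() helper; this is
not the code from commit 7e5a69e83b, just its general shape:

#include <linux/highmem.h>
#include <asm/cacheflush.h>

/* Sketch: clean a highmem page through a temporary cacheable alias
 * before it is handed out through a non-cacheable mapping. */
static void example_clean_highmem_page(struct page *page)
{
	void *vaddr = kmap_atomic(page);	/* step 1: cacheable alias */

	dmac_flush_range(vaddr, vaddr + PAGE_SIZE);	/* step 2: clean+invalidate */
	kunmap_atomic(vaddr);
	/* step 3 (not shown): the caller then maps the page non-cacheable,
	 * e.g. through the DMA coherent remapping area. */
}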


Nicolas

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCHv11 0/8] Contiguous Memory Allocator
  2011-07-06 22:11     ` Andrew Morton
@ 2011-07-07  7:36       ` Arnd Bergmann
  -1 siblings, 0 replies; 183+ messages in thread
From: Arnd Bergmann @ 2011-07-07  7:36 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Andrew Morton, Ankita Garg, Daniel Walker, Jesse Barker,
	Mel Gorman, Chunsang Jeong, Jonathan Corbet, linux-kernel,
	Michal Nazarewicz, linaro-mm-sig, linux-mm, Kyungmin Park,
	KAMEZAWA Hiroyuki, Marek Szyprowski, linux-media

On Thursday 07 July 2011 00:11:12 Andrew Morton wrote:
> I could review it and put it in there on a preliminary basis for some
> runtime testing.  But the question in my mind is how different will the
> code be after the problems which rmk has identified have been fixed?
> 
> If "not very different" then that effort and testing will have been
> worthwhile.
> 
> If "very different" or "unworkable" then it was all for naught.
> 
> So.  Do we have a feeling for the magnitude of the changes which will
> be needed to fix these things up?

As far as I can tell, the changes that we still need are mostly in the 
ARM specific portion of the series. All architectures that have cache
coherent DMA by default (most of the other interesting ones) can just
call dma_alloc_from_contiguous() from their dma_alloc_coherent()
function without having to do extra work.
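
As a rough sketch of what that glue could look like on a cache-coherent
architecture, assuming the dma_alloc_from_contiguous(dev, count, order)
helper proposed in this series and ignoring DMA offsets and IOMMUs for
brevity (this is not code taken from the patches themselves):

#include <linux/dma-contiguous.h>
#include <linux/dma-mapping.h>
#include <linux/gfp.h>
#include <linux/mm.h>
#include <asm/io.h>

static void *example_dma_alloc_coherent(struct device *dev, size_t size,
					dma_addr_t *handle, gfp_t gfp)
{
	int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
	struct page *page;

	page = dma_alloc_from_contiguous(dev, count, get_order(size));
	if (!page)
		page = alloc_pages(gfp, get_order(size));	/* fall back to buddy */
	if (!page)
		return NULL;

	*handle = page_to_phys(page);	/* coherent arch: no remapping needed */
	return page_address(page);
}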

It's possible that there will be small changes to the first six
patches in order to simplify the ARM port, but I expect them to stay
basically as they are, unless someone complains about them.

	Arnd

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-07-05 13:58           ` Arnd Bergmann
@ 2011-07-08 17:25             ` Russell King - ARM Linux
  -1 siblings, 0 replies; 183+ messages in thread
From: Russell King - ARM Linux @ 2011-07-08 17:25 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arm-kernel, Daniel Walker, Jonathan Corbet, Mel Gorman,
	Chunsang Jeong, Jesse Barker, KAMEZAWA Hiroyuki, linux-kernel,
	Michal Nazarewicz, linaro-mm-sig, linux-mm, Kyungmin Park,
	Ankita Garg, Andrew Morton, Marek Szyprowski, linux-media

On Tue, Jul 05, 2011 at 03:58:39PM +0200, Arnd Bergmann wrote:
> Ah, sorry I missed that patch on the mailing list, found it now in
> your for-next branch.

I've been searching for this email to reply to for the last day or
so...

> If I'm reading your "ARM: DMA: steal memory for DMA coherent mappings"
> correctly, the idea is to have a per-platform compile-time amount
> of memory that is reserved purely for coherent allocations and
> taking out of the buddy allocator, right?

Yes, because every time I've looked at taking out memory mappings in
the first level page tables, it's always been a major issue.

We have a method where we can remove first level mappings on
uniprocessor systems in the ioremap code just fine - we use that so
that systems can set up section and supersection mappings.  They can
tear them down as well - and we update other tasks' L1 page tables
when they get switched in.

This, however, doesn't work on SMP, because if you have a DMA allocation
(which is permitted from IRQ context) you must have some way of removing
the L1 page table entries from all CPUs' TLBs and the page tables currently
in use and any future page tables which those CPUs may switch to.

The easy bit is "future page tables" - that can be done in the same way
as the ioremap() code does with a generation number, checked when a new
page table is switched in.  The problem is the current CPUs, and as we
know trying to call smp_call_function() with IRQs disabled is not
permitted due to deadlock.

So, in an SMP system, there is no safe way to remove L1 page table entries
from IRQ context.  That means if memory is mapped for the buddy allocators
using L1 page table entries, then it is fixed for that application on an
SMP system.
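
To illustrate the constraint with a made-up driver snippet (not code from
this series): the call below is a legal pattern, so nothing underneath
dma_alloc_coherent() may depend on smp_call_function():

#include <linux/dma-mapping.h>
#include <linux/gfp.h>

/* May run in an interrupt handler, i.e. with IRQs disabled;
 * dma_alloc_coherent() explicitly allows that with GFP_ATOMIC. */
static void *example_irq_alloc(struct device *dev, dma_addr_t *dma)
{
	return dma_alloc_coherent(dev, PAGE_SIZE, dma, GFP_ATOMIC);
	/* Anything this call did to first-level page tables would have
	 * to be made visible to every CPU, but smp_call_function()
	 * cannot be used here because IRQs may be disabled. */
}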

However, that's not really what I wanted to find this email for.  That
is, I'm dropping the "ARM: DMA: steal memory for DMA coherent mappings"
patch for this merge window because - as I found out yesterday - it
prevents the Assabet platform booting, and so would be a regression.

Plus, I have a report of a regression with the streaming DMA API
speculative prefetch fixes causing the IOP ADMA raid5 async offload
stuff to explode - which may result in the streaming DMA API fixes
being reverted (which will leave ARMv6+ vulnerable to data corruption.)
As I have no time to work through the RAID5 code, async_tx code, and
IOP ADMA code to get to the bottom of it (because of this flood of
patches) I think a revert is looking likely - either that or I'll have
to tell the bug reporter to go away, which really isn't on.  It's on
LKML if anyone's interested in trying to diagnose it, the
"PROBLEM: ARM-dma-mapping-fix-for-speculative-prefetching cause OOPS"
thread.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [Linaro-mm-sig] [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-07-06 14:59               ` Arnd Bergmann
@ 2011-07-09 14:57                 ` Janusz Krzysztofik
  -1 siblings, 0 replies; 183+ messages in thread
From: Janusz Krzysztofik @ 2011-07-09 14:57 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arm-kernel, Nicolas Pitre, Russell King - ARM Linux,
	'Daniel Walker', 'Jonathan Corbet',
	'Mel Gorman', 'Chunsang Jeong',
	linux-kernel, 'Michal Nazarewicz',
	linaro-mm-sig, 'Jesse Barker', 'Kyungmin Park',
	'Ankita Garg', 'Andrew Morton',
	linux-mm, 'KAMEZAWA Hiroyuki',
	linux-media, Guennadi Liakhovetski, Marin Mitov, FUJITA Tomonori

On Wed, 6 Jul 2011 at 16:59:45 Arnd Bergmann wrote:
> On Wednesday 06 July 2011, Nicolas Pitre wrote:
> > On Wed, 6 Jul 2011, Russell King - ARM Linux wrote:
> > > Another issue is that when a platform has restricted DMA regions,
> > > they typically don't fall into the highmem zone.  As the
> > > dmabounce code allocates from the DMA coherent allocator to
> > > provide it with guaranteed DMA-able memory, that would be rather
> > > inconvenient.
> > 
> > Do we encounter this in practice i.e. do those platforms requiring
> > large contiguous allocations motivating this work have such DMA
> > restrictions?
> 
> You can probably find one or two of those, but we don't have to
> optimize for that case. I would at least expect the maximum size of
> the allocation to be smaller than the DMA limit for these, and
> consequently mandate that they define a sufficiently large
> CONSISTENT_DMA_SIZE for the crazy devices, or possibly add a hack to
> unmap some low memory and call
> dma_declare_coherent_memory() for the device.

Having found that Russell has dropped his "ARM: DMA: steal memory for DMA
coherent mappings" for now, let me get back to this idea of a hack that
would allow for safely calling dma_declare_coherent_memory() in order to
assign a block of contiguous memory to a device for its exclusive use.
Assuming there is no problem with successfully allocating a large
contiguous block of coherent memory at boot time with
dma_alloc_coherent(), this block could be reserved for the device.  The
only problem is that dma_declare_coherent_memory() calls ioremap(),
which was designed with a device's dedicated physical memory in mind
and shouldn't be called on memory that is already mapped.
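
The idea in miniature (a hedged sketch only; the device, size and names
are placeholders):

#include <linux/dma-mapping.h>
#include <linux/errno.h>
#include <linux/gfp.h>

static void *example_buf;
static dma_addr_t example_buf_dma;

/* Grab one large coherent block early, before memory gets fragmented,
 * and dedicate it to a single device. */
static int example_reserve_block(struct device *dev)
{
	example_buf = dma_alloc_coherent(dev, 8 << 20 /* 8 MiB */,
					 &example_buf_dma, GFP_KERNEL);
	return example_buf ? 0 : -ENOMEM;
}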

There were three approaches proposed, two of them in August 2010:
http://www.spinics.net/lists/linux-media/msg22179.html,
http://www.spinics.net/lists/arm-kernel/msg96318.html,
and a third one in January 2011:
http://www.spinics.net/lists/linux-arch/msg12637.html.

As far as I understand the reason why both of the first two were
NAKed, it was suggested that videobuf-dma-contig shouldn't use coherent
memory if all it requires is contiguous memory, and that a new API
should be invented, or the dma_pool API extended, to provide contiguous
memory.  CMA was pointed out as a new, work-in-progress contiguous
memory API.  Now it turns out it's not; it's only a helper to ensure
that dma_alloc_coherent() always succeeds, and videobuf2-dma-contig is
still going to allocate buffers from coherent memory.

(CCing both authors, Marin Mitov and Guennadi Liakhovetski, and their 
main opponent, FUJITA Tomonori)

The third solution was not discussed much after it was pointed out as 
being not very different from those two in terms of the above-mentioned
rationale.

All three solutions differed from the now-suggested method of unmapping
some low memory and then calling dma_declare_coherent_memory(), which
ioremaps it, in that they tried to reserve some boot-time-allocated
coherent memory, already mapped correctly, without (io)remapping it.

If there are still problems with CMA on the one hand, and a need for a
hack to handle "crazy devices" is still seen on the other, regardless
of whether CMA is available and working, maybe we should get back to
the idea of adapting the coherent API to the new requirements, review
those three proposals again and select the one which seems most
acceptable to everyone?  Being the submitter of the third, I'll be
happy to refresh it if selected.

Thanks,
Janusz

^ permalink raw reply	[flat|nested] 183+ messages in thread

* RE: [PATCHv11 0/8] Contiguous Memory Allocator
  2011-07-06 22:11     ` Andrew Morton
@ 2011-07-11 13:24       ` Marek Szyprowski
  -1 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-07-11 13:24 UTC (permalink / raw)
  To: 'Andrew Morton', 'Arnd Bergmann'
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Michal Nazarewicz',
	'Kyungmin Park', 'KAMEZAWA Hiroyuki',
	'Ankita Garg', 'Daniel Walker',
	'Mel Gorman', 'Jesse Barker',
	'Jonathan Corbet', 'Chunsang Jeong'

Hello,

On Thursday, July 07, 2011 12:11 AM Andrew Morton wrote:

> On Tue, 5 Jul 2011 14:07:17 +0200
> Arnd Bergmann <arnd@arndb.de> wrote:
> 
> > On Tuesday 05 July 2011, Marek Szyprowski wrote:
> > > This is yet another round of Contiguous Memory Allocator patches. I
> hope
> > > that I've managed to resolve all the items discussed during the Memory
> > > Management summit at Linaro Meeting in Budapest and pointed later on
> > > mailing lists. The goal is to integrate it as tight as possible with
> > > other kernel subsystems (like memory management and dma-mapping) and
> > > finally merge to mainline.
> >
> > You have certainly addressed all of my concerns, this looks really good
> now!
> >
> > Andrew, can you add this to your -mm tree? What's your opinion on the
> > current state, do you think this is ready for merging in 3.1 or would
> > you want to have more reviews from core memory management people?
> >
> > My reviews were mostly on the driver and platform API side, and I think
> > we're fine there now, but I don't really understand the impacts this has
> > in mm.
> 
> I could review it and put it in there on a preliminary basis for some
> runtime testing.  But the question in my mind is how different will the
> code be after the problems which rmk has identified have been fixed?
> 
> If "not very different" then that effort and testing will have been
> worthwhile.

The issue reported by Russell is very ARM-specific and can be solved mostly
in arch/arm/mm/dma-mapping.c, maybe with some minor changes/helpers in
drivers/base/dma-contiguous.c.  The core part in linux/mm probably won't be
affected by these changes at all.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center




^ permalink raw reply	[flat|nested] 183+ messages in thread

* RE: [Linaro-mm-sig] [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-07-09 14:57                 ` Janusz Krzysztofik
@ 2011-07-11 13:47                   ` Marek Szyprowski
  -1 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-07-11 13:47 UTC (permalink / raw)
  To: 'Janusz Krzysztofik', 'Arnd Bergmann'
  Cc: 'Marin Mitov', 'Daniel Walker',
	'Russell King - ARM Linux', 'Jonathan Corbet',
	'Mel Gorman', 'Chunsang Jeong',
	'KAMEZAWA Hiroyuki',
	linux-kernel, 'Michal Nazarewicz',
	'Guennadi Liakhovetski',
	linaro-mm-sig, 'Jesse Barker', 'Kyungmin Park',
	'Ankita Garg', 'FUJITA Tomonori',
	'Andrew Morton',
	linux-mm, linux-arm-kernel, linux-media

Hello,

On Saturday, July 09, 2011 4:57 PM Janusz Krzysztofik	wrote:

> On Wed, 6 Jul 2011 at 16:59:45 Arnd Bergmann wrote:
> > On Wednesday 06 July 2011, Nicolas Pitre wrote:
> > > On Wed, 6 Jul 2011, Russell King - ARM Linux wrote:
> > > > Another issue is that when a platform has restricted DMA regions,
> > > > they typically don't fall into the highmem zone.  As the
> > > > dmabounce code allocates from the DMA coherent allocator to
> > > > provide it with guaranteed DMA-able memory, that would be rather
> > > > inconvenient.
> > >
> > > Do we encounter this in practice i.e. do those platforms requiring
> > > large contiguous allocations motivating this work have such DMA
> > > restrictions?
> >
> > You can probably find one or two of those, but we don't have to
> > optimize for that case. I would at least expect the maximum size of
> > the allocation to be smaller than the DMA limit for these, and
> > consequently mandate that they define a sufficiently large
> > CONSISTENT_DMA_SIZE for the crazy devices, or possibly add a hack to
> > unmap some low memory and call
> > dma_declare_coherent_memory() for the device.
> 
> Once found that Russell has dropped his "ARM: DMA: steal memory for DMA
> coherent mappings" for now, let me get back to this idea of a hack that
> would allow for safely calling dma_declare_coherent_memory() in order to
> assign a device with a block of contiguous memory for exclusive use.

We tested such an approach and finally, with 3.0-rc1, it works fine.  You
can find an example of dma_declare_coherent_memory() together with the
required memblock_remove() calls in the following patch series:
http://www.spinics.net/lists/linux-samsung-soc/msg05026.html 
"[PATCH 0/3 v2] ARM: S5P: Add support for MFC device on S5PV210 and EXYNOS4"

> Assuming there should be no problem with successfully allocating a large
> continuous block of coherent memory at boot time with
> dma_alloc_coherent(), this block could be reserved for the device. The
> only problem is with the dma_declare_coherent_memory() calling
> ioremap(), which was designed with a device's dedicated physical memory
> in mind, but shouldn't be called on a memory already mapped.

All these issues with ioremap have finally been resolved in 3.0-rc1.  As
Russell pointed out to me in http://www.spinics.net/lists/arm-kernel/msg127644.html,
ioremap can be made to work on early reserved memory areas by selecting
the ARCH_HAS_HOLES_MEMORYMODEL Kconfig option.

> There were three approaches proposed, two of them in August 2010:
> http://www.spinics.net/lists/linux-media/msg22179.html,
> http://www.spinics.net/lists/arm-kernel/msg96318.html,
> and a third one in January 2011:
> http://www.spinics.net/lists/linux-arch/msg12637.html.
> 
> As far as I can understand the reason why both of the first two were
> NAKed, it was suggested that videobuf-dma-contig shouldn't use coherent
> if all it requires is a contiguous memory, and a new API should be
> invented, or dma_pool API extended, for providing contiguous memory.

This is another story.  The DMA-mapping framework definitely needs some
extensions to allow a more detailed specification of the allocated memory
(currently we have only coherent and the nearly ARM-specific writecombine).
During the Linaro Memory Management summit we agreed that a
dma_alloc_attrs() function might be needed to clean up the API and
provide a nice way of adding new memory parameters.  The possibility of
allocating contiguous cached buffers might be one of the new DMA
attributes.  Here are some details of my proposal:
http://www.spinics.net/lists/linux-mm/msg21235.html
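
As a hedged sketch only: dma_alloc_attrs() does not exist yet at this
point, but the proposal is roughly dma_alloc_coherent() plus a struct
dma_attrs argument, reusing the existing dma-attrs machinery (a new
attribute for contiguous cached memory would have to be added):

#include <linux/dma-attrs.h>
#include <linux/dma-mapping.h>
#include <linux/gfp.h>

static void *example_alloc(struct device *dev, size_t size, dma_addr_t *dma)
{
	struct dma_attrs attrs;

	init_dma_attrs(&attrs);
	/* DMA_ATTR_WRITE_BARRIER already exists; a new attribute for
	 * "contiguous, cached, non-coherent" memory would be set the
	 * same way. */
	dma_set_attr(DMA_ATTR_WRITE_BARRIER, &attrs);

	return dma_alloc_attrs(dev, size, dma, GFP_KERNEL, &attrs); /* proposed */
}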

> The
> CMA was pointed out as a new work in progress contiguous memory API.

That was probably the biggest mistake at the beginning.  We definitely
should have learned the dma-mapping framework and its internals.

> Now
> it turns out it's not, it's only a helper to ensure that
> dma_alloc_coherent() always succeeds, and videobuf2-dma-contig is still
> going to allocate buffers from coherent memory.

I hope that once the dma_alloc_attrs() API is accepted, I will add
support for memory attributes to the videobuf2-dma-contig allocator.
 
> (CCing both authors, Marin Mitov and Guennadi Liakhovetski, and their
> main opponent, FUJITA Tomonori)
> 
> The third solution was not discussed much after it was pointed out as
> being not very different from those two in terms of the above mentioned
> rationale.
> 
> All three solutions was different from now suggested method of unmapping
> some low memory and then calling dma_declare_coherent_memory() which
> ioremaps it in that those tried to reserve some boot time allocated
> coherent memory, already mapped correctly, without (io)remapping it.
> 
> If there are still problems with the CMA on one hand, and a need for a
> hack to handle "crazy devices" is still seen, regardless of CMA
> available and working or not, on the other, maybe we should get back to
> the idea of adopting coherent API to new requirements, review those
> three proposals again and select one which seems most acceptable to
> everyone? Being a submitter of the third, I'll be happy to refresh it if
> selected.

I'm open to discussion.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center



^ permalink raw reply	[flat|nested] 183+ messages in thread

* RE: [Linaro-mm-sig] [PATCH 6/8] drivers: add Contiguous Memory Allocator
@ 2011-07-11 13:47                   ` Marek Szyprowski
  0 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-07-11 13:47 UTC (permalink / raw)
  To: 'Janusz Krzysztofik', 'Arnd Bergmann'
  Cc: 'Marin Mitov', 'Daniel Walker',
	'Russell King - ARM Linux', 'Jonathan Corbet',
	'Mel Gorman', 'Chunsang Jeong',
	'KAMEZAWA Hiroyuki',
	linux-kernel, 'Michal Nazarewicz',
	'Guennadi Liakhovetski',
	linaro-mm-sig, 'Jesse Barker', 'Kyungmin Park',
	'Ankita Garg', 'FUJITA Tomonori',
	'Andrew Morton',
	linux-mm, linux-arm-kernel, linux-media

Hello,

On Saturday, July 09, 2011 4:57 PM Janusz Krzysztofik	wrote:

> On Wed, 6 Jul 2011 at 16:59:45 Arnd Bergmann wrote:
> > On Wednesday 06 July 2011, Nicolas Pitre wrote:
> > > On Wed, 6 Jul 2011, Russell King - ARM Linux wrote:
> > > > Another issue is that when a platform has restricted DMA regions,
> > > > they typically don't fall into the highmem zone.  As the
> > > > dmabounce code allocates from the DMA coherent allocator to
> > > > provide it with guaranteed DMA-able memory, that would be rather
> > > > inconvenient.
> > >
> > > Do we encounter this in practice i.e. do those platforms requiring
> > > large contiguous allocations motivating this work have such DMA
> > > restrictions?
> >
> > You can probably find one or two of those, but we don't have to
> > optimize for that case. I would at least expect the maximum size of
> > the allocation to be smaller than the DMA limit for these, and
> > consequently mandate that they define a sufficiently large
> > CONSISTENT_DMA_SIZE for the crazy devices, or possibly add a hack to
> > unmap some low memory and call
> > dma_declare_coherent_memory() for the device.
> 
> Once found that Russell has dropped his "ARM: DMA: steal memory for DMA
> coherent mappings" for now, let me get back to this idea of a hack that
> would allow for safely calling dma_declare_coherent_memory() in order to
> assign a device with a block of contiguous memory for exclusive use.

We tested such approach and finally with 3.0-rc1 it works fine. You can find
an example for dma_declare_coherent() together with required memblock_remove()
calls in the following patch series:
http://www.spinics.net/lists/linux-samsung-soc/msg05026.html 
"[PATCH 0/3 v2] ARM: S5P: Add support for MFC device on S5PV210 and EXYNOS4"

> Assuming there should be no problem with successfully allocating a large
> continuous block of coherent memory at boot time with
> dma_alloc_coherent(), this block could be reserved for the device. The
> only problem is with the dma_declare_coherent_memory() calling
> ioremap(), which was designed with a device's dedicated physical memory
> in mind, but shouldn't be called on a memory already mapped.

All these issues with ioremap has been finally resolved in 3.0-rc1. Like
Russell pointed me in http://www.spinics.net/lists/arm-kernel/msg127644.html,
ioremap can be fixed to work on early reserved memory areas by selecting
ARCH_HAS_HOLES_MEMORYMODEL Kconfig option.

> There were three approaches proposed, two of them in August 2010:
> http://www.spinics.net/lists/linux-media/msg22179.html,
> http://www.spinics.net/lists/arm-kernel/msg96318.html,
> and a third one in January 2011:
> http://www.spinics.net/lists/linux-arch/msg12637.html.
> 
> As far as I can understand the reason why both of the first two were
> NAKed, it was suggested that videobuf-dma-contig shouldn't use coherent
> if all it requires is a contiguous memory, and a new API should be
> invented, or dma_pool API extended, for providing contiguous memory.

This is another story. DMA-mapping framework definitely needs some 
extensions to allow more detailed specification of the allocated memory
(currently we have only coherent and nearly ARM-specific writecombine).
During Linaro Memory Management summit we agreed that the 
dma_alloc_attrs() function might be needed to clean-up the API and
provide a nice way of adding new memory parameters. Having a possibility
to allocate contiguous cached buffers might be one of the new DMA
attributes. Here are some details of my proposal:
http://www.spinics.net/lists/linux-mm/msg21235.html

> The
> CMA was pointed out as a new work in progress contiguous memory API.

That was probably the biggest mistake at the beginning. We definitely 
should have learned dma-mapping framework and its internals.

> Now
> it turns out it's not, it's only a helper to ensure that
> dma_alloc_coherent() always succeeds, and videobuf2-dma-contig is still
> going to allocate buffers from coherent memory.

I hope that once the dma_alloc_attrs() API will be accepted, I will add
support for memory attributes to videobuf2-dma-contig allocator. 
 
> (CCing both authors, Marin Mitov and Guennadi Liakhovetski, and their
> main opponent, FUJITA Tomonori)
> 
> The third solution was not discussed much after it was pointed out as
> being not very different from those two in terms of the above mentioned
> rationale.
> 
> All three solutions was different from now suggested method of unmapping
> some low memory and then calling dma_declare_coherent_memory() which
> ioremaps it in that those tried to reserve some boot time allocated
> coherent memory, already mapped correctly, without (io)remapping it.
> 
> If there are still problems with the CMA on one hand, and a need for a
> hack to handle "crazy devices" is still seen, regardless of CMA
> available and working or not, on the other, maybe we should get back to
> the idea of adopting coherent API to new requirements, review those
> three proposals again and select one which seems most acceptable to
> everyone? Being a submitter of the third, I'll be happy to refresh it if
> selected.

I'm open to discussion.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [Linaro-mm-sig] [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-07-11 13:47                   ` Marek Szyprowski
  (?)
@ 2011-07-11 19:01                     ` Janusz Krzysztofik
  -1 siblings, 0 replies; 183+ messages in thread
From: Janusz Krzysztofik @ 2011-07-11 19:01 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: 'Arnd Bergmann', 'Marin Mitov',
	'Daniel Walker', 'Russell King - ARM Linux',
	'Jonathan Corbet', 'Mel Gorman',
	'Chunsang Jeong', 'KAMEZAWA Hiroyuki',
	linux-kernel, 'Michal Nazarewicz',
	'Guennadi Liakhovetski',
	linaro-mm-sig, 'Jesse Barker', 'Kyungmin Park',
	'Ankita Garg', 'FUJITA Tomonori',
	'Andrew Morton',
	linux-mm, linux-arm-kernel, linux-media

Dnia poniedziałek, 11 lipca 2011 o 15:47:32 Marek Szyprowski napisał(a):
> Hello,
> 
> On Saturday, July 09, 2011 4:57 PM Janusz Krzysztofik	wrote:
> > On Wed, 6 Jul 2011 at 16:59:45 Arnd Bergmann wrote:
> > > On Wednesday 06 July 2011, Nicolas Pitre wrote:
> > > > On Wed, 6 Jul 2011, Russell King - ARM Linux wrote:
> > > > > Another issue is that when a platform has restricted DMA
> > > > > regions, they typically don't fall into the highmem zone. 
> > > > > As the dmabounce code allocates from the DMA coherent
> > > > > allocator to provide it with guaranteed DMA-able memory,
> > > > > that would be rather inconvenient.
> > > > 
> > > > Do we encounter this in practice i.e. do those platforms
> > > > requiring large contiguous allocations motivating this work
> > > > have such DMA restrictions?
> > > 
> > > You can probably find one or two of those, but we don't have to
> > > optimize for that case. I would at least expect the maximum size
> > > of the allocation to be smaller than the DMA limit for these,
> > > and consequently mandate that they define a sufficiently large
> > > CONSISTENT_DMA_SIZE for the crazy devices, or possibly add a
> > > hack to unmap some low memory and call
> > > dma_declare_coherent_memory() for the device.
> > 
> > Once found that Russell has dropped his "ARM: DMA: steal memory for
> > DMA coherent mappings" for now, let me get back to this idea of a
> > hack that would allow for safely calling
> > dma_declare_coherent_memory() in order to assign a device with a
> > block of contiguous memory for exclusive use.
> 
> We tested such approach and finally with 3.0-rc1 it works fine. You
> can find an example for dma_declare_coherent() together with
> required memblock_remove() calls in the following patch series:
> http://www.spinics.net/lists/linux-samsung-soc/msg05026.html
> "[PATCH 0/3 v2] ARM: S5P: Add support for MFC device on S5PV210 and
> EXYNOS4"
> 
> > Assuming there should be no problem with successfully allocating a
> > large continuous block of coherent memory at boot time with
> > dma_alloc_coherent(), this block could be reserved for the device.
> > The only problem is with the dma_declare_coherent_memory() calling
> > ioremap(), which was designed with a device's dedicated physical
> > memory in mind, but shouldn't be called on a memory already
> > mapped.
> 
> All these issues with ioremap has been finally resolved in 3.0-rc1.
> Like Russell pointed me in
> http://www.spinics.net/lists/arm-kernel/msg127644.html, ioremap can
> be fixed to work on early reserved memory areas by selecting
> ARCH_HAS_HOLES_MEMORYMODEL Kconfig option.

I'm not sure. Recently I tried to refresh my now 7 months old patch in
which I used that 'memblock_remove() then dma_declare_coherent_memory()'
method[1]. It was different from your S5P MFC example in that it didn't
punch any holes in the system memory, only stole a block of SDRAM from
its tail. But Russell reminded me again: "we should not be mapping SDRAM
using device mappings."[2]. Would defining ARCH_HAS_HOLES_MEMORYMODEL
(even if it was justified) make any difference in my case? I don't think
so. What I think, after Russell, is that we still need that obligatory
ioremap() removed from dma_declare_coherent_memory(), or made optional,
or a separate dma_declare_coherent_memory()-like function without the
(obligatory) ioremap() provided by the DMA API, in order to get the
dma_declare_coherent_memory() method accepted without any reservations
when used inside arch/arm, I'm afraid.
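For illustration, that 'steal from the tail of SDRAM' approach looks roughly
like the sketch below; the base address, the size and the camera device are
made up for the example, and the dma_declare_coherent_memory() call is exactly
where the disputed ioremap() happens:

#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/memblock.h>
#include <linux/platform_device.h>
#include <linux/dma-mapping.h>
#include <asm/sizes.h>
#include <asm/memory.h>

/* illustrative only: 8 MiB stolen from the tail of a 64 MiB bank */
#define CAM_BUF_SIZE	SZ_8M
#define CAM_BUF_BASE	(PHYS_OFFSET + SZ_64M - CAM_BUF_SIZE)

extern struct platform_device camera_device;	/* hypothetical device */

/* machine .reserve hook: hide the block from the kernel's linear mapping */
static void __init board_reserve(void)
{
	memblock_remove(CAM_BUF_BASE, CAM_BUF_SIZE);
}

/* hand the block to the device as its private coherent pool; note that
 * dma_declare_coherent_memory() will ioremap() it */
static int __init board_camera_init(void)
{
	if (!dma_declare_coherent_memory(&camera_device.dev,
					 CAM_BUF_BASE, CAM_BUF_BASE,
					 CAM_BUF_SIZE, DMA_MEMORY_MAP))
		pr_err("camera: dma_declare_coherent_memory() failed\n");
	return 0;
}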

Thanks,
Janusz

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2010-December/034644.html
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2011-June/052488.html

^ permalink raw reply	[flat|nested] 183+ messages in thread

* RE: [Linaro-mm-sig] [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-07-11 19:01                     ` Janusz Krzysztofik
  (?)
@ 2011-07-12  5:34                       ` Marek Szyprowski
  -1 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-07-12  5:34 UTC (permalink / raw)
  To: 'Janusz Krzysztofik'
  Cc: 'Arnd Bergmann', 'Marin Mitov',
	'Daniel Walker', 'Russell King - ARM Linux',
	'Jonathan Corbet', 'Mel Gorman',
	'Chunsang Jeong', 'KAMEZAWA Hiroyuki',
	linux-kernel, 'Michal Nazarewicz',
	'Guennadi Liakhovetski',
	linaro-mm-sig, 'Jesse Barker', 'Kyungmin Park',
	'Ankita Garg', 'FUJITA Tomonori',
	'Andrew Morton',
	linux-mm, linux-arm-kernel, linux-media

Hello,

On Monday, July 11, 2011 9:01 PM Janusz Krzysztofik wrote:

> Dnia poniedziałek, 11 lipca 2011 o 15:47:32 Marek Szyprowski napisał(a):
> > Hello,
> >
> > On Saturday, July 09, 2011 4:57 PM Janusz Krzysztofik	wrote:
> > > On Wed, 6 Jul 2011 at 16:59:45 Arnd Bergmann wrote:
> > > > On Wednesday 06 July 2011, Nicolas Pitre wrote:
> > > > > On Wed, 6 Jul 2011, Russell King - ARM Linux wrote:
> > > > > > Another issue is that when a platform has restricted DMA
> > > > > > regions, they typically don't fall into the highmem zone.
> > > > > > As the dmabounce code allocates from the DMA coherent
> > > > > > allocator to provide it with guaranteed DMA-able memory,
> > > > > > that would be rather inconvenient.
> > > > >
> > > > > Do we encounter this in practice i.e. do those platforms
> > > > > requiring large contiguous allocations motivating this work
> > > > > have such DMA restrictions?
> > > >
> > > > You can probably find one or two of those, but we don't have to
> > > > optimize for that case. I would at least expect the maximum size
> > > > of the allocation to be smaller than the DMA limit for these,
> > > > and consequently mandate that they define a sufficiently large
> > > > CONSISTENT_DMA_SIZE for the crazy devices, or possibly add a
> > > > hack to unmap some low memory and call
> > > > dma_declare_coherent_memory() for the device.
> > >
> > > Once found that Russell has dropped his "ARM: DMA: steal memory for
> > > DMA coherent mappings" for now, let me get back to this idea of a
> > > hack that would allow for safely calling
> > > dma_declare_coherent_memory() in order to assign a device with a
> > > block of contiguous memory for exclusive use.
> >
> > We tested such approach and finally with 3.0-rc1 it works fine. You
> > can find an example for dma_declare_coherent() together with
> > required memblock_remove() calls in the following patch series:
> > http://www.spinics.net/lists/linux-samsung-soc/msg05026.html
> > "[PATCH 0/3 v2] ARM: S5P: Add support for MFC device on S5PV210 and
> > EXYNOS4"
> >
> > > Assuming there should be no problem with successfully allocating a
> > > large continuous block of coherent memory at boot time with
> > > dma_alloc_coherent(), this block could be reserved for the device.
> > > The only problem is with the dma_declare_coherent_memory() calling
> > > ioremap(), which was designed with a device's dedicated physical
> > > memory in mind, but shouldn't be called on a memory already
> > > mapped.
> >
> > All these issues with ioremap has been finally resolved in 3.0-rc1.
> > Like Russell pointed me in
> > http://www.spinics.net/lists/arm-kernel/msg127644.html, ioremap can
> > be fixed to work on early reserved memory areas by selecting
> > ARCH_HAS_HOLES_MEMORYMODEL Kconfig option.
> 
> I'm not sure. Recently I tried to refresh my now 7 months old patch in
> which I used that 'memblock_remove() then dma_declare_coherent_memory()'
> method[1]. It was different from your S5P MFC example in that it didn't
> punch any holes in the system memory, only stole a block of SDRAM from
> its tail. But Russell reminded me again: "we should not be mapping SDRAM
> using device mappings."[2]. Would defining ARCH_HAS_HOLES_MEMORYMODEL
> (even if it was justified) make any diference in my case? I don't think
> so.

Defining ARCH_HAS_HOLES_MEMORYMODEL changes the behavior of the pfn_valid()
macro, which is used by ioremap(). When it is defined, pfn_valid() checks
(using memblock information) whether the given pfn lies inside system memory.
If the area has been removed with memblock_remove(), the pfn_valid() check
fails and ioremap() no longer complains about mapping system memory.
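In other words, a simplified sketch of the two checks involved (not the exact
mainline code; the helper names are made up):

#include <linux/io.h>
#include <linux/memblock.h>

/* memblock-backed pfn_valid(): a region dropped with memblock_remove()
 * is no longer considered system memory */
static bool pfn_is_ram(unsigned long pfn)
{
	return memblock_is_memory((phys_addr_t)pfn << PAGE_SHIFT);
}

/* the ARM ioremap() path refuses to create device mappings of pages that
 * are still part of system RAM; a removed region passes this test */
static void __iomem *remap_reserved(phys_addr_t phys, size_t size)
{
	if (pfn_is_ram(phys >> PAGE_SHIFT))
		return NULL;		/* would double-map SDRAM */
	return ioremap(phys, size);
}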

> What I think, after Russell, is that we still need that obligatory
> ioremap() removed from dma_declare_coherent_memory(), or made it
> optional, or a separate dma_declare_coherent_memory()-like function
> without (obligatory) ioremap() provided by the DMA API, in order to get
> the dma_declare_coherent_memory() method being accepted without any
> reservations when used inside arch/arm, I'm afraid.

Please check again with 3.0-rc1. The ARCH_HAS_HOLES_MEMORYMODEL solution was
suggested by Russell. It looks like the correct solution for this problem,
because I don't believe that the ioremap() call will be removed from
dma_declare_coherent_memory() anytime soon.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center




^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-07-08 17:25             ` Russell King - ARM Linux
  (?)
@ 2011-07-12 13:39               ` Arnd Bergmann
  -1 siblings, 0 replies; 183+ messages in thread
From: Arnd Bergmann @ 2011-07-12 13:39 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: linux-arm-kernel, Daniel Walker, Jonathan Corbet, Mel Gorman,
	Chunsang Jeong, Jesse Barker, KAMEZAWA Hiroyuki, linux-kernel,
	Michal Nazarewicz, linaro-mm-sig, linux-mm, Kyungmin Park,
	Ankita Garg, Andrew Morton, Marek Szyprowski, linux-media

On Friday 08 July 2011, Russell King - ARM Linux wrote:
> On Tue, Jul 05, 2011 at 03:58:39PM +0200, Arnd Bergmann wrote:
> 
> > If I'm reading your "ARM: DMA: steal memory for DMA coherent mappings"
> > correctly, the idea is to have a per-platform compile-time amount
> > of memory that is reserved purely for coherent allocations and
> > taking out of the buddy allocator, right?
> 
> Yes, because every time I've looked at taking out memory mappings in
> the first level page tables, it's always been a major issue.
> 
> We have a method where we can remove first level mappings on
> uniprocessor systems in the ioremap code just fine - we use that so
> that systems can setup section and supersection mappings.  They can
> tear them down as well - and we update other tasks L1 page tables
> when they get switched in.
> 
> This, however, doesn't work on SMP, because if you have a DMA allocation
> (which is permitted from IRQ context) you must have some way of removing
> the L1 page table entries from all CPUs TLBs and the page tables currently
> in use and any future page tables which those CPUs may switch to.

Ah, interesting. So there is no tlb flush broadcast operation and it
always goes through IPI?

> So, in a SMP system, there is no safe way to remove L1 page table entries
> from IRQ context.  That means if memory is mapped for the buddy allocators
> using L1 page table entries, then it is fixed for that application on a
> SMP system.

Ok. Can we limit GFP_ATOMIC to memory that doesn't need to be remapped, then?
I guess we can assume that there is no regression if we just skip the
dma_alloc_from_contiguous() step in dma_alloc_coherent() for atomic callers
and immediately fall back to the regular allocator.
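A sketch of that fallback (the wrapper function below is illustrative; only
dma_alloc_from_contiguous() is taken from this patch set, and __GFP_WAIT is
used as the usual "may sleep" test):

#include <linux/mm.h>
#include <linux/gfp.h>
#include <linux/dma-contiguous.h>

static struct page *__alloc_coherent_pages(struct device *dev, size_t size,
					   gfp_t gfp)
{
	unsigned int order = get_order(size);
	int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
	struct page *page;

	/* CMA may migrate pages and modify mappings, so it cannot be used
	 * from atomic context: go straight to the buddy allocator instead */
	if (!(gfp & __GFP_WAIT))
		return alloc_pages(gfp, order);

	page = dma_alloc_from_contiguous(dev, count, order);
	if (!page)
		page = alloc_pages(gfp, order);
	return page;
}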

Unfortunately, this still means we have to keep both methods. I was
hoping that with CMA doing dynamic remapping there would be no need for
keeping a significant number of pages reserved for this.

	Arnd

^ permalink raw reply	[flat|nested] 183+ messages in thread

* RE: [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-07-05  7:41   ` Marek Szyprowski
  (?)
@ 2011-07-14 12:29     ` Marek Szyprowski
  -1 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-07-14 12:29 UTC (permalink / raw)
  To: Marek Szyprowski, linux-kernel, linux-arm-kernel, linux-media,
	linux-mm, linaro-mm-sig
  Cc: 'Michal Nazarewicz', 'Kyungmin Park',
	'Andrew Morton', 'KAMEZAWA Hiroyuki',
	'Ankita Garg', 'Daniel Walker',
	'Mel Gorman', 'Arnd Bergmann',
	'Jesse Barker', 'Jonathan Corbet',
	'Chunsang Jeong'

Hello,

I've just found two nasty bugs in this version of CMA. Sadly, both are the
result of posting the patches in a big hurry. I'm really sorry.

The alignment argument was not passed correctly to the
bitmap_find_next_zero_area() function, and there was a bug in the range
check in dma_release_from_contiguous().
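For reference, the last argument of bitmap_find_next_zero_area() is an
alignment mask, not an order, which is what the first fix quoted below
corrects:

/* from include/linux/bitmap.h: the final parameter is a mask, so an
 * order-based alignment has to be passed as (1 << order) - 1 */
unsigned long bitmap_find_next_zero_area(unsigned long *map,
					 unsigned long size,
					 unsigned long start,
					 unsigned int nr,
					 unsigned long align_mask);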

On Tuesday, July 05, 2011 9:42 AM Marek Szyprowski wrote:

> The Contiguous Memory Allocator is a set of helper functions for the DMA
> mapping framework that improves allocation of contiguous memory chunks.
> 
> CMA grabs memory on system boot, marks it with CMA_MIGRATE_TYPE and
> gives it back to the system. The kernel is allowed to allocate movable
> pages within CMA's managed memory, so that it can be used, for example,
> for page cache when the DMA mapping framework does not use it. On a
> dma_alloc_from_contiguous() request such pages are migrated out of the
> CMA area to free the required contiguous block and fulfill the request.
> This makes it possible to allocate large contiguous chunks of memory at
> any time, assuming that there is enough free memory available in the
> system.
> 
> This code is heavily based on earlier works by Michal Nazarewicz.
> 
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> CC: Michal Nazarewicz <mina86@mina86.com>
> ---
>  drivers/base/Kconfig           |   77 +++++++++
>  drivers/base/Makefile          |    1 +
>  drivers/base/dma-contiguous.c  |  367 ++++++++++++++++++++++++++++++++++++++++
>  include/linux/dma-contiguous.h |  104 +++++++++++
>  4 files changed, 549 insertions(+), 0 deletions(-)
>  create mode 100644 drivers/base/dma-contiguous.c
>  create mode 100644 include/linux/dma-contiguous.h
> 
> diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> index d57e8d0..95ae1a7 100644
> --- a/drivers/base/Kconfig
> +++ b/drivers/base/Kconfig
> @@ -168,4 +168,81 @@ config SYS_HYPERVISOR
>  	bool
>  	default n
> 
> +config CMA
> +	bool "Contiguous Memory Allocator"
> +	depends on HAVE_DMA_CONTIGUOUS && HAVE_MEMBLOCK
> +	select MIGRATION
> +	select CMA_MIGRATE_TYPE
> +	help
> +	  This enables the Contiguous Memory Allocator which allows drivers
> +	  to allocate big physically-contiguous blocks of memory for use with
> +	  hardware components that do not support I/O mapping or scatter-gather.
> +
> +	  For more information see <include/linux/dma-contiguous.h>.
> +	  If unsure, say "n".
> +
> +if CMA
> +
> +config CMA_DEBUG
> +	bool "CMA debug messages (DEVELOPMENT)"
> +	help
> +	  Turns on debug messages in CMA.  This produces KERN_DEBUG
> +	  messages for every CMA call as well as various messages while
> +	  processing calls such as dma_alloc_from_contiguous().
> +	  This option does not affect warning and error messages.
> +
> +comment "Default contiguous memory area size:"
> +
> +config CMA_SIZE_ABSOLUTE
> +	int "Absolute size (in MiB)"
> +	default 16
> +	help
> +	  Defines the size (in MiB) of the default memory area for Contiguous
> +	  Memory Allocator.
> +
> +config CMA_SIZE_PERCENTAGE
> +	int "Percentage of total memory"
> +	default 10
> +	help
> +	  Defines the size of the default memory area for Contiguous Memory
> +	  Allocator as a percentage of the total memory in the system.
> +
> +choice
> +	prompt "Selected region size"
> +	default CMA_SIZE_SEL_ABSOLUTE
> +
> +config CMA_SIZE_SEL_ABSOLUTE
> +	bool "Use absolute value only"
> +
> +config CMA_SIZE_SEL_PERCENTAGE
> +	bool "Use percentage value only"
> +
> +config CMA_SIZE_SEL_MIN
> +	bool "Use lower value (minimum)"
> +
> +config CMA_SIZE_SEL_MAX
> +	bool "Use higher value (maximum)"
> +
> +endchoice
> +
> +config CMA_ALIGNMENT
> +	int "Maximum PAGE_SIZE order of alignment for contiguous buffers"
> +	range 4 9
> +	default 8
> +	help
> +	  The DMA mapping framework by default aligns all buffers to the
> +	  smallest PAGE_SIZE order which is greater than or equal to the
> +	  requested buffer size. This works well for buffers up to a few
> +	  hundred kilobytes, but for larger buffers it is just a waste of
> +	  memory. With this parameter you can specify the maximum PAGE_SIZE
> +	  order for contiguous buffers. Larger buffers will be aligned only
> +	  to this specified order. The order is expressed as a power of two
> +	  multiplied by the PAGE_SIZE.
> +
> +	  For example, if your system defaults to 4KiB pages, the order value
> +	  of 8 means that the buffers will be aligned up to 1MiB only.
> +
> +	  If unsure, leave the default value "8".
> +
> +endif
> +
>  endmenu
> diff --git a/drivers/base/Makefile b/drivers/base/Makefile
> index 4c5701c..be6aab4 100644
> --- a/drivers/base/Makefile
> +++ b/drivers/base/Makefile
> @@ -5,6 +5,7 @@ obj-y			:= core.o sys.o bus.o dd.o syscore.o \
>  			   cpu.o firmware.o init.o map.o devres.o \
>  			   attribute_container.o transport_class.o
>  obj-$(CONFIG_DEVTMPFS)	+= devtmpfs.o
> +obj-$(CONFIG_CMA) += dma-contiguous.o
>  obj-y			+= power/
>  obj-$(CONFIG_HAS_DMA)	+= dma-mapping.o
>  obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
> diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
> new file mode 100644
> index 0000000..707b901
> --- /dev/null
> +++ b/drivers/base/dma-contiguous.c
> @@ -0,0 +1,367 @@
> +/*
> + * Contiguous Memory Allocator for DMA mapping framework
> + * Copyright (c) 2010-2011 by Samsung Electronics.
> + * Written by:
> + *	Marek Szyprowski <m.szyprowski@samsung.com>
> + *	Michal Nazarewicz <mina86@mina86.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation; either version 2 of the
> + * License or (at your option) any later version of the license.
> + */
> +
> +#define pr_fmt(fmt) "cma: " fmt
> +
> +#ifdef CONFIG_CMA_DEBUG
> +#ifndef DEBUG
> +#  define DEBUG
> +#endif
> +#endif
> +
> +#include <asm/page.h>
> +#include <asm/sizes.h>
> +
> +#include <linux/memblock.h>
> +#include <linux/err.h>
> +#include <linux/mm.h>
> +#include <linux/module.h>
> +#include <linux/mutex.h>
> +#include <linux/page-isolation.h>
> +#include <linux/slab.h>
> +#include <linux/swap.h>
> +#include <linux/mm_types.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/dma-contiguous.h>
> +
> +struct cma {
> +	unsigned long	base_pfn;
> +	unsigned long	count;
> +	unsigned long	*bitmap;
> +};
> +
> +struct cma *dma_contiguous_default_area;
> +
> +static unsigned long size_abs = CONFIG_CMA_SIZE_ABSOLUTE * SZ_1M;
> +static unsigned long size_percent = CONFIG_CMA_SIZE_PERCENTAGE;
> +static long size_cmdline = -1;
> +
> +static int __init early_cma(char *p)
> +{
> +	pr_debug("%s(%s)\n", __func__, p);
> +	size_cmdline = memparse(p, &p);
> +	return 0;
> +}
> +early_param("cma", early_cma);
> +
> +/**
> + * dma_contiguous_reserve() - reserve area for contiguous memory handling
> + *
> + * This function reserves memory from the memblock subsystem. It should be
> + * called by arch specific code once the memblock allocator has been
> + * activated and all other subsystems have already allocated/reserved memory.
> + */
> +void __init dma_contiguous_reserve(void)
> +{
> +	struct memblock_region *reg;
> +	unsigned long selected_size = 0;
> +	unsigned long total_pages = 0;
> +
> +	pr_debug("%s()\n", __func__);
> +
> +	/*
> +	 * We cannot use memblock_phys_mem_size() here, because
> +	 * memblock_analyze() has not been called yet.
> +	 */
> +	for_each_memblock(memory, reg)
> +		total_pages += memblock_region_memory_end_pfn(reg) -
> +			       memblock_region_memory_base_pfn(reg);
> +
> +	size_percent *= (total_pages << PAGE_SHIFT) / 100;
> +
> +	pr_debug("%s: available phys mem: %ld MiB\n", __func__,
> +		 (total_pages << PAGE_SHIFT) / SZ_1M);
> +
> +#ifdef CONFIG_CMA_SIZE_SEL_ABSOLUTE
> +	selected_size = size_abs;
> +#endif
> +#ifdef CONFIG_CMA_SIZE_SEL_PERCENTAGE
> +	selected_size = size_percent;
> +#endif
> +#ifdef CONFIG_CMA_SIZE_SEL_MIN
> +	selected_size = min(size_abs, size_percent);
> +#endif
> +#ifdef CONFIG_CMA_SIZE_SEL_MAX
> +	selected_size = max(size_abs, size_percent);
> +#endif
> +
> +	if (size_cmdline != -1)
> +		selected_size = size_cmdline;
> +
> +	if (!selected_size)
> +		return;
> +
> +	pr_debug("%s: reserving %ld MiB for global area\n", __func__,
> +		 selected_size / SZ_1M);
> +
> +	dma_declare_contiguous(NULL, selected_size, 0);
> +};
> +
> +static DEFINE_MUTEX(cma_mutex);
> +
> +#ifdef CONFIG_DEBUG_VM
> +
> +static int __cma_activate_area(unsigned long base_pfn, unsigned long count)
> +{
> +	unsigned long pfn = base_pfn;
> +	unsigned i = count;
> +	struct zone *zone;
> +
> +	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
> +
> +	VM_BUG_ON(!pfn_valid(pfn));
> +	zone = page_zone(pfn_to_page(pfn));
> +
> +	do {
> +		VM_BUG_ON(!pfn_valid(pfn));
> +		VM_BUG_ON(page_zone(pfn_to_page(pfn)) != zone);
> +		if (!(pfn & (pageblock_nr_pages - 1)))
> +			init_cma_reserved_pageblock(pfn_to_page(pfn));
> +		++pfn;
> +	} while (--i);
> +
> +	return 0;
> +}
> +
> +#else
> +
> +static int __cma_activate_area(unsigned long base_pfn, unsigned long count)
> +{
> +	unsigned i = count >> pageblock_order;
> +	struct page *p = pfn_to_page(base_pfn);
> +
> +	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
> +
> +	do {
> +		init_cma_reserved_pageblock(p);
> +		p += pageblock_nr_pages;
> +	} while (--i);
> +
> +	return 0;
> +}
> +
> +#endif
> +
> +static struct cma *__cma_create_area(unsigned long base_pfn,
> +				     unsigned long count)
> +{
> +	int bitmap_size = BITS_TO_LONGS(count) * sizeof(long);
> +	struct cma *cma;
> +
> +	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
> +
> +	cma = kmalloc(sizeof *cma, GFP_KERNEL);
> +	if (!cma)
> +		return ERR_PTR(-ENOMEM);
> +
> +	cma->base_pfn = base_pfn;
> +	cma->count = count;
> +	cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
> +
> +	if (!cma->bitmap)
> +		goto no_mem;
> +
> +	__cma_activate_area(base_pfn, count);
> +
> +	pr_debug("%s: returning <%p>\n", __func__, (void *)cma);
> +	return cma;
> +
> +no_mem:
> +	kfree(cma);
> +	return ERR_PTR(-ENOMEM);
> +}
> +
> +static struct cma_reserved {
> +	unsigned long start;
> +	unsigned long size;
> +	struct device *dev;
> +} cma_reserved[8] __initdata;
> +static unsigned cma_reserved_count __initdata;
> +
> +static int __init __cma_init_reserved_areas(void)
> +{
> +	struct cma_reserved *r = cma_reserved;
> +	unsigned i = cma_reserved_count;
> +
> +	pr_debug("%s()\n", __func__);
> +
> +	for (; i; --i, ++r) {
> +		struct cma *cma;
> +		cma = __cma_create_area(page_to_pfn(phys_to_page(r->start)),
> +					r->size >> PAGE_SHIFT);
> +		if (!IS_ERR(cma)) {
> +			pr_debug("%s: created area %p\n", __func__, cma);
> +			if (r->dev)
> +				set_dev_cma_area(r->dev, cma);
> +			else
> +				dma_contiguous_default_area = cma;
> +		}
> +	}
> +	return 0;
> +}
> +core_initcall(__cma_init_reserved_areas);
> +
> +/**
> + * dma_declare_contiguous() - reserve area for contiguous memory handling
> + *			      for particular device
> + * @dev:   Pointer to device structure.
> + * @size:  Size of the reserved memory.
> + * @start: Start address of the reserved memory (optional, 0 for any).
> + *
> + * This function reserves memory for the specified device. It should be
> + * called by board specific code once the memblock allocator has been
> + * activated and all other subsystems have already allocated/reserved memory.
> + */
> +int __init dma_declare_contiguous(struct device *dev, unsigned long size,
> +				  phys_addr_t start)
> +{
> +	struct cma_reserved *r = &cma_reserved[cma_reserved_count];
> +	unsigned long alignment;
> +
> +	pr_debug("%s(%p+%p)\n", __func__, (void *)start, (void *)size);
> +
> +	/* Sanity checks */
> +	if (cma_reserved_count == ARRAY_SIZE(cma_reserved))
> +		return -ENOSPC;
> +
> +	if (!size)
> +		return -EINVAL;
> +
> +	/* Sanitise input arguments */
> +	alignment = PAGE_SIZE << (MAX_ORDER + 1);
> +	start = ALIGN(start, alignment);
> +	size  = ALIGN(size , alignment);
> +
> +	/* Reserve memory */
> +	if (start) {
> +		if (memblock_is_region_reserved(start, size) ||
> +		    memblock_reserve(start, size) < 0)
> +			return -EBUSY;
> +	} else {
> +		/*
> +		 * Use __memblock_alloc_base() since
> +		 * memblock_alloc_base() panic()s.
> +		 */
> +		u64 addr = __memblock_alloc_base(size, alignment, 0);
> +		if (!addr) {
> +			return -ENOMEM;
> +		} else if (addr + size > ~(unsigned long)0) {
> +			memblock_free(addr, size);
> +			return -EOVERFLOW;
> +		} else {
> +			start = addr;
> +		}
> +	}
> +
> +	/*
> +	 * Each reserved area must be initialised later, when more kernel
> +	 * subsystems (like slab allocator) are available.
> +	 */
> +	r->start = start;
> +	r->size = size;
> +	r->dev = dev;
> +	cma_reserved_count++;
> +	printk(KERN_INFO "%s: reserved %ld MiB area at 0x%p\n", __func__,
> +	       size / SZ_1M, (void *)start);
> +	return 0;
> +}
> +
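For illustration, the expected call sites for these two functions look roughly
like this (a sketch only; the reserve hook and the device are made up, only
dma_contiguous_reserve() and dma_declare_contiguous() come from this patch):

#include <linux/init.h>
#include <linux/platform_device.h>
#include <linux/dma-contiguous.h>
#include <asm/sizes.h>

extern struct platform_device mfc_device;	/* hypothetical device */

/* called early from arch/board code (e.g. a machine .reserve hook), after
 * memblock is up but before the buddy allocator takes over */
void __init board_reserve_cma(void)
{
	/* the default, global CMA area (size from Kconfig or the cma= option) */
	dma_contiguous_reserve();

	/* an additional, device-private 32 MiB area placed anywhere */
	dma_declare_contiguous(&mfc_device.dev, 32 * SZ_1M, 0);
}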
> +/**
> + * dma_alloc_from_contiguous() - allocate pages from contiguous area
> + * @dev:   Pointer to device for which the allocation is performed.
> + * @count: Requested number of pages.
> + * @align: Requested alignment of pages (in PAGE_SIZE order).
> + *
> + * This function allocates a memory buffer for the specified device. It uses
> + * the device specific contiguous memory area if available, or the default
> + * global one. Requires the architecture specific get_dev_cma_area() helper
> + * function.
> + */
> +struct page *dma_alloc_from_contiguous(struct device *dev, int count,
> +				       unsigned int align)
> +{
> +	struct cma *cma = get_dev_cma_area(dev);
> +	unsigned long pfn, pageno;
> +	int ret;
> +
> +	if (!cma)
> +		return NULL;
> +
> +	if (align > CONFIG_CMA_ALIGNMENT)
> +		align = CONFIG_CMA_ALIGNMENT;
> +
> +	pr_debug("%s(<%p>, %d/%d)\n", __func__, (void *)cma, count, align);
> +
> +	if (!count)
> +		return NULL;
> +
> +	mutex_lock(&cma_mutex);
> +

> +	pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count, 0, count,
> +					    align);

Fixed version:
pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count, 0, count,
					      (1 << align) - 1);

> +	if (pageno >= cma->count) {
> +		ret = -ENOMEM;
> +		goto error;
> +	}
> +	bitmap_set(cma->bitmap, pageno, count);
> +
> +	pfn = cma->base_pfn + pageno;
> +	ret = alloc_contig_range(pfn, pfn + count, 0, MIGRATE_CMA);
> +	if (ret)
> +		goto free;
> +
> +	mutex_unlock(&cma_mutex);
> +
> +	pr_debug("%s(): returning [%ld]\n", __func__, pfn);
> +	return pfn_to_page(pfn);
> +free:
> +	bitmap_clear(cma->bitmap, pageno, count);
> +error:
> +	mutex_unlock(&cma_mutex);
> +	return NULL;
> +}
> +
> +/**
> + * dma_release_from_contiguous() - release allocated pages
> + * @dev:   Pointer to device for which the pages were allocated.
> + * @pages: Allocated pages.
> + * @count: Number of allocated pages.
> + *
> + * This function releases pages previously allocated by
> + * dma_alloc_from_contiguous(). It returns 0 when the provided pages do
> + * not come from the device's contiguous memory area, and 1 after
> + * releasing them.
> + */
> +int dma_release_from_contiguous(struct device *dev, struct page *pages,
> +				int count)
> +{
> +	struct cma *cma = get_dev_cma_area(dev);
> +	unsigned long pfn;
> +
> +	if (!cma || !pages)
> +		return 0;
> +
> +	pr_debug("%s([%p])\n", __func__, (void *)pages);
> +
> +	pfn = page_to_pfn(pages);
> +
> +	if (pfn < cma->base_pfn || pfn >= cma->base_pfn + count)

Fixed version:
	if (pfn < cma->base_pfn || pfn >= cma->base_pfn + cma->count)

> +		return 0;
> +
> +	mutex_lock(&cma_mutex);
> +
> +	bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
> +	free_contig_pages(pages, count);
> +
> +	mutex_unlock(&cma_mutex);
> +	return 1;
> +}
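For illustration, a matching free path in the dma-mapping glue might look like
the sketch below (the wrapper name is made up); since the function returns 0
for pages that were not obtained from a CMA area, the caller can simply fall
back to the buddy allocator:

static void __free_coherent_pages(struct device *dev, struct page *page,
				  size_t size)
{
	int count = PAGE_ALIGN(size) >> PAGE_SHIFT;

	/* returns 0 when the pages do not come from the device's CMA area */
	if (!dma_release_from_contiguous(dev, page, count))
		__free_pages(page, get_order(size));
}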
> diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h
> new file mode 100644
> index 0000000..98312c9
> --- /dev/null
> +++ b/include/linux/dma-contiguous.h
> @@ -0,0 +1,104 @@
> +#ifndef __LINUX_CMA_H
> +#define __LINUX_CMA_H
> +
> +/*
> + * Contiguous Memory Allocator for DMA mapping framework
> + * Copyright (c) 2010-2011 by Samsung Electronics.
> + * Written by:
> + *	Marek Szyprowski <m.szyprowski@samsung.com>
> + *	Michal Nazarewicz <mina86@mina86.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation; either version 2 of the
> + * License or (at your option) any later version of the license.
> + */
> +
> +/*
> + * Contiguous Memory Allocator
> + *
> + *   The Contiguous Memory Allocator (CMA) makes it possible to
> + *   allocate big contiguous chunks of memory after the system has
> + *   booted.
> + *
> + * Why is it needed?
> + *
> + *   Various devices on embedded systems have no scatter-gather and/or
> + *   IO map support and require contiguous blocks of memory to
> + *   operate.  They include devices such as cameras, hardware video
> + *   coders, etc.
> + *
> + *   Such devices often require big memory buffers (a full HD frame
> + *   is, for instance, more than 2 megapixels large, i.e. more than 6
> + *   MB of memory), which makes mechanisms such as kmalloc() or
> + *   alloc_page() ineffective.
> + *
> + *   At the same time, a solution where a big memory region is
> + *   reserved for a device is suboptimal since often more memory is
> + *   reserved than strictly required and, moreover, the memory is
> + *   inaccessible to the page allocator even if device drivers don't use it.
> + *
> + *   CMA tries to solve this issue by operating on memory regions
> + *   where only movable pages can be allocated from.  This way, the
> + *   kernel can use the memory for the page cache and, when a device
> + *   driver requests it, the allocated pages can be migrated.
> + *
> + * Driver usage
> + *
> + *   CMA should not be used by device drivers directly. It is only a
> + *   helper framework for the dma-mapping subsystem.
> + *
> + *   For more information, see kernel-docs in drivers/base/dma-contiguous.c
> + */
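
The framework relies on an architecture-provided get_dev_cma_area() hook to
pick the area used for a given device. A minimal sketch of what such glue
could look like (the per-device cma_area pointer is an assumption made for
illustration; the real field layout is up to the architecture):

	static inline struct cma *get_dev_cma_area(struct device *dev)
	{
		if (dev && dev->archdata.cma_area)	/* assumed per-device field */
			return dev->archdata.cma_area;
		return dma_contiguous_default_area;	/* global fallback area */
	}
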
> +
> +#ifdef __KERNEL__
> +
> +struct cma;
> +struct page;
> +struct device;
> +
> +#ifdef CONFIG_CMA
> +
> +extern struct cma *dma_contiguous_default_area;
> +
> +void dma_contiguous_reserve(void);
> +int dma_declare_contiguous(struct device *dev, unsigned long size,
> +			   phys_addr_t base);
> +
> +struct page *dma_alloc_from_contiguous(struct device *dev, int count,
> +				       unsigned int order);
> +int dma_release_from_contiguous(struct device *dev, struct page *pages,
> +				int count);
> +
> +#else
> +
> +#define dma_contiguous_default_area NULL
> +
> +static inline void dma_contiguous_reserve(void) { }
> +
> +static inline
> +int dma_declare_contiguous(struct device *dev, unsigned long size,
> +			   phys_addr_t base)
> +{
> +	return -EINVAL;
> +}
> +
> +static inline
> +struct page *dma_alloc_from_contiguous(struct device *dev, int count,
> +				       unsigned int order)
> +{
> +	return NULL;
> +}
> +
> +static inline
> +int dma_release_from_contiguous(struct device *dev, struct page *pages,
> +				int count)
> +{
> +	return 0;
> +}
> +
> +#endif
> +
> +#endif
> +
> +#endif
> --
> 1.7.1.569.g6f426

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center



^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-07-05 12:27       ` Arnd Bergmann
  (?)
@ 2011-08-03 17:43         ` James Bottomley
  -1 siblings, 0 replies; 183+ messages in thread
From: James Bottomley @ 2011-08-03 17:43 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Russell King - ARM Linux, Marek Szyprowski, linux-kernel,
	linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig,
	Daniel Walker, Jonathan Corbet, Mel Gorman, Chunsang Jeong,
	Michal Nazarewicz, Jesse Barker, Kyungmin Park, Ankita Garg,
	Andrew Morton, KAMEZAWA Hiroyuki, ksummit-2011-discuss

[cc to ks-discuss added, since this may be a relevant topic]

On Tue, 2011-07-05 at 14:27 +0200, Arnd Bergmann wrote:
> On Tuesday 05 July 2011, Russell King - ARM Linux wrote:
> > On Tue, Jul 05, 2011 at 09:41:48AM +0200, Marek Szyprowski wrote:
> > > The Contiguous Memory Allocator is a set of helper functions for DMA
> > > mapping framework that improves allocations of contiguous memory chunks.
> > > 
> > > CMA grabs memory on system boot, marks it with CMA_MIGRATE_TYPE and
> > > gives back to the system. Kernel is allowed to allocate movable pages
> > > within CMA's managed memory so that it can be used for example for page
> > > cache when DMA mapping do not use it. On dma_alloc_from_contiguous()
> > > request such pages are migrated out of CMA area to free required
> > > contiguous block and fulfill the request. This allows to allocate large
> > > contiguous chunks of memory at any time assuming that there is enough
> > > free memory available in the system.
> > > 
> > > This code is heavily based on earlier works by Michal Nazarewicz.
> > 
> > And how are you addressing the technical concerns about aliasing of
> > cache attributes which I keep bringing up with this and you keep
> > ignoring and telling me that I'm standing in your way.

Just to chime in here, parisc has an identical issue.  If the CPU ever
sees an alias with different attributes for the same page, it will HPMC
the box (that's basically the BIOS killing the system for being
architecturally inconsistent), so an architecture-neutral solution on
this point is essential to us as well.

> This is of course an important issue, and it's the one item listed as
> TODO in the introductory mail that was sent.
> 
> It's also a preexisting problem as far as I can tell, and it needs
> to be solved in __dma_alloc for both cases, dma_alloc_from_contiguous
> and __alloc_system_pages as introduced in patch 7.
> 
> We've discussed this back and forth, and it always comes down to
> one of two ugly solutions:
> 
> 1. Put all of the MIGRATE_CMA and pages into highmem and change
> __alloc_system_pages so it also allocates only from highmem pages.
> The consequences of this are that we always need to build kernels
> with highmem enabled and that we have less lowmem on systems that
> are already small, both of which can be fairly expensive unless
> you have lots of highmem already.

So this would require that systems using the API have highmem? (parisc
doesn't today).

> 2. Add logic to unmap pages from the linear mapping, which is
> very expensive because it forces the use of small pages in the
> linear mapping (or in parts of it), and possibly means walking
> all page tables to remove the PTEs on alloc and put them back
> in on free.
> 
> I believe that Chunsang Jeong from Linaro is planning to
> implement both variants and post them for review, so we can
> decide which one to merge, or even to merge both and make
> it a configuration option. See also
> https://blueprints.launchpad.net/linaro-mm-sig/+spec/engr-mm-dma-mapping-2011.07
> 
> I don't think we need to make merging the CMA patches depending on
> the other patches, it's clear that both need to be solved, and
> they are independent enough.

I assume from the above that ARM has a hardware page walker?

The way I'd fix this on parisc, because we have a software based TLB, is
to rely on the fact that a page may only be used either for DMA or for
Page Cache, so the aliases should never be interleaved.  Since you know
the point at which the page flips from DMA to Cache (and vice versa),
I'd purge the TLB entry and flush the page at that point and rely on the
usage guarantees to ensure that the alias TLB entry doesn't reappear.
This isn't inexpensive but the majority of the cost is the cache flush
which is a requirement to clean the aliases anyway (a TLB entry purge is
pretty cheap).
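
In rough pseudo-C, that discipline might look like the sketch below; the two
helper names are placeholders for whatever arch-specific primitives do the
cache flush and the TLB purge, they are not existing kernel APIs:

	/* called at the point where a page flips from pagecache use to DMA use */
	static void prepare_page_for_dma(struct page *page)
	{
		void *kaddr = page_address(page);

		arch_flush_kernel_dcache_range(kaddr, PAGE_SIZE);  /* clean the alias */
		arch_purge_kernel_tlb_page((unsigned long)kaddr);  /* drop its TLB entry */
		/* from here on the page is only touched through the DMA mapping */
	}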

Would this work for the ARM hardware walker as well?  It would require
you to have a TLB entry purge instruction as well as some architectural
guarantees about not speculating the TLB.

James



^ permalink raw reply	[flat|nested] 183+ messages in thread

* RE: [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-08-03 17:43         ` James Bottomley
  (?)
@ 2011-09-26 12:06           ` Marek Szyprowski
  -1 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-09-26 12:06 UTC (permalink / raw)
  To: 'James Bottomley', 'Arnd Bergmann'
  Cc: 'Russell King - ARM Linux',
	linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Daniel Walker', 'Jonathan Corbet',
	'Mel Gorman', 'Chunsang Jeong',
	'Michal Nazarewicz', 'Jesse Barker',
	'Kyungmin Park', 'Ankita Garg',
	'Andrew Morton', 'KAMEZAWA Hiroyuki',
	ksummit-2011-discuss

Hello,

I'm sorry for the late reply. I must have missed this mail...

On Wednesday, August 03, 2011 7:44 PM James Bottomley wrote:

> [cc to ks-discuss added, since this may be a relevant topic]
> 
> On Tue, 2011-07-05 at 14:27 +0200, Arnd Bergmann wrote:
> > On Tuesday 05 July 2011, Russell King - ARM Linux wrote:
> > > On Tue, Jul 05, 2011 at 09:41:48AM +0200, Marek Szyprowski wrote:
> > > > The Contiguous Memory Allocator is a set of helper functions for DMA
> > > > mapping framework that improves allocations of contiguous memory chunks.
> > > >
> > > > CMA grabs memory on system boot, marks it with CMA_MIGRATE_TYPE and
> > > > gives back to the system. Kernel is allowed to allocate movable pages
> > > > within CMA's managed memory so that it can be used for example for page
> > > > cache when DMA mapping do not use it. On dma_alloc_from_contiguous()
> > > > request such pages are migrated out of CMA area to free required
> > > > contiguous block and fulfill the request. This allows to allocate large
> > > > contiguous chunks of memory at any time assuming that there is enough
> > > > free memory available in the system.
> > > >
> > > > This code is heavily based on earlier works by Michal Nazarewicz.
> > >
> > > And how are you addressing the technical concerns about aliasing of
> > > cache attributes which I keep bringing up with this and you keep
> > > ignoring and telling me that I'm standing in your way.
> 
> Just to chime in here, parisc has an identical issue.  If the CPU ever
> sees an alias with different attributes for the same page, it will HPMC
> the box (that is, the firmware will basically kill the system as being
> architecturally inconsistent), so an architecture-neutral solution on
> this point is essential to us as well.
>
> > This is of course an important issue, and it's the one item listed as
> > TODO in the introductory mail that was sent.
> >
> > It's also a preexisting problem as far as I can tell, and it needs
> > to be solved in __dma_alloc for both cases, dma_alloc_from_contiguous
> > and __alloc_system_pages as introduced in patch 7.
> >
> > We've discussed this back and forth, and it always comes down to
> > one of two ugly solutions:
> >
> > 1. Put all of the MIGRATE_CMA pages into highmem and change
> > __alloc_system_pages so it also allocates only from highmem pages.
> > The consequences of this are that we always need to build kernels
> > with highmem enabled and that we have less lowmem on systems that
> > are already small, both of which can be fairly expensive unless
> > you have lots of highmem already.
> 
> So this would require that systems using the API have highmem? (parisc
> doesn't today).

Yes, such a solution would require highmem. It would introduce the usual
highmem issues on systems that typically don't use highmem, which is why I
looked for other solutions.
 
> > 2. Add logic to unmap pages from the linear mapping, which is
> > very expensive because it forces the use of small pages in the
> > linear mapping (or in parts of it), and possibly means walking
> > all page tables to remove the PTEs on alloc and put them back
> > in on free.
> >
> > I believe that Chunsang Jeong from Linaro is planning to
> > implement both variants and post them for review, so we can
> > decide which one to merge, or even to merge both and make
> > it a configuration option. See also
> > https://blueprints.launchpad.net/linaro-mm-sig/+spec/engr-mm-dma-mapping-2011.07
> >
> > I don't think we need to make merging the CMA patches dependent on
> > the other patches, it's clear that both need to be solved, and
> > they are independent enough.
> 
> I assume from the above that ARM has a hardware page walker?

Right.

> The way I'd fix this on parisc, because we have a software based TLB, is
> to rely on the fact that a page may only be used either for DMA or for
> Page Cache, so the aliases should never be interleaved.  Since you know
> the point at which the page flips from DMA to Cache (and vice versa),
> I'd purge the TLB entry and flush the page at that point and rely on the
> usage guarantees to ensure that the alias TLB entry doesn't reappear.
> This isn't inexpensive but the majority of the cost is the cache flush
> which is a requirement to clean the aliases anyway (a TLB entry purge is
> pretty cheap).
> 
> Would this work for the ARM hardware walker as well?  It would require
> you to have a TLB entry purge instruction as well as some architectural
> guarantees about not speculating the TLB.

The main problem with the ARM linear mapping is that it is created using
2MiB sections, so the entries for the kernel linear mapping fit entirely in
the first level of each process's page table. This implies that changing
this linear mapping directly is not an easy task and must be performed for
all tasks in the system. In my CMA v12+ patches I decided to use a simpler
way of solving this issue. I rely on the fact that DMA memory is allocated
only from CMA regions, so during early boot I change the kernel linear
mappings for these regions. Instead of 2MiB sections, I use regular 4KiB
pages, which require a second level of page tables. The second-level page
tables for these regions can easily be shared by all processes in the
system.

This way I can easily update the cache attributes of any single 4KiB page
that is used for DMA and avoid aliasing entirely. The only drawback of this
method is higher TLB pressure, which might result in some slowdown during
heavy IO if pages with a 4KiB linear mapping are used. However, my hardware
has only slow IO (with eMMC I get only about 30MiB/s), so I cannot notice
any impact of the mapping method on the IO speed.
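
As a rough illustration of what flipping the attributes of such a 4KiB
linear-map page can look like once the section has been split, here is a
hedged sketch; the function names are made up for this mail, set_pte_ext()
is the ARM way of writing a PTE, and this is not a verbatim excerpt from
the v12 patches:

#include <linux/mm.h>
#include <asm/pgtable.h>
#include <asm/tlbflush.h>

/* Rewrite one PTE of the kernel linear mapping with the new protection. */
static int cma_update_pte(pte_t *pte, pgtable_t token, unsigned long addr,
			  void *data)
{
	pgprot_t prot = *(pgprot_t *)data;

	set_pte_ext(pte, mk_pte(virt_to_page(addr), prot), 0);
	return 0;
}

/*
 * Change the linear-map attributes of a buffer that lies inside a CMA
 * region already remapped with 4KiB pages, then flush the stale TLB
 * entries for that range.
 */
static void cma_remap_buffer(struct page *page, size_t size, pgprot_t prot)
{
	unsigned long start = (unsigned long)page_address(page);

	apply_to_page_range(&init_mm, start, size, cma_update_pte, &prot);
	flush_tlb_kernel_range(start, start + size);
}

A caller would use something like cma_remap_buffer(page, size,
pgprot_noncached(PAGE_KERNEL)) before handing the buffer to a device and
switch the range back to PAGE_KERNEL when it is freed.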

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center




^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 6/8] drivers: add Contiguous Memory Allocator
  2011-08-03 17:43         ` James Bottomley
  (?)
@ 2011-09-26 13:00           ` Russell King - ARM Linux
  -1 siblings, 0 replies; 183+ messages in thread
From: Russell King - ARM Linux @ 2011-09-26 13:00 UTC (permalink / raw)
  To: James Bottomley
  Cc: Arnd Bergmann, Daniel Walker, Jonathan Corbet, Mel Gorman,
	Chunsang Jeong, Jesse Barker, KAMEZAWA Hiroyuki, linux-kernel,
	Michal Nazarewicz, linaro-mm-sig, linux-mm, Kyungmin Park,
	Ankita Garg, Andrew Morton, Marek Szyprowski,
	ksummit-2011-discuss, linux-arm-kernel, linux-media

On Wed, Aug 03, 2011 at 12:43:50PM -0500, James Bottomley wrote:
> I assume from the above that ARM has a hardware page walker?

Correct, and speculative prefetch (which isn't prevented by not having
TLB entries), so you can't keep entries out of the TLB.  If it's in
the page tables it can end up in the TLB.

The problem is that we could end up with conflicting attributes available
to the hardware for the same physical page, and it is _completely_
undefined how hardware behaves with that (except that it does not halt -
and there's no exception path for the condition because there's no
detection of the problem case.)

So, if you had one mapping which was fully cacheable and another mapping
which wasn't, you can flush the TLB all you like - it could be possible
that you still end up with an access through the non-cacheable mapping being
cached (either hitting speculatively prefetched cache lines via the
cacheable mapping, or the cacheable attributes being applied to the
non-cacheable mapping - or conversely uncacheable attributes applied to
the cacheable mapping.)

Essentially, the condition is labelled 'unpredictable' in the TRMs,
which basically means that not even observed behaviour can be relied
upon, because there may be cases where the observed behaviour fails.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* [PATCH 4/8] mm: MIGRATE_CMA migration type added
  2011-08-19 14:27 [PATCHv15 0/8] Contiguous Memory Allocator Marek Szyprowski
  2011-08-19 14:27   ` Marek Szyprowski
@ 2011-08-19 14:27   ` Marek Szyprowski
  0 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-08-19 14:27 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Shariq Hasnain, Chunsang Jeong

From: Michal Nazarewicz <m.nazarewicz@samsung.com>

The MIGRATE_CMA migration type has two main characteristics:
(i) only movable pages can be allocated from MIGRATE_CMA
pageblocks and (ii) page allocator will never change migration
type of MIGRATE_CMA pageblocks.

This guarantees that a page in a MIGRATE_CMA pageblock can
always be migrated somewhere else (unless there's no memory left
in the system).

It is designed to be used with Contiguous Memory Allocator
(CMA) for allocating big chunks (eg. 10MiB) of physically
contiguous memory.  Once a driver requests contiguous memory,
CMA will migrate pages from MIGRATE_CMA pageblocks.

To minimise the number of migrations, the MIGRATE_CMA migration type
is the last type tried when the page allocator falls back to
migration types other than the one requested.

Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
[m.szyprowski: cleaned up Kconfig, renamed some functions, removed ifdefs]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 include/linux/mmzone.h         |   41 +++++++++++++++---
 include/linux/page-isolation.h |    1 +
 mm/Kconfig                     |    8 +++-
 mm/compaction.c                |   10 +++++
 mm/page_alloc.c                |   88 +++++++++++++++++++++++++++++++---------
 5 files changed, 120 insertions(+), 28 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index be1ac8d..74b7f27 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -35,13 +35,35 @@
  */
 #define PAGE_ALLOC_COSTLY_ORDER 3
 
-#define MIGRATE_UNMOVABLE     0
-#define MIGRATE_RECLAIMABLE   1
-#define MIGRATE_MOVABLE       2
-#define MIGRATE_PCPTYPES      3 /* the number of types on the pcp lists */
-#define MIGRATE_RESERVE       3
-#define MIGRATE_ISOLATE       4 /* can't allocate from here */
-#define MIGRATE_TYPES         5
+enum {
+	MIGRATE_UNMOVABLE,
+	MIGRATE_RECLAIMABLE,
+	MIGRATE_MOVABLE,
+	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
+	MIGRATE_RESERVE = MIGRATE_PCPTYPES,
+	/*
+	 * MIGRATE_CMA migration type is designed to mimic the way
+	 * ZONE_MOVABLE works.  Only movable pages can be allocated
+	 * from MIGRATE_CMA pageblocks and page allocator never
+	 * implicitly change migration type of MIGRATE_CMA pageblock.
+	 *
+	 * The way to use it is to change migratetype of a range of
+	 * pageblocks to MIGRATE_CMA which can be done by
+	 * __free_pageblock_cma() function.  What is important though
+	 * is that a range of pageblocks must be aligned to
+	 * MAX_ORDER_NR_PAGES should the biggest page be bigger than
+	 * a single pageblock.
+	 */
+	MIGRATE_CMA,
+	MIGRATE_ISOLATE,	/* can't allocate from here */
+	MIGRATE_TYPES
+};
+
+#ifdef CONFIG_CMA_MIGRATE_TYPE
+#  define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
+#else
+#  define is_migrate_cma(migratetype) false
+#endif
 
 #define for_each_migratetype_order(order, type) \
 	for (order = 0; order < MAX_ORDER; order++) \
@@ -54,6 +76,11 @@ static inline int get_pageblock_migratetype(struct page *page)
 	return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
 }
 
+static inline bool is_pageblock_cma(struct page *page)
+{
+	return is_migrate_cma(get_pageblock_migratetype(page));
+}
+
 struct free_area {
 	struct list_head	free_list[MIGRATE_TYPES];
 	unsigned long		nr_free;
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index c5d1a7c..856d9cf 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -46,4 +46,5 @@ int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn);
 unsigned long scan_lru_pages(unsigned long start, unsigned long end);
 int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
 
+extern void init_cma_reserved_pageblock(struct page *page);
 #endif
diff --git a/mm/Kconfig b/mm/Kconfig
index f2f1ca1..dd6e1ea 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -189,7 +189,7 @@ config COMPACTION
 config MIGRATION
 	bool "Page migration"
 	def_bool y
-	depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION
+	depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION || CMA_MIGRATE_TYPE
 	help
 	  Allows the migration of the physical location of pages of processes
 	  while the virtual addresses are not changed. This is useful in
@@ -198,6 +198,12 @@ config MIGRATION
 	  pages as migration can relocate pages to satisfy a huge page
 	  allocation instead of reclaiming.
 
+config CMA_MIGRATE_TYPE
+	bool
+	help
+	  This enables the use of the MIGRATE_CMA migrate type, which lets CMA
+	  work on an almost arbitrary memory range and not only inside ZONE_MOVABLE.
+
 config PHYS_ADDR_T_64BIT
 	def_bool 64BIT || ARCH_PHYS_ADDR_T_64BIT
 
diff --git a/mm/compaction.c b/mm/compaction.c
index 6cc604b..9e5cc59 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -119,6 +119,16 @@ static bool suitable_migration_target(struct page *page)
 	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
 		return false;
 
+	/* Keep MIGRATE_CMA alone as well. */
+	/*
+	 * XXX Revisit.  We currently cannot let compaction touch CMA
+	 * pages since compaction insists on changing their migration
+	 * type to MIGRATE_MOVABLE (see split_free_page() called from
+	 * isolate_freepages_block() above).
+	 */
+	if (is_migrate_cma(migratetype))
+		return false;
+
 	/* If the page is a large free page, then allow migration */
 	if (PageBuddy(page) && page_order(page) >= pageblock_order)
 		return true;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 35423c2..c9dfed0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -719,6 +719,29 @@ void __meminit __free_pages_bootmem(struct page *page, unsigned int order)
 	}
 }
 
+#ifdef CONFIG_CMA_MIGRATE_TYPE
+/*
+ * Free a whole pageblock and set its migration type to MIGRATE_CMA.
+ */
+void __init init_cma_reserved_pageblock(struct page *page)
+{
+	struct page *p = page;
+	unsigned i = pageblock_nr_pages;
+
+	prefetchw(p);
+	do {
+		if (--i)
+			prefetchw(p + 1);
+		__ClearPageReserved(p);
+		set_page_count(p, 0);
+	} while (++p, i);
+
+	set_page_refcounted(page);
+	set_pageblock_migratetype(page, MIGRATE_CMA);
+	__free_pages(page, pageblock_order);
+	totalram_pages += pageblock_nr_pages;
+}
+#endif
 
 /*
  * The order of subdivision here is critical for the IO subsystem.
@@ -827,11 +850,11 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
  * This array describes the order lists are fallen back to when
  * the free lists for the desirable migrate type are depleted
  */
-static int fallbacks[MIGRATE_TYPES][MIGRATE_TYPES-1] = {
+static int fallbacks[MIGRATE_TYPES][4] = {
 	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_RESERVE },
 	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_RESERVE },
-	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
-	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE,     MIGRATE_RESERVE,   MIGRATE_RESERVE }, /* Never used */
+	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_CMA    , MIGRATE_RESERVE },
+	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE }, /* Never used */
 };
 
 /*
@@ -926,12 +949,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 	/* Find the largest possible block of pages in the other list */
 	for (current_order = MAX_ORDER-1; current_order >= order;
 						--current_order) {
-		for (i = 0; i < MIGRATE_TYPES - 1; i++) {
+		for (i = 0; i < ARRAY_SIZE(fallbacks[0]); i++) {
 			migratetype = fallbacks[start_migratetype][i];
 
 			/* MIGRATE_RESERVE handled later if necessary */
 			if (migratetype == MIGRATE_RESERVE)
-				continue;
+				break;
 
 			area = &(zone->free_area[current_order]);
 			if (list_empty(&area->free_list[migratetype]))
@@ -946,19 +969,29 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 			 * pages to the preferred allocation list. If falling
 			 * back for a reclaimable kernel allocation, be more
 			 * aggressive about taking ownership of free pages
+			 *
+			 * On the other hand, never change migration
+			 * type of MIGRATE_CMA pageblocks nor move CMA
+			 * pages on different free lists. We don't
+			 * want unmovable pages to be allocated from
+			 * MIGRATE_CMA areas.
 			 */
-			if (unlikely(current_order >= (pageblock_order >> 1)) ||
-					start_migratetype == MIGRATE_RECLAIMABLE ||
-					page_group_by_mobility_disabled) {
-				unsigned long pages;
+			if (!is_pageblock_cma(page) &&
+			    (unlikely(current_order >= pageblock_order / 2) ||
+			     start_migratetype == MIGRATE_RECLAIMABLE ||
+			     page_group_by_mobility_disabled)) {
+				int pages;
 				pages = move_freepages_block(zone, page,
-								start_migratetype);
+							     start_migratetype);
 
-				/* Claim the whole block if over half of it is free */
+				/*
+				 * Claim the whole block if over half
+				 * of it is free
+				 */
 				if (pages >= (1 << (pageblock_order-1)) ||
-						page_group_by_mobility_disabled)
+				    page_group_by_mobility_disabled)
 					set_pageblock_migratetype(page,
-								start_migratetype);
+							start_migratetype);
 
 				migratetype = start_migratetype;
 			}
@@ -968,11 +1001,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 			rmv_page_order(page);
 
 			/* Take ownership for orders >= pageblock_order */
-			if (current_order >= pageblock_order)
+			if (current_order >= pageblock_order &&
+			    !is_pageblock_cma(page))
 				change_pageblock_range(page, current_order,
 							start_migratetype);
 
-			expand(zone, page, order, current_order, area, migratetype);
+			expand(zone, page, order, current_order, area,
+			       is_migrate_cma(start_migratetype)
+			     ? start_migratetype : migratetype);
 
 			trace_mm_page_alloc_extfrag(page, order, current_order,
 				start_migratetype, migratetype);
@@ -1044,7 +1080,10 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 			list_add(&page->lru, list);
 		else
 			list_add_tail(&page->lru, list);
-		set_page_private(page, migratetype);
+		if (is_pageblock_cma(page))
+			set_page_private(page, MIGRATE_CMA);
+		else
+			set_page_private(page, migratetype);
 		list = &page->lru;
 	}
 	__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
@@ -1185,9 +1224,16 @@ void free_hot_cold_page(struct page *page, int cold)
 	 * offlined but treat RESERVE as movable pages so we can get those
 	 * areas back if necessary. Otherwise, we may have to free
 	 * excessively into the page allocator
+	 *
+	 * Still, do not change migration type of MIGRATE_CMA pages (if
+	 * they'd be recorded as MIGRATE_MOVABLE an unmovable page could
+	 * be allocated from MIGRATE_CMA block and we don't want to allow
+	 * that).  In this respect, treat MIGRATE_CMA like
+	 * MIGRATE_ISOLATE.
 	 */
 	if (migratetype >= MIGRATE_PCPTYPES) {
-		if (unlikely(migratetype == MIGRATE_ISOLATE)) {
+		if (unlikely(migratetype == MIGRATE_ISOLATE
+			  || is_migrate_cma(migratetype))) {
 			free_one_page(zone, page, 0, migratetype);
 			goto out;
 		}
@@ -1276,7 +1322,9 @@ int split_free_page(struct page *page)
 	if (order >= pageblock_order - 1) {
 		struct page *endpage = page + (1 << order) - 1;
 		for (; page < endpage; page += pageblock_nr_pages)
-			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+			if (!is_pageblock_cma(page))
+				set_pageblock_migratetype(page,
+							  MIGRATE_MOVABLE);
 	}
 
 	return 1 << order;
@@ -5554,8 +5602,8 @@ __count_immobile_pages(struct zone *zone, struct page *page, int count)
 	 */
 	if (zone_idx(zone) == ZONE_MOVABLE)
 		return true;
-
-	if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE)
+	if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE ||
+	    is_pageblock_cma(page))
 		return true;
 
 	pfn = page_to_pfn(page);
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 4/8] mm: MIGRATE_CMA migration type added
@ 2011-08-19 14:27   ` Marek Szyprowski
  0 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-08-19 14:27 UTC (permalink / raw)
  To: linux-arm-kernel

From: Michal Nazarewicz <m.nazarewicz@samsung.com>

The MIGRATE_CMA migration type has two main characteristics:
(i) only movable pages can be allocated from MIGRATE_CMA
pageblocks and (ii) page allocator will never change migration
type of MIGRATE_CMA pageblocks.

This guarantees that a page in a MIGRATE_CMA pageblock can
always be migrated somewhere else (unless there's no memory left
in the system).

It is designed to be used with Contiguous Memory Allocator
(CMA) for allocating big chunks (eg. 10MiB) of physically
contiguous memory.  Once a driver requests contiguous memory,
CMA will migrate pages from MIGRATE_CMA pageblocks.

To minimise the number of migrations, the MIGRATE_CMA migration type
is the last type tried when the page allocator falls back to
migration types other than the one requested.

Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
[m.szyprowski: cleaned up Kconfig, renamed some functions, removed ifdefs]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 include/linux/mmzone.h         |   41 +++++++++++++++---
 include/linux/page-isolation.h |    1 +
 mm/Kconfig                     |    8 +++-
 mm/compaction.c                |   10 +++++
 mm/page_alloc.c                |   88 +++++++++++++++++++++++++++++++---------
 5 files changed, 120 insertions(+), 28 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index be1ac8d..74b7f27 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -35,13 +35,35 @@
  */
 #define PAGE_ALLOC_COSTLY_ORDER 3
 
-#define MIGRATE_UNMOVABLE     0
-#define MIGRATE_RECLAIMABLE   1
-#define MIGRATE_MOVABLE       2
-#define MIGRATE_PCPTYPES      3 /* the number of types on the pcp lists */
-#define MIGRATE_RESERVE       3
-#define MIGRATE_ISOLATE       4 /* can't allocate from here */
-#define MIGRATE_TYPES         5
+enum {
+	MIGRATE_UNMOVABLE,
+	MIGRATE_RECLAIMABLE,
+	MIGRATE_MOVABLE,
+	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
+	MIGRATE_RESERVE = MIGRATE_PCPTYPES,
+	/*
+	 * MIGRATE_CMA migration type is designed to mimic the way
+	 * ZONE_MOVABLE works.  Only movable pages can be allocated
+	 * from MIGRATE_CMA pageblocks and page allocator never
+	 * implicitly change migration type of MIGRATE_CMA pageblock.
+	 *
+	 * The way to use it is to change migratetype of a range of
+	 * pageblocks to MIGRATE_CMA which can be done by
+	 * __free_pageblock_cma() function.  What is important though
+	 * is that a range of pageblocks must be aligned to
+	 * MAX_ORDER_NR_PAGES should the biggest page be bigger than
+	 * a single pageblock.
+	 */
+	MIGRATE_CMA,
+	MIGRATE_ISOLATE,	/* can't allocate from here */
+	MIGRATE_TYPES
+};
+
+#ifdef CONFIG_CMA_MIGRATE_TYPE
+#  define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
+#else
+#  define is_migrate_cma(migratetype) false
+#endif
 
 #define for_each_migratetype_order(order, type) \
 	for (order = 0; order < MAX_ORDER; order++) \
@@ -54,6 +76,11 @@ static inline int get_pageblock_migratetype(struct page *page)
 	return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
 }
 
+static inline bool is_pageblock_cma(struct page *page)
+{
+	return is_migrate_cma(get_pageblock_migratetype(page));
+}
+
 struct free_area {
 	struct list_head	free_list[MIGRATE_TYPES];
 	unsigned long		nr_free;
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index c5d1a7c..856d9cf 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -46,4 +46,5 @@ int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn);
 unsigned long scan_lru_pages(unsigned long start, unsigned long end);
 int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
 
+extern void init_cma_reserved_pageblock(struct page *page);
 #endif
diff --git a/mm/Kconfig b/mm/Kconfig
index f2f1ca1..dd6e1ea 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -189,7 +189,7 @@ config COMPACTION
 config MIGRATION
 	bool "Page migration"
 	def_bool y
-	depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION
+	depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION || CMA_MIGRATE_TYPE
 	help
 	  Allows the migration of the physical location of pages of processes
 	  while the virtual addresses are not changed. This is useful in
@@ -198,6 +198,12 @@ config MIGRATION
 	  pages as migration can relocate pages to satisfy a huge page
 	  allocation instead of reclaiming.
 
+config CMA_MIGRATE_TYPE
+	bool
+	help
+	  This enables the use of the MIGRATE_CMA migrate type, which lets CMA
+	  work on an almost arbitrary memory range and not only inside ZONE_MOVABLE.
+
 config PHYS_ADDR_T_64BIT
 	def_bool 64BIT || ARCH_PHYS_ADDR_T_64BIT
 
diff --git a/mm/compaction.c b/mm/compaction.c
index 6cc604b..9e5cc59 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -119,6 +119,16 @@ static bool suitable_migration_target(struct page *page)
 	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
 		return false;
 
+	/* Keep MIGRATE_CMA alone as well. */
+	/*
+	 * XXX Revisit.  We currently cannot let compaction touch CMA
+	 * pages since compaction insists on changing their migration
+	 * type to MIGRATE_MOVABLE (see split_free_page() called from
+	 * isolate_freepages_block() above).
+	 */
+	if (is_migrate_cma(migratetype))
+		return false;
+
 	/* If the page is a large free page, then allow migration */
 	if (PageBuddy(page) && page_order(page) >= pageblock_order)
 		return true;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 35423c2..c9dfed0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -719,6 +719,29 @@ void __meminit __free_pages_bootmem(struct page *page, unsigned int order)
 	}
 }
 
+#ifdef CONFIG_CMA_MIGRATE_TYPE
+/*
+ * Free a whole pageblock and set its migration type to MIGRATE_CMA.
+ */
+void __init init_cma_reserved_pageblock(struct page *page)
+{
+	struct page *p = page;
+	unsigned i = pageblock_nr_pages;
+
+	prefetchw(p);
+	do {
+		if (--i)
+			prefetchw(p + 1);
+		__ClearPageReserved(p);
+		set_page_count(p, 0);
+	} while (++p, i);
+
+	set_page_refcounted(page);
+	set_pageblock_migratetype(page, MIGRATE_CMA);
+	__free_pages(page, pageblock_order);
+	totalram_pages += pageblock_nr_pages;
+}
+#endif
 
 /*
  * The order of subdivision here is critical for the IO subsystem.
@@ -827,11 +850,11 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
  * This array describes the order lists are fallen back to when
  * the free lists for the desirable migrate type are depleted
  */
-static int fallbacks[MIGRATE_TYPES][MIGRATE_TYPES-1] = {
+static int fallbacks[MIGRATE_TYPES][4] = {
 	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_RESERVE },
 	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_RESERVE },
-	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
-	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE,     MIGRATE_RESERVE,   MIGRATE_RESERVE }, /* Never used */
+	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_CMA    , MIGRATE_RESERVE },
+	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE }, /* Never used */
 };
 
 /*
@@ -926,12 +949,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 	/* Find the largest possible block of pages in the other list */
 	for (current_order = MAX_ORDER-1; current_order >= order;
 						--current_order) {
-		for (i = 0; i < MIGRATE_TYPES - 1; i++) {
+		for (i = 0; i < ARRAY_SIZE(fallbacks[0]); i++) {
 			migratetype = fallbacks[start_migratetype][i];
 
 			/* MIGRATE_RESERVE handled later if necessary */
 			if (migratetype == MIGRATE_RESERVE)
-				continue;
+				break;
 
 			area = &(zone->free_area[current_order]);
 			if (list_empty(&area->free_list[migratetype]))
@@ -946,19 +969,29 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 			 * pages to the preferred allocation list. If falling
 			 * back for a reclaimable kernel allocation, be more
 			 * aggressive about taking ownership of free pages
+			 *
+			 * On the other hand, never change migration
+			 * type of MIGRATE_CMA pageblocks nor move CMA
+			 * pages to different free lists. We don't
+			 * want unmovable pages to be allocated from
+			 * MIGRATE_CMA areas.
 			 */
-			if (unlikely(current_order >= (pageblock_order >> 1)) ||
-					start_migratetype == MIGRATE_RECLAIMABLE ||
-					page_group_by_mobility_disabled) {
-				unsigned long pages;
+			if (!is_pageblock_cma(page) &&
+			    (unlikely(current_order >= pageblock_order / 2) ||
+			     start_migratetype == MIGRATE_RECLAIMABLE ||
+			     page_group_by_mobility_disabled)) {
+				int pages;
 				pages = move_freepages_block(zone, page,
-								start_migratetype);
+							     start_migratetype);
 
-				/* Claim the whole block if over half of it is free */
+				/*
+				 * Claim the whole block if over half
+				 * of it is free
+				 */
 				if (pages >= (1 << (pageblock_order-1)) ||
-						page_group_by_mobility_disabled)
+				    page_group_by_mobility_disabled)
 					set_pageblock_migratetype(page,
-								start_migratetype);
+							start_migratetype);
 
 				migratetype = start_migratetype;
 			}
@@ -968,11 +1001,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 			rmv_page_order(page);
 
 			/* Take ownership for orders >= pageblock_order */
-			if (current_order >= pageblock_order)
+			if (current_order >= pageblock_order &&
+			    !is_pageblock_cma(page))
 				change_pageblock_range(page, current_order,
 							start_migratetype);
 
-			expand(zone, page, order, current_order, area, migratetype);
+			expand(zone, page, order, current_order, area,
+			       is_migrate_cma(start_migratetype)
+			     ? start_migratetype : migratetype);
 
 			trace_mm_page_alloc_extfrag(page, order, current_order,
 				start_migratetype, migratetype);
@@ -1044,7 +1080,10 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 			list_add(&page->lru, list);
 		else
 			list_add_tail(&page->lru, list);
-		set_page_private(page, migratetype);
+		if (is_pageblock_cma(page))
+			set_page_private(page, MIGRATE_CMA);
+		else
+			set_page_private(page, migratetype);
 		list = &page->lru;
 	}
 	__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
@@ -1185,9 +1224,16 @@ void free_hot_cold_page(struct page *page, int cold)
 	 * offlined but treat RESERVE as movable pages so we can get those
 	 * areas back if necessary. Otherwise, we may have to free
 	 * excessively into the page allocator
+	 *
+	 * Still, do not change migration type of MIGRATE_CMA pages (if
+	 * they were recorded as MIGRATE_MOVABLE, an unmovable page could
+	 * be allocated from a MIGRATE_CMA block and we don't want to allow
+	 * that).  In this respect, treat MIGRATE_CMA like
+	 * MIGRATE_ISOLATE.
 	 */
 	if (migratetype >= MIGRATE_PCPTYPES) {
-		if (unlikely(migratetype == MIGRATE_ISOLATE)) {
+		if (unlikely(migratetype == MIGRATE_ISOLATE
+			  || is_migrate_cma(migratetype))) {
 			free_one_page(zone, page, 0, migratetype);
 			goto out;
 		}
@@ -1276,7 +1322,9 @@ int split_free_page(struct page *page)
 	if (order >= pageblock_order - 1) {
 		struct page *endpage = page + (1 << order) - 1;
 		for (; page < endpage; page += pageblock_nr_pages)
-			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+			if (!is_pageblock_cma(page))
+				set_pageblock_migratetype(page,
+							  MIGRATE_MOVABLE);
 	}
 
 	return 1 << order;
@@ -5554,8 +5602,8 @@ __count_immobile_pages(struct zone *zone, struct page *page, int count)
 	 */
 	if (zone_idx(zone) == ZONE_MOVABLE)
 		return true;
-
-	if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE)
+	if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE ||
+	    is_pageblock_cma(page))
 		return true;
 
 	pfn = page_to_pfn(page);
-- 
1.7.1.569.g6f426

^ permalink raw reply related	[flat|nested] 183+ messages in thread
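
The fallback table change above makes MIGRATE_CMA the last resort for movable
allocations only, so unmovable and reclaimable requests can never spill into
CMA pageblocks. As a rough stand-alone illustration of that ordering (not
kernel code; the enum is simplified, MIGRATE_PCPTYPES/MIGRATE_ISOLATE and the
unlikely() annotation are omitted), the table walk done by
__rmqueue_fallback() can be modelled in plain C:

#include <stdio.h>

enum migratetype {
	MIGRATE_UNMOVABLE,
	MIGRATE_RECLAIMABLE,
	MIGRATE_MOVABLE,
	MIGRATE_RESERVE,
	MIGRATE_CMA,
	MIGRATE_TYPES
};

/* MIGRATE_CMA appears only in the MIGRATE_MOVABLE row, and only just
 * before the MIGRATE_RESERVE terminator. */
static const int fallbacks[MIGRATE_TYPES][4] = {
	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_RESERVE },
	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_RESERVE },
	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_CMA, MIGRATE_RESERVE },
	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE },
};

static const char *names[] = {
	"UNMOVABLE", "RECLAIMABLE", "MOVABLE", "RESERVE", "CMA",
};

int main(void)
{
	int i;

	/* Walk the fallback list for a movable allocation, stopping at
	 * MIGRATE_RESERVE exactly as the patched loop does. */
	for (i = 0; i < 4; i++) {
		int mt = fallbacks[MIGRATE_MOVABLE][i];

		printf("try MIGRATE_%s\n", names[mt]);
		if (mt == MIGRATE_RESERVE)
			break;
	}
	return 0;
}

Run, it prints RECLAIMABLE, UNMOVABLE, CMA and finally RESERVE, which is the
"try CMA last" behaviour described in the changelog.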

* [PATCH 4/8] mm: MIGRATE_CMA migration type added
  2011-07-20  8:57 [PATCHv12 " Marek Szyprowski
  2011-07-20  8:57   ` Marek Szyprowski
@ 2011-07-20  8:57   ` Marek Szyprowski
  0 siblings, 0 replies; 183+ messages in thread
From: Marek Szyprowski @ 2011-07-20  8:57 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Chunsang Jeong, Russell King

From: Michal Nazarewicz <m.nazarewicz@samsung.com>

The MIGRATE_CMA migration type has two main characteristics:
(i) only movable pages can be allocated from MIGRATE_CMA
pageblocks and (ii) the page allocator will never change the
migration type of MIGRATE_CMA pageblocks.

This guarantees that a page in a MIGRATE_CMA pageblock can
always be migrated somewhere else (unless there's no memory left
in the system).

It is designed to be used with the Contiguous Memory Allocator
(CMA) for allocating big chunks (e.g. 10MiB) of physically
contiguous memory.  Once a driver requests contiguous memory,
CMA will migrate pages out of MIGRATE_CMA pageblocks.

To minimise the number of migrations, the MIGRATE_CMA migration
type is the last type tried when the page allocator falls back
to migration types other than the one requested.

Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
[m.szyprowski: cleaned up Kconfig, renamed some functions, removed ifdefs]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 include/linux/mmzone.h         |   41 +++++++++++++++---
 include/linux/page-isolation.h |    1 +
 mm/Kconfig                     |    8 +++-
 mm/compaction.c                |   10 +++++
 mm/page_alloc.c                |   88 +++++++++++++++++++++++++++++++---------
 5 files changed, 120 insertions(+), 28 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 9f7c3eb..5152597 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -35,13 +35,35 @@
  */
 #define PAGE_ALLOC_COSTLY_ORDER 3
 
-#define MIGRATE_UNMOVABLE     0
-#define MIGRATE_RECLAIMABLE   1
-#define MIGRATE_MOVABLE       2
-#define MIGRATE_PCPTYPES      3 /* the number of types on the pcp lists */
-#define MIGRATE_RESERVE       3
-#define MIGRATE_ISOLATE       4 /* can't allocate from here */
-#define MIGRATE_TYPES         5
+enum {
+	MIGRATE_UNMOVABLE,
+	MIGRATE_RECLAIMABLE,
+	MIGRATE_MOVABLE,
+	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
+	MIGRATE_RESERVE = MIGRATE_PCPTYPES,
+	/*
+	 * MIGRATE_CMA migration type is designed to mimic the way
+	 * ZONE_MOVABLE works.  Only movable pages can be allocated
+	 * from MIGRATE_CMA pageblocks and the page allocator never
+	 * implicitly changes the migration type of a MIGRATE_CMA pageblock.
+	 *
+	 * The way to use it is to change the migratetype of a range of
+	 * pageblocks to MIGRATE_CMA, which can be done with the
+	 * __free_pageblock_cma() function.  What is important though
+	 * is that the range of pageblocks must be aligned to
+	 * MAX_ORDER_NR_PAGES should the biggest page be bigger than
+	 * a single pageblock.
+	 */
+	MIGRATE_CMA,
+	MIGRATE_ISOLATE,	/* can't allocate from here */
+	MIGRATE_TYPES
+};
+
+#ifdef CONFIG_CMA_MIGRATE_TYPE
+#  define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
+#else
+#  define is_migrate_cma(migratetype) false
+#endif
 
 #define for_each_migratetype_order(order, type) \
 	for (order = 0; order < MAX_ORDER; order++) \
@@ -54,6 +76,11 @@ static inline int get_pageblock_migratetype(struct page *page)
 	return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
 }
 
+static inline bool is_pageblock_cma(struct page *page)
+{
+	return is_migrate_cma(get_pageblock_migratetype(page));
+}
+
 struct free_area {
 	struct list_head	free_list[MIGRATE_TYPES];
 	unsigned long		nr_free;
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index c5d1a7c..856d9cf 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -46,4 +46,5 @@ int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn);
 unsigned long scan_lru_pages(unsigned long start, unsigned long end);
 int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
 
+extern void init_cma_reserved_pageblock(struct page *page);
 #endif
diff --git a/mm/Kconfig b/mm/Kconfig
index 8ca47a5..6ffedd8 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -189,7 +189,7 @@ config COMPACTION
 config MIGRATION
 	bool "Page migration"
 	def_bool y
-	depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION
+	depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION || CMA_MIGRATE_TYPE
 	help
 	  Allows the migration of the physical location of pages of processes
 	  while the virtual addresses are not changed. This is useful in
@@ -198,6 +198,12 @@ config MIGRATION
 	  pages as migration can relocate pages to satisfy a huge page
 	  allocation instead of reclaiming.
 
+config CMA_MIGRATE_TYPE
+	bool
+	help
+	  This enables the use of the MIGRATE_CMA migrate type, which lets CMA
+	  work on an almost arbitrary memory range and not only inside ZONE_MOVABLE.
+
 config PHYS_ADDR_T_64BIT
 	def_bool 64BIT || ARCH_PHYS_ADDR_T_64BIT
 
diff --git a/mm/compaction.c b/mm/compaction.c
index 6cc604b..9e5cc59 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -119,6 +119,16 @@ static bool suitable_migration_target(struct page *page)
 	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
 		return false;
 
+	/* Keep MIGRATE_CMA alone as well. */
+	/*
+	 * XXX Revisit.  We currently cannot let compaction touch CMA
+	 * pages since compaction insists on changing their migration
+	 * type to MIGRATE_MOVABLE (see split_free_page() called from
+	 * isolate_freepages_block() above).
+	 */
+	if (is_migrate_cma(migratetype))
+		return false;
+
 	/* If the page is a large free page, then allow migration */
 	if (PageBuddy(page) && page_order(page) >= pageblock_order)
 		return true;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2cea044..e6c403c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -719,6 +719,29 @@ void __meminit __free_pages_bootmem(struct page *page, unsigned int order)
 	}
 }
 
+#ifdef CONFIG_CMA_MIGRATE_TYPE
+/*
+ * Free the whole pageblock and set its migration type to MIGRATE_CMA.
+ */
+void __init init_cma_reserved_pageblock(struct page *page)
+{
+	struct page *p = page;
+	unsigned i = pageblock_nr_pages;
+
+	prefetchw(p);
+	do {
+		if (--i)
+			prefetchw(p + 1);
+		__ClearPageReserved(p);
+		set_page_count(p, 0);
+	} while (++p, i);
+
+	set_page_refcounted(page);
+	set_pageblock_migratetype(page, MIGRATE_CMA);
+	__free_pages(page, pageblock_order);
+	++totalram_pages;
+}
+#endif
 
 /*
  * The order of subdivision here is critical for the IO subsystem.
@@ -827,11 +850,11 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
  * This array describes the order lists are fallen back to when
  * the free lists for the desirable migrate type are depleted
  */
-static int fallbacks[MIGRATE_TYPES][MIGRATE_TYPES-1] = {
+static int fallbacks[MIGRATE_TYPES][4] = {
 	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_RESERVE },
 	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_RESERVE },
-	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
-	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE,     MIGRATE_RESERVE,   MIGRATE_RESERVE }, /* Never used */
+	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_CMA    , MIGRATE_RESERVE },
+	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE }, /* Never used */
 };
 
 /*
@@ -926,12 +949,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 	/* Find the largest possible block of pages in the other list */
 	for (current_order = MAX_ORDER-1; current_order >= order;
 						--current_order) {
-		for (i = 0; i < MIGRATE_TYPES - 1; i++) {
+		for (i = 0; i < ARRAY_SIZE(fallbacks[0]); i++) {
 			migratetype = fallbacks[start_migratetype][i];
 
 			/* MIGRATE_RESERVE handled later if necessary */
 			if (migratetype == MIGRATE_RESERVE)
-				continue;
+				break;
 
 			area = &(zone->free_area[current_order]);
 			if (list_empty(&area->free_list[migratetype]))
@@ -946,19 +969,29 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 			 * pages to the preferred allocation list. If falling
 			 * back for a reclaimable kernel allocation, be more
 			 * aggressive about taking ownership of free pages
+			 *
+			 * On the other hand, never change migration
+			 * type of MIGRATE_CMA pageblocks nor move CMA
+			 * pages to different free lists. We don't
+			 * want unmovable pages to be allocated from
+			 * MIGRATE_CMA areas.
 			 */
-			if (unlikely(current_order >= (pageblock_order >> 1)) ||
-					start_migratetype == MIGRATE_RECLAIMABLE ||
-					page_group_by_mobility_disabled) {
-				unsigned long pages;
+			if (!is_pageblock_cma(page) &&
+			    (unlikely(current_order >= pageblock_order / 2) ||
+			     start_migratetype == MIGRATE_RECLAIMABLE ||
+			     page_group_by_mobility_disabled)) {
+				int pages;
 				pages = move_freepages_block(zone, page,
-								start_migratetype);
+							     start_migratetype);
 
-				/* Claim the whole block if over half of it is free */
+				/*
+				 * Claim the whole block if over half
+				 * of it is free
+				 */
 				if (pages >= (1 << (pageblock_order-1)) ||
-						page_group_by_mobility_disabled)
+				    page_group_by_mobility_disabled)
 					set_pageblock_migratetype(page,
-								start_migratetype);
+							start_migratetype);
 
 				migratetype = start_migratetype;
 			}
@@ -968,11 +1001,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 			rmv_page_order(page);
 
 			/* Take ownership for orders >= pageblock_order */
-			if (current_order >= pageblock_order)
+			if (current_order >= pageblock_order &&
+			    !is_pageblock_cma(page))
 				change_pageblock_range(page, current_order,
 							start_migratetype);
 
-			expand(zone, page, order, current_order, area, migratetype);
+			expand(zone, page, order, current_order, area,
+			       is_migrate_cma(start_migratetype)
+			     ? start_migratetype : migratetype);
 
 			trace_mm_page_alloc_extfrag(page, order, current_order,
 				start_migratetype, migratetype);
@@ -1044,7 +1080,10 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 			list_add(&page->lru, list);
 		else
 			list_add_tail(&page->lru, list);
-		set_page_private(page, migratetype);
+		if (is_pageblock_cma(page))
+			set_page_private(page, MIGRATE_CMA);
+		else
+			set_page_private(page, migratetype);
 		list = &page->lru;
 	}
 	__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
@@ -1185,9 +1224,16 @@ void free_hot_cold_page(struct page *page, int cold)
 	 * offlined but treat RESERVE as movable pages so we can get those
 	 * areas back if necessary. Otherwise, we may have to free
 	 * excessively into the page allocator
+	 *
+	 * Still, do not change migration type of MIGRATE_CMA pages (if
+	 * they were recorded as MIGRATE_MOVABLE, an unmovable page could
+	 * be allocated from a MIGRATE_CMA block and we don't want to allow
+	 * that).  In this respect, treat MIGRATE_CMA like
+	 * MIGRATE_ISOLATE.
 	 */
 	if (migratetype >= MIGRATE_PCPTYPES) {
-		if (unlikely(migratetype == MIGRATE_ISOLATE)) {
+		if (unlikely(migratetype == MIGRATE_ISOLATE
+			  || is_migrate_cma(migratetype))) {
 			free_one_page(zone, page, 0, migratetype);
 			goto out;
 		}
@@ -1276,7 +1322,9 @@ int split_free_page(struct page *page)
 	if (order >= pageblock_order - 1) {
 		struct page *endpage = page + (1 << order) - 1;
 		for (; page < endpage; page += pageblock_nr_pages)
-			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+			if (!is_pageblock_cma(page))
+				set_pageblock_migratetype(page,
+							  MIGRATE_MOVABLE);
 	}
 
 	return 1 << order;
@@ -5486,8 +5534,8 @@ __count_immobile_pages(struct zone *zone, struct page *page, int count)
 	 */
 	if (zone_idx(zone) == ZONE_MOVABLE)
 		return true;
-
-	if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE)
+	if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE ||
+	    is_pageblock_cma(page))
 		return true;
 
 	pfn = page_to_pfn(page);
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 183+ messages in thread
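
A small point worth noting in the mmzone.h hunk: when CONFIG_CMA_MIGRATE_TYPE
is not selected, is_migrate_cma() collapses to a constant false, so the
callers touched by this patch (the compaction check, split_free_page(),
free_hot_cold_page(), __rmqueue_fallback() via is_pageblock_cma()) stay free
of #ifdefs and the compiler drops the dead branches. Below is a minimal
stand-alone sketch of that pattern, with the unlikely() hint dropped and the
migratetype reduced to a single #define; it is illustrative only, not kernel
code:

#include <stdbool.h>
#include <stdio.h>

#define MIGRATE_CMA 4

/* Mirrors the include/linux/mmzone.h hunk, minus the unlikely() hint. */
#ifdef CONFIG_CMA_MIGRATE_TYPE
# define is_migrate_cma(migratetype) ((migratetype) == MIGRATE_CMA)
#else
# define is_migrate_cma(migratetype) false
#endif

int main(void)
{
	int mt = MIGRATE_CMA;

	/* Callers can test unconditionally; when the Kconfig symbol is
	 * off this branch is statically false. */
	if (is_migrate_cma(mt))
		printf("treated as a CMA pageblock\n");
	else
		printf("migratetype %d not treated as CMA\n", mt);
	return 0;
}

Building it with and without -DCONFIG_CMA_MIGRATE_TYPE shows the branch
switching while the calling code stays identical.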

end of thread, other threads:[~2011-09-26 13:01 UTC | newest]

Thread overview: 183+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-05  7:41 [PATCHv11 0/8] Contiguous Memory Allocator Marek Szyprowski
2011-07-05  7:41 ` Marek Szyprowski
2011-07-05  7:41 ` Marek Szyprowski
2011-07-05  7:41 ` [PATCH 1/8] mm: move some functions from memory_hotplug.c to page_isolation.c Marek Szyprowski
2011-07-05  7:41   ` Marek Szyprowski
2011-07-05  7:41   ` Marek Szyprowski
2011-07-05 11:27   ` Arnd Bergmann
2011-07-05 11:27     ` Arnd Bergmann
2011-07-05 11:27     ` Arnd Bergmann
2011-07-05  7:41 ` [PATCH 2/8] mm: alloc_contig_freed_pages() added Marek Szyprowski
2011-07-05  7:41   ` Marek Szyprowski
2011-07-05  7:41   ` Marek Szyprowski
2011-07-05 11:30   ` Arnd Bergmann
2011-07-05 11:30     ` Arnd Bergmann
2011-07-05 11:30     ` Arnd Bergmann
2011-07-05  7:41 ` [PATCH 3/8] mm: alloc_contig_range() added Marek Szyprowski
2011-07-05  7:41   ` Marek Szyprowski
2011-07-05  7:41   ` Marek Szyprowski
2011-07-05 11:31   ` Arnd Bergmann
2011-07-05 11:31     ` Arnd Bergmann
2011-07-05 11:31     ` Arnd Bergmann
2011-07-05  7:41 ` [PATCH 4/8] mm: MIGRATE_CMA migration type added Marek Szyprowski
2011-07-05  7:41   ` Marek Szyprowski
2011-07-05  7:41   ` Marek Szyprowski
2011-07-05 11:44   ` Arnd Bergmann
2011-07-05 11:44     ` Arnd Bergmann
2011-07-05 11:44     ` Arnd Bergmann
2011-07-05 12:27     ` Russell King - ARM Linux
2011-07-05 12:27       ` Russell King - ARM Linux
2011-07-05 12:27       ` Russell King - ARM Linux
2011-07-05  7:41 ` [PATCH 5/8] mm: MIGRATE_CMA isolation functions added Marek Szyprowski
2011-07-05  7:41   ` Marek Szyprowski
2011-07-05  7:41   ` Marek Szyprowski
2011-07-05 11:45   ` Arnd Bergmann
2011-07-05 11:45     ` Arnd Bergmann
2011-07-05 11:45     ` Arnd Bergmann
2011-07-05  7:41 ` [PATCH 6/8] drivers: add Contiguous Memory Allocator Marek Szyprowski
2011-07-05  7:41   ` Marek Szyprowski
2011-07-05  7:41   ` Marek Szyprowski
2011-07-05 10:24   ` Marek Szyprowski
2011-07-05 10:24     ` Marek Szyprowski
2011-07-05 10:24     ` Marek Szyprowski
2011-07-05 11:02   ` [PATCH 6/8 RESEND] " Marek Szyprowski
2011-07-05 11:02     ` Marek Szyprowski
2011-07-05 11:02     ` Marek Szyprowski
2011-07-05 11:50     ` Arnd Bergmann
2011-07-05 11:50       ` Arnd Bergmann
2011-07-05 11:50       ` Arnd Bergmann
2011-07-05 11:33   ` [PATCH 6/8] " Russell King - ARM Linux
2011-07-05 11:33     ` Russell King - ARM Linux
2011-07-05 11:33     ` Russell King - ARM Linux
2011-07-05 12:27     ` Arnd Bergmann
2011-07-05 12:27       ` Arnd Bergmann
2011-07-05 12:27       ` Arnd Bergmann
2011-07-05 12:30       ` Russell King - ARM Linux
2011-07-05 12:30         ` Russell King - ARM Linux
2011-07-05 12:30         ` Russell King - ARM Linux
2011-07-05 13:58         ` Arnd Bergmann
2011-07-05 13:58           ` Arnd Bergmann
2011-07-05 13:58           ` Arnd Bergmann
2011-07-08 17:25           ` Russell King - ARM Linux
2011-07-08 17:25             ` Russell King - ARM Linux
2011-07-08 17:25             ` Russell King - ARM Linux
2011-07-12 13:39             ` Arnd Bergmann
2011-07-12 13:39               ` Arnd Bergmann
2011-07-12 13:39               ` Arnd Bergmann
2011-08-03 17:43       ` James Bottomley
2011-08-03 17:43         ` James Bottomley
2011-08-03 17:43         ` James Bottomley
2011-09-26 12:06         ` Marek Szyprowski
2011-09-26 12:06           ` Marek Szyprowski
2011-09-26 12:06           ` Marek Szyprowski
2011-09-26 13:00         ` Russell King - ARM Linux
2011-09-26 13:00           ` Russell King - ARM Linux
2011-09-26 13:00           ` Russell King - ARM Linux
2011-07-06 13:58     ` Marek Szyprowski
2011-07-06 13:58       ` Marek Szyprowski
2011-07-06 13:58       ` Marek Szyprowski
2011-07-06 14:09       ` Arnd Bergmann
2011-07-06 14:09         ` Arnd Bergmann
2011-07-06 14:09         ` Arnd Bergmann
2011-07-06 14:23         ` Russell King - ARM Linux
2011-07-06 14:23           ` Russell King - ARM Linux
2011-07-06 14:23           ` Russell King - ARM Linux
2011-07-06 14:37           ` [Linaro-mm-sig] " Nicolas Pitre
2011-07-06 14:37             ` Nicolas Pitre
2011-07-06 14:37             ` Nicolas Pitre
2011-07-06 14:59             ` Arnd Bergmann
2011-07-06 14:59               ` Arnd Bergmann
2011-07-06 14:59               ` Arnd Bergmann
2011-07-09 14:57               ` Janusz Krzysztofik
2011-07-09 14:57                 ` Janusz Krzysztofik
2011-07-09 14:57                 ` Janusz Krzysztofik
2011-07-11 13:47                 ` Marek Szyprowski
2011-07-11 13:47                   ` Marek Szyprowski
2011-07-11 13:47                   ` Marek Szyprowski
2011-07-11 19:01                   ` Janusz Krzysztofik
2011-07-11 19:01                     ` Janusz Krzysztofik
2011-07-11 19:01                     ` Janusz Krzysztofik
2011-07-12  5:34                     ` Marek Szyprowski
2011-07-12  5:34                       ` Marek Szyprowski
2011-07-12  5:34                       ` Marek Szyprowski
2011-07-06 14:51           ` Arnd Bergmann
2011-07-06 14:51             ` Arnd Bergmann
2011-07-06 14:51             ` Arnd Bergmann
2011-07-06 15:48             ` Russell King - ARM Linux
2011-07-06 15:48               ` Russell King - ARM Linux
2011-07-06 15:48               ` Russell King - ARM Linux
2011-07-06 16:05               ` Christoph Lameter
2011-07-06 16:05                 ` Christoph Lameter
2011-07-06 16:05                 ` Christoph Lameter
2011-07-06 16:09                 ` Michal Nazarewicz
2011-07-06 16:09                   ` Michal Nazarewicz
2011-07-06 16:09                   ` Michal Nazarewicz
2011-07-06 16:19                   ` Christoph Lameter
2011-07-06 16:19                     ` Christoph Lameter
2011-07-06 16:19                     ` Christoph Lameter
2011-07-06 17:15                     ` Russell King - ARM Linux
2011-07-06 17:15                       ` Russell King - ARM Linux
2011-07-06 17:15                       ` Russell King - ARM Linux
2011-07-06 19:03                       ` Christoph Lameter
2011-07-06 19:03                         ` Christoph Lameter
2011-07-06 19:03                         ` Christoph Lameter
2011-07-06 17:02                 ` Russell King - ARM Linux
2011-07-06 17:02                   ` Russell King - ARM Linux
2011-07-06 17:02                   ` Russell King - ARM Linux
2011-07-06 16:31               ` Arnd Bergmann
2011-07-06 16:31                 ` Arnd Bergmann
2011-07-06 16:31                 ` Arnd Bergmann
2011-07-06 19:10                 ` Nicolas Pitre
2011-07-06 19:10                   ` Nicolas Pitre
2011-07-06 19:10                   ` Nicolas Pitre
2011-07-06 20:23                   ` [Linaro-mm-sig] " Arnd Bergmann
2011-07-06 20:23                     ` Arnd Bergmann
2011-07-06 20:23                     ` Arnd Bergmann
2011-07-07  5:29                     ` Nicolas Pitre
2011-07-07  5:29                       ` Nicolas Pitre
2011-07-07  5:29                       ` Nicolas Pitre
2011-07-06 14:56         ` Marek Szyprowski
2011-07-06 14:56           ` Marek Szyprowski
2011-07-06 14:56           ` Marek Szyprowski
2011-07-06 15:37           ` Russell King - ARM Linux
2011-07-06 15:37             ` Russell King - ARM Linux
2011-07-06 15:37             ` Russell King - ARM Linux
2011-07-06 15:47             ` Marek Szyprowski
2011-07-06 15:47               ` Marek Szyprowski
2011-07-06 15:47               ` Marek Szyprowski
2011-07-14 12:29   ` Marek Szyprowski
2011-07-14 12:29     ` Marek Szyprowski
2011-07-14 12:29     ` Marek Szyprowski
2011-07-05  7:41 ` [PATCH 7/8] ARM: integrate CMA with dma-mapping subsystem Marek Szyprowski
2011-07-05  7:41   ` Marek Szyprowski
2011-07-05  7:41   ` Marek Szyprowski
2011-07-05 11:50   ` Arnd Bergmann
2011-07-05 11:50     ` Arnd Bergmann
2011-07-05 11:50     ` Arnd Bergmann
2011-07-05  7:41 ` [PATCH 8/8] ARM: S5PV210: example of CMA private area for FIMC device on Goni board Marek Szyprowski
2011-07-05  7:41   ` Marek Szyprowski
2011-07-05  7:41   ` Marek Szyprowski
2011-07-05 11:51   ` Arnd Bergmann
2011-07-05 11:51     ` Arnd Bergmann
2011-07-05 11:51     ` Arnd Bergmann
2011-07-05 12:07 ` [PATCHv11 0/8] Contiguous Memory Allocator Arnd Bergmann
2011-07-05 12:07   ` Arnd Bergmann
2011-07-05 12:07   ` Arnd Bergmann
2011-07-05 12:28   ` Russell King - ARM Linux
2011-07-05 12:28     ` Russell King - ARM Linux
2011-07-05 12:28     ` Russell King - ARM Linux
2011-07-06 22:11   ` Andrew Morton
2011-07-06 22:11     ` Andrew Morton
2011-07-06 22:11     ` Andrew Morton
2011-07-07  7:36     ` Arnd Bergmann
2011-07-07  7:36       ` Arnd Bergmann
2011-07-07  7:36       ` Arnd Bergmann
2011-07-11 13:24     ` Marek Szyprowski
2011-07-11 13:24       ` Marek Szyprowski
2011-07-11 13:24       ` Marek Szyprowski
2011-07-20  8:57 [PATCHv12 " Marek Szyprowski
2011-07-20  8:57 ` [PATCH 4/8] mm: MIGRATE_CMA migration type added Marek Szyprowski
2011-07-20  8:57   ` Marek Szyprowski
2011-07-20  8:57   ` Marek Szyprowski
2011-08-19 14:27 [PATCHv15 0/8] Contiguous Memory Allocator Marek Szyprowski
2011-08-19 14:27 ` [PATCH 4/8] mm: MIGRATE_CMA migration type added Marek Szyprowski
2011-08-19 14:27   ` Marek Szyprowski
2011-08-19 14:27   ` Marek Szyprowski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.