[PATCHv14 0/9] Contiguous Memory Allocator
From: Marek Szyprowski @ 2011-08-12 10:58 UTC
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Shariq Hasnain, Chunsang Jeong

Hello again,

As I promised at the Linaro Sprint meeting, another round of
Contiguous Memory Allocator patches is now available. I hope that this
version finally resolves all pending issues, can be widely tested on
ARM platforms, and can finally be merged into the -next kernel tree for
even more testing.

This version completes the integration of CMA with the DMA mapping
subsystem on the ARM architecture. The issue caused by double mapping
of DMA pages and possible aliasing in the coherent memory mapping has
finally been resolved, both for the GFP_ATOMIC case (allocations come
from a reserved DMA memory pool) and the non-GFP_ATOMIC case
(allocations come from CMA-managed areas).
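
The distinction exists because a CMA allocation may need to migrate
pages and can therefore sleep, which GFP_ATOMIC callers cannot do. A
hypothetical dispatch might look like the sketch below; the function
name and alloc_from_reserved_pool() helper are illustrative, not the
patchset's actual symbols:

    /* Sketch: atomic requests fall back to a pre-reserved coherent
     * pool, everything else can use page migration via CMA. */
    static struct page *alloc_dma_buffer(struct device *dev, size_t size,
                                         gfp_t gfp)
    {
            if (!(gfp & __GFP_WAIT))        /* e.g. GFP_ATOMIC */
                    return alloc_from_reserved_pool(dev, size); /* hypothetical */
            return dma_alloc_from_contiguous(dev, size >> PAGE_SHIFT,
                                             get_order(size));
    }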

All patches are prepared for Linux Kernel v3.1-rc1.

A few words for those who see CMA for the first time:

   The Contiguous Memory Allocator (CMA) makes it possible for device
   drivers to allocate big contiguous chunks of memory after the system
   has booted. 

   The main difference from similar frameworks is that CMA allows the
   memory region reserved for big chunk allocations to be transparently
   reused as regular system memory, so no memory is wasted while no big
   chunk is allocated. Once an allocation request is issued, the
   framework migrates system pages to create the required big chunk of
   physically contiguous memory.
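
   To make this concrete, here is a minimal, hypothetical driver
   fragment showing what this buys a driver author once CMA backs the
   DMA API; the probe function and the 16 MiB buffer size are
   illustrative:

      #include <linux/dma-mapping.h>
      #include <linux/platform_device.h>

      static void *buf;
      static dma_addr_t buf_dma;

      static int example_probe(struct platform_device *pdev)
      {
              /* An ordinary dma_alloc_coherent() call is enough; CMA
               * transparently migrates movable pages out of the
               * reserved region to build the contiguous block. */
              buf = dma_alloc_coherent(&pdev->dev, 16 << 20, &buf_dma,
                                       GFP_KERNEL);
              return buf ? 0 : -ENOMEM;
      }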

   For more information, see the LWN articles
   http://lwn.net/Articles/447405/ and http://lwn.net/Articles/450286/,
   as well as the links to previous versions of the CMA framework
   listed below.

   The CMA framework was initially developed by Michal Nazarewicz at
   the Samsung Poland R&D Center. Since version 9, I have taken over
   the development, as Michal has left the company.

The current version of CMA is a set of helper functions for the DMA
mapping framework that handles allocation of contiguous memory blocks.
The differences between this patchset and Kamezawa's
alloc_contig_pages() are:

1. alloc_contig_pages() requires MAX_ORDER alignment of allocations,
   which may be unsuitable for embedded systems where only a few MiBs
   are required.

   The lack of an alignment requirement means that several threads
   might try to access the same pageblock/page.  To prevent this, CMA
   uses a mutex so that only one allocation/release function may run
   at a time (see the sketch after this list).

2. CMA may use its own migratetype (MIGRATE_CMA) which behaves
   similarly to ZONE_MOVABLE but can be put in arbitrary places.

   This is required for us since we need to define two disjoint memory
   ranges inside system RAM, i.e. in two memory banks (not to be
   confused with nodes).

3. alloc_contig_pages() scans memory in search of a range that could
   be migrated.  CMA, on the other hand, maintains its own allocator to
   decide where to allocate memory for device drivers and then tries to
   migrate pages from that area if needed.  This is not strictly
   required, but I suspect it might be faster.
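
As mentioned in point 1, allocation and release are serialized; a
minimal sketch of that serialization follows (the names here are
illustrative, not the patchset's actual symbols):

    static DEFINE_MUTEX(cma_mutex);

    struct page *cma_alloc_sketch(struct cma *cma, unsigned long count)
    {
            struct page *page;

            /* Without MAX_ORDER alignment, two allocations may touch
             * the same pageblock, so only one allocation/release may
             * run at a time. */
            mutex_lock(&cma_mutex);
            page = __cma_alloc_pages(cma, count); /* hypothetical helper */
            mutex_unlock(&cma_mutex);
            return page;
    }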

The integration with the ARM DMA-mapping subsystem is done on two
levels. During early boot, memory reserved for contiguous areas is
remapped with 2-level page tables, which lets us change the cache
attributes of individual pages from such an area on request. Then, the
DMA mapping subsystem is updated to call dma_alloc_from_contiguous()
instead of alloc_pages() and to perform the page attribute remapping.
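
The allocation path then becomes roughly the following sketch;
dma_alloc_from_contiguous() is the interface added by this series,
while the surrounding helper is simplified and the remapping step is
elided:

    static struct page *__dma_alloc_sketch(struct device *dev, size_t size)
    {
            struct page *page;

            /* Allocate size >> PAGE_SHIFT pages from the device's CMA
             * area instead of calling alloc_pages(). */
            page = dma_alloc_from_contiguous(dev, size >> PAGE_SHIFT,
                                             get_order(size));
            if (!page)
                    return NULL;

            /* The pages would then be remapped with the requested
             * cache attributes, which the 2-level page tables set up
             * during early boot make possible. */
            return page;
    }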

The current version has been tested on a Samsung S5PC110-based Goni
machine with the s5p-fimc V4L2 driver. The driver itself uses the
videobuf2 dma-contig memory allocator, which in turn relies on
dma_alloc_coherent() from the DMA-mapping subsystem. By integrating CMA
with DMA-mapping, we got this driver working with CMA without a single
change required in the driver or the videobuf2-dma-contig allocator.

TODO (optional):
- implement support for contiguous memory areas placed in HIGHMEM zone

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center


Links to previous versions of the patchset:
v13: (internal, intentionally not released)
v12: <http://www.spinics.net/lists/linux-media/msg35674.html>
v11: <http://www.spinics.net/lists/linux-mm/msg21868.html>
v10: <http://www.spinics.net/lists/linux-mm/msg20761.html>
 v9: <http://article.gmane.org/gmane.linux.kernel.mm/60787>
 v8: <http://article.gmane.org/gmane.linux.kernel.mm/56855>
 v7: <http://article.gmane.org/gmane.linux.kernel.mm/55626>
 v6: <http://article.gmane.org/gmane.linux.kernel.mm/55626>
 v5: (intentionally left out as CMA v5 was identical to CMA v4)
 v4: <http://article.gmane.org/gmane.linux.kernel.mm/52010>
 v3: <http://article.gmane.org/gmane.linux.kernel.mm/51573>
 v2: <http://article.gmane.org/gmane.linux.kernel.mm/50986>
 v1: <http://article.gmane.org/gmane.linux.kernel.mm/50669>


Changelog:

v14:
    1. Merged with "ARM: DMA: steal memory for DMA coherent mappings" 
       patch, added support for GFP_ATOMIC allocations.

    2. Added checks for NULL device pointer

v13: (internal, intentionally not released)

v12:
    1. Fixed 2 nasty bugs in dma-contiguous allocator:
       - alignment argument was not passed correctly
       - range for dma_release_from_contiguous was not checked correctly

    2. Added support for an architecture-specific
       dma_contiguous_early_fixup() function

    3. CMA and DMA-mapping integration for the ARM architecture has
       been rewritten to take care of the memory aliasing issue that
       might happen on newer ARM CPUs (mapping the same pages with
       different cache attributes is forbidden). TODO: add support for
       GFP_ATOMIC allocations based on the "ARM: DMA: steal memory for
       DMA coherent mappings" patch and implement support for
       contiguous memory areas placed in the HIGHMEM zone

v11:
    1. Removed genalloc usage and replaced it with direct calls to
       bitmap_* functions, dropped patches that are not needed
       anymore (genalloc extensions)

    2. Moved all contiguous area management code from mm/cma.c
       to drivers/base/dma-contiguous.c

    3. Renamed cm_alloc/free to dma_alloc/release_from_contiguous

    4. Introduced global, system wide (default) contiguous area
       configured with kernel config and kernel cmdline parameters

    5. Simplified initialization to just one function:
       dma_declare_contiguous()

    6. Added example of device private memory contiguous area
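
       For example, board code can now reserve a device-private area
       with a single early-init call; the device, size, and exact
       signature here are illustrative:

           /* Sketch: reserve a 32 MiB contiguous area for one device
            * during early board init (arguments follow the
            * dev/size/base/limit form the interface eventually took). */
           dma_declare_contiguous(&s5p_device_fimc0.dev, 32 << 20, 0, 0);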

v10:
    1. Rebased onto 3.0-rc2 and resolved all conflicts

    2. Simplified CMA to be just a pure memory allocator, for use
       with platform/bus specific subsystems, like dma-mapping.
       Removed all device-specific functions and calls.

    3. Integrated with ARM DMA-mapping subsystem.

    4. Code cleanup here and there.

    5. Removed private context support.

v9: 1. Rebased onto 2.6.39-rc1 and resolved all conflicts

    2. Fixed a bunch of nasty bugs that happened when an allocation
       failed (mainly kernel oopses due to NULL pointer dereferences).

    3. Introduced testing code: cma-regions compatibility layer and
       videobuf2-cma memory allocator module.

v8: 1. The alloc_contig_range() function has now been separated from
       CMA and put in page_alloc.c.  This function tries to migrate
       all LRU pages in the specified range and then allocates the
       range using alloc_contig_freed_pages().

    2. Support for MIGRATE_CMA has been separated from the CMA code.
       I have not tested whether CMA works with ZONE_MOVABLE, but I
       see no reason why it shouldn't.

    3. I have added a @private argument when creating CMA contexts so
       that one can reserve memory and not share it with the rest of
       the system.  This way, CMA acts only as an allocation algorithm.

v7: 1. A lot of the functionality that handled driver->allocator_context
       mapping has been removed from the patchset.  This is not to say
       that this code is not needed; it's just not worth posting
       everything in one patchset.

       Currently, CMA is "just" an allocator.  It uses its own
       migratetype (MIGRATE_CMA) to define ranges of pageblocks which
       behave just like ZONE_MOVABLE but, unlike the latter, can be
       put in arbitrary places.

    2. The migration code that was introduced in the previous version
       actually started working.


v6: 1. Most importantly, v6 introduces support for memory migration.
       The implementation is not yet complete though.

       Migration support means that when CMA is not using the memory
       reserved for it, the page allocator can allocate pages from it.
       When CMA wants to use the memory, the pages have to be moved
       and/or evicted to make room for CMA.

       To make this possible, it must be guaranteed that only movable
       and reclaimable pages are allocated in CMA-controlled regions.
       This is done by introducing a MIGRATE_CMA migrate type that
       guarantees exactly that.

       Some of the migration code is "borrowed" from Kamezawa
       Hiroyuki's alloc_contig_pages() implementation.  The main
       difference is that, thanks to the MIGRATE_CMA migrate type, CMA
       assumes the memory it controls is always movable or
       reclaimable, so it makes allocation decisions regardless of
       whether some pages are actually allocated, and migrates them if
       needed.

       The most interesting patches from the patchset that implement
       the functionality are:

         09/13: mm: alloc_contig_free_pages() added
         10/13: mm: MIGRATE_CMA migration type added
         11/13: mm: MIGRATE_CMA isolation functions added
         12/13: mm: cma: Migration support added [wip]

       Currently, the kernel panics in some situations, which I am
       trying to investigate.

    2. cma_pin() and cma_unpin() functions have been added (after
       a conversation with Johan Mossberg).  The idea is that whenever
       the hardware is not using the memory (no transaction is in
       flight) the chunk can be moved around.  This would allow
       defragmentation to be implemented if desired.  No
       defragmentation algorithm is provided at this time.

    3. Sysfs support has been replaced with debugfs.  I always felt
       unsure about the sysfs interface, and when Greg KH pointed it
       out I finally got around to rewriting it for debugfs.


v5: (intentionally left out as CMA v5 was identical to CMA v4)


v4: 1. The "asterisk" flag has been removed in favour of requiring
       that the platform provide a "*=<regions>" rule in the map
       attribute.

    2. The terminology has been changed slightly, renaming memory
       "kinds" to "types".  In previous revisions the documentation
       said that device drivers define memory kinds; now they define
       memory types.

v3: 1. The command line parameters have been removed (and moved to
       a separate patch, the fourth one).  As a consequence, the
       cma_set_defaults() function has been changed -- it no longer
       accepts a string with a list of regions but an array of regions.

    2. The "asterisk" attribute has been removed.  Now, each region
       has an "asterisk" flag which lets one specify whether the
       region should be considered an "asterisk" region.

    3. SysFS support has been moved to a separate patch (the third one
       in the series) and now also includes a list of regions.

v2: 1. The "cma_map" command line parameter has been removed.  In
       exchange, a SysFS entry has been created under
       kernel/mm/contiguous.

       The intended way of specifying the attributes is
       a cma_set_defaults() function called by platform initialisation
       code.  The "regions" attribute (the string specified by the
       "cma" command line parameter) can be overridden with a command
       line parameter; the other attributes can be changed at run time
       using the SysFS entries.

    2. The behaviour of the "map" attribute has been modified
       slightly.  Currently, if no rule matches a given device, it is
       assigned the regions specified by the "asterisk" attribute,
       which is by default built from the region names given in the
       "regions" attribute.

    3. Devices can register private regions as well as regions that
       can be shared but are not reserved using standard CMA
       mechanisms.  A private region has no name and can be accessed
       only by devices that hold a pointer to it.

    4. The way allocators are registered has changed.  Currently,
       a cma_allocator_register() function is used for that purpose.
       Moreover, allocators are attached to regions the first time
       memory is requested from the region or when the allocator is
       registered, which means that allocators can be dynamic modules
       loaded after the kernel has booted (of course, it won't be
       possible to allocate a chunk of memory from a region if its
       allocator is not loaded).

    5. Index of new functions:

    +static inline dma_addr_t __must_check
    +cma_alloc_from(const char *regions, size_t size,
    +               dma_addr_t alignment)

    +static inline int
    +cma_info_about(struct cma_info *info, const char *regions)

    +int __must_check cma_region_register(struct cma_region *reg);

    +dma_addr_t __must_check
    +cma_alloc_from_region(struct cma_region *reg,
    +                      size_t size, dma_addr_t alignment);

    +int cma_allocator_register(struct cma_allocator *alloc);
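
    For reference, a client of this early (since-removed) API might
    have looked like the sketch below; the region name, size, and
    failure convention are illustrative:

        /* Ask for 4 MiB from the region(s) named "cam"; 0 means no
         * particular alignment.  How failure was reported (0 vs. an
         * IS_ERR_VALUE) depended on the API revision. */
        dma_addr_t addr = cma_alloc_from("cam", 4 << 20, 0);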


Patches in this patchset:

  mm: move some functions from memory_hotplug.c to page_isolation.c
  mm: alloc_contig_freed_pages() added

    Code "stolen" from Kamezawa.  The first patch just moves code
    around and the second provides a function that "allocates"
    already-freed memory.

  mm: alloc_contig_range() added

    This is what Kamezawa asked for: a function that tries to migrate
    all pages from a given range and then uses
    alloc_contig_freed_pages() (defined by the previous commit) to
    allocate those pages.
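
    A caller would use it roughly like this sketch (the migratetype
    argument and free_contig_range() follow the shape the interface
    eventually took upstream and may not match this exact posting):

        /* Empty the PFN range [start_pfn, end_pfn) by migrating its
         * current contents, then take ownership of the pages. */
        int ret = alloc_contig_range(start_pfn, end_pfn, MIGRATE_CMA);
        if (ret)
                return ret;
        /* ... use pfn_to_page(start_pfn) ... */
        free_contig_range(start_pfn, end_pfn - start_pfn);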

  mm: MIGRATE_CMA migration type added
  mm: MIGRATE_CMA isolation functions added

    Introduction of the new migratetype and support for it in CMA.
    MIGRATE_CMA works similarly to ZONE_MOVABLE, except that almost
    any memory range can be marked as one.

  mm: cma: Contiguous Memory Allocator added

    The core CMA code.  It manages CMA contexts and performs memory
    allocations.

  ARM: DMA: steal memory for DMA coherent mappings

    Ancillary patch with a significant DMA-mapping subsystem redesign;
    it resolves the double mapping issue by creating a DMA-exclusive
    memory area.

  ARM: integrate CMA with dma-mapping subsystem

    The main client of the CMA framework; CMA serves as an
    alloc_pages() replacement.

  ARM: S5PV210: example of CMA private area for FIMC device on Goni board

    Example of platform/board-specific code that creates a CMA context
    and assigns it to a particular device.


Patch summary:

KAMEZAWA Hiroyuki (2):
  mm: move some functions from memory_hotplug.c to page_isolation.c
  mm: alloc_contig_freed_pages() added

Marek Szyprowski (3):
  drivers: add Contiguous Memory Allocator
  ARM: integrate CMA with DMA-mapping subsystem
  ARM: S5PV210: example of CMA private area for FIMC device on Goni
    board

Michal Nazarewicz (3):
  mm: alloc_contig_range() added
  mm: MIGRATE_CMA migration type added
  mm: MIGRATE_CMA isolation functions added

Russell King (1):
  ARM: DMA: steal memory for DMA coherent mappings

 arch/Kconfig                          |    3 +
 arch/arm/Kconfig                      |    1 +
 arch/arm/include/asm/device.h         |    3 +
 arch/arm/include/asm/dma-contiguous.h |   33 +++
 arch/arm/include/asm/dma-mapping.h    |    3 +-
 arch/arm/include/asm/mach/map.h       |    3 +
 arch/arm/include/asm/memory.h         |    7 +
 arch/arm/mach-s5pv210/Kconfig         |    1 +
 arch/arm/mach-s5pv210/mach-goni.c     |    4 +
 arch/arm/mm/dma-mapping.c             |  462 ++++++++++++++++++++-------------
 arch/arm/mm/init.c                    |    4 +
 arch/arm/mm/mm.h                      |    5 +
 arch/arm/mm/mmu.c                     |   53 +++-
 drivers/base/Kconfig                  |   77 ++++++
 drivers/base/Makefile                 |    1 +
 drivers/base/dma-contiguous.c         |  396 ++++++++++++++++++++++++++++
 include/linux/dma-contiguous.h        |  106 ++++++++
 include/linux/mmzone.h                |   41 +++-
 include/linux/page-isolation.h        |   51 +++-
 mm/Kconfig                            |    8 +-
 mm/compaction.c                       |   10 +
 mm/memory_hotplug.c                   |  111 --------
 mm/page_alloc.c                       |  287 +++++++++++++++++++--
 mm/page_isolation.c                   |  129 +++++++++-
 24 files changed, 1449 insertions(+), 350 deletions(-)
 create mode 100644 arch/arm/include/asm/dma-contiguous.h
 create mode 100644 drivers/base/dma-contiguous.c
 create mode 100644 include/linux/dma-contiguous.h

-- 
1.7.1.569.g6f426

[PATCH 1/9] mm: move some functions from memory_hotplug.c to page_isolation.c
From: Marek Szyprowski @ 2011-08-12 10:58 UTC
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Shariq Hasnain, Chunsang Jeong

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Memory hotplug contains logic for making pages unused in a specified
range of PFNs, so some of its core logic can be reused for other
purposes, such as allocating a very large contiguous memory block.

This patch moves some functions from mm/memory_hotplug.c to
mm/page_isolation.c.  This makes it easier to add a large-allocation
function to page_isolation.c using the memory-unplug technique.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
[m.nazarewicz: reworded commit message]
Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
[m.szyprowski: rebased and updated to Linux v3.0-rc1]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 include/linux/page-isolation.h |    7 +++
 mm/memory_hotplug.c            |  111 --------------------------------------
 mm/page_isolation.c            |  114 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 121 insertions(+), 111 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 051c1b1..58cdbac 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -33,5 +33,12 @@ test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
 extern int set_migratetype_isolate(struct page *page);
 extern void unset_migratetype_isolate(struct page *page);
 
+/*
+ * For migration.
+ */
+
+int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn);
+unsigned long scan_lru_pages(unsigned long start, unsigned long end);
+int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
 
 #endif
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 6e7d8b2..3419dd6 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -707,117 +707,6 @@ int is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
 }
 
 /*
- * Confirm all pages in a range [start, end) is belongs to the same zone.
- */
-static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
-{
-	unsigned long pfn;
-	struct zone *zone = NULL;
-	struct page *page;
-	int i;
-	for (pfn = start_pfn;
-	     pfn < end_pfn;
-	     pfn += MAX_ORDER_NR_PAGES) {
-		i = 0;
-		/* This is just a CONFIG_HOLES_IN_ZONE check.*/
-		while ((i < MAX_ORDER_NR_PAGES) && !pfn_valid_within(pfn + i))
-			i++;
-		if (i == MAX_ORDER_NR_PAGES)
-			continue;
-		page = pfn_to_page(pfn + i);
-		if (zone && page_zone(page) != zone)
-			return 0;
-		zone = page_zone(page);
-	}
-	return 1;
-}
-
-/*
- * Scanning pfn is much easier than scanning lru list.
- * Scan pfn from start to end and Find LRU page.
- */
-static unsigned long scan_lru_pages(unsigned long start, unsigned long end)
-{
-	unsigned long pfn;
-	struct page *page;
-	for (pfn = start; pfn < end; pfn++) {
-		if (pfn_valid(pfn)) {
-			page = pfn_to_page(pfn);
-			if (PageLRU(page))
-				return pfn;
-		}
-	}
-	return 0;
-}
-
-static struct page *
-hotremove_migrate_alloc(struct page *page, unsigned long private, int **x)
-{
-	/* This should be improooooved!! */
-	return alloc_page(GFP_HIGHUSER_MOVABLE);
-}
-
-#define NR_OFFLINE_AT_ONCE_PAGES	(256)
-static int
-do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
-{
-	unsigned long pfn;
-	struct page *page;
-	int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
-	int not_managed = 0;
-	int ret = 0;
-	LIST_HEAD(source);
-
-	for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
-		if (!pfn_valid(pfn))
-			continue;
-		page = pfn_to_page(pfn);
-		if (!get_page_unless_zero(page))
-			continue;
-		/*
-		 * We can skip free pages. And we can only deal with pages on
-		 * LRU.
-		 */
-		ret = isolate_lru_page(page);
-		if (!ret) { /* Success */
-			put_page(page);
-			list_add_tail(&page->lru, &source);
-			move_pages--;
-			inc_zone_page_state(page, NR_ISOLATED_ANON +
-					    page_is_file_cache(page));
-
-		} else {
-#ifdef CONFIG_DEBUG_VM
-			printk(KERN_ALERT "removing pfn %lx from LRU failed\n",
-			       pfn);
-			dump_page(page);
-#endif
-			put_page(page);
-			/* Because we don't have big zone->lock. we should
-			   check this again here. */
-			if (page_count(page)) {
-				not_managed++;
-				ret = -EBUSY;
-				break;
-			}
-		}
-	}
-	if (!list_empty(&source)) {
-		if (not_managed) {
-			putback_lru_pages(&source);
-			goto out;
-		}
-		/* this function returns # of failed pages */
-		ret = migrate_pages(&source, hotremove_migrate_alloc, 0,
-								true, true);
-		if (ret)
-			putback_lru_pages(&source);
-	}
-out:
-	return ret;
-}
-
-/*
  * remove from free_area[] and mark all as Reserved.
  */
 static int
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 4ae42bb..270a026 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -5,6 +5,9 @@
 #include <linux/mm.h>
 #include <linux/page-isolation.h>
 #include <linux/pageblock-flags.h>
+#include <linux/memcontrol.h>
+#include <linux/migrate.h>
+#include <linux/mm_inline.h>
 #include "internal.h"
 
 static inline struct page *
@@ -139,3 +142,114 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
 	spin_unlock_irqrestore(&zone->lock, flags);
 	return ret ? 0 : -EBUSY;
 }
+
+
+/*
+ * Confirm that all pages in a range [start, end) belong to the same zone.
+ */
+int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long pfn;
+	struct zone *zone = NULL;
+	struct page *page;
+	int i;
+	for (pfn = start_pfn;
+	     pfn < end_pfn;
+	     pfn += MAX_ORDER_NR_PAGES) {
+		i = 0;
+		/* This is just a CONFIG_HOLES_IN_ZONE check.*/
+		while ((i < MAX_ORDER_NR_PAGES) && !pfn_valid_within(pfn + i))
+			i++;
+		if (i == MAX_ORDER_NR_PAGES)
+			continue;
+		page = pfn_to_page(pfn + i);
+		if (zone && page_zone(page) != zone)
+			return 0;
+		zone = page_zone(page);
+	}
+	return 1;
+}
+
+/*
+ * Scanning by pfn is much easier than scanning the LRU list.
+ * Scan pfns from start to end and find the first LRU page.
+ */
+unsigned long scan_lru_pages(unsigned long start, unsigned long end)
+{
+	unsigned long pfn;
+	struct page *page;
+	for (pfn = start; pfn < end; pfn++) {
+		if (pfn_valid(pfn)) {
+			page = pfn_to_page(pfn);
+			if (PageLRU(page))
+				return pfn;
+		}
+	}
+	return 0;
+}
+
+struct page *
+hotremove_migrate_alloc(struct page *page, unsigned long private, int **x)
+{
+	/* This should be improooooved!! */
+	return alloc_page(GFP_HIGHUSER_MOVABLE);
+}
+
+#define NR_OFFLINE_AT_ONCE_PAGES	(256)
+int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long pfn;
+	struct page *page;
+	int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
+	int not_managed = 0;
+	int ret = 0;
+	LIST_HEAD(source);
+
+	for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
+		if (!pfn_valid(pfn))
+			continue;
+		page = pfn_to_page(pfn);
+		if (!get_page_unless_zero(page))
+			continue;
+		/*
+		 * We can skip free pages. And we can only deal with pages on
+		 * LRU.
+		 */
+		ret = isolate_lru_page(page);
+		if (!ret) { /* Success */
+			put_page(page);
+			list_add_tail(&page->lru, &source);
+			move_pages--;
+			inc_zone_page_state(page, NR_ISOLATED_ANON +
+					    page_is_file_cache(page));
+
+		} else {
+#ifdef CONFIG_DEBUG_VM
+			printk(KERN_ALERT "removing pfn %lx from LRU failed\n",
+			       pfn);
+			dump_page(page);
+#endif
+			put_page(page);
+			/* Because we don't hold the big zone->lock, we should
+			   check this again here. */
+			if (page_count(page)) {
+				not_managed++;
+				ret = -EBUSY;
+				break;
+			}
+		}
+	}
+	if (!list_empty(&source)) {
+		if (not_managed) {
+			putback_lru_pages(&source);
+			goto out;
+		}
+		/* this function returns # of failed pages */
+		ret = migrate_pages(&source, hotremove_migrate_alloc, 0,
+								true, true);
+		if (ret)
+			putback_lru_pages(&source);
+	}
+out:
+	return ret;
+}
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 2/9] mm: alloc_contig_freed_pages() added
@ 2011-08-12 10:58   ` Marek Szyprowski
  0 siblings, 0 replies; 72+ messages in thread
From: Marek Szyprowski @ 2011-08-12 10:58 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Shariq Hasnain, Chunsang Jeong

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

This commit introduces the alloc_contig_freed_pages() function,
which allocates (ie. removes from the buddy system) free pages
in a range.  The caller has to guarantee that all pages in the
range are in the buddy system.

Along with this function, a free_contig_pages() function is
provided which frees all (or a subset of) pages allocated
with alloc_contig_freed_pages().

Michal Nazarewicz has modified the function to make it easier
to allocate pages that are not MAX_ORDER_NR_PAGES aligned by
making it return the pfn of one-past-the-last allocated page.
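
For illustration only (not part of this patch), a caller could pair
the two functions roughly as below; grab_range() is a hypothetical
name, and the sketch assumes the range has already been isolated and
emptied so that it sits entirely in the buddy system:

  /*
   * Sketch: 'start' must be the head of a buddy page and every page
   * in [start, end) must already be free (see the VM_BUG_ONs).
   */
  static struct page *grab_range(unsigned long start, unsigned long end)
  {
  	unsigned long outer_end;

  	outer_end = alloc_contig_freed_pages(start, end, GFP_KERNEL);

  	/* The last buddy may reach past 'end'; give the excess back. */
  	if (outer_end != end)
  		free_contig_pages(pfn_to_page(end), outer_end - end);

  	return pfn_to_page(start);
  }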

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 include/linux/page-isolation.h |    3 ++
 mm/page_alloc.c                |   44 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 58cdbac..f1417ed 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -32,6 +32,9 @@ test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
  */
 extern int set_migratetype_isolate(struct page *page);
 extern void unset_migratetype_isolate(struct page *page);
+extern unsigned long alloc_contig_freed_pages(unsigned long start,
+					      unsigned long end, gfp_t flag);
+extern void free_contig_pages(struct page *page, int nr_pages);
 
 /*
  * For migration.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6e8ecb6..ad6ae3f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5668,6 +5668,50 @@ out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
 
+unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
+				       gfp_t flag)
+{
+	unsigned long pfn = start, count;
+	struct page *page;
+	struct zone *zone;
+	int order;
+
+	VM_BUG_ON(!pfn_valid(start));
+	zone = page_zone(pfn_to_page(start));
+
+	spin_lock_irq(&zone->lock);
+
+	page = pfn_to_page(pfn);
+	for (;;) {
+		VM_BUG_ON(page_count(page) || !PageBuddy(page));
+		list_del(&page->lru);
+		order = page_order(page);
+		zone->free_area[order].nr_free--;
+		rmv_page_order(page);
+		__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
+		pfn  += 1 << order;
+		if (pfn >= end)
+			break;
+		VM_BUG_ON(!pfn_valid(pfn));
+		page += 1 << order;
+	}
+
+	spin_unlock_irq(&zone->lock);
+
+	/* After this, pages in the range can be freed one by one */
+	page = pfn_to_page(start);
+	for (count = pfn - start; count; --count, ++page)
+		prep_new_page(page, 0, flag);
+
+	return pfn;
+}
+
+void free_contig_pages(struct page *page, int nr_pages)
+{
+	for (; nr_pages; --nr_pages, ++page)
+		__free_page(page);
+}
+
 #ifdef CONFIG_MEMORY_HOTREMOVE
 /*
  * All pages in the range must be isolated before calling this.
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 3/9] mm: alloc_contig_range() added
@ 2011-08-12 10:58   ` Marek Szyprowski
  0 siblings, 0 replies; 72+ messages in thread
From: Marek Szyprowski @ 2011-08-12 10:58 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Shariq Hasnain, Chunsang Jeong

From: Michal Nazarewicz <m.nazarewicz@samsung.com>

This commit adds the alloc_contig_range() function, which tries
to allocate a given range of pages.  It first tries to migrate all
already-allocated pages that fall in the range, thus freeing them.
Once all pages in the range are freed, they are removed from the
buddy system and are thus allocated for the caller to use.
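
As a usage sketch only (the cma_grab_4mib() wrapper is hypothetical),
a driver that controls the migrate type of the affected pageblocks
could use it along these lines:

  static int cma_grab_4mib(unsigned long start_pfn)
  {
  	unsigned long count = (4 << 20) >> PAGE_SHIFT;	/* 4 MiB of pages */
  	int ret;

  	ret = alloc_contig_range(start_pfn, start_pfn + count, GFP_KERNEL);
  	if (ret)
  		return ret;	/* eg. -EBUSY if migration kept failing */

  	/* Pages [start_pfn, start_pfn + count) now belong to the caller. */

  	free_contig_pages(pfn_to_page(start_pfn), count);
  	return 0;
  }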

Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
[m.szyprowski: renamed some variables for easier code reading]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 include/linux/page-isolation.h |    2 +
 mm/page_alloc.c                |  144 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 146 insertions(+), 0 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index f1417ed..c5d1a7c 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -34,6 +34,8 @@ extern int set_migratetype_isolate(struct page *page);
 extern void unset_migratetype_isolate(struct page *page);
 extern unsigned long alloc_contig_freed_pages(unsigned long start,
 					      unsigned long end, gfp_t flag);
+extern int alloc_contig_range(unsigned long start, unsigned long end,
+			      gfp_t flags);
 extern void free_contig_pages(struct page *page, int nr_pages);
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ad6ae3f..35423c2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5706,6 +5706,150 @@ unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
 	return pfn;
 }
 
+static unsigned long pfn_to_maxpage(unsigned long pfn)
+{
+	return pfn & ~(MAX_ORDER_NR_PAGES - 1);
+}
+
+static unsigned long pfn_to_maxpage_up(unsigned long pfn)
+{
+	return ALIGN(pfn, MAX_ORDER_NR_PAGES);
+}
+
+#define MIGRATION_RETRY	5
+static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
+{
+	int migration_failed = 0, ret;
+	unsigned long pfn = start;
+
+	/*
+	 * Some code "borrowed" from KAMEZAWA Hiroyuki's
+	 * __alloc_contig_pages().
+	 */
+
+	for (;;) {
+		pfn = scan_lru_pages(pfn, end);
+		if (!pfn || pfn >= end)
+			break;
+
+		ret = do_migrate_range(pfn, end);
+		if (!ret) {
+			migration_failed = 0;
+		} else if (ret != -EBUSY
+			|| ++migration_failed >= MIGRATION_RETRY) {
+			return ret;
+		} else {
+			/* There are unstable pages on the pagevec. */
+			lru_add_drain_all();
+			/*
+			 * There may be pages on the pcp lists before
+			 * we mark the range as ISOLATED.
+			 */
+			drain_all_pages();
+		}
+		cond_resched();
+	}
+
+	if (!migration_failed) {
+		/* drop all pages in pagevec and pcp list */
+		lru_add_drain_all();
+		drain_all_pages();
+	}
+
+	/* Make sure all pages are isolated */
+	if (WARN_ON(test_pages_isolated(start, end)))
+		return -EBUSY;
+
+	return 0;
+}
+
+/**
+ * alloc_contig_range() -- tries to allocate given range of pages
+ * @start:	start PFN to allocate
+ * @end:	one-past-the-last PFN to allocate
+ * @flags:	flags passed to alloc_contig_freed_pages().
+ *
+ * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
+ * aligned; however, it is the caller's responsibility to guarantee
+ * that we are the only thread that changes the migrate type of the
+ * pageblocks the pages fall in.
+ *
+ * Returns zero on success or a negative error code.  On success, all
+ * pages whose PFN is in [start, end) are allocated for the caller and
+ * need to be freed with free_contig_pages().
+ */
+int alloc_contig_range(unsigned long start, unsigned long end,
+		       gfp_t flags)
+{
+	unsigned long outer_start, outer_end;
+	int ret;
+
+	/*
+	 * What we do here is we mark all pageblocks in range as
+	 * MIGRATE_ISOLATE.  Because of the way the page allocator works, we
+	 * align the range to MAX_ORDER pages so that page allocator
+	 * won't try to merge buddies from different pageblocks and
+	 * change MIGRATE_ISOLATE to some other migration type.
+	 *
+	 * Once the pageblocks are marked as MIGRATE_ISOLATE, we
+	 * migrate the pages from an unaligned range (ie. pages that
+	 * we are interested in).  This will put all the pages in
+	 * range back to page allocator as MIGRATE_ISOLATE.
+	 *
+	 * When this is done, we take the pages in range from page
+	 * allocator removing them from the buddy system.  This way
+	 * page allocator will never consider using them.
+	 *
+	 * This lets us mark the pageblocks back as
+	 * MIGRATE_CMA/MIGRATE_MOVABLE so that free pages in the
+	 * MAX_ORDER aligned range but not in the unaligned, original
+	 * range are put back to page allocator so that buddy can use
+	 * them.
+	 */
+
+	ret = start_isolate_page_range(pfn_to_maxpage(start),
+				       pfn_to_maxpage_up(end));
+	if (ret)
+		goto done;
+
+	ret = __alloc_contig_migrate_range(start, end);
+	if (ret)
+		goto done;
+
+	/*
+	 * Pages from [start, end) are within a MAX_ORDER_NR_PAGES
+	 * aligned blocks that are marked as MIGRATE_ISOLATE.  What's
+	 * more, all pages in [start, end) are free in page allocator.
+	 * What we are going to do is to allocate all pages from
+	 * [start, end) (that is, remove them from the page allocator).
+	 *
+	 * The only problem is that pages at the beginning and at the
+	 * end of the interesting range may not be aligned with pages that
+	 * page allocator holds, ie. they can be part of higher order
+	 * pages.  Because of this, we reserve the bigger range and
+	 * once this is done free the pages we are not interested in.
+	 */
+
+	ret = 0;
+	while (!PageBuddy(pfn_to_page(start & (~0UL << ret))))
+		if (WARN_ON(++ret >= MAX_ORDER))
+			return -EINVAL;
+
+	outer_start = start & (~0UL << ret);
+	outer_end   = alloc_contig_freed_pages(outer_start, end, flags);
+
+	/* Free head and tail (if any) */
+	if (start != outer_start)
+		free_contig_pages(pfn_to_page(outer_start), start - outer_start);
+	if (end != outer_end)
+		free_contig_pages(pfn_to_page(end), outer_end - end);
+
+	ret = 0;
+done:
+	undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end));
+	return ret;
+}
+
 void free_contig_pages(struct page *page, int nr_pages)
 {
 	for (; nr_pages; --nr_pages, ++page)
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 4/9] mm: MIGRATE_CMA migration type added
@ 2011-08-12 10:58   ` Marek Szyprowski
  0 siblings, 0 replies; 72+ messages in thread
From: Marek Szyprowski @ 2011-08-12 10:58 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Shariq Hasnain, Chunsang Jeong

From: Michal Nazarewicz <m.nazarewicz@samsung.com>

The MIGRATE_CMA migration type has two main characteristics:
(i) only movable pages can be allocated from MIGRATE_CMA
pageblocks and (ii) the page allocator will never change the
migration type of MIGRATE_CMA pageblocks.

This guarantees that a page in a MIGRATE_CMA pageblock can
always be migrated somewhere else (unless there's no memory left
in the system).

It is designed to be used with the Contiguous Memory Allocator
(CMA) for allocating big chunks (eg. 10MiB) of physically
contiguous memory.  Once a driver requests contiguous memory,
CMA will migrate pages from MIGRATE_CMA pageblocks.

To minimise the number of migrations, the MIGRATE_CMA migration
type is the last type tried when the page allocator falls back to
migration types other than the one requested.
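
A minimal sketch of what these guarantees give allocator code (the
helper name below is hypothetical, not part of this patch): movable
allocations may be served from a MIGRATE_CMA pageblock precisely
because every such page can later be migrated out again:

  /* Sketch: may this pageblock back a movable allocation? */
  static bool pageblock_usable_for_movable(struct page *page)
  {
  	int mt = get_pageblock_migratetype(page);

  	return mt == MIGRATE_MOVABLE || is_migrate_cma(mt);
  }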

Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
[m.szyprowski: cleaned up Kconfig, renamed some functions, removed ifdefs]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 include/linux/mmzone.h         |   41 +++++++++++++++---
 include/linux/page-isolation.h |    1 +
 mm/Kconfig                     |    8 +++-
 mm/compaction.c                |   10 +++++
 mm/page_alloc.c                |   88 +++++++++++++++++++++++++++++++---------
 5 files changed, 120 insertions(+), 28 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index be1ac8d..74b7f27 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -35,13 +35,35 @@
  */
 #define PAGE_ALLOC_COSTLY_ORDER 3
 
-#define MIGRATE_UNMOVABLE     0
-#define MIGRATE_RECLAIMABLE   1
-#define MIGRATE_MOVABLE       2
-#define MIGRATE_PCPTYPES      3 /* the number of types on the pcp lists */
-#define MIGRATE_RESERVE       3
-#define MIGRATE_ISOLATE       4 /* can't allocate from here */
-#define MIGRATE_TYPES         5
+enum {
+	MIGRATE_UNMOVABLE,
+	MIGRATE_RECLAIMABLE,
+	MIGRATE_MOVABLE,
+	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
+	MIGRATE_RESERVE = MIGRATE_PCPTYPES,
+	/*
+	 * MIGRATE_CMA migration type is designed to mimic the way
+	 * ZONE_MOVABLE works.  Only movable pages can be allocated
+	 * from MIGRATE_CMA pageblocks and the page allocator never
+	 * implicitly changes the migration type of a MIGRATE_CMA pageblock.
+	 *
+	 * The way to use it is to change migratetype of a range of
+	 * pageblocks to MIGRATE_CMA which can be done by
+	 * __free_pageblock_cma() function.  What is important though
+	 * is that a range of pageblocks must be aligned to
+	 * MAX_ORDER_NR_PAGES should the biggest page be bigger than
+	 * a single pageblock.
+	 */
+	MIGRATE_CMA,
+	MIGRATE_ISOLATE,	/* can't allocate from here */
+	MIGRATE_TYPES
+};
+
+#ifdef CONFIG_CMA_MIGRATE_TYPE
+#  define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
+#else
+#  define is_migrate_cma(migratetype) false
+#endif
 
 #define for_each_migratetype_order(order, type) \
 	for (order = 0; order < MAX_ORDER; order++) \
@@ -54,6 +76,11 @@ static inline int get_pageblock_migratetype(struct page *page)
 	return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
 }
 
+static inline bool is_pageblock_cma(struct page *page)
+{
+	return is_migrate_cma(get_pageblock_migratetype(page));
+}
+
 struct free_area {
 	struct list_head	free_list[MIGRATE_TYPES];
 	unsigned long		nr_free;
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index c5d1a7c..856d9cf 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -46,4 +46,5 @@ int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn);
 unsigned long scan_lru_pages(unsigned long start, unsigned long end);
 int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
 
+extern void init_cma_reserved_pageblock(struct page *page);
 #endif
diff --git a/mm/Kconfig b/mm/Kconfig
index f2f1ca1..dd6e1ea 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -189,7 +189,7 @@ config COMPACTION
 config MIGRATION
 	bool "Page migration"
 	def_bool y
-	depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION
+	depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION || CMA_MIGRATE_TYPE
 	help
 	  Allows the migration of the physical location of pages of processes
 	  while the virtual addresses are not changed. This is useful in
@@ -198,6 +198,12 @@ config MIGRATION
 	  pages as migration can relocate pages to satisfy a huge page
 	  allocation instead of reclaiming.
 
+config CMA_MIGRATE_TYPE
+	bool
+	help
+	  This enables the use of the MIGRATE_CMA migrate type, which lets CMA
+	  work on an almost arbitrary memory range and not only inside ZONE_MOVABLE.
+
 config PHYS_ADDR_T_64BIT
 	def_bool 64BIT || ARCH_PHYS_ADDR_T_64BIT
 
diff --git a/mm/compaction.c b/mm/compaction.c
index 6cc604b..9e5cc59 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -119,6 +119,16 @@ static bool suitable_migration_target(struct page *page)
 	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
 		return false;
 
+	/* Keep MIGRATE_CMA alone as well. */
+	/*
+	 * XXX Revisit.  We currently cannot let compaction touch CMA
+	 * pages since compaction insists on changing their migration
+	 * type to MIGRATE_MOVABLE (see split_free_page() called from
+	 * isolate_freepages_block() above).
+	 */
+	if (is_migrate_cma(migratetype))
+		return false;
+
 	/* If the page is a large free page, then allow migration */
 	if (PageBuddy(page) && page_order(page) >= pageblock_order)
 		return true;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 35423c2..aecf32a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -719,6 +719,29 @@ void __meminit __free_pages_bootmem(struct page *page, unsigned int order)
 	}
 }
 
+#ifdef CONFIG_CMA_MIGRATE_TYPE
+/*
+ * Free the whole pageblock and set its migration type to MIGRATE_CMA.
+ */
+void __init init_cma_reserved_pageblock(struct page *page)
+{
+	struct page *p = page;
+	unsigned i = pageblock_nr_pages;
+
+	prefetchw(p);
+	do {
+		if (--i)
+			prefetchw(p + 1);
+		__ClearPageReserved(p);
+		set_page_count(p, 0);
+	} while (++p, i);
+
+	set_page_refcounted(page);
+	set_pageblock_migratetype(page, MIGRATE_CMA);
+	__free_pages(page, pageblock_order);
+	++totalram_pages;
+}
+#endif
 
 /*
  * The order of subdivision here is critical for the IO subsystem.
@@ -827,11 +850,11 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
  * This array describes the order lists are fallen back to when
  * the free lists for the desirable migrate type are depleted
  */
-static int fallbacks[MIGRATE_TYPES][MIGRATE_TYPES-1] = {
+static int fallbacks[MIGRATE_TYPES][4] = {
 	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_RESERVE },
 	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_RESERVE },
-	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
-	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE,     MIGRATE_RESERVE,   MIGRATE_RESERVE }, /* Never used */
+	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_CMA    , MIGRATE_RESERVE },
+	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE }, /* Never used */
 };
 
 /*
@@ -926,12 +949,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 	/* Find the largest possible block of pages in the other list */
 	for (current_order = MAX_ORDER-1; current_order >= order;
 						--current_order) {
-		for (i = 0; i < MIGRATE_TYPES - 1; i++) {
+		for (i = 0; i < ARRAY_SIZE(fallbacks[0]); i++) {
 			migratetype = fallbacks[start_migratetype][i];
 
 			/* MIGRATE_RESERVE handled later if necessary */
 			if (migratetype == MIGRATE_RESERVE)
-				continue;
+				break;
 
 			area = &(zone->free_area[current_order]);
 			if (list_empty(&area->free_list[migratetype]))
@@ -946,19 +969,29 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 			 * pages to the preferred allocation list. If falling
 			 * back for a reclaimable kernel allocation, be more
 			 * aggressive about taking ownership of free pages
+			 *
+			 * On the other hand, never change migration
+			 * type of MIGRATE_CMA pageblocks nor move CMA
+			 * pages on different free lists. We don't
+			 * want unmovable pages to be allocated from
+			 * MIGRATE_CMA areas.
 			 */
-			if (unlikely(current_order >= (pageblock_order >> 1)) ||
-					start_migratetype == MIGRATE_RECLAIMABLE ||
-					page_group_by_mobility_disabled) {
-				unsigned long pages;
+			if (!is_pageblock_cma(page) &&
+			    (unlikely(current_order >= pageblock_order / 2) ||
+			     start_migratetype == MIGRATE_RECLAIMABLE ||
+			     page_group_by_mobility_disabled)) {
+				int pages;
 				pages = move_freepages_block(zone, page,
-								start_migratetype);
+							     start_migratetype);
 
-				/* Claim the whole block if over half of it is free */
+				/*
+				 * Claim the whole block if over half
+				 * of it is free
+				 */
 				if (pages >= (1 << (pageblock_order-1)) ||
-						page_group_by_mobility_disabled)
+				    page_group_by_mobility_disabled)
 					set_pageblock_migratetype(page,
-								start_migratetype);
+							start_migratetype);
 
 				migratetype = start_migratetype;
 			}
@@ -968,11 +1001,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 			rmv_page_order(page);
 
 			/* Take ownership for orders >= pageblock_order */
-			if (current_order >= pageblock_order)
+			if (current_order >= pageblock_order &&
+			    !is_pageblock_cma(page))
 				change_pageblock_range(page, current_order,
 							start_migratetype);
 
-			expand(zone, page, order, current_order, area, migratetype);
+			expand(zone, page, order, current_order, area,
+			       is_migrate_cma(start_migratetype)
+			     ? start_migratetype : migratetype);
 
 			trace_mm_page_alloc_extfrag(page, order, current_order,
 				start_migratetype, migratetype);
@@ -1044,7 +1080,10 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 			list_add(&page->lru, list);
 		else
 			list_add_tail(&page->lru, list);
-		set_page_private(page, migratetype);
+		if (is_pageblock_cma(page))
+			set_page_private(page, MIGRATE_CMA);
+		else
+			set_page_private(page, migratetype);
 		list = &page->lru;
 	}
 	__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
@@ -1185,9 +1224,16 @@ void free_hot_cold_page(struct page *page, int cold)
 	 * offlined but treat RESERVE as movable pages so we can get those
 	 * areas back if necessary. Otherwise, we may have to free
 	 * excessively into the page allocator
+	 *
+	 * Still, do not change migration type of MIGRATE_CMA pages (if
+	 * they'd be recorded as MIGRATE_MOVABLE an unmovable page could
+	 * be allocated from MIGRATE_CMA block and we don't want to allow
+	 * that).  In this respect, treat MIGRATE_CMA like
+	 * MIGRATE_ISOLATE.
 	 */
 	if (migratetype >= MIGRATE_PCPTYPES) {
-		if (unlikely(migratetype == MIGRATE_ISOLATE)) {
+		if (unlikely(migratetype == MIGRATE_ISOLATE
+			  || is_migrate_cma(migratetype))) {
 			free_one_page(zone, page, 0, migratetype);
 			goto out;
 		}
@@ -1276,7 +1322,9 @@ int split_free_page(struct page *page)
 	if (order >= pageblock_order - 1) {
 		struct page *endpage = page + (1 << order) - 1;
 		for (; page < endpage; page += pageblock_nr_pages)
-			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+			if (!is_pageblock_cma(page))
+				set_pageblock_migratetype(page,
+							  MIGRATE_MOVABLE);
 	}
 
 	return 1 << order;
@@ -5554,8 +5602,8 @@ __count_immobile_pages(struct zone *zone, struct page *page, int count)
 	 */
 	if (zone_idx(zone) == ZONE_MOVABLE)
 		return true;
-
-	if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE)
+	if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE ||
+	    is_pageblock_cma(page))
 		return true;
 
 	pfn = page_to_pfn(page);
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 5/9] mm: MIGRATE_CMA isolation functions added
@ 2011-08-12 10:58   ` Marek Szyprowski
  0 siblings, 0 replies; 72+ messages in thread
From: Marek Szyprowski @ 2011-08-12 10:58 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Shariq Hasnain, Chunsang Jeong

From: Michal Nazarewicz <m.nazarewicz@samsung.com>

This commit changes various functions that switch the migrate
type of pages and pageblocks between MIGRATE_ISOLATE and
MIGRATE_MOVABLE so that they can also work with the MIGRATE_CMA
migrate type.
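
For illustration only (not part of the patch): a sketch of a
hypothetical caller that uses the new migratetype argument to grab and
release a contiguous range lying inside MIGRATE_CMA pageblocks; the
helper names are invented for the example.

/*
 * Illustrative sketch, not from the patch: allocate a physically
 * contiguous PFN range backed by MIGRATE_CMA pageblocks, as the
 * extended alloc_contig_range() interface below permits.
 */
static struct page *grab_cma_range(unsigned long start_pfn, int count)
{
	if (alloc_contig_range(start_pfn, start_pfn + count,
			       GFP_KERNEL, MIGRATE_CMA))
		return NULL;
	return pfn_to_page(start_pfn);
}

static void release_cma_range(struct page *page, int count)
{
	free_contig_pages(page, count);
}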

Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 include/linux/page-isolation.h |   40 +++++++++++++++++++++++++++-------------
 mm/page_alloc.c                |   19 ++++++++++++-------
 mm/page_isolation.c            |   15 ++++++++-------
 3 files changed, 47 insertions(+), 27 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 856d9cf..b2a81fd 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -3,39 +3,53 @@
 
 /*
  * Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE.
- * If specified range includes migrate types other than MOVABLE,
+ * If the specified range includes migrate types other than MOVABLE or CMA,
  * this will fail with -EBUSY.
  *
 * For isolating all pages in the range finally, the caller has to
 * free all pages in the range.  test_pages_isolated() can be used
 * to test it.
  */
-extern int
-start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn);
+int __start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			       unsigned migratetype);
+
+static inline int
+start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+	return __start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+}
+
+int __undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			      unsigned migratetype);
 
 /*
  * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
  * target range is [start_pfn, end_pfn)
  */
-extern int
-undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn);
+static inline int
+undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+	return __undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+}
 
 /*
- * test all pages in [start_pfn, end_pfn)are isolated or not.
+ * Test all pages in [start_pfn, end_pfn) are isolated or not.
  */
-extern int
-test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
+int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
 
 /*
- * Internal funcs.Changes pageblock's migrate type.
- * Please use make_pagetype_isolated()/make_pagetype_movable().
+ * Internal functions. Changes pageblock's migrate type.
  */
-extern int set_migratetype_isolate(struct page *page);
-extern void unset_migratetype_isolate(struct page *page);
+int set_migratetype_isolate(struct page *page);
+void __unset_migratetype_isolate(struct page *page, unsigned migratetype);
+static inline void unset_migratetype_isolate(struct page *page)
+{
+	__unset_migratetype_isolate(page, MIGRATE_MOVABLE);
+}
 extern unsigned long alloc_contig_freed_pages(unsigned long start,
 					      unsigned long end, gfp_t flag);
 extern int alloc_contig_range(unsigned long start, unsigned long end,
-			      gfp_t flags);
+			      gfp_t flags, unsigned migratetype);
 extern void free_contig_pages(struct page *page, int nr_pages);
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index aecf32a..e3d756a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5702,7 +5702,7 @@ out:
 	return ret;
 }
 
-void unset_migratetype_isolate(struct page *page)
+void __unset_migratetype_isolate(struct page *page, unsigned migratetype)
 {
 	struct zone *zone;
 	unsigned long flags;
@@ -5710,8 +5710,8 @@ void unset_migratetype_isolate(struct page *page)
 	spin_lock_irqsave(&zone->lock, flags);
 	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
 		goto out;
-	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
-	move_freepages_block(zone, page, MIGRATE_MOVABLE);
+	set_pageblock_migratetype(page, migratetype);
+	move_freepages_block(zone, page, migratetype);
 out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
@@ -5816,6 +5816,10 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
  * @start:	start PFN to allocate
  * @end:	one-past-the-last PFN to allocate
  * @flags:	flags passed to alloc_contig_freed_pages().
+ * @migratetype:	migratetype of the underlying pageblocks (either
+ *			#MIGRATE_MOVABLE or #MIGRATE_CMA).  All pageblocks
+ *			in range must have the same migratetype and it must
+ *			be either of the two.
  *
  * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
 * aligned, however it's the caller's responsibility to guarantee that we
@@ -5827,7 +5831,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
  * need to be freed with free_contig_pages().
  */
 int alloc_contig_range(unsigned long start, unsigned long end,
-		       gfp_t flags)
+		       gfp_t flags, unsigned migratetype)
 {
 	unsigned long outer_start, outer_end;
 	int ret;
@@ -5855,8 +5859,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	 * them.
 	 */
 
-	ret = start_isolate_page_range(pfn_to_maxpage(start),
-				       pfn_to_maxpage_up(end));
+	ret = __start_isolate_page_range(pfn_to_maxpage(start),
+					 pfn_to_maxpage_up(end), migratetype);
 	if (ret)
 		goto done;
 
@@ -5894,7 +5898,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 
 	ret = 0;
 done:
-	undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end));
+	__undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end),
+				  migratetype);
 	return ret;
 }
 
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 270a026..e232b25 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -23,10 +23,11 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
 }
 
 /*
- * start_isolate_page_range() -- make page-allocation-type of range of pages
+ * __start_isolate_page_range() -- make page-allocation-type of range of pages
  * to be MIGRATE_ISOLATE.
  * @start_pfn: The lower PFN of the range to be isolated.
  * @end_pfn: The upper PFN of the range to be isolated.
+ * @migratetype: migrate type to set in error recovery.
  *
  * Making page-allocation-type to be MIGRATE_ISOLATE means free pages in
  * the range will never be allocated. Any free pages and pages freed in the
@@ -35,8 +36,8 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * start_pfn/end_pfn must be aligned to pageblock_order.
  * Returns 0 on success and -EBUSY if any part of range cannot be isolated.
  */
-int
-start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+int __start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			       unsigned migratetype)
 {
 	unsigned long pfn;
 	unsigned long undo_pfn;
@@ -59,7 +60,7 @@ undo:
 	for (pfn = start_pfn;
 	     pfn < undo_pfn;
 	     pfn += pageblock_nr_pages)
-		unset_migratetype_isolate(pfn_to_page(pfn));
+		__unset_migratetype_isolate(pfn_to_page(pfn), migratetype);
 
 	return -EBUSY;
 }
@@ -67,8 +68,8 @@ undo:
 /*
  * Make isolated pages available again.
  */
-int
-undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+int __undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			      unsigned migratetype)
 {
 	unsigned long pfn;
 	struct page *page;
@@ -80,7 +81,7 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
 		page = __first_valid_page(pfn, pageblock_nr_pages);
 		if (!page || get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
 			continue;
-		unset_migratetype_isolate(page);
+		__unset_migratetype_isolate(page, migratetype);
 	}
 	return 0;
 }
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 5/9] mm: MIGRATE_CMA isolation functions added
@ 2011-08-12 10:58   ` Marek Szyprowski
  0 siblings, 0 replies; 72+ messages in thread
From: Marek Szyprowski @ 2011-08-12 10:58 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Shariq Hasnain, Chunsang Jeong

From: Michal Nazarewicz <m.nazarewicz@samsung.com>

This commit changes various functions that change pages and
pageblocks migrate type between MIGRATE_ISOLATE and
MIGRATE_MOVABLE in such a way as to allow to work with
MIGRATE_CMA migrate type.

Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 include/linux/page-isolation.h |   40 +++++++++++++++++++++++++++-------------
 mm/page_alloc.c                |   19 ++++++++++++-------
 mm/page_isolation.c            |   15 ++++++++-------
 3 files changed, 47 insertions(+), 27 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 856d9cf..b2a81fd 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -3,39 +3,53 @@
 
 /*
  * Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE.
- * If specified range includes migrate types other than MOVABLE,
+ * If specified range includes migrate types other than MOVABLE or CMA,
  * this will fail with -EBUSY.
  *
  * For isolating all pages in the range finally, the caller have to
  * free all pages in the range. test_page_isolated() can be used for
  * test it.
  */
-extern int
-start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn);
+int __start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			       unsigned migratetype);
+
+static inline int
+start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+	return __start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+}
+
+int __undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			      unsigned migratetype);
 
 /*
  * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
  * target range is [start_pfn, end_pfn)
  */
-extern int
-undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn);
+static inline int
+undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+	return __undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+}
 
 /*
- * test all pages in [start_pfn, end_pfn)are isolated or not.
+ * Test all pages in [start_pfn, end_pfn) are isolated or not.
  */
-extern int
-test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
+int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
 
 /*
- * Internal funcs.Changes pageblock's migrate type.
- * Please use make_pagetype_isolated()/make_pagetype_movable().
+ * Internal functions. Changes pageblock's migrate type.
  */
-extern int set_migratetype_isolate(struct page *page);
-extern void unset_migratetype_isolate(struct page *page);
+int set_migratetype_isolate(struct page *page);
+void __unset_migratetype_isolate(struct page *page, unsigned migratetype);
+static inline void unset_migratetype_isolate(struct page *page)
+{
+	__unset_migratetype_isolate(page, MIGRATE_MOVABLE);
+}
 extern unsigned long alloc_contig_freed_pages(unsigned long start,
 					      unsigned long end, gfp_t flag);
 extern int alloc_contig_range(unsigned long start, unsigned long end,
-			      gfp_t flags);
+			      gfp_t flags, unsigned migratetype);
 extern void free_contig_pages(struct page *page, int nr_pages);
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index aecf32a..e3d756a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5702,7 +5702,7 @@ out:
 	return ret;
 }
 
-void unset_migratetype_isolate(struct page *page)
+void __unset_migratetype_isolate(struct page *page, unsigned migratetype)
 {
 	struct zone *zone;
 	unsigned long flags;
@@ -5710,8 +5710,8 @@ void unset_migratetype_isolate(struct page *page)
 	spin_lock_irqsave(&zone->lock, flags);
 	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
 		goto out;
-	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
-	move_freepages_block(zone, page, MIGRATE_MOVABLE);
+	set_pageblock_migratetype(page, migratetype);
+	move_freepages_block(zone, page, migratetype);
 out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
@@ -5816,6 +5816,10 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
  * @start:	start PFN to allocate
  * @end:	one-past-the-last PFN to allocate
  * @flags:	flags passed to alloc_contig_freed_pages().
+ * @migratetype:	migratetype of the underlaying pageblocks (either
+ *			#MIGRATE_MOVABLE or #MIGRATE_CMA).  All pageblocks
+ *			in range must have the same migratetype and it must
+ *			be either of the two.
  *
  * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
  * aligned, hovewer it's callers responsibility to guarantee that we
@@ -5827,7 +5831,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
  * need to be freed with free_contig_pages().
  */
 int alloc_contig_range(unsigned long start, unsigned long end,
-		       gfp_t flags)
+		       gfp_t flags, unsigned migratetype)
 {
 	unsigned long outer_start, outer_end;
 	int ret;
@@ -5855,8 +5859,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	 * them.
 	 */
 
-	ret = start_isolate_page_range(pfn_to_maxpage(start),
-				       pfn_to_maxpage_up(end));
+	ret = __start_isolate_page_range(pfn_to_maxpage(start),
+					 pfn_to_maxpage_up(end), migratetype);
 	if (ret)
 		goto done;
 
@@ -5894,7 +5898,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 
 	ret = 0;
 done:
-	undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end));
+	__undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end),
+				  migratetype);
 	return ret;
 }
 
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 270a026..e232b25 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -23,10 +23,11 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
 }
 
 /*
- * start_isolate_page_range() -- make page-allocation-type of range of pages
+ * __start_isolate_page_range() -- make page-allocation-type of range of pages
  * to be MIGRATE_ISOLATE.
  * @start_pfn: The lower PFN of the range to be isolated.
  * @end_pfn: The upper PFN of the range to be isolated.
+ * @migratetype: migrate type to set in error recovery.
  *
  * Making page-allocation-type to be MIGRATE_ISOLATE means free pages in
  * the range will never be allocated. Any free pages and pages freed in the
@@ -35,8 +36,8 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * start_pfn/end_pfn must be aligned to pageblock_order.
  * Returns 0 on success and -EBUSY if any part of range cannot be isolated.
  */
-int
-start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+int __start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			       unsigned migratetype)
 {
 	unsigned long pfn;
 	unsigned long undo_pfn;
@@ -59,7 +60,7 @@ undo:
 	for (pfn = start_pfn;
 	     pfn < undo_pfn;
 	     pfn += pageblock_nr_pages)
-		unset_migratetype_isolate(pfn_to_page(pfn));
+		__unset_migratetype_isolate(pfn_to_page(pfn), migratetype);
 
 	return -EBUSY;
 }
@@ -67,8 +68,8 @@ undo:
 /*
  * Make isolated pages available again.
  */
-int
-undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+int __undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			      unsigned migratetype)
 {
 	unsigned long pfn;
 	struct page *page;
@@ -80,7 +81,7 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
 		page = __first_valid_page(pfn, pageblock_nr_pages);
 		if (!page || get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
 			continue;
-		unset_migratetype_isolate(page);
+		__unset_migratetype_isolate(page, migratetype);
 	}
 	return 0;
 }
-- 
1.7.1.569.g6f426

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 5/9] mm: MIGRATE_CMA isolation functions added
@ 2011-08-12 10:58   ` Marek Szyprowski
  0 siblings, 0 replies; 72+ messages in thread
From: Marek Szyprowski @ 2011-08-12 10:58 UTC (permalink / raw)
  To: linux-arm-kernel

From: Michal Nazarewicz <m.nazarewicz@samsung.com>

This commit changes various functions that change pages and
pageblocks migrate type between MIGRATE_ISOLATE and
MIGRATE_MOVABLE in such a way as to allow to work with
MIGRATE_CMA migrate type.

Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 include/linux/page-isolation.h |   40 +++++++++++++++++++++++++++-------------
 mm/page_alloc.c                |   19 ++++++++++++-------
 mm/page_isolation.c            |   15 ++++++++-------
 3 files changed, 47 insertions(+), 27 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 856d9cf..b2a81fd 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -3,39 +3,53 @@
 
 /*
  * Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE.
- * If specified range includes migrate types other than MOVABLE,
+ * If specified range includes migrate types other than MOVABLE or CMA,
  * this will fail with -EBUSY.
  *
  * For isolating all pages in the range finally, the caller have to
  * free all pages in the range. test_page_isolated() can be used for
  * test it.
  */
-extern int
-start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn);
+int __start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			       unsigned migratetype);
+
+static inline int
+start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+	return __start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+}
+
+int __undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			      unsigned migratetype);
 
 /*
  * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
  * target range is [start_pfn, end_pfn)
  */
-extern int
-undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn);
+static inline int
+undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+	return __undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+}
 
 /*
- * test all pages in [start_pfn, end_pfn)are isolated or not.
+ * Test all pages in [start_pfn, end_pfn) are isolated or not.
  */
-extern int
-test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
+int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
 
 /*
- * Internal funcs.Changes pageblock's migrate type.
- * Please use make_pagetype_isolated()/make_pagetype_movable().
+ * Internal functions. Changes pageblock's migrate type.
  */
-extern int set_migratetype_isolate(struct page *page);
-extern void unset_migratetype_isolate(struct page *page);
+int set_migratetype_isolate(struct page *page);
+void __unset_migratetype_isolate(struct page *page, unsigned migratetype);
+static inline void unset_migratetype_isolate(struct page *page)
+{
+	__unset_migratetype_isolate(page, MIGRATE_MOVABLE);
+}
 extern unsigned long alloc_contig_freed_pages(unsigned long start,
 					      unsigned long end, gfp_t flag);
 extern int alloc_contig_range(unsigned long start, unsigned long end,
-			      gfp_t flags);
+			      gfp_t flags, unsigned migratetype);
 extern void free_contig_pages(struct page *page, int nr_pages);
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index aecf32a..e3d756a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5702,7 +5702,7 @@ out:
 	return ret;
 }
 
-void unset_migratetype_isolate(struct page *page)
+void __unset_migratetype_isolate(struct page *page, unsigned migratetype)
 {
 	struct zone *zone;
 	unsigned long flags;
@@ -5710,8 +5710,8 @@ void unset_migratetype_isolate(struct page *page)
 	spin_lock_irqsave(&zone->lock, flags);
 	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
 		goto out;
-	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
-	move_freepages_block(zone, page, MIGRATE_MOVABLE);
+	set_pageblock_migratetype(page, migratetype);
+	move_freepages_block(zone, page, migratetype);
 out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
@@ -5816,6 +5816,10 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
  * @start:	start PFN to allocate
  * @end:	one-past-the-last PFN to allocate
  * @flags:	flags passed to alloc_contig_freed_pages().
+ * @migratetype:	migratetype of the underlaying pageblocks (either
+ *			#MIGRATE_MOVABLE or #MIGRATE_CMA).  All pageblocks
+ *			in range must have the same migratetype and it must
+ *			be either of the two.
  *
  * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
  * aligned, hovewer it's callers responsibility to guarantee that we
@@ -5827,7 +5831,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
  * need to be freed with free_contig_pages().
  */
 int alloc_contig_range(unsigned long start, unsigned long end,
-		       gfp_t flags)
+		       gfp_t flags, unsigned migratetype)
 {
 	unsigned long outer_start, outer_end;
 	int ret;
@@ -5855,8 +5859,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	 * them.
 	 */
 
-	ret = start_isolate_page_range(pfn_to_maxpage(start),
-				       pfn_to_maxpage_up(end));
+	ret = __start_isolate_page_range(pfn_to_maxpage(start),
+					 pfn_to_maxpage_up(end), migratetype);
 	if (ret)
 		goto done;
 
@@ -5894,7 +5898,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 
 	ret = 0;
 done:
-	undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end));
+	__undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end),
+				  migratetype);
 	return ret;
 }
 
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 270a026..e232b25 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -23,10 +23,11 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
 }
 
 /*
- * start_isolate_page_range() -- make page-allocation-type of range of pages
+ * __start_isolate_page_range() -- make page-allocation-type of range of pages
  * to be MIGRATE_ISOLATE.
  * @start_pfn: The lower PFN of the range to be isolated.
  * @end_pfn: The upper PFN of the range to be isolated.
+ * @migratetype: migratetype to restore in error recovery.
  *
  * Making page-allocation-type to be MIGRATE_ISOLATE means free pages in
  * the range will never be allocated. Any free pages and pages freed in the
@@ -35,8 +36,8 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * start_pfn/end_pfn must be aligned to pageblock_order.
  * Returns 0 on success and -EBUSY if any part of range cannot be isolated.
  */
-int
-start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+int __start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			       unsigned migratetype)
 {
 	unsigned long pfn;
 	unsigned long undo_pfn;
@@ -59,7 +60,7 @@ undo:
 	for (pfn = start_pfn;
 	     pfn < undo_pfn;
 	     pfn += pageblock_nr_pages)
-		unset_migratetype_isolate(pfn_to_page(pfn));
+		__unset_migratetype_isolate(pfn_to_page(pfn), migratetype);
 
 	return -EBUSY;
 }
@@ -67,8 +68,8 @@ undo:
 /*
  * Make isolated pages available again.
  */
-int
-undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+int __undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			      unsigned migratetype)
 {
 	unsigned long pfn;
 	struct page *page;
@@ -80,7 +81,7 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
 		page = __first_valid_page(pfn, pageblock_nr_pages);
 		if (!page || get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
 			continue;
-		unset_migratetype_isolate(page);
+		__unset_migratetype_isolate(page, migratetype);
 	}
 	return 0;
 }
-- 
1.7.1.569.g6f426
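
To illustrate the reworked interface, here is a minimal sketch of a
hypothetical caller (not part of the patch itself; the call below matches
how the CMA patch later in this series invokes it):

	int grab_range(unsigned long pfn, unsigned long nr)
	{
		int ret;

		/* Pageblocks in [pfn, pfn + nr) were set up as MIGRATE_CMA
		 * at boot; movable pages in the range are migrated away
		 * before the range is handed out. */
		ret = alloc_contig_range(pfn, pfn + nr, 0, MIGRATE_CMA);
		if (ret)
			return ret;

		/* ... use the pages ... */

		free_contig_pages(pfn_to_page(pfn), nr);
		return 0;
	}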

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 6/9] drivers: add Contiguous Memory Allocator
  2011-08-12 10:58 ` Marek Szyprowski
  (?)
@ 2011-08-12 10:58   ` Marek Szyprowski
  -1 siblings, 0 replies; 72+ messages in thread
From: Marek Szyprowski @ 2011-08-12 10:58 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Shariq Hasnain, Chunsang Jeong

The Contiguous Memory Allocator is a set of helper functions for the DMA
mapping framework that improves allocation of physically contiguous memory
chunks.

CMA grabs memory on system boot, marks it with CMA_MIGRATE_TYPE and gives
it back to the system. The kernel is allowed to allocate movable pages
within CMA's managed memory, so that it can be used, for example, for page
cache when the DMA mapping framework does not use it. On a
dma_alloc_from_contiguous() request such pages are migrated out of the CMA
area to free the required contiguous block and fulfill the request. This
makes it possible to allocate large contiguous chunks of memory at any
time, assuming that enough free memory is available in the system.
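
As a rough illustration only (a sketch, not part of the patch; drivers are
expected to go through the dma-mapping API rather than call these helpers
directly), allocating and releasing a 16-page, 64 KiB aligned buffer would
look like:

	struct page *pages;

	/* 16 pages, aligned to order 4 (64 KiB with 4 KiB pages) */
	pages = dma_alloc_from_contiguous(dev, 16, 4);
	if (pages) {
		/* ... program the device with page_to_phys(pages) ... */
		dma_release_from_contiguous(dev, pages, 16);
	}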

This code is heavily based on earlier works by Michal Nazarewicz.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 arch/Kconfig                   |    3 +
 drivers/base/Kconfig           |   77 ++++++++
 drivers/base/Makefile          |    1 +
 drivers/base/dma-contiguous.c  |  396 ++++++++++++++++++++++++++++++++++++++++
 include/linux/dma-contiguous.h |  106 +++++++++++
 5 files changed, 583 insertions(+), 0 deletions(-)
 create mode 100644 drivers/base/dma-contiguous.c
 create mode 100644 include/linux/dma-contiguous.h

diff --git a/arch/Kconfig b/arch/Kconfig
index 4b0669c..a3b39a2 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -124,6 +124,9 @@ config HAVE_ARCH_TRACEHOOK
 config HAVE_DMA_ATTRS
 	bool
 
+config HAVE_DMA_CONTIGUOUS
+	bool
+
 config USE_GENERIC_SMP_HELPERS
 	bool
 
diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 21cf46f..02c0552 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -174,4 +174,81 @@ config SYS_HYPERVISOR
 
 source "drivers/base/regmap/Kconfig"
 
+config CMA
+	bool "Contiguous Memory Allocator"
+	depends on HAVE_DMA_CONTIGUOUS && HAVE_MEMBLOCK
+	select MIGRATION
+	select CMA_MIGRATE_TYPE
+	help
+	  This enables the Contiguous Memory Allocator which allows drivers
+	  to allocate big physically-contiguous blocks of memory for use with
+	  hardware components that support neither I/O mapping nor scatter-gather.
+
+	  For more information see <include/linux/dma-contiguous.h>.
+	  If unsure, say "n".
+
+if CMA
+
+config CMA_DEBUG
+	bool "CMA debug messages (DEVELOPMENT)"
+	help
+	  Turns on debug messages in CMA.  This produces KERN_DEBUG
+	  messages for every CMA call as well as various messages while
+	  processing calls such as dma_alloc_from_contiguous().
+	  This option does not affect warning and error messages.
+
+comment "Default contiguous memory area size:"
+
+config CMA_SIZE_ABSOLUTE
+	int "Absolute size (in MiB)"
+	default 16
+	help
+	  Defines the size (in MiB) of the default memory area for Contiguous
+	  Memory Allocator.
+
+config CMA_SIZE_PERCENTAGE
+	int "Percentage of total memory"
+	default 10
+	help
+	  Defines the size of the default memory area for Contiguous Memory
+	  Allocator as a percentage of the total memory in the system.
+
+choice
+	prompt "Selected region size"
+	default CMA_SIZE_SEL_ABSOLUTE
+
+config CMA_SIZE_SEL_ABSOLUTE
+	bool "Use absolute value only"
+
+config CMA_SIZE_SEL_PERCENTAGE
+	bool "Use percentage value only"
+
+config CMA_SIZE_SEL_MIN
+	bool "Use lower value (minimum)"
+
+config CMA_SIZE_SEL_MAX
+	bool "Use higher value (maximum)"
+
+endchoice
+
+config CMA_ALIGNMENT
+	int "Maximum PAGE_SIZE order of alignment for contiguous buffers"
+	range 4 9
+	default 8
+	help
+	  The DMA mapping framework by default aligns all buffers to the
+	  smallest PAGE_SIZE order which is greater than or equal to the
+	  requested buffer size. This works well for buffers up to a few
+	  hundred kilobytes, but for larger buffers it is just a waste of
+	  memory. With this parameter you can specify the maximum PAGE_SIZE
+	  order for contiguous buffers. Larger buffers will be aligned only
+	  to this specified order. The order is expressed as a power of two
+	  multiplied by the PAGE_SIZE.
+
+	  For example, if your system defaults to 4KiB pages, the order value
+	  of 8 means that the buffers will be aligned up to 1MiB only.
+
+	  If unsure, leave the default value "8".
+
+endif
+
 endmenu
diff --git a/drivers/base/Makefile b/drivers/base/Makefile
index 99a375a..794546f 100644
--- a/drivers/base/Makefile
+++ b/drivers/base/Makefile
@@ -5,6 +5,7 @@ obj-y			:= core.o sys.o bus.o dd.o syscore.o \
 			   cpu.o firmware.o init.o map.o devres.o \
 			   attribute_container.o transport_class.o
 obj-$(CONFIG_DEVTMPFS)	+= devtmpfs.o
+obj-$(CONFIG_CMA) += dma-contiguous.o
 obj-y			+= power/
 obj-$(CONFIG_HAS_DMA)	+= dma-mapping.o
 obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
new file mode 100644
index 0000000..7fdeaba
--- /dev/null
+++ b/drivers/base/dma-contiguous.c
@@ -0,0 +1,396 @@
+/*
+ * Contiguous Memory Allocator for DMA mapping framework
+ * Copyright (c) 2010-2011 by Samsung Electronics.
+ * Written by:
+ *	Marek Szyprowski <m.szyprowski@samsung.com>
+ *	Michal Nazarewicz <mina86@mina86.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt) "cma: " fmt
+
+#ifdef CONFIG_CMA_DEBUG
+#ifndef DEBUG
+#  define DEBUG
+#endif
+#endif
+
+#include <asm/page.h>
+#include <asm/dma-contiguous.h>
+
+#include <linux/memblock.h>
+#include <linux/err.h>
+#include <linux/mm.h>
+#include <linux/mutex.h>
+#include <linux/page-isolation.h>
+#include <linux/slab.h>
+#include <linux/swap.h>
+#include <linux/mm_types.h>
+#include <linux/dma-contiguous.h>
+
+#ifndef SZ_1M
+#define SZ_1M (1 << 20)
+#endif
+
+#ifdef phys_to_pfn
+/* nothing to do */
+#elif defined __phys_to_pfn
+#  define phys_to_pfn __phys_to_pfn
+#elif defined PFN_DOWN
+#  define phys_to_pfn PFN_DOWN
+#else
+#  error correct phys_to_pfn implementation needed
+#endif
+
+struct cma {
+	unsigned long	base_pfn;
+	unsigned long	count;
+	unsigned long	*bitmap;
+};
+
+struct cma *dma_contiguous_default_area;
+
+static unsigned long size_abs = CONFIG_CMA_SIZE_ABSOLUTE * SZ_1M;
+static unsigned long size_percent = CONFIG_CMA_SIZE_PERCENTAGE;
+static long size_cmdline = -1;
+
+static int __init early_cma(char *p)
+{
+	pr_debug("%s(%s)\n", __func__, p);
+	size_cmdline = memparse(p, &p);
+	return 0;
+}
+early_param("cma", early_cma);
+
+static unsigned long __init __cma_early_get_total_pages(void)
+{
+	struct memblock_region *reg;
+	unsigned long total_pages = 0;
+
+	/*
+	 * We cannot use memblock_phys_mem_size() here, because
+	 * memblock_analyze() has not been called yet.
+	 */
+	for_each_memblock(memory, reg)
+		total_pages += memblock_region_memory_end_pfn(reg) -
+			       memblock_region_memory_base_pfn(reg);
+	return total_pages;
+}
+
+
+/**
+ * dma_contiguous_reserve() - reserve area for contiguous memory handling
+ *
+ * This function reserves memory from the early allocator. It should be
+ * called by arch specific code once the early allocator (memblock or bootmem)
+ * has been activated and all other subsystems have already allocated/reserved
+ * memory.
+ */
+void __init dma_contiguous_reserve(void)
+{
+	unsigned long selected_size = 0;
+	unsigned long total_pages;
+
+	pr_debug("%s()\n", __func__);
+
+	total_pages = __cma_early_get_total_pages();
+	size_percent *= (total_pages << PAGE_SHIFT) / 100;
+
+	pr_debug("%s: available phys mem: %ld MiB\n", __func__,
+		 (total_pages << PAGE_SHIFT) / SZ_1M);
+
+#ifdef CONFIG_CMA_SIZE_SEL_ABSOLUTE
+	selected_size = size_abs;
+#endif
+#ifdef CONFIG_CMA_SIZE_SEL_PERCENTAGE
+	selected_size = size_percent;
+#endif
+#ifdef CONFIG_CMA_SIZE_SEL_MIN
+	selected_size = min(size_abs, size_percent);
+#endif
+#ifdef CONFIG_CMA_SIZE_SEL_MAX
+	selected_size = max(size_abs, size_percent);
+#endif
+
+	if (size_cmdline != -1)
+		selected_size = size_cmdline;
+
+	if (!selected_size)
+		return;
+
+	pr_debug("%s: reserving %ld MiB for global area\n", __func__,
+		 selected_size / SZ_1M);
+
+	dma_declare_contiguous(NULL, selected_size, 0);
+}
+
+static DEFINE_MUTEX(cma_mutex);
+
+#ifdef CONFIG_DEBUG_VM
+
+static int __cma_activate_area(unsigned long base_pfn, unsigned long count)
+{
+	unsigned long pfn = base_pfn;
+	unsigned i = count;
+	struct zone *zone;
+
+	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
+
+	VM_BUG_ON(!pfn_valid(pfn));
+	zone = page_zone(pfn_to_page(pfn));
+
+	do {
+		VM_BUG_ON(!pfn_valid(pfn));
+		VM_BUG_ON(page_zone(pfn_to_page(pfn)) != zone);
+		if (!(pfn & (pageblock_nr_pages - 1)))
+			init_cma_reserved_pageblock(pfn_to_page(pfn));
+		++pfn;
+	} while (--i);
+
+	return 0;
+}
+
+#else
+
+static int __cma_activate_area(unsigned long base_pfn, unsigned long count)
+{
+	unsigned i = count >> pageblock_order;
+	struct page *p = pfn_to_page(base_pfn);
+
+	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
+
+	do {
+		init_cma_reserved_pageblock(p);
+		p += pageblock_nr_pages;
+	} while (--i);
+
+	return 0;
+}
+
+#endif
+
+static struct cma *__cma_create_area(unsigned long base_pfn,
+				     unsigned long count)
+{
+	int bitmap_size = BITS_TO_LONGS(count) * sizeof(long);
+	struct cma *cma;
+
+	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
+
+	cma = kmalloc(sizeof *cma, GFP_KERNEL);
+	if (!cma)
+		return ERR_PTR(-ENOMEM);
+
+	cma->base_pfn = base_pfn;
+	cma->count = count;
+	cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
+
+	if (!cma->bitmap)
+		goto no_mem;
+
+	__cma_activate_area(base_pfn, count);
+
+	pr_debug("%s: returning <%p>\n", __func__, (void *)cma);
+	return cma;
+
+no_mem:
+	kfree(cma);
+	return ERR_PTR(-ENOMEM);
+}
+
+static struct cma_reserved {
+	phys_addr_t start;
+	unsigned long size;
+	struct device *dev;
+} cma_reserved[MAX_CMA_AREAS] __initdata;
+static unsigned cma_reserved_count __initdata;
+
+static int __init __cma_init_reserved_areas(void)
+{
+	struct cma_reserved *r = cma_reserved;
+	unsigned i = cma_reserved_count;
+
+	pr_debug("%s()\n", __func__);
+
+	for (; i; --i, ++r) {
+		struct cma *cma;
+		cma = __cma_create_area(phys_to_pfn(r->start),
+					r->size >> PAGE_SHIFT);
+		if (!IS_ERR(cma)) {
+			pr_debug("%s: created area %p\n", __func__, cma);
+			if (r->dev)
+				set_dev_cma_area(r->dev, cma);
+			else
+				dma_contiguous_default_area = cma;
+		}
+	}
+	return 0;
+}
+core_initcall(__cma_init_reserved_areas);
+
+/**
+ * dma_declare_contiguous() - reserve area for contiguous memory handling
+ *			      for particular device
+ * @dev:   Pointer to device structure.
+ * @size:  Size of the reserved memory.
+ * @base:  Start address of the reserved memory (optional, 0 for any).
+ *
+ * This function reserves memory for the specified device. It should be
+ * called by board specific code while the early allocator (memblock or
+ * bootmem) is still active.
+ */
+int __init dma_declare_contiguous(struct device *dev, unsigned long size,
+				  phys_addr_t base)
+{
+	struct cma_reserved *r = &cma_reserved[cma_reserved_count];
+	unsigned long alignment;
+
+	pr_debug("%s(%p+%p)\n", __func__, (void *)base, (void *)size);
+
+	/* Sanity checks */
+	if (cma_reserved_count == ARRAY_SIZE(cma_reserved))
+		return -ENOSPC;
+
+	if (!size)
+		return -EINVAL;
+
+	/* Sanitise input arguments */
+	alignment = PAGE_SIZE << (MAX_ORDER + 1);
+	base = ALIGN(base, alignment);
+	size = ALIGN(size, alignment);
+
+	/* Reserve memory */
+	if (base) {
+		if (memblock_is_region_reserved(base, size) ||
+		    memblock_reserve(base, size) < 0)
+			return -EBUSY;
+	} else {
+		/*
+		 * Use __memblock_alloc_base() since
+		 * memblock_alloc_base() panic()s.
+		 */
+		phys_addr_t addr = __memblock_alloc_base(size, alignment, 0);
+		if (!addr) {
+			return -ENOMEM;
+		} else if (addr + size > ~(unsigned long)0) {
+			memblock_free(addr, size);
+			return -EOVERFLOW;
+		} else {
+			base = addr;
+		}
+	}
+
+	/*
+	 * Each reserved area must be initialised later, when more kernel
+	 * subsystems (like slab allocator) are available.
+	 */
+	r->start = base;
+	r->size = size;
+	r->dev = dev;
+	cma_reserved_count++;
+	printk(KERN_INFO "%s: reserved %ld MiB area at 0x%p\n", __func__,
+	       size / SZ_1M, (void *)base);
+
+	/*
+	 * Architecture specific contiguous memory fixup.
+	 */
+	dma_contiguous_early_fixup(base, size);
+	return 0;
+}
+
+/**
+ * dma_alloc_from_contiguous() - allocate pages from contiguous area
+ * @dev:   Pointer to device for which the allocation is performed.
+ * @count: Requested number of pages.
+ * @order: Requested alignment of pages (as a PAGE_SIZE order).
+ *
+ * This function allocates a memory buffer for the specified device. It uses
+ * the device specific contiguous memory area if available, or the default
+ * global one. Requires the architecture specific get_dev_cma_area() helper
+ * function.
+ */
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int order)
+{
+	struct cma *cma = get_dev_cma_area(dev);
+	unsigned long pfn, pageno;
+	unsigned int align;
+	int ret;
+
+	if (!cma)
+		return NULL;
+
+	if (order > CONFIG_CMA_ALIGNMENT)
+		order = CONFIG_CMA_ALIGNMENT;
+
+	pr_debug("%s(<%p>, %d/%d)\n", __func__, (void *)cma, count, order);
+
+	align = (1 << order) - 1;
+
+	if (!count)
+		return NULL;
+
+	mutex_lock(&cma_mutex);
+
+	pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count, 0, count,
+					    align);
+	if (pageno >= cma->count) {
+		ret = -ENOMEM;
+		goto error;
+	}
+	bitmap_set(cma->bitmap, pageno, count);
+
+	pfn = cma->base_pfn + pageno;
+	ret = alloc_contig_range(pfn, pfn + count, 0, MIGRATE_CMA);
+	if (ret)
+		goto free;
+
+	mutex_unlock(&cma_mutex);
+
+	pr_debug("%s(): returning [%p]\n", __func__, pfn_to_page(pfn));
+	return pfn_to_page(pfn);
+free:
+	bitmap_clear(cma->bitmap, pageno, count);
+error:
+	mutex_unlock(&cma_mutex);
+	return NULL;
+}
+
+/**
+ * dma_release_from_contiguous() - release allocated pages
+ * @dev:   Pointer to device for which the pages were allocated.
+ * @pages: Allocated pages.
+ * @count: Number of allocated pages.
+ *
+ * This function releases memory allocated by dma_alloc_from_contiguous().
+ * It returns 0 when the provided pages do not belong to the contiguous
+ * area and 1 on success.
+ */
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count)
+{
+	struct cma *cma = get_dev_cma_area(dev);
+	unsigned long pfn;
+
+	if (!cma || !pages)
+		return 0;
+
+	pr_debug("%s([%p])\n", __func__, (void *)pages);
+
+	pfn = page_to_pfn(pages);
+
+	if (pfn < cma->base_pfn || pfn >= cma->base_pfn + cma->count)
+		return 0;
+
+	mutex_lock(&cma_mutex);
+
+	bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
+	free_contig_pages(pages, count);
+
+	mutex_unlock(&cma_mutex);
+	return 1;
+}
diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h
new file mode 100644
index 0000000..39aa128
--- /dev/null
+++ b/include/linux/dma-contiguous.h
@@ -0,0 +1,106 @@
+#ifndef __LINUX_CMA_H
+#define __LINUX_CMA_H
+
+/*
+ * Contiguous Memory Allocator for DMA mapping framework
+ * Copyright (c) 2010-2011 by Samsung Electronics.
+ * Written by:
+ *	Marek Szyprowski <m.szyprowski@samsung.com>
+ *	Michal Nazarewicz <mina86@mina86.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version.
+ */
+
+/*
+ * Contiguous Memory Allocator
+ *
+ *   The Contiguous Memory Allocator (CMA) makes it possible to
+ *   allocate big contiguous chunks of memory after the system has
+ *   booted.
+ *
+ * Why is it needed?
+ *
+ *   Various devices on embedded systems have no scatter-gather and/or
+ *   I/O mapping support and require contiguous blocks of memory to
+ *   operate.  They include devices such as cameras, hardware video
+ *   coders, etc.
+ *
+ *   Such devices often require big memory buffers (a full HD frame is,
+ *   for instance, more than 2 megapixels, i.e. more than 6 MB of
+ *   memory), which makes mechanisms such as kmalloc() or alloc_page()
+ *   ineffective.
+ *
+ *   At the same time, a solution where a big memory region is reserved
+ *   for a device is suboptimal, since often more memory is reserved than
+ *   strictly required and, moreover, the memory is inaccessible to the
+ *   page allocator even if the device driver doesn't use it.
+ *
+ *   CMA tries to solve this issue by operating on memory regions from
+ *   which only movable pages can be allocated.  This way, the kernel can
+ *   use the memory for page cache and, when a device driver requests it,
+ *   the allocated pages can be migrated away.
+ *
+ * Driver usage
+ *
+ *   CMA should not be used by device drivers directly.  It is only a
+ *   helper framework for the dma-mapping subsystem.
+ *
+ *   For more information, see kernel-docs in drivers/base/dma-contiguous.c
+ */
+
+#ifdef __KERNEL__
+
+struct cma;
+struct page;
+struct device;
+
+extern struct cma *dma_contiguous_default_area;
+
+#ifdef CONFIG_CMA
+
+void dma_contiguous_reserve(void);
+int dma_declare_contiguous(struct device *dev, unsigned long size,
+			   phys_addr_t base);
+
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int order);
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count);
+
+#define cma_available()	(1)
+
+#else
+
+static inline void dma_contiguous_reserve(void) { }
+
+static inline
+int dma_declare_contiguous(struct device *dev, unsigned long size,
+			   phys_addr_t base)
+{
+	return -EINVAL;
+}
+
+static inline
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int order)
+{
+	return NULL;
+}
+
+static inline
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count)
+{
+	return 0;
+}
+
+#define cma_available()	(0)
+
+#endif
+
+#endif
+
+#endif
-- 
1.7.1.569.g6f426
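
A note on sizing (an aside, not patch text): the default area selected by
the Kconfig options above can be overridden at boot, since early_cma()
parses the cma= kernel parameter with memparse():

	cma=32M

Per-device areas are declared from board code while the early allocator is
still active; the device below is hypothetical:

	/* board init code, while memblock is still usable */
	int err = dma_declare_contiguous(&example_pdev.dev, 8 * SZ_1M, 0);

	if (err)
		pr_err("cma: reservation failed (%d)\n", err);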


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 6/9] drivers: add Contiguous Memory Allocator
@ 2011-08-12 10:58   ` Marek Szyprowski
  0 siblings, 0 replies; 72+ messages in thread
From: Marek Szyprowski @ 2011-08-12 10:58 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Shariq Hasnain, Chunsang Jeong

The Contiguous Memory Allocator is a set of helper functions for DMA
mapping framework that improves allocations of contiguous memory chunks.

CMA grabs memory on system boot, marks it with CMA_MIGRATE_TYPE and
gives back to the system. Kernel is allowed to allocate movable pages
within CMA's managed memory so that it can be used for example for page
cache when DMA mapping do not use it. On dma_alloc_from_contiguous()
request such pages are migrated out of CMA area to free required
contiguous block and fulfill the request. This allows to allocate large
contiguous chunks of memory at any time assuming that there is enough
free memory available in the system.

This code is heavily based on earlier works by Michal Nazarewicz.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 arch/Kconfig                   |    3 +
 drivers/base/Kconfig           |   77 ++++++++
 drivers/base/Makefile          |    1 +
 drivers/base/dma-contiguous.c  |  396 ++++++++++++++++++++++++++++++++++++++++
 include/linux/dma-contiguous.h |  106 +++++++++++
 5 files changed, 583 insertions(+), 0 deletions(-)
 create mode 100644 drivers/base/dma-contiguous.c
 create mode 100644 include/linux/dma-contiguous.h

diff --git a/arch/Kconfig b/arch/Kconfig
index 4b0669c..a3b39a2 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -124,6 +124,9 @@ config HAVE_ARCH_TRACEHOOK
 config HAVE_DMA_ATTRS
 	bool
 
+config HAVE_DMA_CONTIGUOUS
+	bool
+
 config USE_GENERIC_SMP_HELPERS
 	bool
 
diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 21cf46f..02c0552 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -174,4 +174,81 @@ config SYS_HYPERVISOR
 
 source "drivers/base/regmap/Kconfig"
 
+config CMA
+	bool "Contiguous Memory Allocator"
+	depends on HAVE_DMA_CONTIGUOUS && HAVE_MEMBLOCK
+	select MIGRATION
+	select CMA_MIGRATE_TYPE
+	help
+	  This enables the Contiguous Memory Allocator which allows drivers
+	  to allocate big physically-contiguous blocks of memory for use with
+	  hardware components that do not support I/O map nor scatter-gather.
+
+	  For more information see <include/linux/dma-contiguous.h>.
+	  If unsure, say "n".
+
+if CMA
+
+config CMA_DEBUG
+	bool "CMA debug messages (DEVELOPEMENT)"
+	help
+	  Turns on debug messages in CMA.  This produces KERN_DEBUG
+	  messages for every CMA call as well as various messages while
+	  processing calls such as dma_alloc_from_contiguous().
+	  This option does not affect warning and error messages.
+
+comment "Default contiguous memory area size:"
+
+config CMA_SIZE_ABSOLUTE
+	int "Absolute size (in MiB)"
+	default 16
+	help
+	  Defines the size (in MiB) of the default memory area for Contiguous
+	  Memory Allocator.
+
+config CMA_SIZE_PERCENTAGE
+	int "Percentage of total memory"
+	default 10
+	help
+	  Defines the size of the default memory area for Contiguous Memory
+	  Allocator as a percentage of the total memory in the system.
+
+choice
+	prompt "Selected region size"
+	default CMA_SIZE_SEL_ABSOLUTE
+
+config CMA_SIZE_SEL_ABSOLUTE
+	bool "Use absolute value only"
+
+config CMA_SIZE_SEL_PERCENTAGE
+	bool "Use percentage value only"
+
+config CMA_SIZE_SEL_MIN
+	bool "Use lower value (minimum)"
+
+config CMA_SIZE_SEL_MAX
+	bool "Use higher value (maximum)"
+
+endchoice
+
+config CMA_ALIGNMENT
+	int "Maximum PAGE_SIZE order of alignment for contiguous buffers"
+	range 4 9
+	default 8
+	help
+	  DMA mapping framework by default aligns all buffers to the smallest
+	  PAGE_SIZE order which is greater than or equal to the requested buffer
+	  size. This works well for buffers up to a few hundreds kilobytes, but
+	  for larger buffers it just a memory waste. With this parameter you can
+	  specify the maximum PAGE_SIZE order for contiguous buffers. Larger
+	  buffers will be aligned only to this specified order. The order is
+	  expressed as a power of two multiplied by the PAGE_SIZE.
+
+	  For example, if your system defaults to 4KiB pages, the order value
+	  of 8 means that the buffers will be aligned up to 1MiB only.
+
+	  If unsure, leave the default value "8".
+
+endif
+
 endmenu
diff --git a/drivers/base/Makefile b/drivers/base/Makefile
index 99a375a..794546f 100644
--- a/drivers/base/Makefile
+++ b/drivers/base/Makefile
@@ -5,6 +5,7 @@ obj-y			:= core.o sys.o bus.o dd.o syscore.o \
 			   cpu.o firmware.o init.o map.o devres.o \
 			   attribute_container.o transport_class.o
 obj-$(CONFIG_DEVTMPFS)	+= devtmpfs.o
+obj-$(CONFIG_CMA) += dma-contiguous.o
 obj-y			+= power/
 obj-$(CONFIG_HAS_DMA)	+= dma-mapping.o
 obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
new file mode 100644
index 0000000..7fdeaba
--- /dev/null
+++ b/drivers/base/dma-contiguous.c
@@ -0,0 +1,396 @@
+/*
+ * Contiguous Memory Allocator for DMA mapping framework
+ * Copyright (c) 2010-2011 by Samsung Electronics.
+ * Written by:
+ *	Marek Szyprowski <m.szyprowski@samsung.com>
+ *	Michal Nazarewicz <mina86@mina86.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License or (at your optional) any later version of the license.
+ */
+
+#define pr_fmt(fmt) "cma: " fmt
+
+#ifdef CONFIG_CMA_DEBUG
+#ifndef DEBUG
+#  define DEBUG
+#endif
+#endif
+
+#include <asm/page.h>
+#include <asm/dma-contiguous.h>
+
+#include <linux/memblock.h>
+#include <linux/err.h>
+#include <linux/mm.h>
+#include <linux/mutex.h>
+#include <linux/page-isolation.h>
+#include <linux/slab.h>
+#include <linux/swap.h>
+#include <linux/mm_types.h>
+#include <linux/dma-contiguous.h>
+
+#ifndef SZ_1M
+#define SZ_1M (1 << 20)
+#endif
+
+#ifdef phys_to_pfn
+/* nothing to do */
+#elif defined __phys_to_pfn
+#  define phys_to_pfn __phys_to_pfn
+#elif defined PFN_PHYS
+#  define phys_to_pfn PFN_PHYS
+#else
+#  error correct phys_to_pfn implementation needed
+#endif
+
+struct cma {
+	unsigned long	base_pfn;
+	unsigned long	count;
+	unsigned long	*bitmap;
+};
+
+struct cma *dma_contiguous_default_area;
+
+static unsigned long size_abs = CONFIG_CMA_SIZE_ABSOLUTE * SZ_1M;
+static unsigned long size_percent = CONFIG_CMA_SIZE_PERCENTAGE;
+static long size_cmdline = -1;
+
+static int __init early_cma(char *p)
+{
+	pr_debug("%s(%s)\n", __func__, p);
+	size_cmdline = memparse(p, &p);
+	return 0;
+}
+early_param("cma", early_cma);
+
+static unsigned long __init __cma_early_get_total_pages(void)
+{
+	struct memblock_region *reg;
+	unsigned long total_pages = 0;
+
+	/*
+	 * We cannot use memblock_phys_mem_size() here, because
+	 * memblock_analyze() has not been called yet.
+	 */
+	for_each_memblock(memory, reg)
+		total_pages += memblock_region_memory_end_pfn(reg) -
+			       memblock_region_memory_base_pfn(reg);
+	return total_pages;
+}
+
+
+/**
+ * dma_contiguous_reserve() - reserve area for contiguous memory handling
+ *
+ * This funtion reserves memory from early allocator. It should be
+ * called by arch specific code once the early allocator (memblock or bootmem)
+ * has been activated and all other subsystems have already allocated/reserved
+ * memory.
+ */
+void __init dma_contiguous_reserve(void)
+{
+	unsigned long selected_size = 0;
+	unsigned long total_pages;
+
+	pr_debug("%s()\n", __func__);
+
+	total_pages = __cma_early_get_total_pages();
+	size_percent *= (total_pages << PAGE_SHIFT) / 100;
+
+	pr_debug("%s: available phys mem: %ld MiB\n", __func__,
+		 (total_pages << PAGE_SHIFT) / SZ_1M);
+
+#ifdef CONFIG_CMA_SIZE_SEL_ABSOLUTE
+	selected_size = size_abs;
+#endif
+#ifdef CONFIG_CMA_SIZE_SEL_PERCENTAGE
+	selected_size = size_percent;
+#endif
+#ifdef CONFIG_CMA_SIZE_SEL_MIN
+	selected_size = min(size_abs, size_percent);
+#endif
+#ifdef CONFIG_CMA_SIZE_SEL_MAX
+	selected_size = max(size_abs, size_percent);
+#endif
+
+	if (size_cmdline != -1)
+		selected_size = size_cmdline;
+
+	if (!selected_size)
+		return;
+
+	pr_debug("%s: reserving %ld MiB for global area\n", __func__,
+		 selected_size / SZ_1M);
+
+	dma_declare_contiguous(NULL, selected_size, 0);
+};
+
+static DEFINE_MUTEX(cma_mutex);
+
+#ifdef CONFIG_DEBUG_VM
+
+static int __cma_activate_area(unsigned long base_pfn, unsigned long count)
+{
+	unsigned long pfn = base_pfn;
+	unsigned i = count;
+	struct zone *zone;
+
+	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
+
+	VM_BUG_ON(!pfn_valid(pfn));
+	zone = page_zone(pfn_to_page(pfn));
+
+	do {
+		VM_BUG_ON(!pfn_valid(pfn));
+		VM_BUG_ON(page_zone(pfn_to_page(pfn)) != zone);
+		if (!(pfn & (pageblock_nr_pages - 1)))
+			init_cma_reserved_pageblock(pfn_to_page(pfn));
+		++pfn;
+	} while (--i);
+
+	return 0;
+}
+
+#else
+
+static int __cma_activate_area(unsigned long base_pfn, unsigned long count)
+{
+	unsigned i = count >> pageblock_order;
+	struct page *p = pfn_to_page(base_pfn);
+
+	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
+
+	do {
+		init_cma_reserved_pageblock(p);
+		p += pageblock_nr_pages;
+	} while (--i);
+
+	return 0;
+}
+
+#endif
+
+static struct cma *__cma_create_area(unsigned long base_pfn,
+				     unsigned long count)
+{
+	int bitmap_size = BITS_TO_LONGS(count) * sizeof(long);
+	struct cma *cma;
+
+	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
+
+	cma = kmalloc(sizeof *cma, GFP_KERNEL);
+	if (!cma)
+		return ERR_PTR(-ENOMEM);
+
+	cma->base_pfn = base_pfn;
+	cma->count = count;
+	cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
+
+	if (!cma->bitmap)
+		goto no_mem;
+
+	__cma_activate_area(base_pfn, count);
+
+	pr_debug("%s: returning <%p>\n", __func__, (void *)cma);
+	return cma;
+
+no_mem:
+	kfree(cma);
+	return ERR_PTR(-ENOMEM);
+}
+
+static struct cma_reserved {
+	phys_addr_t start;
+	unsigned long size;
+	struct device *dev;
+} cma_reserved[MAX_CMA_AREAS] __initdata;
+static unsigned cma_reserved_count __initdata;
+
+static int __init __cma_init_reserved_areas(void)
+{
+	struct cma_reserved *r = cma_reserved;
+	unsigned i = cma_reserved_count;
+
+	pr_debug("%s()\n", __func__);
+
+	for (; i; --i, ++r) {
+		struct cma *cma;
+		cma = __cma_create_area(phys_to_pfn(r->start),
+					r->size >> PAGE_SHIFT);
+		if (!IS_ERR(cma)) {
+			pr_debug("%s: created area %p\n", __func__, cma);
+			if (r->dev)
+				set_dev_cma_area(r->dev, cma);
+			else
+				dma_contiguous_default_area = cma;
+		}
+	}
+	return 0;
+}
+core_initcall(__cma_init_reserved_areas);
+
+/**
+ * dma_declare_contiguous() - reserve area for contiguous memory handling
+ *			      for particular device
+ * @dev:   Pointer to device structure.
+ * @size:  Size of the reserved memory.
+ * @start: Start address of the reserved memory (optional, 0 for any).
+ *
+ * This funtion reserves memory for specified device. It should be
+ * called by board specific code when early allocator (memblock or bootmem)
+ * is still activate.
+ */
+int __init dma_declare_contiguous(struct device *dev, unsigned long size,
+				  phys_addr_t base)
+{
+	struct cma_reserved *r = &cma_reserved[cma_reserved_count];
+	unsigned long alignment;
+
+	pr_debug("%s(%p+%p)\n", __func__, (void *)base, (void *)size);
+
+	/* Sanity checks */
+	if (cma_reserved_count == ARRAY_SIZE(cma_reserved))
+		return -ENOSPC;
+
+	if (!size)
+		return -EINVAL;
+
+	/* Sanitise input arguments */
+	alignment = PAGE_SIZE << (MAX_ORDER + 1);
+	base = ALIGN(base, alignment);
+	size  = ALIGN(size , alignment);
+
+	/* Reserve memory */
+	if (base) {
+		if (memblock_is_region_reserved(base, size) ||
+		    memblock_reserve(base, size) < 0)
+			return -EBUSY;
+	} else {
+		/*
+		 * Use __memblock_alloc_base() since
+		 * memblock_alloc_base() panic()s.
+		 */
+		phys_addr_t addr = __memblock_alloc_base(size, alignment, 0);
+		if (!addr) {
+			return -ENOMEM;
+		} else if (addr + size > ~(unsigned long)0) {
+			memblock_free(addr, size);
+			return -EOVERFLOW;
+		} else {
+			base = addr;
+		}
+	}
+
+	/*
+	 * Each reserved area must be initialised later, when more kernel
+	 * subsystems (like slab allocator) are available.
+	 */
+	r->start = base;
+	r->size = size;
+	r->dev = dev;
+	cma_reserved_count++;
+	printk(KERN_INFO "%s: reserved %ld MiB area at 0x%p\n", __func__,
+	       size / SZ_1M, (void *)base);
+
+	/*
+	 * Architecture specific contiguous memory fixup.
+	 */
+	dma_contiguous_early_fixup(base, size);
+	return 0;
+}
+
+/**
+ * dma_alloc_from_contiguous() - allocate pages from contiguous area
+ * @dev:   Pointer to device for which the allocation is performed.
+ * @count: Requested number of pages.
+ * @align: Requested alignment of pages (in PAGE_SIZE order).
+ *
+ * This funtion allocates memory buffer for specified device. It uses
+ * device specific contiguous memory area if available or the default
+ * global one. Requires architecture specific get_dev_cma_area() helper
+ * function.
+ */
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int order)
+{
+	struct cma *cma = get_dev_cma_area(dev);
+	unsigned long pfn, pageno;
+	unsigned int align;
+	int ret;
+
+	if (!cma)
+		return NULL;
+
+	if (order > CONFIG_CMA_ALIGNMENT)
+		order = CONFIG_CMA_ALIGNMENT;
+
+	pr_debug("%s(<%p>, %d/%d)\n", __func__, (void *)cma, count, order);
+
+	align = (1 << order) - 1;
+
+	if (!count)
+		return NULL;
+
+	mutex_lock(&cma_mutex);
+
+	pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count, 0, count,
+					    align);
+	if (pageno >= cma->count) {
+		ret = -ENOMEM;
+		goto error;
+	}
+	bitmap_set(cma->bitmap, pageno, count);
+
+	pfn = cma->base_pfn + pageno;
+	ret = alloc_contig_range(pfn, pfn + count, 0, MIGRATE_CMA);
+	if (ret)
+		goto free;
+
+	mutex_unlock(&cma_mutex);
+
+	pr_debug("%s(): returning [%p]\n", __func__, pfn_to_page(pfn));
+	return pfn_to_page(pfn);
+free:
+	bitmap_clear(cma->bitmap, pageno, count);
+error:
+	mutex_unlock(&cma_mutex);
+	return NULL;
+}
+
+/**
+ * dma_release_from_contiguous() - release allocated pages
+ * @dev:   Pointer to device for which the pages were allocated.
+ * @pages: Allocated pages.
+ * @count: Number of allocated pages.
+ *
+ * This funtion releases memory allocated by dma_alloc_from_contiguous().
+ * It return 0 when provided pages doen't belongs to contiguous area and
+ * 1 on success.
+ */
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count)
+{
+	struct cma *cma = get_dev_cma_area(dev);
+	unsigned long pfn;
+
+	if (!cma || !pages)
+		return 0;
+
+	pr_debug("%s([%p])\n", __func__, (void *)pages);
+
+	pfn = page_to_pfn(pages);
+
+	if (pfn < cma->base_pfn || pfn >= cma->base_pfn + cma->count)
+		return 0;
+
+	mutex_lock(&cma_mutex);
+
+	bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
+	free_contig_pages(pages, count);
+
+	mutex_unlock(&cma_mutex);
+	return 1;
+}
diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h
new file mode 100644
index 0000000..39aa128
--- /dev/null
+++ b/include/linux/dma-contiguous.h
@@ -0,0 +1,106 @@
+#ifndef __LINUX_CMA_H
+#define __LINUX_CMA_H
+
+/*
+ * Contiguous Memory Allocator for DMA mapping framework
+ * Copyright (c) 2010-2011 by Samsung Electronics.
+ * Written by:
+ *	Marek Szyprowski <m.szyprowski@samsung.com>
+ *	Michal Nazarewicz <mina86@mina86.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License or (at your optional) any later version of the license.
+ */
+
+/*
+ * Contiguous Memory Allocator
+ *
+ *   The Contiguous Memory Allocator (CMA) makes it possible to
+ *   allocate big contiguous chunks of memory after the system has
+ *   booted.
+ *
+ * Why is it needed?
+ *
+ *   Various devices on embedded systems have no scatter-getter and/or
+ *   IO map support and require contiguous blocks of memory to
+ *   operate.  They include devices such as cameras, hardware video
+ *   coders, etc.
+ *
+ *   Such devices often require big memory buffers (a full HD frame
+ *   is, for instance, more then 2 mega pixels large, i.e. more than 6
+ *   MB of memory), which makes mechanisms such as kmalloc() or
+ *   alloc_page() ineffective.
+ *
+ *   At the same time, a solution where a big memory region is
+ *   reserved for a device is suboptimal since often more memory is
+ *   reserved then strictly required and, moreover, the memory is
+ *   inaccessible to page system even if device drivers don't use it.
+ *
+ *   CMA tries to solve this issue by operating on memory regions
+ *   where only movable pages can be allocated from.  This way, kernel
+ *   can use the memory for pagecache and when device driver requests
+ *   it, allocated pages can be migrated.
+ *
+ * Driver usage
+ *
+ *   CMA should not be used by the device drivers directly. It is
+ *   only a helper framework for dma-mapping subsystem.
+ *
+ *   For more information, see kernel-docs in drivers/base/dma-contiguous.c
+ */
+
+#ifdef __KERNEL__
+
+struct cma;
+struct page;
+struct device;
+
+extern struct cma *dma_contiguous_default_area;
+
+#ifdef CONFIG_CMA
+
+void dma_contiguous_reserve(void);
+int dma_declare_contiguous(struct device *dev, unsigned long size,
+			   phys_addr_t base);
+
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int order);
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count);
+
+#define cma_available()	(1)
+
+#else
+
+static inline void dma_contiguous_reserve(void) { }
+
+static inline
+int dma_declare_contiguous(struct device *dev, unsigned long size,
+			   unsigned long base)
+{
+	return -EINVAL;
+}
+
+static inline
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int order)
+{
+	return NULL;
+}
+
+static inline
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count)
+{
+	return 0;
+}
+
+#define cma_available()	(0)
+
+#endif
+
+#endif
+
+#endif
-- 
1.7.1.569.g6f426

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 6/9] drivers: add Contiguous Memory Allocator
@ 2011-08-12 10:58   ` Marek Szyprowski
  0 siblings, 0 replies; 72+ messages in thread
From: Marek Szyprowski @ 2011-08-12 10:58 UTC (permalink / raw)
  To: linux-arm-kernel

The Contiguous Memory Allocator is a set of helper functions for DMA
mapping framework that improves allocations of contiguous memory chunks.

CMA grabs memory on system boot, marks it with CMA_MIGRATE_TYPE and
gives back to the system. Kernel is allowed to allocate movable pages
within CMA's managed memory so that it can be used for example for page
cache when DMA mapping do not use it. On dma_alloc_from_contiguous()
request such pages are migrated out of CMA area to free required
contiguous block and fulfill the request. This allows to allocate large
contiguous chunks of memory at any time assuming that there is enough
free memory available in the system.

This code is heavily based on earlier works by Michal Nazarewicz.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 arch/Kconfig                   |    3 +
 drivers/base/Kconfig           |   77 ++++++++
 drivers/base/Makefile          |    1 +
 drivers/base/dma-contiguous.c  |  396 ++++++++++++++++++++++++++++++++++++++++
 include/linux/dma-contiguous.h |  106 +++++++++++
 5 files changed, 583 insertions(+), 0 deletions(-)
 create mode 100644 drivers/base/dma-contiguous.c
 create mode 100644 include/linux/dma-contiguous.h

diff --git a/arch/Kconfig b/arch/Kconfig
index 4b0669c..a3b39a2 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -124,6 +124,9 @@ config HAVE_ARCH_TRACEHOOK
 config HAVE_DMA_ATTRS
 	bool
 
+config HAVE_DMA_CONTIGUOUS
+	bool
+
 config USE_GENERIC_SMP_HELPERS
 	bool
 
diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 21cf46f..02c0552 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -174,4 +174,81 @@ config SYS_HYPERVISOR
 
 source "drivers/base/regmap/Kconfig"
 
+config CMA
+	bool "Contiguous Memory Allocator"
+	depends on HAVE_DMA_CONTIGUOUS && HAVE_MEMBLOCK
+	select MIGRATION
+	select CMA_MIGRATE_TYPE
+	help
+	  This enables the Contiguous Memory Allocator which allows drivers
+	  to allocate big physically-contiguous blocks of memory for use with
+	  hardware components that do not support I/O map nor scatter-gather.
+
+	  For more information see <include/linux/dma-contiguous.h>.
+	  If unsure, say "n".
+
+if CMA
+
+config CMA_DEBUG
+	bool "CMA debug messages (DEVELOPEMENT)"
+	help
+	  Turns on debug messages in CMA.  This produces KERN_DEBUG
+	  messages for every CMA call as well as various messages while
+	  processing calls such as dma_alloc_from_contiguous().
+	  This option does not affect warning and error messages.
+
+comment "Default contiguous memory area size:"
+
+config CMA_SIZE_ABSOLUTE
+	int "Absolute size (in MiB)"
+	default 16
+	help
+	  Defines the size (in MiB) of the default memory area for Contiguous
+	  Memory Allocator.
+
+config CMA_SIZE_PERCENTAGE
+	int "Percentage of total memory"
+	default 10
+	help
+	  Defines the size of the default memory area for Contiguous Memory
+	  Allocator as a percentage of the total memory in the system.
+
+choice
+	prompt "Selected region size"
+	default CMA_SIZE_SEL_ABSOLUTE
+
+config CMA_SIZE_SEL_ABSOLUTE
+	bool "Use absolute value only"
+
+config CMA_SIZE_SEL_PERCENTAGE
+	bool "Use percentage value only"
+
+config CMA_SIZE_SEL_MIN
+	bool "Use lower value (minimum)"
+
+config CMA_SIZE_SEL_MAX
+	bool "Use higher value (maximum)"
+
+endchoice
+
+config CMA_ALIGNMENT
+	int "Maximum PAGE_SIZE order of alignment for contiguous buffers"
+	range 4 9
+	default 8
+	help
+	  DMA mapping framework by default aligns all buffers to the smallest
+	  PAGE_SIZE order which is greater than or equal to the requested buffer
+	  size. This works well for buffers up to a few hundreds kilobytes, but
+	  for larger buffers it just a memory waste. With this parameter you can
+	  specify the maximum PAGE_SIZE order for contiguous buffers. Larger
+	  buffers will be aligned only to this specified order. The order is
+	  expressed as a power of two multiplied by the PAGE_SIZE.
+
+	  For example, if your system defaults to 4KiB pages, the order value
+	  of 8 means that the buffers will be aligned up to 1MiB only.
+
+	  If unsure, leave the default value "8".
+
+endif
+
 endmenu
diff --git a/drivers/base/Makefile b/drivers/base/Makefile
index 99a375a..794546f 100644
--- a/drivers/base/Makefile
+++ b/drivers/base/Makefile
@@ -5,6 +5,7 @@ obj-y			:= core.o sys.o bus.o dd.o syscore.o \
 			   cpu.o firmware.o init.o map.o devres.o \
 			   attribute_container.o transport_class.o
 obj-$(CONFIG_DEVTMPFS)	+= devtmpfs.o
+obj-$(CONFIG_CMA) += dma-contiguous.o
 obj-y			+= power/
 obj-$(CONFIG_HAS_DMA)	+= dma-mapping.o
 obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
new file mode 100644
index 0000000..7fdeaba
--- /dev/null
+++ b/drivers/base/dma-contiguous.c
@@ -0,0 +1,396 @@
+/*
+ * Contiguous Memory Allocator for DMA mapping framework
+ * Copyright (c) 2010-2011 by Samsung Electronics.
+ * Written by:
+ *	Marek Szyprowski <m.szyprowski@samsung.com>
+ *	Michal Nazarewicz <mina86@mina86.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License or (at your optional) any later version of the license.
+ */
+
+#define pr_fmt(fmt) "cma: " fmt
+
+#ifdef CONFIG_CMA_DEBUG
+#ifndef DEBUG
+#  define DEBUG
+#endif
+#endif
+
+#include <asm/page.h>
+#include <asm/dma-contiguous.h>
+
+#include <linux/memblock.h>
+#include <linux/err.h>
+#include <linux/mm.h>
+#include <linux/mutex.h>
+#include <linux/page-isolation.h>
+#include <linux/slab.h>
+#include <linux/swap.h>
+#include <linux/mm_types.h>
+#include <linux/dma-contiguous.h>
+
+#ifndef SZ_1M
+#define SZ_1M (1 << 20)
+#endif
+
+#ifdef phys_to_pfn
+/* nothing to do */
+#elif defined __phys_to_pfn
+#  define phys_to_pfn __phys_to_pfn
+#elif defined PFN_PHYS
+#  define phys_to_pfn PFN_PHYS
+#else
+#  error correct phys_to_pfn implementation needed
+#endif
+
+struct cma {
+	unsigned long	base_pfn;
+	unsigned long	count;
+	unsigned long	*bitmap;
+};
+
+struct cma *dma_contiguous_default_area;
+
+static unsigned long size_abs = CONFIG_CMA_SIZE_ABSOLUTE * SZ_1M;
+static unsigned long size_percent = CONFIG_CMA_SIZE_PERCENTAGE;
+static long size_cmdline = -1;
+
+static int __init early_cma(char *p)
+{
+	pr_debug("%s(%s)\n", __func__, p);
+	size_cmdline = memparse(p, &p);
+	return 0;
+}
+early_param("cma", early_cma);
+
+static unsigned long __init __cma_early_get_total_pages(void)
+{
+	struct memblock_region *reg;
+	unsigned long total_pages = 0;
+
+	/*
+	 * We cannot use memblock_phys_mem_size() here, because
+	 * memblock_analyze() has not been called yet.
+	 */
+	for_each_memblock(memory, reg)
+		total_pages += memblock_region_memory_end_pfn(reg) -
+			       memblock_region_memory_base_pfn(reg);
+	return total_pages;
+}
+
+
+/**
+ * dma_contiguous_reserve() - reserve area for contiguous memory handling
+ *
+ * This funtion reserves memory from early allocator. It should be
+ * called by arch specific code once the early allocator (memblock or bootmem)
+ * has been activated and all other subsystems have already allocated/reserved
+ * memory.
+ */
+void __init dma_contiguous_reserve(void)
+{
+	unsigned long selected_size = 0;
+	unsigned long total_pages;
+
+	pr_debug("%s()\n", __func__);
+
+	total_pages = __cma_early_get_total_pages();
+	size_percent *= (total_pages << PAGE_SHIFT) / 100;
+
+	pr_debug("%s: available phys mem: %ld MiB\n", __func__,
+		 (total_pages << PAGE_SHIFT) / SZ_1M);
+
+#ifdef CONFIG_CMA_SIZE_SEL_ABSOLUTE
+	selected_size = size_abs;
+#endif
+#ifdef CONFIG_CMA_SIZE_SEL_PERCENTAGE
+	selected_size = size_percent;
+#endif
+#ifdef CONFIG_CMA_SIZE_SEL_MIN
+	selected_size = min(size_abs, size_percent);
+#endif
+#ifdef CONFIG_CMA_SIZE_SEL_MAX
+	selected_size = max(size_abs, size_percent);
+#endif
+
+	if (size_cmdline != -1)
+		selected_size = size_cmdline;
+
+	if (!selected_size)
+		return;
+
+	pr_debug("%s: reserving %ld MiB for global area\n", __func__,
+		 selected_size / SZ_1M);
+
+	dma_declare_contiguous(NULL, selected_size, 0);
+};
+
+static DEFINE_MUTEX(cma_mutex);
+
+#ifdef CONFIG_DEBUG_VM
+
+static int __cma_activate_area(unsigned long base_pfn, unsigned long count)
+{
+	unsigned long pfn = base_pfn;
+	unsigned i = count;
+	struct zone *zone;
+
+	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
+
+	VM_BUG_ON(!pfn_valid(pfn));
+	zone = page_zone(pfn_to_page(pfn));
+
+	do {
+		VM_BUG_ON(!pfn_valid(pfn));
+		VM_BUG_ON(page_zone(pfn_to_page(pfn)) != zone);
+		if (!(pfn & (pageblock_nr_pages - 1)))
+			init_cma_reserved_pageblock(pfn_to_page(pfn));
+		++pfn;
+	} while (--i);
+
+	return 0;
+}
+
+#else
+
+static int __cma_activate_area(unsigned long base_pfn, unsigned long count)
+{
+	unsigned i = count >> pageblock_order;
+	struct page *p = pfn_to_page(base_pfn);
+
+	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
+
+	do {
+		init_cma_reserved_pageblock(p);
+		p += pageblock_nr_pages;
+	} while (--i);
+
+	return 0;
+}
+
+#endif
+
+static struct cma *__cma_create_area(unsigned long base_pfn,
+				     unsigned long count)
+{
+	int bitmap_size = BITS_TO_LONGS(count) * sizeof(long);
+	struct cma *cma;
+
+	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
+
+	cma = kmalloc(sizeof *cma, GFP_KERNEL);
+	if (!cma)
+		return ERR_PTR(-ENOMEM);
+
+	cma->base_pfn = base_pfn;
+	cma->count = count;
+	cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
+
+	if (!cma->bitmap)
+		goto no_mem;
+
+	__cma_activate_area(base_pfn, count);
+
+	pr_debug("%s: returning <%p>\n", __func__, (void *)cma);
+	return cma;
+
+no_mem:
+	kfree(cma);
+	return ERR_PTR(-ENOMEM);
+}
+
+static struct cma_reserved {
+	phys_addr_t start;
+	unsigned long size;
+	struct device *dev;
+} cma_reserved[MAX_CMA_AREAS] __initdata;
+static unsigned cma_reserved_count __initdata;
+
+static int __init __cma_init_reserved_areas(void)
+{
+	struct cma_reserved *r = cma_reserved;
+	unsigned i = cma_reserved_count;
+
+	pr_debug("%s()\n", __func__);
+
+	for (; i; --i, ++r) {
+		struct cma *cma;
+		cma = __cma_create_area(phys_to_pfn(r->start),
+					r->size >> PAGE_SHIFT);
+		if (!IS_ERR(cma)) {
+			pr_debug("%s: created area %p\n", __func__, cma);
+			if (r->dev)
+				set_dev_cma_area(r->dev, cma);
+			else
+				dma_contiguous_default_area = cma;
+		}
+	}
+	return 0;
+}
+core_initcall(__cma_init_reserved_areas);
+
+/**
+ * dma_declare_contiguous() - reserve area for contiguous memory handling
+ *			      for particular device
+ * @dev:   Pointer to device structure.
+ * @size:  Size of the reserved memory.
+ * @start: Start address of the reserved memory (optional, 0 for any).
+ *
+ * This funtion reserves memory for specified device. It should be
+ * called by board specific code when early allocator (memblock or bootmem)
+ * is still activate.
+ */
+int __init dma_declare_contiguous(struct device *dev, unsigned long size,
+				  phys_addr_t base)
+{
+	struct cma_reserved *r = &cma_reserved[cma_reserved_count];
+	unsigned long alignment;
+
+	pr_debug("%s(%p+%p)\n", __func__, (void *)base, (void *)size);
+
+	/* Sanity checks */
+	if (cma_reserved_count == ARRAY_SIZE(cma_reserved))
+		return -ENOSPC;
+
+	if (!size)
+		return -EINVAL;
+
+	/* Sanitise input arguments */
+	alignment = PAGE_SIZE << (MAX_ORDER + 1);
+	base = ALIGN(base, alignment);
+	size  = ALIGN(size , alignment);
+
+	/* Reserve memory */
+	if (base) {
+		if (memblock_is_region_reserved(base, size) ||
+		    memblock_reserve(base, size) < 0)
+			return -EBUSY;
+	} else {
+		/*
+		 * Use __memblock_alloc_base() since
+		 * memblock_alloc_base() panic()s.
+		 */
+		phys_addr_t addr = __memblock_alloc_base(size, alignment, 0);
+		if (!addr) {
+			return -ENOMEM;
+		} else if (addr + size > ~(unsigned long)0) {
+			memblock_free(addr, size);
+			return -EOVERFLOW;
+		} else {
+			base = addr;
+		}
+	}
+
+	/*
+	 * Each reserved area must be initialised later, when more kernel
+	 * subsystems (like slab allocator) are available.
+	 */
+	r->start = base;
+	r->size = size;
+	r->dev = dev;
+	cma_reserved_count++;
+	printk(KERN_INFO "%s: reserved %ld MiB area at 0x%p\n", __func__,
+	       size / SZ_1M, (void *)base);
+
+	/*
+	 * Architecture specific contiguous memory fixup.
+	 */
+	dma_contiguous_early_fixup(base, size);
+	return 0;
+}
+
+/**
+ * dma_alloc_from_contiguous() - allocate pages from contiguous area
+ * @dev:   Pointer to device for which the allocation is performed.
+ * @count: Requested number of pages.
+ * @align: Requested alignment of pages (in PAGE_SIZE order).
+ *
+ * This funtion allocates memory buffer for specified device. It uses
+ * device specific contiguous memory area if available or the default
+ * global one. Requires architecture specific get_dev_cma_area() helper
+ * function.
+ */
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int order)
+{
+	struct cma *cma = get_dev_cma_area(dev);
+	unsigned long pfn, pageno;
+	unsigned int align;
+	int ret;
+
+	if (!cma)
+		return NULL;
+
+	if (order > CONFIG_CMA_ALIGNMENT)
+		order = CONFIG_CMA_ALIGNMENT;
+
+	pr_debug("%s(<%p>, %d/%d)\n", __func__, (void *)cma, count, order);
+
+	align = (1 << order) - 1;
+
+	if (!count)
+		return NULL;
+
+	mutex_lock(&cma_mutex);
+
+	pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count, 0, count,
+					    align);
+	if (pageno >= cma->count) {
+		ret = -ENOMEM;
+		goto error;
+	}
+	bitmap_set(cma->bitmap, pageno, count);
+
+	pfn = cma->base_pfn + pageno;
+	ret = alloc_contig_range(pfn, pfn + count, 0, MIGRATE_CMA);
+	if (ret)
+		goto free;
+
+	mutex_unlock(&cma_mutex);
+
+	pr_debug("%s(): returning [%p]\n", __func__, pfn_to_page(pfn));
+	return pfn_to_page(pfn);
+free:
+	bitmap_clear(cma->bitmap, pageno, count);
+error:
+	mutex_unlock(&cma_mutex);
+	return NULL;
+}
+
+/**
+ * dma_release_from_contiguous() - release allocated pages
+ * @dev:   Pointer to device for which the pages were allocated.
+ * @pages: Allocated pages.
+ * @count: Number of allocated pages.
+ *
+ * This funtion releases memory allocated by dma_alloc_from_contiguous().
+ * It return 0 when provided pages doen't belongs to contiguous area and
+ * 1 on success.
+ */
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count)
+{
+	struct cma *cma = get_dev_cma_area(dev);
+	unsigned long pfn;
+
+	if (!cma || !pages)
+		return 0;
+
+	pr_debug("%s([%p])\n", __func__, (void *)pages);
+
+	pfn = page_to_pfn(pages);
+
+	if (pfn < cma->base_pfn || pfn >= cma->base_pfn + cma->count)
+		return 0;
+
+	mutex_lock(&cma_mutex);
+
+	bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
+	free_contig_pages(pages, count);
+
+	mutex_unlock(&cma_mutex);
+	return 1;
+}
diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h
new file mode 100644
index 0000000..39aa128
--- /dev/null
+++ b/include/linux/dma-contiguous.h
@@ -0,0 +1,106 @@
+#ifndef __LINUX_CMA_H
+#define __LINUX_CMA_H
+
+/*
+ * Contiguous Memory Allocator for DMA mapping framework
+ * Copyright (c) 2010-2011 by Samsung Electronics.
+ * Written by:
+ *	Marek Szyprowski <m.szyprowski@samsung.com>
+ *	Michal Nazarewicz <mina86@mina86.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License or (at your optional) any later version of the license.
+ */
+
+/*
+ * Contiguous Memory Allocator
+ *
+ *   The Contiguous Memory Allocator (CMA) makes it possible to
+ *   allocate big contiguous chunks of memory after the system has
+ *   booted.
+ *
+ * Why is it needed?
+ *
+ *   Various devices on embedded systems have no scatter-getter and/or
+ *   IO map support and require contiguous blocks of memory to
+ *   operate.  They include devices such as cameras, hardware video
+ *   coders, etc.
+ *
+ *   Such devices often require big memory buffers (a full HD frame
+ *   is, for instance, more then 2 mega pixels large, i.e. more than 6
+ *   MB of memory), which makes mechanisms such as kmalloc() or
+ *   alloc_page() ineffective.
+ *
+ *   At the same time, a solution where a big memory region is
+ *   reserved for a device is suboptimal since often more memory is
+ *   reserved then strictly required and, moreover, the memory is
+ *   inaccessible to page system even if device drivers don't use it.
+ *
+ *   CMA tries to solve this issue by operating on memory regions
+ *   where only movable pages can be allocated from.  This way, kernel
+ *   can use the memory for pagecache and when device driver requests
+ *   it, allocated pages can be migrated.
+ *
+ * Driver usage
+ *
+ *   CMA should not be used by the device drivers directly. It is
+ *   only a helper framework for dma-mapping subsystem.
+ *
+ *   For more information, see kernel-docs in drivers/base/dma-contiguous.c
+ */
+
+#ifdef __KERNEL__
+
+struct cma;
+struct page;
+struct device;
+
+extern struct cma *dma_contiguous_default_area;
+
+#ifdef CONFIG_CMA
+
+void dma_contiguous_reserve(void);
+int dma_declare_contiguous(struct device *dev, unsigned long size,
+			   phys_addr_t base);
+
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int order);
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count);
+
+#define cma_available()	(1)
+
+#else
+
+static inline void dma_contiguous_reserve(void) { }
+
+static inline
+int dma_declare_contiguous(struct device *dev, unsigned long size,
+			   phys_addr_t base)
+{
+	return -EINVAL;
+}
+
+static inline
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int order)
+{
+	return NULL;
+}
+
+static inline
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count)
+{
+	return 0;
+}
+
+#define cma_available()	(0)
+
+#endif
+
+#endif
+
+#endif
-- 
1.7.1.569.g6f426

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 7/9] ARM: DMA: steal memory for DMA coherent mappings
  2011-08-12 10:58 ` Marek Szyprowski
@ 2011-08-12 10:58   ` Marek Szyprowski
  -1 siblings, 0 replies; 72+ messages in thread
From: Marek Szyprowski @ 2011-08-12 10:58 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Shariq Hasnain, Chunsang Jeong

From: Russell King <rmk+kernel@arm.linux.org.uk>

Steal memory from the kernel to provide coherent DMA memory to drivers.
This avoids the problem of multiple mappings with differing attributes
on ARMv6 and later CPUs; a brief usage sketch follows.
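
A minimal, hypothetical sketch (not part of this patch; the example_*
names and the SZ_64K size are made up for illustration). With this
change dma_free_writecombine() becomes a real function, so buffers
obtained from dma_alloc_writecombine() must be freed with it rather
than with dma_free_coherent():

  #include <linux/dma-mapping.h>

  static void *example_buf;          /* hypothetical driver state */
  static dma_addr_t example_dma;

  static int example_probe(struct device *dev)
  {
          example_buf = dma_alloc_writecombine(dev, SZ_64K,
                                               &example_dma, GFP_KERNEL);
          if (!example_buf)
                  return -ENOMEM;
          return 0;
  }

  static void example_remove(struct device *dev)
  {
          /* pair the free with the writecombine allocator */
          dma_free_writecombine(dev, SZ_64K, example_buf, example_dma);
  }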

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[m.szyprowski: rebased onto 3.1-rc1]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
 arch/arm/include/asm/dma-mapping.h |    3 +-
 arch/arm/include/asm/mach/map.h    |    2 +
 arch/arm/include/asm/memory.h      |    7 +
 arch/arm/mm/dma-mapping.c          |  313 +++++++++++++++++++-----------------
 arch/arm/mm/init.c                 |    1 +
 arch/arm/mm/mm.h                   |    2 +
 arch/arm/mm/mmu.c                  |   24 +++
 7 files changed, 202 insertions(+), 150 deletions(-)

diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 7a21d0b..2f50659 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -199,8 +199,7 @@ int dma_mmap_coherent(struct device *, struct vm_area_struct *,
 extern void *dma_alloc_writecombine(struct device *, size_t, dma_addr_t *,
 		gfp_t);
 
-#define dma_free_writecombine(dev,size,cpu_addr,handle) \
-	dma_free_coherent(dev,size,cpu_addr,handle)
+extern void dma_free_writecombine(struct device *, size_t, void *, dma_addr_t);
 
 int dma_mmap_writecombine(struct device *, struct vm_area_struct *,
 		void *, dma_addr_t, size_t);
diff --git a/arch/arm/include/asm/mach/map.h b/arch/arm/include/asm/mach/map.h
index d2fedb5..3845215 100644
--- a/arch/arm/include/asm/mach/map.h
+++ b/arch/arm/include/asm/mach/map.h
@@ -29,6 +29,8 @@ struct map_desc {
 #define MT_MEMORY_NONCACHED	11
 #define MT_MEMORY_DTCM		12
 #define MT_MEMORY_ITCM		13
+#define MT_DMA_COHERENT		14
+#define MT_WC_COHERENT		15
 
 #ifdef CONFIG_MMU
 extern void iotable_init(struct map_desc *, int);
diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index b8de516..334b288 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -88,6 +88,13 @@
 #define CONSISTENT_END		(0xffe00000UL)
 #define CONSISTENT_BASE		(CONSISTENT_END - CONSISTENT_DMA_SIZE)
 
+#ifndef CONSISTENT_WC_SIZE
+#define CONSISTENT_WC_SIZE	SZ_2M
+#endif
+
+#define CONSISTENT_WC_END	CONSISTENT_BASE
+#define CONSISTENT_WC_BASE	(CONSISTENT_WC_END - CONSISTENT_WC_SIZE)
+
 #else /* CONFIG_MMU */
 
 /*
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 0a0a1e7..b643262 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -18,12 +18,14 @@
 #include <linux/device.h>
 #include <linux/dma-mapping.h>
 #include <linux/highmem.h>
+#include <linux/memblock.h>
 
 #include <asm/memory.h>
 #include <asm/highmem.h>
 #include <asm/cacheflush.h>
 #include <asm/tlbflush.h>
 #include <asm/sizes.h>
+#include <asm/mach/map.h>
 
 #include "mm.h"
 
@@ -117,93 +119,128 @@ static void __dma_free_buffer(struct page *page, size_t size)
 }
 
 #ifdef CONFIG_MMU
-/* Sanity check size */
-#if (CONSISTENT_DMA_SIZE % SZ_2M)
-#error "CONSISTENT_DMA_SIZE must be multiple of 2MiB"
+/* Sanity check sizes */
+#if CONSISTENT_DMA_SIZE % SECTION_SIZE
+#error "CONSISTENT_DMA_SIZE must be a multiple of the section size"
+#endif
+#if CONSISTENT_WC_SIZE % SECTION_SIZE
+#error "CONSISTENT_WC_SIZE must be a multiple of the section size"
+#endif
+#if ((CONSISTENT_DMA_SIZE + CONSISTENT_WC_SIZE) % SZ_2M)
+#error "Sum of CONSISTENT_DMA_SIZE and CONSISTENT_WC_SIZE must be a multiple of 2MiB"
 #endif
 
-#define CONSISTENT_OFFSET(x)	(((unsigned long)(x) - CONSISTENT_BASE) >> PAGE_SHIFT)
-#define CONSISTENT_PTE_INDEX(x) (((unsigned long)(x) - CONSISTENT_BASE) >> PGDIR_SHIFT)
-#define NUM_CONSISTENT_PTES (CONSISTENT_DMA_SIZE >> PGDIR_SHIFT)
+#include "vmregion.h"
 
-/*
- * These are the page tables (2MB each) covering uncached, DMA consistent allocations
- */
-static pte_t *consistent_pte[NUM_CONSISTENT_PTES];
+struct dma_coherent_area {
+	struct arm_vmregion_head vm;
+	unsigned long pfn;
+	unsigned int type;
+	const char *name;
+};
 
-#include "vmregion.h"
+static struct dma_coherent_area coherent_wc_head = {
+	.vm = {
+		.vm_start	= CONSISTENT_WC_BASE,
+		.vm_end		= CONSISTENT_WC_END,
+	},
+	.type = MT_WC_COHERENT,
+	.name = "WC ",
+};
 
-static struct arm_vmregion_head consistent_head = {
-	.vm_lock	= __SPIN_LOCK_UNLOCKED(&consistent_head.vm_lock),
-	.vm_list	= LIST_HEAD_INIT(consistent_head.vm_list),
-	.vm_start	= CONSISTENT_BASE,
-	.vm_end		= CONSISTENT_END,
+static struct dma_coherent_area coherent_dma_head = {
+	.vm = {
+		.vm_start	= CONSISTENT_BASE,
+		.vm_end		= CONSISTENT_END,
+	},
+	.type = MT_DMA_COHERENT,
+	.name = "DMA coherent ",
 };
 
-#ifdef CONFIG_HUGETLB_PAGE
-#error ARM Coherent DMA allocator does not (yet) support huge TLB
-#endif
+static struct dma_coherent_area *coherent_areas[2] __initdata =
+	{ &coherent_wc_head, &coherent_dma_head };
 
-/*
- * Initialise the consistent memory allocation.
- */
-static int __init consistent_init(void)
+static struct dma_coherent_area *coherent_map[2];
+#define coherent_wc_area coherent_map[0]
+#define coherent_dma_area coherent_map[1]
+
+void dma_coherent_reserve(void)
 {
-	int ret = 0;
-	pgd_t *pgd;
-	pud_t *pud;
-	pmd_t *pmd;
-	pte_t *pte;
-	int i = 0;
-	u32 base = CONSISTENT_BASE;
+	phys_addr_t base, max_addr;
+	unsigned long size;
+	int can_share, i;
 
-	do {
-		pgd = pgd_offset(&init_mm, base);
+	if (arch_is_coherent())
+		return;
 
-		pud = pud_alloc(&init_mm, pgd, base);
-		if (!pud) {
-			printk(KERN_ERR "%s: no pud tables\n", __func__);
-			ret = -ENOMEM;
-			break;
-		}
+#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
+	/* ARMv6: only when DMA_MEM_BUFFERABLE is enabled */
+	can_share = cpu_architecture() >= CPU_ARCH_ARMv6;
+#else
+	/* ARMv7+: WC and DMA areas have the same properties, so can share */
+	can_share = cpu_architecture() >= CPU_ARCH_ARMv7;
+#endif
+	if (can_share) {
+		coherent_wc_head.name = "DMA coherent/WC ";
+		coherent_wc_head.vm.vm_end = coherent_dma_head.vm.vm_end;
+		coherent_dma_head.vm.vm_start = coherent_dma_head.vm.vm_end;
+		coherent_dma_area = coherent_wc_area = &coherent_wc_head;
+	} else {
+		memcpy(coherent_map, coherent_areas, sizeof(coherent_map));
+	}
 
-		pmd = pmd_alloc(&init_mm, pud, base);
-		if (!pmd) {
-			printk(KERN_ERR "%s: no pmd tables\n", __func__);
-			ret = -ENOMEM;
-			break;
-		}
-		WARN_ON(!pmd_none(*pmd));
+#ifdef CONFIG_ARM_DMA_ZONE_SIZE
+	max_addr = PHYS_OFFSET + ARM_DMA_ZONE_SIZE - 1;
+#else
+	max_addr = MEMBLOCK_ALLOC_ANYWHERE;
+#endif
+	for (i = 0; i < ARRAY_SIZE(coherent_areas); i++) {
+		struct dma_coherent_area *area = coherent_areas[i];
 
-		pte = pte_alloc_kernel(pmd, base);
-		if (!pte) {
-			printk(KERN_ERR "%s: no pte tables\n", __func__);
-			ret = -ENOMEM;
-			break;
-		}
+		size = area->vm.vm_end - area->vm.vm_start;
+		if (!size)
+			continue;
 
-		consistent_pte[i++] = pte;
-		base += (1 << PGDIR_SHIFT);
-	} while (base < CONSISTENT_END);
+		spin_lock_init(&area->vm.vm_lock);
+		INIT_LIST_HEAD(&area->vm.vm_list);
 
-	return ret;
+		base = memblock_alloc_base(size, SZ_1M, max_addr);
+		memblock_free(base, size);
+		memblock_remove(base, size);
+
+		area->pfn = __phys_to_pfn(base);
+
+		pr_info("DMA: %luMiB %smemory allocated at 0x%08llx phys\n",
+			size / 1048576, area->name, (unsigned long long)base);
+	}
 }
 
-core_initcall(consistent_init);
+void __init dma_coherent_mapping(void)
+{
+	struct map_desc map[ARRAY_SIZE(coherent_areas)];
+	int nr;
 
-static void *
-__dma_alloc_remap(struct page *page, size_t size, gfp_t gfp, pgprot_t prot)
+	for (nr = 0; nr < ARRAY_SIZE(map); nr++) {
+		struct dma_coherent_area *area = coherent_areas[nr];
+
+		map[nr].pfn = area->pfn;
+		map[nr].virtual = area->vm.vm_start;
+		map[nr].length = area->vm.vm_end - area->vm.vm_start;
+		map[nr].type = area->type;
+		if (map[nr].length == 0)
+			break;
+	}
+
+	iotable_init(map, nr);
+}
+
+static void *dma_alloc_area(size_t size, unsigned long *pfn, gfp_t gfp,
+	struct dma_coherent_area *area)
 {
 	struct arm_vmregion *c;
 	size_t align;
 	int bit;
 
-	if (!consistent_pte[0]) {
-		printk(KERN_ERR "%s: not initialised\n", __func__);
-		dump_stack();
-		return NULL;
-	}
-
 	/*
 	 * Align the virtual region allocation - maximum alignment is
 	 * a section size, minimum is a page size.  This helps reduce
@@ -218,45 +255,21 @@ __dma_alloc_remap(struct page *page, size_t size, gfp_t gfp, pgprot_t prot)
 	/*
 	 * Allocate a virtual address in the consistent mapping region.
 	 */
-	c = arm_vmregion_alloc(&consistent_head, align, size,
+	c = arm_vmregion_alloc(&area->vm, align, size,
 			    gfp & ~(__GFP_DMA | __GFP_HIGHMEM));
-	if (c) {
-		pte_t *pte;
-		int idx = CONSISTENT_PTE_INDEX(c->vm_start);
-		u32 off = CONSISTENT_OFFSET(c->vm_start) & (PTRS_PER_PTE-1);
-
-		pte = consistent_pte[idx] + off;
-		c->vm_pages = page;
-
-		do {
-			BUG_ON(!pte_none(*pte));
-
-			set_pte_ext(pte, mk_pte(page, prot), 0);
-			page++;
-			pte++;
-			off++;
-			if (off >= PTRS_PER_PTE) {
-				off = 0;
-				pte = consistent_pte[++idx];
-			}
-		} while (size -= PAGE_SIZE);
-
-		dsb();
+	if (!c)
+		return NULL;
 
-		return (void *)c->vm_start;
-	}
-	return NULL;
+	memset((void *)c->vm_start, 0, size);
+	*pfn = area->pfn + ((c->vm_start - area->vm.vm_start) >> PAGE_SHIFT);
+	return (void *)c->vm_start;
 }
 
-static void __dma_free_remap(void *cpu_addr, size_t size)
+static void dma_free_area(void *cpu_addr, size_t size, struct dma_coherent_area *area)
 {
 	struct arm_vmregion *c;
-	unsigned long addr;
-	pte_t *ptep;
-	int idx;
-	u32 off;
 
-	c = arm_vmregion_find_remove(&consistent_head, (unsigned long)cpu_addr);
+	c = arm_vmregion_find_remove(&area->vm, (unsigned long)cpu_addr);
 	if (!c) {
 		printk(KERN_ERR "%s: trying to free invalid coherent area: %p\n",
 		       __func__, cpu_addr);
@@ -271,61 +284,62 @@ static void __dma_free_remap(void *cpu_addr, size_t size)
 		size = c->vm_end - c->vm_start;
 	}
 
-	idx = CONSISTENT_PTE_INDEX(c->vm_start);
-	off = CONSISTENT_OFFSET(c->vm_start) & (PTRS_PER_PTE-1);
-	ptep = consistent_pte[idx] + off;
-	addr = c->vm_start;
-	do {
-		pte_t pte = ptep_get_and_clear(&init_mm, addr, ptep);
-
-		ptep++;
-		addr += PAGE_SIZE;
-		off++;
-		if (off >= PTRS_PER_PTE) {
-			off = 0;
-			ptep = consistent_pte[++idx];
-		}
+	arm_vmregion_free(&area->vm, c);
+}
 
-		if (pte_none(pte) || !pte_present(pte))
-			printk(KERN_CRIT "%s: bad page in kernel page table\n",
-			       __func__);
-	} while (size -= PAGE_SIZE);
+#define nommu() (0)
 
-	flush_tlb_kernel_range(c->vm_start, c->vm_end);
+#else	/* !CONFIG_MMU */
 
-	arm_vmregion_free(&consistent_head, c);
-}
+#define dma_alloc_area(size, pfn, gfp, area)	({ *(pfn) = 0; NULL; })
+#define dma_free_area(addr, size, area)		do { } while (0)
 
-#else	/* !CONFIG_MMU */
+#define nommu()	(1)
+#define coherent_wc_area NULL
+#define coherent_dma_area NULL
 
-#define __dma_alloc_remap(page, size, gfp, prot)	page_address(page)
-#define __dma_free_remap(addr, size)			do { } while (0)
+void dma_coherent_reserve(void)
+{
+}
 
 #endif	/* CONFIG_MMU */
 
 static void *
 __dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp,
-	    pgprot_t prot)
+	    struct dma_coherent_area *area)
 {
-	struct page *page;
-	void *addr;
+	unsigned long pfn;
+	void *ret;
 
 	*handle = ~0;
 	size = PAGE_ALIGN(size);
 
-	page = __dma_alloc_buffer(dev, size, gfp);
-	if (!page)
-		return NULL;
+	if (arch_is_coherent() || nommu()) {
+		struct page *page = __dma_alloc_buffer(dev, size, gfp);
+		if (!page)
+			return NULL;
+		pfn = page_to_pfn(page);
+		ret = page_address(page);
+	} else {
+		ret = dma_alloc_area(size, &pfn, gfp, area);
+	}
 
-	if (!arch_is_coherent())
-		addr = __dma_alloc_remap(page, size, gfp, prot);
-	else
-		addr = page_address(page);
+	if (ret)
+		*handle = pfn_to_dma(dev, pfn);
 
-	if (addr)
-		*handle = pfn_to_dma(dev, page_to_pfn(page));
+	return ret;
+}
+
+static void __dma_free(struct device *dev, size_t size, void *cpu_addr,
+	dma_addr_t handle, struct dma_coherent_area *area)
+{
+	size = PAGE_ALIGN(size);
 
-	return addr;
+	if (arch_is_coherent() || nommu()) {
+		__dma_free_buffer(pfn_to_page(dma_to_pfn(dev, handle)), size);
+	} else {
+		dma_free_area(cpu_addr, size, area);
+	}
 }
 
 /*
@@ -340,8 +354,7 @@ dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gf
 	if (dma_alloc_from_coherent(dev, size, handle, &memory))
 		return memory;
 
-	return __dma_alloc(dev, size, handle, gfp,
-			   pgprot_dmacoherent(pgprot_kernel));
+	return __dma_alloc(dev, size, handle, gfp, coherent_dma_area);
 }
 EXPORT_SYMBOL(dma_alloc_coherent);
 
@@ -352,13 +365,13 @@ EXPORT_SYMBOL(dma_alloc_coherent);
 void *
 dma_alloc_writecombine(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp)
 {
-	return __dma_alloc(dev, size, handle, gfp,
-			   pgprot_writecombine(pgprot_kernel));
+	return __dma_alloc(dev, size, handle, gfp, coherent_wc_area);
 }
 EXPORT_SYMBOL(dma_alloc_writecombine);
 
 static int dma_mmap(struct device *dev, struct vm_area_struct *vma,
-		    void *cpu_addr, dma_addr_t dma_addr, size_t size)
+		    void *cpu_addr, dma_addr_t dma_addr, size_t size,
+		    struct dma_coherent_area *area)
 {
 	int ret = -ENXIO;
 #ifdef CONFIG_MMU
@@ -367,7 +380,7 @@ static int dma_mmap(struct device *dev, struct vm_area_struct *vma,
 
 	user_size = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
 
-	c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr);
+	c = arm_vmregion_find(&area->vm, (unsigned long)cpu_addr);
 	if (c) {
 		unsigned long off = vma->vm_pgoff;
 
@@ -390,7 +403,7 @@ int dma_mmap_coherent(struct device *dev, struct vm_area_struct *vma,
 		      void *cpu_addr, dma_addr_t dma_addr, size_t size)
 {
 	vma->vm_page_prot = pgprot_dmacoherent(vma->vm_page_prot);
-	return dma_mmap(dev, vma, cpu_addr, dma_addr, size);
+	return dma_mmap(dev, vma, cpu_addr, dma_addr, size, coherent_dma_area);
 }
 EXPORT_SYMBOL(dma_mmap_coherent);
 
@@ -398,7 +411,7 @@ int dma_mmap_writecombine(struct device *dev, struct vm_area_struct *vma,
 			  void *cpu_addr, dma_addr_t dma_addr, size_t size)
 {
 	vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
-	return dma_mmap(dev, vma, cpu_addr, dma_addr, size);
+	return dma_mmap(dev, vma, cpu_addr, dma_addr, size, coherent_wc_area);
 }
 EXPORT_SYMBOL(dma_mmap_writecombine);
 
@@ -413,14 +426,18 @@ void dma_free_coherent(struct device *dev, size_t size, void *cpu_addr, dma_addr
 	if (dma_release_from_coherent(dev, get_order(size), cpu_addr))
 		return;
 
-	size = PAGE_ALIGN(size);
+	__dma_free(dev, size, cpu_addr, handle, coherent_dma_area);
+}
+EXPORT_SYMBOL(dma_free_coherent);
 
-	if (!arch_is_coherent())
-		__dma_free_remap(cpu_addr, size);
+void dma_free_writecombine(struct device *dev, size_t size, void *cpu_addr,
+	dma_addr_t handle)
+{
+	WARN_ON(irqs_disabled());
 
-	__dma_free_buffer(pfn_to_page(dma_to_pfn(dev, handle)), size);
+	__dma_free(dev, size, cpu_addr, handle, coherent_wc_area);
 }
-EXPORT_SYMBOL(dma_free_coherent);
+EXPORT_SYMBOL(dma_free_writecombine);
 
 /*
  * Make an area consistent for devices.
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 2fee782..77076a6 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -365,6 +365,7 @@ void __init arm_memblock_init(struct meminfo *mi, struct machine_desc *mdesc)
 
 	arm_mm_memblock_reserve();
 	arm_dt_memblock_reserve();
+	dma_coherent_reserve();
 
 	/* reserve any platform specific memblock areas */
 	if (mdesc->reserve)
diff --git a/arch/arm/mm/mm.h b/arch/arm/mm/mm.h
index 0105667..3abaa2c 100644
--- a/arch/arm/mm/mm.h
+++ b/arch/arm/mm/mm.h
@@ -31,3 +31,5 @@ extern u32 arm_dma_limit;
 
 void __init bootmem_init(void);
 void arm_mm_memblock_reserve(void);
+void dma_coherent_reserve(void);
+void dma_coherent_mapping(void);
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 594d677..027f118 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -273,6 +273,16 @@ static struct mem_type mem_types[] = {
 		.prot_l1   = PMD_TYPE_TABLE,
 		.domain    = DOMAIN_KERNEL,
 	},
+	[MT_DMA_COHERENT] = {
+		.prot_sect	= PMD_TYPE_SECT | PMD_SECT_AP_WRITE |
+				  PMD_SECT_S,
+		.domain		= DOMAIN_IO,
+	},
+	[MT_WC_COHERENT] = {
+		.prot_sect	= PMD_TYPE_SECT | PMD_SECT_AP_WRITE |
+				  PMD_SECT_S,
+		.domain		= DOMAIN_IO,
+	},
 };
 
 const struct mem_type *get_mem_type(unsigned int type)
@@ -353,6 +363,7 @@ static void __init build_mem_type_table(void)
 			mem_types[MT_DEVICE_NONSHARED].prot_sect |= PMD_SECT_XN;
 			mem_types[MT_DEVICE_CACHED].prot_sect |= PMD_SECT_XN;
 			mem_types[MT_DEVICE_WC].prot_sect |= PMD_SECT_XN;
+			mem_types[MT_DMA_COHERENT].prot_sect |= PMD_SECT_XN;
 		}
 		if (cpu_arch >= CPU_ARCH_ARMv7 && (cr & CR_TRE)) {
 			/*
@@ -457,13 +468,24 @@ static void __init build_mem_type_table(void)
 			/* Non-cacheable Normal is XCB = 001 */
 			mem_types[MT_MEMORY_NONCACHED].prot_sect |=
 				PMD_SECT_BUFFERED;
+			mem_types[MT_WC_COHERENT].prot_sect |=
+				PMD_SECT_BUFFERED;
+			mem_types[MT_DMA_COHERENT].prot_sect |=
+				PMD_SECT_BUFFERED;
 		} else {
 			/* For both ARMv6 and non-TEX-remapping ARMv7 */
 			mem_types[MT_MEMORY_NONCACHED].prot_sect |=
 				PMD_SECT_TEX(1);
+			mem_types[MT_WC_COHERENT].prot_sect |=
+				PMD_SECT_TEX(1);
+#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
+			mem_types[MT_DMA_COHERENT].prot_sect |=
+				PMD_SECT_TEX(1);
+#endif
 		}
 	} else {
 		mem_types[MT_MEMORY_NONCACHED].prot_sect |= PMD_SECT_BUFFERABLE;
+		mem_types[MT_WC_COHERENT].prot_sect |= PMD_SECT_BUFFERED;
 	}
 
 	for (i = 0; i < 16; i++) {
@@ -976,6 +998,8 @@ static void __init devicemaps_init(struct machine_desc *mdesc)
 		create_mapping(&map);
 	}
 
+	dma_coherent_mapping();
+
 	/*
 	 * Ask the machine support to map in the statically mapped devices.
 	 */
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 8/9] ARM: integrate CMA with DMA-mapping subsystem
  2011-08-12 10:58 ` Marek Szyprowski
@ 2011-08-12 10:58   ` Marek Szyprowski
  -1 siblings, 0 replies; 72+ messages in thread
From: Marek Szyprowski @ 2011-08-12 10:58 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Shariq Hasnain, Chunsang Jeong

This patch adds CMA support to the DMA-mapping subsystem for the ARM
architecture. By default a global CMA area is used, but specific devices
are allowed to have private memory areas if required (these can be
created with the dma_declare_contiguous() function during board
initialization), as shown in the sketch below.
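
A minimal sketch of such a board-level reservation (assumed usage, not
part of this patch; the example device and the 16 MiB size are
hypothetical, and base == 0 is taken to mean "place the area anywhere"):

  #include <linux/dma-contiguous.h>

  static struct platform_device example_mfc;   /* hypothetical device */

  static void __init example_board_reserve(void)
  {
          /* reserve a private 16 MiB CMA area for this device */
          if (dma_declare_contiguous(&example_mfc.dev, 16 * SZ_1M, 0))
                  pr_warn("example: CMA area reservation failed\n");
  }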

Contiguous memory areas reserved for DMA are remapped with 2-level page
tables on boot. Once a buffer is requested, the low-memory kernel
mapping is updated to match the requested memory access type.

GFP_ATOMIC allocations are performed from special memory area which is
exclusive from system memory to avoid remapping page attributes what might
be not allowed in atomic context on some systems. If CMA has been disabled
then all DMA allocations are performed from this area.
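
A minimal usage sketch, not taken from any patch in this series: a board
file reserves a private area from its ->reserve() callback, and a driver
later allocates from it through the regular DMA API. The foo_device
below and the buffer sizes are hypothetical; dma_declare_contiguous()
and the CMA-backed allocation path are the ones added by this series.

	/* board file: called via mdesc->reserve() during early boot */
	static void __init board_reserve(void)
	{
		/* hypothetical device gets its own 8 MiB CMA area */
		dma_declare_contiguous(&foo_device.dev, 8 * SZ_1M, 0);
	}

	/* driver: a plain coherent allocation, served from the CMA area */
	dma_addr_t dma;
	void *buf = dma_alloc_coherent(&pdev->dev, SZ_1M, &dma, GFP_KERNEL);
	if (!buf)
		return -ENOMEM;
	/* ... program 'dma' into the device, access 'buf' from the CPU ... */
	dma_free_coherent(&pdev->dev, SZ_1M, buf, dma);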

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
 arch/arm/Kconfig                      |    1 +
 arch/arm/include/asm/device.h         |    3 +
 arch/arm/include/asm/dma-contiguous.h |   33 +++++++
 arch/arm/include/asm/mach/map.h       |    5 +-
 arch/arm/mm/dma-mapping.c             |  169 +++++++++++++++++++++++++--------
 arch/arm/mm/init.c                    |    5 +-
 arch/arm/mm/mm.h                      |    3 +
 arch/arm/mm/mmu.c                     |   29 ++++--
 8 files changed, 196 insertions(+), 52 deletions(-)
 create mode 100644 arch/arm/include/asm/dma-contiguous.h

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 2c71a8f..20fa729 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -3,6 +3,7 @@ config ARM
 	default y
 	select HAVE_AOUT
 	select HAVE_DMA_API_DEBUG
+	select HAVE_DMA_CONTIGUOUS
 	select HAVE_IDE
 	select HAVE_MEMBLOCK
 	select RTC_LIB
diff --git a/arch/arm/include/asm/device.h b/arch/arm/include/asm/device.h
index 9f390ce..942913e 100644
--- a/arch/arm/include/asm/device.h
+++ b/arch/arm/include/asm/device.h
@@ -10,6 +10,9 @@ struct dev_archdata {
 #ifdef CONFIG_DMABOUNCE
 	struct dmabounce_device_info *dmabounce;
 #endif
+#ifdef CONFIG_CMA
+	struct cma *cma_area;
+#endif
 };
 
 struct pdev_archdata {
diff --git a/arch/arm/include/asm/dma-contiguous.h b/arch/arm/include/asm/dma-contiguous.h
new file mode 100644
index 0000000..99bf7c8
--- /dev/null
+++ b/arch/arm/include/asm/dma-contiguous.h
@@ -0,0 +1,33 @@
+#ifndef ASMARM_DMA_CONTIGUOUS_H
+#define ASMARM_DMA_CONTIGUOUS_H
+
+#ifdef __KERNEL__
+
+#include <linux/device.h>
+#include <linux/dma-contiguous.h>
+
+#ifdef CONFIG_CMA
+
+#define MAX_CMA_AREAS	(8)
+
+void dma_contiguous_early_fixup(phys_addr_t base, unsigned long size);
+
+static inline struct cma *get_dev_cma_area(struct device *dev)
+{
+	if (dev->archdata.cma_area)
+		return dev->archdata.cma_area;
+	return dma_contiguous_default_area;
+}
+
+static inline void set_dev_cma_area(struct device *dev, struct cma *cma)
+{
+	dev->archdata.cma_area = cma;
+}
+
+#else
+
+#define MAX_CMA_AREAS	(0)
+
+#endif
+#endif
+#endif
diff --git a/arch/arm/include/asm/mach/map.h b/arch/arm/include/asm/mach/map.h
index 3845215..5982a83 100644
--- a/arch/arm/include/asm/mach/map.h
+++ b/arch/arm/include/asm/mach/map.h
@@ -29,8 +29,9 @@ struct map_desc {
 #define MT_MEMORY_NONCACHED	11
 #define MT_MEMORY_DTCM		12
 #define MT_MEMORY_ITCM		13
-#define MT_DMA_COHERENT		14
-#define MT_WC_COHERENT		15
+#define MT_MEMORY_DMA_READY	14
+#define MT_DMA_COHERENT		15
+#define MT_WC_COHERENT		16
 
 #ifdef CONFIG_MMU
 extern void iotable_init(struct map_desc *, int);
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index b643262..63175d1 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -17,6 +17,7 @@
 #include <linux/init.h>
 #include <linux/device.h>
 #include <linux/dma-mapping.h>
+#include <linux/dma-contiguous.h>
 #include <linux/highmem.h>
 #include <linux/memblock.h>
 
@@ -26,6 +27,7 @@
 #include <asm/tlbflush.h>
 #include <asm/sizes.h>
 #include <asm/mach/map.h>
+#include <asm/dma-contiguous.h>
 
 #include "mm.h"
 
@@ -56,6 +58,24 @@ static u64 get_coherent_dma_mask(struct device *dev)
 	return mask;
 }
 
+static struct page *__dma_alloc_system_pages(size_t count, gfp_t gfp,
+					     unsigned long order)
+{
+	struct page *page, *p, *e;
+
+	page = alloc_pages(gfp, order);
+	if (!page)
+		return NULL;
+
+	/*
+	 * Now split the huge page and free the excess pages
+	 */
+	split_page(page, order);
+	for (p = page + count, e = page + (1 << order); p < e; p++)
+		__free_page(p);
+	return page;
+}
+
 /*
  * Allocate a DMA buffer for 'dev' of size 'size' using the
  * specified gfp mask.  Note that 'size' must be page aligned.
@@ -63,7 +83,8 @@ static u64 get_coherent_dma_mask(struct device *dev)
 static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gfp)
 {
 	unsigned long order = get_order(size);
-	struct page *page, *p, *e;
+	size_t count = size >> PAGE_SHIFT;
+	struct page *page;
 	void *ptr;
 	u64 mask = get_coherent_dma_mask(dev);
 
@@ -82,16 +103,16 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
 	if (mask < 0xffffffffULL)
 		gfp |= GFP_DMA;
 
-	page = alloc_pages(gfp, order);
-	if (!page)
-		return NULL;
-
 	/*
-	 * Now split the huge page and free the excess pages
+	 * Allocate contiguous memory
 	 */
-	split_page(page, order);
-	for (p = page + (size >> PAGE_SHIFT), e = page + (1 << order); p < e; p++)
-		__free_page(p);
+	if (cma_available())
+		page = dma_alloc_from_contiguous(dev, count, order);
+	else
+		page = __dma_alloc_system_pages(count, gfp, order);
+
+	if (!page)
+		return NULL;
 
 	/*
 	 * Ensure that the allocated pages are zeroed, and that any data
@@ -108,7 +129,7 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
 /*
  * Free a DMA buffer.  'size' must be page aligned.
  */
-static void __dma_free_buffer(struct page *page, size_t size)
+static void __dma_free_system_buffer(struct page *page, size_t size)
 {
 	struct page *e = page + (size >> PAGE_SHIFT);
 
@@ -136,6 +157,7 @@ struct dma_coherent_area {
 	struct arm_vmregion_head vm;
 	unsigned long pfn;
 	unsigned int type;
+	pgprot_t prot;
 	const char *name;
 };
 
@@ -232,6 +254,55 @@ void __init dma_coherent_mapping(void)
 	}
 
 	iotable_init(map, nr);
+	coherent_dma_area->prot = pgprot_dmacoherent(pgprot_kernel);
+	coherent_wc_area->prot = pgprot_writecombine(pgprot_kernel);
+}
+
+struct dma_contiguous_early_reserve {
+	phys_addr_t base;
+	unsigned long size;
+};
+
+static struct dma_contiguous_early_reserve
+dma_mmu_remap[MAX_CMA_AREAS] __initdata;
+
+static int dma_mmu_remap_num __initdata;
+
+void __init dma_contiguous_early_fixup(phys_addr_t base, unsigned long size)
+{
+	dma_mmu_remap[dma_mmu_remap_num].base = base;
+	dma_mmu_remap[dma_mmu_remap_num].size = size;
+	dma_mmu_remap_num++;
+}
+
+void __init dma_contiguous_remap(void)
+{
+	int i;
+	for (i = 0; i < dma_mmu_remap_num; i++) {
+		phys_addr_t start = dma_mmu_remap[i].base;
+		phys_addr_t end = start + dma_mmu_remap[i].size;
+		struct map_desc map;
+		unsigned long addr;
+
+		if (end > arm_lowmem_limit)
+			end = arm_lowmem_limit;
+		if (start >= end)
+			continue;
+
+		map.pfn = __phys_to_pfn(start);
+		map.virtual = __phys_to_virt(start);
+		map.length = end - start;
+		map.type = MT_MEMORY_DMA_READY;
+
+		/*
+		 * Clear previous low-memory mapping
+		 */
+		for (addr = __phys_to_virt(start); addr < __phys_to_virt(end);
+		     addr += PGDIR_SIZE)
+			pmd_clear(pmd_off_k(addr));
+
+		iotable_init(&map, 1);
+	}
 }
 
 static void *dma_alloc_area(size_t size, unsigned long *pfn, gfp_t gfp,
@@ -289,10 +360,34 @@ static void dma_free_area(void *cpu_addr, size_t size, struct dma_coherent_area
 
 #define nommu() (0)
 
+static int __dma_update_pte(pte_t *pte, pgtable_t token, unsigned long addr,
+			    void *data)
+{
+	struct page *page = virt_to_page(addr);
+	pgprot_t prot = *(pgprot_t *)data;
+
+	set_pte_ext(pte, mk_pte(page, prot), 0);
+	return 0;
+}
+
+static void dma_remap_area(struct page *page, size_t size, pgprot_t prot)
+{
+	unsigned long start = (unsigned long) page_address(page);
+	unsigned long end = start + size;
+
+	if (arch_is_coherent())
+		return;
+
+	apply_to_page_range(&init_mm, start, size, __dma_update_pte, &prot);
+	dsb();
+	flush_tlb_kernel_range(start, end);
+}
+
 #else	/* !CONFIG_MMU */
 
 #define dma_alloc_area(size, pfn, gfp, area)	({ *(pfn) = 0; NULL })
 #define dma_free_area(addr, size, area)		do { } while (0)
+#define dma_remap_area(page, size, prot)	do { } while (0)
 
 #define nommu()	(1)
 #define coherent_wc_area NULL
@@ -308,19 +403,27 @@ static void *
 __dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp,
 	    struct dma_coherent_area *area)
 {
-	unsigned long pfn;
-	void *ret;
+	unsigned long pfn = 0;
+	void *ret = NULL;
 
 	*handle = ~0;
 	size = PAGE_ALIGN(size);
 
-	if (arch_is_coherent() || nommu()) {
+	if (arch_is_coherent() || nommu() ||
+	   (cma_available() && !(gfp & GFP_ATOMIC))) {
+		/*
+		 * Allocate from system or CMA pages
+		 */
 		struct page *page = __dma_alloc_buffer(dev, size, gfp);
 		if (!page)
 			return NULL;
+		dma_remap_area(page, size, area->prot);
 		pfn = page_to_pfn(page);
 		ret = page_address(page);
 	} else {
+		/*
+		 * Allocate from reserved DMA coherent/wc area
+		 */
 		ret = dma_alloc_area(size, &pfn, gfp, area);
 	}
 
@@ -333,12 +436,19 @@ __dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp,
 static void __dma_free(struct device *dev, size_t size, void *cpu_addr,
 	dma_addr_t handle, struct dma_coherent_area *area)
 {
+	struct page *page = pfn_to_page(dma_to_pfn(dev, handle));
 	size = PAGE_ALIGN(size);
 
 	if (arch_is_coherent() || nommu()) {
-		__dma_free_buffer(pfn_to_page(dma_to_pfn(dev, handle)), size);
-	} else {
+		WARN_ON(irqs_disabled());
+		__dma_free_system_buffer(page, size);
+	} else if ((unsigned long)cpu_addr >= area->vm.vm_start &&
+		   (unsigned long)cpu_addr < area->vm.vm_end) {
 		dma_free_area(cpu_addr, size, area);
+	} else {
+		WARN_ON(irqs_disabled());
+		dma_remap_area(page, size, pgprot_kernel);
+		dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT);
 	}
 }
 
@@ -375,27 +485,12 @@ static int dma_mmap(struct device *dev, struct vm_area_struct *vma,
 {
 	int ret = -ENXIO;
 #ifdef CONFIG_MMU
-	unsigned long user_size, kern_size;
-	struct arm_vmregion *c;
-
-	user_size = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
-
-	c = arm_vmregion_find(&area->vm, (unsigned long)cpu_addr);
-	if (c) {
-		unsigned long off = vma->vm_pgoff;
-
-		kern_size = (c->vm_end - c->vm_start) >> PAGE_SHIFT;
-
-		if (off < kern_size &&
-		    user_size <= (kern_size - off)) {
-			ret = remap_pfn_range(vma, vma->vm_start,
-					      page_to_pfn(c->vm_pages) + off,
-					      user_size << PAGE_SHIFT,
-					      vma->vm_page_prot);
-		}
-	}
+	unsigned long pfn = dma_to_pfn(dev, dma_addr);
+	ret = remap_pfn_range(vma, vma->vm_start,
+			      pfn + vma->vm_pgoff,
+			      vma->vm_end - vma->vm_start,
+			      vma->vm_page_prot);
 #endif	/* CONFIG_MMU */
-
 	return ret;
 }
 
@@ -421,8 +516,6 @@ EXPORT_SYMBOL(dma_mmap_writecombine);
  */
 void dma_free_coherent(struct device *dev, size_t size, void *cpu_addr, dma_addr_t handle)
 {
-	WARN_ON(irqs_disabled());
-
 	if (dma_release_from_coherent(dev, get_order(size), cpu_addr))
 		return;
 
@@ -433,8 +526,6 @@ EXPORT_SYMBOL(dma_free_coherent);
 void dma_free_writecombine(struct device *dev, size_t size, void *cpu_addr,
 	dma_addr_t handle)
 {
-	WARN_ON(irqs_disabled());
-
 	__dma_free(dev, size, cpu_addr, handle, coherent_wc_area);
 }
 EXPORT_SYMBOL(dma_free_writecombine);
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 77076a6..0f2dbb8 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -20,6 +20,7 @@
 #include <linux/gfp.h>
 #include <linux/memblock.h>
 #include <linux/sort.h>
+#include <linux/dma-contiguous.h>
 
 #include <asm/mach-types.h>
 #include <asm/prom.h>
@@ -365,12 +366,14 @@ void __init arm_memblock_init(struct meminfo *mi, struct machine_desc *mdesc)
 
 	arm_mm_memblock_reserve();
 	arm_dt_memblock_reserve();
-	dma_coherent_reserve();
 
 	/* reserve any platform specific memblock areas */
 	if (mdesc->reserve)
 		mdesc->reserve();
 
+	dma_coherent_reserve();
+	dma_contiguous_reserve();
+
 	memblock_analyze();
 	memblock_dump_all();
 }
diff --git a/arch/arm/mm/mm.h b/arch/arm/mm/mm.h
index 3abaa2c..46101be 100644
--- a/arch/arm/mm/mm.h
+++ b/arch/arm/mm/mm.h
@@ -29,7 +29,10 @@ extern u32 arm_dma_limit;
 #define arm_dma_limit ((u32)~0)
 #endif
 
+extern phys_addr_t arm_lowmem_limit;
+
 void __init bootmem_init(void);
 void arm_mm_memblock_reserve(void);
 void dma_coherent_reserve(void);
 void dma_coherent_mapping(void);
+void dma_contiguous_remap(void);
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 027f118..9dc18d4 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -273,6 +273,11 @@ static struct mem_type mem_types[] = {
 		.prot_l1   = PMD_TYPE_TABLE,
 		.domain    = DOMAIN_KERNEL,
 	},
+	[MT_MEMORY_DMA_READY] = {
+		.prot_pte  = L_PTE_PRESENT | L_PTE_YOUNG | L_PTE_DIRTY,
+		.prot_l1   = PMD_TYPE_TABLE,
+		.domain    = DOMAIN_KERNEL,
+	},
 	[MT_DMA_COHERENT] = {
 		.prot_sect	= PMD_TYPE_SECT | PMD_SECT_AP_WRITE |
 				  PMD_SECT_S,
@@ -425,6 +430,7 @@ static void __init build_mem_type_table(void)
 	if (arch_is_coherent() && cpu_is_xsc3()) {
 		mem_types[MT_MEMORY].prot_sect |= PMD_SECT_S;
 		mem_types[MT_MEMORY].prot_pte |= L_PTE_SHARED;
+		mem_types[MT_MEMORY_DMA_READY].prot_pte |= L_PTE_SHARED;
 		mem_types[MT_MEMORY_NONCACHED].prot_sect |= PMD_SECT_S;
 		mem_types[MT_MEMORY_NONCACHED].prot_pte |= L_PTE_SHARED;
 	}
@@ -454,6 +460,7 @@ static void __init build_mem_type_table(void)
 			mem_types[MT_DEVICE_CACHED].prot_pte |= L_PTE_SHARED;
 			mem_types[MT_MEMORY].prot_sect |= PMD_SECT_S;
 			mem_types[MT_MEMORY].prot_pte |= L_PTE_SHARED;
+			mem_types[MT_MEMORY_DMA_READY].prot_pte |= L_PTE_SHARED;
 			mem_types[MT_MEMORY_NONCACHED].prot_sect |= PMD_SECT_S;
 			mem_types[MT_MEMORY_NONCACHED].prot_pte |= L_PTE_SHARED;
 		}
@@ -504,6 +511,7 @@ static void __init build_mem_type_table(void)
 	mem_types[MT_HIGH_VECTORS].prot_l1 |= ecc_mask;
 	mem_types[MT_MEMORY].prot_sect |= ecc_mask | cp->pmd;
 	mem_types[MT_MEMORY].prot_pte |= kern_pgprot;
+	mem_types[MT_MEMORY_DMA_READY].prot_pte |= kern_pgprot;
 	mem_types[MT_MEMORY_NONCACHED].prot_sect |= ecc_mask;
 	mem_types[MT_ROM].prot_sect |= cp->pmd;
 
@@ -583,7 +591,7 @@ static void __init alloc_init_section(pud_t *pud, unsigned long addr,
 	 * L1 entries, whereas PGDs refer to a group of L1 entries making
 	 * up one logical pointer to an L2 table.
 	 */
-	if (((addr | end | phys) & ~SECTION_MASK) == 0) {
+	if (type->prot_sect && ((addr | end | phys) & ~SECTION_MASK) == 0) {
 		pmd_t *p = pmd;
 
 		if (addr & SECTION_SIZE)
@@ -779,7 +787,7 @@ static int __init early_vmalloc(char *arg)
 }
 early_param("vmalloc", early_vmalloc);
 
-static phys_addr_t lowmem_limit __initdata = 0;
+phys_addr_t arm_lowmem_limit __initdata = 0;
 
 void __init sanity_check_meminfo(void)
 {
@@ -848,8 +856,8 @@ void __init sanity_check_meminfo(void)
 			bank->size = newsize;
 		}
 #endif
-		if (!bank->highmem && bank->start + bank->size > lowmem_limit)
-			lowmem_limit = bank->start + bank->size;
+		if (!bank->highmem && bank->start + bank->size > arm_lowmem_limit)
+			arm_lowmem_limit = bank->start + bank->size;
 
 		j++;
 	}
@@ -874,7 +882,7 @@ void __init sanity_check_meminfo(void)
 	}
 #endif
 	meminfo.nr_banks = j;
-	memblock_set_current_limit(lowmem_limit);
+	memblock_set_current_limit(arm_lowmem_limit);
 }
 
 static inline void prepare_page_table(void)
@@ -899,8 +907,8 @@ static inline void prepare_page_table(void)
 	 * Find the end of the first block of lowmem.
 	 */
 	end = memblock.memory.regions[0].base + memblock.memory.regions[0].size;
-	if (end >= lowmem_limit)
-		end = lowmem_limit;
+	if (end >= arm_lowmem_limit)
+		end = arm_lowmem_limit;
 
 	/*
 	 * Clear out all the kernel space mappings, except for the first
@@ -1034,8 +1042,8 @@ static void __init map_lowmem(void)
 		phys_addr_t end = start + reg->size;
 		struct map_desc map;
 
-		if (end > lowmem_limit)
-			end = lowmem_limit;
+		if (end > arm_lowmem_limit)
+			end = arm_lowmem_limit;
 		if (start >= end)
 			break;
 
@@ -1056,11 +1064,12 @@ void __init paging_init(struct machine_desc *mdesc)
 {
 	void *zero_page;
 
-	memblock_set_current_limit(lowmem_limit);
+	memblock_set_current_limit(arm_lowmem_limit);
 
 	build_mem_type_table();
 	prepare_page_table();
 	map_lowmem();
+	dma_contiguous_remap();
 	devicemaps_init(mdesc);
 	kmap_init();
 
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 9/9] ARM: S5PV210: example of CMA private area for FIMC device on Goni board
  2011-08-12 10:58 ` Marek Szyprowski
@ 2011-08-12 10:58   ` Marek Szyprowski
  -1 siblings, 0 replies; 72+ messages in thread
From: Marek Szyprowski @ 2011-08-12 10:58 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Arnd Bergmann, Jesse Barker, Jonathan Corbet,
	Shariq Hasnain, Chunsang Jeong

This patch is an example of how a device-private CMA area can be
activated. It creates one CMA region and assigns it to the first
s5p-fimc device on the Samsung Goni S5PC110 board.
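
A driver-side sketch for completeness (the SZ_4M size and error path are
made up, not taken from the patch): once the private area is assigned,
the s5p-fimc driver needs no CMA-specific calls; an ordinary coherent
allocation on that device is served from its private region by the code
added in the previous patch.

	dma_addr_t dma;
	void *vaddr;

	/* backed by the device's private 16 MiB CMA area */
	vaddr = dma_alloc_coherent(&s5p_device_fimc0.dev, SZ_4M, &dma,
				   GFP_KERNEL);
	if (!vaddr)
		return -ENOMEM;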

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
 arch/arm/mach-s5pv210/Kconfig     |    1 +
 arch/arm/mach-s5pv210/mach-goni.c |    4 ++++
 2 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/arm/mach-s5pv210/Kconfig b/arch/arm/mach-s5pv210/Kconfig
index 69dd87c..697a5be 100644
--- a/arch/arm/mach-s5pv210/Kconfig
+++ b/arch/arm/mach-s5pv210/Kconfig
@@ -64,6 +64,7 @@ menu "S5PC110 Machines"
 config MACH_AQUILA
 	bool "Aquila"
 	select CPU_S5PV210
+	select CMA
 	select S3C_DEV_FB
 	select S5P_DEV_FIMC0
 	select S5P_DEV_FIMC1
diff --git a/arch/arm/mach-s5pv210/mach-goni.c b/arch/arm/mach-s5pv210/mach-goni.c
index 85c2d51..665c4ae 100644
--- a/arch/arm/mach-s5pv210/mach-goni.c
+++ b/arch/arm/mach-s5pv210/mach-goni.c
@@ -26,6 +26,7 @@
 #include <linux/input.h>
 #include <linux/gpio.h>
 #include <linux/interrupt.h>
+#include <linux/dma-contiguous.h>
 
 #include <asm/mach/arch.h>
 #include <asm/mach/map.h>
@@ -848,6 +849,9 @@ static void __init goni_map_io(void)
 static void __init goni_reserve(void)
 {
 	s5p_mfc_reserve_mem(0x43000000, 8 << 20, 0x51000000, 8 << 20);
+
+	/* Create private 16MiB contiguous memory area for s5p-fimc.0 device */
+	dma_declare_contiguous(&s5p_device_fimc0.dev, 16*SZ_1M, 0);
 }
 
 static void __init goni_machine_init(void)
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [PATCH 7/9] ARM: DMA: steal memory for DMA coherent mappings
  2011-08-12 10:58   ` Marek Szyprowski
  (?)
@ 2011-08-12 12:53     ` Arnd Bergmann
  -1 siblings, 0 replies; 72+ messages in thread
From: Arnd Bergmann @ 2011-08-12 12:53 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong

On Friday 12 August 2011, Marek Szyprowski wrote:
> 
> From: Russell King <rmk+kernel@arm.linux.org.uk>
> 
> Steal memory from the kernel to provide coherent DMA memory to drivers.
> This avoids the problem with multiple mappings with differing attributes
> on later CPUs.
> 
> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> [m.szyprowski: rebased onto 3.1-rc1]
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

Hi Marek,

Is this the same patch that Russell had to revert because it didn't
work on some of the older machines, in particular those using
dmabounce?

I thought that our discussion ended with the plan to use this only
for ARMv6+ (which has a problem with double mapping) but not on ARMv5
and below (which don't have this problem but might need dmabounce).

	Arnd

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 8/9] ARM: integrate CMA with DMA-mapping subsystem
  2011-08-12 10:58   ` Marek Szyprowski
  (?)
@ 2011-08-12 15:00     ` Arnd Bergmann
  -1 siblings, 0 replies; 72+ messages in thread
From: Arnd Bergmann @ 2011-08-12 15:00 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Marek Szyprowski, linux-kernel, linux-media, linux-mm,
	linaro-mm-sig, Daniel Walker, Russell King, Jonathan Corbet,
	Mel Gorman, Chunsang Jeong, Michal Nazarewicz, Jesse Barker,
	Kyungmin Park, Ankita Garg, Shariq Hasnain, Andrew Morton,
	KAMEZAWA Hiroyuki

On Friday 12 August 2011, Marek Szyprowski wrote:
> @@ -82,16 +103,16 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
>  	if (mask < 0xffffffffULL)
>  		gfp |= GFP_DMA;
>  
> -	page = alloc_pages(gfp, order);
> -	if (!page)
> -		return NULL;
> -
>  	/*
> -	 * Now split the huge page and free the excess pages
> +	 * Allocate contiguous memory
>  	 */
> -	split_page(page, order);
> -	for (p = page + (size >> PAGE_SHIFT), e = page + (1 << order); p < e; p++)
> -		__free_page(p);
> +	if (cma_available())
> +		page = dma_alloc_from_contiguous(dev, count, order);
> +	else
> +		page = __dma_alloc_system_pages(count, gfp, order);
> +
> +	if (!page)
> +		return NULL;

Why do you need the fallback here? I would assume that CMA now has to be available
on ARMv6 and up to work at all. When you allocate from __dma_alloc_system_pages(),
wouldn't that necessarily fail in the dma_remap_area() stage?

>  
> -	if (arch_is_coherent() || nommu()) {
> +	if (arch_is_coherent() || nommu() ||
> +	   (cma_available() && !(gfp & GFP_ATOMIC))) {
> +		/*
> +		 * Allocate from system or CMA pages
> +		 */
>  		struct page *page = __dma_alloc_buffer(dev, size, gfp);
>  		if (!page)
>  			return NULL;
> +		dma_remap_area(page, size, area->prot);
>  		pfn = page_to_pfn(page);
>  		ret = page_address(page);

Similarly with coherent and nommu. It seems to me that lumping too
many cases together creates extra complexity here.

How about something like

	if (arch_is_coherent() || nommu())
		ret = alloc_simple_buffer();
	else if (arch_is_v4_v5())
		ret = alloc_remap();
	else if (gfp & GFP_ATOMIC)
		ret = alloc_from_pool();
	else
		ret = alloc_from_contiguous();

This also allows a natural conversion to dma_map_ops when we get there.

>  	/* reserve any platform specific memblock areas */
>  	if (mdesc->reserve)
>  		mdesc->reserve();
>  
> +	dma_coherent_reserve();
> +	dma_contiguous_reserve();
> +
>  	memblock_analyze();
>  	memblock_dump_all();
>  }

Since we can handle most allocations using CMA on ARMv6+, I would think
that we can have a much smaller reserved area. Have you tried changing
dma_coherent_reserve() to allocate out of the contiguous area instead of
wasting a full 2MB section of memory?

	Arnd

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 7/9] ARM: DMA: steal memory for DMA coherent mappings
  2011-08-12 12:53     ` Arnd Bergmann
  (?)
@ 2011-08-14  7:52       ` Russell King - ARM Linux
  -1 siblings, 0 replies; 72+ messages in thread
From: Russell King - ARM Linux @ 2011-08-14  7:52 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Marek Szyprowski, linux-kernel, linux-arm-kernel, linux-media,
	linux-mm, linaro-mm-sig, Michal Nazarewicz, Kyungmin Park,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong

On Fri, Aug 12, 2011 at 02:53:05PM +0200, Arnd Bergmann wrote:
> On Friday 12 August 2011, Marek Szyprowski wrote:
> > 
> > From: Russell King <rmk+kernel@arm.linux.org.uk>
> > 
> > Steal memory from the kernel to provide coherent DMA memory to drivers.
> > This avoids the problem with multiple mappings with differing attributes
> > on later CPUs.
> > 
> > Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> > [m.szyprowski: rebased onto 3.1-rc1]
> > Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> 
> Hi Marek,
> 
> Is this the same patch that Russell had to revert because it didn't
> work on some of the older machines, in particular those using
> dmabounce?
> 
> I thought that our discussion ended with the plan to use this only
> for ARMv6+ (which has a problem with double mapping) but not on ARMv5
> and below (which don't have this problem but might need dmabounce).

I thought we'd decided to have a pool of available CMA memory on ARMv6K
to satisfy atomic allocations, which can grow and shrink in size, rather
than setting aside a fixed amount of contiguous system memory.

ARMv6 and ARMv7+ could use CMA directly, and <= ARMv5 can use the existing
allocation method.

Has something changed?

^ permalink raw reply	[flat|nested] 72+ messages in thread

* RE: [PATCH 8/9] ARM: integrate CMA with DMA-mapping subsystem
  2011-08-12 15:00     ` Arnd Bergmann
  (?)
@ 2011-08-16  9:29       ` Marek Szyprowski
  -1 siblings, 0 replies; 72+ messages in thread
From: Marek Szyprowski @ 2011-08-16  9:29 UTC (permalink / raw)
  To: 'Arnd Bergmann', linux-arm-kernel
  Cc: linux-kernel, linux-media, linux-mm, linaro-mm-sig,
	'Daniel Walker', 'Russell King',
	'Jonathan Corbet', 'Mel Gorman',
	'Chunsang Jeong', 'Michal Nazarewicz',
	'Jesse Barker', 'Kyungmin Park',
	'Ankita Garg', 'Shariq Hasnain',
	'Andrew Morton', 'KAMEZAWA Hiroyuki'

Hello,

On Friday, August 12, 2011 5:01 PM Arnd Bergmann wrote:

> On Friday 12 August 2011, Marek Szyprowski wrote:
> > @@ -82,16 +103,16 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
> >  	if (mask < 0xffffffffULL)
> >  		gfp |= GFP_DMA;
> >
> > -	page = alloc_pages(gfp, order);
> > -	if (!page)
> > -		return NULL;
> > -
> >  	/*
> > -	 * Now split the huge page and free the excess pages
> > +	 * Allocate contiguous memory
> >  	 */
> > -	split_page(page, order);
> > -	for (p = page + (size >> PAGE_SHIFT), e = page + (1 << order); p < e; p++)
> > -		__free_page(p);
> > +	if (cma_available())
> > +		page = dma_alloc_from_contiguous(dev, count, order);
> > +	else
> > +		page = __dma_alloc_system_pages(count, gfp, order);
> > +
> > +	if (!page)
> > +		return NULL;
> 
> Why do you need the fallback here? I would assume that CMA now has to be
> available on ARMv6 and up to work at all. When you allocate from
> __dma_alloc_system_pages(), wouldn't that necessarily fail in the
> dma_remap_area() stage?

It is not a fallback - I've just merged two cases together (the CMA case and
the coherent/nommu case). I agree that such mixed code might be confusing.

> >
> > -	if (arch_is_coherent() || nommu()) {
> > +	if (arch_is_coherent() || nommu() ||
> > +	   (cma_available() && !(gfp & GFP_ATOMIC))) {
> > +		/*
> > +		 * Allocate from system or CMA pages
> > +		 */
> >  		struct page *page = __dma_alloc_buffer(dev, size, gfp);
> >  		if (!page)
> >  			return NULL;
> > +		dma_remap_area(page, size, area->prot);
> >  		pfn = page_to_pfn(page);
> >  		ret = page_address(page);
> 
> Similarly with coherent and nommu. It seems to me that lumping too
> many cases together creates extra complexity here.
> 
> How about something like
> 
> 	if (arch_is_coherent() || nommu())
> 		ret = alloc_simple_buffer();
> 	else if (arch_is_v4_v5())
> 		ret = alloc_remap();
> 	else if (gfp & GFP_ATOMIC)
> 		ret = alloc_from_pool();
> 	else
> 		ret = alloc_from_contiguous();
> 
> This also allows a natural conversion to dma_map_ops when we get there.

Ok. Is it ok to enable CMA permanently for ARMv6+? If CMA is left conditional,
the dma pool code will be much more complicated, because it will need to
support both CMA and non-CMA cases.

> >  	/* reserve any platform specific memblock areas */
> >  	if (mdesc->reserve)
> >  		mdesc->reserve();
> >
> > +	dma_coherent_reserve();
> > +	dma_contiguous_reserve();
> > +
> >  	memblock_analyze();
> >  	memblock_dump_all();
> >  }
> 
> Since we can handle most allocations using CMA on ARMv6+, I would think
> that we can have a much smaller reserved area. Have you tried changing
> dma_coherent_reserve() to allocate out of the contiguous area instead of
> wasting a full 2MB section of memory?

I will move the reserved pool directly into the CMA area, so it can be shrunk
below 2MiB.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center



^ permalink raw reply	[flat|nested] 72+ messages in thread

* RE: [PATCH 7/9] ARM: DMA: steal memory for DMA coherent mappings
  2011-08-12 12:53     ` Arnd Bergmann
  (?)
@ 2011-08-16 10:17       ` Marek Szyprowski
  -1 siblings, 0 replies; 72+ messages in thread
From: Marek Szyprowski @ 2011-08-16 10:17 UTC (permalink / raw)
  To: 'Arnd Bergmann'
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Michal Nazarewicz',
	'Kyungmin Park', 'Russell King',
	'Andrew Morton', 'KAMEZAWA Hiroyuki',
	'Ankita Garg', 'Daniel Walker',
	'Mel Gorman', 'Jesse Barker',
	'Jonathan Corbet', 'Shariq Hasnain',
	'Chunsang Jeong'

Hello,

On Friday, August 12, 2011 2:53 PM Arnd Bergmann wrote:

> On Friday 12 August 2011, Marek Szyprowski wrote:
> >
> > From: Russell King <rmk+kernel@arm.linux.org.uk>
> >
> > Steal memory from the kernel to provide coherent DMA memory to drivers.
> > This avoids the problem with multiple mappings with differing attributes
> > on later CPUs.
> >
> > Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> > [m.szyprowski: rebased onto 3.1-rc1]
> > Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> 
> Hi Marek,
> 
> Is this the same patch that Russell had to revert because it didn't
> work on some of the older machines, in particular those using
> dmabounce?

Yes.
 
> I thought that our discussion ended with the plan to use this only
> for ARMv6+ (which has a problem with double mapping) but not on ARMv5
> and below (which don't have this problem but might need dmabounce).

Ok, my fault. I forgot to mention that this patch was almost ready
during the Linaro meeting, but I didn't manage to post it at that time. Of
course it doesn't fulfill all the agreements from that discussion.

I was only unsure whether we should care about the case where CMA is not
enabled for ARMv6+. This patch was prepared under the assumption that
dma_alloc_coherent should work in both cases - with and without CMA.

Now I assume that for ARMv6+ the CMA should be enabled unconditionally.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 8/9] ARM: integrate CMA with DMA-mapping subsystem
  2011-08-16  9:29       ` Marek Szyprowski
  (?)
@ 2011-08-16 13:14         ` Arnd Bergmann
  -1 siblings, 0 replies; 72+ messages in thread
From: Arnd Bergmann @ 2011-08-16 13:14 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-arm-kernel, linux-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Daniel Walker', 'Russell King',
	'Jonathan Corbet', 'Mel Gorman',
	'Chunsang Jeong', 'Michal Nazarewicz',
	'Jesse Barker', 'Kyungmin Park',
	'Ankita Garg', 'Shariq Hasnain',
	'Andrew Morton', 'KAMEZAWA Hiroyuki'

On Tuesday 16 August 2011, Marek Szyprowski wrote:
> On Friday, August 12, 2011 5:01 PM Arnd Bergmann wrote:

> > How about something like
> > 
> >       if (arch_is_coherent() || nommu())
> >               ret = alloc_simple_buffer();
> >       else if (arch_is_v4_v5())
> >               ret = alloc_remap();
> >       else if (gfp & GFP_ATOMIC)
> >               ret = alloc_from_pool();
> >       else
> >               ret = alloc_from_contiguous();
> > 
> > This also allows a natural conversion to dma_map_ops when we get there.
> 
> Ok. Is it ok to enable CMA permanently for ARMv6+? If CMA is left conditional
> the dma pool code will be much more complicated, because it will need to support
> both CMA and non-CMA cases.

I think that is ok, yes.

	Arnd

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 7/9] ARM: DMA: steal memory for DMA coherent mappings
  2011-08-14  7:52       ` Russell King - ARM Linux
  (?)
@ 2011-08-16 13:28         ` Arnd Bergmann
  -1 siblings, 0 replies; 72+ messages in thread
From: Arnd Bergmann @ 2011-08-16 13:28 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Marek Szyprowski, linux-kernel, linux-arm-kernel, linux-media,
	linux-mm, linaro-mm-sig, Michal Nazarewicz, Kyungmin Park,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong

On Sunday 14 August 2011, Russell King - ARM Linux wrote:
> On Fri, Aug 12, 2011 at 02:53:05PM +0200, Arnd Bergmann wrote:
> > 
> > I thought that our discussion ended with the plan to use this only
> > for ARMv6+ (which has a problem with double mapping) but not on ARMv5
> > and below (which don't have this problem but might need dmabounce).
> 
> I thought we'd decided to have a pool of available CMA memory on ARMv6K
> to satisfy atomic allocations, which can grow and shrink in size, rather
> than setting aside a fixed amount of contiguous system memory.

Hmm, I don't remember the point about dynamically sizing the pool for
ARMv6K, but that may well be an oversight on my part.  I do remember the
part about taking that memory pool from the CMA region, as you say.

> ARMv6 and ARMv7+ could use CMA directly, and <= ARMv5 can use the existing
> allocation method.
> 
> Has something changed?

Nothing has changed regarding <=ARMv5. There was a small side discussion
about ARMv6 and ARMv7+ based on the idea that they can either use CMA
directly (doing TLB flushes for every allocation) or they could use the
same method as ARMv6K by setting aside a pool of pages for atomic
allocation. The first approach would consume less memory because it
requires no special pool, the second approach would be simpler because
it unifies the ARMv6K and ARMv6/ARMv7+ cases and also would be slightly
more efficient for atomic allocations because it avoids the expensive
TLB flush.

I didn't have a strong opinion either way, so IIRC Marek said he'd try
out both approaches and then send out the one that looked better, leaning
towards the second for simplicity of having fewer compile-time options.

	Arnd
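
(Illustratively, the "use CMA directly" approach amounts to something like the
sketch below; dev/count/order/size/prot are assumed from context, and
dma_remap_area() is the helper from PATCH 8/9:)

	/*
	 * Every allocation changes the kernel linear mapping attributes
	 * for the returned range and pays for a TLB flush; no pages are
	 * set aside up front.
	 */
	page = dma_alloc_from_contiguous(dev, count, order);
	if (page) {
		unsigned long addr = (unsigned long)page_address(page);

		dma_remap_area(page, size, prot);
		flush_tlb_kernel_range(addr, addr + size);
	}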

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 7/9] ARM: DMA: steal memory for DMA coherent mappings
  2011-08-16 13:28         ` Arnd Bergmann
  (?)
@ 2011-08-16 13:55           ` Russell King - ARM Linux
  -1 siblings, 0 replies; 72+ messages in thread
From: Russell King - ARM Linux @ 2011-08-16 13:55 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Marek Szyprowski, linux-kernel, linux-arm-kernel, linux-media,
	linux-mm, linaro-mm-sig, Michal Nazarewicz, Kyungmin Park,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong

On Tue, Aug 16, 2011 at 03:28:48PM +0200, Arnd Bergmann wrote:
> On Sunday 14 August 2011, Russell King - ARM Linux wrote:
> > On Fri, Aug 12, 2011 at 02:53:05PM +0200, Arnd Bergmann wrote:
> > > 
> > > I thought that our discussion ended with the plan to use this only
> > > for ARMv6+ (which has a problem with double mapping) but not on ARMv5
> > > and below (which don't have this problem but might need dmabounce).
> > 
> > I thought we'd decided to have a pool of available CMA memory on ARMv6K
> > to satisfy atomic allocations, which can grow and shrink in size, rather
> > than setting aside a fixed amount of contiguous system memory.
> 
> Hmm, I don't remember the point about dynamically sizing the pool for
> ARMv6K, but that can well be an oversight on my part.  I do remember the
> part about taking that memory pool from the CMA region as you say.

If you're setting aside a pool of pages, then you have to dynamically
size it.  I did mention this during our discussion.

The problem with a pool of fixed size is twofold: it needs to be
sufficiently large that it can satisfy all allocations which come along
in atomic context, yet we don't want the pool to be too large, because
then it prevents the memory from being used for other purposes.

Basically, the total number of pages in the pool can be a fixed size,
but as they are depleted through allocation, they need to be
re-populated from CMA to re-build the reserve for future atomic
allocations.  If the pool becomes larger via frees, then obviously
we need to give pages back.
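
(As a rough illustration of that behaviour - not code from this series; the
pool helpers here are hypothetical, only the dma_*_from_contiguous() calls
come from the patches under discussion:)

	#include <linux/dma-contiguous.h>
	#include <linux/spinlock.h>
	#include <linux/workqueue.h>

	#define ATOMIC_POOL_PAGES	64	/* target reserve size */

	static struct page *atomic_pool[ATOMIC_POOL_PAGES];
	static unsigned int pool_count;
	static DEFINE_SPINLOCK(pool_lock);

	/* May sleep, so runs from a workqueue: top the reserve back up
	 * with pages taken from the CMA region. */
	static void pool_refill(struct work_struct *work)
	{
		for (;;) {
			struct page *page = dma_alloc_from_contiguous(NULL, 1, 0);

			if (!page)
				break;
			spin_lock(&pool_lock);
			if (pool_count >= ATOMIC_POOL_PAGES) {
				spin_unlock(&pool_lock);
				dma_release_from_contiguous(NULL, page, 1);
				break;
			}
			atomic_pool[pool_count++] = page;
			spin_unlock(&pool_lock);
		}
	}
	static DECLARE_WORK(pool_refill_work, pool_refill);

	/* GFP_ATOMIC path: take a page from the reserve, then kick a
	 * refill so the reserve is rebuilt for future atomic callers. */
	static struct page *atomic_pool_alloc(void)
	{
		struct page *page = NULL;

		spin_lock(&pool_lock);
		if (pool_count)
			page = atomic_pool[--pool_count];
		spin_unlock(&pool_lock);

		schedule_work(&pool_refill_work);
		return page;
	}

	/* Frees refill the reserve first; anything beyond the target
	 * size goes straight back to CMA. */
	static void atomic_pool_free(struct page *page)
	{
		spin_lock(&pool_lock);
		if (pool_count < ATOMIC_POOL_PAGES) {
			atomic_pool[pool_count++] = page;
			spin_unlock(&pool_lock);
			return;
		}
		spin_unlock(&pool_lock);
		dma_release_from_contiguous(NULL, page, 1);
	}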

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 7/9] ARM: DMA: steal memory for DMA coherent mappings
  2011-08-16 13:55           ` Russell King - ARM Linux
  (?)
@ 2011-08-16 14:26             ` Arnd Bergmann
  -1 siblings, 0 replies; 72+ messages in thread
From: Arnd Bergmann @ 2011-08-16 14:26 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Marek Szyprowski, linux-kernel, linux-arm-kernel, linux-media,
	linux-mm, linaro-mm-sig, Michal Nazarewicz, Kyungmin Park,
	Andrew Morton, KAMEZAWA Hiroyuki, Ankita Garg, Daniel Walker,
	Mel Gorman, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong

On Tuesday 16 August 2011, Russell King - ARM Linux wrote:
> On Tue, Aug 16, 2011 at 03:28:48PM +0200, Arnd Bergmann wrote:
> > Hmm, I don't remember the point about dynamically sizing the pool for
> > ARMv6K, but that can well be an oversight on my part.  I do remember the
> > part about taking that memory pool from the CMA region as you say.
> 
> If you're setting aside a pool of pages, then you have to dynamically
> size it.  I did mention during our discussion about this.
> 
> The problem is that a pool of fixed size is two fold: you need it to be
> sufficiently large that it can satisfy all allocations which come along
> in atomic context.  Yet, we don't want the pool to be too large because
> then it prevents the memory being used for other purposes.
> 
> Basically, the total number of pages in the pool can be a fixed size,
> but as they are depleted through allocation, they need to be
> re-populated from CMA to re-build the reserve for future atomic
> allocations.  If the pool becomes larger via frees, then obviously
> we need to give pages back.

Ok, thanks for the reminder. I must have completely missed this part
of the discussion.

When I briefly considered this problem, my own conclusion was that
the number of atomic DMA allocations would always be very low
because they tend to be short-lived (e.g. incoming network packets),
so we could ignore this problem and just use a smaller reservation
size. While this seems to be true in general (see "git grep -w -A3 
dma_alloc_coherent | grep ATOMIC"), there is one very significant
case that we cannot ignore, which is pci_alloc_consistent.

This function is still called by hundreds of PCI drivers and always
does dma_alloc_coherent(..., GFP_ATOMIC), even for long-lived
allocations and those that are too large to be ignored.

So at least for the case where we have PCI devices, I agree that
we need to have the dynamic pool.

	Arnd
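
(For reference, the compat wrapper in question looks roughly like this in
include/asm-generic/pci-dma-compat.h of this era - note the hard-coded
GFP_ATOMIC:)

	static inline void *
	pci_alloc_consistent(struct pci_dev *hwdev, size_t size,
			     dma_addr_t *dma_handle)
	{
		return dma_alloc_coherent(hwdev == NULL ? NULL : &hwdev->dev,
					  size, dma_handle, GFP_ATOMIC);
	}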

^ permalink raw reply	[flat|nested] 72+ messages in thread

* RE: [PATCH 7/9] ARM: DMA: steal memory for DMA coherent mappings
  2011-08-16 14:26             ` Arnd Bergmann
  (?)
@ 2011-08-17  8:01               ` Marek Szyprowski
  -1 siblings, 0 replies; 72+ messages in thread
From: Marek Szyprowski @ 2011-08-17  8:01 UTC (permalink / raw)
  To: 'Arnd Bergmann', 'Russell King - ARM Linux'
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Michal Nazarewicz',
	'Kyungmin Park', 'Andrew Morton',
	'KAMEZAWA Hiroyuki', 'Ankita Garg',
	'Daniel Walker', 'Mel Gorman',
	'Jesse Barker', 'Jonathan Corbet',
	'Shariq Hasnain', 'Chunsang Jeong',
	'Marek Szyprowski'

Hello,

On Tuesday, August 16, 2011 4:26 PM Arnd Bergmann wrote:
> On Tuesday 16 August 2011, Russell King - ARM Linux wrote:
> > On Tue, Aug 16, 2011 at 03:28:48PM +0200, Arnd Bergmann wrote:
> > > Hmm, I don't remember the point about dynamically sizing the pool for
> > > ARMv6K, but that can well be an oversight on my part.  I do remember the
> > > part about taking that memory pool from the CMA region as you say.
> >
> > If you're setting aside a pool of pages, then you have to dynamically
> > size it.  I did mention this during our discussion.
> >
> > The problem with a fixed-size pool is twofold: you need it to be
> > sufficiently large that it can satisfy all allocations which come along
> > in atomic context.  Yet, we don't want the pool to be too large, because
> > then it prevents the memory from being used for other purposes.
> >
> > Basically, the total number of pages in the pool can be a fixed size,
> > but as they are depleted through allocation, they need to be
> > re-populated from CMA to rebuild the reserve for future atomic
> > allocations.  If the pool becomes larger via frees, then obviously
> > we need to give pages back.
> 
> Ok, thanks for the reminder. I must have completely missed this part
> of the discussion.
> 
> When I briefly considered this problem, my own conclusion was that
> the number of atomic DMA allocations would always be very low
> because they tend to be short-lived (e.g. incoming network packets),
> so we could ignore this problem and just use a smaller reservation
> size. While this seems to be true in general (see "git grep -w -A3
> dma_alloc_coherent | grep ATOMIC"), there is one very significant
> case that we cannot ignore, which is pci_alloc_consistent.
> 
> This function is still called by hundreds of PCI drivers and always
> does dma_alloc_coherent(..., GFP_ATOMIC), even for long-lived
> allocations and those that are too large to be ignored.
> 
> So at least for the case where we have PCI devices, I agree that
> we need to have the dynamic pool.

Do we really need the dynamic pool for the first version? I would like to
know how much memory can be allocated in GFP_ATOMIC context. What are the
typical sizes of such allocations?

Maybe for the first version a static pool with a reasonably small size
(like 128KiB) will be more than enough? This size could even be board
dependent or changed with a kernel command line option for systems that
really need more memory.
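
(A minimal sketch of the command-line variant, with a made-up
"coherent_pool=" parameter name, using the standard early_param()
and memparse() helpers:

	static unsigned long atomic_pool_size = SZ_128K;	/* default */

	static int __init early_coherent_pool(char *p)
	{
		atomic_pool_size = memparse(p, &p);
		return 0;
	}
	early_param("coherent_pool", early_coherent_pool);

so "coherent_pool=256K" on the command line would override the default.)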

I noticed one more problem. The size of the CMA managed area must be
a multiple of 16MiB (MAX_ORDER+1). This means that the smallest CMA area
is 16MiB. These values come from the internals of the kernel memory
management design; pageblocks are the only entities that can be managed
with the page migration code.

I'm not sure if all ARMv6+ boards have at least 32MiB of memory to be able
to create a CMA area.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 7/9] ARM: DMA: steal memory for DMA coherent mappings
  2011-08-17  8:01               ` Marek Szyprowski
  (?)
@ 2011-08-17 12:28                 ` Arnd Bergmann
  -1 siblings, 0 replies; 72+ messages in thread
From: Arnd Bergmann @ 2011-08-17 12:28 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: 'Russell King - ARM Linux',
	linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Michal Nazarewicz',
	'Kyungmin Park', 'Andrew Morton',
	'KAMEZAWA Hiroyuki', 'Ankita Garg',
	'Daniel Walker', 'Mel Gorman',
	'Jesse Barker', 'Jonathan Corbet',
	'Shariq Hasnain', 'Chunsang Jeong'

On Wednesday 17 August 2011, Marek Szyprowski wrote:
> Do we really need the dynamic pool for the first version? I would like to
> know how much memory can be allocated in GFP_ATOMIC context. What are the
> typical sizes of such allocations?

I think this highly depends on the board and on the use case. We know
that 2 MB is usually enough, because that is the current CONSISTENT_DMA_SIZE
on most platforms. Most likely something a lot smaller will be ok
in practice. CONSISTENT_DMA_SIZE is currently used for both atomic
and non-atomic allocations.
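
For reference, unless a platform overrides it, the default in
arch/arm/include/asm/memory.h is (if I remember correctly):

	#ifndef CONSISTENT_DMA_SIZE
	#define CONSISTENT_DMA_SIZE	SZ_2M
	#endif

which is where the 2 MB figure comes from.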

> Maybe for the first version a static pool with a reasonably small size
> (like 128KiB) will be more than enough? This size could even be board
> dependent or changed with a kernel command line option for systems that
> really need more memory.

For a first version that sounds good enough. Maybe we could use a fraction
of the CONSISTENT_DMA_SIZE as an estimate?

For the long-term solution, I see two options:

1. make the preallocated pool rather small so we normally don't need it.
2. make it large enough so we can also fulfill most nonatomic allocations
   from that pool to avoid the TLB flushes and going through the CMA
   code. Only use the real CMA region when the pool allocation fails.

In either case, there should be some method for balancing the pool
size.
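
To illustrate what I mean by balancing, a rough sketch with two
watermarks; none of these helpers exist, the names are made up:

	struct atomic_pool {
		unsigned long free_pages;
		unsigned long low_wm, high_wm;	/* balance targets */
	};

	/* called from non-atomic context after pool activity */
	static void pool_balance(struct atomic_pool *pool)
	{
		/* refill from CMA when atomic allocations drained us */
		while (pool->free_pages < pool->low_wm)
			pool->free_pages += pool_refill_from_cma(pool);
		/* trim when frees made the pool grow past its target */
		while (pool->free_pages > pool->high_wm)
			pool->free_pages -= pool_release_to_cma(pool);
	}

The interesting question is how to choose the watermarks.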

> I noticed one more problem. The size of the CMA managed area must be
> a multiple of 16MiB (MAX_ORDER+1). This means that the smallest CMA area
> is 16MiB. These values come from the internals of the kernel memory
> management design; pageblocks are the only entities that can be managed
> with the page migration code.
> 
> I'm not sure if all ARMv6+ boards have at least 32MiB of memory to be able
> to create a CMA area.

My guess is that you can assume 64 MB or more on ARMv6 systems running
Linux, but other people may have more accurate data.

Also, there is the option of setting a lower value for FORCE_MAX_ZONEORDER
for some platforms if it becomes a problem.

	Arnd

^ permalink raw reply	[flat|nested] 72+ messages in thread

* RE: [PATCH 7/9] ARM: DMA: steal memory for DMA coherent mappings
  2011-08-16 14:26             ` Arnd Bergmann
  (?)
@ 2011-08-17 12:49               ` Marek Szyprowski
  -1 siblings, 0 replies; 72+ messages in thread
From: Marek Szyprowski @ 2011-08-17 12:49 UTC (permalink / raw)
  To: Marek Szyprowski, 'Arnd Bergmann',
	'Russell King - ARM Linux'
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Michal Nazarewicz',
	'Kyungmin Park', 'Andrew Morton',
	'KAMEZAWA Hiroyuki', 'Ankita Garg',
	'Daniel Walker', 'Mel Gorman',
	'Jesse Barker', 'Jonathan Corbet',
	'Shariq Hasnain', 'Chunsang Jeong'

Hello,

On Wednesday, August 17, 2011 10:01 AM Marek Szyprowski wrote:
> On Tuesday, August 16, 2011 4:26 PM Arnd Bergmann wrote:
> > On Tuesday 16 August 2011, Russell King - ARM Linux wrote:
> > > On Tue, Aug 16, 2011 at 03:28:48PM +0200, Arnd Bergmann wrote:
> > > > Hmm, I don't remember the point about dynamically sizing the pool for
> > > > ARMv6K, but that can well be an oversight on my part.  I do remember the
> > > > part about taking that memory pool from the CMA region as you say.
> > >
> > > If you're setting aside a pool of pages, then you have to dynamically
> > > size it.  I did mention this during our discussion.
> > >
> > > The problem with a fixed-size pool is twofold: you need it to be
> > > sufficiently large that it can satisfy all allocations which come along
> > > in atomic context.  Yet, we don't want the pool to be too large, because
> > > then it prevents the memory from being used for other purposes.
> > >
> > > Basically, the total number of pages in the pool can be a fixed size,
> > > but as they are depleted through allocation, they need to be
> > > re-populated from CMA to rebuild the reserve for future atomic
> > > allocations.  If the pool becomes larger via frees, then obviously
> > > we need to give pages back.
> >
> > Ok, thanks for the reminder. I must have completely missed this part
> > of the discussion.
> >
> > When I briefly considered this problem, my own conclusion was that
> > the number of atomic DMA allocations would always be very low
> > because they tend to be short-lived (e.g. incoming network packets),
> > so we could ignore this problem and just use a smaller reservation
> > size. While this seems to be true in general (see "git grep -w -A3
> > dma_alloc_coherent | grep ATOMIC"), there is one very significant
> > case that we cannot ignore, which is pci_alloc_consistent.
> >
> > This function is still called by hundreds of PCI drivers and always
> > does dma_alloc_coherent(..., GFP_ATOMIC), even for long-lived
> > allocations and those that are too large to be ignored.
> >
> > So at least for the case where we have PCI devices, I agree that
> > we need to have the dynamic pool.
> 
> Do we really need the dynamic pool for the first version? I would like to
> know how much memory can be allocated in GFP_ATOMIC context. What are the
> typical sizes of such allocations?
> 
> Maybe for the first version a static pool with a reasonably small size
> (like 128KiB) will be more than enough? This size could even be board
> dependent or changed with a kernel command line option for systems that
> really need more memory.
> 
> I noticed one more problem. The size of the CMA managed area must be
> a multiple of 16MiB (MAX_ORDER+1). This means that the smallest CMA area
> is 16MiB. These values come from the internals of the kernel memory
> management design; pageblocks are the only entities that can be managed
> with the page migration code.

I'm really sorry for the confusion. This 16MiB value worried me, so I've
checked the code once again and found that this MAX_ORDER+1 value was
a miscalculation that appeared in v11 of the patches. The true minimal
CMA area size is 8MiB on the ARM architecture. I believe this shouldn't be
an issue for current ARMv6+ based machines.
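
To spell out the arithmetic (assuming ARM's default MAX_ORDER of 11 and
4KiB pages): an order MAX_ORDER alignment is 2^11 pages, i.e.
2048 * 4KiB = 8MiB, while the mistaken MAX_ORDER+1 gave 2^12 pages,
i.e. 4096 * 4KiB = 16MiB.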

I've checked it with the "mem=16M cma=8M" kernel arguments. The system booted
fine and the CMA area was successfully created.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center


^ permalink raw reply	[flat|nested] 72+ messages in thread

* RE: [PATCH 7/9] ARM: DMA: steal memory for DMA coherent mappings
  2011-08-17 12:28                 ` Arnd Bergmann
  (?)
@ 2011-08-17 13:06                   ` Marek Szyprowski
  -1 siblings, 0 replies; 72+ messages in thread
From: Marek Szyprowski @ 2011-08-17 13:06 UTC (permalink / raw)
  To: 'Arnd Bergmann'
  Cc: 'Russell King - ARM Linux',
	linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Michal Nazarewicz',
	'Kyungmin Park', 'Andrew Morton',
	'KAMEZAWA Hiroyuki', 'Ankita Garg',
	'Daniel Walker', 'Mel Gorman',
	'Jesse Barker', 'Jonathan Corbet',
	'Shariq Hasnain', 'Chunsang Jeong'

Hello,

On Wednesday, August 17, 2011 2:29 PM Arnd Bergmann wrote:

> On Wednesday 17 August 2011, Marek Szyprowski wrote:
> > Do we really need the dynamic pool for the first version? I would like to
> > know how much memory can be allocated in GFP_ATOMIC context. What are the
> > typical sizes of such allocations?
> 
> I think this highly depends on the board and on the use case. We know
> that 2 MB is usually enough, because that is the current CONSISTENT_DMA_SIZE
> on most platforms. Most likely something a lot smaller will be ok
> in practice. CONSISTENT_DMA_SIZE is currently used for both atomic
> and non-atomic allocations.

Ok. The platforms that increased CONSISTENT_DMA_SIZE usually did that to enable
support for a framebuffer or other multimedia devices, whose buffers won't be
allocated in atomic context anyway.

> > Maybe for the first version a static pool with a reasonably small size
> > (like 128KiB) will be more than enough? This size could even be board
> > dependent or changed with a kernel command line option for systems that
> > really need more memory.
> 
> For a first version that sounds good enough. Maybe we could use a fraction
> of the CONSISTENT_DMA_SIZE as an estimate?

Ok, good. For the initial values I will probably use 1/8 of
CONSISTENT_DMA_SIZE for coherent allocations. Writecombine atomic allocations
are extremely rare and rather ARM-specific; 1/32 of CONSISTENT_DMA_SIZE should
be more than enough for them.
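
(With the usual 2MiB CONSISTENT_DMA_SIZE that would mean a 256KiB
coherent atomic pool and a 64KiB writecombine one.)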

> For the long-term solution, I see two options:
> 
> 1. make the preallocated pool rather small so we normally don't need it.
> 2. make it large enough so we can also fulfill most nonatomic allocations
>    from that pool to avoid the TLB flushes and going through the CMA
>    code. Only use the real CMA region when the pool allocation fails.
> 
> In either case, there should be some method for balancing the pool
> size.

Right. The most obvious method is to use an additional kernel thread which
will periodically call the balance function. In the implementation both usage
scenarios are very similar, so this could even be a kernel parameter or
Kconfig option, but let's leave this for future versions.
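
A rough sketch of that thread, with pool_balance() standing in for the
hypothetical balance function; kthread_run(), kthread_should_stop() and
msleep_interruptible() are the usual primitives from linux/kthread.h
and linux/delay.h:

	static int pool_balance_thread(void *data)
	{
		while (!kthread_should_stop()) {
			pool_balance(data);		/* refill/trim */
			msleep_interruptible(1000);	/* once a second */
		}
		return 0;
	}

	/* started once from the pool init code: */
	struct task_struct *task =
		kthread_run(pool_balance_thread, pool, "cma_pool_balance");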
 
> > I noticed one more problem. The size of the CMA managed area must be
> > a multiple of 16MiB (MAX_ORDER+1). This means that the smallest CMA area
> > is 16MiB. These values come from the internals of the kernel memory
> > management design; pageblocks are the only entities that can be managed
> > with the page migration code.
> >
> > I'm not sure if all ARMv6+ boards have at least 32MiB of memory to be able
> > to create a CMA area.
> 
> My guess is that you can assume 64 MB or more on ARMv6 systems running
> Linux, but other people may have more accurate data.
> 
> Also, there is the option of setting a lower value for FORCE_MAX_ZONEORDER
> for some platforms if it becomes a problem.

Ok. I found an error in the above calculation, so 8MiB is the smallest
CMA area size. Assuming that at least 32MiB of memory is available, this
is not an issue anymore.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center


^ permalink raw reply	[flat|nested] 72+ messages in thread

* RE: [PATCH 7/9] ARM: DMA: steal memory for DMA coherent mappings
  2011-08-17 13:06                   ` Marek Szyprowski
  (?)
@ 2011-08-18  8:27                     ` Tixy
  -1 siblings, 0 replies; 72+ messages in thread
From: Tixy @ 2011-08-18  8:27 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: 'Arnd Bergmann', 'Ankita Garg',
	'Daniel Walker', 'Russell King - ARM Linux',
	'Jesse Barker', 'Mel Gorman',
	'Chunsang Jeong', 'Jonathan Corbet',
	linux-kernel, 'Michal Nazarewicz',
	linaro-mm-sig, linux-mm, 'Kyungmin Park',
	'KAMEZAWA Hiroyuki', 'Shariq Hasnain',
	'Andrew Morton',
	linux-arm-kernel, linux-media

On Wed, 2011-08-17 at 15:06 +0200, Marek Szyprowski wrote:
[...]
> > > Maybe for the first version a static pool with a reasonably small size
> > > (like 128KiB) will be more than enough? This size could even be board
> > > dependent or changed with a kernel command line option for systems that
> > > really need more memory.
> > 
> > For a first version that sounds good enough. Maybe we could use a fraction
> > of the CONSISTENT_DMA_SIZE as an estimate?
> 
> Ok, good. For the initial values I will probably use 1/8 of
> CONSISTENT_DMA_SIZE for coherent allocations. Writecombine atomic allocations
> are extremely rare and rather ARM-specific; 1/32 of CONSISTENT_DMA_SIZE should
> be more than enough for them.

For people who aren't aware, we have a patch to remove the define
CONSISTENT_DMA_SIZE and replace it with a runtime call to an
initialisation function [1]. I don't believe this fundamentally changes
anything being discussed though.
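
For reference, the shape of that change (from memory, see [1] for the
authoritative version) is a runtime call:

	/* platforms call this early instead of defining CONSISTENT_DMA_SIZE */
	void __init init_consistent_dma_size(unsigned long size);

e.g. a board needing 4MiB would call init_consistent_dma_size(SZ_4M)
from its early init code.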

-- 
Tixy

[1] http://www.spinics.net/lists/arm-kernel/msg135589.html


^ permalink raw reply	[flat|nested] 72+ messages in thread

end of thread, other threads:[~2011-08-18  8:50 UTC | newest]

Thread overview: 72+ messages (download: mbox.gz / follow: Atom feed)
2011-08-12 10:58 [PATCHv14 0/9] Contiguous Memory Allocator Marek Szyprowski
2011-08-12 10:58 ` [PATCH 1/9] mm: move some functions from memory_hotplug.c to page_isolation.c Marek Szyprowski
2011-08-12 10:58 ` [PATCH 2/9] mm: alloc_contig_freed_pages() added Marek Szyprowski
2011-08-12 10:58 ` [PATCH 3/9] mm: alloc_contig_range() added Marek Szyprowski
2011-08-12 10:58 ` [PATCH 4/9] mm: MIGRATE_CMA migration type added Marek Szyprowski
2011-08-12 10:58 ` [PATCH 5/9] mm: MIGRATE_CMA isolation functions added Marek Szyprowski
2011-08-12 10:58 ` [PATCH 6/9] drivers: add Contiguous Memory Allocator Marek Szyprowski
2011-08-12 10:58 ` [PATCH 7/9] ARM: DMA: steal memory for DMA coherent mappings Marek Szyprowski
2011-08-12 12:53   ` Arnd Bergmann
2011-08-14  7:52     ` Russell King - ARM Linux
2011-08-16 13:28       ` Arnd Bergmann
2011-08-16 13:55         ` Russell King - ARM Linux
2011-08-16 14:26           ` Arnd Bergmann
2011-08-17  8:01             ` Marek Szyprowski
2011-08-17 12:28               ` Arnd Bergmann
2011-08-17 13:06                 ` Marek Szyprowski
2011-08-18  8:27                   ` Tixy
2011-08-17 12:49             ` Marek Szyprowski
2011-08-16 10:17     ` Marek Szyprowski
2011-08-12 10:58 ` [PATCH 8/9] ARM: integrate CMA with DMA-mapping subsystem Marek Szyprowski
2011-08-12 15:00   ` Arnd Bergmann
2011-08-16  9:29     ` Marek Szyprowski
2011-08-16 13:14       ` Arnd Bergmann
2011-08-12 10:58 ` [PATCH 9/9] ARM: S5PV210: example of CMA private area for FIMC device on Goni board Marek Szyprowski
