* [PATCH 0/4] big chunk memory allocator v4
@ 2010-11-19  8:10 ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 44+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-11-19  8:10 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, minchan.kim, Bob Liu, fujita.tomonori,
	m.nazarewicz, pawel, andi.kleen, felipe.contreras, akpm,
	kosaki.motohiro

Hi, this is an updated version. 

No major changes from the last version except for the page allocation function.
The RFC tag has been dropped.

The order of the patches is:

[1/4] move some functions from memory_hotplug.c to page_isolation.c
[2/4] search physically contiguous range suitable for big chunk alloc.
[3/4] allocate big chunk memory based on memory hotplug(migration) technique
[4/4] modify page allocation function.

For what:

  I hear there are requirements to allocate a chunk of pages larger than
  MAX_ORDER. Today, some (embedded) devices want a big memory chunk. To get
  that memory, they hide a memory range with a boot option (mem=) and use the
  hidden memory for their own purposes. But this looks like a missing feature
  in memory management.

  This series adds
	alloc_contig_pages(start, end, nr_pages, gfp_mask)
  to allocate a chunk of pages of length nr_pages from the physical address
  range [start, end). It uses logic similar to memory unplug, which tries to
  offline the pages in [start, end). With this, drivers can allocate a 30M,
  128M, or much bigger memory chunk on demand. (I allocated a 1G chunk in my
  test.)

  But yes, because of fragmentation, a successful allocation cannot be
  guaranteed 100% of the time. If alloc_contig_pages() is called at system
  boot, or a movable zone is used, the allocation succeeds at a high rate.
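
  As a rough illustration (an editor's sketch, not part of this series), a
  driver might use the interface added by patch 3/4 roughly like this; the
  my_* names and the buffer size are made up for the example.

	#include <linux/mm.h>
	#include <linux/page-isolation.h>

	/* 32MB worth of pages; the size is arbitrary for the example. */
	#define MY_BUF_PAGES	((32UL << 20) >> PAGE_SHIFT)

	static struct page *my_buf;

	static int my_driver_get_buffer(void)
	{
		/* Search all of RAM, default (MAX_ORDER) alignment. */
		my_buf = alloc_contig_pages_host(MY_BUF_PAGES, 0);
		if (!my_buf)
			return -ENOMEM;	/* may fail due to fragmentation */
		return 0;
	}

	static void my_driver_put_buffer(void)
	{
		free_contig_pages(my_buf, MY_BUF_PAGES);
		my_buf = NULL;
	}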

  I tested this on x86-64 and it seems to work as expected. Feedback from
  embedded developers would be appreciated, since I think they are the main
  users of this function.

Thanks,
-Kame


  




* [PATCH 1/4] alloc_contig_pages() move some functions to page_isolation.c
  2010-11-19  8:10 ` KAMEZAWA Hiroyuki
@ 2010-11-19  8:12   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 44+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-11-19  8:12 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, minchan.kim, Bob Liu, fujita.tomonori,
	m.nazarewicz, pawel, andi.kleen, felipe.contreras, akpm,
	kosaki.motohiro

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Memory hotplug contains logic for making the pages in a specified pfn range
unused. Some of that core logic can be reused for other purposes, such as
allocating a very large contiguous memory block.

This patch moves some functions from mm/memory_hotplug.c to
mm/page_isolation.c. This makes it easier to add a large-allocation function
to page_isolation.c that builds on the memory-unplug technique.

Changelog: 2010/10/26
 - adjusted to mmotm-1024 + Bob's 3 clean ups.
Changelog: 2010/10/21
 - adjusted to mmotm-1020

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/page-isolation.h |    7 ++
 mm/memory_hotplug.c            |  108 ---------------------------------------
 mm/page_isolation.c            |  111 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 118 insertions(+), 108 deletions(-)

Index: mmotm-1117/include/linux/page-isolation.h
===================================================================
--- mmotm-1117.orig/include/linux/page-isolation.h
+++ mmotm-1117/include/linux/page-isolation.h
@@ -33,5 +33,12 @@ test_pages_isolated(unsigned long start_
 extern int set_migratetype_isolate(struct page *page);
 extern void unset_migratetype_isolate(struct page *page);
 
+/*
+ * For migration.
+ */
+
+int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn);
+unsigned long scan_lru_pages(unsigned long start, unsigned long end);
+int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
 
 #endif
Index: mmotm-1117/mm/memory_hotplug.c
===================================================================
--- mmotm-1117.orig/mm/memory_hotplug.c
+++ mmotm-1117/mm/memory_hotplug.c
@@ -615,114 +615,6 @@ int is_mem_section_removable(unsigned lo
 }
 
 /*
- * Confirm all pages in a range [start, end) is belongs to the same zone.
- */
-static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
-{
-	unsigned long pfn;
-	struct zone *zone = NULL;
-	struct page *page;
-	int i;
-	for (pfn = start_pfn;
-	     pfn < end_pfn;
-	     pfn += MAX_ORDER_NR_PAGES) {
-		i = 0;
-		/* This is just a CONFIG_HOLES_IN_ZONE check.*/
-		while ((i < MAX_ORDER_NR_PAGES) && !pfn_valid_within(pfn + i))
-			i++;
-		if (i == MAX_ORDER_NR_PAGES)
-			continue;
-		page = pfn_to_page(pfn + i);
-		if (zone && page_zone(page) != zone)
-			return 0;
-		zone = page_zone(page);
-	}
-	return 1;
-}
-
-/*
- * Scanning pfn is much easier than scanning lru list.
- * Scan pfn from start to end and Find LRU page.
- */
-static unsigned long scan_lru_pages(unsigned long start, unsigned long end)
-{
-	unsigned long pfn;
-	struct page *page;
-	for (pfn = start; pfn < end; pfn++) {
-		if (pfn_valid(pfn)) {
-			page = pfn_to_page(pfn);
-			if (PageLRU(page))
-				return pfn;
-		}
-	}
-	return 0;
-}
-
-static struct page *
-hotremove_migrate_alloc(struct page *page, unsigned long private, int **x)
-{
-	/* This should be improooooved!! */
-	return alloc_page(GFP_HIGHUSER_MOVABLE);
-}
-
-#define NR_OFFLINE_AT_ONCE_PAGES	(256)
-static int
-do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
-{
-	unsigned long pfn;
-	struct page *page;
-	int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
-	int not_managed = 0;
-	int ret = 0;
-	LIST_HEAD(source);
-
-	for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
-		if (!pfn_valid(pfn))
-			continue;
-		page = pfn_to_page(pfn);
-		if (!page_count(page))
-			continue;
-		/*
-		 * We can skip free pages. And we can only deal with pages on
-		 * LRU.
-		 */
-		ret = isolate_lru_page(page);
-		if (!ret) { /* Success */
-			list_add_tail(&page->lru, &source);
-			move_pages--;
-			inc_zone_page_state(page, NR_ISOLATED_ANON +
-					    page_is_file_cache(page));
-
-		} else {
-#ifdef CONFIG_DEBUG_VM
-			printk(KERN_ALERT "removing pfn %lx from LRU failed\n",
-			       pfn);
-			dump_page(page);
-#endif
-			/* Becasue we don't have big zone->lock. we should
-			   check this again here. */
-			if (page_count(page)) {
-				not_managed++;
-				ret = -EBUSY;
-				break;
-			}
-		}
-	}
-	if (!list_empty(&source)) {
-		if (not_managed) {
-			putback_lru_pages(&source);
-			goto out;
-		}
-		/* this function returns # of failed pages */
-		ret = migrate_pages(&source, hotremove_migrate_alloc, 0, 1);
-		if (ret)
-			putback_lru_pages(&source);
-	}
-out:
-	return ret;
-}
-
-/*
  * remove from free_area[] and mark all as Reserved.
  */
 static int
Index: mmotm-1117/mm/page_isolation.c
===================================================================
--- mmotm-1117.orig/mm/page_isolation.c
+++ mmotm-1117/mm/page_isolation.c
@@ -5,6 +5,9 @@
 #include <linux/mm.h>
 #include <linux/page-isolation.h>
 #include <linux/pageblock-flags.h>
+#include <linux/memcontrol.h>
+#include <linux/migrate.h>
+#include <linux/mm_inline.h>
 #include "internal.h"
 
 static inline struct page *
@@ -139,3 +142,111 @@ int test_pages_isolated(unsigned long st
 	spin_unlock_irqrestore(&zone->lock, flags);
 	return ret ? 0 : -EBUSY;
 }
+
+
+/*
+ * Confirm that all pages in the range [start, end) belong to the same zone.
+ */
+int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long pfn;
+	struct zone *zone = NULL;
+	struct page *page;
+	int i;
+	for (pfn = start_pfn;
+	     pfn < end_pfn;
+	     pfn += MAX_ORDER_NR_PAGES) {
+		i = 0;
+		/* This is just a CONFIG_HOLES_IN_ZONE check.*/
+		while ((i < MAX_ORDER_NR_PAGES) && !pfn_valid_within(pfn + i))
+			i++;
+		if (i == MAX_ORDER_NR_PAGES)
+			continue;
+		page = pfn_to_page(pfn + i);
+		if (zone && page_zone(page) != zone)
+			return 0;
+		zone = page_zone(page);
+	}
+	return 1;
+}
+
+/*
+ * Scanning pfns is much easier than scanning the LRU list.
+ * Scan pfns from start to end and find the first LRU page.
+ */
+unsigned long scan_lru_pages(unsigned long start, unsigned long end)
+{
+	unsigned long pfn;
+	struct page *page;
+	for (pfn = start; pfn < end; pfn++) {
+		if (pfn_valid(pfn)) {
+			page = pfn_to_page(pfn);
+			if (PageLRU(page))
+				return pfn;
+		}
+	}
+	return 0;
+}
+
+struct page *
+hotremove_migrate_alloc(struct page *page, unsigned long private, int **x)
+{
+	/* This should be improooooved!! */
+	return alloc_page(GFP_HIGHUSER_MOVABLE);
+}
+
+#define NR_OFFLINE_AT_ONCE_PAGES	(256)
+int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long pfn;
+	struct page *page;
+	int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
+	int not_managed = 0;
+	int ret = 0;
+	LIST_HEAD(source);
+
+	for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
+		if (!pfn_valid(pfn))
+			continue;
+		page = pfn_to_page(pfn);
+		if (!page_count(page))
+			continue;
+		/*
+		 * We can skip free pages. And we can only deal with pages on
+		 * LRU.
+		 */
+		ret = isolate_lru_page(page);
+		if (!ret) { /* Success */
+			list_add_tail(&page->lru, &source);
+			move_pages--;
+			inc_zone_page_state(page, NR_ISOLATED_ANON +
+					    page_is_file_cache(page));
+
+		} else {
+#ifdef CONFIG_DEBUG_VM
+			printk(KERN_ALERT "removing pfn %lx from LRU failed\n",
+			       pfn);
+			dump_page(page);
+#endif
+			/* Because we don't have big zone->lock. we should
+			   check this again here. */
+			if (page_count(page)) {
+				not_managed++;
+				ret = -EBUSY;
+				break;
+			}
+		}
+	}
+	if (!list_empty(&source)) {
+		if (not_managed) {
+			putback_lru_pages(&source);
+			goto out;
+		}
+		/* this function returns # of failed pages */
+		ret = migrate_pages(&source, hotremove_migrate_alloc, 0, 1);
+		if (ret)
+			putback_lru_pages(&source);
+	}
+out:
+	return ret;
+}



* [PATCH 2/4] alloc_contig_pages() find appropriate physical memory range
  2010-11-19  8:10 ` KAMEZAWA Hiroyuki
@ 2010-11-19  8:14   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 44+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-11-19  8:14 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, minchan.kim, Bob Liu, fujita.tomonori,
	m.nazarewicz, pawel, andi.kleen, felipe.contreras, akpm,
	kosaki.motohiro

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Unlike memory hotplug, for an allocation of a contiguous memory range the
exact address may not matter. IOW, if a requester wants to allocate 100M of
contiguous memory, the placement of the allocated memory may not be a problem.
So, "finding a range of memory which seems to be MOVABLE" is required.

This patch adds a function to isolate a length of memory within [start, end).
The function returns the pfn of the first page of the isolated contiguous
chunk of the given length within [start, end).

If no_search=true is passed as an argument, the start address is always the
same as the specified "base" address.

After isolation, free memory within this area will never be allocated again.
But some pages will remain in use or on the LRU; they should be dropped by
page reclaim or migration.
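
As a rough illustration (an editor's sketch, not code from this patch), a
caller is expected to drive the new function roughly as below. Patch 3/4
contains the real logic with retries and pcp/pagevec draining, and patch 4/4
later adds a node argument to do_migrate_range(); the helper name here is
made up.

static unsigned long isolate_and_empty(unsigned long base, unsigned long end,
			unsigned long nr_pages, int align_order,
			struct zone *zone)
{
	unsigned long pfn, lru;

	pfn = find_contig_block(base, end, nr_pages, align_order, zone);
	if (!pfn)
		return 0;	/* no MOVABLE candidate in [base, end) */

	/*
	 * [pfn, pfn + nr_pages) is now isolated, so its free pages cannot
	 * be handed out again.  Migrate away the pages still on the LRU.
	 */
	for (lru = scan_lru_pages(pfn, pfn + nr_pages);
	     lru && lru < pfn + nr_pages;
	     lru = scan_lru_pages(lru, pfn + nr_pages)) {
		if (do_migrate_range(lru, pfn + nr_pages))
			break;		/* give up on migration failure */
		cond_resched();
	}

	/* test_pages_isolated() returns 0 when the whole range is free and isolated. */
	if (test_pages_isolated(pfn, pfn + nr_pages)) {
		/* Something is still pinned; give the range back. */
		undo_isolate_page_range(pfn, pfn + nr_pages);
		return 0;
	}
	return pfn;
}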

Changelog: 2010-11-17
 - fixed some coding style (if-then-else)

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/page_isolation.c |  146 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 146 insertions(+)

Index: mmotm-1117/mm/page_isolation.c
===================================================================
--- mmotm-1117.orig/mm/page_isolation.c
+++ mmotm-1117/mm/page_isolation.c
@@ -7,6 +7,7 @@
 #include <linux/pageblock-flags.h>
 #include <linux/memcontrol.h>
 #include <linux/migrate.h>
+#include <linux/memory_hotplug.h>
 #include <linux/mm_inline.h>
 #include "internal.h"
 
@@ -250,3 +251,148 @@ int do_migrate_range(unsigned long start
 out:
 	return ret;
 }
+
+/*
+ * Functions for getting contiguous MOVABLE pages in a zone.
+ */
+struct page_range {
+	unsigned long base; /* Base address of searching contigouous block */
+	unsigned long end;
+	unsigned long pages;/* Length of contiguous block */
+	int align_order;
+	unsigned long align_mask;
+};
+
+int __get_contig_block(unsigned long pfn, unsigned long nr_pages, void *arg)
+{
+	struct page_range *blockinfo = arg;
+	unsigned long end;
+
+	end = pfn + nr_pages;
+	pfn = ALIGN(pfn, 1 << blockinfo->align_order);
+	end = end & ~(MAX_ORDER_NR_PAGES - 1);
+
+	if (end < pfn)
+		return 0;
+	if (end - pfn >= blockinfo->pages) {
+		blockinfo->base = pfn;
+		blockinfo->end = end;
+		return 1;
+	}
+	return 0;
+}
+
+static void __trim_zone(struct zone *zone, struct page_range *range)
+{
+	unsigned long pfn;
+	/*
+	 * Skip pages which don't belong to this zone. On some
+	 * architectures, zones are not laid out linearly in memory.
+	 */
+	if (page_zone(pfn_to_page(range->base)) != zone) {
+		for (pfn = range->base;
+			pfn < range->end;
+			pfn += MAX_ORDER_NR_PAGES) {
+			if (page_zone(pfn_to_page(pfn)) == zone)
+				break;
+		}
+		range->base = min(pfn, range->end);
+	}
+	/* Here, range-> base is in the zone if range->base != range->end */
+	for (pfn = range->base;
+	     pfn < range->end;
+	     pfn += MAX_ORDER_NR_PAGES) {
+		if (zone != page_zone(pfn_to_page(pfn))) {
+			pfn = pfn - MAX_ORDER_NR_PAGES;
+			break;
+		}
+	}
+	range->end = min(pfn, range->end);
+	return;
+}
+
+/*
+ * This function finds a contiguous MOVABLE memory block with the given
+ * length of pages. If one is found, the range is marked as ISOLATED and
+ * the pfn of its first page is returned.
+ * Every page in the returned range is either free or PG_lru. To reduce
+ * the risk of false-positive testing, lru_add_drain_all() should be called
+ * before this function to flush pages still sitting on per-cpu pagevecs.
+ */
+
+static unsigned long find_contig_block(unsigned long base,
+		unsigned long end, unsigned long pages,
+		int align_order, struct zone *zone)
+{
+	unsigned long pfn, pos;
+	struct page_range blockinfo;
+	int ret;
+
+	VM_BUG_ON(pages & (MAX_ORDER_NR_PAGES - 1));
+	VM_BUG_ON(base & ((1 << align_order) - 1));
+retry:
+	blockinfo.base = base;
+	blockinfo.end = end;
+	blockinfo.pages = pages;
+	blockinfo.align_order = align_order;
+	blockinfo.align_mask = (1 << align_order) - 1;
+	/*
+	 * At first, check physical page layout and skip memory holes.
+	 */
+	ret = walk_system_ram_range(base, end - base, &blockinfo,
+		__get_contig_block);
+	if (!ret)
+		return 0;
+	/* check contiguous pages in a zone */
+	__trim_zone(zone, &blockinfo);
+
+	/*
+	 * Ok, we found contiguous memory chunk of size. Isolate it.
+	 * We just search MAX_ORDER aligned range.
+	 */
+	for (pfn = blockinfo.base; pfn + pages <= blockinfo.end;
+	     pfn += (1 << align_order)) {
+		struct zone *z = page_zone(pfn_to_page(pfn));
+		if (z != zone)
+			continue;
+
+		spin_lock_irq(&z->lock);
+		pos = pfn;
+		/*
+		 * Check the range only contains free pages or LRU pages.
+		 */
+		while (pos < pfn + pages) {
+			struct page *p;
+
+			if (!pfn_valid_within(pos))
+				break;
+			p = pfn_to_page(pos);
+			if (PageReserved(p))
+				break;
+			if (!page_count(p)) {
+				if (!PageBuddy(p))
+					pos++;
+				else
+					pos += (1 << page_order(p));
+			} else if (PageLRU(p)) {
+				pos++;
+			} else
+				break;
+		}
+		spin_unlock_irq(&z->lock);
+		if ((pos == pfn + pages)) {
+			if (!start_isolate_page_range(pfn, pfn + pages))
+				return pfn;
+		} else/* the chunk including "pos" should be skipped */
+			pfn = pos & ~((1 << align_order) - 1);
+		cond_resched();
+	}
+
+	/* failed */
+	if (blockinfo.end + pages <= end) {
+		/* Move base address and find the next block of RAM. */
+		base = blockinfo.end;
+		goto retry;
+	}
+	return 0;
+}



* [PATCH 3/4] alloc_contig_pages() allocate big chunk memory using migration
  2010-11-19  8:10 ` KAMEZAWA Hiroyuki
@ 2010-11-19  8:15   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 44+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-11-19  8:15 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, minchan.kim, Bob Liu, fujita.tomonori,
	m.nazarewicz, pawel, andi.kleen, felipe.contreras, akpm,
	kosaki.motohiro

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Add a function to allocate contiguous memory larger than MAX_ORDER.
The main difference from the usual page allocator is that this uses the
memory offline technique (isolate pages and migrate the remaining pages).

I think this is not a 100% solution because we can't avoid fragmentation,
but we have the kernelcore= boot option and can create a MOVABLE zone. That
helps us allocate a contiguous range on demand.

The new function is

  alloc_contig_pages(base, end, nr_pages, alignment)

This function will allocate nr_pages of contiguous pages from the range
[base, end). If [base, end) is bigger than nr_pages, a pfn which meets the
alignment will be chosen. If the alignment is smaller than MAX_ORDER, it
will be raised to MAX_ORDER.

__alloc_contig_pages() takes many more arguments.


Some drivers allocate contiguous pages from bootmem or by hiding some memory
from the kernel at boot. But if contiguous pages are necessary only in some
situations, the kernelcore= boot option combined with page migration is an
alternative.
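
To make the rounding above concrete, here is a small worked example (an
editor's note, not part of the patch), assuming 4KB pages and the default
MAX_ORDER of 11, so 1 << MAX_ORDER is 2048 pages (8MB):

static void contig_alloc_size_example(void)
{
	/* A request for 30MB of pages ... */
	unsigned long nr_pages      = (30UL << 20) >> PAGE_SHIFT;        /* 7680 pages */
	/* ... is searched for as a MAX_ORDER-aligned 32MB chunk ... */
	unsigned long aligned_pages = ALIGN(nr_pages, 1UL << MAX_ORDER); /* 8192 pages */
	/* ... and on success the tail is freed back to the buddy allocator,
	 * so the caller still receives exactly nr_pages pages. */
	unsigned long tail_pages    = aligned_pages - nr_pages;          /* 512 pages, 2MB */
}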

Changelog: 2010-11-19
 - removed no_search
 - removed some drain_ functions because they are heavy.
 - check -ENOMEM case

Changelog: 2010-10-26
 - support gfp_t
 - support zonelist/nodemask
 - support [base, end) 
 - support alignment

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/page-isolation.h |   15 ++
 mm/page_alloc.c                |   29 ++++
 mm/page_isolation.c            |  242 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 286 insertions(+)

Index: mmotm-1117/mm/page_isolation.c
===================================================================
--- mmotm-1117.orig/mm/page_isolation.c
+++ mmotm-1117/mm/page_isolation.c
@@ -5,6 +5,7 @@
 #include <linux/mm.h>
 #include <linux/page-isolation.h>
 #include <linux/pageblock-flags.h>
+#include <linux/swap.h>
 #include <linux/memcontrol.h>
 #include <linux/migrate.h>
 #include <linux/memory_hotplug.h>
@@ -396,3 +397,244 @@ retry:
 	}
 	return 0;
 }
+
+/*
+ * Compare the caller-specified range [user_start, user_end) with the physical
+ * memory layout [phys_start, phys_end). Return 1 if there is no intersection
+ * of at least nr_pages; otherwise return 0 and fill the range into [*start, *end).
+ */
+static int
+__calc_search_range(unsigned long user_start, unsigned long user_end,
+		unsigned long nr_pages,
+		unsigned long phys_start, unsigned long phys_end,
+		unsigned long *start, unsigned long *end)
+{
+	if ((user_start >= phys_end) || (user_end <= phys_start))
+		return 1;
+	if (user_start <= phys_start) {
+		*start = phys_start;
+		*end = min(user_end, phys_end);
+	} else {
+		*start = user_start;
+		*end = min(user_end, phys_end);
+	}
+	if (*end - *start < nr_pages)
+		return 1;
+	return 0;
+}
+
+
+/**
+ * __alloc_contig_pages - allocate contiguous physical pages
+ * @base: the lowest pfn which the caller wants.
+ * @end:  the highest pfn which the caller wants.
+ * @nr_pages: the length of the chunk of pages to be allocated.
+ * @align_order: alignment of the start address of the returned chunk, as an order.
+ *   The returned chunk's start pfn will be aligned to (1 << align_order). If smaller
+ *   than MAX_ORDER, it is raised to MAX_ORDER.
+ * @node: allocate memory near this node. If -1, the current node is used.
+ * @gfpflag: used to specify what zone the memory should come from.
+ * @nodemask: allocate memory within the nodemask.
+ *
+ * Search the memory range [base, end) and allocate physically contiguous
+ * pages. If end - base is larger than nr_pages, a chunk somewhere in
+ * [base, end) will be allocated.
+ *
+ * This returns the first page of the contiguous block. On failure, NULL
+ * is returned.
+ *
+ * Limitation: at allocation, nr_pages may be rounded up to a MAX_ORDER
+ * alignment before searching the range. So even if there is a chunk large
+ * enough for nr_pages, the allocation may fail. The extra tail pages of the
+ * allocated chunk are returned to the buddy allocator before returning to the caller.
+ */
+
+#define MIGRATION_RETRY	(5)
+struct page *__alloc_contig_pages(unsigned long base, unsigned long end,
+			unsigned long nr_pages, int align_order,
+			int node, gfp_t gfpflag, nodemask_t *mask)
+{
+	unsigned long found, aligned_pages, start;
+	struct page *ret = NULL;
+	int migration_failed;
+	unsigned long align_mask;
+	struct zoneref *z;
+	struct zone *zone;
+	struct zonelist *zonelist;
+	enum zone_type highzone_idx = gfp_zone(gfpflag);
+	unsigned long zone_start, zone_end, rs, re, pos;
+
+	if (node == -1)
+		node = numa_node_id();
+
+	/* check unsupported flags */
+	if (gfpflag & __GFP_NORETRY)
+		return NULL;
+	if ((gfpflag & (__GFP_WAIT | __GFP_IO | __GFP_FS)) !=
+		(__GFP_WAIT | __GFP_IO | __GFP_FS))
+		return NULL;
+
+	if (gfpflag & __GFP_THISNODE)
+		zonelist = &NODE_DATA(node)->node_zonelists[1];
+	else
+		zonelist = &NODE_DATA(node)->node_zonelists[0];
+	/*
+	 * Base/nr_page/end should be aligned to MAX_ORDER
+	 */
+	found = 0;
+
+	if (align_order < MAX_ORDER)
+		align_order = MAX_ORDER;
+
+	align_mask = (1 << align_order) - 1;
+	/*
+	 * We allocate MAX_ORDER-aligned pages and cut the tail pages later.
+	 */
+	aligned_pages = ALIGN(nr_pages, (1 << MAX_ORDER));
+	/*
+	 * If end - base == nr_pages, we can't search range. base must be
+	 * aligned.
+	 */
+	if ((end - base == nr_pages) && (base & align_mask))
+		return NULL;
+
+	base = ALIGN(base, (1 << align_order));
+	if ((end <= base) || (end - base < aligned_pages))
+		return NULL;
+
+	/*
+	 * searching contig memory range within [pos, end).
+	 * pos is updated at migration failure to find next chunk in zone.
+	 * pos is reset to the base at searching next zone.
+	 * (see for_each_zone_zonelist_nodemask in mmzone.h)
+	 *
+	 * Note: we cannot assume zones/nodes are in linear memory layout.
+	 */
+	z = first_zones_zonelist(zonelist, highzone_idx, mask, &zone);
+	pos = base;
+retry:
+	if (!zone)
+		return NULL;
+
+	zone_start = ALIGN(zone->zone_start_pfn, 1 << align_order);
+	zone_end = zone->zone_start_pfn + zone->spanned_pages;
+
+	/* check [pos, end) is in this zone. */
+	if ((pos >= end) ||
+	     (__calc_search_range(pos, end, aligned_pages,
+			zone_start, zone_end, &rs, &re))) {
+next_zone:
+		/* go to the next zone */
+		z = next_zones_zonelist(++z, highzone_idx, mask, &zone);
+		/* reset the pos */
+		pos = base;
+		goto retry;
+	}
+	/* [pos, end) is trimmed to [rs, re) in this zone. */
+	pos = rs;
+
+	found = find_contig_block(rs, re, aligned_pages, align_order, zone);
+	if (!found)
+		goto next_zone;
+
+	/*
+	 * Because we isolated the range, free pages in the range will never
+	 * be (re)allocated. scan_lru_pages() finds the next PG_lru page in
+	 * the range and returns 0 if it reaches the end.
+	 */
+	migration_failed = 0;
+	rs = found;
+	re = found + aligned_pages;
+	for (rs = scan_lru_pages(rs, re);
+	     rs && rs < re;
+	     rs = scan_lru_pages(rs, re)) {
+		int rc = do_migrate_range(rs, re);
+		if (!rc)
+			migration_failed = 0;
+		else {
+			/* it's better to try another block ? */
+			if (++migration_failed >= MIGRATION_RETRY)
+				break;
+			if (rc == -EBUSY) {
+				/* There are unstable pages on pagevecs. */
+				lru_add_drain_all();
+				/*
+				 * there may be pages on pcplist before
+				 * we mark the range as ISOLATED.
+				 */
+				drain_all_pages();
+			} else if (rc == -ENOMEM)
+				goto nomem;
+		}
+		cond_resched();
+	}
+	if (!migration_failed) {
+		/* drop all pages in pagevec and pcp list */
+		lru_add_drain_all();
+		drain_all_pages();
+	}
+	/* Check all pages are isolated */
+	if (test_pages_isolated(found, found + aligned_pages)) {
+		undo_isolate_page_range(found, aligned_pages);
+		/*
+		 * We failed at [found...found+aligned_pages) migration.
+		 * "rs" is the last pfn scan_lru_pages() found that the page
+		 * is LRU page. Update pos and try next chunk.
+		 */
+		pos = ALIGN(rs + 1, (1 << align_order));
+		goto retry; /* goto next chunk */
+	}
+	/*
+	 * OK, here, [found...found+pages) memory are isolated.
+	 * All pages in the range will be moved into the list with
+	 * page_count(page)=1.
+	 */
+	ret = pfn_to_page(found);
+	alloc_contig_freed_pages(found, found + aligned_pages, gfpflag);
+	/* unset ISOLATE */
+	undo_isolate_page_range(found, aligned_pages);
+	/* Free unnecessary pages in tail */
+	for (start = found + nr_pages; start < found + aligned_pages; start++)
+		__free_page(pfn_to_page(start));
+	return ret;
+nomem:
+	undo_isolate_page_range(found, aligned_pages);
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(__alloc_contig_pages);
+
+void free_contig_pages(struct page *page, int nr_pages)
+{
+	int i;
+	for (i = 0; i < nr_pages; i++)
+		__free_page(page + i);
+}
+EXPORT_SYMBOL_GPL(free_contig_pages);
+
+/*
+ * Allocated pages will not be MOVABLE, but the MOVABLE zone is suitable
+ * for allocating a big chunk. So, ZONE_MOVABLE is used by default.
+ */
+
+struct page *alloc_contig_pages(unsigned long base, unsigned long end,
+			unsigned long nr_pages, int align_order)
+{
+	return __alloc_contig_pages(base, end, nr_pages, align_order, -1,
+				GFP_KERNEL | __GFP_MOVABLE, NULL);
+}
+EXPORT_SYMBOL_GPL(alloc_contig_pages);
+
+struct page *alloc_contig_pages_host(unsigned long nr_pages, int align_order)
+{
+	return __alloc_contig_pages(0, max_pfn, nr_pages, align_order, -1,
+				GFP_KERNEL | __GFP_MOVABLE, NULL);
+}
+EXPORT_SYMBOL_GPL(alloc_contig_pages_host);
+
+struct page *alloc_contig_pages_node(int nid, unsigned long nr_pages,
+				int align_order)
+{
+	return __alloc_contig_pages(0, max_pfn, nr_pages, align_order, nid,
+			GFP_KERNEL | __GFP_THISNODE | __GFP_MOVABLE, NULL);
+}
+EXPORT_SYMBOL_GPL(alloc_contig_pages_node);
Index: mmotm-1117/include/linux/page-isolation.h
===================================================================
--- mmotm-1117.orig/include/linux/page-isolation.h
+++ mmotm-1117/include/linux/page-isolation.h
@@ -32,6 +32,8 @@ test_pages_isolated(unsigned long start_
  */
 extern int set_migratetype_isolate(struct page *page);
 extern void unset_migratetype_isolate(struct page *page);
+extern void alloc_contig_freed_pages(unsigned long pfn,
+		unsigned long pages, gfp_t flag);
 
 /*
  * For migration.
@@ -41,4 +43,17 @@ int test_pages_in_a_zone(unsigned long s
 unsigned long scan_lru_pages(unsigned long start, unsigned long end);
 int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
 
+/*
+ * For large alloc.
+ */
+struct page *__alloc_contig_pages(unsigned long base, unsigned long end,
+				unsigned long nr_pages, int align_order,
+				int node, gfp_t flag, nodemask_t *mask);
+struct page *alloc_contig_pages(unsigned long base, unsigned long end,
+				unsigned long nr_pages, int align_order);
+struct page *alloc_contig_pages_host(unsigned long nr_pages, int align_order);
+struct page *alloc_contig_pages_node(int nid, unsigned long nr_pages,
+		int align_order);
+void free_contig_pages(struct page *page, int nr_pages);
+
 #endif
Index: mmotm-1117/mm/page_alloc.c
===================================================================
--- mmotm-1117.orig/mm/page_alloc.c
+++ mmotm-1117/mm/page_alloc.c
@@ -5447,6 +5447,35 @@ out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
 
+
+void alloc_contig_freed_pages(unsigned long pfn,  unsigned long end, gfp_t flag)
+{
+	struct page *page;
+	struct zone *zone;
+	int order;
+	unsigned long start = pfn;
+
+	zone = page_zone(pfn_to_page(pfn));
+	spin_lock_irq(&zone->lock);
+	while (pfn < end) {
+		VM_BUG_ON(!pfn_valid(pfn));
+		page = pfn_to_page(pfn);
+		VM_BUG_ON(page_count(page));
+		VM_BUG_ON(!PageBuddy(page));
+		list_del(&page->lru);
+		order = page_order(page);
+		zone->free_area[order].nr_free--;
+		rmv_page_order(page);
+		__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
+		pfn += 1 << order;
+	}
+	spin_unlock_irq(&zone->lock);
+
+	/* After this, pages in the range can be freed one by one */
+	for (pfn = start; pfn < end; pfn++)
+		prep_new_page(pfn_to_page(pfn), 0, flag);
+}
+
 #ifdef CONFIG_MEMORY_HOTREMOVE
 /*
  * All pages in the range must be isolated before calling this.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 3/4] alloc_contig_pages() allocate big chunk memory using migration
@ 2010-11-19  8:15   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 44+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-11-19  8:15 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, minchan.kim, Bob Liu, fujita.tomonori,
	m.nazarewicz, pawel, andi.kleen, felipe.contreras, akpm,
	kosaki.motohiro

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Add an function to allocate contiguous memory larger than MAX_ORDER.
The main difference between usual page allocator is that this uses
memory offline technique (Isolate pages and migrate remaining pages.).

I think this is not 100% solution because we can't avoid fragmentation,
but we have kernelcore= boot option and can create MOVABLE zone. That
helps us to allow allocate a contiguous range on demand.

The new function is

  alloc_contig_pages(base, end, nr_pages, alignment)

This function will allocate contiguous pages of nr_pages from the range
[base, end). If [base, end) is bigger than nr_pages, some pfn which
meats alignment will be allocated. If alignment is smaller than MAX_ORDER,
it will be raised to be MAX_ORDER.

__alloc_contig_pages() has much more arguments.


Some drivers allocates contig pages by bootmem or hiding some memory
from the kernel at boot. But if contig pages are necessary only in some
situation, kernelcore= boot option and using page migration is a choice.

Changelog: 2010-11-19
 - removed no_search
 - removed some drain_ functions because they are heavy.
 - check -ENOMEM case

Changelog: 2010-10-26
 - support gfp_t
 - support zonelist/nodemask
 - support [base, end) 
 - support alignment

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/page-isolation.h |   15 ++
 mm/page_alloc.c                |   29 ++++
 mm/page_isolation.c            |  242 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 286 insertions(+)

Index: mmotm-1117/mm/page_isolation.c
===================================================================
--- mmotm-1117.orig/mm/page_isolation.c
+++ mmotm-1117/mm/page_isolation.c
@@ -5,6 +5,7 @@
 #include <linux/mm.h>
 #include <linux/page-isolation.h>
 #include <linux/pageblock-flags.h>
+#include <linux/swap.h>
 #include <linux/memcontrol.h>
 #include <linux/migrate.h>
 #include <linux/memory_hotplug.h>
@@ -396,3 +397,244 @@ retry:
 	}
 	return 0;
 }
+
+/*
+ * Comparing caller specified [user_start, user_end) with physical memory layout
+ * [phys_start, phys_end). If no intersection is longer than nr_pages, return 1.
+ * If there is an intersection, return 0 and fill range in [*start, *end)
+ */
+static int
+__calc_search_range(unsigned long user_start, unsigned long user_end,
+		unsigned long nr_pages,
+		unsigned long phys_start, unsigned long phys_end,
+		unsigned long *start, unsigned long *end)
+{
+	if ((user_start >= phys_end) || (user_end <= phys_start))
+		return 1;
+	if (user_start <= phys_start) {
+		*start = phys_start;
+		*end = min(user_end, phys_end);
+	} else {
+		*start = user_start;
+		*end = min(user_end, phys_end);
+	}
+	if (*end - *start < nr_pages)
+		return 1;
+	return 0;
+}
+
+
+/**
+ * __alloc_contig_pages - allocate a contiguous physical pages
+ * @base: the lowest pfn which caller wants.
+ * @end:  the highest pfn which caller wants.
+ * @nr_pages: the length of a chunk of pages to be allocated.
+ * @align_order: alignment of start address of returned chunk in order.
+ *   Returned' page's order will be aligned to (1 << align_order).If smaller
+ *   than MAX_ORDER, it's raised to MAX_ORDER.
+ * @node: allocate near memory to the node, If -1, current node is used.
+ * @gfpflag: used to specify what zone the memory should be from.
+ * @nodemask: allocate memory within the nodemask.
+ *
+ * Search a memory range [base, end) and allocates physically contiguous
+ * pages. If end - base is larger than nr_pages, a chunk in [base, end) will
+ * be allocated
+ *
+ * This returns a page of the beginning of contiguous block. At failure, NULL
+ * is returned.
+ *
+ * Limitation: at allocation, nr_pages may be increased to be aligned to
+ * MAX_ORDER before searching a range. So, even if there is a enough chunk
+ * for nr_pages, it may not be able to be allocated. Extra tail pages of
+ * allocated chunk is returned to buddy allocator before returning the caller.
+ */
+
+#define MIGRATION_RETRY	(5)
+struct page *__alloc_contig_pages(unsigned long base, unsigned long end,
+			unsigned long nr_pages, int align_order,
+			int node, gfp_t gfpflag, nodemask_t *mask)
+{
+	unsigned long found, aligned_pages, start;
+	struct page *ret = NULL;
+	int migration_failed;
+	unsigned long align_mask;
+	struct zoneref *z;
+	struct zone *zone;
+	struct zonelist *zonelist;
+	enum zone_type highzone_idx = gfp_zone(gfpflag);
+	unsigned long zone_start, zone_end, rs, re, pos;
+
+	if (node == -1)
+		node = numa_node_id();
+
+	/* check unsupported flags */
+	if (gfpflag & __GFP_NORETRY)
+		return NULL;
+	if ((gfpflag & (__GFP_WAIT | __GFP_IO | __GFP_FS)) !=
+		(__GFP_WAIT | __GFP_IO | __GFP_FS))
+		return NULL;
+
+	if (gfpflag & __GFP_THISNODE)
+		zonelist = &NODE_DATA(node)->node_zonelists[1];
+	else
+		zonelist = &NODE_DATA(node)->node_zonelists[0];
+	/*
+	 * Base/nr_page/end should be aligned to MAX_ORDER
+	 */
+	found = 0;
+
+	if (align_order < MAX_ORDER)
+		align_order = MAX_ORDER;
+
+	align_mask = (1 << align_order) - 1;
+	/*
+	 * We allocates MAX_ORDER aligned pages and cut tail pages later.
+	 */
+	aligned_pages = ALIGN(nr_pages, (1 << MAX_ORDER));
+	/*
+	 * If end - base == nr_pages, we can't search range. base must be
+	 * aligned.
+	 */
+	if ((end - base == nr_pages) && (base & align_mask))
+		return NULL;
+
+	base = ALIGN(base, (1 << align_order));
+	if ((end <= base) || (end - base < aligned_pages))
+		return NULL;
+
+	/*
+	 * searching contig memory range within [pos, end).
+	 * pos is updated at migration failure to find next chunk in zone.
+	 * pos is reset to the base at searching next zone.
+	 * (see for_each_zone_zonelist_nodemask in mmzone.h)
+	 *
+	 * Note: we cannot assume zones/nodes are in linear memory layout.
+	 */
+	z = first_zones_zonelist(zonelist, highzone_idx, mask, &zone);
+	pos = base;
+retry:
+	if (!zone)
+		return NULL;
+
+	zone_start = ALIGN(zone->zone_start_pfn, 1 << align_order);
+	zone_end = zone->zone_start_pfn + zone->spanned_pages;
+
+	/* check [pos, end) is in this zone. */
+	if ((pos >= end) ||
+	     (__calc_search_range(pos, end, aligned_pages,
+			zone_start, zone_end, &rs, &re))) {
+next_zone:
+		/* go to the next zone */
+		z = next_zones_zonelist(++z, highzone_idx, mask, &zone);
+		/* reset the pos */
+		pos = base;
+		goto retry;
+	}
+	/* [pos, end) is trimmed to [rs, re) in this zone. */
+	pos = rs;
+
+	found = find_contig_block(rs, re, aligned_pages, align_order, zone);
+	if (!found)
+		goto next_zone;
+
+	/*
+	 * Because we isolated the range, free pages in the range will never
+	 * be (re)allocated. scan_lru_pages() finds the next PG_lru page in
+	 * the range and returns 0 if it reaches the end.
+	 */
+	migration_failed = 0;
+	rs = found;
+	re = found + aligned_pages;
+	for (rs = scan_lru_pages(rs, re);
+	     rs && rs < re;
+	     rs = scan_lru_pages(rs, re)) {
+		int rc = do_migrate_range(rs, re);
+		if (!rc)
+			migration_failed = 0;
+		else {
+			/* it's better to try another block ? */
+			if (++migration_failed >= MIGRATION_RETRY)
+				break;
+			if (rc == -EBUSY) {
+				/* There are unstable pages.on pagevec. */
+				lru_add_drain_all();
+				/*
+				 * there may be pages on pcplist before
+				 * we mark the range as ISOLATED.
+				 */
+				drain_all_pages();
+			} else if (rc == -ENOMEM)
+				goto nomem;
+		}
+		cond_resched();
+	}
+	if (!migration_failed) {
+		/* drop all pages in pagevec and pcp list */
+		lru_add_drain_all();
+		drain_all_pages();
+	}
+	/* Check all pages are isolated */
+	if (test_pages_isolated(found, found + aligned_pages)) {
+		undo_isolate_page_range(found, aligned_pages);
+		/*
+		 * We failed at [found...found+aligned_pages) migration.
+		 * "rs" is the last pfn scan_lru_pages() found that the page
+		 * is LRU page. Update pos and try next chunk.
+		 */
+		pos = ALIGN(rs + 1, (1 << align_order));
+		goto retry; /* goto next chunk */
+	}
+	/*
+	 * OK, here, [found...found+pages) memory are isolated.
+	 * All pages in the range will be moved into the list with
+	 * page_count(page)=1.
+	 */
+	ret = pfn_to_page(found);
+	alloc_contig_freed_pages(found, found + aligned_pages, gfpflag);
+	/* unset ISOLATE */
+	undo_isolate_page_range(found, aligned_pages);
+	/* Free unnecessary pages in tail */
+	for (start = found + nr_pages; start < found + aligned_pages; start++)
+		__free_page(pfn_to_page(start));
+	return ret;
+nomem:
+	undo_isolate_page_range(found, aligned_pages);
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(__alloc_contig_pages);
+
+void free_contig_pages(struct page *page, int nr_pages)
+{
+	int i;
+	for (i = 0; i < nr_pages; i++)
+		__free_page(page + i);
+}
+EXPORT_SYMBOL_GPL(free_contig_pages);
+
+/*
+ * Allocated pages will not be MOVABLE but MOVABLE zone is a suitable
+ * for allocating big chunk. So, using ZONE_MOVABLE is a default.
+ */
+
+struct page *alloc_contig_pages(unsigned long base, unsigned long end,
+			unsigned long nr_pages, int align_order)
+{
+	return __alloc_contig_pages(base, end, nr_pages, align_order, -1,
+				GFP_KERNEL | __GFP_MOVABLE, NULL);
+}
+EXPORT_SYMBOL_GPL(alloc_contig_pages);
+
+struct page *alloc_contig_pages_host(unsigned long nr_pages, int align_order)
+{
+	return __alloc_contig_pages(0, max_pfn, nr_pages, align_order, -1,
+				GFP_KERNEL | __GFP_MOVABLE, NULL);
+}
+EXPORT_SYMBOL_GPL(alloc_contig_pages_host);
+
+struct page *alloc_contig_pages_node(int nid, unsigned long nr_pages,
+				int align_order)
+{
+	return __alloc_contig_pages(0, max_pfn, nr_pages, align_order, nid,
+			GFP_KERNEL | __GFP_THISNODE | __GFP_MOVABLE, NULL);
+}
+EXPORT_SYMBOL_GPL(alloc_contig_pages_node);
Index: mmotm-1117/include/linux/page-isolation.h
===================================================================
--- mmotm-1117.orig/include/linux/page-isolation.h
+++ mmotm-1117/include/linux/page-isolation.h
@@ -32,6 +32,8 @@ test_pages_isolated(unsigned long start_
  */
 extern int set_migratetype_isolate(struct page *page);
 extern void unset_migratetype_isolate(struct page *page);
+extern void alloc_contig_freed_pages(unsigned long pfn,
+		unsigned long pages, gfp_t flag);
 
 /*
  * For migration.
@@ -41,4 +43,17 @@ int test_pages_in_a_zone(unsigned long s
 unsigned long scan_lru_pages(unsigned long start, unsigned long end);
 int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
 
+/*
+ * For large alloc.
+ */
+struct page *__alloc_contig_pages(unsigned long base, unsigned long end,
+				unsigned long nr_pages, int align_order,
+				int node, gfp_t flag, nodemask_t *mask);
+struct page *alloc_contig_pages(unsigned long base, unsigned long end,
+				unsigned long nr_pages, int align_order);
+struct page *alloc_contig_pages_host(unsigned long nr_pages, int align_order);
+struct page *alloc_contig_pages_node(int nid, unsigned long nr_pages,
+		int align_order);
+void free_contig_pages(struct page *page, int nr_pages);
+
 #endif
Index: mmotm-1117/mm/page_alloc.c
===================================================================
--- mmotm-1117.orig/mm/page_alloc.c
+++ mmotm-1117/mm/page_alloc.c
@@ -5447,6 +5447,35 @@ out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
 
+
+void alloc_contig_freed_pages(unsigned long pfn,  unsigned long end, gfp_t flag)
+{
+	struct page *page;
+	struct zone *zone;
+	int order;
+	unsigned long start = pfn;
+
+	zone = page_zone(pfn_to_page(pfn));
+	spin_lock_irq(&zone->lock);
+	while (pfn < end) {
+		VM_BUG_ON(!pfn_valid(pfn));
+		page = pfn_to_page(pfn);
+		VM_BUG_ON(page_count(page));
+		VM_BUG_ON(!PageBuddy(page));
+		list_del(&page->lru);
+		order = page_order(page);
+		zone->free_area[order].nr_free--;
+		rmv_page_order(page);
+		__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
+		pfn += 1 << order;
+	}
+	spin_unlock_irq(&zone->lock);
+
+	/* After this, pages in the range can be freed one by one */
+	for (pfn = start; pfn < end; pfn++)
+		prep_new_page(pfn_to_page(pfn), 0, flag);
+}
+
 #ifdef CONFIG_MEMORY_HOTREMOVE
 /*
  * All pages in the range must be isolated before calling this.

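For illustration, here is a minimal driver-side sketch of the interface added
above. It is not part of the patch; the buffer size, function names and error
handling are made-up examples.

	#include <linux/mm.h>
	#include <linux/page-isolation.h>

	static struct page *big_buf;
	/* made-up example size: a 64MB physically contiguous buffer */
	static const unsigned long big_buf_pages = (64UL << 20) >> PAGE_SHIFT;

	static int example_alloc_big_buf(void)
	{
		/* align_order smaller than MAX_ORDER is raised to MAX_ORDER */
		big_buf = alloc_contig_pages_host(big_buf_pages, 0);
		if (!big_buf)
			return -ENOMEM;	/* fragmentation can defeat the search */
		return 0;
	}

	static void example_free_big_buf(void)
	{
		if (big_buf)
			free_contig_pages(big_buf, big_buf_pages);
	}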

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 4/4] alloc_contig_pages() use better allocation function for migration
  2010-11-19  8:10 ` KAMEZAWA Hiroyuki
@ 2010-11-19  8:16   ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 44+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-11-19  8:16 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, minchan.kim, Bob Liu, fujita.tomonori,
	m.nazarewicz, pawel, andi.kleen, felipe.contreras, akpm,
	kosaki.motohiro


From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Background:
Because we cannot predict which memory section will be offlined next,
hotremove_migrate_alloc() just uses alloc_page(), i.e. it makes no decision
about where the page should be migrated to. Given memory hotplug's nature,
a memory section near the one being removed is likely to be removed next,
so migrating pages to the same node as the original page makes little sense
in many cases; it just increases load. The migration destination page is
therefore allocated on the node where the offlining script runs.

Now, contiguous-alloc uses do_migrate_range(). In this case, the migration
destination node should be the same node as the migration source page.

This patch modifies hotremove_migrate_alloc() so that a node id ("nid") is
passed to it. Memory hotremove passes the node where the offlining script
runs, so its behavior does not change.
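
For reference, a condensed sketch of the resulting call path (an illustrative
paraphrase of the hunks below, not literal kernel code):

	/* contiguous-alloc passes the node of the source range */
	rc = do_migrate_range(rs, re, target_node);

	/* do_migrate_range() forwards it as migrate_pages()'s private arg */
	ret = migrate_pages(&source, hotremove_migrate_alloc, node, 1);

	/* the allocation callback receives the node back in 'private' */
	struct page *
	hotremove_migrate_alloc(struct page *page, unsigned long private, int **x)
	{
		return alloc_pages_node(private, GFP_HIGHUSER_MOVABLE, 0);
	}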

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/page-isolation.h |    3 ++-
 mm/memory_hotplug.c            |    2 +-
 mm/page_isolation.c            |   21 ++++++++++++++++-----
 3 files changed, 19 insertions(+), 7 deletions(-)

Index: mmotm-1117/include/linux/page-isolation.h
===================================================================
--- mmotm-1117.orig/include/linux/page-isolation.h
+++ mmotm-1117/include/linux/page-isolation.h
@@ -41,7 +41,8 @@ extern void alloc_contig_freed_pages(uns
 
 int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn);
 unsigned long scan_lru_pages(unsigned long start, unsigned long end);
-int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
+int do_migrate_range(unsigned long start_pfn,
+	unsigned long end_pfn, int node);
 
 /*
  * For large alloc.
Index: mmotm-1117/mm/memory_hotplug.c
===================================================================
--- mmotm-1117.orig/mm/memory_hotplug.c
+++ mmotm-1117/mm/memory_hotplug.c
@@ -724,7 +724,7 @@ repeat:
 
 	pfn = scan_lru_pages(start_pfn, end_pfn);
 	if (pfn) { /* We have page on LRU */
-		ret = do_migrate_range(pfn, end_pfn);
+		ret = do_migrate_range(pfn, end_pfn, numa_node_id());
 		if (!ret) {
 			drain = 1;
 			goto repeat;
Index: mmotm-1117/mm/page_isolation.c
===================================================================
--- mmotm-1117.orig/mm/page_isolation.c
+++ mmotm-1117/mm/page_isolation.c
@@ -193,12 +193,21 @@ unsigned long scan_lru_pages(unsigned lo
 struct page *
 hotremove_migrate_alloc(struct page *page, unsigned long private, int **x)
 {
-	/* This should be improooooved!! */
-	return alloc_page(GFP_HIGHUSER_MOVABLE);
+	return alloc_pages_node(private, GFP_HIGHUSER_MOVABLE, 0);
 }
 
+/*
+ * Migrate pages in the range to somewhere else. The migration target page is
+ * allocated by hotremove_migrate_alloc() on the given node. At memory
+ * hotremove, allocating near the source node can be harmful because pages
+ * allocated there may have to be removed next, so hotremove uses the node
+ * where the offlining script runs. alloc_contig_pages() passes the node of
+ * the source range to avoid unnecessary migration to a far node.
+ */
+
 #define NR_OFFLINE_AT_ONCE_PAGES	(256)
-int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
+int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn,
+		int node)
 {
 	unsigned long pfn;
 	struct page *page;
@@ -245,7 +254,7 @@ int do_migrate_range(unsigned long start
 			goto out;
 		}
 		/* this function returns # of failed pages */
-		ret = migrate_pages(&source, hotremove_migrate_alloc, 0, 1);
+		ret = migrate_pages(&source, hotremove_migrate_alloc, node, 1);
 		if (ret)
 			putback_lru_pages(&source);
 	}
@@ -463,6 +472,7 @@ struct page *__alloc_contig_pages(unsign
 	struct zonelist *zonelist;
 	enum zone_type highzone_idx = gfp_zone(gfpflag);
 	unsigned long zone_start, zone_end, rs, re, pos;
+	int target_node;
 
 	if (node == -1)
 		node = numa_node_id();
@@ -516,6 +526,7 @@ retry:
 	if (!zone)
 		return NULL;
 
+	target_node = zone->zone_pgdat->node_id;
 	zone_start = ALIGN(zone->zone_start_pfn, 1 << align_order);
 	zone_end = zone->zone_start_pfn + zone->spanned_pages;
 
@@ -548,7 +559,7 @@ next_zone:
 	for (rs = scan_lru_pages(rs, re);
 	     rs && rs < re;
 	     rs = scan_lru_pages(rs, re)) {
-		int rc = do_migrate_range(rs, re);
+		int rc = do_migrate_range(rs, re, target_node);
 		if (!rc)
 			migration_failed = 0;
 		else {


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] big chunk memory allocator v4
  2010-11-19  8:10 ` KAMEZAWA Hiroyuki
@ 2010-11-19 20:56   ` Andrew Morton
  -1 siblings, 0 replies; 44+ messages in thread
From: Andrew Morton @ 2010-11-19 20:56 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, minchan.kim, Bob Liu, fujita.tomonori,
	m.nazarewicz, pawel, andi.kleen, felipe.contreras,
	kosaki.motohiro

On Fri, 19 Nov 2010 17:10:33 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> Hi, this is an updated version. 
> 
> No major changes from the last one except for page allocation function.
> removed RFC.
> 
> Order of patches is
> 
> [1/4] move some functions from memory_hotplug.c to page_isolation.c
> [2/4] search physically contiguous range suitable for big chunk alloc.
> [3/4] allocate big chunk memory based on memory hotplug(migration) technique
> [4/4] modify page allocation function.
> 
> For what:
> 
>   I hear there is requirements to allocate a chunk of page which is larger than
>   MAX_ORDER. Now, some (embeded) device use a big memory chunk. To use memory,
>   they hide some memory range by boot option (mem=) and use hidden memory
>   for its own purpose. But this seems a lack of feature in memory management.
> 
>   This patch adds 
> 	alloc_contig_pages(start, end, nr_pages, gfp_mask)
>   to allocate a chunk of page whose length is nr_pages from [start, end)
>   phys address. This uses similar logic of memory-unplug, which tries to
>   offline [start, end) pages. By this, drivers can allocate 30M or 128M or
>   much bigger memory chunk on demand. (I allocated 1G chunk in my test).
> 
>   But yes, because of fragmentation, this cannot guarantee 100% alloc.
>   If alloc_contig_pages() is called in system boot up or movable_zone is used,
>   this allocation succeeds at high rate.

So this is an alternative implementation of the functionality offered
by Michal's "The Contiguous Memory Allocator framework".

>   I tested this on x86-64, and it seems to work as expected. But feedback from
>   embeded guys are appreciated because I think they are main user of this
>   function.

From where I sit, feedback from the embedded guys is *vital*, because
they are indeed the main users.

Michal, I haven't made a note of all the people who are interested in
and who are potential users of this code.  Your patch series has a
billion cc's and is up to version 6.  Could I ask that you review and
test this code, and also hunt down other people (probably at other
organisations) who can do likewise for us?  Because until we hear from
those people that this work satisfies their needs, we can't really
proceed much further.

Thanks.




^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/4] alloc_contig_pages() move some functions to page_isolation.c
  2010-11-19  8:12   ` KAMEZAWA Hiroyuki
@ 2010-11-21 15:07     ` Minchan Kim
  -1 siblings, 0 replies; 44+ messages in thread
From: Minchan Kim @ 2010-11-21 15:07 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, Bob Liu, fujita.tomonori, m.nazarewicz,
	pawel, andi.kleen, felipe.contreras, akpm, kosaki.motohiro

On Fri, Nov 19, 2010 at 05:12:39PM +0900, KAMEZAWA Hiroyuki wrote:
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> Memory hotplug is a logic for making pages unused in the specified range
> of pfn. So, some of core logics can be used for other purpose as
> allocating a very large contigous memory block.
> 
> This patch moves some functions from mm/memory_hotplug.c to
> mm/page_isolation.c. This helps adding a function for large-alloc in
> page_isolation.c with memory-unplug technique.
> 
> Changelog: 2010/10/26
>  - adjusted to mmotm-1024 + Bob's 3 clean ups.
> Changelog: 2010/10/21
>  - adjusted to mmotm-1020
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/4] alloc_contig_pages() find appropriate physical memory range
  2010-11-19  8:14   ` KAMEZAWA Hiroyuki
@ 2010-11-21 15:21     ` Minchan Kim
  -1 siblings, 0 replies; 44+ messages in thread
From: Minchan Kim @ 2010-11-21 15:21 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, Bob Liu, fujita.tomonori, m.nazarewicz,
	pawel, andi.kleen, felipe.contreras, akpm, kosaki.motohiro

On Fri, Nov 19, 2010 at 05:14:15PM +0900, KAMEZAWA Hiroyuki wrote:
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> Unlike memory hotplug, at an allocation of contigous memory range, address
> may not be a problem. IOW, if a requester of memory wants to allocate 100M of
> of contigous memory, placement of allocated memory may not be a problem.
> So, "finding a range of memory which seems to be MOVABLE" is required.
> 
> This patch adds a functon to isolate a length of memory within [start, end).
> This function returns a pfn which is 1st page of isolated contigous chunk
> of given length within [start, end).
> 
> If no_search=true is passed as argument, start address is always same to
> the specified "base" addresss.
> 
> After isolation, free memory within this area will never be allocated.
> But some pages will remain as "Used/LRU" pages. They should be dropped by
> page reclaim or migration.
> 
> Changelog: 2010-11-17
>  - fixed some conding style (if-then-else)
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Acked-by: Minchan Kim <minchan.kim@gmail.com>

Just some trivial comment below. 

Intentionally, I don't add Reviewed-by. 
Instead of it, I add Acked-by since I support this work.

I reviewed your old version but have forgotten it. :(
So I will take some time to review your code and then add Reviewed-by.

> ---
>  mm/page_isolation.c |  146 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 146 insertions(+)
> 
> Index: mmotm-1117/mm/page_isolation.c
> ===================================================================
> --- mmotm-1117.orig/mm/page_isolation.c
> +++ mmotm-1117/mm/page_isolation.c
> @@ -7,6 +7,7 @@
>  #include <linux/pageblock-flags.h>
>  #include <linux/memcontrol.h>
>  #include <linux/migrate.h>
> +#include <linux/memory_hotplug.h>
>  #include <linux/mm_inline.h>
>  #include "internal.h"
>  
> @@ -250,3 +251,148 @@ int do_migrate_range(unsigned long start
>  out:
>  	return ret;
>  }
> +
> +/*
> + * Functions for getting contiguous MOVABLE pages in a zone.
> + */
> +struct page_range {
> +	unsigned long base; /* Base address of searching contigouous block */
> +	unsigned long end;
> +	unsigned long pages;/* Length of contiguous block */
> +	int align_order;
> +	unsigned long align_mask;
> +};
> +
> +int __get_contig_block(unsigned long pfn, unsigned long nr_pages, void *arg)
> +{
> +	struct page_range *blockinfo = arg;
> +	unsigned long end;
> +
> +	end = pfn + nr_pages;
> +	pfn = ALIGN(pfn, 1 << blockinfo->align_order);
> +	end = end & ~(MAX_ORDER_NR_PAGES - 1);
> +
> +	if (end < pfn)
> +		return 0;
> +	if (end - pfn >= blockinfo->pages) {
> +		blockinfo->base = pfn;
> +		blockinfo->end = end;
> +		return 1;
> +	}
> +	return 0;
> +}
> +
> +static void __trim_zone(struct zone *zone, struct page_range *range)
> +{
> +	unsigned long pfn;
> +	/*
> + 	 * skip pages which dones'nt under the zone.

                            typo

> + 	 * There are some archs which zones are not in linear layout.
> +	 */
> +	if (page_zone(pfn_to_page(range->base)) != zone) {
> +		for (pfn = range->base;
> +			pfn < range->end;
> +			pfn += MAX_ORDER_NR_PAGES) {
> +			if (page_zone(pfn_to_page(pfn)) == zone)
> +				break;
> +		}
> +		range->base = min(pfn, range->end);
> +	}
> +	/* Here, range-> base is in the zone if range->base != range->end */
> +	for (pfn = range->base;
> +	     pfn < range->end;
> +	     pfn += MAX_ORDER_NR_PAGES) {
> +		if (zone != page_zone(pfn_to_page(pfn))) {
> +			pfn = pfn - MAX_ORDER_NR_PAGES;
> +			break;
> +		}
> +	}
> +	range->end = min(pfn, range->end);
> +	return;
> +}
> +
> +/*
> + * This function is for finding a contiguous memory block which has length
> + * of pages and MOVABLE. If it finds, make the range of pages as ISOLATED
> + * and return the first page's pfn.
> + * This checks all pages in the returned range is free of Pg_LRU. To reduce

                                                              typo

> + * the risk of false-positive testing, lru_add_drain_all() should be called
> + * before this function to reduce pages on pagevec for zones.
> + */
> +
> +static unsigned long find_contig_block(unsigned long base,
> +		unsigned long end, unsigned long pages,
> +		int align_order, struct zone *zone)
> +{
> +	unsigned long pfn, pos;
> +	struct page_range blockinfo;
> +	int ret;
> +
> +	VM_BUG_ON(pages & (MAX_ORDER_NR_PAGES - 1));
> +	VM_BUG_ON(base & ((1 << align_order) - 1));
> +retry:
> +	blockinfo.base = base;
> +	blockinfo.end = end;
> +	blockinfo.pages = pages;
> +	blockinfo.align_order = align_order;
> +	blockinfo.align_mask = (1 << align_order) - 1;
> +	/*
> +	 * At first, check physical page layout and skip memory holes.
> +	 */
> +	ret = walk_system_ram_range(base, end - base, &blockinfo,
> +		__get_contig_block);

We need #include <linux/ioport.h>

> +	if (!ret)
> +		return 0;
> +	/* check contiguous pages in a zone */
> +	__trim_zone(zone, &blockinfo);
> +
> +	/*
> +	 * Ok, we found contiguous memory chunk of size. Isolate it.
> +	 * We just search MAX_ORDER aligned range.
> +	 */
> +	for (pfn = blockinfo.base; pfn + pages <= blockinfo.end;
> +	     pfn += (1 << align_order)) {
> +		struct zone *z = page_zone(pfn_to_page(pfn));
> +		if (z != zone)
> +			continue;
> +
> +		spin_lock_irq(&z->lock);
> +		pos = pfn;
> +		/*
> +		 * Check the range only contains free pages or LRU pages.
> +		 */
> +		while (pos < pfn + pages) {
> +			struct page *p;
> +
> +			if (!pfn_valid_within(pos))
> +				break;
> +			p = pfn_to_page(pos);
> +			if (PageReserved(p))
> +				break;
> +			if (!page_count(p)) {
> +				if (!PageBuddy(p))
> +					pos++;
> +				else
> +					pos += (1 << page_order(p));
> +			} else if (PageLRU(p)) {
> +				pos++;
> +			} else
> +				break;
> +		}
> +		spin_unlock_irq(&z->lock);
> +		if ((pos == pfn + pages)) {
> +			if (!start_isolate_page_range(pfn, pfn + pages))
> +				return pfn;
> +		} else/* the chunk including "pos" should be skipped */
> +			pfn = pos & ~((1 << align_order) - 1);
> +		cond_resched();
> +	}
> +
> +	/* failed */
> +	if (blockinfo.end + pages <= end) {
> +		/* Move base address and find the next block of RAM. */
> +		base = blockinfo.end;
> +		goto retry;
> +	}
> +	return 0;
> +}
> 

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/4] alloc_contig_pages() allocate big chunk memory using migration
  2010-11-19  8:15   ` KAMEZAWA Hiroyuki
@ 2010-11-21 15:25     ` Minchan Kim
  -1 siblings, 0 replies; 44+ messages in thread
From: Minchan Kim @ 2010-11-21 15:25 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, Bob Liu, fujita.tomonori, m.nazarewicz,
	pawel, andi.kleen, felipe.contreras, akpm, kosaki.motohiro

On Fri, Nov 19, 2010 at 05:15:28PM +0900, KAMEZAWA Hiroyuki wrote:
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> Add an function to allocate contiguous memory larger than MAX_ORDER.
> The main difference between usual page allocator is that this uses
> memory offline technique (Isolate pages and migrate remaining pages.).
> 
> I think this is not 100% solution because we can't avoid fragmentation,
> but we have kernelcore= boot option and can create MOVABLE zone. That
> helps us to allow allocate a contiguous range on demand.
> 
> The new function is
> 
>   alloc_contig_pages(base, end, nr_pages, alignment)
> 
> This function will allocate contiguous pages of nr_pages from the range
> [base, end). If [base, end) is bigger than nr_pages, some pfn which
> meats alignment will be allocated. If alignment is smaller than MAX_ORDER,
> it will be raised to be MAX_ORDER.
> 
> __alloc_contig_pages() has much more arguments.
> 
> 
> Some drivers allocates contig pages by bootmem or hiding some memory
> from the kernel at boot. But if contig pages are necessary only in some
> situation, kernelcore= boot option and using page migration is a choice.
> 
> Changelog: 2010-11-19
>  - removed no_search
>  - removed some drain_ functions because they are heavy.
>  - check -ENOMEM case
> 
> Changelog: 2010-10-26
>  - support gfp_t
>  - support zonelist/nodemask
>  - support [base, end) 
>  - support alignment
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Minchan Kim <minchan.kim@gmail.com>

Trivial comment below. 

> +EXPORT_SYMBOL_GPL(alloc_contig_pages);
> +
> +struct page *alloc_contig_pages_host(unsigned long nr_pages, int align_order)
> +{
> +	return __alloc_contig_pages(0, max_pfn, nr_pages, align_order, -1,
> +				GFP_KERNEL | __GFP_MOVABLE, NULL);
> +}

We need to add #include <linux/bootmem.h> for using max_pfn.

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] big chunk memory allocator v4
  2010-11-19 20:56   ` Andrew Morton
@ 2010-11-22  0:04     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 44+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-11-22  0:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, minchan.kim, Bob Liu, fujita.tomonori,
	m.nazarewicz, pawel, andi.kleen, felipe.contreras,
	kosaki.motohiro

On Fri, 19 Nov 2010 12:56:53 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Fri, 19 Nov 2010 17:10:33 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> 
> > Hi, this is an updated version. 
> > 
> > No major changes from the last one except for page allocation function.
> > removed RFC.
> > 
> > Order of patches is
> > 
> > [1/4] move some functions from memory_hotplug.c to page_isolation.c
> > [2/4] search physically contiguous range suitable for big chunk alloc.
> > [3/4] allocate big chunk memory based on memory hotplug(migration) technique
> > [4/4] modify page allocation function.
> > 
> > For what:
> > 
> >   I hear there is requirements to allocate a chunk of page which is larger than
> >   MAX_ORDER. Now, some (embeded) device use a big memory chunk. To use memory,
> >   they hide some memory range by boot option (mem=) and use hidden memory
> >   for its own purpose. But this seems a lack of feature in memory management.
> > 
> >   This patch adds 
> > 	alloc_contig_pages(start, end, nr_pages, gfp_mask)
> >   to allocate a chunk of page whose length is nr_pages from [start, end)
> >   phys address. This uses similar logic of memory-unplug, which tries to
> >   offline [start, end) pages. By this, drivers can allocate 30M or 128M or
> >   much bigger memory chunk on demand. (I allocated 1G chunk in my test).
> > 
> >   But yes, because of fragmentation, this cannot guarantee 100% alloc.
> >   If alloc_contig_pages() is called in system boot up or movable_zone is used,
> >   this allocation succeeds at high rate.
> 
> So this is an alternatve implementation for the functionality offered
> by Michal's "The Contiguous Memory Allocator framework".
> 

Yes, this can serve as a backend for that kind of work.

I think there are two ways to allocate contiguous pages larger than MAX_ORDER:

1) hide some memory at boot and add another memory allocator.
2) support a range allocator over [start, end).

This is a trial of 2). I used the memory-hotplug technique because I know it.
This patch itself has no "map" or "management" function; that should be
developed in another patch (but maybe it will not be my work).
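
To make 2) concrete, a rough caller-side sketch against the interface of
patch 3/4 follows. The pfn window, size and function name are made up;
base/end are treated as pfns, as in alloc_contig_pages_host(0, max_pfn, ...).

	/* illustration only: request 16MB inside the pfn window covering
	 * physical addresses [1GB, 2GB); the window and size are made up.
	 */
	static struct page *example_alloc_in_window(unsigned long *nr_ret)
	{
		unsigned long start_pfn = (1UL << 30) >> PAGE_SHIFT;
		unsigned long end_pfn   = (2UL << 30) >> PAGE_SHIFT;
		unsigned long nr_pages  = (16UL << 20) >> PAGE_SHIFT;
		struct page *page;

		page = alloc_contig_pages(start_pfn, end_pfn, nr_pages, 0);
		if (page)
			*nr_ret = nr_pages; /* free with free_contig_pages() */
		return page;	/* NULL if the window was too fragmented */
	}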

> >   I tested this on x86-64, and it seems to work as expected. But feedback from
> >   embeded guys are appreciated because I think they are main user of this
> >   function.
> 
> From where I sit, feedback from the embedded guys is *vital*, because
> they are indeed the main users.
> 
> Michal, I haven't made a note of all the people who are interested in
> and who are potential users of this code.  Your patch series has a
> billion cc's and is up to version 6.  Could I ask that you review and
> test this code, and also hunt down other people (probably at other
> organisations) who can do likewise for us?  Because until we hear from
> those people that this work satisfies their needs, we can't really
> proceed much further.
> 

yes. please.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/4] alloc_contig_pages() find appropriate physical memory range
  2010-11-21 15:21     ` Minchan Kim
@ 2010-11-22  0:11       ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 44+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-11-22  0:11 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Bob Liu, fujita.tomonori, m.nazarewicz,
	pawel, andi.kleen, felipe.contreras, akpm, kosaki.motohiro

On Mon, 22 Nov 2010 00:21:31 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:

> Acked-by: Minchan Kim <minchan.kim@gmail.com>
> 
> Just some trivial comment below. 
> 
> Intentionally, I don't add Reviewed-by. 
> Instead of it, I add Acked-by since I support this work.
Thanks.

> 
> I reviewed your old version but have forgotten it. :(

Sorry, I had a vacation ;(

> So I will take some time to review your code and then add Reviewed-by.
> 
> > ---
> >  mm/page_isolation.c |  146 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 146 insertions(+)
> > 
> > Index: mmotm-1117/mm/page_isolation.c
> > ===================================================================
> > --- mmotm-1117.orig/mm/page_isolation.c
> > +++ mmotm-1117/mm/page_isolation.c
> > @@ -7,6 +7,7 @@
> >  #include <linux/pageblock-flags.h>
> >  #include <linux/memcontrol.h>
> >  #include <linux/migrate.h>
> > +#include <linux/memory_hotplug.h>
> >  #include <linux/mm_inline.h>
> >  #include "internal.h"
> >  
> > @@ -250,3 +251,148 @@ int do_migrate_range(unsigned long start
> >  out:
> >  	return ret;
> >  }
> > +
> > +/*
> > + * Functions for getting contiguous MOVABLE pages in a zone.
> > + */
> > +struct page_range {
> > +	unsigned long base; /* Base address of searching contigouous block */
> > +	unsigned long end;
> > +	unsigned long pages;/* Length of contiguous block */
> > +	int align_order;
> > +	unsigned long align_mask;
> > +};
> > +
> > +int __get_contig_block(unsigned long pfn, unsigned long nr_pages, void *arg)
> > +{
> > +	struct page_range *blockinfo = arg;
> > +	unsigned long end;
> > +
> > +	end = pfn + nr_pages;
> > +	pfn = ALIGN(pfn, 1 << blockinfo->align_order);
> > +	end = end & ~(MAX_ORDER_NR_PAGES - 1);
> > +
> > +	if (end < pfn)
> > +		return 0;
> > +	if (end - pfn >= blockinfo->pages) {
> > +		blockinfo->base = pfn;
> > +		blockinfo->end = end;
> > +		return 1;
> > +	}
> > +	return 0;
> > +}
> > +
> > +static void __trim_zone(struct zone *zone, struct page_range *range)
> > +{
> > +	unsigned long pfn;
> > +	/*
> > + 	 * skip pages which dones'nt under the zone.
> 
>                             typo
> 
will fix.


> > + 	 * There are some archs which zones are not in linear layout.
> > +	 */
> > +	if (page_zone(pfn_to_page(range->base)) != zone) {
> > +		for (pfn = range->base;
> > +			pfn < range->end;
> > +			pfn += MAX_ORDER_NR_PAGES) {
> > +			if (page_zone(pfn_to_page(pfn)) == zone)
> > +				break;
> > +		}
> > +		range->base = min(pfn, range->end);
> > +	}
> > +	/* Here, range-> base is in the zone if range->base != range->end */
> > +	for (pfn = range->base;
> > +	     pfn < range->end;
> > +	     pfn += MAX_ORDER_NR_PAGES) {
> > +		if (zone != page_zone(pfn_to_page(pfn))) {
> > +			pfn = pfn - MAX_ORDER_NR_PAGES;
> > +			break;
> > +		}
> > +	}
> > +	range->end = min(pfn, range->end);
> > +	return;
> > +}
> > +
> > +/*
> > + * This function is for finding a contiguous memory block which has length
> > + * of pages and MOVABLE. If it finds, make the range of pages as ISOLATED
> > + * and return the first page's pfn.
> > + * This checks all pages in the returned range is free of Pg_LRU. To reduce
> 
>                                                               typo
> 
will fix.

> > + * the risk of false-positive testing, lru_add_drain_all() should be called
> > + * before this function to reduce pages on pagevec for zones.
> > + */
> > +
> > +static unsigned long find_contig_block(unsigned long base,
> > +		unsigned long end, unsigned long pages,
> > +		int align_order, struct zone *zone)
> > +{
> > +	unsigned long pfn, pos;
> > +	struct page_range blockinfo;
> > +	int ret;
> > +
> > +	VM_BUG_ON(pages & (MAX_ORDER_NR_PAGES - 1));
> > +	VM_BUG_ON(base & ((1 << align_order) - 1));
> > +retry:
> > +	blockinfo.base = base;
> > +	blockinfo.end = end;
> > +	blockinfo.pages = pages;
> > +	blockinfo.align_order = align_order;
> > +	blockinfo.align_mask = (1 << align_order) - 1;
> > +	/*
> > +	 * At first, check physical page layout and skip memory holes.
> > +	 */
> > +	ret = walk_system_ram_range(base, end - base, &blockinfo,
> > +		__get_contig_block);
> 
> We need #include <linux/ioport.h>
> 

ok.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/4] alloc_contig_pages() allocate big chunk memory using migration
  2010-11-21 15:25     ` Minchan Kim
@ 2010-11-22  0:13       ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 44+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-11-22  0:13 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Bob Liu, fujita.tomonori, m.nazarewicz,
	pawel, andi.kleen, felipe.contreras, akpm, kosaki.motohiro

On Mon, 22 Nov 2010 00:25:56 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:

> On Fri, Nov 19, 2010 at 05:15:28PM +0900, KAMEZAWA Hiroyuki wrote:
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > 
> > Add an function to allocate contiguous memory larger than MAX_ORDER.
> > The main difference between usual page allocator is that this uses
> > memory offline technique (Isolate pages and migrate remaining pages.).
> > 
> > I think this is not 100% solution because we can't avoid fragmentation,
> > but we have kernelcore= boot option and can create MOVABLE zone. That
> > helps us to allow allocate a contiguous range on demand.
> > 
> > The new function is
> > 
> >   alloc_contig_pages(base, end, nr_pages, alignment)
> > 
> > This function will allocate contiguous pages of nr_pages from the range
> > [base, end). If [base, end) is bigger than nr_pages, some pfn which
> > meats alignment will be allocated. If alignment is smaller than MAX_ORDER,
> > it will be raised to be MAX_ORDER.
> > 
> > __alloc_contig_pages() has much more arguments.
> > 
> > 
> > Some drivers allocates contig pages by bootmem or hiding some memory
> > from the kernel at boot. But if contig pages are necessary only in some
> > situation, kernelcore= boot option and using page migration is a choice.
> > 
> > Changelog: 2010-11-19
> >  - removed no_search
> >  - removed some drain_ functions because they are heavy.
> >  - check -ENOMEM case
> > 
> > Changelog: 2010-10-26
> >  - support gfp_t
> >  - support zonelist/nodemask
> >  - support [base, end) 
> >  - support alignment
> > 
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Acked-by: Minchan Kim <minchan.kim@gmail.com>
> 
> Trivial comment below. 
> 
> > +EXPORT_SYMBOL_GPL(alloc_contig_pages);
> > +
> > +struct page *alloc_contig_pages_host(unsigned long nr_pages, int align_order)
> > +{
> > +	return __alloc_contig_pages(0, max_pfn, nr_pages, align_order, -1,
> > +				GFP_KERNEL | __GFP_MOVABLE, NULL);
> > +}
> 
> We need include #include <linux/bootmem.h> for using max_pfn. 
> 

will add that.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] big chunk memory allocator v4
  2010-11-19 20:56   ` Andrew Morton
@ 2010-11-22  0:30     ` Felipe Contreras
  -1 siblings, 0 replies; 44+ messages in thread
From: Felipe Contreras @ 2010-11-22  0:30 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KAMEZAWA Hiroyuki, linux-mm, linux-kernel, minchan.kim, Bob Liu,
	fujita.tomonori, m.nazarewicz, pawel, andi.kleen,
	kosaki.motohiro

On Fri, Nov 19, 2010 at 10:56 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Fri, 19 Nov 2010 17:10:33 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
>> Hi, this is an updated version.
>>
>> No major changes from the last one except for page allocation function.
>> removed RFC.
>>
>> Order of patches is
>>
>> [1/4] move some functions from memory_hotplug.c to page_isolation.c
>> [2/4] search physically contiguous range suitable for big chunk alloc.
>> [3/4] allocate big chunk memory based on memory hotplug(migration) technique
>> [4/4] modify page allocation function.
>>
>> For what:
>>
>>   I hear there is requirements to allocate a chunk of page which is larger than
>>   MAX_ORDER. Now, some (embeded) device use a big memory chunk. To use memory,
>>   they hide some memory range by boot option (mem=) and use hidden memory
>>   for its own purpose. But this seems a lack of feature in memory management.

Actually, that's not needed any more now that memblock can be used:
http://article.gmane.org/gmane.linux.ports.arm.omap/44978

>>   This patch adds
>>       alloc_contig_pages(start, end, nr_pages, gfp_mask)
>>   to allocate a chunk of page whose length is nr_pages from [start, end)
>>   phys address. This uses similar logic of memory-unplug, which tries to
>>   offline [start, end) pages. By this, drivers can allocate 30M or 128M or
>>   much bigger memory chunk on demand. (I allocated 1G chunk in my test).
>>
>>   But yes, because of fragmentation, this cannot guarantee 100% alloc.
>>   If alloc_contig_pages() is called in system boot up or movable_zone is used,
>>   this allocation succeeds at high rate.
>
> So this is an alternatve implementation for the functionality offered
> by Michal's "The Contiguous Memory Allocator framework".
>
>>   I tested this on x86-64, and it seems to work as expected. But feedback from
>>   embeded guys are appreciated because I think they are main user of this
>>   function.
>
> From where I sit, feedback from the embedded guys is *vital*, because
> they are indeed the main users.
>
> Michal, I haven't made a note of all the people who are interested in
> and who are potential users of this code.  Your patch series has a
> billion cc's and is up to version 6.  Could I ask that you review and
> test this code, and also hunt down other people (probably at other
> organisations) who can do likewise for us?  Because until we hear from
> those people that this work satisfies their needs, we can't really
> proceed much further.

As I've explained before, a contiguous memory allocator would be nice,
but on ARM many drivers need not only contiguous memory but also
non-cacheable memory, and this requires removing the memory from the
normal kernel mapping in early boot.
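
A minimal sketch of that early-boot carve-out done with memblock rather
than mem=; the hook name, base address and size here are invented for
illustration, and the real placement is platform specific:

  #include <linux/init.h>
  #include <linux/memblock.h>

  #define MY_DEV_MEM_BASE   0x9e000000UL          /* hypothetical base */
  #define MY_DEV_MEM_SIZE   (32UL << 20)          /* 32 MiB, example only */

  static void __init my_board_reserve_memory(void)
  {
          /*
           * Take the range out of the kernel's view of memory entirely,
           * so it is never put into the normal (cacheable) linear mapping;
           * the driver later ioremaps it with the attributes it needs.
           */
          memblock_remove(MY_DEV_MEM_BASE, MY_DEV_MEM_SIZE);
  }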

Cheers.

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH 0/4] big chunk memory allocator v4
  2010-11-19 20:56   ` Andrew Morton
@ 2010-11-22  8:59     ` Kleen, Andi
  -1 siblings, 0 replies; 44+ messages in thread
From: Kleen, Andi @ 2010-11-22  8:59 UTC (permalink / raw)
  To: Andrew Morton, KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, minchan.kim, Bob Liu, fujita.tomonori,
	m.nazarewicz, pawel, felipe.contreras, kosaki.motohiro

> >   But yes, because of fragmentation, this cannot guarantee 100%
> alloc.
> >   If alloc_contig_pages() is called in system boot up or movable_zone
> is used,
> >   this allocation succeeds at high rate.
> 
> So this is an alternatve implementation for the functionality offered
> by Michal's "The Contiguous Memory Allocator framework".

I see them more as orthogonal: Michal's code relies on preallocation
and manages the memory after that.

This code supplies the infrastructure to replace preallocation
with just using movable zones.

-Andi
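
For concreteness, the movable zone mentioned here is normally created with
the kernelcore=/movablecore= boot parameters; the sizes below are examples
only:

  movablecore=512M    (reserve roughly 512 MB as ZONE_MOVABLE)
  kernelcore=1536M    (on a 2 GB machine, roughly the same effect)

Since everything in ZONE_MOVABLE is migratable, alloc_contig_pages(), which
defaults to GFP_KERNEL | __GFP_MOVABLE in this series, can then satisfy
large requests from that zone with a much better chance of success.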



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/4] alloc_contig_pages() find appropriate physical memory range
  2010-11-19  8:14   ` KAMEZAWA Hiroyuki
@ 2010-11-22 11:20     ` Minchan Kim
  -1 siblings, 0 replies; 44+ messages in thread
From: Minchan Kim @ 2010-11-22 11:20 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, Bob Liu, fujita.tomonori, m.nazarewicz,
	pawel, andi.kleen, felipe.contreras, akpm, kosaki.motohiro

On Fri, Nov 19, 2010 at 5:14 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> Unlike memory hotplug, at an allocation of contigous memory range, address
> may not be a problem. IOW, if a requester of memory wants to allocate 100M of
> of contigous memory, placement of allocated memory may not be a problem.
> So, "finding a range of memory which seems to be MOVABLE" is required.
>
> This patch adds a functon to isolate a length of memory within [start, end).
> This function returns a pfn which is 1st page of isolated contigous chunk
> of given length within [start, end).
>
> If no_search=true is passed as argument, start address is always same to
> the specified "base" addresss.
>
> After isolation, free memory within this area will never be allocated.
> But some pages will remain as "Used/LRU" pages. They should be dropped by
> page reclaim or migration.
>
> Changelog: 2010-11-17
>  - fixed some conding style (if-then-else)
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  mm/page_isolation.c |  146 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 146 insertions(+)
>
> Index: mmotm-1117/mm/page_isolation.c
> ===================================================================
> --- mmotm-1117.orig/mm/page_isolation.c
> +++ mmotm-1117/mm/page_isolation.c
> @@ -7,6 +7,7 @@
>  #include <linux/pageblock-flags.h>
>  #include <linux/memcontrol.h>
>  #include <linux/migrate.h>
> +#include <linux/memory_hotplug.h>
>  #include <linux/mm_inline.h>
>  #include "internal.h"
>
> @@ -250,3 +251,148 @@ int do_migrate_range(unsigned long start
>  out:
>        return ret;
>  }
> +
> +/*
> + * Functions for getting contiguous MOVABLE pages in a zone.
> + */
> +struct page_range {
> +       unsigned long base; /* Base address of searching contigouous block */
> +       unsigned long end;
> +       unsigned long pages;/* Length of contiguous block */

Nitpick.
You used nr_pages in other places.
I hope you keep the naming consistent.

> +       int align_order;
> +       unsigned long align_mask;

Do we really need this field 'align_mask'?
We can always derive it from align_order.

> +};
> +
> +int __get_contig_block(unsigned long pfn, unsigned long nr_pages, void *arg)
> +{
> +       struct page_range *blockinfo = arg;
> +       unsigned long end;
> +
> +       end = pfn + nr_pages;
> +       pfn = ALIGN(pfn, 1 << blockinfo->align_order);
> +       end = end & ~(MAX_ORDER_NR_PAGES - 1);
> +
> +       if (end < pfn)
> +               return 0;
> +       if (end - pfn >= blockinfo->pages) {
> +               blockinfo->base = pfn;
> +               blockinfo->end = end;
> +               return 1;
> +       }
> +       return 0;
> +}
> +
> +static void __trim_zone(struct zone *zone, struct page_range *range)
> +{
> +       unsigned long pfn;
> +       /*
> +        * skip pages which dones'nt under the zone.

typo dones'nt -> doesn't :)

> +        * There are some archs which zones are not in linear layout.
> +        */
> +       if (page_zone(pfn_to_page(range->base)) != zone) {
> +               for (pfn = range->base;
> +                       pfn < range->end;
> +                       pfn += MAX_ORDER_NR_PAGES) {
> +                       if (page_zone(pfn_to_page(pfn)) == zone)
> +                               break;
> +               }
> +               range->base = min(pfn, range->end);
> +       }
> +       /* Here, range-> base is in the zone if range->base != range->end */
> +       for (pfn = range->base;
> +            pfn < range->end;
> +            pfn += MAX_ORDER_NR_PAGES) {
> +               if (zone != page_zone(pfn_to_page(pfn))) {
> +                       pfn = pfn - MAX_ORDER_NR_PAGES;
> +                       break;
> +               }
> +       }
> +       range->end = min(pfn, range->end);
> +       return;

Remove return

> +}
> +
> +/*
> + * This function is for finding a contiguous memory block which has length
> + * of pages and MOVABLE. If it finds, make the range of pages as ISOLATED
> + * and return the first page's pfn.
> + * This checks all pages in the returned range is free of Pg_LRU. To reduce
> + * the risk of false-positive testing, lru_add_drain_all() should be called
> + * before this function to reduce pages on pagevec for zones.
> + */
> +
> +static unsigned long find_contig_block(unsigned long base,
> +               unsigned long end, unsigned long pages,
> +               int align_order, struct zone *zone)
> +{
> +       unsigned long pfn, pos;
> +       struct page_range blockinfo;
> +       int ret;
> +
> +       VM_BUG_ON(pages & (MAX_ORDER_NR_PAGES - 1));
> +       VM_BUG_ON(base & ((1 << align_order) - 1));
> +retry:
> +       blockinfo.base = base;
> +       blockinfo.end = end;
> +       blockinfo.pages = pages;
> +       blockinfo.align_order = align_order;
> +       blockinfo.align_mask = (1 << align_order) - 1;

We don't need this.

> +       /*
> +        * At first, check physical page layout and skip memory holes.
> +        */
> +       ret = walk_system_ram_range(base, end - base, &blockinfo,
> +               __get_contig_block);
> +       if (!ret)
> +               return 0;
> +       /* check contiguous pages in a zone */
> +       __trim_zone(zone, &blockinfo);
> +
> +       /*
> +        * Ok, we found contiguous memory chunk of size. Isolate it.
> +        * We just search MAX_ORDER aligned range.
> +        */
> +       for (pfn = blockinfo.base; pfn + pages <= blockinfo.end;
> +            pfn += (1 << align_order)) {
> +               struct zone *z = page_zone(pfn_to_page(pfn));
> +               if (z != zone)
> +                       continue;

Could we make sure that, after __trim_zone, every pfn in the range
belongs to the zone we want?
Repeating the zone check here is rather annoying.
I mean, let __get_contig_block or __trim_zone do the zone check
so that we can remove it here.

> +
> +               spin_lock_irq(&z->lock);
> +               pos = pfn;
> +               /*
> +                * Check the range only contains free pages or LRU pages.
> +                */
> +               while (pos < pfn + pages) {
> +                       struct page *p;
> +
> +                       if (!pfn_valid_within(pos))
> +                               break;
> +                       p = pfn_to_page(pos);
> +                       if (PageReserved(p))
> +                               break;
> +                       if (!page_count(p)) {
> +                               if (!PageBuddy(p))
> +                                       pos++;
> +                               else
> +                                       pos += (1 << page_order(p));
> +                       } else if (PageLRU(p)) {

Could we check get_pageblock_migratetype(page) == MIGRATE_MOVABLE here
and bail out early?
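
A rough sketch of what that check could look like, purely illustrative and
not part of the posted patch (exact placement inside the loop is up to the
author):

  /* suggested early bail-out: skip pageblocks that are not MOVABLE */
  if (get_pageblock_migratetype(p) != MIGRATE_MOVABLE)
          break;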

> +                               pos++;
> +                       } else
> +                               break;
> +               }
> +               spin_unlock_irq(&z->lock);
> +               if ((pos == pfn + pages)) {
> +                       if (!start_isolate_page_range(pfn, pfn + pages))
> +                               return pfn;
> +               } else/* the chunk including "pos" should be skipped */
> +                       pfn = pos & ~((1 << align_order) - 1);
> +               cond_resched();
> +       }
> +
> +       /* failed */
> +       if (blockinfo.end + pages <= end) {
> +               /* Move base address and find the next block of RAM. */
> +               base = blockinfo.end;
> +               goto retry;
> +       }
> +       return 0;

If base is 0, couldn't a successful result also be pfn 0, making the
return value ambiguous?  On x86 with FLATMEM that can't happen, but I
think it might be possible on some architectures.  Just guessing.

How about returning a negative value on failure and passing the first
and last page pfns back as out parameters (base, end)?
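
A sketch of the calling convention suggested above; the name, error code
and parameter layout are only placeholders:

  /*
   * Return 0 on success and hand the first pfn of the isolated range back
   * through *startp; return a negative errno (e.g. -ENOENT) on failure,
   * so that pfn 0 is no longer overloaded to mean "failure".
   */
  static int find_contig_block(unsigned long *startp,
                               unsigned long base, unsigned long end,
                               unsigned long pages, int align_order,
                               struct zone *zone);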

> +}
>
>



-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/4] alloc_contig_pages() allocate big chunk memory using migration
  2010-11-19  8:15   ` KAMEZAWA Hiroyuki
@ 2010-11-22 11:44     ` Minchan Kim
  -1 siblings, 0 replies; 44+ messages in thread
From: Minchan Kim @ 2010-11-22 11:44 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, Bob Liu, fujita.tomonori, m.nazarewicz,
	pawel, andi.kleen, felipe.contreras, akpm, kosaki.motohiro

On Fri, Nov 19, 2010 at 5:15 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> Add an function to allocate contiguous memory larger than MAX_ORDER.
> The main difference between usual page allocator is that this uses
> memory offline technique (Isolate pages and migrate remaining pages.).
>
> I think this is not 100% solution because we can't avoid fragmentation,
> but we have kernelcore= boot option and can create MOVABLE zone. That
> helps us to allow allocate a contiguous range on demand.

And later we can use compaction and reclaim, too.
So I think this approach is the way we have to go.

>
> The new function is
>
>  alloc_contig_pages(base, end, nr_pages, alignment)
>
> This function will allocate contiguous pages of nr_pages from the range
> [base, end). If [base, end) is bigger than nr_pages, some pfn which
> meats alignment will be allocated. If alignment is smaller than MAX_ORDER,

typo: meats -> meet

> it will be raised to be MAX_ORDER.
>
> __alloc_contig_pages() has much more arguments.
>
>
> Some drivers allocates contig pages by bootmem or hiding some memory
> from the kernel at boot. But if contig pages are necessary only in some
> situation, kernelcore= boot option and using page migration is a choice.
>
> Changelog: 2010-11-19
>  - removed no_search
>  - removed some drain_ functions because they are heavy.
>  - check -ENOMEM case
>
> Changelog: 2010-10-26
>  - support gfp_t
>  - support zonelist/nodemask
>  - support [base, end)
>  - support alignment
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  include/linux/page-isolation.h |   15 ++
>  mm/page_alloc.c                |   29 ++++
>  mm/page_isolation.c            |  242 +++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 286 insertions(+)
>
> Index: mmotm-1117/mm/page_isolation.c
> ===================================================================
> --- mmotm-1117.orig/mm/page_isolation.c
> +++ mmotm-1117/mm/page_isolation.c
> @@ -5,6 +5,7 @@
>  #include <linux/mm.h>
>  #include <linux/page-isolation.h>
>  #include <linux/pageblock-flags.h>
> +#include <linux/swap.h>
>  #include <linux/memcontrol.h>
>  #include <linux/migrate.h>
>  #include <linux/memory_hotplug.h>
> @@ -396,3 +397,244 @@ retry:
>        }
>        return 0;
>  }
> +
> +/*
> + * Comparing caller specified [user_start, user_end) with physical memory layout
> + * [phys_start, phys_end). If no intersection is longer than nr_pages, return 1.
> + * If there is an intersection, return 0 and fill range in [*start, *end)

I understand the goal of the function,
but the comment is rather awkward.

> + */
> +static int
> +__calc_search_range(unsigned long user_start, unsigned long user_end,

Personally, I don't like the function name.
How about "__adjust_search_range"?
But I'm not strongly against the current name. :)

> +               unsigned long nr_pages,
> +               unsigned long phys_start, unsigned long phys_end,
> +               unsigned long *start, unsigned long *end)
> +{
> +       if ((user_start >= phys_end) || (user_end <= phys_start))
> +               return 1;
> +       if (user_start <= phys_start) {
> +               *start = phys_start;
> +               *end = min(user_end, phys_end);
> +       } else {
> +               *start = user_start;
> +               *end = min(user_end, phys_end);
> +       }
> +       if (*end - *start < nr_pages)
> +               return 1;
> +       return 0;
> +}
> +
> +
> +/**
> + * __alloc_contig_pages - allocate a contiguous physical pages
> + * @base: the lowest pfn which caller wants.
> + * @end:  the highest pfn which caller wants.
> + * @nr_pages: the length of a chunk of pages to be allocated.

the number of pages to be allocated.

> + * @align_order: alignment of start address of returned chunk in order.
> + *   Returned' page's order will be aligned to (1 << align_order).If smaller
> + *   than MAX_ORDER, it's raised to MAX_ORDER.
> + * @node: allocate near memory to the node, If -1, current node is used.
> + * @gfpflag: used to specify what zone the memory should be from.
> + * @nodemask: allocate memory within the nodemask.
> + *
> + * Search a memory range [base, end) and allocates physically contiguous
> + * pages. If end - base is larger than nr_pages, a chunk in [base, end) will
> + * be allocated
> + *
> + * This returns a page of the beginning of contiguous block. At failure, NULL
> + * is returned.
> + *
> + * Limitation: at allocation, nr_pages may be increased to be aligned to
> + * MAX_ORDER before searching a range. So, even if there is a enough chunk
> + * for nr_pages, it may not be able to be allocated. Extra tail pages of
> + * allocated chunk is returned to buddy allocator before returning the caller.
> + */
> +
> +#define MIGRATION_RETRY        (5)
> +struct page *__alloc_contig_pages(unsigned long base, unsigned long end,
> +                       unsigned long nr_pages, int align_order,
> +                       int node, gfp_t gfpflag, nodemask_t *mask)
> +{
> +       unsigned long found, aligned_pages, start;
> +       struct page *ret = NULL;
> +       int migration_failed;
> +       unsigned long align_mask;
> +       struct zoneref *z;
> +       struct zone *zone;
> +       struct zonelist *zonelist;
> +       enum zone_type highzone_idx = gfp_zone(gfpflag);
> +       unsigned long zone_start, zone_end, rs, re, pos;
> +
> +       if (node == -1)
> +               node = numa_node_id();
> +
> +       /* check unsupported flags */
> +       if (gfpflag & __GFP_NORETRY)
> +               return NULL;
> +       if ((gfpflag & (__GFP_WAIT | __GFP_IO | __GFP_FS)) !=
> +               (__GFP_WAIT | __GFP_IO | __GFP_FS))
> +               return NULL;

Why do we have to care about __GFP_IO|__GFP_FS?
If you are considering compaction/reclaim later, I'm OK with it.

> +
> +       if (gfpflag & __GFP_THISNODE)
> +               zonelist = &NODE_DATA(node)->node_zonelists[1];
> +       else
> +               zonelist = &NODE_DATA(node)->node_zonelists[0];
> +       /*
> +        * Base/nr_page/end should be aligned to MAX_ORDER
> +        */
> +       found = 0;
> +
> +       if (align_order < MAX_ORDER)
> +               align_order = MAX_ORDER;
> +
> +       align_mask = (1 << align_order) - 1;
> +       /*
> +        * We allocates MAX_ORDER aligned pages and cut tail pages later.
> +        */
> +       aligned_pages = ALIGN(nr_pages, (1 << MAX_ORDER));
> +       /*
> +        * If end - base == nr_pages, we can't search range. base must be
> +        * aligned.
> +        */
> +       if ((end - base == nr_pages) && (base & align_mask))
> +               return NULL;
> +
> +       base = ALIGN(base, (1 << align_order));
> +       if ((end <= base) || (end - base < aligned_pages))
> +               return NULL;
> +
> +       /*
> +        * searching contig memory range within [pos, end).
> +        * pos is updated at migration failure to find next chunk in zone.
> +        * pos is reset to the base at searching next zone.
> +        * (see for_each_zone_zonelist_nodemask in mmzone.h)
> +        *
> +        * Note: we cannot assume zones/nodes are in linear memory layout.
> +        */
> +       z = first_zones_zonelist(zonelist, highzone_idx, mask, &zone);
> +       pos = base;
> +retry:
> +       if (!zone)
> +               return NULL;
> +
> +       zone_start = ALIGN(zone->zone_start_pfn, 1 << align_order);
> +       zone_end = zone->zone_start_pfn + zone->spanned_pages;
> +
> +       /* check [pos, end) is in this zone. */
> +       if ((pos >= end) ||
> +            (__calc_search_range(pos, end, aligned_pages,
> +                       zone_start, zone_end, &rs, &re))) {
> +next_zone:
> +               /* go to the next zone */
> +               z = next_zones_zonelist(++z, highzone_idx, mask, &zone);
> +               /* reset the pos */
> +               pos = base;
> +               goto retry;
> +       }
> +       /* [pos, end) is trimmed to [rs, re) in this zone. */
> +       pos = rs;

This 'pos' isn't used again below.

> +
> +       found = find_contig_block(rs, re, aligned_pages, align_order, zone);
> +       if (!found)
> +               goto next_zone;
> +
> +       /*
> +        * Because we isolated the range, free pages in the range will never
> +        * be (re)allocated. scan_lru_pages() finds the next PG_lru page in
> +        * the range and returns 0 if it reaches the end.
> +        */
> +       migration_failed = 0;
> +       rs = found;
> +       re = found + aligned_pages;
> +       for (rs = scan_lru_pages(rs, re);
> +            rs && rs < re;
> +            rs = scan_lru_pages(rs, re)) {
> +               int rc = do_migrate_range(rs, re);
> +               if (!rc)
> +                       migration_failed = 0;
> +               else {
> +                       /* it's better to try another block ? */
> +                       if (++migration_failed >= MIGRATION_RETRY)
> +                               break;
> +                       if (rc == -EBUSY) {
> +                               /* There are unstable pages.on pagevec. */
> +                               lru_add_drain_all();
> +                               /*
> +                                * there may be pages on pcplist before
> +                                * we mark the range as ISOLATED.
> +                                */
> +                               drain_all_pages();
> +                       } else if (rc == -ENOMEM)
> +                               goto nomem;
> +               }
> +               cond_resched();
> +       }
> +       if (!migration_failed) {
> +               /* drop all pages in pagevec and pcp list */
> +               lru_add_drain_all();
> +               drain_all_pages();
> +       }
> +       /* Check all pages are isolated */
> +       if (test_pages_isolated(found, found + aligned_pages)) {
> +               undo_isolate_page_range(found, aligned_pages);
> +               /*
> +                * We failed at [found...found+aligned_pages) migration.
> +                * "rs" is the last pfn scan_lru_pages() found that the page
> +                * is LRU page. Update pos and try next chunk.
> +                */
> +               pos = ALIGN(rs + 1, (1 << align_order));
> +               goto retry; /* goto next chunk */
> +       }
> +       /*
> +        * OK, here, [found...found+pages) memory are isolated.
> +        * All pages in the range will be moved into the list with
> +        * page_count(page)=1.
> +        */
> +       ret = pfn_to_page(found);
> +       alloc_contig_freed_pages(found, found + aligned_pages, gfpflag);
> +       /* unset ISOLATE */
> +       undo_isolate_page_range(found, aligned_pages);
> +       /* Free unnecessary pages in tail */
> +       for (start = found + nr_pages; start < found + aligned_pages; start++)
> +               __free_page(pfn_to_page(start));
> +       return ret;
> +nomem:
> +       undo_isolate_page_range(found, aligned_pages);
> +       return NULL;
> +}
> +EXPORT_SYMBOL_GPL(__alloc_contig_pages);
> +
> +void free_contig_pages(struct page *page, int nr_pages)
> +{
> +       int i;
> +       for (i = 0; i < nr_pages; i++)
> +               __free_page(page + i);
> +}
> +EXPORT_SYMBOL_GPL(free_contig_pages);
> +
> +/*
> + * Allocated pages will not be MOVABLE but MOVABLE zone is a suitable
> + * for allocating big chunk. So, using ZONE_MOVABLE is a default.
> + */
> +
> +struct page *alloc_contig_pages(unsigned long base, unsigned long end,
> +                       unsigned long nr_pages, int align_order)
> +{
> +       return __alloc_contig_pages(base, end, nr_pages, align_order, -1,
> +                               GFP_KERNEL | __GFP_MOVABLE, NULL);
> +}
> +EXPORT_SYMBOL_GPL(alloc_contig_pages);
> +
> +struct page *alloc_contig_pages_host(unsigned long nr_pages, int align_order)
> +{
> +       return __alloc_contig_pages(0, max_pfn, nr_pages, align_order, -1,
> +                               GFP_KERNEL | __GFP_MOVABLE, NULL);
> +}
> +EXPORT_SYMBOL_GPL(alloc_contig_pages_host);
> +
> +struct page *alloc_contig_pages_node(int nid, unsigned long nr_pages,
> +                               int align_order)
> +{
> +       return __alloc_contig_pages(0, max_pfn, nr_pages, align_order, nid,
> +                       GFP_KERNEL | __GFP_THISNODE | __GFP_MOVABLE, NULL);
> +}
> +EXPORT_SYMBOL_GPL(alloc_contig_pages_node);
> Index: mmotm-1117/include/linux/page-isolation.h
> ===================================================================
> --- mmotm-1117.orig/include/linux/page-isolation.h
> +++ mmotm-1117/include/linux/page-isolation.h
> @@ -32,6 +32,8 @@ test_pages_isolated(unsigned long start_
>  */
>  extern int set_migratetype_isolate(struct page *page);
>  extern void unset_migratetype_isolate(struct page *page);
> +extern void alloc_contig_freed_pages(unsigned long pfn,
> +               unsigned long pages, gfp_t flag);
>
>  /*
>  * For migration.
> @@ -41,4 +43,17 @@ int test_pages_in_a_zone(unsigned long s
>  unsigned long scan_lru_pages(unsigned long start, unsigned long end);
>  int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
>
> +/*
> + * For large alloc.
> + */
> +struct page *__alloc_contig_pages(unsigned long base, unsigned long end,
> +                               unsigned long nr_pages, int align_order,
> +                               int node, gfp_t flag, nodemask_t *mask);
> +struct page *alloc_contig_pages(unsigned long base, unsigned long end,
> +                               unsigned long nr_pages, int align_order);
> +struct page *alloc_contig_pages_host(unsigned long nr_pages, int align_order);
> +struct page *alloc_contig_pages_node(int nid, unsigned long nr_pages,
> +               int align_order);
> +void free_contig_pages(struct page *page, int nr_pages);
> +
>  #endif
> Index: mmotm-1117/mm/page_alloc.c
> ===================================================================
> --- mmotm-1117.orig/mm/page_alloc.c
> +++ mmotm-1117/mm/page_alloc.c
> @@ -5447,6 +5447,35 @@ out:
>        spin_unlock_irqrestore(&zone->lock, flags);
>  }
>
> +
> +void alloc_contig_freed_pages(unsigned long pfn,  unsigned long end, gfp_t flag)
> +{
> +       struct page *page;
> +       struct zone *zone;
> +       int order;
> +       unsigned long start = pfn;
> +
> +       zone = page_zone(pfn_to_page(pfn));
> +       spin_lock_irq(&zone->lock);
> +       while (pfn < end) {
> +               VM_BUG_ON(!pfn_valid(pfn));
> +               page = pfn_to_page(pfn);
> +               VM_BUG_ON(page_count(page));
> +               VM_BUG_ON(!PageBuddy(page));
> +               list_del(&page->lru);
> +               order = page_order(page);
> +               zone->free_area[order].nr_free--;
> +               rmv_page_order(page);
> +               __mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
> +               pfn += 1 << order;
> +       }
> +       spin_unlock_irq(&zone->lock);
> +
> +       /*After this, pages in the range can be freed one be one */
> +       for (pfn = start; pfn < end; pfn++)
> +               prep_new_page(pfn_to_page(pfn), 0, flag);
> +}
> +
>  #ifdef CONFIG_MEMORY_HOTREMOVE
>  /*
>  * All pages in the range must be isolated before calling this.
>
>



-- 
Kind regards,
Minchan Kim
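
To make the proposed interface concrete, a sketch of how a driver might use
it; the buffer size, the use of alloc_contig_pages_host() and the error
handling are illustrative only, with the prototypes taken from the quoted
patch:

  #include <linux/errno.h>
  #include <linux/mm.h>
  #include <linux/page-isolation.h>

  #define MY_BUF_PAGES   ((32UL << 20) >> PAGE_SHIFT)  /* 32 MiB of pages */

  static struct page *my_buf;

  static int my_driver_alloc_buffer(void)
  {
          /* search all RAM; alignment below MAX_ORDER is raised to MAX_ORDER */
          my_buf = alloc_contig_pages_host(MY_BUF_PAGES, 0);
          if (!my_buf)
                  return -ENOMEM;   /* fragmentation: allocation may fail */
          /* hand page_to_phys(my_buf) to the device as its DMA window */
          return 0;
  }

  static void my_driver_free_buffer(void)
  {
          free_contig_pages(my_buf, MY_BUF_PAGES);
  }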

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 4/4] alloc_contig_pages() use better allocation function for migration
  2010-11-19  8:16   ` KAMEZAWA Hiroyuki
@ 2010-11-22 12:01     ` Minchan Kim
  -1 siblings, 0 replies; 44+ messages in thread
From: Minchan Kim @ 2010-11-22 12:01 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, Bob Liu, fujita.tomonori, m.nazarewicz,
	pawel, andi.kleen, felipe.contreras, akpm, kosaki.motohiro

On Fri, Nov 19, 2010 at 5:16 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> Old story.
> Because we cannot assume which memory section will be offlined next,
> hotremove_migrate_alloc() just uses alloc_page(). i.e. make no decision
> where the page should be migrate into. Considering memory hotplug's
> nature, the next memory section near to a section which is being removed
> will be removed in the next. So, migrate pages to the same node of original
> page doesn't make sense in many case, it just increases load.
> Migration destination page is allocated from the node where offlining script
> runs.
>
> Now, contiguous-alloc uses do_migrate_range(). In this case, migration
> destination node should be the same node of migration source page.
>
> This patch modifies hotremove_migrate_alloc() and pass "nid" to it.
> Memory hotremove will pass -1. So, if the page will be moved to
> the node where offlining script runs....no behavior changes.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>


-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] big chunk memory allocator v4
  2010-11-22  8:59     ` Kleen, Andi
@ 2010-11-23 15:44       ` Michał Nazarewicz
  -1 siblings, 0 replies; 44+ messages in thread
From: Michał Nazarewicz @ 2010-11-23 15:44 UTC (permalink / raw)
  To: Andrew Morton, KAMEZAWA Hiroyuki, Kleen, Andi
  Cc: linux-mm, linux-kernel, minchan.kim, Bob Liu, fujita.tomonori,
	pawel, felipe.contreras, kosaki.motohiro

On Mon, 22 Nov 2010 09:59:57 +0100, Kleen, Andi <andi.kleen@intel.com> wrote:

>> >   But yes, because of fragmentation, this cannot guarantee 100%
>> alloc.
>> >   If alloc_contig_pages() is called in system boot up or movable_zone
>> is used,
>> >   this allocation succeeds at high rate.
>>
>> So this is an alternatve implementation for the functionality offered
>> by Michal's "The Contiguous Memory Allocator framework".
>
> I see them more as orthogonal: Michal's code relies on preallocation
> and manages the memory after that.

Yes and no.  The v6 version adds not-yet-finished support for sharing
the preallocated blocks with the page allocator (so if CMA is not using the
memory, the page allocator can allocate it, and when CMA finally wants to
use it, the allocated pages are migrated).

In the v6 implementation I have added a new migration type (I cannot seem
to find who proposed such an approach first).  When I finish debugging the
code I'll try to work things out without adding an additional entity (that
is, the new migration type).

-- 
Best regards,                                        _     _
| Humble Liege of Serenely Enlightened Majesty of  o' \,=./ `o
| Computer Science,  Michał "mina86" Nazarewicz       (o o)
+----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] big chunk memory allocator v4
  2010-11-22  0:04     ` KAMEZAWA Hiroyuki
@ 2010-11-23 15:46       ` Michał Nazarewicz
  -1 siblings, 0 replies; 44+ messages in thread
From: Michał Nazarewicz @ 2010-11-23 15:46 UTC (permalink / raw)
  To: Andrew Morton, KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, minchan.kim, Bob Liu, fujita.tomonori,
	pawel, andi.kleen, felipe.contreras, kosaki.motohiro,
	Marek Szyprowski

On Mon, 22 Nov 2010 01:04:31 +0100, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Fri, 19 Nov 2010 12:56:53 -0800
> Andrew Morton <akpm@linux-foundation.org> wrote:
>
>> On Fri, 19 Nov 2010 17:10:33 +0900
>> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>>
>> > Hi, this is an updated version.
>> >
>> > No major changes from the last one except for page allocation function.
>> > removed RFC.
>> >
>> > Order of patches is
>> >
>> > [1/4] move some functions from memory_hotplug.c to page_isolation.c
>> > [2/4] search physically contiguous range suitable for big chunk alloc.
>> > [3/4] allocate big chunk memory based on memory hotplug(migration) technique
>> > [4/4] modify page allocation function.
>> >
>> > For what:
>> >
>> >   I hear there is requirements to allocate a chunk of page which is larger than
>> >   MAX_ORDER. Now, some (embeded) device use a big memory chunk. To use memory,
>> >   they hide some memory range by boot option (mem=) and use hidden memory
>> >   for its own purpose. But this seems a lack of feature in memory management.
>> >
>> >   This patch adds
>> > 	alloc_contig_pages(start, end, nr_pages, gfp_mask)
>> >   to allocate a chunk of page whose length is nr_pages from [start, end)
>> >   phys address. This uses similar logic of memory-unplug, which tries to
>> >   offline [start, end) pages. By this, drivers can allocate 30M or 128M or
>> >   much bigger memory chunk on demand. (I allocated 1G chunk in my test).
>> >
>> >   But yes, because of fragmentation, this cannot guarantee 100% alloc.
>> >   If alloc_contig_pages() is called in system boot up or movable_zone is used,
>> >   this allocation succeeds at high rate.
>>
>> So this is an alternatve implementation for the functionality offered
>> by Michal's "The Contiguous Memory Allocator framework".
>>
>
> Yes, this will be a backends for that kind of works.

As a matter of fact, CMA's v6 tries to use code "borrowed" from the
alloc_contig_pages() patches.

The most important difference is that alloc_contig_pages() would look for a chunk
of memory that can be allocated and then perform migration, whereas CMA assumes
that the regions it controls are always "migratable".

Also, I've tried to remove the requirement for MAX_ORDER alignment.

> I think there are two ways to allocate contiguous pages larger than MAX_ORDER.
>
> 1) hide some memory at boot and add another memory allocator.
> 2) support a range allocator as [start, end)
>
> This is a trial of 2). I used the memory-hotplug technique because I know it to some extent.
> This patch itself has no "map" and "management" function, so that should be
> developed in another patch (but maybe it will not be my work.)

Yes, this is also a valid point.  For my use cases, alloc_contig_pages() alone
would probably not be enough and would require some management code to be added
on top.

>> >   I tested this on x86-64, and it seems to work as expected. But feedback from
>> >   embeded guys are appreciated because I think they are main user of this
>> >   function.
>>
>> From where I sit, feedback from the embedded guys is *vital*, because
>> they are indeed the main users.
>>
>> Michal, I haven't made a note of all the people who are interested in
>> and who are potential users of this code.  Your patch series has a
>> billion cc's and is up to version 6.

Ah, yes...  I was thinking about shrinking the cc list but didn't want to
seem rude or anything by removing people who have shown interest in the
previously posted versions.

>> Could I ask that you review and
>> test this code, and also hunt down other people (probably at other
>> organisations) who can do likewise for us?  Because until we hear from
>> those people that this work satisfies their needs, we can't really
>> proceed much further.

A few things then:

1. As Felipe mentioned, on ARM it is often desired to have the memory
    mapped as non-cacheable, which most often means that the memory never
    reaches the page allocator.  This means that alloc_contig_pages()
    would not be suitable for cases where one needs such memory.

    Or could this be overcome by adding the memory back as highmem?  But
    then, it would force compiling in highmem support even if the platform
    does not really need it.

2. Device drivers should not by themselves know what ranges of memory to
    allocate memory from.  Moreover, some device drivers could require
    allocating different buffers from different ranges.  As such, this
    would require some management code on top of alloc_contig_pages().

3. When posting hwmem, Johan Mossberg mentioned that he'd like to see
    a notion of "pinning" chunks (so that not-pinned chunks can be moved
    around when hardware is not using them, to defragment memory).  This
    would again require some management code on top of
    alloc_contig_pages().

4. I might be mistaken here, but the way I understand ZONE_MOVABLE works
    is that it is cut off from the end of memory.  Or am I talking nonsense?
    My concern is that at least one chip I'm working with requires
    allocations from different memory banks, which would basically mean that
    there would have to be two movable zones, ie:

    +-------------------+-------------------+
    | Memory Bank #1    | Memory Bank #2    |
    +---------+---------+---------+---------+
    | normal  | movable | normal  | movable |
    +---------+---------+---------+---------+

So even though I'm personally somewhat drawn by alloc_contig_pages()'s
simplicity (compared to CMA at least), those quick thoughts make me think
that alloc_contig_pages() would work rather as a backend (as Kamezawa
mentioned) for some, maybe even tiny but still present, management code
which would handle "marking memory fragments as ZONE_MOVABLE" (whatever
that would involve) and deciding which memory ranges drivers can allocate
from.

I'm also wondering whether alloc_contig_pages()'s first-fit is suitable but
that probably cannot be judged without some benchmarks.
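
To make the "management code on top" idea a bit more concrete, here is a
deliberately tiny user-space sketch (everything in it, the structure, the
names, the numbers, is made up purely for illustration and is not proposed
code): a fixed table of per-bank ranges handed out first-fit to drivers,
with the actual population of each range left to an alloc_contig_pages()-like
backend.

#include <stdio.h>
#include <stddef.h>

/* One physical range the hypothetical manager may hand to a driver. */
struct contig_region {
        const char *owner;              /* driver name, NULL while unclaimed */
        unsigned long start_pfn;
        unsigned long nr_pages;
};

static struct contig_region table[] = {
        { NULL, 0x10000, 8192 },        /* movable part of bank #1 */
        { NULL, 0x50000, 8192 },        /* movable part of bank #2 */
};

/* First-fit: claim the first unowned region that is large enough. */
static struct contig_region *claim_region(const char *owner,
                                          unsigned long nr_pages)
{
        size_t i;

        for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
                if (!table[i].owner && table[i].nr_pages >= nr_pages) {
                        table[i].owner = owner;
                        return &table[i];
                }
        }
        return NULL;
}

int main(void)
{
        struct contig_region *r = claim_region("camera", 4096);

        if (r)
                printf("camera gets pfn %#lx (%lu pages reserved)\n",
                       r->start_pfn, r->nr_pages);
        return 0;
}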

-- 
Best regards,                                        _     _
| Humble Liege of Serenely Enlightened Majesty of  o' \,=./ `o
| Computer Science,  Michał "mina86" Nazarewicz       (o o)
+----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/4] alloc_contig_pages() find appropriate physical memory range
  2010-11-22 11:20     ` Minchan Kim
@ 2010-11-24  0:15       ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 44+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-11-24  0:15 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Bob Liu, fujita.tomonori, m.nazarewicz,
	pawel, andi.kleen, felipe.contreras, akpm, kosaki.motohiro

On Mon, 22 Nov 2010 20:20:14 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:

> On Fri, Nov 19, 2010 at 5:14 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> >
> > Unlike memory hotplug, at an allocation of contigous memory range, address
> > may not be a problem. IOW, if a requester of memory wants to allocate 100M of
> > of contigous memory, placement of allocated memory may not be a problem.
> > So, "finding a range of memory which seems to be MOVABLE" is required.
> >
> > This patch adds a functon to isolate a length of memory within [start, end).
> > This function returns a pfn which is 1st page of isolated contigous chunk
> > of given length within [start, end).
> >
> > If no_search=true is passed as argument, start address is always same to
> > the specified "base" addresss.
> >
> > After isolation, free memory within this area will never be allocated.
> > But some pages will remain as "Used/LRU" pages. They should be dropped by
> > page reclaim or migration.
> >
> > Changelog: 2010-11-17
> >  - fixed some conding style (if-then-else)
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> >  mm/page_isolation.c |  146 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 146 insertions(+)
> >
> > Index: mmotm-1117/mm/page_isolation.c
> > ===================================================================
> > --- mmotm-1117.orig/mm/page_isolation.c
> > +++ mmotm-1117/mm/page_isolation.c
> > @@ -7,6 +7,7 @@
> >  #include <linux/pageblock-flags.h>
> >  #include <linux/memcontrol.h>
> >  #include <linux/migrate.h>
> > +#include <linux/memory_hotplug.h>
> >  #include <linux/mm_inline.h>
> >  #include "internal.h"
> >
> > @@ -250,3 +251,148 @@ int do_migrate_range(unsigned long start
> >  out:
> >        return ret;
> >  }
> > +
> > +/*
> > + * Functions for getting contiguous MOVABLE pages in a zone.
> > + */
> > +struct page_range {
> > +       unsigned long base; /* Base address of searching contigouous block */
> > +       unsigned long end;
> > +       unsigned long pages;/* Length of contiguous block */
> 
> Nitpick.
> You used nr_pages in other places.
> I hope you use the name consistently.
> 
Sure, I'll fix it.

> > +       int align_order;
> > +       unsigned long align_mask;
> 
> Do we really need this field 'align_mask'?

No.

> We can always get it from align_order.
> 

So we would always write ((1 << align_order) - 1)?  Hmm.
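
If the field is dropped, a one-line helper could compute the mask on demand.
A small user-space sketch just to illustrate (the helper name is invented
here, it is not from the patch):

#include <stdio.h>

/* Derive the alignment mask from the order instead of caching it. */
static unsigned long align_mask_of(int align_order)
{
        return (1UL << align_order) - 1;
}

int main(void)
{
        unsigned long pfn = 12345, mask = align_mask_of(11);

        /* Round pfn up to a 2^11-page boundary, like ALIGN() does. */
        printf("mask=%#lx aligned=%#lx\n", mask, (pfn + mask) & ~mask);
        return 0;
}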


> > +};
> > +
> > +int __get_contig_block(unsigned long pfn, unsigned long nr_pages, void *arg)
> > +{
> > +       struct page_range *blockinfo = arg;
> > +       unsigned long end;
> > +
> > +       end = pfn + nr_pages;
> > +       pfn = ALIGN(pfn, 1 << blockinfo->align_order);
> > +       end = end & ~(MAX_ORDER_NR_PAGES - 1);
> > +
> > +       if (end < pfn)
> > +               return 0;
> > +       if (end - pfn >= blockinfo->pages) {
> > +               blockinfo->base = pfn;
> > +               blockinfo->end = end;
> > +               return 1;
> > +       }
> > +       return 0;
> > +}
> > +
> > +static void __trim_zone(struct zone *zone, struct page_range *range)
> > +{
> > +       unsigned long pfn;
> > +       /*
> > +        * skip pages which dones'nt under the zone.
> 
> typo dones'nt -> doesn't :)
> 
will fix.

> > +        * There are some archs which zones are not in linear layout.
> > +        */
> > +       if (page_zone(pfn_to_page(range->base)) != zone) {
> > +               for (pfn = range->base;
> > +                       pfn < range->end;
> > +                       pfn += MAX_ORDER_NR_PAGES) {
> > +                       if (page_zone(pfn_to_page(pfn)) == zone)
> > +                               break;
> > +               }
> > +               range->base = min(pfn, range->end);
> > +       }
> > +       /* Here, range-> base is in the zone if range->base != range->end */
> > +       for (pfn = range->base;
> > +            pfn < range->end;
> > +            pfn += MAX_ORDER_NR_PAGES) {
> > +               if (zone != page_zone(pfn_to_page(pfn))) {
> > +                       pfn = pfn - MAX_ORDER_NR_PAGES;
> > +                       break;
> > +               }
> > +       }
> > +       range->end = min(pfn, range->end);
> > +       return;
> 
> Remove return
> 
Ah, ok.

> > +}
> > +
> > +/*
> > + * This function is for finding a contiguous memory block which has length
> > + * of pages and MOVABLE. If it finds, make the range of pages as ISOLATED
> > + * and return the first page's pfn.
> > + * This checks all pages in the returned range is free of Pg_LRU. To reduce
> > + * the risk of false-positive testing, lru_add_drain_all() should be called
> > + * before this function to reduce pages on pagevec for zones.
> > + */
> > +
> > +static unsigned long find_contig_block(unsigned long base,
> > +               unsigned long end, unsigned long pages,
> > +               int align_order, struct zone *zone)
> > +{
> > +       unsigned long pfn, pos;
> > +       struct page_range blockinfo;
> > +       int ret;
> > +
> > +       VM_BUG_ON(pages & (MAX_ORDER_NR_PAGES - 1));
> > +       VM_BUG_ON(base & ((1 << align_order) - 1));
> > +retry:
> > +       blockinfo.base = base;
> > +       blockinfo.end = end;
> > +       blockinfo.pages = pages;
> > +       blockinfo.align_order = align_order;
> > +       blockinfo.align_mask = (1 << align_order) - 1;
> 
> We don't need this.
> 
mask ?

> > +       /*
> > +        * At first, check physical page layout and skip memory holes.
> > +        */
> > +       ret = walk_system_ram_range(base, end - base, &blockinfo,
> > +               __get_contig_block);
> > +       if (!ret)
> > +               return 0;
> > +       /* check contiguous pages in a zone */
> > +       __trim_zone(zone, &blockinfo);
> > +
> > +       /*
> > +        * Ok, we found contiguous memory chunk of size. Isolate it.
> > +        * We just search MAX_ORDER aligned range.
> > +        */
> > +       for (pfn = blockinfo.base; pfn + pages <= blockinfo.end;
> > +            pfn += (1 << align_order)) {
> > +               struct zone *z = page_zone(pfn_to_page(pfn));
> > +               if (z != zone)
> > +                       continue;
> 
> Could we make sure that a pass through __trim_zone guarantees the whole
> pfn range is in the zone we want?
> Repeating the zone check is rather annoying.
> I mean, let __get_contig_block or __trim_zone do the zone check so that
> we can remove the zone check in here.

Ah, yes. I'll remove this.

> 
> > +
> > +               spin_lock_irq(&z->lock);
> > +               pos = pfn;
> > +               /*
> > +                * Check the range only contains free pages or LRU pages.
> > +                */
> > +               while (pos < pfn + pages) {
> > +                       struct page *p;
> > +
> > +                       if (!pfn_valid_within(pos))
> > +                               break;
> > +                       p = pfn_to_page(pos);
> > +                       if (PageReserved(p))
> > +                               break;
> > +                       if (!page_count(p)) {
> > +                               if (!PageBuddy(p))
> > +                                       pos++;
> > +                               else
> > +                                       pos += (1 << page_order(p));
> > +                       } else if (PageLRU(p)) {
> 
> Could we check get_pageblock_migratetype(page) == MIGRATE_MOVABLE in
> here and early bail out?
> 

I'm not sure that's very good. The pageblock type can be fragmented, and even
if the pageblock type is not migratable, all pages in the pageblock may be free.
Because PageLRU() is checked, all the required 'quick' checks are done, I think.


> > +                               pos++;
> > +                       } else
> > +                               break;
> > +               }
> > +               spin_unlock_irq(&z->lock);
> > +               if ((pos == pfn + pages)) {
> > +                       if (!start_isolate_page_range(pfn, pfn + pages))
> > +                               return pfn;
> > +               } else/* the chunk including "pos" should be skipped */
> > +                       pfn = pos & ~((1 << align_order) - 1);
> > +               cond_resched();
> > +       }
> > +
> > +       /* failed */
> > +       if (blockinfo.end + pages <= end) {
> > +               /* Move base address and find the next block of RAM. */
> > +               base = blockinfo.end;
> > +               goto retry;
> > +       }
> > +       return 0;
> 
> If the base is 0, isn't it impossible to return pfn 0?
> On x86 with FLATMEM it may not come up, but I think it might be possible on
> some architectures.  Just guessing.
> 
> How about returning a negative value and returning the first and last page
> pfns as out parameters base, end?
> 

Hmm, will add a check.
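
For example, a convention like the following would keep pfn 0 usable as a
result (a stand-alone sketch in plain C with invented names; the real
isolation/LRU logic from the patch is elided):

#include <errno.h>
#include <stdio.h>

/*
 * Return 0 on success and a negative errno on failure, reporting the found
 * chunk via *out_pfn, so that pfn 0 remains a valid result.
 */
static int find_contig_block_alt(unsigned long base, unsigned long end,
                                 unsigned long nr_pages, int align_order,
                                 unsigned long *out_pfn)
{
        unsigned long mask = (1UL << align_order) - 1;
        unsigned long pfn = (base + mask) & ~mask;

        if (pfn + nr_pages > end)
                return -ENOMEM;
        /* ... isolate [pfn, pfn + nr_pages) as in the patch ... */
        *out_pfn = pfn;
        return 0;
}

int main(void)
{
        unsigned long pfn;

        if (!find_contig_block_alt(0, 1UL << 20, 1024, 10, &pfn))
                printf("found chunk at pfn %lu\n", pfn);        /* prints 0 */
        return 0;
}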

Thanks,
-Kame


> > +}
> >
> >
> 
> 
> 
> -- 
> Kind regards,
> Minchan Kim


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/4] alloc_contig_pages() allocate big chunk memory using migration
  2010-11-22 11:44     ` Minchan Kim
@ 2010-11-24  0:20       ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 44+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-11-24  0:20 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Bob Liu, fujita.tomonori, m.nazarewicz,
	pawel, andi.kleen, felipe.contreras, akpm, kosaki.motohiro

On Mon, 22 Nov 2010 20:44:03 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:

> On Fri, Nov 19, 2010 at 5:15 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> >
> > Add an function to allocate contiguous memory larger than MAX_ORDER.
> > The main difference between usual page allocator is that this uses
> > memory offline technique (Isolate pages and migrate remaining pages.).
> >
> > I think this is not 100% solution because we can't avoid fragmentation,
> > but we have kernelcore= boot option and can create MOVABLE zone. That
> > helps us to allow allocate a contiguous range on demand.
> 
> And later we can use compaction and reclaim, too.
> So I think this approach is the way we have to go.
> 
> >
> > The new function is
> >
> >  alloc_contig_pages(base, end, nr_pages, alignment)
> >
> > This function will allocate contiguous pages of nr_pages from the range
> > [base, end). If [base, end) is bigger than nr_pages, some pfn which
> > meats alignment will be allocated. If alignment is smaller than MAX_ORDER,
> 
> typo: meats -> meet
> 
will fix.

> > it will be raised to be MAX_ORDER.
> >
> > __alloc_contig_pages() has much more arguments.
> >
> >
> > Some drivers allocates contig pages by bootmem or hiding some memory
> > from the kernel at boot. But if contig pages are necessary only in some
> > situation, kernelcore= boot option and using page migration is a choice.
> >
> > Changelog: 2010-11-19
> >  - removed no_search
> >  - removed some drain_ functions because they are heavy.
> >  - check -ENOMEM case
> >
> > Changelog: 2010-10-26
> >  - support gfp_t
> >  - support zonelist/nodemask
> >  - support [base, end)
> >  - support alignment
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> >  include/linux/page-isolation.h |   15 ++
> >  mm/page_alloc.c                |   29 ++++
> >  mm/page_isolation.c            |  242 +++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 286 insertions(+)
> >
> > Index: mmotm-1117/mm/page_isolation.c
> > ===================================================================
> > --- mmotm-1117.orig/mm/page_isolation.c
> > +++ mmotm-1117/mm/page_isolation.c
> > @@ -5,6 +5,7 @@
> >  #include <linux/mm.h>
> >  #include <linux/page-isolation.h>
> >  #include <linux/pageblock-flags.h>
> > +#include <linux/swap.h>
> >  #include <linux/memcontrol.h>
> >  #include <linux/migrate.h>
> >  #include <linux/memory_hotplug.h>
> > @@ -396,3 +397,244 @@ retry:
> >        }
> >        return 0;
> >  }
> > +
> > +/*
> > + * Comparing caller specified [user_start, user_end) with physical memory layout
> > + * [phys_start, phys_end). If no intersection is longer than nr_pages, return 1.
> > + * If there is an intersection, return 0 and fill range in [*start, *end)
> 
> I understand the goal of the function.
> But the comment is rather awkward.
> 

ok, I will rewrite.
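
For what it's worth, here is a user-space sketch of the same clipping logic
with a reworded comment (the function name is only for illustration, it is
not the patch itself):

#include <stdio.h>

/*
 * Clip the caller-specified range [user_start, user_end) to the physical
 * range [phys_start, phys_end).  Returns 1 if there is no usable
 * intersection of at least nr_pages; otherwise returns 0 and stores the
 * clipped range in [*start, *end).
 */
static int clip_search_range(unsigned long user_start, unsigned long user_end,
                             unsigned long nr_pages,
                             unsigned long phys_start, unsigned long phys_end,
                             unsigned long *start, unsigned long *end)
{
        if (user_start >= phys_end || user_end <= phys_start)
                return 1;
        *start = user_start > phys_start ? user_start : phys_start;
        *end = user_end < phys_end ? user_end : phys_end;
        return *end - *start < nr_pages;
}

int main(void)
{
        unsigned long s, e;

        /* Caller wants [100, 500), the RAM block is [200, 1000), 256 pages. */
        if (!clip_search_range(100, 500, 256, 200, 1000, &s, &e))
                printf("usable range: [%lu, %lu)\n", s, e);     /* [200, 500) */
        return 0;
}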

> > + */
> > +static int
> > +__calc_search_range(unsigned long user_start, unsigned long user_end,
> 
> Personally, I don't like the function name.
> How about "__adjust_search_range"?
> But I am not against this name strongly. :)
> 
I will rename this.


> > +               unsigned long nr_pages,
> > +               unsigned long phys_start, unsigned long phys_end,
> > +               unsigned long *start, unsigned long *end)
> > +{
> > +       if ((user_start >= phys_end) || (user_end <= phys_start))
> > +               return 1;
> > +       if (user_start <= phys_start) {
> > +               *start = phys_start;
> > +               *end = min(user_end, phys_end);
> > +       } else {
> > +               *start = user_start;
> > +               *end = min(user_end, phys_end);
> > +       }
> > +       if (*end - *start < nr_pages)
> > +               return 1;
> > +       return 0;
> > +}
> > +
> > +
> > +/**
> > + * __alloc_contig_pages - allocate a contiguous physical pages
> > + * @base: the lowest pfn which caller wants.
> > + * @end:  the highest pfn which caller wants.
> > + * @nr_pages: the length of a chunk of pages to be allocated.
> 
> the number of pages to be allocated.
> 
ok.

> > + * @align_order: alignment of start address of returned chunk in order.
> > + *   Returned' page's order will be aligned to (1 << align_order).If smaller
> > + *   than MAX_ORDER, it's raised to MAX_ORDER.
> > + * @node: allocate near memory to the node, If -1, current node is used.
> > + * @gfpflag: used to specify what zone the memory should be from.
> > + * @nodemask: allocate memory within the nodemask.
> > + *
> > + * Search a memory range [base, end) and allocates physically contiguous
> > + * pages. If end - base is larger than nr_pages, a chunk in [base, end) will
> > + * be allocated
> > + *
> > + * This returns a page of the beginning of contiguous block. At failure, NULL
> > + * is returned.
> > + *
> > + * Limitation: at allocation, nr_pages may be increased to be aligned to
> > + * MAX_ORDER before searching a range. So, even if there is a enough chunk
> > + * for nr_pages, it may not be able to be allocated. Extra tail pages of
> > + * allocated chunk is returned to buddy allocator before returning the caller.
> > + */
> > +
> > +#define MIGRATION_RETRY        (5)
> > +struct page *__alloc_contig_pages(unsigned long base, unsigned long end,
> > +                       unsigned long nr_pages, int align_order,
> > +                       int node, gfp_t gfpflag, nodemask_t *mask)
> > +{
> > +       unsigned long found, aligned_pages, start;
> > +       struct page *ret = NULL;
> > +       int migration_failed;
> > +       unsigned long align_mask;
> > +       struct zoneref *z;
> > +       struct zone *zone;
> > +       struct zonelist *zonelist;
> > +       enum zone_type highzone_idx = gfp_zone(gfpflag);
> > +       unsigned long zone_start, zone_end, rs, re, pos;
> > +
> > +       if (node == -1)
> > +               node = numa_node_id();
> > +
> > +       /* check unsupported flags */
> > +       if (gfpflag & __GFP_NORETRY)
> > +               return NULL;
> > +       if ((gfpflag & (__GFP_WAIT | __GFP_IO | __GFP_FS)) !=
> > +               (__GFP_WAIT | __GFP_IO | __GFP_FS))
> > +               return NULL;
> 
> Why do we have to care about __GFP_IO|__GFP_FS?
> If you consider compaction/reclaim later, I am OK.
> 
Because page migration uses GFP_HIGHUSER_MOVABLE now, which already implies
__GFP_WAIT | __GFP_IO | __GFP_FS.


> > +
> > +       if (gfpflag & __GFP_THISNODE)
> > +               zonelist = &NODE_DATA(node)->node_zonelists[1];
> > +       else
> > +               zonelist = &NODE_DATA(node)->node_zonelists[0];
> > +       /*
> > +        * Base/nr_page/end should be aligned to MAX_ORDER
> > +        */
> > +       found = 0;
> > +
> > +       if (align_order < MAX_ORDER)
> > +               align_order = MAX_ORDER;
> > +
> > +       align_mask = (1 << align_order) - 1;
> > +       /*
> > +        * We allocates MAX_ORDER aligned pages and cut tail pages later.
> > +        */
> > +       aligned_pages = ALIGN(nr_pages, (1 << MAX_ORDER));
> > +       /*
> > +        * If end - base == nr_pages, we can't search range. base must be
> > +        * aligned.
> > +        */
> > +       if ((end - base == nr_pages) && (base & align_mask))
> > +               return NULL;
> > +
> > +       base = ALIGN(base, (1 << align_order));
> > +       if ((end <= base) || (end - base < aligned_pages))
> > +               return NULL;
> > +
> > +       /*
> > +        * searching contig memory range within [pos, end).
> > +        * pos is updated at migration failure to find next chunk in zone.
> > +        * pos is reset to the base at searching next zone.
> > +        * (see for_each_zone_zonelist_nodemask in mmzone.h)
> > +        *
> > +        * Note: we cannot assume zones/nodes are in linear memory layout.
> > +        */
> > +       z = first_zones_zonelist(zonelist, highzone_idx, mask, &zone);
> > +       pos = base;
> > +retry:
> > +       if (!zone)
> > +               return NULL;
> > +
> > +       zone_start = ALIGN(zone->zone_start_pfn, 1 << align_order);
> > +       zone_end = zone->zone_start_pfn + zone->spanned_pages;
> > +
> > +       /* check [pos, end) is in this zone. */
> > +       if ((pos >= end) ||
> > +            (__calc_search_range(pos, end, aligned_pages,
> > +                       zone_start, zone_end, &rs, &re))) {
> > +next_zone:
> > +               /* go to the next zone */
> > +               z = next_zones_zonelist(++z, highzone_idx, mask, &zone);
> > +               /* reset the pos */
> > +               pos = base;
> > +               goto retry;
> > +       }
> > +       /* [pos, end) is trimmed to [rs, re) in this zone. */
> > +       pos = rs;
> 
> The 'pos' isn't used any more below.
> 
Ah, yes. I'll check what this was for and remove it.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] big chunk memory allocator v4
  2010-11-23 15:46       ` Michał Nazarewicz
@ 2010-11-24  0:36         ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 44+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-11-24  0:36 UTC (permalink / raw)
  To: Michał Nazarewicz
  Cc: Andrew Morton, linux-mm, linux-kernel, minchan.kim, Bob Liu,
	fujita.tomonori, pawel, andi.kleen, felipe.contreras,
	kosaki.motohiro, Marek Szyprowski

On Tue, 23 Nov 2010 16:46:03 +0100
Michał Nazarewicz <m.nazarewicz@samsung.com> wrote:

> A few things then:
> 
> 1. As Felipe mentioned, on ARM it is often desired to have the memory
>     mapped as non-cacheable, which most often means that the memory never
>     reaches the page allocator.  This means that alloc_contig_pages()
>     would not be suitable for cases where one needs such memory.
> 
>     Or could this be overcome by adding the memory back as highmem?  But
>     then, it would force compiling in highmem support even if the platform
>     does not really need it.
> 
> 2. Device drivers should not by themselves know what ranges of memory to
>     allocate memory from.  Moreover, some device drivers could require
>     allocating different buffers from different ranges.  As such, this
>     would require some management code on top of alloc_contig_pages().
> 
> 3. When posting hwmem, Johan Mossberg mentioned that he'd like to see
>     a notion of "pinning" chunks (so that not-pinned chunks can be moved
>     around when hardware is not using them, to defragment memory).  This
>     would again require some management code on top of
>     alloc_contig_pages().
> 
> 4. I might be mistaken here, but the way I understand ZONE_MOVABLE works
>     is that it is cut off from the end of memory.  Or am I talking nonsense?
>     My concern is that at least one chip I'm working with requires
>     allocations from different memory banks, which would basically mean that
>     there would have to be two movable zones, ie:
> 
>     +-------------------+-------------------+
>     | Memory Bank #1    | Memory Bank #2    |
>     +---------+---------+---------+---------+
>     | normal  | movable | normal  | movable |
>     +---------+---------+---------+---------+
> 
yes.

> So even though I'm personally somewhat drawn by alloc_contig_pages()'s
> simplicity (compared to CMA at least), those quick thoughts make me think
> that alloc_contig_pages() would work rather as a backend (as Kamezawa
> mentioned) for some, maybe even tiny but still present, management code
> which would handle "marking memory fragments as ZONE_MOVABLE" (whatever
> that would involve) and deciding which memory ranges drivers can allocate
> from.
> 
> I'm also wondering whether alloc_contig_pages()'s first-fit is suitable but
> that probably cannot be judged without some benchmarks.
> 

I'll continue to update the patches; you can freely reuse my code and integrate
this set into yours. I work on this firstly for EMBEDDED, but I want this to be
a _generic_ function for general-purpose architectures.
There may be guys who want a 1G page on a host with tons of free memory.


Thanks,
-Kame
 


^ permalink raw reply	[flat|nested] 44+ messages in thread

Thread overview: 44+ messages
2010-11-19  8:10 [PATCH 0/4] big chunk memory allocator v4 KAMEZAWA Hiroyuki
2010-11-19  8:10 ` KAMEZAWA Hiroyuki
2010-11-19  8:12 ` [PATCH 1/4] alloc_contig_pages() move some functions to page_isolation.c KAMEZAWA Hiroyuki
2010-11-19  8:12   ` KAMEZAWA Hiroyuki
2010-11-21 15:07   ` Minchan Kim
2010-11-21 15:07     ` Minchan Kim
2010-11-19  8:14 ` [PATCH 2/4] alloc_contig_pages() find appropriate physical memory range KAMEZAWA Hiroyuki
2010-11-19  8:14   ` KAMEZAWA Hiroyuki
2010-11-21 15:21   ` Minchan Kim
2010-11-21 15:21     ` Minchan Kim
2010-11-22  0:11     ` KAMEZAWA Hiroyuki
2010-11-22  0:11       ` KAMEZAWA Hiroyuki
2010-11-22 11:20   ` Minchan Kim
2010-11-22 11:20     ` Minchan Kim
2010-11-24  0:15     ` KAMEZAWA Hiroyuki
2010-11-24  0:15       ` KAMEZAWA Hiroyuki
2010-11-19  8:15 ` [PATCH 3/4] alloc_contig_pages() allocate big chunk memory using migration KAMEZAWA Hiroyuki
2010-11-19  8:15   ` KAMEZAWA Hiroyuki
2010-11-21 15:25   ` Minchan Kim
2010-11-21 15:25     ` Minchan Kim
2010-11-22  0:13     ` KAMEZAWA Hiroyuki
2010-11-22  0:13       ` KAMEZAWA Hiroyuki
2010-11-22 11:44   ` Minchan Kim
2010-11-22 11:44     ` Minchan Kim
2010-11-24  0:20     ` KAMEZAWA Hiroyuki
2010-11-24  0:20       ` KAMEZAWA Hiroyuki
2010-11-19  8:16 ` [PATCH 4/4] alloc_contig_pages() use better allocation function for migration KAMEZAWA Hiroyuki
2010-11-19  8:16   ` KAMEZAWA Hiroyuki
2010-11-22 12:01   ` Minchan Kim
2010-11-22 12:01     ` Minchan Kim
2010-11-19 20:56 ` [PATCH 0/4] big chunk memory allocator v4 Andrew Morton
2010-11-19 20:56   ` Andrew Morton
2010-11-22  0:04   ` KAMEZAWA Hiroyuki
2010-11-22  0:04     ` KAMEZAWA Hiroyuki
2010-11-23 15:46     ` Michał Nazarewicz
2010-11-23 15:46       ` Michał Nazarewicz
2010-11-24  0:36       ` KAMEZAWA Hiroyuki
2010-11-24  0:36         ` KAMEZAWA Hiroyuki
2010-11-22  0:30   ` Felipe Contreras
2010-11-22  0:30     ` Felipe Contreras
2010-11-22  8:59   ` Kleen, Andi
2010-11-22  8:59     ` Kleen, Andi
2010-11-23 15:44     ` Michał Nazarewicz
2010-11-23 15:44       ` Michał Nazarewicz
