* [RFC][PATCH 0/3] big chunk memory allocator v2
@ 2010-10-26 10:00 ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 46+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-10-26 10:00 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, minchan.kim, andi.kleen, KOSAKI Motohiro,
	fujita.tomonori, felipe.contreras

Hi, here is version 2.

I have only done small tests and it seems to work (but I think there will be bugs...).
I am posting this now because I'll be out of the office 10/31-11/15 for ksummit and
a private trip.

Any comments are welcome, but please check whether the interface is sufficient for
your use cases. For example, if MAX_ORDER alignment is too restrictive, I will need
to rewrite almost all of the code.

The current interface is:


struct page *__alloc_contig_pages(unsigned long base, unsigned long end,
                        unsigned long nr_pages, int align_order,
                        int node, gfp_t gfpflag, nodemask_t *mask)

 * @base: the lowest pfn which caller wants.
 * @end:  the highest pfn which caller wants.
 * @nr_pages: the length of a chunk of pages to be allocated.
 * @align_order: alignment (in order) of the start address of the returned chunk.
 *   The returned chunk's start pfn will be aligned to (1 << align_order). If it
 *   is smaller than MAX_ORDER, it is raised to MAX_ORDER.
 * @node: allocate memory near this node. If -1, the current node is used.
 * @gfpflag: see include/linux/gfp.h
 * @nodemask: allocate memory within the nodemask.

If the caller wants a FIXED address, set end - base == nr_pages.
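
For illustration, here is a minimal caller sketch using the wrappers added in patch
3/3 (the 16MB size and the example_* names are made up for this example, and the
usual headers such as linux/page-isolation.h are assumed):

static struct page *example_buf;

static int example_grab_16mb(void)
{
	unsigned long nr = (16 << 20) >> PAGE_SHIFT;	/* 16MB worth of pages */

	/* search all of RAM; an alignment below MAX_ORDER is raised to MAX_ORDER */
	example_buf = alloc_contig_pages_host(nr, MAX_ORDER);
	if (!example_buf)
		return -ENOMEM;
	return 0;
}

static void example_release_16mb(void)
{
	/* give the whole chunk back to the buddy allocator */
	free_contig_pages(example_buf, (16 << 20) >> PAGE_SHIFT);
}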

The patches are based on the latest mmotm plus Bob's 3 patches fixing
memory_hotplug.c (they are already queued).

Thanks,
-Kame





^ permalink raw reply	[flat|nested] 46+ messages in thread


* [RFC][PATCH 1/3] move code from memory_hotplug to page_isolation
  2010-10-26 10:00 ` KAMEZAWA Hiroyuki
@ 2010-10-26 10:02   ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 46+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-10-26 10:02 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, minchan.kim, andi.kleen, KOSAKI Motohiro,
	fujita.tomonori, felipe.contreras

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Memory hotplug contains logic for making pages unused within a specified range
of pfns, so some of its core logic can be reused for other purposes, such as
allocating a very large contiguous memory block.

This patch moves some functions from mm/memory_hotplug.c to
mm/page_isolation.c. This makes it easier to add a large-allocation function to
page_isolation.c that uses the memory-unplug technique.

Changelog: 2010/10/26
 - adjusted to mmotm-1024 + Bob's 3 clean ups.
Changelog: 2010/10/21
 - adjusted to mmotm-1020

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/page-isolation.h |    7 ++
 mm/memory_hotplug.c            |  108 ---------------------------------------
 mm/page_isolation.c            |  112 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 119 insertions(+), 108 deletions(-)

Index: mmotm-1024/include/linux/page-isolation.h
===================================================================
--- mmotm-1024.orig/include/linux/page-isolation.h
+++ mmotm-1024/include/linux/page-isolation.h
@@ -33,5 +33,12 @@ test_pages_isolated(unsigned long start_
 extern int set_migratetype_isolate(struct page *page);
 extern void unset_migratetype_isolate(struct page *page);
 
+/*
+ * For migration.
+ */
+
+int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn);
+unsigned long scan_lru_pages(unsigned long start, unsigned long end);
+int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
 
 #endif
Index: mmotm-1024/mm/memory_hotplug.c
===================================================================
--- mmotm-1024.orig/mm/memory_hotplug.c
+++ mmotm-1024/mm/memory_hotplug.c
@@ -617,114 +617,6 @@ int is_mem_section_removable(unsigned lo
 }
 
 /*
- * Confirm all pages in a range [start, end) is belongs to the same zone.
- */
-static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
-{
-	unsigned long pfn;
-	struct zone *zone = NULL;
-	struct page *page;
-	int i;
-	for (pfn = start_pfn;
-	     pfn < end_pfn;
-	     pfn += MAX_ORDER_NR_PAGES) {
-		i = 0;
-		/* This is just a CONFIG_HOLES_IN_ZONE check.*/
-		while ((i < MAX_ORDER_NR_PAGES) && !pfn_valid_within(pfn + i))
-			i++;
-		if (i == MAX_ORDER_NR_PAGES)
-			continue;
-		page = pfn_to_page(pfn + i);
-		if (zone && page_zone(page) != zone)
-			return 0;
-		zone = page_zone(page);
-	}
-	return 1;
-}
-
-/*
- * Scanning pfn is much easier than scanning lru list.
- * Scan pfn from start to end and Find LRU page.
- */
-static unsigned long scan_lru_pages(unsigned long start, unsigned long end)
-{
-	unsigned long pfn;
-	struct page *page;
-	for (pfn = start; pfn < end; pfn++) {
-		if (pfn_valid(pfn)) {
-			page = pfn_to_page(pfn);
-			if (PageLRU(page))
-				return pfn;
-		}
-	}
-	return 0;
-}
-
-static struct page *
-hotremove_migrate_alloc(struct page *page, unsigned long private, int **x)
-{
-	/* This should be improooooved!! */
-	return alloc_page(GFP_HIGHUSER_MOVABLE);
-}
-
-#define NR_OFFLINE_AT_ONCE_PAGES	(256)
-static int
-do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
-{
-	unsigned long pfn;
-	struct page *page;
-	int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
-	int not_managed = 0;
-	int ret = 0;
-	LIST_HEAD(source);
-
-	for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
-		if (!pfn_valid(pfn))
-			continue;
-		page = pfn_to_page(pfn);
-		if (!page_count(page))
-			continue;
-		/*
-		 * We can skip free pages. And we can only deal with pages on
-		 * LRU.
-		 */
-		ret = isolate_lru_page(page);
-		if (!ret) { /* Success */
-			list_add_tail(&page->lru, &source);
-			move_pages--;
-			inc_zone_page_state(page, NR_ISOLATED_ANON +
-					    page_is_file_cache(page));
-
-		} else {
-#ifdef CONFIG_DEBUG_VM
-			printk(KERN_ALERT "removing pfn %lx from LRU failed\n",
-			       pfn);
-			dump_page(page);
-#endif
-			/* Becasue we don't have big zone->lock. we should
-			   check this again here. */
-			if (page_count(page)) {
-				not_managed++;
-				ret = -EBUSY;
-				break;
-			}
-		}
-	}
-	if (!list_empty(&source)) {
-		if (not_managed) {
-			putback_lru_pages(&source);
-			goto out;
-		}
-		/* this function returns # of failed pages */
-		ret = migrate_pages(&source, hotremove_migrate_alloc, 0, 1);
-		if (ret)
-			putback_lru_pages(&source);
-	}
-out:
-	return ret;
-}
-
-/*
  * remove from free_area[] and mark all as Reserved.
  */
 static int
Index: mmotm-1024/mm/page_isolation.c
===================================================================
--- mmotm-1024.orig/mm/page_isolation.c
+++ mmotm-1024/mm/page_isolation.c
@@ -5,6 +5,9 @@
 #include <linux/mm.h>
 #include <linux/page-isolation.h>
 #include <linux/pageblock-flags.h>
+#include <linux/memcontrol.h>
+#include <linux/migrate.h>
+#include <linux/mm_inline.h>
 #include "internal.h"
 
 static inline struct page *
@@ -139,3 +142,111 @@ int test_pages_isolated(unsigned long st
 	spin_unlock_irqrestore(&zone->lock, flags);
 	return ret ? 0 : -EBUSY;
 }
+
+
+/*
+ * Confirm all pages in a range [start, end) is belongs to the same zone.
+ */
+int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long pfn;
+	struct zone *zone = NULL;
+	struct page *page;
+	int i;
+	for (pfn = start_pfn;
+	     pfn < end_pfn;
+	     pfn += MAX_ORDER_NR_PAGES) {
+		i = 0;
+		/* This is just a CONFIG_HOLES_IN_ZONE check.*/
+		while ((i < MAX_ORDER_NR_PAGES) && !pfn_valid_within(pfn + i))
+			i++;
+		if (i == MAX_ORDER_NR_PAGES)
+			continue;
+		page = pfn_to_page(pfn + i);
+		if (zone && page_zone(page) != zone)
+			return 0;
+		zone = page_zone(page);
+	}
+	return 1;
+}
+
+/*
+ * Scanning pfn is much easier than scanning lru list.
+ * Scan pfn from start to end and Find LRU page.
+ */
+unsigned long scan_lru_pages(unsigned long start, unsigned long end)
+{
+	unsigned long pfn;
+	struct page *page;
+	for (pfn = start; pfn < end; pfn++) {
+		if (pfn_valid(pfn)) {
+			page = pfn_to_page(pfn);
+			if (PageLRU(page))
+				return pfn;
+		}
+	}
+	return 0;
+}
+
+struct page *
+hotremove_migrate_alloc(struct page *page, unsigned long private, int **x)
+{
+	/* This should be improooooved!! */
+	return alloc_page(GFP_HIGHUSER_MOVABLE);
+}
+
+#define NR_OFFLINE_AT_ONCE_PAGES	(256)
+int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long pfn;
+	struct page *page;
+	int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
+	int not_managed = 0;
+	int ret = 0;
+	LIST_HEAD(source);
+
+	for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
+		if (!pfn_valid(pfn))
+			continue;
+		page = pfn_to_page(pfn);
+		if (!page_count(page))
+			continue;
+		/*
+		 * We can skip free pages. And we can only deal with pages on
+		 * LRU.
+		 */
+		ret = isolate_lru_page(page);
+		if (!ret) { /* Success */
+			list_add_tail(&page->lru, &source);
+			move_pages--;
+			inc_zone_page_state(page, NR_ISOLATED_ANON +
+					    page_is_file_cache(page));
+
+		} else {
+#ifdef CONFIG_DEBUG_VM
+			printk(KERN_ALERT "removing pfn %lx from LRU failed\n",
+			       pfn);
+			dump_page(page);
+#endif
+			/* Because we don't have big zone->lock. we should
+			   check this again here. */
+			if (page_count(page)) {
+				not_managed++;
+				ret = -EBUSY;
+				break;
+			}
+		}
+	}
+	if (!list_empty(&source)) {
+		if (not_managed) {
+			putback_lru_pages(&source);
+			goto out;
+		}
+		/* this function returns # of failed pages */
+		ret = migrate_pages(&source, hotremove_migrate_alloc, 0, 1);
+		if (ret)
+			putback_lru_pages(&source);
+	}
+out:
+	return ret;
+}


^ permalink raw reply	[flat|nested] 46+ messages in thread


* [RFC][PATCH 2/3] a helper function to find a physically contiguous block.
  2010-10-26 10:00 ` KAMEZAWA Hiroyuki
@ 2010-10-26 10:04   ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 46+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-10-26 10:04 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, minchan.kim, andi.kleen, KOSAKI Motohiro,
	fujita.tomonori, felipe.contreras

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Unlike memory hotplug, when allocating a contiguous memory range the exact
address may not matter. IOW, if a requester wants to allocate 100M of
contiguous memory, the placement of the allocated memory may not be a problem.
So, "finding a range of memory which seems to be MOVABLE" is required.

This patch adds a function to isolate a chunk of memory of a given length
within [start, end). The function returns the pfn of the first page of the
isolated contiguous chunk within [start, end).

After isolation, free memory within this area will never be allocated.
But some pages will remain as "Used/LRU" pages. They should be dropped by
page reclaim or migration.
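
A condensed sketch of how these helpers are meant to be driven (this mirrors the
sequence used by patch 3/3; claim_contig_range() is a hypothetical name and
find_contig_block() is static to page_isolation.c, so take this as illustration only):

static unsigned long claim_contig_range(unsigned long base, unsigned long limit,
					unsigned long nr_pages, int align_order,
					struct zone *zone)
{
	unsigned long pfn, lru, end;

	pfn = find_contig_block(base, limit, nr_pages, align_order, zone);
	if (!pfn)
		return 0;	/* no candidate range found in [base, limit) */
	end = pfn + nr_pages;

	/* the range is MIGRATE_ISOLATE now; migrate the remaining used/LRU pages */
	drain_all_pages();
	lru_add_drain_all();
	for (lru = scan_lru_pages(pfn, end); lru && lru < end;
	     lru = scan_lru_pages(lru, end))
		do_migrate_range(lru, end);

	/* 0 from test_pages_isolated() means every page in [pfn, end) is free
	 * and isolated; otherwise the caller must undo the isolation and retry */
	if (!test_pages_isolated(pfn, end))
		return pfn;
	return 0;
}

Patch 3/3 additionally bounds the number of migration retries and re-drains the
per-cpu lists between attempts.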

Changelog:
 - added zone to the arguments.
 - fixed the case where zones are not laid out linearly.
 - added zone->lock.


Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/page_isolation.c |  148 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 148 insertions(+)

Index: mmotm-1024/mm/page_isolation.c
===================================================================
--- mmotm-1024.orig/mm/page_isolation.c
+++ mmotm-1024/mm/page_isolation.c
@@ -7,6 +7,7 @@
 #include <linux/pageblock-flags.h>
 #include <linux/memcontrol.h>
 #include <linux/migrate.h>
+#include <linux/memory_hotplug.h>
 #include <linux/mm_inline.h>
 #include "internal.h"
 
@@ -250,3 +251,150 @@ int do_migrate_range(unsigned long start
 out:
 	return ret;
 }
+
+/*
+ * Functions for getting contiguous MOVABLE pages in a zone.
+ */
+struct page_range {
+	unsigned long base; /* Base pfn of the search for a contiguous block */
+	unsigned long end;
+	unsigned long pages;/* Length of contiguous block */
+	int align_order;
+	unsigned long align_mask;
+};
+
+int __get_contig_block(unsigned long pfn, unsigned long nr_pages, void *arg)
+{
+	struct page_range *blockinfo = arg;
+	unsigned long end;
+
+	end = pfn + nr_pages;
+	pfn = ALIGN(pfn, 1 << blockinfo->align_order);
+	end = end & ~(MAX_ORDER_NR_PAGES - 1);
+
+	if (end < pfn)
+		return 0;
+	if (end - pfn >= blockinfo->pages) {
+		blockinfo->base = pfn;
+		blockinfo->end = end;
+		return 1;
+	}
+	return 0;
+}
+
+static void __trim_zone(struct zone *zone, struct page_range *range)
+{
+	unsigned long pfn;
+	/*
+	 * Skip pages which don't belong to this zone.
+	 * On some archs, zones are not laid out linearly.
+	 */
+	if (page_zone(pfn_to_page(range->base)) != zone) {
+		for (pfn = range->base;
+			pfn < range->end;
+			pfn += MAX_ORDER_NR_PAGES) {
+			if (page_zone(pfn_to_page(pfn)) == zone)
+				break;
+		}
+		range->base = min(pfn, range->end);
+	}
+	/* Here, range->base is in the zone if range->base != range->end */
+	for (pfn = range->base;
+	     pfn < range->end;
+	     pfn += MAX_ORDER_NR_PAGES) {
+		if (zone != page_zone(pfn_to_page(pfn))) {
+			pfn = pfn - MAX_ORDER_NR_PAGES;
+			break;
+		}
+	}
+	range->end = min(pfn, range->end);
+	return;
+}
+
+/*
+ * This function finds a contiguous, MOVABLE memory block of the given length
+ * in pages. If one is found, the range of pages is marked ISOLATED and the
+ * first page's pfn is returned.
+ * This checks that all pages in the range are either free or on the LRU. To
+ * reduce the risk of false positives, lru_add_drain_all() should be called
+ * before this function to flush pages sitting on per-cpu pagevecs.
+ */
+
+static unsigned long find_contig_block(unsigned long base,
+		unsigned long end, unsigned long pages,
+		int align_order, struct zone *zone)
+{
+	unsigned long pfn, pos;
+	struct page_range blockinfo;
+	int ret;
+
+	VM_BUG_ON(pages & (MAX_ORDER_NR_PAGES - 1));
+	VM_BUG_ON(base & ((1 << align_order) - 1));
+retry:
+	blockinfo.base = base;
+	blockinfo.end = end;
+	blockinfo.pages = pages;
+	blockinfo.align_order = align_order;
+	blockinfo.align_mask = (1 << align_order) - 1;
+	/*
+	 * At first, check physical page layout and skip memory holes.
+	 */
+	ret = walk_system_ram_range(base, end - base, &blockinfo,
+		__get_contig_block);
+	if (!ret)
+		return 0;
+	/* check contiguous pages in a zone */
+	__trim_zone(zone, &blockinfo);
+
+	/*
+	 * Ok, we found a candidate memory chunk of the requested size. Isolate it.
+	 * We only search MAX_ORDER-aligned ranges.
+	 */
+	for (pfn = blockinfo.base; pfn + pages <= blockinfo.end;
+	     pfn += (1 << align_order)) {
+		struct zone *z = page_zone(pfn_to_page(pfn));
+
+		spin_lock_irq(&z->lock);
+		pos = pfn;
+		/*
+		 * Check the range only contains free pages or LRU pages.
+		 */
+		while (pos < pfn + pages) {
+			struct page *p;
+
+			if (!pfn_valid_within(pos))
+				break;
+			p = pfn_to_page(pos);
+			if (PageReserved(p))
+				break;
+			if (!page_count(p)) {
+				if (!PageBuddy(p))
+					pos++;
+				else if (PageBuddy(p)) {
+					int order = page_order(p);
+					pos += (1 << order);
+				}
+			} else if (PageLRU(p)) {
+				pos++;
+			} else
+				break;
+		}
+		spin_unlock_irq(&z->lock);
+		if ((pos == pfn + pages) &&
+			!start_isolate_page_range(pfn, pfn + pages))
+				return pfn;
+		if (pos & ((1 << align_order) - 1))
+			pfn = ALIGN(pos, (1 << align_order));
+		else
+			pfn = pos + (1 << align_order);
+		cond_resched();
+	}
+
+	/* failed */
+	if (blockinfo.end + pages <= end) {
+		/* Move base address and find the next block of RAM. */
+		base = blockinfo.end;
+		goto retry;
+	}
+	return 0;
+}


^ permalink raw reply	[flat|nested] 46+ messages in thread


* [RFC][PATCH 3/3] a big contig memory allocator
  2010-10-26 10:00 ` KAMEZAWA Hiroyuki
@ 2010-10-26 10:08   ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 46+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-10-26 10:08 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, minchan.kim, andi.kleen, KOSAKI Motohiro,
	fujita.tomonori, felipe.contreras

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Add a function to allocate contiguous memory larger than MAX_ORDER. The main
difference from the usual page allocator is that this uses the memory-offline
technique (isolate pages and migrate the remaining ones).

I think this is not a 100% solution because we can't avoid fragmentation, but
we have the kernelcore= boot option and can create a MOVABLE zone. That helps
us allocate a contiguous range on demand.
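
For example (the size here is illustrative, not something the patch prescribes),
booting with:

  kernelcore=512M

reserves 512M for unmovable kernel allocations and places the remaining RAM in
ZONE_MOVABLE, which is the zone this allocator reclaims from by default.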

The new function is

  alloc_contig_pages(base, end, nr_pages, alignment)

This function allocates nr_pages of contiguous pages from the range [base, end).
If [base, end) is bigger than nr_pages, a starting pfn that meets the alignment
will be chosen. If alignment is smaller than MAX_ORDER, it is raised to
MAX_ORDER.

__alloc_contig_pages() takes more arguments.

Some drivers allocate contiguous pages from bootmem or by hiding memory from
the kernel at boot. But if contiguous pages are needed only in some situations,
the kernelcore= boot option combined with page migration is an alternative.
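
As a sketch (the pfn values below are invented for illustration), a caller that
needs a fixed physical range sets end - base == nr_pages; no searching is done
then, and base itself must be suitably aligned:

	/* claim the fixed pfn range [0x10000, 0x14000), i.e. 64MB starting at
	 * physical 256MB with 4KB pages; base 0x10000 is MAX_ORDER aligned */
	struct page *p;

	p = alloc_contig_pages(0x10000, 0x10000 + 0x4000, 0x4000, MAX_ORDER);
	if (!p)
		return -ENOMEM;	/* the range could not be emptied */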

Note: I'm not 100% sure whether the __GFP_HARDWALL check is required or not.


Changelog: 2010-10-26
 - support gfp_t
 - support zonelist/nodemask
 - support [base, end) 
 - support alignment

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/page-isolation.h |   15 ++
 mm/page_alloc.c                |   29 ++++
 mm/page_isolation.c            |  239 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 283 insertions(+)

Index: mmotm-1024/mm/page_isolation.c
===================================================================
--- mmotm-1024.orig/mm/page_isolation.c
+++ mmotm-1024/mm/page_isolation.c
@@ -5,6 +5,7 @@
 #include <linux/mm.h>
 #include <linux/page-isolation.h>
 #include <linux/pageblock-flags.h>
+#include <linux/swap.h>
 #include <linux/memcontrol.h>
 #include <linux/migrate.h>
 #include <linux/memory_hotplug.h>
@@ -398,3 +399,241 @@ retry:
 	}
 	return 0;
 }
+
+/*
+ * Comparing user specified [user_start, user_end) with physical memory layout
+ * [phys_start, phys_end). If no intersection of length nr_pages, return 1.
+ * If there is an intersection, return 0 and fill range in [*start, *end)
+ */
+static int
+__calc_search_range(unsigned long user_start, unsigned long user_end,
+		unsigned long nr_pages,
+		unsigned long phys_start, unsigned long phys_end,
+		unsigned long *start, unsigned long *end)
+{
+	if ((user_start >= phys_end) || (user_end <= phys_start))
+		return 1;
+	if (user_start <= phys_start) {
+		*start = phys_start;
+		*end = min(user_end, phys_end);
+	} else {
+		*start = user_start;
+		*end = min(user_end, phys_end);
+	}
+	if (*end - *start < nr_pages)
+		return 1;
+	return 0;
+}
+
+
+/**
+ * __alloc_contig_pages - allocate contiguous physical pages
+ * @base: the lowest pfn which caller wants.
+ * @end:  the highest pfn which caller wants.
+ * @nr_pages: the length of a chunk of pages to be allocated.
+ * @align_order: alignment (in order) of the start address of the returned chunk.
+ *   The returned chunk's start pfn will be aligned to (1 << align_order). If it
+ *   is smaller than MAX_ORDER, it is raised to MAX_ORDER.
+ * @node: allocate memory near this node. If -1, the current node is used.
+ * @gfpflag: used to specify what zone the memory should be from.
+ * @nodemask: allocate memory within the nodemask.
+ *
+ * Search the memory range [base, end) and allocate physically contiguous
+ * pages. If end - base is larger than nr_pages, a chunk somewhere in
+ * [base, end) will be allocated.
+ *
+ * This returns the first page of the contiguous block. On failure, NULL
+ * is returned.
+ *
+ * Limitation: at allocation, nr_pages may be rounded up to a MAX_ORDER
+ * alignment before searching a range. So even if there is a chunk large enough
+ * for nr_pages, the allocation may still fail. Extra tail pages of the allocated
+ * chunk are returned to the buddy allocator before returning to the caller.
+ */
+
+#define MIGRATION_RETRY	(5)
+struct page *__alloc_contig_pages(unsigned long base, unsigned long end,
+			unsigned long nr_pages, int align_order,
+			int node, gfp_t gfpflag, nodemask_t *mask)
+{
+	unsigned long found, aligned_pages, start;
+	struct page *ret = NULL;
+	int migration_failed;
+	bool no_search = false;
+	unsigned long align_mask;
+	struct zoneref *z;
+	struct zone *zone;
+	struct zonelist *zonelist;
+	enum zone_type highzone_idx = gfp_zone(gfpflag);
+	unsigned long zone_start, zone_end, rs, re, pos;
+
+	if (node == -1)
+		node = numa_node_id();
+
+	/* check unsupported flags */
+	if (gfpflag & __GFP_NORETRY)
+		return NULL;
+	if ((gfpflag & (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL)) !=
+		(__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL))
+		return NULL;
+
+	if (gfpflag & __GFP_THISNODE)
+		zonelist = &NODE_DATA(node)->node_zonelists[1];
+	else
+		zonelist = &NODE_DATA(node)->node_zonelists[0];
+	/*
+	 * Base/nr_pages/end should be aligned to MAX_ORDER
+	 */
+	found = 0;
+
+	if (align_order < MAX_ORDER)
+		align_order = MAX_ORDER;
+
+	align_mask = (1 << align_order) - 1;
+	if (end - base == nr_pages)
+		no_search = true;
+	/*
+	 * We allocate MAX_ORDER-aligned pages and trim the tail pages later.
+	 */
+	aligned_pages = ALIGN(nr_pages, (1 << MAX_ORDER));
+	/*
+	 * If end - base == nr_pages, we can't search range. base must be
+	 * aligned.
+	 */
+	if ((end - base == nr_pages) && (base & align_mask))
+		return NULL;
+
+	base = ALIGN(base, (1 << align_order));
+	if ((end <= base) || (end - base < aligned_pages))
+		return NULL;
+
+	/*
+	 * Search for a contiguous memory range within [pos, end).
+	 * pos is updated on migration failure to find the next chunk in the zone,
+	 * and is reset to base when moving on to the next zone.
+	 * (see for_each_zone_zonelist_nodemask in mmzone.h)
+	 *
+	 * Note: we cannot assume zones/nodes are in linear memory layout.
+	 */
+	z = first_zones_zonelist(zonelist, highzone_idx, mask, &zone);
+	pos = base;
+retry:
+	if (!zone)
+		return NULL;
+
+	zone_start = ALIGN(zone->zone_start_pfn, 1 << align_order);
+	zone_end = zone->zone_start_pfn + zone->spanned_pages;
+
+	/* check [pos, end) is in this zone. */
+	if ((pos >= end) ||
+	     (__calc_search_range(pos, end, aligned_pages,
+			zone_start, zone_end, &rs, &re))) {
+next_zone:
+		/* go to the next zone */
+		z = next_zones_zonelist(++z, highzone_idx, mask, &zone);
+		/* reset the pos */
+		pos = base;
+		goto retry;
+	}
+	/* [pos, end) is trimmed to [rs, re) in this zone. */
+	pos = rs;
+
+	found = find_contig_block(rs, re, aligned_pages, align_order, zone);
+	if (!found)
+		goto next_zone;
+
+	/*
+	 * OK, here we have a contiguous range of pageblocks marked "isolated";
+	 * try migration.
+	 */
+	drain_all_pages();
+	lru_add_drain_all();
+
+	/*
+	 * scan_lru_pages() finds the next PG_lru page in the range;
+	 * it returns 0 when it reaches the end.
+	 */
+	migration_failed = 0;
+	rs = found;
+	re = found + aligned_pages;
+	for (rs = scan_lru_pages(rs, re);
+	     rs && rs < re;
+	     rs = scan_lru_pages(rs, re)) {
+		if (do_migrate_range(rs, re)) {
+			/* would it be better to try another block? */
+			if (++migration_failed >= MIGRATION_RETRY)
+				break;
+			/* take a rest and synchronize LRU etc. */
+			drain_all_pages();
+			lru_add_drain_all();
+		} else /* reset migration_failure counter */
+			migration_failed = 0;
+	}
+
+	if (!migration_failed) {
+		drain_all_pages();
+		lru_add_drain_all();
+	}
+	/* Check all pages are isolated */
+	if (test_pages_isolated(found, found + aligned_pages)) {
+		undo_isolate_page_range(found, aligned_pages);
+		/*
+		 * Migration of [found, found + aligned_pages) failed.
+		 * "rs" is the last pfn at which scan_lru_pages() found an LRU
+		 * page. Update pos and try the next chunk.
+		 */
+		pos = ALIGN(rs + 1, (1 << align_order));
+		goto retry; /* goto next chunk */
+	}
+	/*
+	 * OK, here the memory in [found, found + aligned_pages) is isolated.
+	 * alloc_contig_freed_pages() will take every page in the range off the
+	 * free lists, leaving each with page_count(page) == 1.
+	 */
+	ret = pfn_to_page(found);
+	alloc_contig_freed_pages(found, found + aligned_pages, gfpflag);
+	/* unset ISOLATE */
+	undo_isolate_page_range(found, aligned_pages);
+	/* Free unnecessary pages in tail */
+	for (start = found + nr_pages; start < found + aligned_pages; start++)
+		__free_page(pfn_to_page(start));
+	return ret;
+
+}
+EXPORT_SYMBOL_GPL(__alloc_contig_pages);
+
+void free_contig_pages(struct page *page, int nr_pages)
+{
+	int i;
+	for (i = 0; i < nr_pages; i++)
+		__free_page(page + i);
+}
+EXPORT_SYMBOL_GPL(free_contig_pages);
+
+/*
+ * Allocated pages will not be MOVABLE, but the MOVABLE zone is suitable for
+ * allocating a big chunk. So ZONE_MOVABLE is used by default.
+ */
+
+struct page *alloc_contig_pages(unsigned long base, unsigned long end,
+			unsigned long nr_pages, int align_order)
+{
+	return __alloc_contig_pages(base, end, nr_pages, align_order, -1,
+				GFP_KERNEL | __GFP_MOVABLE, NULL);
+}
+EXPORT_SYMBOL_GPL(alloc_contig_pages);
+
+struct page *alloc_contig_pages_host(unsigned long nr_pages, int align_order)
+{
+	return __alloc_contig_pages(0, max_pfn, nr_pages, align_order, -1,
+				GFP_KERNEL | __GFP_MOVABLE, NULL);
+}
+EXPORT_SYMBOL_GPL(alloc_contig_pages_host);
+
+struct page *alloc_contig_pages_node(int nid, unsigned long nr_pages,
+				int align_order)
+{
+	return __alloc_contig_pages(0, max_pfn, nr_pages, align_order, nid,
+			GFP_KERNEL | __GFP_THISNODE | __GFP_MOVABLE, NULL);
+}
+EXPORT_SYMBOL_GPL(alloc_contig_pages_node);
Index: mmotm-1024/include/linux/page-isolation.h
===================================================================
--- mmotm-1024.orig/include/linux/page-isolation.h
+++ mmotm-1024/include/linux/page-isolation.h
@@ -32,6 +32,8 @@ test_pages_isolated(unsigned long start_
  */
 extern int set_migratetype_isolate(struct page *page);
 extern void unset_migratetype_isolate(struct page *page);
+extern void alloc_contig_freed_pages(unsigned long pfn,
+		unsigned long pages, gfp_t flag);
 
 /*
  * For migration.
@@ -41,4 +43,17 @@ int test_pages_in_a_zone(unsigned long s
 unsigned long scan_lru_pages(unsigned long start, unsigned long end);
 int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
 
+/*
+ * For large alloc.
+ */
+struct page *__alloc_contig_pages(unsigned long base, unsigned long end,
+				unsigned long nr_pages, int align_order,
+				int node, gfp_t flag, nodemask_t *mask);
+struct page *alloc_contig_pages(unsigned long base, unsigned long end,
+				unsigned long nr_pages, int align_order);
+struct page *alloc_contig_pages_host(unsigned long nr_pages, int align_order);
+struct page *alloc_contig_pages_node(int nid, unsigned long nr_pages,
+		int align_order);
+void free_contig_pages(struct page *page, int nr_pages);
+
 #endif
Index: mmotm-1024/mm/page_alloc.c
===================================================================
--- mmotm-1024.orig/mm/page_alloc.c
+++ mmotm-1024/mm/page_alloc.c
@@ -5430,6 +5430,35 @@ out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
 
+
+void alloc_contig_freed_pages(unsigned long pfn,  unsigned long end, gfp_t flag)
+{
+	struct page *page;
+	struct zone *zone;
+	int order;
+	unsigned long start = pfn;
+
+	zone = page_zone(pfn_to_page(pfn));
+	spin_lock_irq(&zone->lock);
+	while (pfn < end) {
+		VM_BUG_ON(!pfn_valid(pfn));
+		page = pfn_to_page(pfn);
+		VM_BUG_ON(page_count(page));
+		VM_BUG_ON(!PageBuddy(page));
+		list_del(&page->lru);
+		order = page_order(page);
+		zone->free_area[order].nr_free--;
+		rmv_page_order(page);
+		__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
+		pfn += 1 << order;
+	}
+	spin_unlock_irq(&zone->lock);
+
+	/* After this, pages in the range can be freed one by one */
+	for (pfn = start; pfn < end; pfn++)
+		prep_new_page(pfn_to_page(pfn), 0, flag);
+}
+
 #ifdef CONFIG_MEMORY_HOTREMOVE
 /*
  * All pages in the range must be isolated before calling this.


^ permalink raw reply	[flat|nested] 46+ messages in thread


* Re: [RFC][PATCH 0/3] big chunk memory allocator v2
  2010-10-26 10:00 ` KAMEZAWA Hiroyuki
@ 2010-10-27 23:22   ` Minchan Kim
  -1 siblings, 0 replies; 46+ messages in thread
From: Minchan Kim @ 2010-10-27 23:22 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, andi.kleen, KOSAKI Motohiro,
	fujita.tomonori, felipe.contreras, linux-arm-kernel,
	Jonathan Corbet, Michal Nazarewicz, Russell King, Pawel Osciak,
	Peter Zijlstra

On Tue, Oct 26, 2010 at 7:00 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> Hi, here is version 2.
>
> I only did small test and it seems to work (but I think there will be bug...)
> I post this now just because I'll be out of office 10/31-11/15 with ksummit and
> a private trip.
>
> Any comments are welcome but please see the interface is enough for use cases or
> not.  For example) If MAX_ORDER alignment is too bad, I need to rewrite almost
> all code.

First of all, thanks for your endless effort for embedded systems.
It's time for the stakeholders to review this.
Cc'ed some people; many of them will probably be attending KS.
So I hope the SAMSUNG folks review this.

Maybe they can't test this since ARM doesn't support the movable zone yet
(I will look into this).
As Kame said, please review whether this patch provides a sufficient
interface and meets your requirements.
I think it can't meet _all_ of your requirements (e.g., latency and a
guarantee of getting a big contiguous chunk), but I believe it can cover
many of the non-critical cases.

>
> Now interface is:
>
>
> struct page *__alloc_contig_pages(unsigned long base, unsigned long end,
>                        unsigned long nr_pages, int align_order,
>                        int node, gfp_t gfpflag, nodemask_t *mask)
>
>  * @base: the lowest pfn which caller wants.
>  * @end:  the highest pfn which caller wants.
>  * @nr_pages: the length of a chunk of pages to be allocated.
>  * @align_order: alignment of start address of returned chunk in order.
>  *   Returned' page's order will be aligned to (1 << align_order).If smaller
>  *   than MAX_ORDER, it's raised to MAX_ORDER.
>  * @node: allocate near memory to the node, If -1, current node is used.
>  * @gfpflag: see include/linux/gfp.h
>  * @nodemask: allocate memory within the nodemask.
>
> If the caller wants a FIXED address, set end - base == nr_pages.
>
> The patch is based onto the latest mmotm + Bob's 3 patches for fixing
> memory_hotplug.c (they are queued.)
>
> Thanks,
> -Kame
>
>
>
>
>



-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 46+ messages in thread
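
To make the FIXED-address case quoted above concrete, here is a minimal sketch
of requesting one exact physical window by passing end - base == nr_pages. It
is not from the patch; the base pfn (0x40000, i.e. 1GB with 4KB pages) and the
8MB length are arbitrary assumptions, and the gfp flags simply mirror what the
patch's own wrappers pass.

/*
 * Sketch only: request one specific physical range (end - base == nr_pages,
 * so no searching is done and base must be MAX_ORDER aligned).  The base pfn
 * and the length below are assumed values.
 */
#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/page-isolation.h>

static struct page *example_grab_fixed_range(void)
{
	unsigned long base = 0x40000;			/* 1GB with 4KB pages */
	unsigned long nr   = 8UL << (20 - PAGE_SHIFT);	/* 8MB worth of pages */

	return __alloc_contig_pages(base, base + nr, nr, MAX_ORDER,
				    -1, GFP_KERNEL | __GFP_MOVABLE, NULL);
}

If any page in that exact window cannot be isolated and migrated away, the
call simply returns NULL.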

* Re: [RFC][PATCH 2/3] a help function for find physically contiguous block.
  2010-10-26 10:04   ` KAMEZAWA Hiroyuki
@ 2010-10-29  3:53     ` Bob Liu
  -1 siblings, 0 replies; 46+ messages in thread
From: Bob Liu @ 2010-10-29  3:53 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, minchan.kim, andi.kleen, KOSAKI Motohiro,
	fujita.tomonori, felipe.contreras

On Tue, Oct 26, 2010 at 6:04 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> Unlike memory hotplug, at an allocation of a contiguous memory range, the address
> may not be a problem. IOW, if a requester wants to allocate 100M of
> contiguous memory, the placement of the allocated memory may not be a problem.
> So, "finding a range of memory which seems to be MOVABLE" is required.
>
> This patch adds a function to isolate a length of memory within [start, end).
> This function returns the pfn of the 1st page of the isolated contiguous chunk
> of the given length within [start, end).
>
> After isolation, free memory within this area will never be allocated.
> But some pages will remain as "Used/LRU" pages. They should be dropped by
> page reclaim or migration.
>
> Changelog:
>  - zone is added to the argument.
>  - fixed a case that zones are not in linear.
>  - added zone->lock.
>
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  mm/page_isolation.c |  148 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 148 insertions(+)
>
> Index: mmotm-1024/mm/page_isolation.c
> ===================================================================
> --- mmotm-1024.orig/mm/page_isolation.c
> +++ mmotm-1024/mm/page_isolation.c
> @@ -7,6 +7,7 @@
>  #include <linux/pageblock-flags.h>
>  #include <linux/memcontrol.h>
>  #include <linux/migrate.h>
> +#include <linux/memory_hotplug.h>
>  #include <linux/mm_inline.h>
>  #include "internal.h"
>
> @@ -250,3 +251,150 @@ int do_migrate_range(unsigned long start
>  out:
>        return ret;
>  }
> +
> +/*
> + * Functions for getting contiguous MOVABLE pages in a zone.
> + */
> +struct page_range {
> +       unsigned long base; /* Base address of searching contigouous block */
> +       unsigned long end;
> +       unsigned long pages;/* Length of contiguous block */
> +       int align_order;
> +       unsigned long align_mask;
> +};
> +
> +int __get_contig_block(unsigned long pfn, unsigned long nr_pages, void *arg)
> +{
> +       struct page_range *blockinfo = arg;
> +       unsigned long end;
> +
> +       end = pfn + nr_pages;
> +       pfn = ALIGN(pfn, 1 << blockinfo->align_order);
> +       end = end & ~(MAX_ORDER_NR_PAGES - 1);
> +
> +       if (end < pfn)
> +               return 0;
> +       if (end - pfn >= blockinfo->pages) {
> +               blockinfo->base = pfn;
> +               blockinfo->end = end;
> +               return 1;
> +       }
> +       return 0;
> +}
> +
> +static void __trim_zone(struct zone *zone, struct page_range *range)
> +{
> +       unsigned long pfn;
> +       /*
> +        * skip pages which are not under the zone.
> +        * There are some archs whose zones are not in linear layout.
> +        */
> +       if (page_zone(pfn_to_page(range->base)) != zone) {
> +               for (pfn = range->base;
> +                       pfn < range->end;
> +                       pfn += MAX_ORDER_NR_PAGES) {
> +                       if (page_zone(pfn_to_page(pfn)) == zone)
> +                               break;
> +               }
> +               range->base = min(pfn, range->end);
> +       }
> +       /* Here, range-> base is in the zone if range->base != range->end */
> +       for (pfn = range->base;
> +            pfn < range->end;
> +            pfn += MAX_ORDER_NR_PAGES) {
> +               if (zone != page_zone(pfn_to_page(pfn))) {
> +                       pfn = pfn - MAX_ORDER_NR_PAGES;
> +                       break;
> +               }
> +       }
> +       range->end = min(pfn, range->end);
> +       return;
> +}
> +
> +/*
> + * This function is for finding a contiguous memory block which has length
> + * of pages and MOVABLE. If it finds, make the range of pages as ISOLATED
> + * and return the first page's pfn.
> + * This checks all pages in the returned range is free of Pg_LRU. To reduce
> + * the risk of false-positive testing, lru_add_drain_all() should be called
> + * before this function to reduce pages on pagevec for zones.
> + */
> +
> +static unsigned long find_contig_block(unsigned long base,
> +               unsigned long end, unsigned long pages,
> +               int align_order, struct zone *zone)
> +{
> +       unsigned long pfn, pos;
> +       struct page_range blockinfo;
> +       int ret;
> +
> +       VM_BUG_ON(pages & (MAX_ORDER_NR_PAGES - 1));
> +       VM_BUG_ON(base & ((1 << align_order) - 1));
> +retry:
> +       blockinfo.base = base;
> +       blockinfo.end = end;
> +       blockinfo.pages = pages;
> +       blockinfo.align_order = align_order;
> +       blockinfo.align_mask = (1 << align_order) - 1;
> +       /*
> +        * At first, check physical page layout and skip memory holes.
> +        */
> +       ret = walk_system_ram_range(base, end - base, &blockinfo,
> +               __get_contig_block);
> +       if (!ret)
> +               return 0;
> +       /* check contiguous pages in a zone */
> +       __trim_zone(zone, &blockinfo);
> +
> +       /*
> +        * Ok, we found contiguous memory chunk of size. Isolate it.
> +        * We just search MAX_ORDER aligned range.
> +        */
> +       for (pfn = blockinfo.base; pfn + pages <= blockinfo.end;
> +            pfn += (1 << align_order)) {
> +               struct zone *z = page_zone(pfn_to_page(pfn));
> +
> +               spin_lock_irq(&z->lock);
> +               pos = pfn;
> +               /*
> +                * Check the range only contains free pages or LRU pages.
> +                */
> +               while (pos < pfn + pages) {
> +                       struct page *p;
> +
> +                       if (!pfn_valid_within(pos))
> +                               break;
> +                       p = pfn_to_page(pos);
> +                       if (PageReserved(p))
> +                               break;
> +                       if (!page_count(p)) {
> +                               if (!PageBuddy(p))
> +                                       pos++;
> +                               else if (PageBuddy(p)) {

Would just "else" be okay here?

> +                                       int order = page_order(p);
> +                                       pos += (1 << order);
> +                               }
> +                       } else if (PageLRU(p)) {
> +                               pos++;
> +                       } else
> +                               break;
> +               }
> +               spin_unlock_irq(&z->lock);
> +               if ((pos == pfn + pages) &&
> +                       !start_isolate_page_range(pfn, pfn + pages))
> +                               return pfn;
> +               if (pos & ((1 << align_order) - 1))
> +                       pfn = ALIGN(pos, (1 << align_order));
> +               else
> +                       pfn = pos + (1 << align_order);

pfn has already been changed here, so why does the for loop still need
pfn += (1 << align_order)?
Or maybe I missed something.

> +               cond_resched();
> +       }
> +
> +       /* failed */
> +       if (blockinfo.end + pages <= end) {
> +               /* Move base address and find the next block of RAM. */
> +               base = blockinfo.end;
> +               goto retry;
> +       }
> +       return 0;
> +}
>

-- 
Thanks,
--Bob

^ permalink raw reply	[flat|nested] 46+ messages in thread
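
The start/end rounding in __get_contig_block() quoted above is the part that
is easiest to misread, so here is a small standalone restatement of that check
in plain userspace C with one worked example. MAX_ORDER = 11 is assumed (the
common default), and the pfn numbers in main() are made up for the example.

/*
 * Standalone sketch (userspace C, not kernel code) of the per-RAM-chunk check:
 * round the start up to the requested alignment, round the end down to a
 * MAX_ORDER block boundary, and accept the chunk only if what is left still
 * holds the requested number of pages.
 */
#include <stdio.h>

#define MAX_ORDER		11			/* assumed default */
#define MAX_ORDER_NR_PAGES	(1UL << (MAX_ORDER - 1))
#define ALIGN(x, a)		(((x) + (a) - 1) & ~((a) - 1))

static int block_fits(unsigned long pfn, unsigned long nr_pages,
		      unsigned long want, int align_order)
{
	unsigned long end = pfn + nr_pages;

	pfn = ALIGN(pfn, 1UL << align_order);		/* align start up */
	end &= ~(MAX_ORDER_NR_PAGES - 1);		/* trim end down  */

	return end > pfn && end - pfn >= want;
}

int main(void)
{
	/*
	 * A RAM chunk at pfn 0x100 with 0x5000 pages; we want 0x1000 pages
	 * starting MAX_ORDER aligned.  The start rounds up to 0x800, the end
	 * rounds down to 0x5000, and 0x4800 pages remain -- enough, so this
	 * prints 1.
	 */
	printf("%d\n", block_fits(0x100, 0x5000, 0x1000, MAX_ORDER));
	return 0;
}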

* Re: [RFC][PATCH 3/3] a big contig memory allocator
  2010-10-26 10:08   ` KAMEZAWA Hiroyuki
@ 2010-10-29  3:55     ` Bob Liu
  -1 siblings, 0 replies; 46+ messages in thread
From: Bob Liu @ 2010-10-29  3:55 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, minchan.kim, andi.kleen, KOSAKI Motohiro,
	fujita.tomonori, felipe.contreras

On Tue, Oct 26, 2010 at 6:08 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> Add a function to allocate contiguous memory larger than MAX_ORDER.
> The main difference from the usual page allocator is that this uses
> the memory-offline technique (isolate pages and migrate the remaining pages).
>
> I think this is not a 100% solution because we can't avoid fragmentation,
> but we have the kernelcore= boot option and can create a MOVABLE zone. That
> helps us allocate a contiguous range on demand.
>
> The new function is
>
>  alloc_contig_pages(base, end, nr_pages, alignment)
>
> This function will allocate contiguous pages of nr_pages from the range
> [base, end). If [base, end) is bigger than nr_pages, some pfn which
> meets the alignment will be allocated. If the alignment is smaller than
> MAX_ORDER, it will be raised to MAX_ORDER.
>
> __alloc_contig_pages() has many more arguments.
>
> Some drivers allocate contig pages from bootmem or by hiding some memory
> from the kernel at boot. But if contig pages are necessary only in some
> situations, the kernelcore= boot option plus page migration is an option.
>
> Note: I'm not 100% sure __GFP_HARDWALL check is required or not..
>
>
> Changelog: 2010-10-26
>  - support gfp_t
>  - support zonelist/nodemask
>  - support [base, end)
>  - support alignment
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  include/linux/page-isolation.h |   15 ++
>  mm/page_alloc.c                |   29 ++++
>  mm/page_isolation.c            |  239 +++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 283 insertions(+)
>
> Index: mmotm-1024/mm/page_isolation.c
> ===================================================================
> --- mmotm-1024.orig/mm/page_isolation.c
> +++ mmotm-1024/mm/page_isolation.c
> @@ -5,6 +5,7 @@
>  #include <linux/mm.h>
>  #include <linux/page-isolation.h>
>  #include <linux/pageblock-flags.h>
> +#include <linux/swap.h>
>  #include <linux/memcontrol.h>
>  #include <linux/migrate.h>
>  #include <linux/memory_hotplug.h>
> @@ -398,3 +399,241 @@ retry:
>        }
>        return 0;
>  }
> +
> +/*
> + * Comparing user specified [user_start, user_end) with physical memory layout
> + * [phys_start, phys_end). If no intersection of length nr_pages, return 1.
> + * If there is an intersection, return 0 and fill range in [*start, *end)
> + */
> +static int
> +__calc_search_range(unsigned long user_start, unsigned long user_end,
> +               unsigned long nr_pages,
> +               unsigned long phys_start, unsigned long phys_end,
> +               unsigned long *start, unsigned long *end)
> +{
> +       if ((user_start >= phys_end) || (user_end <= phys_start))
> +               return 1;
> +       if (user_start <= phys_start) {
> +               *start = phys_start;
> +               *end = min(user_end, phys_end);
> +       } else {
> +               *start = user_start;
> +               *end = min(user_end, phys_end);
> +       }
> +       if (*end - *start < nr_pages)
> +               return 1;
> +       return 0;
> +}
> +
> +
> +/**
> + * __alloc_contig_pages - allocate a contiguous physical pages
> + * @base: the lowest pfn which caller wants.
> + * @end:  the highest pfn which caller wants.
> + * @nr_pages: the length of a chunk of pages to be allocated.
> + * @align_order: alignment of start address of returned chunk in order.
> + *   Returned' page's order will be aligned to (1 << align_order).If smaller
> + *   than MAX_ORDER, it's raised to MAX_ORDER.
> + * @node: allocate near memory to the node, If -1, current node is used.
> + * @gfpflag: used to specify what zone the memory should be from.
> + * @nodemask: allocate memory within the nodemask.
> + *
> + * Search a memory range [base, end) and allocates physically contiguous
> + * pages. If end - base is larger than nr_pages, a chunk in [base, end) will
> + * be allocated
> + *
> + * This returns a page of the beginning of contiguous block. At failure, NULL
> + * is returned.
> + *
> + * Limitation: at allocation, nr_pages may be increased to be aligned to
> + * MAX_ORDER before searching a range. So, even if there is a enough chunk
> + * for nr_pages, it may not be able to be allocated. Extra tail pages of
> + * allocated chunk is returned to buddy allocator before returning the caller.
> + */
> +
> +#define MIGRATION_RETRY        (5)
> +struct page *__alloc_contig_pages(unsigned long base, unsigned long end,
> +                       unsigned long nr_pages, int align_order,
> +                       int node, gfp_t gfpflag, nodemask_t *mask)
> +{
> +       unsigned long found, aligned_pages, start;
> +       struct page *ret = NULL;
> +       int migration_failed;
> +       bool no_search = false;
> +       unsigned long align_mask;
> +       struct zoneref *z;
> +       struct zone *zone;
> +       struct zonelist *zonelist;
> +       enum zone_type highzone_idx = gfp_zone(gfpflag);
> +       unsigned long zone_start, zone_end, rs, re, pos;
> +
> +       if (node == -1)
> +               node = numa_node_id();
> +
> +       /* check unsupported flags */
> +       if (gfpflag & __GFP_NORETRY)
> +               return NULL;
> +       if ((gfpflag & (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL)) !=
> +               (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL))
> +               return NULL;
> +
> +       if (gfpflag & __GFP_THISNODE)
> +               zonelist = &NODE_DATA(node)->node_zonelists[1];
> +       else
> +               zonelist = &NODE_DATA(node)->node_zonelists[0];
> +       /*
> +        * Base/nr_page/end should be aligned to MAX_ORDER
> +        */
> +       found = 0;
> +
> +       if (align_order < MAX_ORDER)
> +               align_order = MAX_ORDER;
> +
> +       align_mask = (1 << align_order) - 1;
> +       if (end - base == nr_pages)
> +               no_search = true;

no_search is not used?

> +       /*
> +        * We allocates MAX_ORDER aligned pages and cut tail pages later.
> +        */
> +       aligned_pages = ALIGN(nr_pages, (1 << MAX_ORDER));
> +       /*
> +        * If end - base == nr_pages, we can't search range. base must be
> +        * aligned.
> +        */
> +       if ((end - base == nr_pages) && (base & align_mask))
> +               return NULL;
> +
> +       base = ALIGN(base, (1 << align_order));
> +       if ((end <= base) || (end - base < aligned_pages))
> +               return NULL;
> +
> +       /*
> +        * searching contig memory range within [pos, end).
> +        * pos is updated at migration failure to find next chunk in zone.
> +        * pos is reset to the base at searching next zone.
> +        * (see for_each_zone_zonelist_nodemask in mmzone.h)
> +        *
> +        * Note: we cannot assume zones/nodes are in linear memory layout.
> +        */
> +       z = first_zones_zonelist(zonelist, highzone_idx, mask, &zone);
> +       pos = base;
> +retry:
> +       if (!zone)
> +               return NULL;
> +
> +       zone_start = ALIGN(zone->zone_start_pfn, 1 << align_order);
> +       zone_end = zone->zone_start_pfn + zone->spanned_pages;
> +
> +       /* check [pos, end) is in this zone. */
> +       if ((pos >= end) ||
> +            (__calc_search_range(pos, end, aligned_pages,
> +                       zone_start, zone_end, &rs, &re))) {
> +next_zone:
> +               /* go to the next zone */
> +               z = next_zones_zonelist(++z, highzone_idx, mask, &zone);
> +               /* reset the pos */
> +               pos = base;
> +               goto retry;
> +       }
> +       /* [pos, end) is trimmed to [rs, re) in this zone. */
> +       pos = rs;
> +
> +       found = find_contig_block(rs, re, aligned_pages, align_order, zone);
> +       if (!found)
> +               goto next_zone;
> +
> +       /*
> +        * OK, here, we have contiguous pageblock marked as "isolated"
> +        * try migration.
> +        */
> +       drain_all_pages();
> +       lru_add_drain_all();
> +
> +       /*
> +        * scan_lru_pages() finds the next PG_lru page in the range
> +        * scan_lru_pages() returns 0 when it reaches the end.
> +        */
> +       migration_failed = 0;
> +       rs = found;
> +       re = found + aligned_pages;
> +       for (rs = scan_lru_pages(rs, re);
> +            rs && rs < re;
> +            rs = scan_lru_pages(rs, re)) {
> +               if (do_migrate_range(rs, re)) {
> +                       /* it's better to try another block ? */
> +                       if (++migration_failed >= MIGRATION_RETRY)
> +                               break;
> +                       /* take a rest and synchronize LRU etc. */
> +                       drain_all_pages();
> +                       lru_add_drain_all();
> +               } else /* reset migration_failure counter */
> +                       migration_failed = 0;
> +       }
> +
> +       if (!migration_failed) {
> +               drain_all_pages();
> +               lru_add_drain_all();
> +       }
> +       /* Check all pages are isolated */
> +       if (test_pages_isolated(found, found + aligned_pages)) {
> +               undo_isolate_page_range(found, aligned_pages);
> +               /*
> +                * We failed at [found...found+aligned_pages) migration.
> +                * "rs" is the last pfn scan_lru_pages() found that the page
> +                * is LRU page. Update pos and try next chunk.
> +                */
> +               pos = ALIGN(rs + 1, (1 << align_order));
> +               goto retry; /* goto next chunk */
> +       }
> +       /*
> +        * OK, here, [found...found+pages) memory are isolated.
> +        * All pages in the range will be moved into the list with
> +        * page_count(page)=1.
> +        */
> +       ret = pfn_to_page(found);
> +       alloc_contig_freed_pages(found, found + aligned_pages, gfpflag);
> +       /* unset ISOLATE */
> +       undo_isolate_page_range(found, aligned_pages);
> +       /* Free unnecessary pages in tail */
> +       for (start = found + nr_pages; start < found + aligned_pages; start++)
> +               __free_page(pfn_to_page(start));
> +       return ret;
> +
> +}
> +EXPORT_SYMBOL_GPL(__alloc_contig_pages);
> +
> +void free_contig_pages(struct page *page, int nr_pages)
> +{
> +       int i;
> +       for (i = 0; i < nr_pages; i++)
> +               __free_page(page + i);
> +}
> +EXPORT_SYMBOL_GPL(free_contig_pages);
> +
> +/*
> + * Allocated pages will not be MOVABLE but MOVABLE zone is a suitable
> + * for allocating big chunk. So, using ZONE_MOVABLE is a default.
> + */
> +
> +struct page *alloc_contig_pages(unsigned long base, unsigned long end,
> +                       unsigned long nr_pages, int align_order)
> +{
> +       return __alloc_contig_pages(base, end, nr_pages, align_order, -1,
> +                               GFP_KERNEL | __GFP_MOVABLE, NULL);
> +}
> +EXPORT_SYMBOL_GPL(alloc_contig_pages);
> +
> +struct page *alloc_contig_pages_host(unsigned long nr_pages, int align_order)
> +{
> +       return __alloc_contig_pages(0, max_pfn, nr_pages, align_order, -1,
> +                               GFP_KERNEL | __GFP_MOVABLE, NULL);
> +}
> +EXPORT_SYMBOL_GPL(alloc_contig_pages_host);
> +
> +struct page *alloc_contig_pages_node(int nid, unsigned long nr_pages,
> +                               int align_order)
> +{
> +       return __alloc_contig_pages(0, max_pfn, nr_pages, align_order, nid,
> +                       GFP_KERNEL | __GFP_THISNODE | __GFP_MOVABLE, NULL);
> +}
> +EXPORT_SYMBOL_GPL(alloc_contig_pages_node);
> Index: mmotm-1024/include/linux/page-isolation.h
> ===================================================================
> --- mmotm-1024.orig/include/linux/page-isolation.h
> +++ mmotm-1024/include/linux/page-isolation.h
> @@ -32,6 +32,8 @@ test_pages_isolated(unsigned long start_
>  */
>  extern int set_migratetype_isolate(struct page *page);
>  extern void unset_migratetype_isolate(struct page *page);
> +extern void alloc_contig_freed_pages(unsigned long pfn,
> +               unsigned long pages, gfp_t flag);
>
>  /*
>  * For migration.
> @@ -41,4 +43,17 @@ int test_pages_in_a_zone(unsigned long s
>  unsigned long scan_lru_pages(unsigned long start, unsigned long end);
>  int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
>
> +/*
> + * For large alloc.
> + */
> +struct page *__alloc_contig_pages(unsigned long base, unsigned long end,
> +                               unsigned long nr_pages, int align_order,
> +                               int node, gfp_t flag, nodemask_t *mask);
> +struct page *alloc_contig_pages(unsigned long base, unsigned long end,
> +                               unsigned long nr_pages, int align_order);
> +struct page *alloc_contig_pages_host(unsigned long nr_pages, int align_order);
> +struct page *alloc_contig_pages_node(int nid, unsigned long nr_pages,
> +               int align_order);
> +void free_contig_pages(struct page *page, int nr_pages);
> +
>  #endif
> Index: mmotm-1024/mm/page_alloc.c
> ===================================================================
> --- mmotm-1024.orig/mm/page_alloc.c
> +++ mmotm-1024/mm/page_alloc.c
> @@ -5430,6 +5430,35 @@ out:
>        spin_unlock_irqrestore(&zone->lock, flags);
>  }
>
> +
> +void alloc_contig_freed_pages(unsigned long pfn,  unsigned long end, gfp_t flag)
> +{
> +       struct page *page;
> +       struct zone *zone;
> +       int order;
> +       unsigned long start = pfn;
> +
> +       zone = page_zone(pfn_to_page(pfn));
> +       spin_lock_irq(&zone->lock);
> +       while (pfn < end) {
> +               VM_BUG_ON(!pfn_valid(pfn));
> +               page = pfn_to_page(pfn);
> +               VM_BUG_ON(page_count(page));
> +               VM_BUG_ON(!PageBuddy(page));
> +               list_del(&page->lru);
> +               order = page_order(page);
> +               zone->free_area[order].nr_free--;
> +               rmv_page_order(page);
> +               __mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
> +               pfn += 1 << order;
> +       }
> +       spin_unlock_irq(&zone->lock);
> +
> +       /*After this, pages in the range can be freed one be one */
> +       for (pfn = start; pfn < end; pfn++)
> +               prep_new_page(pfn_to_page(pfn), 0, flag);
> +}
> +
>  #ifdef CONFIG_MEMORY_HOTREMOVE
>  /*
>  * All pages in the range must be isolated before calling this.
>
-- 
Thanks,
--Bob

^ permalink raw reply	[flat|nested] 46+ messages in thread
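
Since __calc_search_range() in the patch above is just interval clipping, here
is a standalone userspace sketch of the same logic with a single worked
example; the pfn values are assumptions for illustration only.

/*
 * Userspace sketch of the clipping done by __calc_search_range(): intersect
 * the caller's [user_start, user_end) with a physical span
 * [phys_start, phys_end), and report failure (1) if the overlap cannot hold
 * nr_pages.
 */
#include <stdio.h>

static int calc_search_range(unsigned long user_start, unsigned long user_end,
			     unsigned long nr_pages,
			     unsigned long phys_start, unsigned long phys_end,
			     unsigned long *start, unsigned long *end)
{
	if (user_start >= phys_end || user_end <= phys_start)
		return 1;				/* no overlap at all */
	*start = user_start > phys_start ? user_start : phys_start;
	*end   = user_end   < phys_end   ? user_end   : phys_end;
	return *end - *start < nr_pages;		/* 1: overlap too small */
}

int main(void)
{
	unsigned long s, e;

	/*
	 * The caller wants 0x1000 pages somewhere in [0x2000, 0x9000) and the
	 * zone spans [0x4000, 0x8000): the usable window is [0x4000, 0x8000).
	 */
	if (!calc_search_range(0x2000, 0x9000, 0x1000, 0x4000, 0x8000, &s, &e))
		printf("search [%#lx, %#lx)\n", s, e);
	return 0;
}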

* Re: [RFC][PATCH 2/3] a help function for find physically contiguous block.
  2010-10-29  3:53     ` Bob Liu
@ 2010-10-29  4:00       ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 46+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-10-29  4:00 UTC (permalink / raw)
  To: Bob Liu
  Cc: linux-mm, linux-kernel, minchan.kim, andi.kleen, KOSAKI Motohiro,
	fujita.tomonori, felipe.contreras

Thank you for the review.

On Fri, 29 Oct 2010 11:53:18 +0800
Bob Liu <lliubbo@gmail.com> wrote:

> On Tue, Oct 26, 2010 at 6:04 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> >
> > Unlike memory hotplug, at an allocation of a contiguous memory range, the address
> > may not be a problem. IOW, if a requester wants to allocate 100M of
> > contiguous memory, the placement of the allocated memory may not be a problem.
> > So, "finding a range of memory which seems to be MOVABLE" is required.
> >
> > This patch adds a function to isolate a length of memory within [start, end).
> > This function returns the pfn of the 1st page of the isolated contiguous chunk
> > of the given length within [start, end).
> >
> > After isolation, free memory within this area will never be allocated.
> > But some pages will remain as "Used/LRU" pages. They should be dropped by
> > page reclaim or migration.
> >
> > Changelog:
> >  - zone is added to the argument.
> >  - fixed a case that zones are not in linear.
> >  - added zone->lock.
> >
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> >  mm/page_isolation.c |  148 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 148 insertions(+)
> >
> > Index: mmotm-1024/mm/page_isolation.c
> > ===================================================================
> > --- mmotm-1024.orig/mm/page_isolation.c
> > +++ mmotm-1024/mm/page_isolation.c
> > @@ -7,6 +7,7 @@
> >  #include <linux/pageblock-flags.h>
> >  #include <linux/memcontrol.h>
> >  #include <linux/migrate.h>
> > +#include <linux/memory_hotplug.h>
> >  #include <linux/mm_inline.h>
> >  #include "internal.h"
> >
> > @@ -250,3 +251,150 @@ int do_migrate_range(unsigned long start
> >  out:
> >        return ret;
> >  }
> > +
> > +/*
> > + * Functions for getting contiguous MOVABLE pages in a zone.
> > + */
> > +struct page_range {
> > +       unsigned long base; /* Base address of searching contigouous block */
> > +       unsigned long end;
> > +       unsigned long pages;/* Length of contiguous block */
> > +       int align_order;
> > +       unsigned long align_mask;
> > +};
> > +
> > +int __get_contig_block(unsigned long pfn, unsigned long nr_pages, void *arg)
> > +{
> > +       struct page_range *blockinfo = arg;
> > +       unsigned long end;
> > +
> > +       end = pfn + nr_pages;
> > +       pfn = ALIGN(pfn, 1 << blockinfo->align_order);
> > +       end = end & ~(MAX_ORDER_NR_PAGES - 1);
> > +
> > +       if (end < pfn)
> > +               return 0;
> > +       if (end - pfn >= blockinfo->pages) {
> > +               blockinfo->base = pfn;
> > +               blockinfo->end = end;
> > +               return 1;
> > +       }
> > +       return 0;
> > +}
> > +
> > +static void __trim_zone(struct zone *zone, struct page_range *range)
> > +{
> > +       unsigned long pfn;
> > +       /*
> > +        * skip pages which are not under the zone.
> > +        * There are some archs whose zones are not in linear layout.
> > +        */
> > +       if (page_zone(pfn_to_page(range->base)) != zone) {
> > +               for (pfn = range->base;
> > +                       pfn < range->end;
> > +                       pfn += MAX_ORDER_NR_PAGES) {
> > +                       if (page_zone(pfn_to_page(pfn)) == zone)
> > +                               break;
> > +               }
> > +               range->base = min(pfn, range->end);
> > +       }
> > +       /* Here, range-> base is in the zone if range->base != range->end */
> > +       for (pfn = range->base;
> > +            pfn < range->end;
> > +            pfn += MAX_ORDER_NR_PAGES) {
> > +               if (zone != page_zone(pfn_to_page(pfn))) {
> > +                       pfn = pfn - MAX_ORDER_NR_PAGES;
> > +                       break;
> > +               }
> > +       }
> > +       range->end = min(pfn, range->end);
> > +       return;
> > +}
> > +
> > +/*
> > + * This function is for finding a contiguous memory block which has length
> > + * of pages and MOVABLE. If it finds, make the range of pages as ISOLATED
> > + * and return the first page's pfn.
> > + * This checks all pages in the returned range is free of Pg_LRU. To reduce
> > + * the risk of false-positive testing, lru_add_drain_all() should be called
> > + * before this function to reduce pages on pagevec for zones.
> > + */
> > +
> > +static unsigned long find_contig_block(unsigned long base,
> > +               unsigned long end, unsigned long pages,
> > +               int align_order, struct zone *zone)
> > +{
> > +       unsigned long pfn, pos;
> > +       struct page_range blockinfo;
> > +       int ret;
> > +
> > +       VM_BUG_ON(pages & (MAX_ORDER_NR_PAGES - 1));
> > +       VM_BUG_ON(base & ((1 << align_order) - 1));
> > +retry:
> > +       blockinfo.base = base;
> > +       blockinfo.end = end;
> > +       blockinfo.pages = pages;
> > +       blockinfo.align_order = align_order;
> > +       blockinfo.align_mask = (1 << align_order) - 1;
> > +       /*
> > +        * At first, check physical page layout and skip memory holes.
> > +        */
> > +       ret = walk_system_ram_range(base, end - base, &blockinfo,
> > +               __get_contig_block);
> > +       if (!ret)
> > +               return 0;
> > +       /* check contiguous pages in a zone */
> > +       __trim_zone(zone, &blockinfo);
> > +
> > +       /*
> > +        * Ok, we found contiguous memory chunk of size. Isolate it.
> > +        * We just search MAX_ORDER aligned range.
> > +        */
> > +       for (pfn = blockinfo.base; pfn + pages <= blockinfo.end;
> > +            pfn += (1 << align_order)) {
> > +               struct zone *z = page_zone(pfn_to_page(pfn));
> > +
> > +               spin_lock_irq(&z->lock);
> > +               pos = pfn;
> > +               /*
> > +                * Check the range only contains free pages or LRU pages.
> > +                */
> > +               while (pos < pfn + pages) {
> > +                       struct page *p;
> > +
> > +                       if (!pfn_valid_within(pos))
> > +                               break;
> > +                       p = pfn_to_page(pos);
> > +                       if (PageReserved(p))
> > +                               break;
> > +                       if (!page_count(p)) {
> > +                               if (!PageBuddy(p))
> > +                                       pos++;
> > +                               else if (PageBuddy(p)) {
> 
> Would just "else" be okay here?
> 
yes.


> > +                                       int order = page_order(p);
> > +                                       pos += (1 << order);
> > +                               }
> > +                       } else if (PageLRU(p)) {
> > +                               pos++;
> > +                       } else
> > +                               break;
> > +               }
> > +               spin_unlock_irq(&z->lock);
> > +               if ((pos == pfn + pages) &&
> > +                       !start_isolate_page_range(pfn, pfn + pages))
> > +                               return pfn;
> > +               if (pos & ((1 << align_order) - 1))
> > +                       pfn = ALIGN(pos, (1 << align_order));
> > +               else
> > +                       pfn = pos + (1 << align_order);
> 
> pfn has already been changed here, so why does the for loop still need
> pfn += (1 << align_order)?
> Or maybe I missed something.
> 
You're right. I'll fix.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 46+ messages in thread
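
To make the double advance Bob points out above (and Kame agrees to fix)
easier to see, here is a throwaway userspace toy that mimics the scan with
small made-up numbers: after the in-body reassignment of pfn, the for loop's
own pfn += (1 << align_order) runs as well, so the perfectly usable window
starting at pfn 8 is skipped and the program reports 16. This only
demonstrates the issue; it is not the actual fix.

/*
 * Userspace toy, assumed numbers: an 8-page window is searched in [0, 64)
 * with 8-pfn alignment, and only pfn 3 is "bad".  The correct answer would
 * be the window at pfn 8; because of the double advance it prints pfn 16.
 */
#include <stdio.h>

#define ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))

int main(void)
{
	unsigned long align = 8, pages = 8, end = 64;
	unsigned long pfn, pos;

	for (pfn = 0; pfn + pages <= end; pfn += align) {
		/* scan the window; stop at the first "bad" pfn (here: 3) */
		for (pos = pfn; pos < pfn + pages && pos != 3; pos++)
			;
		if (pos == pfn + pages) {
			printf("first candidate window: pfn %lu\n", pfn);
			return 0;
		}
		/* the same advance the patch does ... */
		if (pos & (align - 1))
			pfn = ALIGN(pos, align);
		else
			pfn = pos + align;
		/* ... and then the for loop adds 'align' on top of it */
	}
	return 0;
}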

* Re: [RFC][PATCH 3/3] a big contig memory allocator
  2010-10-29  3:55     ` Bob Liu
@ 2010-10-29  4:02       ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 46+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-10-29  4:02 UTC (permalink / raw)
  To: Bob Liu
  Cc: linux-mm, linux-kernel, minchan.kim, andi.kleen, KOSAKI Motohiro,
	fujita.tomonori, felipe.contreras

On Fri, 29 Oct 2010 11:55:10 +0800
Bob Liu <lliubbo@gmail.com> wrote:

> On Tue, Oct 26, 2010 at 6:08 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> >
> > Add an function to allocate contiguous memory larger than MAX_ORDER.
> > The main difference between usual page allocator is that this uses
> > memory offline technique (Isolate pages and migrate remaining pages.).
> >
> > I think this is not 100% solution because we can't avoid fragmentation,
> > but we have kernelcore= boot option and can create MOVABLE zone. That
> > helps us to allow allocate a contiguous range on demand.
> >
> > The new function is
> >
> >  alloc_contig_pages(base, end, nr_pages, alignment)
> >
> > This function will allocate contiguous pages of nr_pages from the range
> > [base, end). If [base, end) is bigger than nr_pages, some pfn which
> > meats alignment will be allocated. If alignment is smaller than MAX_ORDER,
> > it will be raised to be MAX_ORDER.
> >
> > __alloc_contig_pages() has much more arguments.
> >
> > Some drivers allocates contig pages by bootmem or hiding some memory
> > from the kernel at boot. But if contig pages are necessary only in some
> > situation, kernelcore= boot option and using page migration is a choice.
> >
> > Note: I'm not 100% sure __GFP_HARDWALL check is required or not..
> >
> >
> > Changelog: 2010-10-26
> >  - support gfp_t
> >  - support zonelist/nodemask
> >  - support [base, end)
> >  - support alignment
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> >  include/linux/page-isolation.h |   15 ++
> >  mm/page_alloc.c                |   29 ++++
> >  mm/page_isolation.c            |  239 +++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 283 insertions(+)
> >
> > Index: mmotm-1024/mm/page_isolation.c
> > ===================================================================
> > --- mmotm-1024.orig/mm/page_isolation.c
> > +++ mmotm-1024/mm/page_isolation.c
> > @@ -5,6 +5,7 @@
> >  #include <linux/mm.h>
> >  #include <linux/page-isolation.h>
> >  #include <linux/pageblock-flags.h>
> > +#include <linux/swap.h>
> >  #include <linux/memcontrol.h>
> >  #include <linux/migrate.h>
> >  #include <linux/memory_hotplug.h>
> > @@ -398,3 +399,241 @@ retry:
> >        }
> >        return 0;
> >  }
> > +
> > +/*
> > + * Comparing user specified [user_start, user_end) with physical memory layout
> > + * [phys_start, phys_end). If no intersection of length nr_pages, return 1.
> > + * If there is an intersection, return 0 and fill range in [*start, *end)
> > + */
> > +static int
> > +__calc_search_range(unsigned long user_start, unsigned long user_end,
> > +               unsigned long nr_pages,
> > +               unsigned long phys_start, unsigned long phys_end,
> > +               unsigned long *start, unsigned long *end)
> > +{
> > +       if ((user_start >= phys_end) || (user_end <= phys_start))
> > +               return 1;
> > +       if (user_start <= phys_start) {
> > +               *start = phys_start;
> > +               *end = min(user_end, phys_end);
> > +       } else {
> > +               *start = user_start;
> > +               *end = min(user_end, phys_end);
> > +       }
> > +       if (*end - *start < nr_pages)
> > +               return 1;
> > +       return 0;
> > +}
> > +
> > +
> > +/**
> > + * __alloc_contig_pages - allocate contiguous physical pages
> > + * @base: the lowest pfn which the caller wants.
> > + * @end:  the highest pfn which the caller wants.
> > + * @nr_pages: the length of the chunk of pages to be allocated.
> > + * @align_order: alignment of the start address of the returned chunk, as an order.
> > + *   The returned page's pfn will be aligned to (1 << align_order). If smaller
> > + *   than MAX_ORDER, it's raised to MAX_ORDER.
> > + * @node: allocate memory near this node. If -1, the current node is used.
> > + * @gfpflag: used to specify what zone the memory should be from.
> > + * @nodemask: allocate memory within the nodemask.
> > + *
> > + * Search the memory range [base, end) and allocate physically contiguous
> > + * pages. If end - base is larger than nr_pages, a chunk in [base, end) will
> > + * be allocated.
> > + *
> > + * This returns a page at the beginning of the contiguous block. On failure,
> > + * NULL is returned.
> > + *
> > + * Limitation: at allocation, nr_pages may be increased to be aligned to
> > + * MAX_ORDER before searching a range. So, even if there is a big enough chunk
> > + * for nr_pages, it may not be possible to allocate it. Extra tail pages of the
> > + * allocated chunk are returned to the buddy allocator before returning to
> > + * the caller.
> > + */
> > +
> > +#define MIGRATION_RETRY        (5)
> > +struct page *__alloc_contig_pages(unsigned long base, unsigned long end,
> > +                       unsigned long nr_pages, int align_order,
> > +                       int node, gfp_t gfpflag, nodemask_t *mask)
> > +{
> > +       unsigned long found, aligned_pages, start;
> > +       struct page *ret = NULL;
> > +       int migration_failed;
> > +       bool no_search = false;
> > +       unsigned long align_mask;
> > +       struct zoneref *z;
> > +       struct zone *zone;
> > +       struct zonelist *zonelist;
> > +       enum zone_type highzone_idx = gfp_zone(gfpflag);
> > +       unsigned long zone_start, zone_end, rs, re, pos;
> > +
> > +       if (node == -1)
> > +               node = numa_node_id();
> > +
> > +       /* check unsupported flags */
> > +       if (gfpflag & __GFP_NORETRY)
> > +               return NULL;
> > +       if ((gfpflag & (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL)) !=
> > +               (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL))
> > +               return NULL;
> > +
> > +       if (gfpflag & __GFP_THISNODE)
> > +               zonelist = &NODE_DATA(node)->node_zonelists[1];
> > +       else
> > +               zonelist = &NODE_DATA(node)->node_zonelists[0];
> > +       /*
> > +        * Base/nr_page/end should be aligned to MAX_ORDER
> > +        */
> > +       found = 0;
> > +
> > +       if (align_order < MAX_ORDER)
> > +               align_order = MAX_ORDER;
> > +
> > +       align_mask = (1 << align_order) - 1;
> > +       if (end - base == nr_pages)
> > +               no_search = true;
> 
> no_search is not used ?
> 
Ah, yes. I wanted to remove this and I missed this one.
But I have to check again whether the no_search check is required or not..
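
(As a side note, a minimal caller sketch for the two cases the kerneldoc above
describes. Everything here is hypothetical: base_pfn/end_pfn and the size are made
up, and GFP_KERNEL | __GFP_HARDWALL is just one mask that passes the flag check in
the patch.)

	unsigned long nr = 4096;	/* e.g. 16MB of 4K pages */
	struct page *pg;

	/* case 1: let the allocator search anywhere in [base_pfn, end_pfn) */
	pg = __alloc_contig_pages(base_pfn, end_pfn, nr, MAX_ORDER, -1,
				  GFP_KERNEL | __GFP_HARDWALL, NULL);

	/* case 2: fixed placement; end - base == nr_pages means
	 * "exactly this range or NULL" */
	pg = __alloc_contig_pages(base_pfn, base_pfn + nr, nr, MAX_ORDER, -1,
				  GFP_KERNEL | __GFP_HARDWALL, NULL);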

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC][PATCH 0/3] big chunk memory allocator v2
  2010-10-27 23:22   ` Minchan Kim
  (?)
@ 2010-10-29  9:20     ` Michał Nazarewicz
  -1 siblings, 0 replies; 46+ messages in thread
From: Michał Nazarewicz @ 2010-10-29  9:20 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki, Minchan Kim
  Cc: linux-mm, linux-kernel, andi.kleen, KOSAKI Motohiro,
	fujita.tomonori, felipe.contreras, linux-arm-kernel,
	Jonathan Corbet, Russell King, Pawel Osciak, Peter Zijlstra

> On Tue, Oct 26, 2010 at 7:00 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>> I only did small test and it seems to work (but I think there will be bug...)
>> I post this now just because I'll be out of office 10/31-11/15 with ksummit and
>> a private trip.
>>
>> Any comments are welcome but please see the interface is enough for use cases or
>> not.  For example) If MAX_ORDER alignment is too bad, I need to rewrite almost
>> all code.

On Thu, 28 Oct 2010 01:22:38 +0200, Minchan Kim <minchan.kim@gmail.com> wrote:
> First of all, thanks for your endless effort for embedded systems.
> It's time for stakeholders to review this.
> Cc'd some guys. Maybe many of them have to attend KS.
> So I hope the SAMSUNG guys review this.
>
> Maybe they can't test this since ARM doesn't support the movable zone now.
> (I will look into this).
> As Kame said, please review this patch and check whether it has enough
> interface and meets your requirements.
> I think this can't meet _all_ of your requirements (e.g., latency and
> making sure of getting big contiguous memory) but I believe it can meet
> many NOT CRITICAL cases, I guess.

I'm currently working on a framework (the CMA framework some may be aware of) which
in principle is meant for the same purpose: allocating physically contiguous blocks
of memory.  I'm hoping to help with latency, remove the need for MAX_ORDER alignment
as well as help with fragmentation by letting different drivers allocate memory from
different memory ranges.

When I was posting CMA, it had been suggested to create a new migration type
dedicated to contiguous allocations.  I think I already did that and thanks to
this new migration type we have (i) an area of memory that only accepts movable
and reclaimable pages and (ii) is used only if all other (non-reserved) pages have
been allocated.

I'm currently working on migration so that those movable and reclaimable pages
allocated in the area dedicated to CMA are freed, and Kame's work is quite helpful
in this regard as I have something to base my work on. :)

Nonetheless, it's conference time now (ELC, PLC; interestingly both are in
Cambridge :P) so I guess we, here at SPRC, will look into it more after PLC.

>> Now interface is:
>>
>> struct page *__alloc_contig_pages(unsigned long base, unsigned long end,
>>                        unsigned long nr_pages, int align_order,
>>                        int node, gfp_t gfpflag, nodemask_t *mask)
>>
>>  * @base: the lowest pfn which caller wants.
>>  * @end:  the highest pfn which caller wants.
>>  * @nr_pages: the length of a chunk of pages to be allocated.
>>  * @align_order: alignment of start address of returned chunk in order.
>>  *   Returned' page's order will be aligned to (1 << align_order).If smaller
>>  *   than MAX_ORDER, it's raised to MAX_ORDER.
>>  * @node: allocate near memory to the node, If -1, current node is used


PS. Please note that Pawel's new address is <pawel@osciak.com>.  Fixing in Cc.

-- 
Best regards,                                        _     _
| Humble Liege of Serenely Enlightened Majesty of  o' \,=./ `o
| Computer Science,  Michał "mina86" Nazarewicz       (o o)
+----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC][PATCH 0/3] big chunk memory allocator v2
  2010-10-29  9:20     ` Michał Nazarewicz
  (?)
@ 2010-10-29 10:31       ` Andi Kleen
  -1 siblings, 0 replies; 46+ messages in thread
From: Andi Kleen @ 2010-10-29 10:31 UTC (permalink / raw)
  To: Michał Nazarewicz
  Cc: KAMEZAWA Hiroyuki, Minchan Kim, linux-mm, linux-kernel,
	KOSAKI Motohiro, fujita.tomonori, felipe.contreras,
	linux-arm-kernel, Jonathan Corbet, Russell King, Pawel Osciak,
	Peter Zijlstra

> When I was posting CMA, it had been suggested to create a new migration type
> dedicated to contiguous allocations.  I think I already did that and thanks to
> this new migration type we have (i) an area of memory that only accepts movable
> and reclaimable pages and 

Aka highmem next generation :-(

> (ii) is used only if all other (non-reserved) pages have
> been allocated.

That will nearly always be the case after some uptime, as memory fills up
with caches. Unless you do early reclaim? 

-Andi


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC][PATCH 0/3] big chunk memory allocator v2
  2010-10-29 10:31       ` Andi Kleen
  (?)
@ 2010-10-29 10:59         ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 46+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-10-29 10:59 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Michał Nazarewicz, Minchan Kim, linux-mm, linux-kernel,
	KOSAKI Motohiro, fujita.tomonori, felipe.contreras,
	linux-arm-kernel, Jonathan Corbet, Russell King, Pawel Osciak,
	Peter Zijlstra

On Fri, 29 Oct 2010 12:31:54 +0200
Andi Kleen <andi.kleen@intel.com> wrote:

> > When I was posting CMA, it had been suggested to create a new migration type
> > dedicated to contiguous allocations.  I think I already did that and thanks to
> > this new migration type we have (i) an area of memory that only accepts movable
> > and reclaimable pages and 
> 
> Aka highmem next generation :-(
> 

Yes. But Nick's new shrink_slab() may be a new help even without a
new zone.


> > (ii) is used only if all other (non-reserved) pages have
> > been allocated.
> 
> That will be near always the case after some uptime, as memory fills up
> with caches. Unless you do early reclaim? 
> 

Memory migration always works with alloc_page() to get the migration target
pages. So, memory will be reclaimed if it is filled by cache.

About my patch, I may have to preallocate all required pages before starting.
But I didn't do that at this time.
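
(For illustration only, a rough sketch of such a migration-target callback; the
function name and GFP mask are made up, but the shape matches the usual new_page_t
callback handed to the migration code, so every target page goes through the normal
allocation/reclaim path when memory is full of cache.)

static struct page *contig_migrate_target(struct page *page,
					  unsigned long private, int **resultp)
{
	/* any page will do as a migration target; reclaim kicks in as needed */
	return alloc_page(GFP_HIGHUSER_MOVABLE);
}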

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC][PATCH 0/3] big chunk memory allocator v2
  2010-10-29 10:59         ` KAMEZAWA Hiroyuki
  (?)
@ 2010-10-29 12:29           ` Andi Kleen
  -1 siblings, 0 replies; 46+ messages in thread
From: Andi Kleen @ 2010-10-29 12:29 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Michał Nazarewicz, Minchan Kim, linux-mm, linux-kernel,
	KOSAKI Motohiro, fujita.tomonori, felipe.contreras,
	linux-arm-kernel, Jonathan Corbet, Russell King, Pawel Osciak,
	Peter Zijlstra

On Fri, Oct 29, 2010 at 11:59:00AM +0100, KAMEZAWA Hiroyuki wrote:
> On Fri, 29 Oct 2010 12:31:54 +0200
> Andi Kleen <andi.kleen@intel.com> wrote:
> 
> > > When I was posting CMA, it had been suggested to create a new migration type
> > > dedicated to contiguous allocations.  I think I already did that and thanks to
> > > this new migration type we have (i) an area of memory that only accepts movable
> > > and reclaimable pages and 
> > 
> > Aka highmem next generation :-(
> > 
> 
> yes. But Nick's new shrink_slab() may be a new help even without
> new zone.

You would really need callbacks into lots of code. Christoph
used to have some patches for directed shrink of dcache/icache,
but they are currently not on the table.

I don't think Nick's patch does that; he simply optimizes the existing
shrinker (which in practice tends to not shrink a lot) to be a bit
less wasteful.

The coverage will never be 100% in any case. So you always have to
make a choice between movable or fully usable. That's essentially
highmem with most of its problems.

> 
> 
> > > (ii) is used only if all other (non-reserved) pages have
> > > been allocated.
> > 
> > That will be near always the case after some uptime, as memory fills up
> > with caches. Unless you do early reclaim? 
> > 
> 
> memory migration always do work with alloc_page() for getting migration target
> pages. So, memory will be reclaimed if filled by cache.

I was talking about that paragraph about CMA, not your patch.

If I understand it correctly, CMA wants to define
a new zone which is somehow similar to movable, but only sometimes used
when another zone is full (which is actually the usual state in normal
operation).

It was unclear to me how this was all supposed to work. At least
as described in that paragraph it cannot, I think.


> About my patch, I may have to prealloc all required pages before start.
> But I didn't do that at this time.

Preallocate when? I thought the whole point of the large memory allocator
was to not have to pre-allocate.

-Andi

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC][PATCH 0/3] big chunk memory allocator v2
  2010-10-29 12:29           ` Andi Kleen
  (?)
@ 2010-10-29 12:31             ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 46+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-10-29 12:31 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Michał Nazarewicz, Minchan Kim, linux-mm, linux-kernel,
	KOSAKI Motohiro, fujita.tomonori, felipe.contreras,
	linux-arm-kernel, Jonathan Corbet, Russell King, Pawel Osciak,
	Peter Zijlstra

On Fri, 29 Oct 2010 14:29:28 +0200
Andi Kleen <andi.kleen@intel.com> wrote:

> 
> > About my patch, I may have to prealloc all required pages before start.
> > But I didn't do that at this time.
> 
> preallocate when? I thought the whole point of the large memory allocator
> was to not have to pre-allocate.
> 

Yes, one-by-one allocation keeps the allocation from being a sudden attack on the system.
I just wonder about adding a knob for "migrate pages here" :)


Thanks,
-Kame


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC][PATCH 0/3] big chunk memory allocator v2
  2010-10-29 12:29           ` Andi Kleen
  (?)
@ 2010-10-29 12:43             ` Michał Nazarewicz
  -1 siblings, 0 replies; 46+ messages in thread
From: Michał Nazarewicz @ 2010-10-29 12:43 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki, Andi Kleen
  Cc: Minchan Kim, linux-mm, linux-kernel, KOSAKI Motohiro,
	fujita.tomonori, felipe.contreras, linux-arm-kernel,
	Jonathan Corbet, Russell King, Pawel Osciak, Peter Zijlstra

>>>> When I was posting CMA, it had been suggested to create a new migration type
>>>> dedicated to contiguous allocations.  I think I already did that and thanks to
>>>> this new migration type we have (i) an area of memory that only accepts movable
>>>> and reclaimable pages and

>> Andi Kleen <andi.kleen@intel.com> wrote:
>>> Aka highmem next generation :-(

> On Fri, Oct 29, 2010 at 11:59:00AM +0100, KAMEZAWA Hiroyuki wrote:
>> yes. But Nick's new shrink_slab() may be a new help even without
>> new zone.

On Fri, 29 Oct 2010 14:29:28 +0200, Andi Kleen <andi.kleen@intel.com> wrote:
> You would really need callbacks into lots of code. Christoph
> used to have some patches for directed shrink of dcache/icache,
> but they are currently not on the table.
>
> I don't think Nick's patch does that, he simply optimizes the existing
> shrinker (which in practice tends to not shrink a lot) to be a bit
> less wasteful.
>
> The coverage will never be 100% in any case. So you always have to
> make a choice between movable or fully usable. That's essentially
> highmem with most of its problems.

Yep.

>>>> (ii) is used only if all other (non-reserved) pages have
>>>> been allocated.

>>> That will be near always the case after some uptime, as memory fills up
>>> with caches. Unless you do early reclaim?

Hmm... true.  Still, the point remains that only movable and reclaimable pages are
allowed in the marked regions.  This in effect means that from the unmovable pages'
point of view, the area is unusable, but I haven't thought of any other way to
guarantee that, in spite of fragmentation, a long sequence of free/movable/reclaimable
pages is available.

>> memory migration always do work with alloc_page() for getting migration target
>> pages. So, memory will be reclaimed if filled by cache.
>
> Was talking about that paragraph CMA, not your patch.
>
> If I understand it correctly CMA wants to define
> a new zone which is somehow similar to movable, but only sometimes used
> when another zone is full (which is the usual state in normal
> operation actually)
>
> It was unclear to me how this was all supposed to work. At least
> as described in the paragraph it cannot I think.

It's not a new zone, just a new migrate type.  I haven't tested it yet,
but the idea is that once a pageblock's migrate type is set to this
new MIGRATE_CMA type, the buddy allocator never changes it, and in the
fallback list it's put at the end of the entries for MIGRATE_RECLAIMABLE
and MIGRATE_MOVABLE.

If I got everything right, this means that pages from MIGRATE_CMA pageblocks
are available for movable and reclaimable allocations but not for unmovable ones.
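
(A tiny sketch of that policy, purely illustrative: MIGRATE_CMA is the proposed new
type, the helper name is made up, and the real change would live in the buddy
fallback handling in mm/page_alloc.c.)

/*
 * A MIGRATE_CMA pageblock keeps its type forever and is offered only as a
 * last-resort fallback to reclaimable/movable requests, never to unmovable ones.
 */
static inline bool can_fall_back_to_cma(int start_migratetype)
{
	return start_migratetype == MIGRATE_MOVABLE ||
	       start_migratetype == MIGRATE_RECLAIMABLE;
}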

-- 
Best regards,                                        _     _
| Humble Liege of Serenely Enlightened Majesty of  o' \,=./ `o
| Computer Science,  Michał "mina86" Nazarewicz       (o o)
+----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC][PATCH 0/3] big chunk memory allocator v2
  2010-10-29 10:31       ` Andi Kleen
  (?)
@ 2010-10-29 13:11         ` Minchan Kim
  -1 siblings, 0 replies; 46+ messages in thread
From: Minchan Kim @ 2010-10-29 13:11 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Michał Nazarewicz, KAMEZAWA Hiroyuki, linux-mm,
	linux-kernel, KOSAKI Motohiro, fujita.tomonori, felipe.contreras,
	linux-arm-kernel, Jonathan Corbet, Russell King, Pawel Osciak,
	Peter Zijlstra

2010/10/29 Andi Kleen <andi.kleen@intel.com>:
>> When I was posting CMA, it had been suggested to create a new migration type
>> dedicated to contiguous allocations.  I think I already did that and thanks to
>> this new migration type we have (i) an area of memory that only accepts movable
>> and reclaimable pages and
>
> Aka highmem next generation :-(

I'm lost here. What is "highmem next generation"?
Could you point me to it?

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC][PATCH 0/3] big chunk memory allocator v2
  2010-10-29 12:43             ` Michał Nazarewicz
  (?)
@ 2010-10-29 14:27               ` Andi Kleen
  -1 siblings, 0 replies; 46+ messages in thread
From: Andi Kleen @ 2010-10-29 14:27 UTC (permalink / raw)
  To: Michał Nazarewicz
  Cc: KAMEZAWA Hiroyuki, Minchan Kim, linux-mm, linux-kernel,
	KOSAKI Motohiro, fujita.tomonori, felipe.contreras,
	linux-arm-kernel, Jonathan Corbet, Russell King, Pawel Osciak,
	Peter Zijlstra

On Fri, Oct 29, 2010 at 01:43:51PM +0100, Michał Nazarewicz wrote:
> >>>> (ii) is used only if all other (non-reserved) pages have
> >>>> been allocated.
> 
> >>> That will be near always the case after some uptime, as memory fills up
> >>> with caches. Unless you do early reclaim?
> 
> Hmm... true.  Still the point remains that only movable and reclaimable pages are
> allowed in the marked regions.  This in effect means that from unmovable pages
> point of view, the area is unusable but I haven't thought of any other way to
> guarantee that because of fragmentation, long sequence of free/movable/reclaimable
> pages is available.

Essentially a movable zone as defined today.

That gets you nearly all the problems of highmem (except for the mapping
problem, and you're a bit more flexible in the splits):

Someone has to decide at boot how much memory should be movable
and how much should not, some workloads will run out of space, some may
deadlock when the kernel runs out of management objects, etc.
Classic highmem had a long string of issues with all of this.

If it were an easy problem it would have been solved long ago, but it isn't.
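
For reference, that boot-time split is exactly what the existing ZONE_MOVABLE
knobs already ask for; kernelcore=/movablecore= are the real parameters, the
sizes below are only illustrative:

	# keep ~512M for unmovable kernel allocations,
	# everything else becomes ZONE_MOVABLE
	kernelcore=512M

	# or size it from the other direction: give ~256M to ZONE_MOVABLE
	movablecore=256M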

-Andi


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC][PATCH 0/3] big chunk memory allocator v2
  2010-10-29 14:27               ` Andi Kleen
  (?)
@ 2010-10-29 14:58                 ` Michał Nazarewicz
  -1 siblings, 0 replies; 46+ messages in thread
From: Michał Nazarewicz @ 2010-10-29 14:58 UTC (permalink / raw)
  To: Andi Kleen
  Cc: KAMEZAWA Hiroyuki, Minchan Kim, linux-mm, linux-kernel,
	KOSAKI Motohiro, fujita.tomonori, felipe.contreras,
	linux-arm-kernel, Jonathan Corbet, Russell King, Pawel Osciak,
	Peter Zijlstra

On Fri, 29 Oct 2010 16:27:41 +0200, Andi Kleen <andi.kleen@intel.com> wrote:

> On Fri, Oct 29, 2010 at 01:43:51PM +0100, Michał Nazarewicz wrote:
>> Hmm... true.  Still the point remains that only movable and reclaimable pages
>> are allowed in the marked regions.  In effect this means that, from an unmovable
>> page's point of view, the area is unusable, but I haven't thought of any other
>> way to guarantee that, despite fragmentation, a long sequence of
>> free/movable/reclaimable pages is available.

> Essentially a movable zone as defined today.

Ah, right, I was somehow under the impression that the movable zone can be used
as a fallback zone.  When I'm finished with my current approach I'll look more
closely into it.
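
(For my own reference, the constraint I had missed, as a minimal sketch -- this
is not the real gfp_zone()/zonelist code, just the rule it ends up enforcing:)

#include <linux/gfp.h>
#include <linux/mmzone.h>

/*
 * Illustrative helper only: ZONE_MOVABLE never serves as a fallback for
 * unmovable requests -- a page can land there only if the allocation was
 * explicitly flagged __GFP_MOVABLE.  Everything else has to come from the
 * "normal" zones, which is why unmovable allocations cannot spill over.
 */
static bool zone_allowed_for(gfp_t gfp, enum zone_type zone)
{
	if (zone == ZONE_MOVABLE)
		return (gfp & __GFP_MOVABLE) != 0;
	return true;
}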

> That gets you nearly all the problems of highmem (except for the mapping
> problem, and you're a bit more flexible in the splits):
>
> Someone has to decide at boot how much memory should be movable
> and how much should not, some workloads will run out of space, some may
> deadlock when the kernel runs out of management objects, etc.
> Classic highmem had a long string of issues with all of this.

Here's where the rest of CMA comes in.  The solution may not be perfect, but it's
probably better than nothing.  The idea is to define regions for each device
(with the possibility of a single region being shared by several devices), which,
hopefully, can help with fragmentation.

In its current form, CMA is designed mostly for embedded systems, where one can
define in advance what kinds of devices will be used, but in general this could
be used for other systems as well.
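
To make that concrete, here is a rough sketch of what such a per-device region
table could look like -- the struct name, fields and numbers are purely
illustrative, not the actual CMA interface:

/* Hypothetical example, not real code from the patches. */
struct cma_region_sketch {
	unsigned long	start_pfn;	/* first pfn of the reserved range  */
	unsigned long	nr_pages;	/* length of the range in pages     */
	const char	*owner;		/* e.g. "camera", "video-codec"     */
};

/*
 * Two devices with private regions plus one shared region that both the
 * camera and the codec are allowed to allocate their buffers from.
 */
static struct cma_region_sketch regions[] = {
	{ .start_pfn = 0x40000, .nr_pages = 4096, .owner = "camera"       },
	{ .start_pfn = 0x41000, .nr_pages = 8192, .owner = "video-codec"  },
	{ .start_pfn = 0x43000, .nr_pages = 4096, .owner = "camera+codec" },
};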

-- 
Best regards,                                        _     _
| Humble Liege of Serenely Enlightened Majesty of  o' \,=./ `o
| Computer Science,  Michał "mina86" Nazarewicz       (o o)
+----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2010-10-29 14:58 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-26 10:00 [RFC][PATCH 0/3] big chunk memory allocator v2 KAMEZAWA Hiroyuki
2010-10-26 10:00 ` KAMEZAWA Hiroyuki
2010-10-26 10:02 ` [RFC][PATCH 1/3] move code from memory_hotplug to page_isolation KAMEZAWA Hiroyuki
2010-10-26 10:02   ` KAMEZAWA Hiroyuki
2010-10-26 10:04 ` [RFC][PATCH 2/3] a help function for find physically contiguous block KAMEZAWA Hiroyuki
2010-10-26 10:04   ` KAMEZAWA Hiroyuki
2010-10-29  3:53   ` Bob Liu
2010-10-29  3:53     ` Bob Liu
2010-10-29  4:00     ` KAMEZAWA Hiroyuki
2010-10-29  4:00       ` KAMEZAWA Hiroyuki
2010-10-26 10:08 ` [RFC][PATCH 3/3] a big contig memory allocator KAMEZAWA Hiroyuki
2010-10-26 10:08   ` KAMEZAWA Hiroyuki
2010-10-29  3:55   ` Bob Liu
2010-10-29  3:55     ` Bob Liu
2010-10-29  4:02     ` KAMEZAWA Hiroyuki
2010-10-29  4:02       ` KAMEZAWA Hiroyuki
2010-10-27 23:22 ` [RFC][PATCH 0/3] big chunk memory allocator v2 Minchan Kim
2010-10-27 23:22   ` Minchan Kim
2010-10-27 23:22   ` Minchan Kim
2010-10-29  9:20   ` Michał Nazarewicz
2010-10-29  9:20     ` Michał Nazarewicz
2010-10-29  9:20     ` Michał Nazarewicz
2010-10-29 10:31     ` Andi Kleen
2010-10-29 10:31       ` Andi Kleen
2010-10-29 10:31       ` Andi Kleen
2010-10-29 10:59       ` KAMEZAWA Hiroyuki
2010-10-29 10:59         ` KAMEZAWA Hiroyuki
2010-10-29 10:59         ` KAMEZAWA Hiroyuki
2010-10-29 12:29         ` Andi Kleen
2010-10-29 12:29           ` Andi Kleen
2010-10-29 12:29           ` Andi Kleen
2010-10-29 12:31           ` KAMEZAWA Hiroyuki
2010-10-29 12:31             ` KAMEZAWA Hiroyuki
2010-10-29 12:31             ` KAMEZAWA Hiroyuki
2010-10-29 12:43           ` Michał Nazarewicz
2010-10-29 12:43             ` Michał Nazarewicz
2010-10-29 12:43             ` Michał Nazarewicz
2010-10-29 14:27             ` Andi Kleen
2010-10-29 14:27               ` Andi Kleen
2010-10-29 14:27               ` Andi Kleen
2010-10-29 14:58               ` Michał Nazarewicz
2010-10-29 14:58                 ` Michał Nazarewicz
2010-10-29 14:58                 ` Michał Nazarewicz
2010-10-29 13:11       ` Minchan Kim
2010-10-29 13:11         ` Minchan Kim
2010-10-29 13:11         ` Minchan Kim
