* [PATCH -v16 00/35] Use lmb with x86
@ 2010-05-14  0:19 Yinghai Lu
  2010-05-14  0:19 ` [PATCH 01/35] lmb: prepare x86 to use lmb to replace early_res Yinghai Lu
                   ` (34 more replies)
  0 siblings, 35 replies; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu

The new lmb code can be used to replace early_res on x86.

Suggested by: David, Ben, and Thomas

-v6: change sequence as requested by Thomas
-v7: separate them into more patches
-v8: add boundary checking to make sure we do not free partial pages.
-v9: use lmb_debug to control the print out of reserve_lmb.
     add e820 clean up, and e820 becomes __initdata
-v10:use lmb.rmo_size and ARCH_DISCARD_LMB according to Michael
     change names to lmb_find_area/reserve_lmb_area/free_lmb_area,
      according to Michael
     update find_lmb_area to use __lmb_alloc_base according to Ben
-v11:move find_lmb_area_size back to x86.
     x86 has its own find_lmb_area, and it can be disabled by ARCH_LMB_FIND_AREA,
      because _lmb_find_base has different behavior from x86's old one:
      one searches from high to low, the other from low to high.
      needs more testing.
     tested for x86 32bit/64bit, numa/nonuma, nobootmem/bootmem.
-v12:refresh the series against current tip
     separate out nobootmem.c, so some #ifdefs can be removed
     still keep CONFIG_NO_BOOTMEM in the x86 .c files; they can be used
      as tags, so other lmb arches can refer to them to use NO_BOOTMEM.

-v14:refresh to current tip

-v15:remove the x86 version of lmb_find_area
     remove other nobootmem and x86 e820 changes from this patchset

-v16: rebase onto Ben's powerpc/lmb cleanup
     move most functions back to arch/x86/mm/lmb.c

This patch set is based on tip/master plus powerpc/lmb.

todo:
	1. use for_each_lmb to replace open-coded for loops (see the sketch below)
	2. replace range handling with lmb.
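
Item 1, in miniature -- a sketch assuming the for_each_lmb() iterator
from <linux/lmb.h>; the body is a made-up user:

	struct lmb_region *reg;

	for_each_lmb(memory, reg)
		do_something(reg->base, reg->size);	/* hypothetical */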

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 99+ messages in thread

* [PATCH 01/35] lmb: prepare x86 to use lmb to replace early_res
  2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
@ 2010-05-14  0:19 ` Yinghai Lu
  2010-05-14  2:12   ` Benjamin Herrenschmidt
  2010-05-14  0:19 ` [PATCH 02/35] lmb: Prepare to include linux/lmb.h in core file Yinghai Lu
                   ` (33 subsequent siblings)
  34 siblings, 1 reply; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu

1. expose lmb_debug
2. expose lmb_reserved_init_regions
3. expose lmb_add_region
4. protection for including linux/lmb.h in mm/page_alloc.c and mm/bootmem.c
5. lmb_find_base() should return LMB_ERROR in one failing path.
   (this one cost me 3 hours! see the sketch below)
6. move LMB_ERROR to lmb.h
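
Why (5) matters, in miniature: 0 can be a valid physical address, so
callers must compare against the sentinel (sketch, not from this patch):

	phys_addr_t addr = lmb_find_base(size, align, start, end);

	if (addr == LMB_ERROR)		/* not "if (!addr)" */
		return -ENOMEM;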

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 include/linux/lmb.h |    4 ++++
 lib/lmb.c           |   21 +++++++++------------
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/include/linux/lmb.h b/include/linux/lmb.h
index 6f8c4bd..7987766 100644
--- a/include/linux/lmb.h
+++ b/include/linux/lmb.h
@@ -19,6 +19,7 @@
 #include <asm/lmb.h>
 
 #define INIT_LMB_REGIONS 128
+#define LMB_ERROR	(~(phys_addr_t)0)
 
 struct lmb_region {
 	phys_addr_t base;
@@ -39,6 +40,8 @@ struct lmb {
 };
 
 extern struct lmb lmb;
+extern int lmb_debug;
+extern struct lmb_region lmb_reserved_init_regions[];
 
 extern void __init lmb_init(void);
 extern void __init lmb_analyze(void);
@@ -61,6 +64,7 @@ extern phys_addr_t __init lmb_alloc(phys_addr_t size, phys_addr_t align);
 #define LMB_ALLOC_ANYWHERE	(~(phys_addr_t)0)
 #define LMB_ALLOC_ACCESSIBLE	0
 
+long lmb_add_region(struct lmb_type *type, phys_addr_t base, phys_addr_t size);
 extern phys_addr_t __init lmb_alloc_base(phys_addr_t size,
 					 phys_addr_t align,
 					 phys_addr_t max_addr);
diff --git a/lib/lmb.c b/lib/lmb.c
index 2cd5aaa..fddd72c 100644
--- a/lib/lmb.c
+++ b/lib/lmb.c
@@ -22,11 +22,9 @@
 
 struct lmb lmb;
 
-static int lmb_debug;
+int lmb_debug;
 static struct lmb_region lmb_memory_init_regions[INIT_LMB_REGIONS + 1];
-static struct lmb_region lmb_reserved_init_regions[INIT_LMB_REGIONS + 1];
-
-#define LMB_ERROR	(~(phys_addr_t)0)
+struct lmb_region lmb_reserved_init_regions[INIT_LMB_REGIONS + 1];
 
 /* inline so we don't get a warning when pr_debug is compiled out */
 static inline const char *lmb_type_name(struct lmb_type *type)
@@ -154,7 +152,7 @@ static phys_addr_t __init lmb_find_base(phys_addr_t size, phys_addr_t align,
 		if (found != LMB_ERROR)
 			return found;
 	}
-	return 0;
+	return LMB_ERROR;
 }
 
 static void lmb_remove_region(struct lmb_type *type, unsigned long r)
@@ -176,17 +174,12 @@ static void lmb_coalesce_regions(struct lmb_type *type,
 	lmb_remove_region(type, r2);
 }
 
-/* Defined below but needed now */
-static long lmb_add_region(struct lmb_type *type, phys_addr_t base, phys_addr_t size);
-
 static int lmb_double_array(struct lmb_type *type)
 {
 	struct lmb_region *new_array, *old_array;
 	phys_addr_t old_size, new_size, addr;
 	int use_slab = slab_is_available();
 
-	pr_debug("lmb: %s array full, doubling...", lmb_type_name(type));
-
 	/* Calculate new doubled size */
 	old_size = type->max * sizeof(struct lmb_region);
 	new_size = old_size << 1;
@@ -206,7 +199,7 @@ static int lmb_double_array(struct lmb_type *type)
 		new_array = kmalloc(new_size, GFP_KERNEL);
 		addr = new_array == NULL ? LMB_ERROR : __pa(new_array);
 	} else
-		addr = lmb_find_base(new_size, sizeof(phys_addr_t), 0, LMB_ALLOC_ACCESSIBLE);
+		addr = lmb_find_base(new_size, sizeof(struct lmb_region), 0, LMB_ALLOC_ACCESSIBLE);
 	if (addr == LMB_ERROR) {
 		pr_err("lmb: Failed to double %s array from %ld to %ld entries !\n",
 		       lmb_type_name(type), type->max, type->max * 2);
@@ -214,6 +207,10 @@ static int lmb_double_array(struct lmb_type *type)
 	}
 	new_array = __va(addr);
 
+	if (lmb_debug)
+		pr_info("lmb: %s array is doubled to %ld at %llx - %llx",
+			 lmb_type_name(type), type->max * 2, (u64)addr, (u64)addr + new_size);
+
 	/* Found space, we now need to move the array over before
 	 * we add the reserved region since it may be our reserved
 	 * array itself that is full.
@@ -249,7 +246,7 @@ extern int __weak lmb_memory_can_coalesce(phys_addr_t addr1, phys_addr_t size1,
 	return 1;
 }
 
-static long lmb_add_region(struct lmb_type *type, phys_addr_t base, phys_addr_t size)
+long lmb_add_region(struct lmb_type *type, phys_addr_t base, phys_addr_t size)
 {
 	unsigned long coalesced = 0;
 	long adjacent, i;
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 02/35] lmb: Prepare to include linux/lmb.h in core file
  2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
  2010-05-14  0:19 ` [PATCH 01/35] lmb: prepare x86 to use lmb to replace early_res Yinghai Lu
@ 2010-05-14  0:19 ` Yinghai Lu
  2010-05-14  0:19 ` [PATCH 03/35] lmb: Add ARCH_DISCARD_LMB to put lmb code to .init Yinghai Lu
                   ` (32 subsequent siblings)
  34 siblings, 0 replies; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu

Add protection in linux/lmb.h, to prepare for including it from
mm/page_alloc.c, mm/bootmem.c, etc.
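
With the guard in place, generic code can include the header
unconditionally, even on arches without lmb (sketch of a later user):

	#include <linux/lmb.h>	/* body compiles away if CONFIG_HAVE_LMB is unset */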

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 include/linux/lmb.h |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/include/linux/lmb.h b/include/linux/lmb.h
index 7987766..2b87641 100644
--- a/include/linux/lmb.h
+++ b/include/linux/lmb.h
@@ -2,6 +2,7 @@
 #define _LINUX_LMB_H
 #ifdef __KERNEL__
 
+#ifdef CONFIG_HAVE_LMB
 /*
  * Logical memory blocks.
  *
@@ -144,6 +145,8 @@ static inline unsigned long lmb_region_pages(const struct lmb_region *reg)
 	     region++)
 
 
+#endif /* CONFIG_HAVE_LMB */
+
 #endif /* __KERNEL__ */
 
 #endif /* _LINUX_LMB_H */
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 03/35] lmb: Add ARCH_DISCARD_LMB to put lmb code to .init
  2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
  2010-05-14  0:19 ` [PATCH 01/35] lmb: prepare x86 to use lmb to replace early_res Yinghai Lu
  2010-05-14  0:19 ` [PATCH 02/35] lmb: Prepare to include linux/lmb.h in core file Yinghai Lu
@ 2010-05-14  0:19 ` Yinghai Lu
  2010-05-14  2:14   ` Benjamin Herrenschmidt
  2010-05-14  0:19 ` [PATCH 04/35] lmb: Add lmb_find_area() Yinghai Lu
                   ` (31 subsequent siblings)
  34 siblings, 1 reply; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu

So those lmb bits can be released after the kernel has booted up.

Arch code can define ARCH_DISCARD_LMB in asm/lmb.h; then
__init_lmb becomes __init, and __initdata_lmb becomes __initdata.

x86 code will use that.

-v2: use ARCH_DISCARD_LMB according to Michael Ellerman
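
How an arch opts in, in miniature (x86 does exactly this later in the
series, in arch/x86/include/asm/lmb.h):

	#define ARCH_DISCARD_LMB

With that defined, annotated lmb symbols are discarded after boot:

	struct lmb lmb __initdata_lmb;		/* becomes __initdata */
	void __init_lmb lmb_dump_all(void);	/* becomes __init */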

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 include/linux/lmb.h |    8 ++++++++
 lib/lmb.c           |   46 +++++++++++++++++++++++-----------------------
 2 files changed, 31 insertions(+), 23 deletions(-)

diff --git a/include/linux/lmb.h b/include/linux/lmb.h
index 2b87641..0b073a3 100644
--- a/include/linux/lmb.h
+++ b/include/linux/lmb.h
@@ -145,6 +145,14 @@ static inline unsigned long lmb_region_pages(const struct lmb_region *reg)
 	     region++)
 
 
+#ifdef ARCH_DISCARD_LMB
+#define __init_lmb __init
+#define __initdata_lmb __initdata
+#else
+#define __init_lmb
+#define __initdata_lmb
+#endif
+
 #endif /* CONFIG_HAVE_LMB */
 
 #endif /* __KERNEL__ */
diff --git a/lib/lmb.c b/lib/lmb.c
index fddd72c..6d49a17 100644
--- a/lib/lmb.c
+++ b/lib/lmb.c
@@ -20,11 +20,11 @@
 #include <linux/seq_file.h>
 #include <linux/lmb.h>
 
-struct lmb lmb;
+struct lmb lmb __initdata_lmb;
 
-int lmb_debug;
-static struct lmb_region lmb_memory_init_regions[INIT_LMB_REGIONS + 1];
-struct lmb_region lmb_reserved_init_regions[INIT_LMB_REGIONS + 1];
+int lmb_debug __initdata_lmb;
+static struct lmb_region lmb_memory_init_regions[INIT_LMB_REGIONS + 1] __initdata_lmb;
+struct lmb_region lmb_reserved_init_regions[INIT_LMB_REGIONS + 1] __initdata_lmb;
 
 /* inline so we don't get a warning when pr_debug is compiled out */
 static inline const char *lmb_type_name(struct lmb_type *type)
@@ -41,23 +41,23 @@ static inline const char *lmb_type_name(struct lmb_type *type)
  * Address comparison utilities
  */
 
-static phys_addr_t lmb_align_down(phys_addr_t addr, phys_addr_t size)
+static phys_addr_t __init_lmb lmb_align_down(phys_addr_t addr, phys_addr_t size)
 {
 	return addr & ~(size - 1);
 }
 
-static phys_addr_t lmb_align_up(phys_addr_t addr, phys_addr_t size)
+static phys_addr_t __init_lmb lmb_align_up(phys_addr_t addr, phys_addr_t size)
 {
 	return (addr + (size - 1)) & ~(size - 1);
 }
 
-static unsigned long lmb_addrs_overlap(phys_addr_t base1, phys_addr_t size1,
+static unsigned long __init_lmb lmb_addrs_overlap(phys_addr_t base1, phys_addr_t size1,
 				       phys_addr_t base2, phys_addr_t size2)
 {
 	return ((base1 < (base2 + size2)) && (base2 < (base1 + size1)));
 }
 
-static long lmb_addrs_adjacent(phys_addr_t base1, phys_addr_t size1,
+static long __init_lmb lmb_addrs_adjacent(phys_addr_t base1, phys_addr_t size1,
 			       phys_addr_t base2, phys_addr_t size2)
 {
 	if (base2 == base1 + size1)
@@ -68,7 +68,7 @@ static long lmb_addrs_adjacent(phys_addr_t base1, phys_addr_t size1,
 	return 0;
 }
 
-static long lmb_regions_adjacent(struct lmb_type *type,
+static long __init_lmb lmb_regions_adjacent(struct lmb_type *type,
 				 unsigned long r1, unsigned long r2)
 {
 	phys_addr_t base1 = type->regions[r1].base;
@@ -79,7 +79,7 @@ static long lmb_regions_adjacent(struct lmb_type *type,
 	return lmb_addrs_adjacent(base1, size1, base2, size2);
 }
 
-long lmb_overlaps_region(struct lmb_type *type, phys_addr_t base, phys_addr_t size)
+long __init_lmb lmb_overlaps_region(struct lmb_type *type, phys_addr_t base, phys_addr_t size)
 {
 	unsigned long i;
 
@@ -155,7 +155,7 @@ static phys_addr_t __init lmb_find_base(phys_addr_t size, phys_addr_t align,
 	return LMB_ERROR;
 }
 
-static void lmb_remove_region(struct lmb_type *type, unsigned long r)
+static void __init_lmb lmb_remove_region(struct lmb_type *type, unsigned long r)
 {
 	unsigned long i;
 
@@ -167,14 +167,14 @@ static void lmb_remove_region(struct lmb_type *type, unsigned long r)
 }
 
 /* Assumption: base addr of region 1 < base addr of region 2 */
-static void lmb_coalesce_regions(struct lmb_type *type,
+static void __init_lmb lmb_coalesce_regions(struct lmb_type *type,
 		unsigned long r1, unsigned long r2)
 {
 	type->regions[r1].size += type->regions[r2].size;
 	lmb_remove_region(type, r2);
 }
 
-static int lmb_double_array(struct lmb_type *type)
+static int __init_lmb lmb_double_array(struct lmb_type *type)
 {
 	struct lmb_region *new_array, *old_array;
 	phys_addr_t old_size, new_size, addr;
@@ -240,13 +240,13 @@ static int lmb_double_array(struct lmb_type *type)
 	return 0;
 }
 
-extern int __weak lmb_memory_can_coalesce(phys_addr_t addr1, phys_addr_t size1,
+extern int __init_lmb __weak lmb_memory_can_coalesce(phys_addr_t addr1, phys_addr_t size1,
 					  phys_addr_t addr2, phys_addr_t size2)
 {
 	return 1;
 }
 
-long lmb_add_region(struct lmb_type *type, phys_addr_t base, phys_addr_t size)
+long __init_lmb lmb_add_region(struct lmb_type *type, phys_addr_t base, phys_addr_t size)
 {
 	unsigned long coalesced = 0;
 	long adjacent, i;
@@ -333,13 +333,13 @@ long lmb_add_region(struct lmb_type *type, phys_addr_t base, phys_addr_t size)
 	return 0;
 }
 
-long lmb_add(phys_addr_t base, phys_addr_t size)
+long __init_lmb lmb_add(phys_addr_t base, phys_addr_t size)
 {
 	return lmb_add_region(&lmb.memory, base, size);
 
 }
 
-static long __lmb_remove(struct lmb_type *type, phys_addr_t base, phys_addr_t size)
+static long __init_lmb __lmb_remove(struct lmb_type *type, phys_addr_t base, phys_addr_t size)
 {
 	phys_addr_t rgnbegin, rgnend;
 	phys_addr_t end = base + size;
@@ -387,7 +387,7 @@ static long __lmb_remove(struct lmb_type *type, phys_addr_t base, phys_addr_t si
 	return lmb_add_region(type, end, rgnend - end);
 }
 
-long lmb_remove(phys_addr_t base, phys_addr_t size)
+long __init_lmb lmb_remove(phys_addr_t base, phys_addr_t size)
 {
 	return __lmb_remove(&lmb.memory, base, size);
 }
@@ -544,7 +544,7 @@ phys_addr_t __init lmb_phys_mem_size(void)
 	return lmb.memory_size;
 }
 
-phys_addr_t lmb_end_of_DRAM(void)
+phys_addr_t __init_lmb lmb_end_of_DRAM(void)
 {
 	int idx = lmb.memory.cnt - 1;
 
@@ -605,7 +605,7 @@ int __init lmb_is_reserved(phys_addr_t addr)
 	return 0;
 }
 
-int lmb_is_region_reserved(phys_addr_t base, phys_addr_t size)
+int __init_lmb lmb_is_region_reserved(phys_addr_t base, phys_addr_t size)
 {
 	return lmb_overlaps_region(&lmb.reserved, base, size);
 }
@@ -616,7 +616,7 @@ void __init lmb_set_current_limit(phys_addr_t limit)
 	lmb.current_limit = limit;
 }
 
-static void lmb_dump(struct lmb_type *region, char *name)
+static void __init_lmb lmb_dump(struct lmb_type *region, char *name)
 {
 	unsigned long long base, size;
 	int i;
@@ -632,7 +632,7 @@ static void lmb_dump(struct lmb_type *region, char *name)
 	}
 }
 
-void lmb_dump_all(void)
+void __init_lmb lmb_dump_all(void)
 {
 	if (!lmb_debug)
 		return;
@@ -695,7 +695,7 @@ static int __init early_lmb(char *p)
 }
 early_param("lmb", early_lmb);
 
-#ifdef CONFIG_DEBUG_FS
+#if defined(CONFIG_DEBUG_FS) && !defined(ARCH_DISCARD_LMB)
 
 static int lmb_debug_show(struct seq_file *m, void *private)
 {
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 04/35] lmb: Add lmb_find_area()
  2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
                   ` (2 preceding siblings ...)
  2010-05-14  0:19 ` [PATCH 03/35] lmb: Add ARCH_DISCARD_LMB to put lmb code to .init Yinghai Lu
@ 2010-05-14  0:19 ` Yinghai Lu
  2010-05-14  2:16   ` Benjamin Herrenschmidt
  2010-05-14  0:19 ` [PATCH 05/35] x86, lmb: Add lmb_find_area_size() Yinghai Lu
                   ` (30 subsequent siblings)
  34 siblings, 1 reply; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu

It is a wrapper for lmb_find_base().

Make it easier for x86 to use lmb (rebase):
x86 early_res uses a find/reserve pattern instead of alloc.

-v2: Change name to lmb_find_area() according to Michael Ellerman
-v3: Add a generic weak version, __lmb_find_area(),
     to keep the fallback path to the x86 version that searches from low addresses
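
The find/reserve pattern in miniature (lmb_reserve_area() is only added
later in this series; size/align and the name are made up):

	u64 addr = lmb_find_area(0, lmb.current_limit, size, align);

	if (addr == LMB_ERROR)
		panic("can not find space");
	lmb_reserve_area(addr, addr + size, "EXAMPLE");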

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 include/linux/lmb.h |    4 ++++
 lib/lmb.c           |   27 ++++++++++++++++++++++++++-
 2 files changed, 30 insertions(+), 1 deletions(-)

diff --git a/include/linux/lmb.h b/include/linux/lmb.h
index 0b073a3..3c23dc8 100644
--- a/include/linux/lmb.h
+++ b/include/linux/lmb.h
@@ -44,6 +44,10 @@ extern struct lmb lmb;
 extern int lmb_debug;
 extern struct lmb_region lmb_reserved_init_regions[];
 
+u64 __lmb_find_area(u64 ei_start, u64 ei_last, u64 start, u64 end,
+			u64 size, u64 align);
+u64 lmb_find_area(u64 start, u64 end, u64 size, u64 align);
+
 extern void __init lmb_init(void);
 extern void __init lmb_analyze(void);
 extern long lmb_add(phys_addr_t base, phys_addr_t size);
diff --git a/lib/lmb.c b/lib/lmb.c
index 6d49a17..f917dbf 100644
--- a/lib/lmb.c
+++ b/lib/lmb.c
@@ -155,6 +155,31 @@ static phys_addr_t __init lmb_find_base(phys_addr_t size, phys_addr_t align,
 	return LMB_ERROR;
 }
 
+u64 __init __weak __lmb_find_area(u64 ei_start, u64 ei_last, u64 start, u64 end,
+				 u64 size, u64 align)
+{
+	u64 final_start, final_end;
+	u64 mem;
+
+	final_start = max(ei_start, start);
+	final_end = min(ei_last, end);
+
+	if (final_start >= final_end)
+		return LMB_ERROR;
+
+	mem = lmb_find_base(size, align, final_start, final_end);
+
+	return mem;
+}
+
+/*
+ * Find a free area with specified alignment in a specific range.
+ */
+u64 __init __weak lmb_find_area(u64 start, u64 end, u64 size, u64 align)
+{
+	return lmb_find_base(size, align, start, end);
+}
+
 static void __init_lmb lmb_remove_region(struct lmb_type *type, unsigned long r)
 {
 	unsigned long i;
@@ -199,7 +224,7 @@ static int __init_lmb lmb_double_array(struct lmb_type *type)
 		new_array = kmalloc(new_size, GFP_KERNEL);
 		addr = new_array == NULL ? LMB_ERROR : __pa(new_array);
 	} else
-		addr = lmb_find_base(new_size, sizeof(struct lmb_region), 0, LMB_ALLOC_ACCESSIBLE);
+		addr = lmb_find_area(0, lmb.current_limit, new_size, sizeof(struct lmb_region));
 	if (addr == LMB_ERROR) {
 		pr_err("lmb: Failed to double %s array from %ld to %ld entries !\n",
 		       lmb_type_name(type), type->max, type->max * 2);
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 05/35] x86, lmb: Add lmb_find_area_size()
  2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
                   ` (3 preceding siblings ...)
  2010-05-14  0:19 ` [PATCH 04/35] lmb: Add lmb_find_area() Yinghai Lu
@ 2010-05-14  0:19 ` Yinghai Lu
  2010-05-14  2:20   ` Benjamin Herrenschmidt
  2010-05-14  0:19 ` [PATCH 06/35] bootmem, x86: Add weak version of reserve_bootmem_generic Yinghai Lu
                   ` (29 subsequent siblings)
  34 siblings, 1 reply; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu

The size found is returned via *sizep, according to the free range.
Will be used to find free ranges for early_memtest and the memory
corruption check.

Do not mix it into the core lmb code yet.
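
A sketch of the intended caller pattern (early_memtest-style; the
alignment value and scan body are illustrative):

	u64 size, start = 0;

	for (;;) {
		u64 addr = lmb_find_area_size(start, &size, 4);

		if (addr == LMB_ERROR || !size)
			break;
		/* test [addr, addr + size) here */
		start = addr + size;
	}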

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/include/asm/lmb.h |    8 ++++
 arch/x86/mm/Makefile       |    2 +
 arch/x86/mm/lmb.c          |   88 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 98 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/include/asm/lmb.h
 create mode 100644 arch/x86/mm/lmb.c

diff --git a/arch/x86/include/asm/lmb.h b/arch/x86/include/asm/lmb.h
new file mode 100644
index 0000000..aa3a66e
--- /dev/null
+++ b/arch/x86/include/asm/lmb.h
@@ -0,0 +1,8 @@
+#ifndef _X86_LMB_H
+#define _X86_LMB_H
+
+#define ARCH_DISCARD_LMB
+
+u64 lmb_find_area_size(u64 start, u64 *sizep, u64 align);
+
+#endif
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index a4c7683..8ab0505 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -26,4 +26,6 @@ obj-$(CONFIG_NUMA)		+= numa.o numa_$(BITS).o
 obj-$(CONFIG_K8_NUMA)		+= k8topology_64.o
 obj-$(CONFIG_ACPI_NUMA)		+= srat_$(BITS).o
 
+obj-$(CONFIG_HAVE_LMB)		+= lmb.o
+
 obj-$(CONFIG_MEMTEST)		+= memtest.o
diff --git a/arch/x86/mm/lmb.c b/arch/x86/mm/lmb.c
new file mode 100644
index 0000000..9d26eed
--- /dev/null
+++ b/arch/x86/mm/lmb.c
@@ -0,0 +1,88 @@
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/init.h>
+#include <linux/bitops.h>
+#include <linux/lmb.h>
+#include <linux/bootmem.h>
+#include <linux/mm.h>
+#include <linux/range.h>
+
+/* Check for already reserved areas */
+static inline bool __init bad_addr_size(u64 *addrp, u64 *sizep, u64 align)
+{
+	int i;
+	u64 addr = *addrp, last;
+	u64 size = *sizep;
+	bool changed = false;
+again:
+	last = addr + size;
+	for (i = 0; i < lmb.reserved.cnt && lmb.reserved.regions[i].size; i++) {
+		struct lmb_region *r = &lmb.reserved.regions[i];
+		if (last > r->base && addr < r->base) {
+			size = r->base - addr;
+			changed = true;
+			goto again;
+		}
+		if (last > (r->base + r->size) && addr < (r->base + r->size)) {
+			addr = round_up(r->base + r->size, align);
+			size = last - addr;
+			changed = true;
+			goto again;
+		}
+		if (last <= (r->base + r->size) && addr >= r->base) {
+			(*sizep)++;
+			return false;
+		}
+	}
+	if (changed) {
+		*addrp = addr;
+		*sizep = size;
+	}
+	return changed;
+}
+
+static u64 __init __lmb_find_area_size(u64 ei_start, u64 ei_last, u64 start,
+			 u64 *sizep, u64 align)
+{
+	u64 addr, last;
+
+	addr = round_up(ei_start, align);
+	if (addr < start)
+		addr = round_up(start, align);
+	if (addr >= ei_last)
+		goto out;
+	*sizep = ei_last - addr;
+	while (bad_addr_size(&addr, sizep, align) && addr + *sizep <= ei_last)
+		;
+	last = addr + *sizep;
+	if (last > ei_last)
+		goto out;
+
+	return addr;
+
+out:
+	return LMB_ERROR;
+}
+
+/*
+ * Find next free range after *start
+ */
+u64 __init lmb_find_area_size(u64 start, u64 *sizep, u64 align)
+{
+	int i;
+
+	for (i = 0; i < lmb.memory.cnt; i++) {
+		u64 ei_start = lmb.memory.regions[i].base;
+		u64 ei_last = ei_start + lmb.memory.regions[i].size;
+		u64 addr;
+
+		addr = __lmb_find_area_size(ei_start, ei_last, start,
+					 sizep, align);
+
+		if (addr != LMB_ERROR)
+			return addr;
+	}
+
+	return LMB_ERROR;
+}
+
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 06/35] bootmem, x86: Add weak version of reserve_bootmem_generic
  2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
                   ` (4 preceding siblings ...)
  2010-05-14  0:19 ` [PATCH 05/35] x86, lmb: Add lmb_find_area_size() Yinghai Lu
@ 2010-05-14  0:19 ` Yinghai Lu
  2010-05-14  0:19 ` [PATCH 07/35] x86, lmb: Add lmb_to_bootmem() Yinghai Lu
                   ` (28 subsequent siblings)
  34 siblings, 0 replies; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu

It will be used by lmb_to_bootmem() when converting.

It is a wrapper for reserve_bootmem(), and x86 64-bit uses a special one.

Also clean up that version for x86_64.  We don't need to take care of
the numa path there; bootmem can handle it now.
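
Link-time resolution in miniature (both definitions are in this patch):

	/* mm/bootmem.c: weak fallback, used when no strong version exists */
	int __weak __init reserve_bootmem_generic(unsigned long phys,
						  unsigned long len, int flags);

	/* arch/x86/mm/init_64.c: strong version, wins on x86-64 */
	int __init reserve_bootmem_generic(unsigned long phys,
					   unsigned long len, int flags);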

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/mm/init_32.c |    6 ------
 arch/x86/mm/init_64.c |   20 ++------------------
 mm/bootmem.c          |    6 ++++++
 3 files changed, 8 insertions(+), 24 deletions(-)

diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index bca7909..90e0545 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -1069,9 +1069,3 @@ void mark_rodata_ro(void)
 #endif
 }
 #endif
-
-int __init reserve_bootmem_generic(unsigned long phys, unsigned long len,
-				   int flags)
-{
-	return reserve_bootmem(phys, len, flags);
-}
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index ee41bba..634fa08 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -799,13 +799,10 @@ void mark_rodata_ro(void)
 
 #endif
 
+#ifndef CONFIG_NO_BOOTMEM
 int __init reserve_bootmem_generic(unsigned long phys, unsigned long len,
 				   int flags)
 {
-#ifdef CONFIG_NUMA
-	int nid, next_nid;
-	int ret;
-#endif
 	unsigned long pfn = phys >> PAGE_SHIFT;
 
 	if (pfn >= max_pfn) {
@@ -821,21 +818,7 @@ int __init reserve_bootmem_generic(unsigned long phys, unsigned long len,
 		return -EFAULT;
 	}
 
-	/* Should check here against the e820 map to avoid double free */
-#ifdef CONFIG_NUMA
-	nid = phys_to_nid(phys);
-	next_nid = phys_to_nid(phys + len - 1);
-	if (nid == next_nid)
-		ret = reserve_bootmem_node(NODE_DATA(nid), phys, len, flags);
-	else
-		ret = reserve_bootmem(phys, len, flags);
-
-	if (ret != 0)
-		return ret;
-
-#else
 	reserve_bootmem(phys, len, flags);
-#endif
 
 	if (phys+len <= MAX_DMA_PFN*PAGE_SIZE) {
 		dma_reserve += len / PAGE_SIZE;
@@ -844,6 +827,7 @@ int __init reserve_bootmem_generic(unsigned long phys, unsigned long len,
 
 	return 0;
 }
+#endif
 
 int kern_addr_valid(unsigned long addr)
 {
diff --git a/mm/bootmem.c b/mm/bootmem.c
index 58c66cc..ee31b95 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -526,6 +526,12 @@ int __init reserve_bootmem(unsigned long addr, unsigned long size,
 }
 
 #ifndef CONFIG_NO_BOOTMEM
+int __weak __init reserve_bootmem_generic(unsigned long phys, unsigned long len,
+				   int flags)
+{
+	return reserve_bootmem(phys, len, flags);
+}
+
 static unsigned long __init align_idx(struct bootmem_data *bdata,
 				      unsigned long idx, unsigned long step)
 {
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 07/35] x86, lmb: Add lmb_to_bootmem()
  2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
                   ` (5 preceding siblings ...)
  2010-05-14  0:19 ` [PATCH 06/35] bootmem, x86: Add weak version of reserve_bootmem_generic Yinghai Lu
@ 2010-05-14  0:19 ` Yinghai Lu
  2010-05-14  0:19 ` [PATCH 08/35] x86,lmb: Add lmb_reserve_area/lmb_free_area Yinghai Lu
                   ` (27 subsequent siblings)
  34 siblings, 0 replies; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu

lmb_to_bootmem() will reserve the lmb.reserved regions in bootmem after
bootmem is set up.

We can use it with all arches that support lmb later.
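
A sketch of the expected call site (the x86 conversion later in this
series does roughly this, once bootmem itself is initialized):

	lmb_to_bootmem(0, max_low_pfn << PAGE_SHIFT);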

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/include/asm/lmb.h |    1 +
 arch/x86/mm/lmb.c          |   31 +++++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/lmb.h b/arch/x86/include/asm/lmb.h
index aa3a66e..2f06714 100644
--- a/arch/x86/include/asm/lmb.h
+++ b/arch/x86/include/asm/lmb.h
@@ -4,5 +4,6 @@
 #define ARCH_DISCARD_LMB
 
 u64 lmb_find_area_size(u64 start, u64 *sizep, u64 align);
+void lmb_to_bootmem(u64 start, u64 end);
 
 #endif
diff --git a/arch/x86/mm/lmb.c b/arch/x86/mm/lmb.c
index 9d26eed..37a05e2 100644
--- a/arch/x86/mm/lmb.c
+++ b/arch/x86/mm/lmb.c
@@ -86,3 +86,34 @@ u64 __init lmb_find_area_size(u64 start, u64 *sizep, u64 align)
 	return LMB_ERROR;
 }
 
+#ifndef CONFIG_NO_BOOTMEM
+void __init lmb_to_bootmem(u64 start, u64 end)
+{
+	int i, count;
+	u64 final_start, final_end;
+
+	/* Take out region array itself */
+	if (lmb.reserved.regions != lmb_reserved_init_regions)
+		lmb_free(__pa(lmb.reserved.regions), sizeof(struct lmb_region) * lmb.reserved.max);
+
+	count  = lmb.reserved.cnt;
+	pr_info("(%d early reservations) ==> bootmem [%010llx - %010llx]\n", count, start, end);
+	for (i = 0; i < count; i++) {
+		struct lmb_region *r = &lmb.reserved.regions[i];
+		pr_info("  #%d [%010llx - %010llx] ", i, (u64)r->base, (u64)r->base + r->size);
+		final_start = max(start, r->base);
+		final_end = min(end, r->base + r->size);
+		if (final_start >= final_end) {
+			pr_cont("\n");
+			continue;
+		}
+		pr_cont(" ==> [%010llx - %010llx]\n", final_start, final_end);
+		reserve_bootmem_generic(final_start, final_end - final_start, BOOTMEM_DEFAULT);
+	}
+	/* Clear them to avoid misusing ? */
+	memset(&lmb.reserved.regions[0], 0, sizeof(struct lmb_region) * lmb.reserved.max);
+	lmb.reserved.regions = NULL;
+	lmb.reserved.max = 0;
+	lmb.reserved.cnt = 0;
+}
+#endif
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 08/35] x86,lmb: Add lmb_reserve_area/lmb_free_area
  2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
                   ` (6 preceding siblings ...)
  2010-05-14  0:19 ` [PATCH 07/35] x86, lmb: Add lmb_to_bootmem() Yinghai Lu
@ 2010-05-14  0:19 ` Yinghai Lu
  2010-05-14  2:26   ` Benjamin Herrenschmidt
  2010-05-14  0:19 ` [PATCH 09/35] x86, lmb: Add get_free_all_memory_range() Yinghai Lu
                   ` (26 subsequent siblings)
  34 siblings, 1 reply; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu

They are wrappers for the core versions.

They take start/end/name instead of base/size,
and can add more debug print out.

-v2: change get_max_mapped() to lmb.default_alloc_limit according to Michael
      Ellerman and Ben
     change to lmb_reserve_area and lmb_free_area according to Michael Ellerman
-v3: call check_and_double after reserve/free, so we can avoid using
      find_lmb_area. Suggested by Michael Ellerman
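
Illustrative call, start/end style with end exclusive (the x86
conversion later in the series reserves the kernel image like this):

	lmb_reserve_area(__pa_symbol(&_text), __pa_symbol(&__bss_stop),
				"TEXT DATA BSS");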

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/include/asm/lmb.h |    4 ++++
 arch/x86/mm/lmb.c          |   27 +++++++++++++++++++++++++++
 2 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/lmb.h b/arch/x86/include/asm/lmb.h
index 2f06714..bc85c1e 100644
--- a/arch/x86/include/asm/lmb.h
+++ b/arch/x86/include/asm/lmb.h
@@ -6,4 +6,8 @@
 u64 lmb_find_area_size(u64 start, u64 *sizep, u64 align);
 void lmb_to_bootmem(u64 start, u64 end);
 
+void lmb_reserve_area(u64 start, u64 end, char *name);
+void lmb_free_area(u64 start, u64 end);
+void lmb_add_memory(u64 start, u64 end);
+
 #endif
diff --git a/arch/x86/mm/lmb.c b/arch/x86/mm/lmb.c
index 37a05e2..0dbe05b 100644
--- a/arch/x86/mm/lmb.c
+++ b/arch/x86/mm/lmb.c
@@ -117,3 +117,30 @@ void __init lmb_to_bootmem(u64 start, u64 end)
 	lmb.reserved.cnt = 0;
 }
 #endif
+
+void __init lmb_add_memory(u64 start, u64 end)
+{
+	lmb_add_region(&lmb.memory, start, end - start);
+}
+
+void __init lmb_reserve_area(u64 start, u64 end, char *name)
+{
+	if (start == end)
+		return;
+
+	if (WARN_ONCE(start > end, "lmb_reserve_area: wrong range [%#llx, %#llx]\n", start, end))
+		return;
+
+	lmb_add_region(&lmb.reserved, start, end - start);
+}
+
+void __init lmb_free_area(u64 start, u64 end)
+{
+	if (start == end)
+		return;
+
+	if (WARN_ONCE(start > end, "lmb_free_area: wrong range [%#llx, %#llx]\n", start, end))
+		return;
+
+	lmb_free(start, end - start);
+}
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 09/35] x86, lmb: Add get_free_all_memory_range()
  2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
                   ` (7 preceding siblings ...)
  2010-05-14  0:19 ` [PATCH 08/35] x86,lmb: Add lmb_reserve_area/lmb_free_area Yinghai Lu
@ 2010-05-14  0:19 ` Yinghai Lu
  2010-05-14  0:19 ` [PATCH 10/35] x86, lmb: Add lmb_register_active_regions() and lmb_hole_size() Yinghai Lu
                   ` (25 subsequent siblings)
  34 siblings, 0 replies; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu,
	Jan Beulich

get_free_all_memory_range() is for CONFIG_NO_BOOTMEM=y, and will be called by
free_all_memory_core_early().

It uses early_node_map[] (aka active ranges) minus lmb.reserved to get
all free ranges, and those ranges are then released as free pages.

-v4: increase range size
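
A sketch of the consumer side (free_all_memory_core_early()-style;
the freeing helper shown is illustrative):

	struct range *range;
	int i, nr_range = get_free_all_memory_range(&range, nodeid);

	for (i = 0; i < nr_range; i++)
		__free_pages_memory(range[i].start, range[i].end);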

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Jan Beulich <jbeulich@novell.com>
---
 arch/x86/include/asm/lmb.h |    2 +
 arch/x86/mm/lmb.c          |  110 +++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 111 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/lmb.h b/arch/x86/include/asm/lmb.h
index bc85c1e..80e9e4c 100644
--- a/arch/x86/include/asm/lmb.h
+++ b/arch/x86/include/asm/lmb.h
@@ -9,5 +9,7 @@ void lmb_to_bootmem(u64 start, u64 end);
 void lmb_reserve_area(u64 start, u64 end, char *name);
 void lmb_free_area(u64 start, u64 end);
 void lmb_add_memory(u64 start, u64 end);
+struct range;
+int get_free_all_memory_range(struct range **rangep, int nodeid);
 
 #endif
diff --git a/arch/x86/mm/lmb.c b/arch/x86/mm/lmb.c
index 0dbe05b..65a9356 100644
--- a/arch/x86/mm/lmb.c
+++ b/arch/x86/mm/lmb.c
@@ -86,7 +86,115 @@ u64 __init lmb_find_area_size(u64 start, u64 *sizep, u64 align)
 	return LMB_ERROR;
 }
 
-#ifndef CONFIG_NO_BOOTMEM
+static __init struct range *find_range_array(int count)
+{
+	u64 end, size, mem;
+	struct range *range;
+
+	size = sizeof(struct range) * count;
+	end = lmb.current_limit;
+
+	mem = lmb_find_area(0, end, size, sizeof(struct range));
+	if (mem == -1ULL)
+		panic("can not find more space for range array");
+
+	/*
+	 * This range is temporary, so don't reserve it; it will not be
+	 * overlapped, because we will not allocate a new buffer before
+	 * we discard this one
+	 */
+	range = __va(mem);
+	memset(range, 0, size);
+
+	return range;
+}
+
+#ifdef CONFIG_NO_BOOTMEM
+static void __init subtract_lmb_reserved(struct range *range, int az)
+{
+	int i, count;
+	u64 final_start, final_end;
+
+	/* Take out the region array itself first */
+	if (lmb.reserved.regions != lmb_reserved_init_regions)
+		lmb_free(__pa(lmb.reserved.regions), sizeof(struct lmb_region) * lmb.reserved.max);
+
+	count  = lmb.reserved.cnt;
+
+	pr_info("Subtract (%d early reservations)\n", count);
+
+	for (i = 0; i < count; i++) {
+		struct lmb_region *r = &lmb.reserved.regions[i];
+		pr_info("  #%d [%010llx - %010llx]\n", i, (u64)r->base, (u64)r->base + r->size);
+		final_start = PFN_DOWN(r->base);
+		final_end = PFN_UP(r->base + r->size);
+		if (final_start >= final_end)
+			continue;
+		subtract_range(range, az, final_start, final_end);
+	}
+	/* Put region array back ? */
+	if (lmb.reserved.regions != lmb_reserved_init_regions)
+		lmb_reserve(__pa(lmb.reserved.regions), sizeof(struct lmb_region) * lmb.reserved.max);
+}
+
+struct count_data {
+	int nr;
+};
+
+static int __init count_work_fn(unsigned long start_pfn,
+				unsigned long end_pfn, void *datax)
+{
+	struct count_data *data = datax;
+
+	data->nr++;
+
+	return 0;
+}
+
+static int __init count_early_node_map(int nodeid)
+{
+	struct count_data data;
+
+	data.nr = 0;
+	work_with_active_regions(nodeid, count_work_fn, &data);
+
+	return data.nr;
+}
+
+int __init get_free_all_memory_range(struct range **rangep, int nodeid)
+{
+	int count;
+	struct range *range;
+	int nr_range;
+
+	count = (lmb.reserved.cnt + count_early_node_map(nodeid)) * 2;
+
+	range = find_range_array(count);
+	nr_range = 0;
+
+	/*
+	 * Use early_node_map[] and lmb.reserved.region to get range array
+	 * at first
+	 */
+	nr_range = add_from_early_node_map(range, count, nr_range, nodeid);
+#ifdef CONFIG_X86_32
+	subtract_range(range, count, max_low_pfn, -1ULL);
+#endif
+	subtract_lmb_reserved(range, count);
+	nr_range = clean_sort_range(range, count);
+
+	/* Need to clear it ? */
+	if (nodeid == MAX_NUMNODES) {
+		memset(&lmb.reserved.regions[0], 0, sizeof(struct lmb_region) * lmb.reserved.max);
+		lmb.reserved.regions = NULL;
+		lmb.reserved.max = 0;
+		lmb.reserved.cnt = 0;
+	}
+
+	*rangep = range;
+	return nr_range;
+}
+#else
 void __init lmb_to_bootmem(u64 start, u64 end)
 {
 	int i, count;
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 10/35] x86, lmb: Add lmb_register_active_regions() and lmb_hole_size()
  2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
                   ` (8 preceding siblings ...)
  2010-05-14  0:19 ` [PATCH 09/35] x86, lmb: Add get_free_all_memory_range() Yinghai Lu
@ 2010-05-14  0:19 ` Yinghai Lu
  2010-05-14  0:19 ` [PATCH 11/35] lmb: Add find_memory_core_early() Yinghai Lu
                   ` (24 subsequent siblings)
  34 siblings, 0 replies; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu

lmb_register_active_regions() will be used to fill early_node_map[];
the result is the intersection of the lmb.memory regions AND the numa data.

lmb_hole_size() will be used to find the hole size in the lmb.memory
regions within the specified range.
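
A sketch of the intended users (mirrors the e820 equivalents on x86
today; values are illustrative):

	/* flat/non-NUMA setup: everything on node 0 */
	lmb_register_active_regions(0, 0, max_pfn);

	/* how much of [0, end) is NOT covered by lmb.memory? */
	u64 hole = lmb_hole_size(0, max_pfn << PAGE_SHIFT);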

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/include/asm/lmb.h |    4 ++
 arch/x86/mm/lmb.c          |   69 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 73 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/lmb.h b/arch/x86/include/asm/lmb.h
index 80e9e4c..d0ae19f 100644
--- a/arch/x86/include/asm/lmb.h
+++ b/arch/x86/include/asm/lmb.h
@@ -12,4 +12,8 @@ void lmb_add_memory(u64 start, u64 end);
 struct range;
 int get_free_all_memory_range(struct range **rangep, int nodeid);
 
+void lmb_register_active_regions(int nid, unsigned long start_pfn,
+					 unsigned long last_pfn);
+u64 lmb_hole_size(u64 start, u64 end);
+
 #endif
diff --git a/arch/x86/mm/lmb.c b/arch/x86/mm/lmb.c
index 65a9356..faa9ce6 100644
--- a/arch/x86/mm/lmb.c
+++ b/arch/x86/mm/lmb.c
@@ -252,3 +252,72 @@ void __init lmb_free_area(u64 start, u64 end)
 
 	lmb_free(start, end - start);
 }
+
+/*
+ * Finds an active region in the address range from start_pfn to last_pfn and
+ * returns its range in ei_startpfn and ei_endpfn for the lmb entry.
+ */
+static int __init lmb_find_active_region(const struct lmb_region *ei,
+				  unsigned long start_pfn,
+				  unsigned long last_pfn,
+				  unsigned long *ei_startpfn,
+				  unsigned long *ei_endpfn)
+{
+	u64 align = PAGE_SIZE;
+
+	*ei_startpfn = round_up(ei->base, align) >> PAGE_SHIFT;
+	*ei_endpfn = round_down(ei->base + ei->size, align) >> PAGE_SHIFT;
+
+	/* Skip map entries smaller than a page */
+	if (*ei_startpfn >= *ei_endpfn)
+		return 0;
+
+	/* Skip if map is outside the node */
+	if (*ei_endpfn <= start_pfn || *ei_startpfn >= last_pfn)
+		return 0;
+
+	/* Check for overlaps */
+	if (*ei_startpfn < start_pfn)
+		*ei_startpfn = start_pfn;
+	if (*ei_endpfn > last_pfn)
+		*ei_endpfn = last_pfn;
+
+	return 1;
+}
+
+/* Walk the lmb.memory map and register active regions within a node */
+void __init lmb_register_active_regions(int nid, unsigned long start_pfn,
+					 unsigned long last_pfn)
+{
+	unsigned long ei_startpfn;
+	unsigned long ei_endpfn;
+	int i;
+
+	for (i = 0; i < lmb.memory.cnt; i++)
+		if (lmb_find_active_region(&lmb.memory.regions[i],
+					    start_pfn, last_pfn,
+					    &ei_startpfn, &ei_endpfn))
+			add_active_range(nid, ei_startpfn, ei_endpfn);
+}
+
+/*
+ * Find the hole size (in bytes) in the memory range.
+ * @start: starting address of the memory range to scan
+ * @end: ending address of the memory range to scan
+ */
+u64 __init lmb_hole_size(u64 start, u64 end)
+{
+	unsigned long start_pfn = start >> PAGE_SHIFT;
+	unsigned long last_pfn = end >> PAGE_SHIFT;
+	unsigned long ei_startpfn, ei_endpfn, ram = 0;
+	int i;
+
+	for (i = 0; i < lmb.memory.cnt; i++) {
+		if (lmb_find_active_region(&lmb.memory.regions[i],
+					    start_pfn, last_pfn,
+					    &ei_startpfn, &ei_endpfn))
+			ram += ei_endpfn - ei_startpfn;
+	}
+	return end - start - ((u64)ram << PAGE_SHIFT);
+}
+
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 11/35] lmb: Add find_memory_core_early()
  2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
                   ` (9 preceding siblings ...)
  2010-05-14  0:19 ` [PATCH 10/35] x86, lmb: Add lmb_register_active_regions() and lmb_hole_size() Yinghai Lu
@ 2010-05-14  0:19 ` Yinghai Lu
  2010-05-14  2:29   ` Benjamin Herrenschmidt
  2010-05-14  2:30   ` Benjamin Herrenschmidt
  2010-05-14  0:19 ` [PATCH 12/35] x86, lmb: Add lmb_find_area_node() Yinghai Lu
                   ` (23 subsequent siblings)
  34 siblings, 2 replies; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu

Go over the node ranges in early_node_map[] and use __lmb_find_area()
to find a free range.

Will be used by lmb_find_area_node().

lmb_find_area_node() will be used to find the right buffer for NODE_DATA.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 include/linux/mm.h |    2 ++
 mm/page_alloc.c    |   29 +++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index fb19bb9..7774e1d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1159,6 +1159,8 @@ extern void free_bootmem_with_active_regions(int nid,
 						unsigned long max_low_pfn);
 int add_from_early_node_map(struct range *range, int az,
 				   int nr_range, int nid);
+u64 __init find_memory_core_early(int nid, u64 size, u64 align,
+					u64 goal, u64 limit);
 void *__alloc_memory_core_early(int nodeid, u64 size, u64 align,
 				 u64 goal, u64 limit);
 typedef int (*work_fn_t)(unsigned long, unsigned long, void *);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d03c946..72afd94 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -21,6 +21,7 @@
 #include <linux/pagemap.h>
 #include <linux/jiffies.h>
 #include <linux/bootmem.h>
+#include <linux/lmb.h>
 #include <linux/compiler.h>
 #include <linux/kernel.h>
 #include <linux/kmemcheck.h>
@@ -3393,6 +3394,34 @@ void __init free_bootmem_with_active_regions(int nid,
 	}
 }
 
+#ifdef CONFIG_HAVE_LMB
+u64 __init find_memory_core_early(int nid, u64 size, u64 align,
+					u64 goal, u64 limit)
+{
+	int i;
+
+	/* Need to go over early_node_map to find out good range for node */
+	for_each_active_range_index_in_nid(i, nid) {
+		u64 addr;
+		u64 ei_start, ei_last;
+
+		ei_last = early_node_map[i].end_pfn;
+		ei_last <<= PAGE_SHIFT;
+		ei_start = early_node_map[i].start_pfn;
+		ei_start <<= PAGE_SHIFT;
+		addr = __lmb_find_area(ei_start, ei_last,
+					 goal, limit, size, align);
+
+		if (addr == LMB_ERROR)
+			continue;
+
+		return addr;
+	}
+
+	return -1ULL;
+}
+#endif
+
 int __init add_from_early_node_map(struct range *range, int az,
 				   int nr_range, int nid)
 {
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 12/35] x86, lmb: Add lmb_find_area_node()
  2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
                   ` (10 preceding siblings ...)
  2010-05-14  0:19 ` [PATCH 11/35] lmb: Add find_memory_core_early() Yinghai Lu
@ 2010-05-14  0:19 ` Yinghai Lu
  2010-05-14  0:19 ` [PATCH 13/35] x86, lmb: Add lmb_free_memory_size() Yinghai Lu
                   ` (22 subsequent siblings)
  34 siblings, 0 replies; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu

It can be used to find NODE_DATA for numa.

Need to make sure early_node_map[] is filled before it is called; otherwise
it will fall back to lmb_find_area() with the node range.
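
A sketch of the required ordering (the NUMA call site shown is
illustrative):

	/* 1. fill early_node_map[] first */
	lmb_register_active_regions(nid, start_pfn, last_pfn);

	/* 2. then node-local finds can work */
	u64 nd = lmb_find_area_node(nid, start, end,
				sizeof(pg_data_t), PAGE_SIZE);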

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/include/asm/lmb.h |    1 +
 arch/x86/mm/lmb.c          |   15 +++++++++++++++
 2 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/lmb.h b/arch/x86/include/asm/lmb.h
index d0ae19f..358d8a6 100644
--- a/arch/x86/include/asm/lmb.h
+++ b/arch/x86/include/asm/lmb.h
@@ -15,5 +15,6 @@ int get_free_all_memory_range(struct range **rangep, int nodeid);
 void lmb_register_active_regions(int nid, unsigned long start_pfn,
 					 unsigned long last_pfn);
 u64 lmb_hole_size(u64 start, u64 end);
+u64 lmb_find_area_node(int nid, u64 start, u64 end, u64 size, u64 align);
 
 #endif
diff --git a/arch/x86/mm/lmb.c b/arch/x86/mm/lmb.c
index faa9ce6..c5fa1dd 100644
--- a/arch/x86/mm/lmb.c
+++ b/arch/x86/mm/lmb.c
@@ -254,6 +254,21 @@ void __init lmb_free_area(u64 start, u64 end)
 }
 
 /*
+ * Need to call this function after lmb_register_active_regions,
+ * so early_node_map[] is filled already.
+ */
+u64 __init lmb_find_area_node(int nid, u64 start, u64 end, u64 size, u64 align)
+{
+	u64 addr;
+	addr = find_memory_core_early(nid, size, align, start, end);
+	if (addr != LMB_ERROR)
+		return addr;
+
+	/* Fallback, should already have start end within node range */
+	return lmb_find_area(start, end, size, align);
+}
+
+/*
  * Finds an active region in the address range from start_pfn to last_pfn and
  * returns its range in ei_startpfn and ei_endpfn for the lmb entry.
  */
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 13/35] x86, lmb: Add lmb_free_memory_size()
  2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
                   ` (11 preceding siblings ...)
  2010-05-14  0:19 ` [PATCH 12/35] x86, lmb: Add lmb_find_area_node() Yinghai Lu
@ 2010-05-14  0:19 ` Yinghai Lu
  2010-05-14  2:31   ` Benjamin Herrenschmidt
  2010-05-14  0:19 ` [PATCH 14/35] x86, lmb: Add lmb_memory_size() Yinghai Lu
                   ` (21 subsequent siblings)
  34 siblings, 1 reply; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu

It returns the free memory size in the specified range.

We cannot use memory_size - reserved_size here, because some reserved areas
may not be within the scope of the lmb.memory regions.

Subtract the lmb.reserved regions from the lmb.memory regions to get the
free range array, then count the size of all free ranges.
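
Illustrative use: how much free RAM sits below 4GiB

	u64 free_below_4g = lmb_free_memory_size(0, 1ULL << 32);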

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/include/asm/lmb.h |    1 +
 arch/x86/mm/lmb.c          |   51 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 52 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/lmb.h b/arch/x86/include/asm/lmb.h
index 358d8a6..4fb94b5 100644
--- a/arch/x86/include/asm/lmb.h
+++ b/arch/x86/include/asm/lmb.h
@@ -16,5 +16,6 @@ void lmb_register_active_regions(int nid, unsigned long start_pfn,
 					 unsigned long last_pfn);
 u64 lmb_hole_size(u64 start, u64 end);
 u64 lmb_find_area_node(int nid, u64 start, u64 end, u64 size, u64 align);
+u64 lmb_free_memory_size(u64 addr, u64 limit);
 
 #endif
diff --git a/arch/x86/mm/lmb.c b/arch/x86/mm/lmb.c
index c5fa1dd..6c69e99 100644
--- a/arch/x86/mm/lmb.c
+++ b/arch/x86/mm/lmb.c
@@ -226,6 +226,57 @@ void __init lmb_to_bootmem(u64 start, u64 end)
 }
 #endif
 
+u64 __init lmb_free_memory_size(u64 addr, u64 limit)
+{
+	int i, count;
+	struct range *range;
+	int nr_range;
+	u64 final_start, final_end;
+	u64 free_size;
+
+	count = (lmb.reserved.cnt + lmb.memory.cnt) * 2;
+
+	range = find_range_array(count);
+	nr_range = 0;
+
+	addr = PFN_UP(addr);
+	limit = PFN_DOWN(limit);
+
+	for (i = 0; i < lmb.memory.cnt; i++) {
+		struct lmb_region *r = &lmb.memory.regions[i];
+
+		final_start = PFN_UP(r->base);
+		final_end = PFN_DOWN(r->base + r->size);
+		if (final_start >= final_end)
+			continue;
+		if (final_start >= limit || final_end <= addr)
+			continue;
+
+		nr_range = add_range(range, count, nr_range, final_start, final_end);
+	}
+	subtract_range(range, count, 0, addr);
+	subtract_range(range, count, limit, -1ULL);
+	for (i = 0; i < lmb.reserved.cnt; i++) {
+		struct lmb_region *r = &lmb.reserved.regions[i];
+
+		final_start = PFN_DOWN(r->base);
+		final_end = PFN_UP(r->base + r->size);
+		if (final_start >= final_end)
+			continue;
+		if (final_start >= limit || final_end <= addr)
+			continue;
+
+		subtract_range(range, count, final_start, final_end);
+	}
+	nr_range = clean_sort_range(range, count);
+
+	free_size = 0;
+	for (i = 0; i < nr_range; i++)
+		free_size += range[i].end - range[i].start;
+
+	return free_size << PAGE_SHIFT;
+}
+
 void __init lmb_add_memory(u64 start, u64 end)
 {
 	lmb_add_region(&lmb.memory, start, end - start);
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 14/35] x86, lmb: Add lmb_memory_size()
  2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
                   ` (12 preceding siblings ...)
  2010-05-14  0:19 ` [PATCH 13/35] x86, lmb: Add lmb_free_memory_size() Yinghai Lu
@ 2010-05-14  0:19 ` Yinghai Lu
  2010-05-14  2:31   ` Benjamin Herrenschmidt
  2010-05-14  0:19 ` [PATCH 15/35] x86, lmb: Add lmb_reserve_area_overlap_ok() Yinghai Lu
                   ` (20 subsequent siblings)
  34 siblings, 1 reply; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu

It returns the memory size in the specified range according to the
lmb.memory regions.

Share some code with lmb_free_memory_size() by passing get_free to
__lmb_memory_size().
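
The pair, side by side (illustrative):

	u64 ram  = lmb_memory_size(0, limit);		/* all lmb.memory in range */
	u64 free = lmb_free_memory_size(0, limit);	/* same, minus lmb.reserved */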

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/include/asm/lmb.h |    1 +
 arch/x86/mm/lmb.c          |   18 +++++++++++++++++-
 2 files changed, 18 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/lmb.h b/arch/x86/include/asm/lmb.h
index 4fb94b5..dd42ac1 100644
--- a/arch/x86/include/asm/lmb.h
+++ b/arch/x86/include/asm/lmb.h
@@ -17,5 +17,6 @@ void lmb_register_active_regions(int nid, unsigned long start_pfn,
 u64 lmb_hole_size(u64 start, u64 end);
 u64 lmb_find_area_node(int nid, u64 start, u64 end, u64 size, u64 align);
 u64 lmb_free_memory_size(u64 addr, u64 limit);
+u64 lmb_memory_size(u64 addr, u64 limit);
 
 #endif
diff --git a/arch/x86/mm/lmb.c b/arch/x86/mm/lmb.c
index 6c69e99..19a5f49 100644
--- a/arch/x86/mm/lmb.c
+++ b/arch/x86/mm/lmb.c
@@ -226,7 +226,7 @@ void __init lmb_to_bootmem(u64 start, u64 end)
 }
 #endif
 
-u64 __init lmb_free_memory_size(u64 addr, u64 limit)
+static u64 __init __lmb_memory_size(u64 addr, u64 limit, bool get_free)
 {
 	int i, count;
 	struct range *range;
@@ -256,6 +256,10 @@ u64 __init lmb_free_memory_size(u64 addr, u64 limit)
 	}
 	subtract_range(range, count, 0, addr);
 	subtract_range(range, count, limit, -1ULL);
+
+	/* Subtract lmb.reserved.region in range ? */
+	if (!get_free)
+		goto sort_and_count_them;
 	for (i = 0; i < lmb.reserved.cnt; i++) {
 		struct lmb_region *r = &lmb.reserved.regions[i];
 
@@ -268,6 +272,8 @@ u64 __init lmb_free_memory_size(u64 addr, u64 limit)
 
 		subtract_range(range, count, final_start, final_end);
 	}
+
+sort_and_count_them:
 	nr_range = clean_sort_range(range, count);
 
 	free_size = 0;
@@ -277,6 +283,16 @@ u64 __init lmb_free_memory_size(u64 addr, u64 limit)
 	return free_size << PAGE_SHIFT;
 }
 
+u64 __init lmb_free_memory_size(u64 addr, u64 limit)
+{
+	return __lmb_memory_size(addr, limit, true);
+}
+
+u64 __init lmb_memory_size(u64 addr, u64 limit)
+{
+	return __lmb_memory_size(addr, limit, false);
+}
+
 void __init lmb_add_memory(u64 start, u64 end)
 {
 	lmb_add_region(&lmb.memory, start, end - start);
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 15/35] x86, lmb: Add lmb_reserve_area_overlap_ok()
  2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
                   ` (13 preceding siblings ...)
  2010-05-14  0:19 ` [PATCH 14/35] x86, lmb: Add lmb_memory_size() Yinghai Lu
@ 2010-05-14  0:19 ` Yinghai Lu
  2010-05-14  2:32   ` Benjamin Herrenschmidt
  2010-05-14  0:19 ` [PATCH 16/35] x86, lmb: Use lmb_debug to control debug message print out Yinghai Lu
                   ` (19 subsequent siblings)
  34 siblings, 1 reply; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu

Some areas from firmware could be reserved several times, by different
callers.

If these areas overlap, we may end up with overlapping entries in
lmb.reserved.

Free the area first, before reserving it again.
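
A minimal sketch (range and name are made up): two callers reserving
the same firmware range end up with a single lmb.reserved entry,
because the second call frees the range before re-adding it:

	lmb_reserve_area_overlap_ok(0xa0000, 0x100000, "BIOS reserved");
	lmb_reserve_area_overlap_ok(0xa0000, 0x100000, "BIOS reserved");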

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/include/asm/lmb.h |    1 +
 arch/x86/mm/lmb.c          |   18 ++++++++++++++++++
 2 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/lmb.h b/arch/x86/include/asm/lmb.h
index dd42ac1..9329e09 100644
--- a/arch/x86/include/asm/lmb.h
+++ b/arch/x86/include/asm/lmb.h
@@ -7,6 +7,7 @@ u64 lmb_find_area_size(u64 start, u64 *sizep, u64 align);
 void lmb_to_bootmem(u64 start, u64 end);
 
 void lmb_reserve_area(u64 start, u64 end, char *name);
+void lmb_reserve_area_overlap_ok(u64 start, u64 end, char *name);
 void lmb_free_area(u64 start, u64 end);
 void lmb_add_memory(u64 start, u64 end);
 struct range;
diff --git a/arch/x86/mm/lmb.c b/arch/x86/mm/lmb.c
index 19a5f49..1100c18 100644
--- a/arch/x86/mm/lmb.c
+++ b/arch/x86/mm/lmb.c
@@ -309,6 +309,24 @@ void __init lmb_reserve_area(u64 start, u64 end, char *name)
 	lmb_add_region(&lmb.reserved, start, end - start);
 }
 
+/*
+ * Can be used to avoid having overlapping entries in lmb.reserved.
+ *  No need to use it with areas that come from lmb_find_area().
+ *  Only use it for areas that firmware hides.
+ */
+void __init lmb_reserve_area_overlap_ok(u64 start, u64 end, char *name)
+{
+	if (start == end)
+		return;
+
+	if (WARN_ONCE(start > end, "lmb_reserve_area_overlap_ok: wrong range [%#llx, %#llx]\n", start, end))
+		return;
+
+	/* Free that region at first */
+	lmb_free(start, end - start);
+	lmb_add_region(&lmb.reserved, start, end - start);
+}
+
 void __init lmb_free_area(u64 start, u64 end)
 {
 	if (start == end)
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 16/35] x86, lmb: Use lmb_debug to control debug message print out
  2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
                   ` (14 preceding siblings ...)
  2010-05-14  0:19 ` [PATCH 15/35] x86, lmb: Add lmb_reserve_area_overlap_ok() Yinghai Lu
@ 2010-05-14  0:19 ` Yinghai Lu
  2010-05-14  0:19 ` [PATCH 17/35] x86, lmb: Add x86 version of __lmb_find_area() Yinghai Lu
                   ` (18 subsequent siblings)
  34 siblings, 0 replies; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu

Also let lmb_reserve_area() print out the area name, and lmb_free_area()
the range, when lmb=debug is specified.
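
With "lmb=debug" on the kernel command line the new messages look like
this (format is from this patch; addresses and name are made up):

	    lmb_reserve_area: [0000100000, 0000a00000]    TEXT DATA BSS
	       lmb_free_area: [0000a00000, 0001000000]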

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/mm/lmb.c |   26 ++++++++++++++++++++------
 1 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/lmb.c b/arch/x86/mm/lmb.c
index 1100c18..c0c4220 100644
--- a/arch/x86/mm/lmb.c
+++ b/arch/x86/mm/lmb.c
@@ -121,11 +121,13 @@ static void __init subtract_lmb_reserved(struct range *range, int az)
 
 	count  = lmb.reserved.cnt;
 
-	pr_info("Subtract (%d early reservations)\n", count);
+	if (lmb_debug)
+		pr_info("Subtract (%d early reservations)\n", count);
 
 	for (i = 0; i < count; i++) {
 		struct lmb_region *r = &lmb.reserved.regions[i];
-		pr_info("  #%d [%010llx - %010llx]\n", i, (u64)r->base, (u64)r->base + r->size);
+		if (lmb_debug)
+			pr_info("  #%d [%010llx - %010llx]\n", i, (u64)r->base, (u64)r->base + r->size);
 		final_start = PFN_DOWN(r->base);
 		final_end = PFN_UP(r->base + r->size);
 		if (final_start >= final_end)
@@ -205,17 +207,21 @@ void __init lmb_to_bootmem(u64 start, u64 end)
 		lmb_free(__pa(lmb.reserved.regions), sizeof(struct lmb_region) * lmb.reserved.max);
 
 	count  = lmb.reserved.cnt;
-	pr_info("(%d early reservations) ==> bootmem [%010llx - %010llx]\n", count, start, end);
+	if (lmb_debug)
+		pr_info("(%d early reservations) ==> bootmem [%010llx - %010llx]\n", count, start, end);
 	for (i = 0; i < count; i++) {
 		struct lmb_region *r = &lmb.reserved.regions[i];
-		pr_info("  #%d [%010llx - %010llx] ", i, (u64)r->base, (u64)r->base + r->size);
+		if (lmb_debug)
+			pr_info("  #%d [%010llx - %010llx] ", i, (u64)r->base, (u64)r->base + r->size);
 		final_start = max(start, r->base);
 		final_end = min(end, r->base + r->size);
 		if (final_start >= final_end) {
-			pr_cont("\n");
+			if (lmb_debug)
+				pr_cont("\n");
 			continue;
 		}
-		pr_cont(" ==> [%010llx - %010llx]\n", final_start, final_end);
+		if (lmb_debug)
+			pr_cont(" ==> [%010llx - %010llx]\n", final_start, final_end);
 		reserve_bootmem_generic(final_start, final_end - final_start, BOOTMEM_DEFAULT);
 	}
 	/* Clear them to avoid misusing ? */
@@ -306,6 +312,9 @@ void __init lmb_reserve_area(u64 start, u64 end, char *name)
 	if (WARN_ONCE(start > end, "lmb_reserve_area: wrong range [%#llx, %#llx]\n", start, end))
 		return;
 
+	if (lmb_debug)
+		pr_info("    lmb_reserve_area: [%010llx, %010llx] %16s\n", start, end, name);
+
 	lmb_add_region(&lmb.reserved, start, end - start);
 }
 
@@ -322,6 +331,8 @@ void __init lmb_reserve_area_overlap_ok(u64 start, u64 end, char *name)
 	if (WARN_ONCE(start > end, "lmb_reserve_area_overlap_ok: wrong range [%#llx, %#llx]\n", start, end))
 		return;
 
+	if (lmb_debug)
+		pr_info("    lmb_reserve_area_overlap_ok: [%010llx, %010llx] %16s\n", start, end, name);
 	/* Free that region at first */
 	lmb_free(start, end - start);
 	lmb_add_region(&lmb.reserved, start, end - start);
@@ -335,6 +346,9 @@ void __init lmb_free_area(u64 start, u64 end)
 	if (WARN_ONCE(start > end, "lmb_free_area: wrong range [%#llx, %#llx]\n", start, end))
 		return;
 
+	if (lmb_debug)
+		pr_info("       lmb_free_area: [%010llx, %010llx]\n", start, end);
+
 	lmb_free(start, end - start);
 }
 
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 17/35] x86, lmb: Add x86 version of __lmb_find_area()
  2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
                   ` (15 preceding siblings ...)
  2010-05-14  0:19 ` [PATCH 16/35] x86, lmb: Use lmb_debug to control debug message print out Yinghai Lu
@ 2010-05-14  0:19 ` Yinghai Lu
  2010-05-14  2:34   ` Benjamin Herrenschmidt
  2010-05-14  0:19 ` [PATCH 18/35] x86: Use lmb to replace early_res Yinghai Lu
                   ` (17 subsequent siblings)
  34 siblings, 1 reply; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu

The generic version goes from high to low, and it seems it cannot find
an area that is compact enough.

The x86 version goes from goal to limit, just like the way we used
for early_res.

Use ARCH_LMB_FIND_AREA to select between them.

-v2: default to n

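A condensed sketch of the two search directions, assuming a single
candidate region [base, limit) and a hypothetical is_free() helper
(bounds/underflow handling omitted; the real code below skips whole
reserved regions instead of stepping by align):

	/* generic version: scan candidates top-down from the limit */
	for (addr = round_down(limit - size, align); addr >= base; addr -= align)
		if (is_free(addr, size))
			return addr;

	/* x86 version (this patch): scan bottom-up from the goal */
	for (addr = round_up(base, align); addr + size <= limit; addr += align)
		if (is_free(addr, size))
			return addr;
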
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/Kconfig  |    8 +++++
 arch/x86/mm/lmb.c |   78 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 86 insertions(+), 0 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d80d2ab..36a5665 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -584,6 +584,14 @@ config PARAVIRT_DEBUG
 	  Enable to debug paravirt_ops internals.  Specifically, BUG if
 	  a paravirt_op is missing when it is called.
 
+config ARCH_LMB_FIND_AREA
+	default n
+	bool "Use x86 own lmb_find_area()"
+	---help---
+	  Use the x86 version of lmb_find_area() instead of the generic one;
+	  it gets free areas going up from low addresses, while the generic
+	  one tries to get free areas going down from the limit.
+
 config NO_BOOTMEM
 	default y
 	bool "Disable Bootmem code"
diff --git a/arch/x86/mm/lmb.c b/arch/x86/mm/lmb.c
index c0c4220..cf9d488 100644
--- a/arch/x86/mm/lmb.c
+++ b/arch/x86/mm/lmb.c
@@ -435,3 +435,81 @@ u64 __init lmb_hole_size(u64 start, u64 end)
 	return end - start - ((u64)ram << PAGE_SHIFT);
 }
 
+#ifdef CONFIG_ARCH_LMB_FIND_AREA
+static int __init find_overlapped_early(u64 start, u64 end)
+{
+	int i;
+	struct lmb_region *r;
+
+	for (i = 0; i < lmb.reserved.cnt && lmb.reserved.regions[i].size; i++) {
+		r = &lmb.reserved.regions[i];
+		if (end > r->base && start < (r->base + r->size))
+			break;
+	}
+
+	return i;
+}
+
+/* Check for already reserved areas */
+static inline bool __init bad_addr(u64 *addrp, u64 size, u64 align)
+{
+	int i;
+	u64 addr = *addrp;
+	bool changed = false;
+	struct lmb_region *r;
+again:
+	i = find_overlapped_early(addr, addr + size);
+	r = &lmb.reserved.regions[i];
+	if (i < lmb.reserved.cnt && r->size) {
+		*addrp = addr = round_up(r->base + r->size, align);
+		changed = true;
+		goto again;
+	}
+	return changed;
+}
+
+u64 __init __lmb_find_area(u64 ei_start, u64 ei_last, u64 start, u64 end,
+				 u64 size, u64 align)
+{
+	u64 addr, last;
+
+	addr = round_up(ei_start, align);
+	if (addr < start)
+		addr = round_up(start, align);
+	if (addr >= ei_last)
+		goto out;
+	while (bad_addr(&addr, size, align) && addr+size <= ei_last)
+		;
+	last = addr + size;
+	if (last > ei_last)
+		goto out;
+	if (last > end)
+		goto out;
+
+	return addr;
+
+out:
+	return LMB_ERROR;
+}
+
+/*
+ * Find a free area with specified alignment in a specific range.
+ */
+u64 __init lmb_find_area(u64 start, u64 end, u64 size, u64 align)
+{
+	int i;
+
+	for (i = 0; i < lmb.memory.cnt; i++) {
+		u64 ei_start = lmb.memory.regions[i].base;
+		u64 ei_last = ei_start + lmb.memory.regions[i].size;
+		u64 addr;
+
+		addr = __lmb_find_area(ei_start, ei_last, start, end,
+					 size, align);
+
+		if (addr != LMB_ERROR)
+			return addr;
+	}
+	return LMB_ERROR;
+}
+#endif
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 18/35] x86: Use lmb to replace early_res
  2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
                   ` (16 preceding siblings ...)
  2010-05-14  0:19 ` [PATCH 17/35] x86, lmb: Add x86 version of __lmb_find_area() Yinghai Lu
@ 2010-05-14  0:19 ` Yinghai Lu
  2010-05-14  0:19 ` [PATCH 19/35] x86: Replace e820_/_early string with lmb_ Yinghai Lu
                   ` (16 subsequent siblings)
  34 siblings, 0 replies; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu

1. replace find_e820_area with lmb_find_area
2. replace reserve_early with lmb_reserve_area
3. replace free_early with lmb_free_area (see the mapping sketch below)
4. NO_BOOTMEM will switch to use lmb too.
5. keep the _e820/_early wrappers in this patch; a following patch will
   replace them all
6. because lmb_free_area supports partial free, we can remove some special care
7. need to make sure that lmb_find_area() is called after fill_lmb_memory(),
   so move some calls later in setup.c::setup_arch()
   -- corruption_check and mptable_update

-v2: Move reserve_brk() early
    Before fill_lmb_memory(), to avoid overlap between brk and lmb_find_area().
    That could happen when we have more than 128 RAM entries in the E820
    table, and fill_lmb_memory() could then use lmb_find_area() to find a
    new place for the lmb.memory.region array.
    Also we don't need to use extend_brk() after fill_lmb_memory(),
    so move reserve_brk() early, before fill_lmb_memory().
-v3: Move find_smp_config early
    To make sure lmb_find_area() does not pick the wrong place if the BIOS
    doesn't put the mptable in the right place.
-v4: Treat RESERVED_KERN as RAM in lmb.memory, since those ranges are
    already in lmb.reserved anyway.
    Use __NOT_KEEP_LMB to make sure lmb related code can be freed later.
-v5: The generic __lmb_find_area() goes from high to low, and the 32bit
    active_region does include high pages, so the limit needs to be
    replaced with lmb.default_alloc_limit, aka get_max_mapped()
-v6: use lmb.current_limit instead

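The mechanical part of the conversion is a one-to-one mapping, with all
arguments keeping their meaning:

	find_e820_area(start, end, size, align)    -> lmb_find_area(start, end, size, align)
	reserve_early(start, end, name)            -> lmb_reserve_area(start, end, name)
	free_early(start, end)                     -> lmb_free_area(start, end)
	reserve_early_overlap_ok(start, end, name) -> lmb_reserve_area_overlap_ok(start, end, name)
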
Suggested-by: David S. Miller <davem@davemloft.net>
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/Kconfig               |    9 +--
 arch/x86/include/asm/e820.h    |   15 +++--
 arch/x86/kernel/check.c        |   14 ++--
 arch/x86/kernel/e820.c         |  153 ++++++++++++----------------------------
 arch/x86/kernel/head.c         |    3 +-
 arch/x86/kernel/head32.c       |    6 +-
 arch/x86/kernel/head64.c       |    3 +
 arch/x86/kernel/mpparse.c      |    5 +-
 arch/x86/kernel/setup.c        |   46 +++++++++----
 arch/x86/kernel/setup_percpu.c |    6 --
 arch/x86/mm/numa_64.c          |    5 +-
 kernel/Makefile                |    1 -
 mm/bootmem.c                   |    1 +
 mm/page_alloc.c                |   36 +++-------
 mm/sparse-vmemmap.c            |   11 ---
 15 files changed, 123 insertions(+), 191 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 36a5665..72763d8 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -27,6 +27,7 @@ config X86
 	select HAVE_PERF_EVENTS if (!M386 && !M486)
 	select HAVE_IOREMAP_PROT
 	select HAVE_KPROBES
+	select HAVE_LMB
 	select ARCH_WANT_OPTIONAL_GPIOLIB
 	select ARCH_WANT_FRAME_POINTERS
 	select HAVE_DMA_ATTRS
@@ -193,9 +194,6 @@ config ARCH_SUPPORTS_OPTIMIZED_INLINING
 config ARCH_SUPPORTS_DEBUG_PAGEALLOC
 	def_bool y
 
-config HAVE_EARLY_RES
-	def_bool y
-
 config HAVE_INTEL_TXT
 	def_bool y
 	depends on EXPERIMENTAL && DMAR && ACPI
@@ -596,14 +594,13 @@ config NO_BOOTMEM
 	default y
 	bool "Disable Bootmem code"
 	---help---
-	  Use early_res directly instead of bootmem before slab is ready.
+	  Use lmb directly instead of bootmem before slab is ready.
 		- allocator (buddy) [generic]
 		- early allocator (bootmem) [generic]
-		- very early allocator (reserve_early*()) [x86]
+		- very early allocator (lmb) [some generic]
 		- very very early allocator (early brk model) [x86]
 	  So reduce one layer between early allocator to final allocator
 
-
 config MEMTEST
 	bool "Memtest"
 	---help---
diff --git a/arch/x86/include/asm/e820.h b/arch/x86/include/asm/e820.h
index ec8a52d..38adac8 100644
--- a/arch/x86/include/asm/e820.h
+++ b/arch/x86/include/asm/e820.h
@@ -117,24 +117,27 @@ extern unsigned long end_user_pfn;
 extern u64 find_e820_area(u64 start, u64 end, u64 size, u64 align);
 extern u64 find_e820_area_size(u64 start, u64 *sizep, u64 align);
 extern u64 early_reserve_e820(u64 startt, u64 sizet, u64 align);
-#include <linux/early_res.h>
 
 extern unsigned long e820_end_of_ram_pfn(void);
 extern unsigned long e820_end_of_low_ram_pfn(void);
-extern int e820_find_active_region(const struct e820entry *ei,
-				  unsigned long start_pfn,
-				  unsigned long last_pfn,
-				  unsigned long *ei_startpfn,
-				  unsigned long *ei_endpfn);
 extern void e820_register_active_regions(int nid, unsigned long start_pfn,
 					 unsigned long end_pfn);
 extern u64 e820_hole_size(u64 start, u64 end);
+
+extern u64 early_reserve_e820(u64 startt, u64 sizet, u64 align);
+
+void init_lmb_memory(void);
+void fill_lmb_memory(void);
+
 extern void finish_e820_parsing(void);
 extern void e820_reserve_resources(void);
 extern void e820_reserve_resources_late(void);
 extern void setup_memory_map(void);
 extern char *default_machine_specific_memory_setup(void);
 
+void reserve_early(u64 start, u64 end, char *name);
+void free_early(u64 start, u64 end);
+
 /*
  * Returns true iff the specified range [s,e) is completely contained inside
  * the ISA region.
diff --git a/arch/x86/kernel/check.c b/arch/x86/kernel/check.c
index fc999e6..fcb3f11 100644
--- a/arch/x86/kernel/check.c
+++ b/arch/x86/kernel/check.c
@@ -2,7 +2,8 @@
 #include <linux/sched.h>
 #include <linux/kthread.h>
 #include <linux/workqueue.h>
-#include <asm/e820.h>
+#include <linux/lmb.h>
+
 #include <asm/proto.h>
 
 /*
@@ -18,10 +19,12 @@ static int __read_mostly memory_corruption_check = -1;
 static unsigned __read_mostly corruption_check_size = 64*1024;
 static unsigned __read_mostly corruption_check_period = 60; /* seconds */
 
-static struct e820entry scan_areas[MAX_SCAN_AREAS];
+static struct scan_area {
+	u64 addr;
+	u64 size;
+} scan_areas[MAX_SCAN_AREAS];
 static int num_scan_areas;
 
-
 static __init int set_corruption_check(char *arg)
 {
 	char *end;
@@ -81,7 +84,7 @@ void __init setup_bios_corruption_check(void)
 
 	while (addr < corruption_check_size && num_scan_areas < MAX_SCAN_AREAS) {
 		u64 size;
-		addr = find_e820_area_size(addr, &size, PAGE_SIZE);
+		addr = lmb_find_area_size(addr, &size, PAGE_SIZE);
 
 		if (!(addr + 1))
 			break;
@@ -92,7 +95,7 @@ void __init setup_bios_corruption_check(void)
 		if ((addr + size) > corruption_check_size)
 			size = corruption_check_size - addr;
 
-		e820_update_range(addr, size, E820_RAM, E820_RESERVED);
+		lmb_reserve_area(addr, addr + size, "SCAN RAM");
 		scan_areas[num_scan_areas].addr = addr;
 		scan_areas[num_scan_areas].size = size;
 		num_scan_areas++;
@@ -105,7 +108,6 @@ void __init setup_bios_corruption_check(void)
 
 	printk(KERN_INFO "Scanning %d areas for low memory corruption\n",
 	       num_scan_areas);
-	update_e820();
 }
 
 
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 7bca3c6..a0eca94 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -15,6 +15,7 @@
 #include <linux/pfn.h>
 #include <linux/suspend.h>
 #include <linux/firmware-map.h>
+#include <linux/lmb.h>
 
 #include <asm/e820.h>
 #include <asm/proto.h>
@@ -742,69 +743,19 @@ core_initcall(e820_mark_nvs_memory);
  */
 u64 __init find_e820_area(u64 start, u64 end, u64 size, u64 align)
 {
-	int i;
-
-	for (i = 0; i < e820.nr_map; i++) {
-		struct e820entry *ei = &e820.map[i];
-		u64 addr;
-		u64 ei_start, ei_last;
-
-		if (ei->type != E820_RAM)
-			continue;
-
-		ei_last = ei->addr + ei->size;
-		ei_start = ei->addr;
-		addr = find_early_area(ei_start, ei_last, start, end,
-					 size, align);
-
-		if (addr != -1ULL)
-			return addr;
-	}
-	return -1ULL;
-}
-
-u64 __init find_fw_memmap_area(u64 start, u64 end, u64 size, u64 align)
-{
-	return find_e820_area(start, end, size, align);
+	return lmb_find_area(start, end, size, align);
 }
 
-u64 __init get_max_mapped(void)
-{
-	u64 end = max_pfn_mapped;
-
-	end <<= PAGE_SHIFT;
-
-	return end;
-}
 /*
  * Find next free range after *start
  */
 u64 __init find_e820_area_size(u64 start, u64 *sizep, u64 align)
 {
-	int i;
-
-	for (i = 0; i < e820.nr_map; i++) {
-		struct e820entry *ei = &e820.map[i];
-		u64 addr;
-		u64 ei_start, ei_last;
-
-		if (ei->type != E820_RAM)
-			continue;
-
-		ei_last = ei->addr + ei->size;
-		ei_start = ei->addr;
-		addr = find_early_area_size(ei_start, ei_last, start,
-					 sizep, align);
-
-		if (addr != -1ULL)
-			return addr;
-	}
-
-	return -1ULL;
+	return lmb_find_area_size(start, sizep, align);
 }
 
 /*
- * pre allocated 4k and reserved it in e820
+ * pre allocated 4k and reserved it in lmb and e820_saved
  */
 u64 __init early_reserve_e820(u64 startt, u64 sizet, u64 align)
 {
@@ -813,7 +764,7 @@ u64 __init early_reserve_e820(u64 startt, u64 sizet, u64 align)
 	u64 start;
 
 	for (start = startt; ; start += size) {
-		start = find_e820_area_size(start, &size, align);
+		start = lmb_find_area_size(start, &size, align);
 		if (!(start + 1))
 			return 0;
 		if (size >= sizet)
@@ -830,10 +781,9 @@ u64 __init early_reserve_e820(u64 startt, u64 sizet, u64 align)
 	addr = round_down(start + size - sizet, align);
 	if (addr < start)
 		return 0;
-	e820_update_range(addr, sizet, E820_RAM, E820_RESERVED);
+	lmb_reserve_area(addr, addr + sizet, "new next");
 	e820_update_range_saved(addr, sizet, E820_RAM, E820_RESERVED);
-	printk(KERN_INFO "update e820 for early_reserve_e820\n");
-	update_e820();
+	printk(KERN_INFO "update e820_saved for early_reserve_e820\n");
 	update_e820_saved();
 
 	return addr;
@@ -895,52 +845,12 @@ unsigned long __init e820_end_of_low_ram_pfn(void)
 {
 	return e820_end_pfn(1UL<<(32 - PAGE_SHIFT), E820_RAM);
 }
-/*
- * Finds an active region in the address range from start_pfn to last_pfn and
- * returns its range in ei_startpfn and ei_endpfn for the e820 entry.
- */
-int __init e820_find_active_region(const struct e820entry *ei,
-				  unsigned long start_pfn,
-				  unsigned long last_pfn,
-				  unsigned long *ei_startpfn,
-				  unsigned long *ei_endpfn)
-{
-	u64 align = PAGE_SIZE;
-
-	*ei_startpfn = round_up(ei->addr, align) >> PAGE_SHIFT;
-	*ei_endpfn = round_down(ei->addr + ei->size, align) >> PAGE_SHIFT;
-
-	/* Skip map entries smaller than a page */
-	if (*ei_startpfn >= *ei_endpfn)
-		return 0;
-
-	/* Skip if map is outside the node */
-	if (ei->type != E820_RAM || *ei_endpfn <= start_pfn ||
-				    *ei_startpfn >= last_pfn)
-		return 0;
-
-	/* Check for overlaps */
-	if (*ei_startpfn < start_pfn)
-		*ei_startpfn = start_pfn;
-	if (*ei_endpfn > last_pfn)
-		*ei_endpfn = last_pfn;
-
-	return 1;
-}
 
 /* Walk the e820 map and register active regions within a node */
 void __init e820_register_active_regions(int nid, unsigned long start_pfn,
 					 unsigned long last_pfn)
 {
-	unsigned long ei_startpfn;
-	unsigned long ei_endpfn;
-	int i;
-
-	for (i = 0; i < e820.nr_map; i++)
-		if (e820_find_active_region(&e820.map[i],
-					    start_pfn, last_pfn,
-					    &ei_startpfn, &ei_endpfn))
-			add_active_range(nid, ei_startpfn, ei_endpfn);
+	lmb_register_active_regions(nid, start_pfn, last_pfn);
 }
 
 /*
@@ -950,18 +860,16 @@ void __init e820_register_active_regions(int nid, unsigned long start_pfn,
  */
 u64 __init e820_hole_size(u64 start, u64 end)
 {
-	unsigned long start_pfn = start >> PAGE_SHIFT;
-	unsigned long last_pfn = end >> PAGE_SHIFT;
-	unsigned long ei_startpfn, ei_endpfn, ram = 0;
-	int i;
+	return lmb_hole_size(start, end);
+}
 
-	for (i = 0; i < e820.nr_map; i++) {
-		if (e820_find_active_region(&e820.map[i],
-					    start_pfn, last_pfn,
-					    &ei_startpfn, &ei_endpfn))
-			ram += ei_endpfn - ei_startpfn;
-	}
-	return end - start - ((u64)ram << PAGE_SHIFT);
+void reserve_early(u64 start, u64 end, char *name)
+{
+	lmb_reserve_area(start, end, name);
+}
+void free_early(u64 start, u64 end)
+{
+	lmb_free_area(start, end);
 }
 
 static void early_panic(char *msg)
@@ -1210,3 +1118,30 @@ void __init setup_memory_map(void)
 	printk(KERN_INFO "BIOS-provided physical RAM map:\n");
 	e820_print_map(who);
 }
+
+void __init init_lmb_memory(void)
+{
+	lmb_init();
+}
+
+void __init fill_lmb_memory(void)
+{
+	int i;
+	u64 end;
+
+	for (i = 0; i < e820.nr_map; i++) {
+		struct e820entry *ei = &e820.map[i];
+
+		end = ei->addr + ei->size;
+		if (end != (resource_size_t)end)
+			continue;
+
+		if (ei->type != E820_RAM && ei->type != E820_RESERVED_KERN)
+			continue;
+
+		lmb_add_memory(ei->addr, end);
+	}
+
+	lmb_analyze();
+	lmb_dump_all();
+}
diff --git a/arch/x86/kernel/head.c b/arch/x86/kernel/head.c
index 3e66bd3..5802287 100644
--- a/arch/x86/kernel/head.c
+++ b/arch/x86/kernel/head.c
@@ -1,5 +1,6 @@
 #include <linux/kernel.h>
 #include <linux/init.h>
+#include <linux/lmb.h>
 
 #include <asm/setup.h>
 #include <asm/bios_ebda.h>
@@ -51,5 +52,5 @@ void __init reserve_ebda_region(void)
 		lowmem = 0x9f000;
 
 	/* reserve all memory between lowmem and the 1MB mark */
-	reserve_early_overlap_ok(lowmem, 0x100000, "BIOS reserved");
+	lmb_reserve_area_overlap_ok(lowmem, 0x100000, "BIOS reserved");
 }
diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
index b2e2460..ab3e366 100644
--- a/arch/x86/kernel/head32.c
+++ b/arch/x86/kernel/head32.c
@@ -8,6 +8,7 @@
 #include <linux/init.h>
 #include <linux/start_kernel.h>
 #include <linux/mm.h>
+#include <linux/lmb.h>
 
 #include <asm/setup.h>
 #include <asm/sections.h>
@@ -30,14 +31,15 @@ static void __init i386_default_early_setup(void)
 
 void __init i386_start_kernel(void)
 {
+	init_lmb_memory();
+
 #ifdef CONFIG_X86_TRAMPOLINE
 	/*
 	 * But first pinch a few for the stack/trampoline stuff
 	 * FIXME: Don't need the extra page at 4K, but need to fix
 	 * trampoline before removing it. (see the GDT stuff)
 	 */
-	reserve_early_overlap_ok(PAGE_SIZE, PAGE_SIZE + PAGE_SIZE,
-					 "EX TRAMPOLINE");
+	lmb_reserve_area(PAGE_SIZE, PAGE_SIZE + PAGE_SIZE, "EX TRAMPOLINE");
 #endif
 
 	reserve_early(__pa_symbol(&_text), __pa_symbol(&__bss_stop), "TEXT DATA BSS");
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 7147143..89dd2de 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -12,6 +12,7 @@
 #include <linux/percpu.h>
 #include <linux/start_kernel.h>
 #include <linux/io.h>
+#include <linux/lmb.h>
 
 #include <asm/processor.h>
 #include <asm/proto.h>
@@ -96,6 +97,8 @@ void __init x86_64_start_kernel(char * real_mode_data)
 
 void __init x86_64_start_reservations(char *real_mode_data)
 {
+	init_lmb_memory();
+
 	copy_bootdata(__va(real_mode_data));
 
 	reserve_early(__pa_symbol(&_text), __pa_symbol(&__bss_stop), "TEXT DATA BSS");
diff --git a/arch/x86/kernel/mpparse.c b/arch/x86/kernel/mpparse.c
index 5ae5d24..e5a7873 100644
--- a/arch/x86/kernel/mpparse.c
+++ b/arch/x86/kernel/mpparse.c
@@ -11,6 +11,7 @@
 #include <linux/init.h>
 #include <linux/delay.h>
 #include <linux/bootmem.h>
+#include <linux/lmb.h>
 #include <linux/kernel_stat.h>
 #include <linux/mc146818rtc.h>
 #include <linux/bitops.h>
@@ -641,7 +642,7 @@ static void __init smp_reserve_memory(struct mpf_intel *mpf)
 {
 	unsigned long size = get_mpc_size(mpf->physptr);
 
-	reserve_early_overlap_ok(mpf->physptr, mpf->physptr+size, "MP-table mpc");
+	lmb_reserve_area_overlap_ok(mpf->physptr, mpf->physptr+size, "MP-table mpc");
 }
 
 static int __init smp_scan_config(unsigned long base, unsigned long length)
@@ -670,7 +671,7 @@ static int __init smp_scan_config(unsigned long base, unsigned long length)
 			       mpf, (u64)virt_to_phys(mpf));
 
 			mem = virt_to_phys(mpf);
-			reserve_early_overlap_ok(mem, mem + sizeof(*mpf), "MP-table mpf");
+			lmb_reserve_area_overlap_ok(mem, mem + sizeof(*mpf), "MP-table mpf");
 			if (mpf->physptr)
 				smp_reserve_memory(mpf);
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index c4851ef..8e45394 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -31,6 +31,7 @@
 #include <linux/apm_bios.h>
 #include <linux/initrd.h>
 #include <linux/bootmem.h>
+#include <linux/lmb.h>
 #include <linux/seq_file.h>
 #include <linux/console.h>
 #include <linux/mca.h>
@@ -614,7 +615,7 @@ static __init void reserve_ibft_region(void)
 	addr = find_ibft_region(&size);
 
 	if (size)
-		reserve_early_overlap_ok(addr, addr + size, "ibft");
+		lmb_reserve_area_overlap_ok(addr, addr + size, "ibft");
 }
 
 #ifdef CONFIG_X86_RESERVE_LOW_64K
@@ -697,6 +698,15 @@ static void __init trim_bios_range(void)
 	sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map);
 }
 
+static u64 __init get_max_mapped(void)
+{
+	u64 end = max_pfn_mapped;
+
+	end <<= PAGE_SHIFT;
+
+	return end;
+}
+
 /*
  * Determine if we were loaded by an EFI loader.  If so, then we have also been
  * passed the efi memmap, systab, etc., so we should use these data structures
@@ -879,8 +889,6 @@ void __init setup_arch(char **cmdline_p)
 	 */
 	max_pfn = e820_end_of_ram_pfn();
 
-	/* preallocate 4k for mptable mpc */
-	early_reserve_e820_mpc_new();
 	/* update e820 for memory not covered by WB MTRRs */
 	mtrr_bp_init();
 	if (mtrr_trim_uncached_memory(max_pfn))
@@ -905,15 +913,6 @@ void __init setup_arch(char **cmdline_p)
 	max_pfn_mapped = KERNEL_IMAGE_SIZE >> PAGE_SHIFT;
 #endif
 
-#ifdef CONFIG_X86_CHECK_BIOS_CORRUPTION
-	setup_bios_corruption_check();
-#endif
-
-	printk(KERN_DEBUG "initial memory mapped : 0 - %08lx\n",
-			max_pfn_mapped<<PAGE_SHIFT);
-
-	reserve_brk();
-
 	/*
 	 * Find and reserve possible boot-time SMP configuration:
 	 */
@@ -921,6 +920,26 @@ void __init setup_arch(char **cmdline_p)
 
 	reserve_ibft_region();
 
+	/*
+	 * Need to conclude brk before fill_lmb_memory(), since
+	 *  fill_lmb_memory() could use lmb_find_area() and the
+	 *  result could overlap with the brk area.
+	 */
+	reserve_brk();
+
+	lmb.current_limit = get_max_mapped();
+	fill_lmb_memory();
+
+	/* preallocate 4k for mptable mpc */
+	early_reserve_e820_mpc_new();
+
+#ifdef CONFIG_X86_CHECK_BIOS_CORRUPTION
+	setup_bios_corruption_check();
+#endif
+
+	printk(KERN_DEBUG "initial memory mapped : 0 - %08lx\n",
+			max_pfn_mapped<<PAGE_SHIFT);
+
 	reserve_trampoline_memory();
 
 #ifdef CONFIG_ACPI_SLEEP
@@ -944,6 +963,7 @@ void __init setup_arch(char **cmdline_p)
 		max_low_pfn = max_pfn;
 	}
 #endif
+	lmb.current_limit = get_max_mapped();
 
 	/*
 	 * NOTE: On x86-32, only from this point on, fixmaps are ready for use.
@@ -983,7 +1003,7 @@ void __init setup_arch(char **cmdline_p)
 
 	initmem_init(0, max_pfn, acpi, k8);
 #ifndef CONFIG_NO_BOOTMEM
-	early_res_to_bootmem(0, max_low_pfn<<PAGE_SHIFT);
+	lmb_to_bootmem(0, max_low_pfn<<PAGE_SHIFT);
 #endif
 
 	dma32_reserve_bootmem();
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index ef6370b..35abcb8 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -137,13 +137,7 @@ static void * __init pcpu_fc_alloc(unsigned int cpu, size_t size, size_t align)
 
 static void __init pcpu_fc_free(void *ptr, size_t size)
 {
-#ifdef CONFIG_NO_BOOTMEM
-	u64 start = __pa(ptr);
-	u64 end = start + size;
-	free_early_partial(start, end);
-#else
 	free_bootmem(__pa(ptr), size);
-#endif
 }
 
 static int __init pcpu_cpu_distance(unsigned int from, unsigned int to)
diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 8948f47..6e0f896 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -7,6 +7,7 @@
 #include <linux/string.h>
 #include <linux/init.h>
 #include <linux/bootmem.h>
+#include <linux/lmb.h>
 #include <linux/mmzone.h>
 #include <linux/ctype.h>
 #include <linux/module.h>
@@ -174,7 +175,7 @@ static void * __init early_node_mem(int nodeid, unsigned long start,
 	if (start < (MAX_DMA32_PFN<<PAGE_SHIFT) &&
 	    end > (MAX_DMA32_PFN<<PAGE_SHIFT))
 		start = MAX_DMA32_PFN<<PAGE_SHIFT;
-	mem = find_e820_area(start, end, size, align);
+	mem = lmb_find_area_node(nodeid, start, end, size, align);
 	if (mem != -1L)
 		return __va(mem);
 
@@ -184,7 +185,7 @@ static void * __init early_node_mem(int nodeid, unsigned long start,
 		start = MAX_DMA32_PFN<<PAGE_SHIFT;
 	else
 		start = MAX_DMA_PFN<<PAGE_SHIFT;
-	mem = find_e820_area(start, end, size, align);
+	mem = lmb_find_area_node(nodeid, start, end, size, align);
 	if (mem != -1L)
 		return __va(mem);
 
diff --git a/kernel/Makefile b/kernel/Makefile
index 34d123b..8aa4fa2 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -11,7 +11,6 @@ obj-y     = sched.o fork.o exec_domain.o panic.o printk.o \
 	    hrtimer.o rwsem.o nsproxy.o srcu.o semaphore.o \
 	    notifier.o ksysfs.o pm_qos_params.o sched_clock.o cred.o \
 	    async.o range.o
-obj-$(CONFIG_HAVE_EARLY_RES) += early_res.o
 obj-y += groups.o
 
 ifdef CONFIG_FUNCTION_TRACER
diff --git a/mm/bootmem.c b/mm/bootmem.c
index ee31b95..dac3f56 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -15,6 +15,7 @@
 #include <linux/module.h>
 #include <linux/kmemleak.h>
 #include <linux/range.h>
+#include <linux/lmb.h>
 
 #include <asm/bug.h>
 #include <asm/io.h>
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 72afd94..631d2fc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3441,38 +3441,22 @@ int __init add_from_early_node_map(struct range *range, int az,
 void * __init __alloc_memory_core_early(int nid, u64 size, u64 align,
 					u64 goal, u64 limit)
 {
-	int i;
 	void *ptr;
 
-	/* need to go over early_node_map to find out good range for node */
-	for_each_active_range_index_in_nid(i, nid) {
-		u64 addr;
-		u64 ei_start, ei_last;
-
-		ei_last = early_node_map[i].end_pfn;
-		ei_last <<= PAGE_SHIFT;
-		ei_start = early_node_map[i].start_pfn;
-		ei_start <<= PAGE_SHIFT;
-		addr = find_early_area(ei_start, ei_last,
-					 goal, limit, size, align);
+	u64 addr;
 
-		if (addr == -1ULL)
-			continue;
+	if (limit > lmb.current_limit)
+		limit = lmb.current_limit;
 
-#if 0
-		printk(KERN_DEBUG "alloc (nid=%d %llx - %llx) (%llx - %llx) %llx %llx => %llx\n",
-				nid,
-				ei_start, ei_last, goal, limit, size,
-				align, addr);
-#endif
+	addr = find_memory_core_early(nid, size, align, goal, limit);
 
-		ptr = phys_to_virt(addr);
-		memset(ptr, 0, size);
-		reserve_early_without_check(addr, addr + size, "BOOTMEM");
-		return ptr;
-	}
+	if (addr == LMB_ERROR)
+		return NULL;
 
-	return NULL;
+	ptr = phys_to_virt(addr);
+	memset(ptr, 0, size);
+	lmb_reserve_area(addr, addr + size, "BOOTMEM");
+	return ptr;
 }
 #endif
 
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index aa33fd6..29d6cbf 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -220,18 +220,7 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 
 	if (vmemmap_buf_start) {
 		/* need to free left buf */
-#ifdef CONFIG_NO_BOOTMEM
-		free_early(__pa(vmemmap_buf_start), __pa(vmemmap_buf_end));
-		if (vmemmap_buf_start < vmemmap_buf) {
-			char name[15];
-
-			snprintf(name, sizeof(name), "MEMMAP %d", nodeid);
-			reserve_early_without_check(__pa(vmemmap_buf_start),
-						    __pa(vmemmap_buf), name);
-		}
-#else
 		free_bootmem(__pa(vmemmap_buf), vmemmap_buf_end - vmemmap_buf);
-#endif
 		vmemmap_buf = NULL;
 		vmemmap_buf_end = NULL;
 	}
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 19/35] x86: Replace e820_/_early string with lmb_
  2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
                   ` (17 preceding siblings ...)
  2010-05-14  0:19 ` [PATCH 18/35] x86: Use lmb to replace early_res Yinghai Lu
@ 2010-05-14  0:19 ` Yinghai Lu
  2010-05-14  0:19 ` [PATCH 20/35] x86: Remove not used early_res code Yinghai Lu
                   ` (15 subsequent siblings)
  34 siblings, 0 replies; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu

1. include linux/lmb.h directly, so we can later reduce e820.h references.
2. this patch is done mainly by sed scripts

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/include/asm/efi.h      |    2 +-
 arch/x86/kernel/acpi/sleep.c    |    5 +++--
 arch/x86/kernel/apic/numaq_32.c |    3 ++-
 arch/x86/kernel/efi.c           |    5 +++--
 arch/x86/kernel/head32.c        |    4 ++--
 arch/x86/kernel/head64.c        |    4 ++--
 arch/x86/kernel/setup.c         |   25 ++++++++++++-------------
 arch/x86/kernel/trampoline.c    |    6 +++---
 arch/x86/mm/init.c              |    5 +++--
 arch/x86/mm/init_32.c           |   10 ++++++----
 arch/x86/mm/init_64.c           |    9 +++++----
 arch/x86/mm/k8topology_64.c     |    4 +++-
 arch/x86/mm/memtest.c           |    7 +++----
 arch/x86/mm/numa_32.c           |   17 +++++++++--------
 arch/x86/mm/numa_64.c           |   32 ++++++++++++++++----------------
 arch/x86/mm/srat_32.c           |    3 ++-
 arch/x86/mm/srat_64.c           |    9 +++++----
 arch/x86/xen/mmu.c              |    5 +++--
 arch/x86/xen/setup.c            |    3 ++-
 mm/bootmem.c                    |    4 ++--
 20 files changed, 87 insertions(+), 75 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 8406ed7..06703f3 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -90,7 +90,7 @@ extern void __iomem *efi_ioremap(unsigned long addr, unsigned long size,
 #endif /* CONFIG_X86_32 */
 
 extern int add_efi_memmap;
-extern void efi_reserve_early(void);
+extern void efi_lmb_reserve_area(void);
 extern void efi_call_phys_prelog(void);
 extern void efi_call_phys_epilog(void);
 
diff --git a/arch/x86/kernel/acpi/sleep.c b/arch/x86/kernel/acpi/sleep.c
index 82e5086..86be7ba 100644
--- a/arch/x86/kernel/acpi/sleep.c
+++ b/arch/x86/kernel/acpi/sleep.c
@@ -7,6 +7,7 @@
 
 #include <linux/acpi.h>
 #include <linux/bootmem.h>
+#include <linux/lmb.h>
 #include <linux/dmi.h>
 #include <linux/cpumask.h>
 #include <asm/segment.h>
@@ -133,7 +134,7 @@ void __init acpi_reserve_wakeup_memory(void)
 		return;
 	}
 
-	mem = find_e820_area(0, 1<<20, WAKEUP_SIZE, PAGE_SIZE);
+	mem = lmb_find_area(0, 1<<20, WAKEUP_SIZE, PAGE_SIZE);
 
 	if (mem == -1L) {
 		printk(KERN_ERR "ACPI: Cannot allocate lowmem, S3 disabled.\n");
@@ -141,7 +142,7 @@ void __init acpi_reserve_wakeup_memory(void)
 	}
 	acpi_realmode = (unsigned long) phys_to_virt(mem);
 	acpi_wakeup_address = mem;
-	reserve_early(mem, mem + WAKEUP_SIZE, "ACPI WAKEUP");
+	lmb_reserve_area(mem, mem + WAKEUP_SIZE, "ACPI WAKEUP");
 }
 
 
diff --git a/arch/x86/kernel/apic/numaq_32.c b/arch/x86/kernel/apic/numaq_32.c
index 3e28401..c71e494 100644
--- a/arch/x86/kernel/apic/numaq_32.c
+++ b/arch/x86/kernel/apic/numaq_32.c
@@ -26,6 +26,7 @@
 #include <linux/nodemask.h>
 #include <linux/topology.h>
 #include <linux/bootmem.h>
+#include <linux/lmb.h>
 #include <linux/threads.h>
 #include <linux/cpumask.h>
 #include <linux/kernel.h>
@@ -88,7 +89,7 @@ static inline void numaq_register_node(int node, struct sys_cfg_data *scd)
 	node_end_pfn[node] =
 		 MB_TO_PAGES(eq->hi_shrd_mem_start + eq->hi_shrd_mem_size);
 
-	e820_register_active_regions(node, node_start_pfn[node],
+	lmb_register_active_regions(node, node_start_pfn[node],
 						node_end_pfn[node]);
 
 	memory_present(node, node_start_pfn[node], node_end_pfn[node]);
diff --git a/arch/x86/kernel/efi.c b/arch/x86/kernel/efi.c
index c2fa9b8..ebe7c09 100644
--- a/arch/x86/kernel/efi.c
+++ b/arch/x86/kernel/efi.c
@@ -30,6 +30,7 @@
 #include <linux/init.h>
 #include <linux/efi.h>
 #include <linux/bootmem.h>
+#include <linux/lmb.h>
 #include <linux/spinlock.h>
 #include <linux/uaccess.h>
 #include <linux/time.h>
@@ -275,7 +276,7 @@ static void __init do_add_efi_memmap(void)
 	sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map);
 }
 
-void __init efi_reserve_early(void)
+void __init efi_lmb_reserve_area(void)
 {
 	unsigned long pmap;
 
@@ -290,7 +291,7 @@ void __init efi_reserve_early(void)
 		boot_params.efi_info.efi_memdesc_size;
 	memmap.desc_version = boot_params.efi_info.efi_memdesc_version;
 	memmap.desc_size = boot_params.efi_info.efi_memdesc_size;
-	reserve_early(pmap, pmap + memmap.nr_map * memmap.desc_size,
+	lmb_reserve_area(pmap, pmap + memmap.nr_map * memmap.desc_size,
 		      "EFI memmap");
 }
 
diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
index ab3e366..ecd12a9 100644
--- a/arch/x86/kernel/head32.c
+++ b/arch/x86/kernel/head32.c
@@ -42,7 +42,7 @@ void __init i386_start_kernel(void)
 	lmb_reserve_area(PAGE_SIZE, PAGE_SIZE + PAGE_SIZE, "EX TRAMPOLINE");
 #endif
 
-	reserve_early(__pa_symbol(&_text), __pa_symbol(&__bss_stop), "TEXT DATA BSS");
+	lmb_reserve_area(__pa_symbol(&_text), __pa_symbol(&__bss_stop), "TEXT DATA BSS");
 
 #ifdef CONFIG_BLK_DEV_INITRD
 	/* Reserve INITRD */
@@ -51,7 +51,7 @@ void __init i386_start_kernel(void)
 		u64 ramdisk_image = boot_params.hdr.ramdisk_image;
 		u64 ramdisk_size  = boot_params.hdr.ramdisk_size;
 		u64 ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
-		reserve_early(ramdisk_image, ramdisk_end, "RAMDISK");
+		lmb_reserve_area(ramdisk_image, ramdisk_end, "RAMDISK");
 	}
 #endif
 
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 89dd2de..4063134 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -101,7 +101,7 @@ void __init x86_64_start_reservations(char *real_mode_data)
 
 	copy_bootdata(__va(real_mode_data));
 
-	reserve_early(__pa_symbol(&_text), __pa_symbol(&__bss_stop), "TEXT DATA BSS");
+	lmb_reserve_area(__pa_symbol(&_text), __pa_symbol(&__bss_stop), "TEXT DATA BSS");
 
 #ifdef CONFIG_BLK_DEV_INITRD
 	/* Reserve INITRD */
@@ -110,7 +110,7 @@ void __init x86_64_start_reservations(char *real_mode_data)
 		unsigned long ramdisk_image = boot_params.hdr.ramdisk_image;
 		unsigned long ramdisk_size  = boot_params.hdr.ramdisk_size;
 		unsigned long ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
-		reserve_early(ramdisk_image, ramdisk_end, "RAMDISK");
+		lmb_reserve_area(ramdisk_image, ramdisk_end, "RAMDISK");
 	}
 #endif
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 8e45394..e4fe910 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -302,7 +302,7 @@ static inline void init_gbpages(void)
 static void __init reserve_brk(void)
 {
 	if (_brk_end > _brk_start)
-		reserve_early(__pa(_brk_start), __pa(_brk_end), "BRK");
+		lmb_reserve_area(__pa(_brk_start), __pa(_brk_end), "BRK");
 
 	/* Mark brk area as locked down and no longer taking any
 	   new allocations */
@@ -324,7 +324,7 @@ static void __init relocate_initrd(void)
 	char *p, *q;
 
 	/* We need to move the initrd down into lowmem */
-	ramdisk_here = find_e820_area(0, end_of_lowmem, area_size,
+	ramdisk_here = lmb_find_area(0, end_of_lowmem, area_size,
 					 PAGE_SIZE);
 
 	if (ramdisk_here == -1ULL)
@@ -333,8 +333,7 @@ static void __init relocate_initrd(void)
 
 	/* Note: this includes all the lowmem currently occupied by
 	   the initrd, we rely on that fact to keep the data intact. */
-	reserve_early(ramdisk_here, ramdisk_here + area_size,
-			 "NEW RAMDISK");
+	lmb_reserve_area(ramdisk_here, ramdisk_here + area_size, "NEW RAMDISK");
 	initrd_start = ramdisk_here + PAGE_OFFSET;
 	initrd_end   = initrd_start + ramdisk_size;
 	printk(KERN_INFO "Allocated new RAMDISK: %08llx - %08llx\n",
@@ -390,7 +389,7 @@ static void __init reserve_initrd(void)
 	initrd_start = 0;
 
 	if (ramdisk_size >= (end_of_lowmem>>1)) {
-		free_early(ramdisk_image, ramdisk_end);
+		lmb_free_area(ramdisk_image, ramdisk_end);
 		printk(KERN_ERR "initrd too large to handle, "
 		       "disabling initrd\n");
 		return;
@@ -413,7 +412,7 @@ static void __init reserve_initrd(void)
 
 	relocate_initrd();
 
-	free_early(ramdisk_image, ramdisk_end);
+	lmb_free_area(ramdisk_image, ramdisk_end);
 }
 #else
 static void __init reserve_initrd(void)
@@ -469,7 +468,7 @@ static void __init e820_reserve_setup_data(void)
 	e820_print_map("reserve setup_data");
 }
 
-static void __init reserve_early_setup_data(void)
+static void __init lmb_reserve_area_setup_data(void)
 {
 	struct setup_data *data;
 	u64 pa_data;
@@ -481,7 +480,7 @@ static void __init reserve_early_setup_data(void)
 	while (pa_data) {
 		data = early_memremap(pa_data, sizeof(*data));
 		sprintf(buf, "setup data %x", data->type);
-		reserve_early(pa_data, pa_data+sizeof(*data)+data->len, buf);
+		lmb_reserve_area(pa_data, pa_data+sizeof(*data)+data->len, buf);
 		pa_data = data->next;
 		early_iounmap(data, sizeof(*data));
 	}
@@ -519,7 +518,7 @@ static void __init reserve_crashkernel(void)
 	if (crash_base <= 0) {
 		const unsigned long long alignment = 16<<20;	/* 16M */
 
-		crash_base = find_e820_area(alignment, ULONG_MAX, crash_size,
+		crash_base = lmb_find_area(alignment, ULONG_MAX, crash_size,
 				 alignment);
 		if (crash_base == -1ULL) {
 			pr_info("crashkernel reservation failed - No suitable area found.\n");
@@ -528,14 +527,14 @@ static void __init reserve_crashkernel(void)
 	} else {
 		unsigned long long start;
 
-		start = find_e820_area(crash_base, ULONG_MAX, crash_size,
+		start = lmb_find_area(crash_base, ULONG_MAX, crash_size,
 				 1<<20);
 		if (start != crash_base) {
 			pr_info("crashkernel reservation failed - memory is in use.\n");
 			return;
 		}
 	}
-	reserve_early(crash_base, crash_base + crash_size, "CRASH KERNEL");
+	lmb_reserve_area(crash_base, crash_base + crash_size, "CRASH KERNEL");
 
 	printk(KERN_INFO "Reserving %ldMB of memory at %ldMB "
 			"for crashkernel (System RAM: %ldMB)\n",
@@ -774,7 +773,7 @@ void __init setup_arch(char **cmdline_p)
 #endif
 	 4)) {
 		efi_enabled = 1;
-		efi_reserve_early();
+		efi_lmb_reserve_area();
 	}
 #endif
 
@@ -834,7 +833,7 @@ void __init setup_arch(char **cmdline_p)
 	vmi_activate();
 
 	/* after early param, so could get panic from serial */
-	reserve_early_setup_data();
+	lmb_reserve_area_setup_data();
 
 	if (acpi_mps_check()) {
 #ifdef CONFIG_X86_LOCAL_APIC
diff --git a/arch/x86/kernel/trampoline.c b/arch/x86/kernel/trampoline.c
index c652ef6..1192dcb 100644
--- a/arch/x86/kernel/trampoline.c
+++ b/arch/x86/kernel/trampoline.c
@@ -1,7 +1,7 @@
 #include <linux/io.h>
+#include <linux/lmb.h>
 
 #include <asm/trampoline.h>
-#include <asm/e820.h>
 
 #if defined(CONFIG_X86_64) && defined(CONFIG_ACPI_SLEEP)
 #define __trampinit
@@ -19,12 +19,12 @@ void __init reserve_trampoline_memory(void)
 	unsigned long mem;
 
 	/* Has to be in very low memory so we can execute real-mode AP code. */
-	mem = find_e820_area(0, 1<<20, TRAMPOLINE_SIZE, PAGE_SIZE);
+	mem = lmb_find_area(0, 1<<20, TRAMPOLINE_SIZE, PAGE_SIZE);
 	if (mem == -1L)
 		panic("Cannot allocate trampoline\n");
 
 	trampoline_base = __va(mem);
-	reserve_early(mem, mem + TRAMPOLINE_SIZE, "TRAMPOLINE");
+	lmb_reserve_area(mem, mem + TRAMPOLINE_SIZE, "TRAMPOLINE");
 }
 
 /*
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index b278535..94f7cb9 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -2,6 +2,7 @@
 #include <linux/initrd.h>
 #include <linux/ioport.h>
 #include <linux/swap.h>
+#include <linux/lmb.h>
 
 #include <asm/cacheflush.h>
 #include <asm/e820.h>
@@ -75,7 +76,7 @@ static void __init find_early_table_space(unsigned long end, int use_pse,
 #else
 	start = 0x8000;
 #endif
-	e820_table_start = find_e820_area(start, max_pfn_mapped<<PAGE_SHIFT,
+	e820_table_start = lmb_find_area(start, max_pfn_mapped<<PAGE_SHIFT,
 					tables, PAGE_SIZE);
 	if (e820_table_start == -1UL)
 		panic("Cannot find space for the kernel page tables");
@@ -299,7 +300,7 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
 	__flush_tlb_all();
 
 	if (!after_bootmem && e820_table_end > e820_table_start)
-		reserve_early(e820_table_start << PAGE_SHIFT,
+		lmb_reserve_area(e820_table_start << PAGE_SHIFT,
 				 e820_table_end << PAGE_SHIFT, "PGTABLE");
 
 	if (!after_bootmem)
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 90e0545..c01c711 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -25,6 +25,7 @@
 #include <linux/pfn.h>
 #include <linux/poison.h>
 #include <linux/bootmem.h>
+#include <linux/lmb.h>
 #include <linux/proc_fs.h>
 #include <linux/memory_hotplug.h>
 #include <linux/initrd.h>
@@ -712,14 +713,14 @@ void __init initmem_init(unsigned long start_pfn, unsigned long end_pfn,
 	highstart_pfn = highend_pfn = max_pfn;
 	if (max_pfn > max_low_pfn)
 		highstart_pfn = max_low_pfn;
-	e820_register_active_regions(0, 0, highend_pfn);
+	lmb_register_active_regions(0, 0, highend_pfn);
 	sparse_memory_present_with_active_regions(0);
 	printk(KERN_NOTICE "%ldMB HIGHMEM available.\n",
 		pages_to_mb(highend_pfn - highstart_pfn));
 	num_physpages = highend_pfn;
 	high_memory = (void *) __va(highstart_pfn * PAGE_SIZE - 1) + 1;
 #else
-	e820_register_active_regions(0, 0, max_low_pfn);
+	lmb_register_active_regions(0, 0, max_low_pfn);
 	sparse_memory_present_with_active_regions(0);
 	num_physpages = max_low_pfn;
 	high_memory = (void *) __va(max_low_pfn * PAGE_SIZE - 1) + 1;
@@ -781,11 +782,11 @@ void __init setup_bootmem_allocator(void)
 	 * Initialize the boot-time allocator (with low memory only):
 	 */
 	bootmap_size = bootmem_bootmap_pages(max_low_pfn)<<PAGE_SHIFT;
-	bootmap = find_e820_area(0, max_pfn_mapped<<PAGE_SHIFT, bootmap_size,
+	bootmap = lmb_find_area(0, max_pfn_mapped<<PAGE_SHIFT, bootmap_size,
 				 PAGE_SIZE);
 	if (bootmap == -1L)
 		panic("Cannot find bootmem map of size %ld\n", bootmap_size);
-	reserve_early(bootmap, bootmap + bootmap_size, "BOOTMAP");
+	lmb_reserve_area(bootmap, bootmap + bootmap_size, "BOOTMAP");
 #endif
 
 	printk(KERN_INFO "  mapped low ram: 0 - %08lx\n",
@@ -1069,3 +1070,4 @@ void mark_rodata_ro(void)
 #endif
 }
 #endif
+
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 634fa08..0d2252c 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -21,6 +21,7 @@
 #include <linux/initrd.h>
 #include <linux/pagemap.h>
 #include <linux/bootmem.h>
+#include <linux/lmb.h>
 #include <linux/proc_fs.h>
 #include <linux/pci.h>
 #include <linux/pfn.h>
@@ -577,18 +578,18 @@ void __init initmem_init(unsigned long start_pfn, unsigned long end_pfn,
 	unsigned long bootmap_size, bootmap;
 
 	bootmap_size = bootmem_bootmap_pages(end_pfn)<<PAGE_SHIFT;
-	bootmap = find_e820_area(0, end_pfn<<PAGE_SHIFT, bootmap_size,
+	bootmap = lmb_find_area(0, end_pfn<<PAGE_SHIFT, bootmap_size,
 				 PAGE_SIZE);
 	if (bootmap == -1L)
 		panic("Cannot find bootmem map of size %ld\n", bootmap_size);
-	reserve_early(bootmap, bootmap + bootmap_size, "BOOTMAP");
+	lmb_reserve_area(bootmap, bootmap + bootmap_size, "BOOTMAP");
 	/* don't touch min_low_pfn */
 	bootmap_size = init_bootmem_node(NODE_DATA(0), bootmap >> PAGE_SHIFT,
 					 0, end_pfn);
-	e820_register_active_regions(0, start_pfn, end_pfn);
+	lmb_register_active_regions(0, start_pfn, end_pfn);
 	free_bootmem_with_active_regions(0, end_pfn);
 #else
-	e820_register_active_regions(0, start_pfn, end_pfn);
+	lmb_register_active_regions(0, start_pfn, end_pfn);
 #endif
 }
 #endif
diff --git a/arch/x86/mm/k8topology_64.c b/arch/x86/mm/k8topology_64.c
index 970ed57..d7d031b 100644
--- a/arch/x86/mm/k8topology_64.c
+++ b/arch/x86/mm/k8topology_64.c
@@ -11,6 +11,8 @@
 #include <linux/string.h>
 #include <linux/module.h>
 #include <linux/nodemask.h>
+#include <linux/lmb.h>
+
 #include <asm/io.h>
 #include <linux/pci_ids.h>
 #include <linux/acpi.h>
@@ -222,7 +224,7 @@ int __init k8_scan_nodes(void)
 	for_each_node_mask(i, node_possible_map) {
 		int j;
 
-		e820_register_active_regions(i,
+		lmb_register_active_regions(i,
 				nodes[i].start >> PAGE_SHIFT,
 				nodes[i].end >> PAGE_SHIFT);
 		for (j = apicid_base; j < cores + apicid_base; j++)
diff --git a/arch/x86/mm/memtest.c b/arch/x86/mm/memtest.c
index 18d244f..0e4a006 100644
--- a/arch/x86/mm/memtest.c
+++ b/arch/x86/mm/memtest.c
@@ -6,8 +6,7 @@
 #include <linux/smp.h>
 #include <linux/init.h>
 #include <linux/pfn.h>
-
-#include <asm/e820.h>
+#include <linux/lmb.h>
 
 static u64 patterns[] __initdata = {
 	0,
@@ -35,7 +34,7 @@ static void __init reserve_bad_mem(u64 pattern, u64 start_bad, u64 end_bad)
 	       (unsigned long long) pattern,
 	       (unsigned long long) start_bad,
 	       (unsigned long long) end_bad);
-	reserve_early(start_bad, end_bad, "BAD RAM");
+	lmb_reserve_area(start_bad, end_bad, "BAD RAM");
 }
 
 static void __init memtest(u64 pattern, u64 start_phys, u64 size)
@@ -74,7 +73,7 @@ static void __init do_one_pass(u64 pattern, u64 start, u64 end)
 	u64 size = 0;
 
 	while (start < end) {
-		start = find_e820_area_size(start, &size, 1);
+		start = lmb_find_area_size(start, &size, 1);
 
 		/* done ? */
 		if (start >= end)
diff --git a/arch/x86/mm/numa_32.c b/arch/x86/mm/numa_32.c
index 809baaa..d8d655f 100644
--- a/arch/x86/mm/numa_32.c
+++ b/arch/x86/mm/numa_32.c
@@ -24,6 +24,7 @@
 
 #include <linux/mm.h>
 #include <linux/bootmem.h>
+#include <linux/lmb.h>
 #include <linux/mmzone.h>
 #include <linux/highmem.h>
 #include <linux/initrd.h>
@@ -120,7 +121,7 @@ int __init get_memcfg_numa_flat(void)
 
 	node_start_pfn[0] = 0;
 	node_end_pfn[0] = max_pfn;
-	e820_register_active_regions(0, 0, max_pfn);
+	lmb_register_active_regions(0, 0, max_pfn);
 	memory_present(0, 0, max_pfn);
 	node_remap_size[0] = node_memmap_size_bytes(0, 0, max_pfn);
 
@@ -161,14 +162,14 @@ static void __init allocate_pgdat(int nid)
 		NODE_DATA(nid) = (pg_data_t *)node_remap_start_vaddr[nid];
 	else {
 		unsigned long pgdat_phys;
-		pgdat_phys = find_e820_area(min_low_pfn<<PAGE_SHIFT,
+		pgdat_phys = lmb_find_area(min_low_pfn<<PAGE_SHIFT,
 				 max_pfn_mapped<<PAGE_SHIFT,
 				 sizeof(pg_data_t),
 				 PAGE_SIZE);
 		NODE_DATA(nid) = (pg_data_t *)(pfn_to_kaddr(pgdat_phys>>PAGE_SHIFT));
 		memset(buf, 0, sizeof(buf));
 		sprintf(buf, "NODE_DATA %d",  nid);
-		reserve_early(pgdat_phys, pgdat_phys + sizeof(pg_data_t), buf);
+		lmb_reserve_area(pgdat_phys, pgdat_phys + sizeof(pg_data_t), buf);
 	}
 	printk(KERN_DEBUG "allocate_pgdat: node %d NODE_DATA %08lx\n",
 		nid, (unsigned long)NODE_DATA(nid));
@@ -291,7 +292,7 @@ static __init unsigned long calculate_numa_remap_pages(void)
 						 PTRS_PER_PTE);
 		node_kva_target <<= PAGE_SHIFT;
 		do {
-			node_kva_final = find_e820_area(node_kva_target,
+			node_kva_final = lmb_find_area(node_kva_target,
 					((u64)node_end_pfn[nid])<<PAGE_SHIFT,
 						((u64)size)<<PAGE_SHIFT,
 						LARGE_PAGE_BYTES);
@@ -318,9 +319,9 @@ static __init unsigned long calculate_numa_remap_pages(void)
 		 *  but we could have some hole in high memory, and it will only
 		 *  check page_is_ram(pfn) && !page_is_reserved_early(pfn) to decide
 		 *  to use it as free.
-		 *  So reserve_early here, hope we don't run out of that array
+		 *  So lmb_reserve_area here, hope we don't run out of that array
 		 */
-		reserve_early(node_kva_final,
+		lmb_reserve_area(node_kva_final,
 			      node_kva_final+(((u64)size)<<PAGE_SHIFT),
 			      "KVA RAM");
 
@@ -367,7 +368,7 @@ void __init initmem_init(unsigned long start_pfn, unsigned long end_pfn,
 
 	kva_target_pfn = round_down(max_low_pfn - kva_pages, PTRS_PER_PTE);
 	do {
-		kva_start_pfn = find_e820_area(kva_target_pfn<<PAGE_SHIFT,
+		kva_start_pfn = lmb_find_area(kva_target_pfn<<PAGE_SHIFT,
 					max_low_pfn<<PAGE_SHIFT,
 					kva_pages<<PAGE_SHIFT,
 					PTRS_PER_PTE<<PAGE_SHIFT) >> PAGE_SHIFT;
@@ -382,7 +383,7 @@ void __init initmem_init(unsigned long start_pfn, unsigned long end_pfn,
 	printk(KERN_INFO "max_pfn = %lx\n", max_pfn);
 
 	/* avoid clash with initrd */
-	reserve_early(kva_start_pfn<<PAGE_SHIFT,
+	lmb_reserve_area(kva_start_pfn<<PAGE_SHIFT,
 		      (kva_start_pfn + kva_pages)<<PAGE_SHIFT,
 		     "KVA PG");
 #ifdef CONFIG_HIGHMEM
diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 6e0f896..18d2296 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -90,7 +90,7 @@ static int __init allocate_cachealigned_memnodemap(void)
 
 	addr = 0x8000;
 	nodemap_size = roundup(sizeof(s16) * memnodemapsize, L1_CACHE_BYTES);
-	nodemap_addr = find_e820_area(addr, max_pfn<<PAGE_SHIFT,
+	nodemap_addr = lmb_find_area(addr, max_pfn<<PAGE_SHIFT,
 				      nodemap_size, L1_CACHE_BYTES);
 	if (nodemap_addr == -1UL) {
 		printk(KERN_ERR
@@ -99,7 +99,7 @@ static int __init allocate_cachealigned_memnodemap(void)
 		return -1;
 	}
 	memnodemap = phys_to_virt(nodemap_addr);
-	reserve_early(nodemap_addr, nodemap_addr + nodemap_size, "MEMNODEMAP");
+	lmb_reserve_area(nodemap_addr, nodemap_addr + nodemap_size, "MEMNODEMAP");
 
 	printk(KERN_DEBUG "NUMA: Allocated memnodemap from %lx - %lx\n",
 	       nodemap_addr, nodemap_addr + nodemap_size);
@@ -230,7 +230,7 @@ setup_node_bootmem(int nodeid, unsigned long start, unsigned long end)
 	if (node_data[nodeid] == NULL)
 		return;
 	nodedata_phys = __pa(node_data[nodeid]);
-	reserve_early(nodedata_phys, nodedata_phys + pgdat_size, "NODE_DATA");
+	lmb_reserve_area(nodedata_phys, nodedata_phys + pgdat_size, "NODE_DATA");
 	printk(KERN_INFO "  NODE_DATA [%016lx - %016lx]\n", nodedata_phys,
 		nodedata_phys + pgdat_size - 1);
 	nid = phys_to_nid(nodedata_phys);
@@ -249,7 +249,7 @@ setup_node_bootmem(int nodeid, unsigned long start, unsigned long end)
 	 * Find a place for the bootmem map
 	 * nodedata_phys could be on other nodes by alloc_bootmem,
 	 * so need to sure bootmap_start not to be small, otherwise
-	 * early_node_mem will get that with find_e820_area instead
+	 * early_node_mem will get that with lmb_find_area instead
 	 * of alloc_bootmem, that could clash with reserved range
 	 */
 	bootmap_pages = bootmem_bootmap_pages(last_pfn - start_pfn);
@@ -261,12 +261,12 @@ setup_node_bootmem(int nodeid, unsigned long start, unsigned long end)
 	bootmap = early_node_mem(nodeid, bootmap_start, end,
 				 bootmap_pages<<PAGE_SHIFT, PAGE_SIZE);
 	if (bootmap == NULL)  {
-		free_early(nodedata_phys, nodedata_phys + pgdat_size);
+		lmb_free_area(nodedata_phys, nodedata_phys + pgdat_size);
 		node_data[nodeid] = NULL;
 		return;
 	}
 	bootmap_start = __pa(bootmap);
-	reserve_early(bootmap_start, bootmap_start+(bootmap_pages<<PAGE_SHIFT),
+	lmb_reserve_area(bootmap_start, bootmap_start+(bootmap_pages<<PAGE_SHIFT),
 			"BOOTMAP");
 
 	bootmap_size = init_bootmem_node(NODE_DATA(nodeid),
@@ -420,7 +420,7 @@ static int __init split_nodes_interleave(u64 addr, u64 max_addr,
 		nr_nodes = MAX_NUMNODES;
 	}
 
-	size = (max_addr - addr - e820_hole_size(addr, max_addr)) / nr_nodes;
+	size = (max_addr - addr - lmb_hole_size(addr, max_addr)) / nr_nodes;
 	/*
 	 * Calculate the number of big nodes that can be allocated as a result
 	 * of consolidating the remainder.
@@ -456,7 +456,7 @@ static int __init split_nodes_interleave(u64 addr, u64 max_addr,
 			 * non-reserved memory is less than the per-node size.
 			 */
 			while (end - physnodes[i].start -
-				e820_hole_size(physnodes[i].start, end) < size) {
+				lmb_hole_size(physnodes[i].start, end) < size) {
 				end += FAKE_NODE_MIN_SIZE;
 				if (end > physnodes[i].end) {
 					end = physnodes[i].end;
@@ -470,7 +470,7 @@ static int __init split_nodes_interleave(u64 addr, u64 max_addr,
 			 * this one must extend to the boundary.
 			 */
 			if (end < dma32_end && dma32_end - end -
-			    e820_hole_size(end, dma32_end) < FAKE_NODE_MIN_SIZE)
+			    lmb_hole_size(end, dma32_end) < FAKE_NODE_MIN_SIZE)
 				end = dma32_end;
 
 			/*
@@ -479,7 +479,7 @@ static int __init split_nodes_interleave(u64 addr, u64 max_addr,
 			 * physical node.
 			 */
 			if (physnodes[i].end - end -
-			    e820_hole_size(end, physnodes[i].end) < size)
+			    lmb_hole_size(end, physnodes[i].end) < size)
 				end = physnodes[i].end;
 
 			/*
@@ -507,7 +507,7 @@ static u64 __init find_end_of_node(u64 start, u64 max_addr, u64 size)
 {
 	u64 end = start + size;
 
-	while (end - start - e820_hole_size(start, end) < size) {
+	while (end - start - lmb_hole_size(start, end) < size) {
 		end += FAKE_NODE_MIN_SIZE;
 		if (end > max_addr) {
 			end = max_addr;
@@ -536,7 +536,7 @@ static int __init split_nodes_size_interleave(u64 addr, u64 max_addr, u64 size)
 	 * creates a uniform distribution of node sizes across the entire
 	 * machine (but not necessarily over physical nodes).
 	 */
-	min_size = (max_addr - addr - e820_hole_size(addr, max_addr)) /
+	min_size = (max_addr - addr - lmb_hole_size(addr, max_addr)) /
 						MAX_NUMNODES;
 	min_size = max(min_size, FAKE_NODE_MIN_SIZE);
 	if ((min_size & FAKE_NODE_MIN_HASH_MASK) < min_size)
@@ -569,7 +569,7 @@ static int __init split_nodes_size_interleave(u64 addr, u64 max_addr, u64 size)
 			 * this one must extend to the boundary.
 			 */
 			if (end < dma32_end && dma32_end - end -
-			    e820_hole_size(end, dma32_end) < FAKE_NODE_MIN_SIZE)
+			    lmb_hole_size(end, dma32_end) < FAKE_NODE_MIN_SIZE)
 				end = dma32_end;
 
 			/*
@@ -578,7 +578,7 @@ static int __init split_nodes_size_interleave(u64 addr, u64 max_addr, u64 size)
 			 * physical node.
 			 */
 			if (physnodes[i].end - end -
-			    e820_hole_size(end, physnodes[i].end) < size)
+			    lmb_hole_size(end, physnodes[i].end) < size)
 				end = physnodes[i].end;
 
 			/*
@@ -642,7 +642,7 @@ static int __init numa_emulation(unsigned long start_pfn,
 	 */
 	remove_all_active_ranges();
 	for_each_node_mask(i, node_possible_map) {
-		e820_register_active_regions(i, nodes[i].start >> PAGE_SHIFT,
+		lmb_register_active_regions(i, nodes[i].start >> PAGE_SHIFT,
 						nodes[i].end >> PAGE_SHIFT);
 		setup_node_bootmem(i, nodes[i].start, nodes[i].end);
 	}
@@ -695,7 +695,7 @@ void __init initmem_init(unsigned long start_pfn, unsigned long last_pfn,
 	node_set(0, node_possible_map);
 	for (i = 0; i < nr_cpu_ids; i++)
 		numa_set_node(i, 0);
-	e820_register_active_regions(0, start_pfn, last_pfn);
+	lmb_register_active_regions(0, start_pfn, last_pfn);
 	setup_node_bootmem(0, start_pfn << PAGE_SHIFT, last_pfn << PAGE_SHIFT);
 }
 
diff --git a/arch/x86/mm/srat_32.c b/arch/x86/mm/srat_32.c
index 9324f13..68dd606 100644
--- a/arch/x86/mm/srat_32.c
+++ b/arch/x86/mm/srat_32.c
@@ -25,6 +25,7 @@
  */
 #include <linux/mm.h>
 #include <linux/bootmem.h>
+#include <linux/lmb.h>
 #include <linux/mmzone.h>
 #include <linux/acpi.h>
 #include <linux/nodemask.h>
@@ -264,7 +265,7 @@ int __init get_memcfg_from_srat(void)
 		if (node_read_chunk(chunk->nid, chunk))
 			continue;
 
-		e820_register_active_regions(chunk->nid, chunk->start_pfn,
+		lmb_register_active_regions(chunk->nid, chunk->start_pfn,
 					     min(chunk->end_pfn, max_pfn));
 	}
 	/* for out of order entries in SRAT */
diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index f9897f7..23f274c 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -16,6 +16,7 @@
 #include <linux/module.h>
 #include <linux/topology.h>
 #include <linux/bootmem.h>
+#include <linux/lmb.h>
 #include <linux/mm.h>
 #include <asm/proto.h>
 #include <asm/numa.h>
@@ -98,7 +99,7 @@ void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
 	unsigned long phys;
 
 	length = slit->header.length;
-	phys = find_e820_area(0, max_pfn_mapped<<PAGE_SHIFT, length,
+	phys = lmb_find_area(0, max_pfn_mapped<<PAGE_SHIFT, length,
 		 PAGE_SIZE);
 
 	if (phys == -1L)
@@ -106,7 +107,7 @@ void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
 
 	acpi_slit = __va(phys);
 	memcpy(acpi_slit, slit, length);
-	reserve_early(phys, phys + length, "ACPI SLIT");
+	lmb_reserve_area(phys, phys + length, "ACPI SLIT");
 }
 
 /* Callback for Proximity Domain -> x2APIC mapping */
@@ -324,7 +325,7 @@ static int __init nodes_cover_memory(const struct bootnode *nodes)
 			pxmram = 0;
 	}
 
-	e820ram = max_pfn - (e820_hole_size(0, max_pfn<<PAGE_SHIFT)>>PAGE_SHIFT);
+	e820ram = max_pfn - (lmb_hole_size(0, max_pfn<<PAGE_SHIFT)>>PAGE_SHIFT);
 	/* We seem to lose 3 pages somewhere. Allow 1M of slack. */
 	if ((long)(e820ram - pxmram) >= (1<<(20 - PAGE_SHIFT))) {
 		printk(KERN_ERR
@@ -421,7 +422,7 @@ int __init acpi_scan_nodes(unsigned long start, unsigned long end)
 	}
 
 	for_each_node_mask(i, nodes_parsed)
-		e820_register_active_regions(i, nodes[i].start >> PAGE_SHIFT,
+		lmb_register_active_regions(i, nodes[i].start >> PAGE_SHIFT,
 						nodes[i].end >> PAGE_SHIFT);
 	/* for out of order entries in SRAT */
 	sort_node_map();
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 914f046..28185a8 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -44,6 +44,7 @@
 #include <linux/bug.h>
 #include <linux/module.h>
 #include <linux/gfp.h>
+#include <linux/lmb.h>
 
 #include <asm/pgtable.h>
 #include <asm/tlbflush.h>
@@ -1735,7 +1736,7 @@ __init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,
 	__xen_write_cr3(true, __pa(pgd));
 	xen_mc_issue(PARAVIRT_LAZY_CPU);
 
-	reserve_early(__pa(xen_start_info->pt_base),
+	lmb_reserve_area(__pa(xen_start_info->pt_base),
 		      __pa(xen_start_info->pt_base +
 			   xen_start_info->nr_pt_frames * PAGE_SIZE),
 		      "XEN PAGETABLES");
@@ -1773,7 +1774,7 @@ __init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,
 
 	pin_pagetable_pfn(MMUEXT_PIN_L3_TABLE, PFN_DOWN(__pa(swapper_pg_dir)));
 
-	reserve_early(__pa(xen_start_info->pt_base),
+	lmb_reserve_area(__pa(xen_start_info->pt_base),
 		      __pa(xen_start_info->pt_base +
 			   xen_start_info->nr_pt_frames * PAGE_SIZE),
 		      "XEN PAGETABLES");
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index ad0047f..be3fcf3 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -8,6 +8,7 @@
 #include <linux/sched.h>
 #include <linux/mm.h>
 #include <linux/pm.h>
+#include <linux/lmb.h>
 
 #include <asm/elf.h>
 #include <asm/vdso.h>
@@ -61,7 +62,7 @@ char * __init xen_memory_setup(void)
 	 *  - xen_start_info
 	 * See comment above "struct start_info" in <xen/interface/xen.h>
 	 */
-	reserve_early(__pa(xen_start_info->mfn_list),
+	lmb_reserve_area(__pa(xen_start_info->mfn_list),
 		      __pa(xen_start_info->pt_base),
 			"XEN START INFO");
 
diff --git a/mm/bootmem.c b/mm/bootmem.c
index dac3f56..2a4c8b5 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -435,7 +435,7 @@ void __init free_bootmem_node(pg_data_t *pgdat, unsigned long physaddr,
 			      unsigned long size)
 {
 #ifdef CONFIG_NO_BOOTMEM
-	free_early(physaddr, physaddr + size);
+	lmb_free_area(physaddr, physaddr + size);
 #else
 	unsigned long start, end;
 
@@ -460,7 +460,7 @@ void __init free_bootmem_node(pg_data_t *pgdat, unsigned long physaddr,
 void __init free_bootmem(unsigned long addr, unsigned long size)
 {
 #ifdef CONFIG_NO_BOOTMEM
-	free_early(addr, addr + size);
+	lmb_free_area(addr, addr + size);
 #else
 	unsigned long start, end;
 
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 20/35] x86: Remove not used early_res code
  2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
                   ` (18 preceding siblings ...)
  2010-05-14  0:19 ` [PATCH 19/35] x86: Replace e820_/_early string with lmb_ Yinghai Lu
@ 2010-05-14  0:19 ` Yinghai Lu
  2010-05-14  0:19 ` [PATCH 21/35] x86, lmb: Use lmb_memory_size()/lmb_free_memory_size() to get correct dma_reserve Yinghai Lu
                   ` (14 subsequent siblings)
  34 siblings, 0 replies; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu

Also remove some functions in e820.c that are no longer used.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/include/asm/e820.h |   14 -
 arch/x86/kernel/e820.c      |   42 ---
 include/linux/early_res.h   |   23 --
 kernel/early_res.c          |  584 -------------------------------------------
 4 files changed, 0 insertions(+), 663 deletions(-)
 delete mode 100644 include/linux/early_res.h
 delete mode 100644 kernel/early_res.c

diff --git a/arch/x86/include/asm/e820.h b/arch/x86/include/asm/e820.h
index 38adac8..6fbd8cd 100644
--- a/arch/x86/include/asm/e820.h
+++ b/arch/x86/include/asm/e820.h
@@ -112,32 +112,18 @@ static inline void early_memtest(unsigned long start, unsigned long end)
 }
 #endif
 
-extern unsigned long end_user_pfn;
-
-extern u64 find_e820_area(u64 start, u64 end, u64 size, u64 align);
-extern u64 find_e820_area_size(u64 start, u64 *sizep, u64 align);
-extern u64 early_reserve_e820(u64 startt, u64 sizet, u64 align);
-
 extern unsigned long e820_end_of_ram_pfn(void);
 extern unsigned long e820_end_of_low_ram_pfn(void);
-extern void e820_register_active_regions(int nid, unsigned long start_pfn,
-					 unsigned long end_pfn);
-extern u64 e820_hole_size(u64 start, u64 end);
-
 extern u64 early_reserve_e820(u64 startt, u64 sizet, u64 align);
 
 void init_lmb_memory(void);
 void fill_lmb_memory(void);
-
 extern void finish_e820_parsing(void);
 extern void e820_reserve_resources(void);
 extern void e820_reserve_resources_late(void);
 extern void setup_memory_map(void);
 extern char *default_machine_specific_memory_setup(void);
 
-void reserve_early(u64 start, u64 end, char *name);
-void free_early(u64 start, u64 end);
-
 /*
  * Returns true iff the specified range [s,e) is completely contained inside
  * the ISA region.
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index a0eca94..12e827b 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -739,22 +739,6 @@ core_initcall(e820_mark_nvs_memory);
 #endif
 
 /*
- * Find a free area with specified alignment in a specific range.
- */
-u64 __init find_e820_area(u64 start, u64 end, u64 size, u64 align)
-{
-	return lmb_find_area(start, end, size, align);
-}
-
-/*
- * Find next free range after *start
- */
-u64 __init find_e820_area_size(u64 start, u64 *sizep, u64 align)
-{
-	return lmb_find_area_size(start, sizep, align);
-}
-
-/*
  * pre allocated 4k and reserved it in lmb and e820_saved
  */
 u64 __init early_reserve_e820(u64 startt, u64 sizet, u64 align)
@@ -846,32 +830,6 @@ unsigned long __init e820_end_of_low_ram_pfn(void)
 	return e820_end_pfn(1UL<<(32 - PAGE_SHIFT), E820_RAM);
 }
 
-/* Walk the e820 map and register active regions within a node */
-void __init e820_register_active_regions(int nid, unsigned long start_pfn,
-					 unsigned long last_pfn)
-{
-	lmb_register_active_regions(nid, start_pfn, last_pfn);
-}
-
-/*
- * Find the hole size (in bytes) in the memory range.
- * @start: starting address of the memory range to scan
- * @end: ending address of the memory range to scan
- */
-u64 __init e820_hole_size(u64 start, u64 end)
-{
-	return lmb_hole_size(start, end);
-}
-
-void reserve_early(u64 start, u64 end, char *name)
-{
-	lmb_reserve_area(start, end, name);
-}
-void free_early(u64 start, u64 end)
-{
-	lmb_free_area(start, end);
-}
-
 static void early_panic(char *msg)
 {
 	early_printk(msg);
diff --git a/include/linux/early_res.h b/include/linux/early_res.h
deleted file mode 100644
index 29c09f5..0000000
--- a/include/linux/early_res.h
+++ /dev/null
@@ -1,23 +0,0 @@
-#ifndef _LINUX_EARLY_RES_H
-#define _LINUX_EARLY_RES_H
-#ifdef __KERNEL__
-
-extern void reserve_early(u64 start, u64 end, char *name);
-extern void reserve_early_overlap_ok(u64 start, u64 end, char *name);
-extern void free_early(u64 start, u64 end);
-void free_early_partial(u64 start, u64 end);
-extern void early_res_to_bootmem(u64 start, u64 end);
-
-void reserve_early_without_check(u64 start, u64 end, char *name);
-u64 find_early_area(u64 ei_start, u64 ei_last, u64 start, u64 end,
-			 u64 size, u64 align);
-u64 find_early_area_size(u64 ei_start, u64 ei_last, u64 start,
-			 u64 *sizep, u64 align);
-u64 find_fw_memmap_area(u64 start, u64 end, u64 size, u64 align);
-u64 get_max_mapped(void);
-#include <linux/range.h>
-int get_free_all_memory_range(struct range **rangep, int nodeid);
-
-#endif /* __KERNEL__ */
-
-#endif /* _LINUX_EARLY_RES_H */
diff --git a/kernel/early_res.c b/kernel/early_res.c
deleted file mode 100644
index 31aa933..0000000
--- a/kernel/early_res.c
+++ /dev/null
@@ -1,584 +0,0 @@
-/*
- * early_res, could be used to replace bootmem
- */
-#include <linux/kernel.h>
-#include <linux/types.h>
-#include <linux/init.h>
-#include <linux/bootmem.h>
-#include <linux/mm.h>
-#include <linux/early_res.h>
-
-/*
- * Early reserved memory areas.
- */
-/*
- * need to make sure this one is bigger enough before
- * find_fw_memmap_area could be used
- */
-#define MAX_EARLY_RES_X 32
-
-struct early_res {
-	u64 start, end;
-	char name[15];
-	char overlap_ok;
-};
-static struct early_res early_res_x[MAX_EARLY_RES_X] __initdata;
-
-static int max_early_res __initdata = MAX_EARLY_RES_X;
-static struct early_res *early_res __initdata = &early_res_x[0];
-static int early_res_count __initdata;
-
-static int __init find_overlapped_early(u64 start, u64 end)
-{
-	int i;
-	struct early_res *r;
-
-	for (i = 0; i < max_early_res && early_res[i].end; i++) {
-		r = &early_res[i];
-		if (end > r->start && start < r->end)
-			break;
-	}
-
-	return i;
-}
-
-/*
- * Drop the i-th range from the early reservation map,
- * by copying any higher ranges down one over it, and
- * clearing what had been the last slot.
- */
-static void __init drop_range(int i)
-{
-	int j;
-
-	for (j = i + 1; j < max_early_res && early_res[j].end; j++)
-		;
-
-	memmove(&early_res[i], &early_res[i + 1],
-	       (j - 1 - i) * sizeof(struct early_res));
-
-	early_res[j - 1].end = 0;
-	early_res_count--;
-}
-
-static void __init drop_range_partial(int i, u64 start, u64 end)
-{
-	u64 common_start, common_end;
-	u64 old_start, old_end;
-
-	old_start = early_res[i].start;
-	old_end = early_res[i].end;
-	common_start = max(old_start, start);
-	common_end = min(old_end, end);
-
-	/* no overlap ? */
-	if (common_start >= common_end)
-		return;
-
-	if (old_start < common_start) {
-		/* make head segment */
-		early_res[i].end = common_start;
-		if (old_end > common_end) {
-			char name[15];
-
-			/*
-			 * Save a local copy of the name, since the
-			 * early_res array could get resized inside
-			 * reserve_early_without_check() ->
-			 * __check_and_double_early_res(), which would
-			 * make the current name pointer invalid.
-			 */
-			strncpy(name, early_res[i].name,
-					 sizeof(early_res[i].name) - 1);
-			/* add another for left over on tail */
-			reserve_early_without_check(common_end, old_end, name);
-		}
-		return;
-	} else {
-		if (old_end > common_end) {
-			/* reuse the entry for tail left */
-			early_res[i].start = common_end;
-			return;
-		}
-		/* all covered */
-		drop_range(i);
-	}
-}
-
-/*
- * Split any existing ranges that:
- *  1) are marked 'overlap_ok', and
- *  2) overlap with the stated range [start, end)
- * into whatever portion (if any) of the existing range is entirely
- * below or entirely above the stated range.  Drop the portion
- * of the existing range that overlaps with the stated range,
- * which will allow the caller of this routine to then add that
- * stated range without conflicting with any existing range.
- */
-static void __init drop_overlaps_that_are_ok(u64 start, u64 end)
-{
-	int i;
-	struct early_res *r;
-	u64 lower_start, lower_end;
-	u64 upper_start, upper_end;
-	char name[15];
-
-	for (i = 0; i < max_early_res && early_res[i].end; i++) {
-		r = &early_res[i];
-
-		/* Continue past non-overlapping ranges */
-		if (end <= r->start || start >= r->end)
-			continue;
-
-		/*
-		 * Leave non-ok overlaps as is; let caller
-		 * panic "Overlapping early reservations"
-		 * when it hits this overlap.
-		 */
-		if (!r->overlap_ok)
-			return;
-
-		/*
-		 * We have an ok overlap.  We will drop it from the early
-		 * reservation map, and add back in any non-overlapping
-		 * portions (lower or upper) as separate, overlap_ok,
-		 * non-overlapping ranges.
-		 */
-
-		/* 1. Note any non-overlapping (lower or upper) ranges. */
-		strncpy(name, r->name, sizeof(name) - 1);
-
-		lower_start = lower_end = 0;
-		upper_start = upper_end = 0;
-		if (r->start < start) {
-			lower_start = r->start;
-			lower_end = start;
-		}
-		if (r->end > end) {
-			upper_start = end;
-			upper_end = r->end;
-		}
-
-		/* 2. Drop the original ok overlapping range */
-		drop_range(i);
-
-		i--;		/* resume for-loop on copied down entry */
-
-		/* 3. Add back in any non-overlapping ranges. */
-		if (lower_end)
-			reserve_early_overlap_ok(lower_start, lower_end, name);
-		if (upper_end)
-			reserve_early_overlap_ok(upper_start, upper_end, name);
-	}
-}
-
-static void __init __reserve_early(u64 start, u64 end, char *name,
-						int overlap_ok)
-{
-	int i;
-	struct early_res *r;
-
-	i = find_overlapped_early(start, end);
-	if (i >= max_early_res)
-		panic("Too many early reservations");
-	r = &early_res[i];
-	if (r->end)
-		panic("Overlapping early reservations "
-		      "%llx-%llx %s to %llx-%llx %s\n",
-		      start, end - 1, name ? name : "", r->start,
-		      r->end - 1, r->name);
-	r->start = start;
-	r->end = end;
-	r->overlap_ok = overlap_ok;
-	if (name)
-		strncpy(r->name, name, sizeof(r->name) - 1);
-	early_res_count++;
-}
-
-/*
- * A few early reservtations come here.
- *
- * The 'overlap_ok' in the name of this routine does -not- mean it
- * is ok for these reservations to overlap an earlier reservation.
- * Rather it means that it is ok for subsequent reservations to
- * overlap this one.
- *
- * Use this entry point to reserve early ranges when you are doing
- * so out of "Paranoia", reserving perhaps more memory than you need,
- * just in case, and don't mind a subsequent overlapping reservation
- * that is known to be needed.
- *
- * The drop_overlaps_that_are_ok() call here isn't really needed.
- * It would be needed if we had two colliding 'overlap_ok'
- * reservations, so that the second such would not panic on the
- * overlap with the first.  We don't have any such as of this
- * writing, but might as well tolerate such if it happens in
- * the future.
- */
-void __init reserve_early_overlap_ok(u64 start, u64 end, char *name)
-{
-	drop_overlaps_that_are_ok(start, end);
-	__reserve_early(start, end, name, 1);
-}
-
-static void __init __check_and_double_early_res(u64 ex_start, u64 ex_end)
-{
-	u64 start, end, size, mem;
-	struct early_res *new;
-
-	/* do we have enough slots left ? */
-	if ((max_early_res - early_res_count) > max(max_early_res/8, 2))
-		return;
-
-	/* double it */
-	mem = -1ULL;
-	size = sizeof(struct early_res) * max_early_res * 2;
-	if (early_res == early_res_x)
-		start = 0;
-	else
-		start = early_res[0].end;
-	end = ex_start;
-	if (start + size < end)
-		mem = find_fw_memmap_area(start, end, size,
-					 sizeof(struct early_res));
-	if (mem == -1ULL) {
-		start = ex_end;
-		end = get_max_mapped();
-		if (start + size < end)
-			mem = find_fw_memmap_area(start, end, size,
-						 sizeof(struct early_res));
-	}
-	if (mem == -1ULL)
-		panic("can not find more space for early_res array");
-
-	new = __va(mem);
-	/* save the first one for own */
-	new[0].start = mem;
-	new[0].end = mem + size;
-	new[0].overlap_ok = 0;
-	/* copy old to new */
-	if (early_res == early_res_x) {
-		memcpy(&new[1], &early_res[0],
-			 sizeof(struct early_res) * max_early_res);
-		memset(&new[max_early_res+1], 0,
-			 sizeof(struct early_res) * (max_early_res - 1));
-		early_res_count++;
-	} else {
-		memcpy(&new[1], &early_res[1],
-			 sizeof(struct early_res) * (max_early_res - 1));
-		memset(&new[max_early_res], 0,
-			 sizeof(struct early_res) * max_early_res);
-	}
-	memset(&early_res[0], 0, sizeof(struct early_res) * max_early_res);
-	early_res = new;
-	max_early_res *= 2;
-	printk(KERN_DEBUG "early_res array is doubled to %d at [%llx - %llx]\n",
-		max_early_res, mem, mem + size - 1);
-}
-
-/*
- * Most early reservations come here.
- *
- * We first have drop_overlaps_that_are_ok() drop any pre-existing
- * 'overlap_ok' ranges, so that we can then reserve this memory
- * range without risk of panic'ing on an overlapping overlap_ok
- * early reservation.
- */
-void __init reserve_early(u64 start, u64 end, char *name)
-{
-	if (start >= end)
-		return;
-
-	__check_and_double_early_res(start, end);
-
-	drop_overlaps_that_are_ok(start, end);
-	__reserve_early(start, end, name, 0);
-}
-
-void __init reserve_early_without_check(u64 start, u64 end, char *name)
-{
-	struct early_res *r;
-
-	if (start >= end)
-		return;
-
-	__check_and_double_early_res(start, end);
-
-	r = &early_res[early_res_count];
-
-	r->start = start;
-	r->end = end;
-	r->overlap_ok = 0;
-	if (name)
-		strncpy(r->name, name, sizeof(r->name) - 1);
-	early_res_count++;
-}
-
-void __init free_early(u64 start, u64 end)
-{
-	struct early_res *r;
-	int i;
-
-	i = find_overlapped_early(start, end);
-	r = &early_res[i];
-	if (i >= max_early_res || r->end != end || r->start != start)
-		panic("free_early on not reserved area: %llx-%llx!",
-			 start, end - 1);
-
-	drop_range(i);
-}
-
-void __init free_early_partial(u64 start, u64 end)
-{
-	struct early_res *r;
-	int i;
-
-	if (start == end)
-		return;
-
-	if (WARN_ONCE(start > end, "  wrong range [%#llx, %#llx]\n", start, end))
-		return;
-
-try_next:
-	i = find_overlapped_early(start, end);
-	if (i >= max_early_res)
-		return;
-
-	r = &early_res[i];
-	/* hole ? */
-	if (r->end >= end && r->start <= start) {
-		drop_range_partial(i, start, end);
-		return;
-	}
-
-	drop_range_partial(i, start, end);
-	goto try_next;
-}
-
-#ifdef CONFIG_NO_BOOTMEM
-static void __init subtract_early_res(struct range *range, int az)
-{
-	int i, count;
-	u64 final_start, final_end;
-	int idx = 0;
-
-	count  = 0;
-	for (i = 0; i < max_early_res && early_res[i].end; i++)
-		count++;
-
-	/* need to skip first one ?*/
-	if (early_res != early_res_x)
-		idx = 1;
-
-#define DEBUG_PRINT_EARLY_RES 1
-
-#if DEBUG_PRINT_EARLY_RES
-	printk(KERN_INFO "Subtract (%d early reservations)\n", count);
-#endif
-	for (i = idx; i < count; i++) {
-		struct early_res *r = &early_res[i];
-#if DEBUG_PRINT_EARLY_RES
-		printk(KERN_INFO "  #%d [%010llx - %010llx] %15s\n", i,
-			r->start, r->end, r->name);
-#endif
-		final_start = PFN_DOWN(r->start);
-		final_end = PFN_UP(r->end);
-		if (final_start >= final_end)
-			continue;
-		subtract_range(range, az, final_start, final_end);
-	}
-
-}
-
-int __init get_free_all_memory_range(struct range **rangep, int nodeid)
-{
-	int i, count;
-	u64 start = 0, end;
-	u64 size;
-	u64 mem;
-	struct range *range;
-	int nr_range;
-
-	count  = 0;
-	for (i = 0; i < max_early_res && early_res[i].end; i++)
-		count++;
-
-	count *= 2;
-
-	size = sizeof(struct range) * count;
-	end = get_max_mapped();
-#ifdef MAX_DMA32_PFN
-	if (end > (MAX_DMA32_PFN << PAGE_SHIFT))
-		start = MAX_DMA32_PFN << PAGE_SHIFT;
-#endif
-	mem = find_fw_memmap_area(start, end, size, sizeof(struct range));
-	if (mem == -1ULL)
-		panic("can not find more space for range free");
-
-	range = __va(mem);
-	/* use early_node_map[] and early_res to get range array at first */
-	memset(range, 0, size);
-	nr_range = 0;
-
-	/* need to go over early_node_map to find out good range for node */
-	nr_range = add_from_early_node_map(range, count, nr_range, nodeid);
-#ifdef CONFIG_X86_32
-	subtract_range(range, count, max_low_pfn, -1ULL);
-#endif
-	subtract_early_res(range, count);
-	nr_range = clean_sort_range(range, count);
-
-	/* need to clear it ? */
-	if (nodeid == MAX_NUMNODES) {
-		memset(&early_res[0], 0,
-			 sizeof(struct early_res) * max_early_res);
-		early_res = NULL;
-		max_early_res = 0;
-	}
-
-	*rangep = range;
-	return nr_range;
-}
-#else
-void __init early_res_to_bootmem(u64 start, u64 end)
-{
-	int i, count;
-	u64 final_start, final_end;
-	int idx = 0;
-
-	count  = 0;
-	for (i = 0; i < max_early_res && early_res[i].end; i++)
-		count++;
-
-	/* need to skip first one ?*/
-	if (early_res != early_res_x)
-		idx = 1;
-
-	printk(KERN_INFO "(%d/%d early reservations) ==> bootmem [%010llx - %010llx]\n",
-			 count - idx, max_early_res, start, end);
-	for (i = idx; i < count; i++) {
-		struct early_res *r = &early_res[i];
-		printk(KERN_INFO "  #%d [%010llx - %010llx] %16s", i,
-			r->start, r->end, r->name);
-		final_start = max(start, r->start);
-		final_end = min(end, r->end);
-		if (final_start >= final_end) {
-			printk(KERN_CONT "\n");
-			continue;
-		}
-		printk(KERN_CONT " ==> [%010llx - %010llx]\n",
-			final_start, final_end);
-		reserve_bootmem_generic(final_start, final_end - final_start,
-				BOOTMEM_DEFAULT);
-	}
-	/* clear them */
-	memset(&early_res[0], 0, sizeof(struct early_res) * max_early_res);
-	early_res = NULL;
-	max_early_res = 0;
-	early_res_count = 0;
-}
-#endif
-
-/* Check for already reserved areas */
-static inline int __init bad_addr(u64 *addrp, u64 size, u64 align)
-{
-	int i;
-	u64 addr = *addrp;
-	int changed = 0;
-	struct early_res *r;
-again:
-	i = find_overlapped_early(addr, addr + size);
-	r = &early_res[i];
-	if (i < max_early_res && r->end) {
-		*addrp = addr = round_up(r->end, align);
-		changed = 1;
-		goto again;
-	}
-	return changed;
-}
-
-/* Check for already reserved areas */
-static inline int __init bad_addr_size(u64 *addrp, u64 *sizep, u64 align)
-{
-	int i;
-	u64 addr = *addrp, last;
-	u64 size = *sizep;
-	int changed = 0;
-again:
-	last = addr + size;
-	for (i = 0; i < max_early_res && early_res[i].end; i++) {
-		struct early_res *r = &early_res[i];
-		if (last > r->start && addr < r->start) {
-			size = r->start - addr;
-			changed = 1;
-			goto again;
-		}
-		if (last > r->end && addr < r->end) {
-			addr = round_up(r->end, align);
-			size = last - addr;
-			changed = 1;
-			goto again;
-		}
-		if (last <= r->end && addr >= r->start) {
-			(*sizep)++;
-			return 0;
-		}
-	}
-	if (changed) {
-		*addrp = addr;
-		*sizep = size;
-	}
-	return changed;
-}
-
-/*
- * Find a free area with specified alignment in a specific range.
- * only with the area.between start to end is active range from early_node_map
- * so they are good as RAM
- */
-u64 __init find_early_area(u64 ei_start, u64 ei_last, u64 start, u64 end,
-			 u64 size, u64 align)
-{
-	u64 addr, last;
-
-	addr = round_up(ei_start, align);
-	if (addr < start)
-		addr = round_up(start, align);
-	if (addr >= ei_last)
-		goto out;
-	while (bad_addr(&addr, size, align) && addr+size <= ei_last)
-		;
-	last = addr + size;
-	if (last > ei_last)
-		goto out;
-	if (last > end)
-		goto out;
-
-	return addr;
-
-out:
-	return -1ULL;
-}
-
-u64 __init find_early_area_size(u64 ei_start, u64 ei_last, u64 start,
-			 u64 *sizep, u64 align)
-{
-	u64 addr, last;
-
-	addr = round_up(ei_start, align);
-	if (addr < start)
-		addr = round_up(start, align);
-	if (addr >= ei_last)
-		goto out;
-	*sizep = ei_last - addr;
-	while (bad_addr_size(&addr, sizep, align) && addr + *sizep <= ei_last)
-		;
-	last = addr + *sizep;
-	if (last > ei_last)
-		goto out;
-
-	return addr;
-
-out:
-	return -1ULL;
-}
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 21/35] x86, lmb: Use lmb_memory_size()/lmb_free_memory_size() to get correct dma_reserve
  2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
                   ` (19 preceding siblings ...)
  2010-05-14  0:19 ` [PATCH 20/35] x86: Remove not used early_res code Yinghai Lu
@ 2010-05-14  0:19 ` Yinghai Lu
  2010-05-14  0:19 ` [PATCH 22/35] bootmem: Add nobootmem.c to reduce the #ifdef Yinghai Lu
                   ` (13 subsequent siblings)
  34 siblings, 0 replies; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu

lmb_memory_size() returns the memory size in lmb.memory.region.
lmb_free_memory_size() returns the free (i.e. unreserved) memory size
in lmb.memory.region.

So we can get the exact reserved size in a specified range.

Set the size right after initmem_init(), because the bootmem API calls
that come later will take areas above 16M (except for some fallbacks).

Later, after we remove bootmem, we could call it just before
paging_init().
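
A minimal sketch of the computation (mirroring the
find_lmb_dma_reserve() added below; x86-64 only):

	u64 mem_size_pfn, free_size_pfn;

	/* all RAM pages below MAX_DMA_PFN, per lmb.memory.region */
	mem_size_pfn = lmb_memory_size(0,
			MAX_DMA_PFN << PAGE_SHIFT) >> PAGE_SHIFT;
	/* pages in that range not covered by lmb.reserved */
	free_size_pfn = lmb_free_memory_size(0,
			MAX_DMA_PFN << PAGE_SHIFT) >> PAGE_SHIFT;
	/* the difference is what the DMA zone must treat as reserved */
	set_dma_reserve(mem_size_pfn - free_size_pfn);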

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/include/asm/e820.h |    2 ++
 arch/x86/kernel/e820.c      |   17 +++++++++++++++++
 arch/x86/kernel/setup.c     |    1 +
 arch/x86/mm/init_64.c       |    7 -------
 4 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/e820.h b/arch/x86/include/asm/e820.h
index 6fbd8cd..f59db16 100644
--- a/arch/x86/include/asm/e820.h
+++ b/arch/x86/include/asm/e820.h
@@ -118,6 +118,8 @@ extern u64 early_reserve_e820(u64 startt, u64 sizet, u64 align);
 
 void init_lmb_memory(void);
 void fill_lmb_memory(void);
+void find_lmb_dma_reserve(void);
+
 extern void finish_e820_parsing(void);
 extern void e820_reserve_resources(void);
 extern void e820_reserve_resources_late(void);
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 12e827b..92c6021 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1103,3 +1103,20 @@ void __init fill_lmb_memory(void)
 	lmb_analyze();
 	lmb_dump_all();
 }
+
+void __init find_lmb_dma_reserve(void)
+{
+#ifdef CONFIG_X86_64
+	u64 free_size_pfn;
+	u64 mem_size_pfn;
+	/*
+	 * Need to find out the used area below MAX_DMA_PFN.
+	 * Use lmb to get the free size in [0, MAX_DMA_PFN] first,
+	 * and assume bootmem will not take memory below MAX_DMA_PFN.
+	 */
+	mem_size_pfn = lmb_memory_size(0, MAX_DMA_PFN << PAGE_SHIFT) >> PAGE_SHIFT;
+	free_size_pfn = lmb_free_memory_size(0, MAX_DMA_PFN << PAGE_SHIFT) >> PAGE_SHIFT;
+	set_dma_reserve(mem_size_pfn - free_size_pfn);
+#endif
+}
+
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index e4fe910..c5ec1b8 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1001,6 +1001,7 @@ void __init setup_arch(char **cmdline_p)
 #endif
 
 	initmem_init(0, max_pfn, acpi, k8);
+	find_lmb_dma_reserve();
 #ifndef CONFIG_NO_BOOTMEM
 	lmb_to_bootmem(0, max_low_pfn<<PAGE_SHIFT);
 #endif
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 0d2252c..37c7a82 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -53,8 +53,6 @@
 #include <asm/init.h>
 #include <linux/bootmem.h>
 
-static unsigned long dma_reserve __initdata;
-
 static int __init parse_direct_gbpages_off(char *arg)
 {
 	direct_gbpages = 0;
@@ -821,11 +819,6 @@ int __init reserve_bootmem_generic(unsigned long phys, unsigned long len,
 
 	reserve_bootmem(phys, len, flags);
 
-	if (phys+len <= MAX_DMA_PFN*PAGE_SIZE) {
-		dma_reserve += len / PAGE_SIZE;
-		set_dma_reserve(dma_reserve);
-	}
-
 	return 0;
 }
 #endif
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 22/35] bootmem: Add nobootmem.c to reduce the #ifdef
  2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
                   ` (20 preceding siblings ...)
  2010-05-14  0:19 ` [PATCH 21/35] x86, lmb: Use lmb_memory_size()/lmb_free_memory_size() to get correct dma_reserve Yinghai Lu
@ 2010-05-14  0:19 ` Yinghai Lu
  2010-05-14  0:19 ` [PATCH 23/35] mm: move contig_page_data define to bootmem.c/nobootmem.c Yinghai Lu
                   ` (12 subsequent siblings)
  34 siblings, 0 replies; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu

Introduce nobootmem.c to hold the wrappers for CONFIG_NO_BOOTMEM=y.

That removes the related #ifdefs from bootmem.c.
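
With CONFIG_NO_BOOTMEM=y, the bootmem entry points become thin
wrappers around lmb; for example, the nobootmem.c version of
free_bootmem() in the new file below is just:

	void __init free_bootmem(unsigned long addr, unsigned long size)
	{
		lmb_free_area(addr, addr + size);
	}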

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 mm/Makefile    |    8 +-
 mm/bootmem.c   |  151 +----------------------
 mm/nobootmem.c |  389 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 397 insertions(+), 151 deletions(-)
 create mode 100644 mm/nobootmem.c

diff --git a/mm/Makefile b/mm/Makefile
index 6c2a73a..08c58d5 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -7,12 +7,18 @@ mmu-$(CONFIG_MMU)	:= fremap.o highmem.o madvise.o memory.o mincore.o \
 			   mlock.o mmap.o mprotect.o mremap.o msync.o rmap.o \
 			   vmalloc.o pagewalk.o
 
-obj-y			:= bootmem.o filemap.o mempool.o oom_kill.o fadvise.o \
+obj-y			:= filemap.o mempool.o oom_kill.o fadvise.o \
 			   maccess.o page_alloc.o page-writeback.o \
 			   readahead.o swap.o truncate.o vmscan.o shmem.o \
 			   prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \
 			   page_isolation.o mm_init.o mmu_context.o \
 			   $(mmu-y)
+ifdef CONFIG_NO_BOOTMEM
+	obj-y		+= nobootmem.o
+else
+	obj-y		+= bootmem.o
+endif
+
 obj-y += init-mm.o
 
 obj-$(CONFIG_BOUNCE)	+= bounce.o
diff --git a/mm/bootmem.c b/mm/bootmem.c
index 2a4c8b5..2741c34 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -35,7 +35,6 @@ unsigned long max_pfn;
 unsigned long saved_max_pfn;
 #endif
 
-#ifndef CONFIG_NO_BOOTMEM
 bootmem_data_t bootmem_node_data[MAX_NUMNODES] __initdata;
 
 static struct list_head bdata_list __initdata = LIST_HEAD_INIT(bdata_list);
@@ -146,7 +145,7 @@ unsigned long __init init_bootmem(unsigned long start, unsigned long pages)
 	min_low_pfn = start;
 	return init_bootmem_core(NODE_DATA(0)->bdata, start, 0, pages);
 }
-#endif
+
 /*
  * free_bootmem_late - free bootmem pages directly to page allocator
  * @addr: starting address of the range
@@ -171,53 +170,6 @@ void __init free_bootmem_late(unsigned long addr, unsigned long size)
 	}
 }
 
-#ifdef CONFIG_NO_BOOTMEM
-static void __init __free_pages_memory(unsigned long start, unsigned long end)
-{
-	int i;
-	unsigned long start_aligned, end_aligned;
-	int order = ilog2(BITS_PER_LONG);
-
-	start_aligned = (start + (BITS_PER_LONG - 1)) & ~(BITS_PER_LONG - 1);
-	end_aligned = end & ~(BITS_PER_LONG - 1);
-
-	if (end_aligned <= start_aligned) {
-		for (i = start; i < end; i++)
-			__free_pages_bootmem(pfn_to_page(i), 0);
-
-		return;
-	}
-
-	for (i = start; i < start_aligned; i++)
-		__free_pages_bootmem(pfn_to_page(i), 0);
-
-	for (i = start_aligned; i < end_aligned; i += BITS_PER_LONG)
-		__free_pages_bootmem(pfn_to_page(i), order);
-
-	for (i = end_aligned; i < end; i++)
-		__free_pages_bootmem(pfn_to_page(i), 0);
-}
-
-unsigned long __init free_all_memory_core_early(int nodeid)
-{
-	int i;
-	u64 start, end;
-	unsigned long count = 0;
-	struct range *range = NULL;
-	int nr_range;
-
-	nr_range = get_free_all_memory_range(&range, nodeid);
-
-	for (i = 0; i < nr_range; i++) {
-		start = range[i].start;
-		end = range[i].end;
-		count += end - start;
-		__free_pages_memory(start, end);
-	}
-
-	return count;
-}
-#else
 static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata)
 {
 	int aligned;
@@ -278,7 +230,6 @@ static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata)
 
 	return count;
 }
-#endif
 
 /**
  * free_all_bootmem_node - release a node's free pages to the buddy allocator
@@ -289,12 +240,7 @@ static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata)
 unsigned long __init free_all_bootmem_node(pg_data_t *pgdat)
 {
 	register_page_bootmem_info_node(pgdat);
-#ifdef CONFIG_NO_BOOTMEM
-	/* free_all_memory_core_early(MAX_NUMNODES) will be called later */
-	return 0;
-#else
 	return free_all_bootmem_core(pgdat->bdata);
-#endif
 }
 
 /**
@@ -304,16 +250,6 @@ unsigned long __init free_all_bootmem_node(pg_data_t *pgdat)
  */
 unsigned long __init free_all_bootmem(void)
 {
-#ifdef CONFIG_NO_BOOTMEM
-	/*
-	 * We need to use MAX_NUMNODES instead of NODE_DATA(0)->node_id
-	 *  because in some case like Node0 doesnt have RAM installed
-	 *  low ram will be on Node1
-	 * Use MAX_NUMNODES will make sure all ranges in early_node_map[]
-	 *  will be used instead of only Node0 related
-	 */
-	return free_all_memory_core_early(MAX_NUMNODES);
-#else
 	unsigned long total_pages = 0;
 	bootmem_data_t *bdata;
 
@@ -321,10 +257,8 @@ unsigned long __init free_all_bootmem(void)
 		total_pages += free_all_bootmem_core(bdata);
 
 	return total_pages;
-#endif
 }
 
-#ifndef CONFIG_NO_BOOTMEM
 static void __init __free(bootmem_data_t *bdata,
 			unsigned long sidx, unsigned long eidx)
 {
@@ -419,7 +353,6 @@ static int __init mark_bootmem(unsigned long start, unsigned long end,
 	}
 	BUG();
 }
-#endif
 
 /**
  * free_bootmem_node - mark a page range as usable
@@ -434,9 +367,6 @@ static int __init mark_bootmem(unsigned long start, unsigned long end,
 void __init free_bootmem_node(pg_data_t *pgdat, unsigned long physaddr,
 			      unsigned long size)
 {
-#ifdef CONFIG_NO_BOOTMEM
-	lmb_free_area(physaddr, physaddr + size);
-#else
 	unsigned long start, end;
 
 	kmemleak_free_part(__va(physaddr), size);
@@ -445,7 +375,6 @@ void __init free_bootmem_node(pg_data_t *pgdat, unsigned long physaddr,
 	end = PFN_DOWN(physaddr + size);
 
 	mark_bootmem_node(pgdat->bdata, start, end, 0, 0);
-#endif
 }
 
 /**
@@ -459,9 +388,6 @@ void __init free_bootmem_node(pg_data_t *pgdat, unsigned long physaddr,
  */
 void __init free_bootmem(unsigned long addr, unsigned long size)
 {
-#ifdef CONFIG_NO_BOOTMEM
-	lmb_free_area(addr, addr + size);
-#else
 	unsigned long start, end;
 
 	kmemleak_free_part(__va(addr), size);
@@ -470,7 +396,6 @@ void __init free_bootmem(unsigned long addr, unsigned long size)
 	end = PFN_DOWN(addr + size);
 
 	mark_bootmem(start, end, 0, 0);
-#endif
 }
 
 /**
@@ -487,17 +412,12 @@ void __init free_bootmem(unsigned long addr, unsigned long size)
 int __init reserve_bootmem_node(pg_data_t *pgdat, unsigned long physaddr,
 				 unsigned long size, int flags)
 {
-#ifdef CONFIG_NO_BOOTMEM
-	panic("no bootmem");
-	return 0;
-#else
 	unsigned long start, end;
 
 	start = PFN_DOWN(physaddr);
 	end = PFN_UP(physaddr + size);
 
 	return mark_bootmem_node(pgdat->bdata, start, end, 1, flags);
-#endif
 }
 
 /**
@@ -513,20 +433,14 @@ int __init reserve_bootmem_node(pg_data_t *pgdat, unsigned long physaddr,
 int __init reserve_bootmem(unsigned long addr, unsigned long size,
 			    int flags)
 {
-#ifdef CONFIG_NO_BOOTMEM
-	panic("no bootmem");
-	return 0;
-#else
 	unsigned long start, end;
 
 	start = PFN_DOWN(addr);
 	end = PFN_UP(addr + size);
 
 	return mark_bootmem(start, end, 1, flags);
-#endif
 }
 
-#ifndef CONFIG_NO_BOOTMEM
 int __weak __init reserve_bootmem_generic(unsigned long phys, unsigned long len,
 				   int flags)
 {
@@ -683,33 +597,12 @@ static void * __init alloc_arch_preferred_bootmem(bootmem_data_t *bdata,
 #endif
 	return NULL;
 }
-#endif
 
 static void * __init ___alloc_bootmem_nopanic(unsigned long size,
 					unsigned long align,
 					unsigned long goal,
 					unsigned long limit)
 {
-#ifdef CONFIG_NO_BOOTMEM
-	void *ptr;
-
-	if (WARN_ON_ONCE(slab_is_available()))
-		return kzalloc(size, GFP_NOWAIT);
-
-restart:
-
-	ptr = __alloc_memory_core_early(MAX_NUMNODES, size, align, goal, limit);
-
-	if (ptr)
-		return ptr;
-
-	if (goal != 0) {
-		goal = 0;
-		goto restart;
-	}
-
-	return NULL;
-#else
 	bootmem_data_t *bdata;
 	void *region;
 
@@ -735,7 +628,6 @@ restart:
 	}
 
 	return NULL;
-#endif
 }
 
 /**
@@ -756,10 +648,6 @@ void * __init __alloc_bootmem_nopanic(unsigned long size, unsigned long align,
 {
 	unsigned long limit = 0;
 
-#ifdef CONFIG_NO_BOOTMEM
-	limit = -1UL;
-#endif
-
 	return ___alloc_bootmem_nopanic(size, align, goal, limit);
 }
 
@@ -796,14 +684,9 @@ void * __init __alloc_bootmem(unsigned long size, unsigned long align,
 {
 	unsigned long limit = 0;
 
-#ifdef CONFIG_NO_BOOTMEM
-	limit = -1UL;
-#endif
-
 	return ___alloc_bootmem(size, align, goal, limit);
 }
 
-#ifndef CONFIG_NO_BOOTMEM
 static void * __init ___alloc_bootmem_node(bootmem_data_t *bdata,
 				unsigned long size, unsigned long align,
 				unsigned long goal, unsigned long limit)
@@ -820,7 +703,6 @@ static void * __init ___alloc_bootmem_node(bootmem_data_t *bdata,
 
 	return ___alloc_bootmem(size, align, goal, limit);
 }
-#endif
 
 /**
  * __alloc_bootmem_node - allocate boot memory from a specific node
@@ -843,12 +725,7 @@ void * __init __alloc_bootmem_node(pg_data_t *pgdat, unsigned long size,
 	if (WARN_ON_ONCE(slab_is_available()))
 		return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
 
-#ifdef CONFIG_NO_BOOTMEM
-	return __alloc_memory_core_early(pgdat->node_id, size, align,
-					 goal, -1ULL);
-#else
 	return ___alloc_bootmem_node(pgdat->bdata, size, align, goal, 0);
-#endif
 }
 
 void * __init __alloc_bootmem_node_high(pg_data_t *pgdat, unsigned long size,
@@ -869,13 +746,8 @@ void * __init __alloc_bootmem_node_high(pg_data_t *pgdat, unsigned long size,
 		unsigned long new_goal;
 
 		new_goal = MAX_DMA32_PFN << PAGE_SHIFT;
-#ifdef CONFIG_NO_BOOTMEM
-		ptr =  __alloc_memory_core_early(pgdat->node_id, size, align,
-						 new_goal, -1ULL);
-#else
 		ptr = alloc_bootmem_core(pgdat->bdata, size, align,
 						 new_goal, 0);
-#endif
 		if (ptr)
 			return ptr;
 	}
@@ -896,16 +768,6 @@ void * __init __alloc_bootmem_node_high(pg_data_t *pgdat, unsigned long size,
 void * __init alloc_bootmem_section(unsigned long size,
 				    unsigned long section_nr)
 {
-#ifdef CONFIG_NO_BOOTMEM
-	unsigned long pfn, goal, limit;
-
-	pfn = section_nr_to_pfn(section_nr);
-	goal = pfn << PAGE_SHIFT;
-	limit = section_nr_to_pfn(section_nr + 1) << PAGE_SHIFT;
-
-	return __alloc_memory_core_early(early_pfn_to_nid(pfn), size,
-					 SMP_CACHE_BYTES, goal, limit);
-#else
 	bootmem_data_t *bdata;
 	unsigned long pfn, goal, limit;
 
@@ -915,7 +777,6 @@ void * __init alloc_bootmem_section(unsigned long size,
 	bdata = &bootmem_node_data[early_pfn_to_nid(pfn)];
 
 	return alloc_bootmem_core(bdata, size, SMP_CACHE_BYTES, goal, limit);
-#endif
 }
 #endif
 
@@ -927,16 +788,11 @@ void * __init __alloc_bootmem_node_nopanic(pg_data_t *pgdat, unsigned long size,
 	if (WARN_ON_ONCE(slab_is_available()))
 		return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
 
-#ifdef CONFIG_NO_BOOTMEM
-	ptr =  __alloc_memory_core_early(pgdat->node_id, size, align,
-						 goal, -1ULL);
-#else
 	ptr = alloc_arch_preferred_bootmem(pgdat->bdata, size, align, goal, 0);
 	if (ptr)
 		return ptr;
 
 	ptr = alloc_bootmem_core(pgdat->bdata, size, align, goal, 0);
-#endif
 	if (ptr)
 		return ptr;
 
@@ -987,11 +843,6 @@ void * __init __alloc_bootmem_low_node(pg_data_t *pgdat, unsigned long size,
 	if (WARN_ON_ONCE(slab_is_available()))
 		return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
 
-#ifdef CONFIG_NO_BOOTMEM
-	return __alloc_memory_core_early(pgdat->node_id, size, align,
-				goal, ARCH_LOW_ADDRESS_LIMIT);
-#else
 	return ___alloc_bootmem_node(pgdat->bdata, size, align,
 				goal, ARCH_LOW_ADDRESS_LIMIT);
-#endif
 }
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
new file mode 100644
index 0000000..283673e
--- /dev/null
+++ b/mm/nobootmem.c
@@ -0,0 +1,389 @@
+/*
+ *  bootmem - A boot-time physical memory allocator and configurator
+ *
+ *  Copyright (C) 1999 Ingo Molnar
+ *                1999 Kanoj Sarcar, SGI
+ *                2008 Johannes Weiner
+ *
+ * Access to this subsystem has to be serialized externally (which is true
+ * for the boot process anyway).
+ */
+#include <linux/init.h>
+#include <linux/pfn.h>
+#include <linux/slab.h>
+#include <linux/bootmem.h>
+#include <linux/module.h>
+#include <linux/kmemleak.h>
+#include <linux/range.h>
+#include <linux/lmb.h>
+
+#include <asm/bug.h>
+#include <asm/io.h>
+#include <asm/processor.h>
+
+#include "internal.h"
+
+unsigned long max_low_pfn;
+unsigned long min_low_pfn;
+unsigned long max_pfn;
+
+#ifdef CONFIG_CRASH_DUMP
+/*
+ * If we have booted due to a crash, max_pfn will be a very low value. We need
+ * to know the amount of memory that the previous kernel used.
+ */
+unsigned long saved_max_pfn;
+#endif
+
+/*
+ * free_bootmem_late - free bootmem pages directly to page allocator
+ * @addr: starting address of the range
+ * @size: size of the range in bytes
+ *
+ * This is only useful when the bootmem allocator has already been torn
+ * down, but we are still initializing the system.  Pages are given directly
+ * to the page allocator, no bootmem metadata is updated because it is gone.
+ */
+void __init free_bootmem_late(unsigned long addr, unsigned long size)
+{
+	unsigned long cursor, end;
+
+	kmemleak_free_part(__va(addr), size);
+
+	cursor = PFN_UP(addr);
+	end = PFN_DOWN(addr + size);
+
+	for (; cursor < end; cursor++) {
+		__free_pages_bootmem(pfn_to_page(cursor), 0);
+		totalram_pages++;
+	}
+}
+
+static void __init __free_pages_memory(unsigned long start, unsigned long end)
+{
+	int i;
+	unsigned long start_aligned, end_aligned;
+	int order = ilog2(BITS_PER_LONG);
+
+	start_aligned = (start + (BITS_PER_LONG - 1)) & ~(BITS_PER_LONG - 1);
+	end_aligned = end & ~(BITS_PER_LONG - 1);
+
+	if (end_aligned <= start_aligned) {
+		for (i = start; i < end; i++)
+			__free_pages_bootmem(pfn_to_page(i), 0);
+
+		return;
+	}
+
+	for (i = start; i < start_aligned; i++)
+		__free_pages_bootmem(pfn_to_page(i), 0);
+
+	for (i = start_aligned; i < end_aligned; i += BITS_PER_LONG)
+		__free_pages_bootmem(pfn_to_page(i), order);
+
+	for (i = end_aligned; i < end; i++)
+		__free_pages_bootmem(pfn_to_page(i), 0);
+}
+
+unsigned long __init free_all_memory_core_early(int nodeid)
+{
+	int i;
+	u64 start, end;
+	unsigned long count = 0;
+	struct range *range = NULL;
+	int nr_range;
+
+	nr_range = get_free_all_memory_range(&range, nodeid);
+
+	for (i = 0; i < nr_range; i++) {
+		start = range[i].start;
+		end = range[i].end;
+		count += end - start;
+		__free_pages_memory(start, end);
+	}
+
+	return count;
+}
+
+/**
+ * free_all_bootmem_node - release a node's free pages to the buddy allocator
+ * @pgdat: node to be released
+ *
+ * Returns the number of pages actually released.
+ */
+unsigned long __init free_all_bootmem_node(pg_data_t *pgdat)
+{
+	register_page_bootmem_info_node(pgdat);
+
+	/* free_all_memory_core_early(MAX_NUMNODES) will be called later */
+	return 0;
+}
+
+/**
+ * free_all_bootmem - release free pages to the buddy allocator
+ *
+ * Returns the number of pages actually released.
+ */
+unsigned long __init free_all_bootmem(void)
+{
+	/*
+	 * We need to use MAX_NUMNODES instead of NODE_DATA(0)->node_id
+	 * because in some cases, e.g. when Node0 has no RAM installed,
+	 * the low ram will be on Node1.
+	 * Using MAX_NUMNODES makes sure all ranges in early_node_map[]
+	 * are used, instead of only the Node0-related ones.
+	 */
+	return free_all_memory_core_early(MAX_NUMNODES);
+}
+
+/**
+ * free_bootmem_node - mark a page range as usable
+ * @pgdat: node the range resides on
+ * @physaddr: starting address of the range
+ * @size: size of the range in bytes
+ *
+ * Partial pages will be considered reserved and left as they are.
+ *
+ * The range must reside completely on the specified node.
+ */
+void __init free_bootmem_node(pg_data_t *pgdat, unsigned long physaddr,
+			      unsigned long size)
+{
+	lmb_free_area(physaddr, physaddr + size);
+}
+
+/**
+ * free_bootmem - mark a page range as usable
+ * @addr: starting address of the range
+ * @size: size of the range in bytes
+ *
+ * Partial pages will be considered reserved and left as they are.
+ *
+ * The range must be contiguous but may span node boundaries.
+ */
+void __init free_bootmem(unsigned long addr, unsigned long size)
+{
+	lmb_free_area(addr, addr + size);
+}
+
+static void * __init ___alloc_bootmem_nopanic(unsigned long size,
+					unsigned long align,
+					unsigned long goal,
+					unsigned long limit)
+{
+	void *ptr;
+
+	if (WARN_ON_ONCE(slab_is_available()))
+		return kzalloc(size, GFP_NOWAIT);
+
+restart:
+
+	ptr = __alloc_memory_core_early(MAX_NUMNODES, size, align, goal, limit);
+
+	if (ptr)
+		return ptr;
+
+	if (goal != 0) {
+		goal = 0;
+		goto restart;
+	}
+
+	return NULL;
+}
+
+/**
+ * __alloc_bootmem_nopanic - allocate boot memory without panicking
+ * @size: size of the request in bytes
+ * @align: alignment of the region
+ * @goal: preferred starting address of the region
+ *
+ * The goal is dropped if it can not be satisfied and the allocation will
+ * fall back to memory below @goal.
+ *
+ * Allocation may happen on any node in the system.
+ *
+ * Returns NULL on failure.
+ */
+void * __init __alloc_bootmem_nopanic(unsigned long size, unsigned long align,
+					unsigned long goal)
+{
+	unsigned long limit = -1UL;
+
+	return ___alloc_bootmem_nopanic(size, align, goal, limit);
+}
+
+static void * __init ___alloc_bootmem(unsigned long size, unsigned long align,
+					unsigned long goal, unsigned long limit)
+{
+	void *mem = ___alloc_bootmem_nopanic(size, align, goal, limit);
+
+	if (mem)
+		return mem;
+	/*
+	 * Whoops, we cannot satisfy the allocation request.
+	 */
+	printk(KERN_ALERT "bootmem alloc of %lu bytes failed!\n", size);
+	panic("Out of memory");
+	return NULL;
+}
+
+/**
+ * __alloc_bootmem - allocate boot memory
+ * @size: size of the request in bytes
+ * @align: alignment of the region
+ * @goal: preferred starting address of the region
+ *
+ * The goal is dropped if it can not be satisfied and the allocation will
+ * fall back to memory below @goal.
+ *
+ * Allocation may happen on any node in the system.
+ *
+ * The function panics if the request can not be satisfied.
+ */
+void * __init __alloc_bootmem(unsigned long size, unsigned long align,
+			      unsigned long goal)
+{
+	unsigned long limit = -1UL;
+
+	return ___alloc_bootmem(size, align, goal, limit);
+}
+
+/**
+ * __alloc_bootmem_node - allocate boot memory from a specific node
+ * @pgdat: node to allocate from
+ * @size: size of the request in bytes
+ * @align: alignment of the region
+ * @goal: preferred starting address of the region
+ *
+ * The goal is dropped if it can not be satisfied and the allocation will
+ * fall back to memory below @goal.
+ *
+ * Allocation may fall back to any node in the system if the specified node
+ * can not hold the requested memory.
+ *
+ * The function panics if the request can not be satisfied.
+ */
+void * __init __alloc_bootmem_node(pg_data_t *pgdat, unsigned long size,
+				   unsigned long align, unsigned long goal)
+{
+	if (WARN_ON_ONCE(slab_is_available()))
+		return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
+
+	return __alloc_memory_core_early(pgdat->node_id, size, align,
+					 goal, -1ULL);
+}
+
+void * __init __alloc_bootmem_node_high(pg_data_t *pgdat, unsigned long size,
+				   unsigned long align, unsigned long goal)
+{
+#ifdef MAX_DMA32_PFN
+	unsigned long end_pfn;
+
+	if (WARN_ON_ONCE(slab_is_available()))
+		return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
+
+	/* update goal according ...MAX_DMA32_PFN */
+	end_pfn = pgdat->node_start_pfn + pgdat->node_spanned_pages;
+
+	if (end_pfn > MAX_DMA32_PFN + (128 >> (20 - PAGE_SHIFT)) &&
+	    (goal >> PAGE_SHIFT) < MAX_DMA32_PFN) {
+		void *ptr;
+		unsigned long new_goal;
+
+		new_goal = MAX_DMA32_PFN << PAGE_SHIFT;
+		ptr =  __alloc_memory_core_early(pgdat->node_id, size, align,
+						 new_goal, -1ULL);
+		if (ptr)
+			return ptr;
+	}
+#endif
+
+	return __alloc_bootmem_node(pgdat, size, align, goal);
+
+}
+
+#ifdef CONFIG_SPARSEMEM
+/**
+ * alloc_bootmem_section - allocate boot memory from a specific section
+ * @size: size of the request in bytes
+ * @section_nr: sparse map section to allocate from
+ *
+ * Return NULL on failure.
+ */
+void * __init alloc_bootmem_section(unsigned long size,
+				    unsigned long section_nr)
+{
+	unsigned long pfn, goal, limit;
+
+	pfn = section_nr_to_pfn(section_nr);
+	goal = pfn << PAGE_SHIFT;
+	limit = section_nr_to_pfn(section_nr + 1) << PAGE_SHIFT;
+
+	return __alloc_memory_core_early(early_pfn_to_nid(pfn), size,
+					 SMP_CACHE_BYTES, goal, limit);
+}
+#endif
+
+void * __init __alloc_bootmem_node_nopanic(pg_data_t *pgdat, unsigned long size,
+				   unsigned long align, unsigned long goal)
+{
+	void *ptr;
+
+	if (WARN_ON_ONCE(slab_is_available()))
+		return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
+
+	ptr =  __alloc_memory_core_early(pgdat->node_id, size, align,
+						 goal, -1ULL);
+	if (ptr)
+		return ptr;
+
+	return __alloc_bootmem_nopanic(size, align, goal);
+}
+
+#ifndef ARCH_LOW_ADDRESS_LIMIT
+#define ARCH_LOW_ADDRESS_LIMIT	0xffffffffUL
+#endif
+
+/**
+ * __alloc_bootmem_low - allocate low boot memory
+ * @size: size of the request in bytes
+ * @align: alignment of the region
+ * @goal: preferred starting address of the region
+ *
+ * The goal is dropped if it can not be satisfied and the allocation will
+ * fall back to memory below @goal.
+ *
+ * Allocation may happen on any node in the system.
+ *
+ * The function panics if the request can not be satisfied.
+ */
+void * __init __alloc_bootmem_low(unsigned long size, unsigned long align,
+				  unsigned long goal)
+{
+	return ___alloc_bootmem(size, align, goal, ARCH_LOW_ADDRESS_LIMIT);
+}
+
+/**
+ * __alloc_bootmem_low_node - allocate low boot memory from a specific node
+ * @pgdat: node to allocate from
+ * @size: size of the request in bytes
+ * @align: alignment of the region
+ * @goal: preferred starting address of the region
+ *
+ * The goal is dropped if it can not be satisfied and the allocation will
+ * fall back to memory below @goal.
+ *
+ * Allocation may fall back to any node in the system if the specified node
+ * can not hold the requested memory.
+ *
+ * The function panics if the request can not be satisfied.
+ */
+void * __init __alloc_bootmem_low_node(pg_data_t *pgdat, unsigned long size,
+				       unsigned long align, unsigned long goal)
+{
+	if (WARN_ON_ONCE(slab_is_available()))
+		return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
+
+	return __alloc_memory_core_early(pgdat->node_id, size, align,
+				goal, ARCH_LOW_ADDRESS_LIMIT);
+}
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 23/35] mm: move contig_page_data define to bootmem.c/nobootmem.c
  2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
                   ` (21 preceding siblings ...)
  2010-05-14  0:19 ` [PATCH 22/35] bootmem: Add nobootmem.c to reduce the #ifdef Yinghai Lu
@ 2010-05-14  0:19 ` Yinghai Lu
  2010-05-14  0:19 ` [PATCH 24/35] lmb: Move __alloc_memory_core_early() to nobootmem.c Yinghai Lu
                   ` (11 subsequent siblings)
  34 siblings, 0 replies; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu

We can then remove the #ifdef block in mm/page_alloc.c.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 mm/bootmem.c    |    7 +++++++
 mm/nobootmem.c  |    5 +++++
 mm/page_alloc.c |    9 ---------
 3 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/mm/bootmem.c b/mm/bootmem.c
index 2741c34..ff55ad7 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -23,6 +23,13 @@
 
 #include "internal.h"
 
+#ifndef CONFIG_NEED_MULTIPLE_NODES
+struct pglist_data __refdata contig_page_data = {
+ .bdata = &bootmem_node_data[0]
+ };
+EXPORT_SYMBOL(contig_page_data);
+#endif
+
 unsigned long max_low_pfn;
 unsigned long min_low_pfn;
 unsigned long max_pfn;
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index 283673e..abaec96 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -23,6 +23,11 @@
 
 #include "internal.h"
 
+#ifndef CONFIG_NEED_MULTIPLE_NODES
+struct pglist_data __refdata contig_page_data;
+EXPORT_SYMBOL(contig_page_data);
+#endif
+
 unsigned long max_low_pfn;
 unsigned long min_low_pfn;
 unsigned long max_pfn;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 631d2fc..867a3a8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4496,15 +4496,6 @@ void __init set_dma_reserve(unsigned long new_dma_reserve)
 	dma_reserve = new_dma_reserve;
 }
 
-#ifndef CONFIG_NEED_MULTIPLE_NODES
-struct pglist_data __refdata contig_page_data = {
-#ifndef CONFIG_NO_BOOTMEM
- .bdata = &bootmem_node_data[0]
-#endif
- };
-EXPORT_SYMBOL(contig_page_data);
-#endif
-
 void __init free_area_init(unsigned long *zones_size)
 {
 	free_area_init_node(0, zones_size,
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 24/35] lmb: Move __alloc_memory_core_early() to nobootmem.c
  2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
                   ` (22 preceding siblings ...)
  2010-05-14  0:19 ` [PATCH 23/35] mm: move contig_page_data define to bootmem.c/nobootmem.c Yinghai Lu
@ 2010-05-14  0:19 ` Yinghai Lu
  2010-05-14  2:36   ` Benjamin Herrenschmidt
  2010-05-14  0:19 ` [PATCH 25/35] x86: Have nobootmem version setup_bootmem_allocator() Yinghai Lu
                   ` (10 subsequent siblings)
  34 siblings, 1 reply; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu

We can remove the #ifdef block in mm/page_alloc.c and make
__alloc_memory_core_early() static.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 include/linux/mm.h |    2 --
 mm/nobootmem.c     |   21 +++++++++++++++++++++
 mm/page_alloc.c    |   24 ------------------------
 3 files changed, 21 insertions(+), 26 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7774e1d..2a14361 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1161,8 +1161,6 @@ int add_from_early_node_map(struct range *range, int az,
 				   int nr_range, int nid);
 u64 __init find_memory_core_early(int nid, u64 size, u64 align,
 					u64 goal, u64 limit);
-void *__alloc_memory_core_early(int nodeid, u64 size, u64 align,
-				 u64 goal, u64 limit);
 typedef int (*work_fn_t)(unsigned long, unsigned long, void *);
 extern void work_with_active_regions(int nid, work_fn_t work_fn, void *data);
 extern void sparse_memory_present_with_active_regions(int nid);
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index abaec96..e3cbde7 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -40,6 +40,27 @@ unsigned long max_pfn;
 unsigned long saved_max_pfn;
 #endif
 
+static void * __init __alloc_memory_core_early(int nid, u64 size, u64 align,
+					u64 goal, u64 limit)
+{
+	void *ptr;
+
+	u64 addr;
+
+	if (limit > lmb.current_limit)
+		limit = lmb.current_limit;
+
+	addr = find_memory_core_early(nid, size, align, goal, limit);
+
+	if (addr == LMB_ERROR)
+		return NULL;
+
+	ptr = phys_to_virt(addr);
+	memset(ptr, 0, size);
+	lmb_reserve_area(addr, addr + size, "BOOTMEM");
+	return ptr;
+}
+
 /*
  * free_bootmem_late - free bootmem pages directly to page allocator
  * @addr: starting address of the range
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 867a3a8..3449811 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3437,30 +3437,6 @@ int __init add_from_early_node_map(struct range *range, int az,
 	return nr_range;
 }
 
-#ifdef CONFIG_NO_BOOTMEM
-void * __init __alloc_memory_core_early(int nid, u64 size, u64 align,
-					u64 goal, u64 limit)
-{
-	void *ptr;
-
-	u64 addr;
-
-	if (limit > lmb.current_limit)
-		limit = lmb.current_limit;
-
-	addr = find_memory_core_early(nid, size, align, goal, limit);
-
-	if (addr == LMB_ERROR)
-		return NULL;
-
-	ptr = phys_to_virt(addr);
-	memset(ptr, 0, size);
-	lmb_reserve_area(addr, addr + size, "BOOTMEM");
-	return ptr;
-}
-#endif
-
-
 void __init work_with_active_regions(int nid, work_fn_t work_fn, void *data)
 {
 	int i;
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 25/35] x86: Have nobootmem version setup_bootmem_allocator()
  2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
                   ` (23 preceding siblings ...)
  2010-05-14  0:19 ` [PATCH 24/35] lmb: Move __alloc_memory_core_early() to nobootmem.c Yinghai Lu
@ 2010-05-14  0:19 ` Yinghai Lu
  2010-05-14  0:19 ` [PATCH 26/35] x86: Put 64 bit numa node memmap above 16M Yinghai Lu
                   ` (9 subsequent siblings)
  34 siblings, 0 replies; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu

We can reduce the number of #ifdefs in init_32.c from three to one.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/mm/init_32.c |   15 ++++++++++-----
 1 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index c01c711..dfdd035 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -771,11 +771,9 @@ static unsigned long __init setup_node_bootmem(int nodeid,
 
 	return bootmap + bootmap_size;
 }
-#endif
 
 void __init setup_bootmem_allocator(void)
 {
-#ifndef CONFIG_NO_BOOTMEM
 	int nodeid;
 	unsigned long bootmap_size, bootmap;
 	/*
@@ -787,13 +785,11 @@ void __init setup_bootmem_allocator(void)
 	if (bootmap == -1L)
 		panic("Cannot find bootmem map of size %ld\n", bootmap_size);
 	lmb_reserve_area(bootmap, bootmap + bootmap_size, "BOOTMAP");
-#endif
 
 	printk(KERN_INFO "  mapped low ram: 0 - %08lx\n",
 		 max_pfn_mapped<<PAGE_SHIFT);
 	printk(KERN_INFO "  low ram: 0 - %08lx\n", max_low_pfn<<PAGE_SHIFT);
 
-#ifndef CONFIG_NO_BOOTMEM
 	for_each_online_node(nodeid) {
 		 unsigned long start_pfn, end_pfn;
 
@@ -811,10 +807,19 @@ void __init setup_bootmem_allocator(void)
 		bootmap = setup_node_bootmem(nodeid, start_pfn, end_pfn,
 						 bootmap);
 	}
-#endif
 
 	after_bootmem = 1;
 }
+#else
+void __init setup_bootmem_allocator(void)
+{
+	printk(KERN_INFO "  mapped low ram: 0 - %08lx\n",
+		 max_pfn_mapped<<PAGE_SHIFT);
+	printk(KERN_INFO "  low ram: 0 - %08lx\n", max_low_pfn<<PAGE_SHIFT);
+
+	after_bootmem = 1;
+}
+#endif
 
 /*
  * paging_init() sets up the page tables - note that the first 8MB are
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 26/35] x86: Put 64 bit numa node memmap above 16M
  2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
                   ` (24 preceding siblings ...)
  2010-05-14  0:19 ` [PATCH 25/35] x86: Have nobootmem version setup_bootmem_allocator() Yinghai Lu
@ 2010-05-14  0:19 ` Yinghai Lu
  2010-05-14  0:19 ` [PATCH 27/35] swiotlb: Use page alignment for early buffer allocation Yinghai Lu
                   ` (8 subsequent siblings)
  34 siblings, 0 replies; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu

Do not use the hard-coded 0x8000 value anymore.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/mm/numa_64.c |    2 +-
 arch/x86/mm/srat_64.c |    4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 18d2296..b8438ac 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -88,7 +88,7 @@ static int __init allocate_cachealigned_memnodemap(void)
 	if (memnodemapsize <= ARRAY_SIZE(memnode.embedded_map))
 		return 0;
 
-	addr = 0x8000;
+	addr = __pa(MAX_DMA_ADDRESS);
 	nodemap_size = roundup(sizeof(s16) * memnodemapsize, L1_CACHE_BYTES);
 	nodemap_addr = lmb_find_area(addr, max_pfn<<PAGE_SHIFT,
 				      nodemap_size, L1_CACHE_BYTES);
diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index 23f274c..c206b08 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -99,8 +99,8 @@ void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
 	unsigned long phys;
 
 	length = slit->header.length;
-	phys = lmb_find_area(0, max_pfn_mapped<<PAGE_SHIFT, length,
-		 PAGE_SIZE);
+	phys = lmb_find_area(__pa(MAX_DMA_ADDRESS), max_pfn_mapped<<PAGE_SHIFT,
+				 length, PAGE_SIZE);
 
 	if (phys == -1L)
 		panic(" Can not save slit!\n");
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 27/35] swiotlb: Use page alignment for early buffer allocation
  2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
                   ` (25 preceding siblings ...)
  2010-05-14  0:19 ` [PATCH 26/35] x86: Put 64 bit numa node memmap above 16M Yinghai Lu
@ 2010-05-14  0:19 ` Yinghai Lu
  2010-05-14  0:19 ` [PATCH 28/35] x86: Add sanitize_e820_map() Yinghai Lu
                   ` (7 subsequent siblings)
  34 siblings, 0 replies; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu,
	FUJITA Tomonori, Becky Bruce

for 2.6.34

free_bootmem_late() may be called if swiotlb is not used, and it frees
at page granularity.

So allocate the buffers with page alignment in the first place, to avoid
losing two pages.

before patch:
[    0.000000]     lmb_reserve_area: [00d3600000, 00d7600000]   swiotlb buffer
[    0.000000]     lmb_reserve_area: [00d7e7ef40, 00d7e9ef40]     swiotlb list
[    0.000000]     lmb_reserve_area: [00d7e3ef40, 00d7e7ef40]  swiotlb orig_ad
[    0.000000]     lmb_reserve_area: [000008a000, 0000092000]  swiotlb overflo

after patch will get
[    0.000000]     lmb_reserve_area: [00d3600000, 00d7600000]   swiotlb buffer
[    0.000000]     lmb_reserve_area: [00d7e7e000, 00d7e9e000]     swiotlb list
[    0.000000]     lmb_reserve_area: [00d7e3e000, 00d7e7e000]  swiotlb orig_ad
[    0.000000]     lmb_reserve_area: [000008a000, 0000092000]  swiotlb overflo

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Becky Bruce <beckyb@kernel.crashing.org>
---
 lib/swiotlb.c |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 5fddf72..1bd4258 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -159,7 +159,7 @@ swiotlb_init_with_default_size(size_t default_size, int verbose)
 	/*
 	 * Get IO TLB memory from the low pages
 	 */
-	io_tlb_start = alloc_bootmem_low_pages(bytes);
+	io_tlb_start = alloc_bootmem_low_pages(PAGE_ALIGN(bytes));
 	if (!io_tlb_start)
 		panic("Cannot allocate SWIOTLB buffer");
 	io_tlb_end = io_tlb_start + bytes;
@@ -169,16 +169,16 @@ swiotlb_init_with_default_size(size_t default_size, int verbose)
 	 * to find contiguous free memory regions of size up to IO_TLB_SEGSIZE
 	 * between io_tlb_start and io_tlb_end.
 	 */
-	io_tlb_list = alloc_bootmem(io_tlb_nslabs * sizeof(int));
+	io_tlb_list = alloc_bootmem_pages(PAGE_ALIGN(io_tlb_nslabs * sizeof(int)));
 	for (i = 0; i < io_tlb_nslabs; i++)
  		io_tlb_list[i] = IO_TLB_SEGSIZE - OFFSET(i, IO_TLB_SEGSIZE);
 	io_tlb_index = 0;
-	io_tlb_orig_addr = alloc_bootmem(io_tlb_nslabs * sizeof(phys_addr_t));
+	io_tlb_orig_addr = alloc_bootmem_pages(PAGE_ALIGN(io_tlb_nslabs * sizeof(phys_addr_t)));
 
 	/*
 	 * Get the overflow emergency buffer
 	 */
-	io_tlb_overflow_buffer = alloc_bootmem_low(io_tlb_overflow);
+	io_tlb_overflow_buffer = alloc_bootmem_low_pages(PAGE_ALIGN(io_tlb_overflow));
 	if (!io_tlb_overflow_buffer)
 		panic("Cannot allocate SWIOTLB overflow buffer!\n");
 	if (verbose)
@@ -304,13 +304,13 @@ void __init swiotlb_free(void)
 			   get_order(io_tlb_nslabs << IO_TLB_SHIFT));
 	} else {
 		free_bootmem_late(__pa(io_tlb_overflow_buffer),
-				  io_tlb_overflow);
+				  PAGE_ALIGN(io_tlb_overflow));
 		free_bootmem_late(__pa(io_tlb_orig_addr),
-				  io_tlb_nslabs * sizeof(phys_addr_t));
+				  PAGE_ALIGN(io_tlb_nslabs * sizeof(phys_addr_t)));
 		free_bootmem_late(__pa(io_tlb_list),
-				  io_tlb_nslabs * sizeof(int));
+				  PAGE_ALIGN(io_tlb_nslabs * sizeof(int)));
 		free_bootmem_late(__pa(io_tlb_start),
-				  io_tlb_nslabs << IO_TLB_SHIFT);
+				  PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
 	}
 }
 
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 28/35] x86: Add sanitize_e820_map()
  2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
                   ` (26 preceding siblings ...)
  2010-05-14  0:19 ` [PATCH 27/35] swiotlb: Use page alignment for early buffer allocation Yinghai Lu
@ 2010-05-14  0:19 ` Yinghai Lu
  2010-05-14  0:19 ` [PATCH 29/35] x86: Change e820_saved to __initdata Yinghai Lu
                   ` (6 subsequent siblings)
  34 siblings, 0 replies; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu

for 2.6.34

So callers don't need to pass e820.map around with it.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/include/asm/e820.h |    3 +--
 arch/x86/kernel/e820.c      |   17 +++++++++++------
 arch/x86/kernel/efi.c       |    2 +-
 arch/x86/kernel/setup.c     |    8 ++++----
 arch/x86/xen/setup.c        |    4 +---
 5 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/e820.h b/arch/x86/include/asm/e820.h
index f59db16..25962e1 100644
--- a/arch/x86/include/asm/e820.h
+++ b/arch/x86/include/asm/e820.h
@@ -82,8 +82,7 @@ extern int e820_any_mapped(u64 start, u64 end, unsigned type);
 extern int e820_all_mapped(u64 start, u64 end, unsigned type);
 extern void e820_add_region(u64 start, u64 size, int type);
 extern void e820_print_map(char *who);
-extern int
-sanitize_e820_map(struct e820entry *biosmap, int max_nr_map, u32 *pnr_map);
+int sanitize_e820_map(void);
 extern u64 e820_update_range(u64 start, u64 size, unsigned old_type,
 			       unsigned new_type);
 extern u64 e820_remove_range(u64 start, u64 size, unsigned old_type,
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 92c6021..416acee 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -225,7 +225,7 @@ void __init e820_print_map(char *who)
  *	   ______________________4_
  */
 
-int __init sanitize_e820_map(struct e820entry *biosmap, int max_nr_map,
+static int __init __sanitize_e820_map(struct e820entry *biosmap, int max_nr_map,
 			     u32 *pnr_map)
 {
 	struct change_member {
@@ -384,6 +384,11 @@ int __init sanitize_e820_map(struct e820entry *biosmap, int max_nr_map,
 	return 0;
 }
 
+int __init sanitize_e820_map(void)
+{
+	return __sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map);
+}
+
 static int __init __append_e820_map(struct e820entry *biosmap, int nr_map)
 {
 	while (nr_map) {
@@ -572,7 +577,7 @@ void __init update_e820(void)
 	u32 nr_map;
 
 	nr_map = e820.nr_map;
-	if (sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &nr_map))
+	if (__sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &nr_map))
 		return;
 	e820.nr_map = nr_map;
 	printk(KERN_INFO "modified physical RAM map:\n");
@@ -583,7 +588,7 @@ static void __init update_e820_saved(void)
 	u32 nr_map;
 
 	nr_map = e820_saved.nr_map;
-	if (sanitize_e820_map(e820_saved.map, ARRAY_SIZE(e820_saved.map), &nr_map))
+	if (__sanitize_e820_map(e820_saved.map, ARRAY_SIZE(e820_saved.map), &nr_map))
 		return;
 	e820_saved.nr_map = nr_map;
 }
@@ -678,7 +683,7 @@ void __init parse_e820_ext(struct setup_data *sdata, unsigned long pa_data)
 		sdata = early_ioremap(pa_data, map_len);
 	extmap = (struct e820entry *)(sdata->data);
 	__append_e820_map(extmap, entries);
-	sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map);
+	sanitize_e820_map();
 	if (map_len > PAGE_SIZE)
 		early_iounmap(sdata, map_len);
 	printk(KERN_INFO "extended physical RAM map:\n");
@@ -910,7 +915,7 @@ void __init finish_e820_parsing(void)
 	if (userdef) {
 		u32 nr = e820.nr_map;
 
-		if (sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &nr) < 0)
+		if (__sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &nr) < 0)
 			early_panic("Invalid user supplied memory map");
 		e820.nr_map = nr;
 
@@ -1040,7 +1045,7 @@ char *__init default_machine_specific_memory_setup(void)
 	 * the next section from 1mb->appropriate_mem_k
 	 */
 	new_nr = boot_params.e820_entries;
-	sanitize_e820_map(boot_params.e820_map,
+	__sanitize_e820_map(boot_params.e820_map,
 			ARRAY_SIZE(boot_params.e820_map),
 			&new_nr);
 	boot_params.e820_entries = new_nr;
diff --git a/arch/x86/kernel/efi.c b/arch/x86/kernel/efi.c
index ebe7c09..bb31919 100644
--- a/arch/x86/kernel/efi.c
+++ b/arch/x86/kernel/efi.c
@@ -273,7 +273,7 @@ static void __init do_add_efi_memmap(void)
 		}
 		e820_add_region(start, size, e820_type);
 	}
-	sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map);
+	sanitize_e820_map();
 }
 
 void __init efi_lmb_reserve_area(void)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index c5ec1b8..9d9890c 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -462,7 +462,7 @@ static void __init e820_reserve_setup_data(void)
 	if (!found)
 		return;
 
-	sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map);
+	sanitize_e820_map();
 	memcpy(&e820_saved, &e820, sizeof(struct e820map));
 	printk(KERN_INFO "extended physical RAM map:\n");
 	e820_print_map("reserve setup_data");
@@ -625,7 +625,7 @@ static int __init dmi_low_memory_corruption(const struct dmi_system_id *d)
 		d->ident);
 
 	e820_update_range(0, 0x10000, E820_RAM, E820_RESERVED);
-	sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map);
+	sanitize_e820_map();
 
 	return 0;
 }
@@ -694,7 +694,7 @@ static void __init trim_bios_range(void)
 	 * take them out.
 	 */
 	e820_remove_range(BIOS_BEGIN, BIOS_END - BIOS_BEGIN, E820_RAM, 1);
-	sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map);
+	sanitize_e820_map();
 }
 
 static u64 __init get_max_mapped(void)
@@ -874,7 +874,7 @@ void __init setup_arch(char **cmdline_p)
 	if (ppro_with_ram_bug()) {
 		e820_update_range(0x70000000ULL, 0x40000ULL, E820_RAM,
 				  E820_RESERVED);
-		sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map);
+		sanitize_e820_map();
 		printk(KERN_INFO "fixed physical RAM map:\n");
 		e820_print_map("bad_ppro");
 	}
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index be3fcf3..d2954eb 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -44,8 +44,6 @@ char * __init xen_memory_setup(void)
 
 	max_pfn = min(MAX_DOMAIN_PAGES, max_pfn);
 
-	e820.nr_map = 0;
-
 	e820_add_region(0, PFN_PHYS((u64)max_pfn), E820_RAM);
 
 	/*
@@ -66,7 +64,7 @@ char * __init xen_memory_setup(void)
 		      __pa(xen_start_info->pt_base),
 			"XEN START INFO");
 
-	sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map);
+	sanitize_e820_map();
 
 	return "Xen";
 }
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 29/35] x86: Change e820_saved to __initdata
  2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
                   ` (27 preceding siblings ...)
  2010-05-14  0:19 ` [PATCH 28/35] x86: Add sanitize_e820_map() Yinghai Lu
@ 2010-05-14  0:19 ` Yinghai Lu
  2010-05-14  0:19 ` [PATCH 30/35] x86: Align e820 ram range to page Yinghai Lu
                   ` (5 subsequent siblings)
  34 siblings, 0 replies; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu

for 2.6.34

Add save_e820_map() and change e820_saved to static.
Also make it __initdata to get some bytes of memory back.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/include/asm/e820.h |    2 +-
 arch/x86/kernel/e820.c      |    9 +++++++--
 arch/x86/kernel/setup.c     |    2 +-
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/e820.h b/arch/x86/include/asm/e820.h
index 25962e1..334281f 100644
--- a/arch/x86/include/asm/e820.h
+++ b/arch/x86/include/asm/e820.h
@@ -75,7 +75,6 @@ struct e820map {
 #ifdef __KERNEL__
 /* see comment in arch/x86/kernel/e820.c */
 extern struct e820map e820;
-extern struct e820map e820_saved;
 
 extern unsigned long pci_mem_start;
 extern int e820_any_mapped(u64 start, u64 end, unsigned type);
@@ -83,6 +82,7 @@ extern int e820_all_mapped(u64 start, u64 end, unsigned type);
 extern void e820_add_region(u64 start, u64 size, int type);
 extern void e820_print_map(char *who);
 int sanitize_e820_map(void);
+void save_e820_map(void);
 extern u64 e820_update_range(u64 start, u64 size, unsigned old_type,
 			       unsigned new_type);
 extern u64 e820_remove_range(u64 start, u64 size, unsigned old_type,
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 416acee..4bbbd6b 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -36,7 +36,7 @@
  * next kernel with full memory.
  */
 struct e820map e820;
-struct e820map e820_saved;
+static struct e820map __initdata e820_saved;
 
 /* For PCI or other memory-mapped resources */
 unsigned long pci_mem_start = 0xaeedbabe;
@@ -1072,12 +1072,17 @@ char *__init default_machine_specific_memory_setup(void)
 	return who;
 }
 
+void __init save_e820_map(void)
+{
+	memcpy(&e820_saved, &e820, sizeof(struct e820map));
+}
+
 void __init setup_memory_map(void)
 {
 	char *who;
 
 	who = x86_init.resources.memory_setup();
-	memcpy(&e820_saved, &e820, sizeof(struct e820map));
+	save_e820_map();
 	printk(KERN_INFO "BIOS-provided physical RAM map:\n");
 	e820_print_map(who);
 }
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 9d9890c..9083c9a 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -463,7 +463,7 @@ static void __init e820_reserve_setup_data(void)
 		return;
 
 	sanitize_e820_map();
-	memcpy(&e820_saved, &e820, sizeof(struct e820map));
+	save_e820_map();
 	printk(KERN_INFO "extended physical RAM map:\n");
 	e820_print_map("reserve setup_data");
 }
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 30/35] x86: Align e820 ram range to page
  2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
                   ` (28 preceding siblings ...)
  2010-05-14  0:19 ` [PATCH 29/35] x86: Change e820_saved to __initdata Yinghai Lu
@ 2010-05-14  0:19 ` Yinghai Lu
  2010-05-14  0:19 ` [PATCH 31/35] x86: Use walk_system_ram_range() instead of e820_any_mapped() in agp path Yinghai Lu
                   ` (4 subsequent siblings)
  34 siblings, 0 replies; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu

for 2.6.34

To work around broken BIOS memory maps.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/kernel/e820.c |   44 ++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 44 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 4bbbd6b..78ca7ce 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -910,6 +910,47 @@ static int __init parse_memmap_opt(char *p)
 }
 early_param("memmap", parse_memmap_opt);
 
+static void __init e820_align_ram_page(void)
+{
+	int i;
+	bool changed = false;
+
+	for (i = 0; i < e820.nr_map; i++) {
+		struct e820entry *entry = &e820.map[i];
+		u64 start, end;
+		u64 start_aligned, end_aligned;
+
+		if (entry->type != E820_RAM)
+			continue;
+
+		start = entry->addr;
+		end = start + entry->size;
+
+		start_aligned = round_up(start, PAGE_SIZE);
+		end_aligned = round_down(end, PAGE_SIZE);
+
+		if (end_aligned <= start_aligned) {
+			e820_update_range(start, end - start, E820_RAM, E820_RESERVED);
+			changed = true;
+			continue;
+		}
+		if (start < start_aligned) {
+			e820_update_range(start, start_aligned - start, E820_RAM, E820_RESERVED);
+			changed = true;
+		}
+		if (end_aligned < end) {
+			e820_update_range(end_aligned, end - end_aligned, E820_RAM, E820_RESERVED);
+			changed = true;
+		}
+	}
+
+	if (changed) {
+		sanitize_e820_map();
+		printk(KERN_INFO "aligned physical RAM map:\n");
+		e820_print_map("aligned");
+	}
+}
+
 void __init finish_e820_parsing(void)
 {
 	if (userdef) {
@@ -922,6 +963,9 @@ void __init finish_e820_parsing(void)
 		printk(KERN_INFO "user-defined physical RAM map:\n");
 		e820_print_map("user");
 	}
+
+	/* In case we have RAM entries that are not page aligned */
+	e820_align_ram_page();
 }
 
 static inline const char *e820_type_to_string(int e820_type)
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 31/35] x86: Use walk_system_ram_range() instead of e820_any_mapped() in agp path
  2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
                   ` (29 preceding siblings ...)
  2010-05-14  0:19 ` [PATCH 30/35] x86: Align e820 ram range to page Yinghai Lu
@ 2010-05-14  0:19 ` Yinghai Lu
  2010-05-14  0:19 ` [PATCH 32/35] x86: Add get_centaur_ram_top() Yinghai Lu
                   ` (3 subsequent siblings)
  34 siblings, 0 replies; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu

Move aperture_valid() back into the .c files.

The early path still uses e820_any_mapped(), so later we can make
e820_any_mapped() __init.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/include/asm/gart.h   |   22 ----------------------
 arch/x86/kernel/aperture_64.c |   22 ++++++++++++++++++++++
 drivers/char/agp/amd64-agp.c  |   39 ++++++++++++++++++++++++++++++++++++++-
 3 files changed, 60 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/gart.h b/arch/x86/include/asm/gart.h
index 4ac5b0f..2b63a91 100644
--- a/arch/x86/include/asm/gart.h
+++ b/arch/x86/include/asm/gart.h
@@ -74,26 +74,4 @@ static inline void enable_gart_translation(struct pci_dev *dev, u64 addr)
         pci_write_config_dword(dev, AMD64_GARTAPERTURECTL, ctl);
 }
 
-static inline int aperture_valid(u64 aper_base, u32 aper_size, u32 min_size)
-{
-	if (!aper_base)
-		return 0;
-
-	if (aper_base + aper_size > 0x100000000ULL) {
-		printk(KERN_INFO "Aperture beyond 4GB. Ignoring.\n");
-		return 0;
-	}
-	if (e820_any_mapped(aper_base, aper_base + aper_size, E820_RAM)) {
-		printk(KERN_INFO "Aperture pointing to e820 RAM. Ignoring.\n");
-		return 0;
-	}
-	if (aper_size < min_size) {
-		printk(KERN_INFO "Aperture too small (%d MB) than (%d MB)\n",
-				 aper_size>>20, min_size>>20);
-		return 0;
-	}
-
-	return 1;
-}
-
 #endif /* _ASM_X86_GART_H */
diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
index b5d8b0b..4755b5a 100644
--- a/arch/x86/kernel/aperture_64.c
+++ b/arch/x86/kernel/aperture_64.c
@@ -145,6 +145,28 @@ static u32 __init find_cap(int bus, int slot, int func, int cap)
 	return 0;
 }
 
+static int __init aperture_valid(u64 aper_base, u32 aper_size, u32 min_size)
+{
+	if (!aper_base)
+		return 0;
+
+	if (aper_base + aper_size > 0x100000000ULL) {
+		printk(KERN_INFO "Aperture beyond 4GB. Ignoring.\n");
+		return 0;
+	}
+	if (e820_any_mapped(aper_base, aper_base + aper_size, E820_RAM)) {
+		printk(KERN_INFO "Aperture pointing to e820 RAM. Ignoring.\n");
+		return 0;
+	}
+	if (aper_size < min_size) {
+		printk(KERN_INFO "Aperture too small (%d MB) than (%d MB)\n",
+				 aper_size>>20, min_size>>20);
+		return 0;
+	}
+
+	return 1;
+}
+
 /* Read a standard AGPv3 bridge header */
 static u32 __init read_agp(int bus, int slot, int func, int cap, u32 *order)
 {
diff --git a/drivers/char/agp/amd64-agp.c b/drivers/char/agp/amd64-agp.c
index fd50ead..85cabd0 100644
--- a/drivers/char/agp/amd64-agp.c
+++ b/drivers/char/agp/amd64-agp.c
@@ -14,7 +14,6 @@
 #include <linux/agp_backend.h>
 #include <linux/mmzone.h>
 #include <asm/page.h>		/* PAGE_SIZE */
-#include <asm/e820.h>
 #include <asm/k8.h>
 #include <asm/gart.h>
 #include "agp.h"
@@ -231,6 +230,44 @@ static const struct agp_bridge_driver amd_8151_driver = {
 	.agp_type_to_mask_type  = agp_generic_type_to_mask_type,
 };
 
+static int __devinit
+__is_ram(unsigned long pfn, unsigned long nr_pages, void *arg)
+{
+	return 1;
+}
+
+static int __devinit any_ram_in_range(u64 base, u64 size)
+{
+	unsigned long pfn, nr_pages;
+
+	pfn = base >> PAGE_SHIFT;
+	nr_pages = size >> PAGE_SHIFT;
+
+	return walk_system_ram_range(pfn, nr_pages, NULL, __is_ram) == 1;
+}
+
+static int __devinit aperture_valid(u64 aper_base, u32 aper_size, u32 min_size)
+{
+	if (!aper_base)
+		return 0;
+
+	if (aper_base + aper_size > 0x100000000ULL) {
+		printk(KERN_INFO "Aperture beyond 4GB. Ignoring.\n");
+		return 0;
+	}
+	if (any_ram_in_range(aper_base, aper_size)) {
+		printk(KERN_INFO "Aperture pointing to E820 RAM. Ignoring.\n");
+		return 0;
+	}
+	if (aper_size < min_size) {
+		printk(KERN_INFO "Aperture too small (%d MB) than (%d MB)\n",
+				 aper_size>>20, min_size>>20);
+		return 0;
+	}
+
+	return 1;
+}
+
 /* Some basic sanity checks for the aperture. */
 static int __devinit agp_aperture_valid(u64 aper, u32 size)
 {
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 32/35] x86: Add get_centaur_ram_top()
  2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
                   ` (30 preceding siblings ...)
  2010-05-14  0:19 ` [PATCH 31/35] x86: Use wake_system_ram_range() instead of e820_any_mapped() in agp path Yinghai Lu
@ 2010-05-14  0:19 ` Yinghai Lu
  2010-05-14  0:19 ` [PATCH 33/35] x86: Change e820_any_mapped() to __init Yinghai Lu
                   ` (2 subsequent siblings)
  34 siblings, 0 replies; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu

So we can avoid accessing e820.map[] directly.

Later we can move e820 to static and __initdata.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/include/asm/e820.h   |    9 ++++++
 arch/x86/kernel/cpu/centaur.c |   53 +-------------------------------------
 arch/x86/kernel/e820.c        |   56 +++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/setup.c       |    2 +
 4 files changed, 69 insertions(+), 51 deletions(-)

diff --git a/arch/x86/include/asm/e820.h b/arch/x86/include/asm/e820.h
index 334281f..cd7de51 100644
--- a/arch/x86/include/asm/e820.h
+++ b/arch/x86/include/asm/e820.h
@@ -76,6 +76,15 @@ struct e820map {
 /* see comment in arch/x86/kernel/e820.c */
 extern struct e820map e820;
 
+#if defined(CONFIG_X86_OOSTORE) && defined(CONFIG_CPU_SUP_CENTAUR)
+extern int centaur_ram_top;
+void get_centaur_ram_top(void);
+#else
+static inline void get_centaur_ram_top(void)
+{
+}
+#endif
+
 extern unsigned long pci_mem_start;
 extern int e820_any_mapped(u64 start, u64 end, unsigned type);
 extern int e820_all_mapped(u64 start, u64 end, unsigned type);
diff --git a/arch/x86/kernel/cpu/centaur.c b/arch/x86/kernel/cpu/centaur.c
index e58d978..bb49358 100644
--- a/arch/x86/kernel/cpu/centaur.c
+++ b/arch/x86/kernel/cpu/centaur.c
@@ -37,63 +37,14 @@ static void __cpuinit centaur_mcr_insert(int reg, u32 base, u32 size, int key)
 	mtrr_centaur_report_mcr(reg, lo, hi);	/* Tell the mtrr driver */
 }
 
-/*
- * Figure what we can cover with MCR's
- *
- * Shortcut: We know you can't put 4Gig of RAM on a winchip
- */
-static u32 __cpuinit ramtop(void)
-{
-	u32 clip = 0xFFFFFFFFUL;
-	u32 top = 0;
-	int i;
-
-	for (i = 0; i < e820.nr_map; i++) {
-		unsigned long start, end;
-
-		if (e820.map[i].addr > 0xFFFFFFFFUL)
-			continue;
-		/*
-		 * Don't MCR over reserved space. Ignore the ISA hole
-		 * we frob around that catastrophe already
-		 */
-		if (e820.map[i].type == E820_RESERVED) {
-			if (e820.map[i].addr >= 0x100000UL &&
-			    e820.map[i].addr < clip)
-				clip = e820.map[i].addr;
-			continue;
-		}
-		start = e820.map[i].addr;
-		end = e820.map[i].addr + e820.map[i].size;
-		if (start >= end)
-			continue;
-		if (end > top)
-			top = end;
-	}
-	/*
-	 * Everything below 'top' should be RAM except for the ISA hole.
-	 * Because of the limited MCR's we want to map NV/ACPI into our
-	 * MCR range for gunk in RAM
-	 *
-	 * Clip might cause us to MCR insufficient RAM but that is an
-	 * acceptable failure mode and should only bite obscure boxes with
-	 * a VESA hole at 15Mb
-	 *
-	 * The second case Clip sometimes kicks in is when the EBDA is marked
-	 * as reserved. Again we fail safe with reasonable results
-	 */
-	if (top > clip)
-		top = clip;
-
-	return top;
-}
+int __cpuinitdata centaur_ram_top;
 
 /*
  * Compute a set of MCR's to give maximum coverage
  */
 static int __cpuinit centaur_mcr_compute(int nr, int key)
 {
-	u32 mem = ramtop();
+	u32 mem = centaur_ram_top;
 	u32 root = power2(mem);
 	u32 base = root;
 	u32 top = root;
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 78ca7ce..f4b1285 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1131,6 +1131,62 @@ void __init setup_memory_map(void)
 	e820_print_map(who);
 }
 
+#if defined(CONFIG_X86_OOSTORE) && defined(CONFIG_CPU_SUP_CENTAUR)
+/*
+ * Figure what we can cover with MCR's
+ *
+ * Shortcut: We know you can't put 4Gig of RAM on a winchip
+ */
+void __init get_centaur_ram_top(void)
+{
+	u32 clip = 0xFFFFFFFFUL;
+	u32 top = 0;
+	int i;
+
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_CENTAUR)
+		return;
+
+	for (i = 0; i < e820.nr_map; i++) {
+		unsigned long start, end;
+
+		if (e820.map[i].addr > 0xFFFFFFFFUL)
+			continue;
+		/*
+		 * Don't MCR over reserved space. Ignore the ISA hole
+		 * we frob around that catastrophe already
+		 */
+		if (e820.map[i].type == E820_RESERVED) {
+			if (e820.map[i].addr >= 0x100000UL &&
+			    e820.map[i].addr < clip)
+				clip = e820.map[i].addr;
+			continue;
+		}
+		start = e820.map[i].addr;
+		end = e820.map[i].addr + e820.map[i].size;
+		if (start >= end)
+			continue;
+		if (end > top)
+			top = end;
+	}
+	/*
+	 * Everything below 'top' should be RAM except for the ISA hole.
+	 * Because of the limited MCR's we want to map NV/ACPI into our
+	 * MCR range for gunk in RAM
+	 *
+	 * Clip might cause us to MCR insufficient RAM but that is an
+	 * acceptable failure mode and should only bite obscure boxes with
+	 * a VESA hole at 15Mb
+	 *
+	 * The second case Clip sometimes kicks in is when the EBDA is marked
+	 * as reserved. Again we fail safe with reasonable results
+	 */
+	if (top > clip)
+		top = clip;
+
+	centaur_ram_top = top;
+}
+#endif
+
 void __init init_lmb_memory(void)
 {
 	lmb_init();
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 9083c9a..e15a626 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -893,6 +893,8 @@ void __init setup_arch(char **cmdline_p)
 	if (mtrr_trim_uncached_memory(max_pfn))
 		max_pfn = e820_end_of_ram_pfn();
 
+	get_centaur_ram_top();
+
 #ifdef CONFIG_X86_32
 	/* max_low_pfn get updated here */
 	find_low_pfn_range();
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 33/35] x86: Change e820_any_mapped() to __init
  2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
                   ` (31 preceding siblings ...)
  2010-05-14  0:19 ` [PATCH 32/35] x86: Add get_centaur_ram_top() Yinghai Lu
@ 2010-05-14  0:19 ` Yinghai Lu
  2010-05-14  0:19 ` [PATCH 34/35] x86: Use walk_system_ram_range() instead of referring to e820.map directly for tboot Yinghai Lu
  2010-05-14  0:19 ` [PATCH 35/35] x86: make e820 to be __initdata Yinghai Lu
  34 siblings, 0 replies; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu

We don't need to expose e820_any_mapped() anymore

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/kernel/e820.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index f4b1285..0e30c2e 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -47,9 +47,10 @@ EXPORT_SYMBOL(pci_mem_start);
 /*
  * This function checks if any part of the range <start,end> is mapped
  * with type.
+ * phys_pud_init() uses it and is __meminit, but we have !after_bootmem,
+ * so we can use __init_refok here
  */
-int
-e820_any_mapped(u64 start, u64 end, unsigned type)
+int __init_refok e820_any_mapped(u64 start, u64 end, unsigned type)
 {
 	int i;
 
@@ -64,7 +65,6 @@ e820_any_mapped(u64 start, u64 end, unsigned type)
 	}
 	return 0;
 }
-EXPORT_SYMBOL_GPL(e820_any_mapped);
 
 /*
  * This function checks if the entire range <start,end> is mapped with type.
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 34/35] x86: Use walk_system_ram_range() instead of referring to e820.map directly for tboot
  2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
                   ` (32 preceding siblings ...)
  2010-05-14  0:19 ` [PATCH 33/35] x86: Change e820_any_mapped() to __init Yinghai Lu
@ 2010-05-14  0:19 ` Yinghai Lu
  2010-05-14  0:19 ` [PATCH 35/35] x86: make e820 to be __initdata Yinghai Lu
  34 siblings, 0 replies; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu

So we can make e820 __initdata.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/kernel/tboot.c |   22 +++++++++-------------
 1 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c
index c2f1b26..20b9531 100644
--- a/arch/x86/kernel/tboot.c
+++ b/arch/x86/kernel/tboot.c
@@ -171,34 +171,30 @@ static void tboot_create_trampoline(void)
 
 #ifdef CONFIG_ACPI_SLEEP
 
-static void add_mac_region(phys_addr_t start, unsigned long size)
+static int
+add_mac_region(unsigned long start_pfn, unsigned long nr_pages, void *arg)
 {
+	u64 start = start_pfn;
+	u64 size = nr_pages;
 	struct tboot_mac_region *mr;
-	phys_addr_t end = start + size;
 
 	if (tboot->num_mac_regions >= MAX_TB_MAC_REGIONS)
 		panic("tboot: Too many MAC regions\n");
 
 	if (start && size) {
 		mr = &tboot->mac_regions[tboot->num_mac_regions++];
-		mr->start = round_down(start, PAGE_SIZE);
-		mr->size  = round_up(end, PAGE_SIZE) - mr->start;
+		mr->start = start << PAGE_SHIFT;
+		mr->size  = (u32) (size << PAGE_SHIFT);
 	}
+
+	return 0;
 }
 
 static int tboot_setup_sleep(void)
 {
-	int i;
-
 	tboot->num_mac_regions = 0;
 
-	for (i = 0; i < e820.nr_map; i++) {
-		if ((e820.map[i].type != E820_RAM)
-		 && (e820.map[i].type != E820_RESERVED_KERN))
-			continue;
-
-		add_mac_region(e820.map[i].addr, e820.map[i].size);
-	}
+	walk_system_ram_range(0, max_pfn, NULL, add_mac_region);
 
 	tboot->acpi_sinfo.kernel_s3_resume_vector = acpi_wakeup_address;
 
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 35/35] x86: make e820 to be __initdata
  2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
                   ` (33 preceding siblings ...)
  2010-05-14  0:19 ` [PATCH 34/35] x86: Use walk_system_ram_range() instead of referring to e820.map directly for tboot Yinghai Lu
@ 2010-05-14  0:19 ` Yinghai Lu
  34 siblings, 0 replies; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14  0:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: David Miller, Benjamin Herrenschmidt, Linus Torvalds,
	Johannes Weiner, linux-kernel, linux-arch, Yinghai Lu

Finally there are no users left after the init boot stage, so we can
free it to save some bytes.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/include/asm/e820.h |    2 --
 arch/x86/kernel/e820.c      |    2 +-
 2 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/e820.h b/arch/x86/include/asm/e820.h
index cd7de51..f2ab72e 100644
--- a/arch/x86/include/asm/e820.h
+++ b/arch/x86/include/asm/e820.h
@@ -73,8 +73,6 @@ struct e820map {
 #define BIOS_END		0x00100000
 
 #ifdef __KERNEL__
-/* see comment in arch/x86/kernel/e820.c */
-extern struct e820map e820;
 
 #if defined(CONFIG_X86_OOSTORE) && defined(CONFIG_CPU_SUP_CENTAUR)
 extern int centaur_ram_top;
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 0e30c2e..70255d5 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -35,7 +35,7 @@
  * user can e.g. boot the original kernel with mem=1G while still booting the
  * next kernel with full memory.
  */
-struct e820map e820;
+static struct e820map __initdata e820;
 static struct e820map __initdata e820_saved;
 
 /* For PCI or other memory-mapped resources */
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* Re: [PATCH 01/35] lmb: prepare x86 to use lmb to replace early_res
  2010-05-14  0:19 ` [PATCH 01/35] lmb: prepare x86 to use lmb to replace early_res Yinghai Lu
@ 2010-05-14  2:12   ` Benjamin Herrenschmidt
  2010-05-14  6:19     ` Yinghai
  2010-05-14  7:03     ` Yinghai
  0 siblings, 2 replies; 99+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-14  2:12 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On Thu, 2010-05-13 at 17:19 -0700, Yinghai Lu wrote:
> 1. expose lmb_debug
> 2. expose lmb_reserved_init_regions
> 3. expose lmb_add_region
> 4. prection for include linux/lmb.h in mm/page_alloc.c and mm/bootmem.c
> 5. lmb_find_base() should return LMB_ERROR in one failing path.
>    (this one cost me 3 hours !)
> 6. move LMB_ERROR to lmb.h

Oh well, let's start somewhere...

> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  include/linux/lmb.h |    4 ++++
>  lib/lmb.c           |   21 +++++++++------------
>  2 files changed, 13 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/lmb.h b/include/linux/lmb.h
> index 6f8c4bd..7987766 100644
> --- a/include/linux/lmb.h
> +++ b/include/linux/lmb.h
> @@ -19,6 +19,7 @@
>  #include <asm/lmb.h>
>  
>  #define INIT_LMB_REGIONS 128
> +#define LMB_ERROR	(~(phys_addr_t)0)

Ok so this was meant to remain internal. You seem to want to expose a
whole lot of LMB internals, I suppose for your new arch/x86/lmb.c and I
really really don't like it.

If we expose LMB_ERROR then all lmb calls that can fail should return
that. However, the API calls all return 0 instead. Changing that means
fixing all callers.

We can't just have a mixed bag of result codes in stuff that is exposed.

If all you need LMB_ERROR for is exposing lmb_find_area() and
lmb_add_region(), then make the above a __ variant and export a public
wrapper that returns 0.

But that's not the right approach. The right thing to do I believe is to
instead change LMB to use proper errno.h values.

For things like lmb_add_region(), return them as a negative int. For
things that return a phys_addr_t, do the same with a proper casting
macro, since I -think- we can safely consider that phys addrs in the
range -PAGE_SIZE..-1 can be error codes. Just like we do for PTR_ERR
etc...

This should be a separate patch btw.
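
Roughly something like this, say (an untested sketch, and the names are
made up):

	/*
	 * Sketch: mirror the ERR_PTR()/PTR_ERR() convention for
	 * phys_addr_t by encoding -errno in the top, otherwise
	 * invalid, addresses.
	 */
	#define LMB_MAX_ERRNO	4095

	static inline bool lmb_addr_is_err(phys_addr_t addr)
	{
		return addr >= (phys_addr_t)-LMB_MAX_ERRNO;
	}

	static inline long lmb_addr_err(phys_addr_t addr)
	{
		return (long)addr;
	}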

I'm also not too happy with exposing lmb_add_region(). Why would you
ever need to expose it ? Just call lmb_reserve() if you want to reserve
something. lmb_add_region() is an internal function and has no business
being used outside of the main lmb.c file.

Also:

>  	/* Calculate new doubled size */
>  	old_size = type->max * sizeof(struct lmb_region);
>  	new_size = old_size << 1;
> @@ -206,7 +199,7 @@ static int lmb_double_array(struct lmb_type *type)
>  		new_array = kmalloc(new_size, GFP_KERNEL);
>  		addr = new_array == NULL ? LMB_ERROR : __pa(new_array);
>  	} else
> -		addr = lmb_find_base(new_size, sizeof(phys_addr_t), 0, LMB_ALLOC_ACCESSIBLE);
> +		addr = lmb_find_base(new_size, sizeof(struct lmb_region), 0, LMB_ALLOC_ACCESSIBLE);

Why this change ? Does it need to be aligned to the struct size ? If
you really want that and have a good justification, make this a separate
patch and explain why you are doing that in the changeset comment.

>  	if (addr == LMB_ERROR) {
>  		pr_err("lmb: Failed to double %s array from %ld to %ld entries !\n",
>  		       lmb_type_name(type), type->max, type->max * 2);
> @@ -214,6 +207,10 @@ static int lmb_double_array(struct lmb_type *type)
>  	}
>  	new_array = __va(addr);
>  
> +	if (lmb_debug)
> +		pr_info("lmb: %s array is doubled to %ld at %llx - %llx",
> +			 lmb_type_name(type), type->max * 2, (u64)addr, (u64)addr + new_size);
> +
>  	/* Found space, we now need to move the array over before
>  	 * we add the reserved region since it may be our reserved
>  	 * array itself that is full.
> @@ -249,7 +246,7 @@ extern int __weak lmb_memory_can_coalesce(phys_addr_t addr1, phys_addr_t size1,
>  	return 1;
>  }

Cheers,
Ben.


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 03/35] lmb: Add ARCH_DISCARD_LMB to put lmb code to .init
  2010-05-14  0:19 ` [PATCH 03/35] lmb: Add ARCH_DISCARD_LMB to put lmb code to .init Yinghai Lu
@ 2010-05-14  2:14   ` Benjamin Herrenschmidt
  2010-05-14  6:21     ` Yinghai
  0 siblings, 1 reply; 99+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-14  2:14 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On Thu, 2010-05-13 at 17:19 -0700, Yinghai Lu wrote:
> So those lmb bits could be released after kernel is booted up.
> 
> Arch code could define ARCH_DISCARD_LMB in asm/lmb.h,
> __init_lmb will become __init, __initdata_lmb will becom __initdata
> 
> x86 code will use that.

So you do not intend to use lmb after boot ? This will break the debugfs
files unless you also remove those.

Cheers,
Ben.

> -v2: use ARCH_DISCARD_LMB according to Michael Ellerman
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  include/linux/lmb.h |    8 ++++++++
>  lib/lmb.c           |   46 +++++++++++++++++++++++-----------------------
>  2 files changed, 31 insertions(+), 23 deletions(-)
> 
> diff --git a/include/linux/lmb.h b/include/linux/lmb.h
> index 2b87641..0b073a3 100644
> --- a/include/linux/lmb.h
> +++ b/include/linux/lmb.h
> @@ -145,6 +145,14 @@ static inline unsigned long lmb_region_pages(const struct lmb_region *reg)
>  	     region++)
>  
> 
> +#ifdef ARCH_DISCARD_LMB
> +#define __init_lmb __init
> +#define __initdata_lmb __initdata
> +#else
> +#define __init_lmb
> +#define __initdata_lmb
> +#endif
> +
>  #endif /* CONFIG_HAVE_LMB */
>  
>  #endif /* __KERNEL__ */
> diff --git a/lib/lmb.c b/lib/lmb.c
> index fddd72c..6d49a17 100644
> --- a/lib/lmb.c
> +++ b/lib/lmb.c
> @@ -20,11 +20,11 @@
>  #include <linux/seq_file.h>
>  #include <linux/lmb.h>
>  
> -struct lmb lmb;
> +struct lmb lmb __initdata_lmb;
>  
> -int lmb_debug;
> -static struct lmb_region lmb_memory_init_regions[INIT_LMB_REGIONS + 1];
> -struct lmb_region lmb_reserved_init_regions[INIT_LMB_REGIONS + 1];
> +int lmb_debug __initdata_lmb;
> +static struct lmb_region lmb_memory_init_regions[INIT_LMB_REGIONS + 1] __initdata_lmb;
> +struct lmb_region lmb_reserved_init_regions[INIT_LMB_REGIONS + 1] __initdata_lmb;
>  
>  /* inline so we don't get a warning when pr_debug is compiled out */
>  static inline const char *lmb_type_name(struct lmb_type *type)
> @@ -41,23 +41,23 @@ static inline const char *lmb_type_name(struct lmb_type *type)
>   * Address comparison utilities
>   */
>  
> -static phys_addr_t lmb_align_down(phys_addr_t addr, phys_addr_t size)
> +static phys_addr_t __init_lmb lmb_align_down(phys_addr_t addr, phys_addr_t size)
>  {
>  	return addr & ~(size - 1);
>  }
>  
> -static phys_addr_t lmb_align_up(phys_addr_t addr, phys_addr_t size)
> +static phys_addr_t __init_lmb lmb_align_up(phys_addr_t addr, phys_addr_t size)
>  {
>  	return (addr + (size - 1)) & ~(size - 1);
>  }
>  
> -static unsigned long lmb_addrs_overlap(phys_addr_t base1, phys_addr_t size1,
> +static unsigned long __init_lmb lmb_addrs_overlap(phys_addr_t base1, phys_addr_t size1,
>  				       phys_addr_t base2, phys_addr_t size2)
>  {
>  	return ((base1 < (base2 + size2)) && (base2 < (base1 + size1)));
>  }
>  
> -static long lmb_addrs_adjacent(phys_addr_t base1, phys_addr_t size1,
> +static long __init_lmb lmb_addrs_adjacent(phys_addr_t base1, phys_addr_t size1,
>  			       phys_addr_t base2, phys_addr_t size2)
>  {
>  	if (base2 == base1 + size1)
> @@ -68,7 +68,7 @@ static long lmb_addrs_adjacent(phys_addr_t base1, phys_addr_t size1,
>  	return 0;
>  }
>  
> -static long lmb_regions_adjacent(struct lmb_type *type,
> +static long __init_lmb lmb_regions_adjacent(struct lmb_type *type,
>  				 unsigned long r1, unsigned long r2)
>  {
>  	phys_addr_t base1 = type->regions[r1].base;
> @@ -79,7 +79,7 @@ static long lmb_regions_adjacent(struct lmb_type *type,
>  	return lmb_addrs_adjacent(base1, size1, base2, size2);
>  }
>  
> -long lmb_overlaps_region(struct lmb_type *type, phys_addr_t base, phys_addr_t size)
> +long __init_lmb lmb_overlaps_region(struct lmb_type *type, phys_addr_t base, phys_addr_t size)
>  {
>  	unsigned long i;
>  
> @@ -155,7 +155,7 @@ static phys_addr_t __init lmb_find_base(phys_addr_t size, phys_addr_t align,
>  	return LMB_ERROR;
>  }
>  
> -static void lmb_remove_region(struct lmb_type *type, unsigned long r)
> +static void __init_lmb lmb_remove_region(struct lmb_type *type, unsigned long r)
>  {
>  	unsigned long i;
>  
> @@ -167,14 +167,14 @@ static void lmb_remove_region(struct lmb_type *type, unsigned long r)
>  }
>  
>  /* Assumption: base addr of region 1 < base addr of region 2 */
> -static void lmb_coalesce_regions(struct lmb_type *type,
> +static void __init_lmb lmb_coalesce_regions(struct lmb_type *type,
>  		unsigned long r1, unsigned long r2)
>  {
>  	type->regions[r1].size += type->regions[r2].size;
>  	lmb_remove_region(type, r2);
>  }
>  
> -static int lmb_double_array(struct lmb_type *type)
> +static int __init_lmb lmb_double_array(struct lmb_type *type)
>  {
>  	struct lmb_region *new_array, *old_array;
>  	phys_addr_t old_size, new_size, addr;
> @@ -240,13 +240,13 @@ static int lmb_double_array(struct lmb_type *type)
>  	return 0;
>  }
>  
> -extern int __weak lmb_memory_can_coalesce(phys_addr_t addr1, phys_addr_t size1,
> +extern int __init_lmb __weak lmb_memory_can_coalesce(phys_addr_t addr1, phys_addr_t size1,
>  					  phys_addr_t addr2, phys_addr_t size2)
>  {
>  	return 1;
>  }
>  
> -long lmb_add_region(struct lmb_type *type, phys_addr_t base, phys_addr_t size)
> +long __init_lmb lmb_add_region(struct lmb_type *type, phys_addr_t base, phys_addr_t size)
>  {
>  	unsigned long coalesced = 0;
>  	long adjacent, i;
> @@ -333,13 +333,13 @@ long lmb_add_region(struct lmb_type *type, phys_addr_t base, phys_addr_t size)
>  	return 0;
>  }
>  
> -long lmb_add(phys_addr_t base, phys_addr_t size)
> +long __init_lmb lmb_add(phys_addr_t base, phys_addr_t size)
>  {
>  	return lmb_add_region(&lmb.memory, base, size);
>  
>  }
>  
> -static long __lmb_remove(struct lmb_type *type, phys_addr_t base, phys_addr_t size)
> +static long __init_lmb __lmb_remove(struct lmb_type *type, phys_addr_t base, phys_addr_t size)
>  {
>  	phys_addr_t rgnbegin, rgnend;
>  	phys_addr_t end = base + size;
> @@ -387,7 +387,7 @@ static long __lmb_remove(struct lmb_type *type, phys_addr_t base, phys_addr_t si
>  	return lmb_add_region(type, end, rgnend - end);
>  }
>  
> -long lmb_remove(phys_addr_t base, phys_addr_t size)
> +long __init_lmb lmb_remove(phys_addr_t base, phys_addr_t size)
>  {
>  	return __lmb_remove(&lmb.memory, base, size);
>  }
> @@ -544,7 +544,7 @@ phys_addr_t __init lmb_phys_mem_size(void)
>  	return lmb.memory_size;
>  }
>  
> -phys_addr_t lmb_end_of_DRAM(void)
> +phys_addr_t __init_lmb lmb_end_of_DRAM(void)
>  {
>  	int idx = lmb.memory.cnt - 1;
>  
> @@ -605,7 +605,7 @@ int __init lmb_is_reserved(phys_addr_t addr)
>  	return 0;
>  }
>  
> -int lmb_is_region_reserved(phys_addr_t base, phys_addr_t size)
> +int __init_lmb lmb_is_region_reserved(phys_addr_t base, phys_addr_t size)
>  {
>  	return lmb_overlaps_region(&lmb.reserved, base, size);
>  }
> @@ -616,7 +616,7 @@ void __init lmb_set_current_limit(phys_addr_t limit)
>  	lmb.current_limit = limit;
>  }
>  
> -static void lmb_dump(struct lmb_type *region, char *name)
> +static void __init_lmb lmb_dump(struct lmb_type *region, char *name)
>  {
>  	unsigned long long base, size;
>  	int i;
> @@ -632,7 +632,7 @@ static void lmb_dump(struct lmb_type *region, char *name)
>  	}
>  }
>  
> -void lmb_dump_all(void)
> +void __init_lmb lmb_dump_all(void)
>  {
>  	if (!lmb_debug)
>  		return;
> @@ -695,7 +695,7 @@ static int __init early_lmb(char *p)
>  }
>  early_param("lmb", early_lmb);
>  
> -#ifdef CONFIG_DEBUG_FS
> +#if defined(CONFIG_DEBUG_FS) && !defined(ARCH_DISCARD_LMB)
>  
>  static int lmb_debug_show(struct seq_file *m, void *private)
>  {



^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 04/35] lmb: Add lmb_find_area()
  2010-05-14  0:19 ` [PATCH 04/35] lmb: Add lmb_find_area() Yinghai Lu
@ 2010-05-14  2:16   ` Benjamin Herrenschmidt
  2010-05-14  6:25     ` Yinghai
  0 siblings, 1 reply; 99+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-14  2:16 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On Thu, 2010-05-13 at 17:19 -0700, Yinghai Lu wrote:
> it is a wrapper for lmb_find_base
> 
> make it more easy for x86 to use lmb. ( rebase )
> x86 early_res is using find/reserve pattern instead of alloc.
> 
> -v2: Change name to lmb_find_area() according to Michael Ellerman
> -v3: Add generic weak version __lmb_find_area()
>      so keep the path for fallback to x86 version that handle from low
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  include/linux/lmb.h |    4 ++++
>  lib/lmb.c           |   27 ++++++++++++++++++++++++++-
>  2 files changed, 30 insertions(+), 1 deletions(-)
> 
> diff --git a/include/linux/lmb.h b/include/linux/lmb.h
> index 0b073a3..3c23dc8 100644
> --- a/include/linux/lmb.h
> +++ b/include/linux/lmb.h
> @@ -44,6 +44,10 @@ extern struct lmb lmb;
>  extern int lmb_debug;
>  extern struct lmb_region lmb_reserved_init_regions[];
>  
> +u64 __lmb_find_area(u64 ei_start, u64 ei_last, u64 start, u64 end,
> +			u64 size, u64 align);
> +u64 lmb_find_area(u64 start, u64 end, u64 size, u64 align);

See my comments about sorting out the return from that function.

Also, I don't understand the need for that __ version. It looks like
something you should keep inside x86; I don't see the need for it in the
generic LMB code, since it just does trivial cropping of the arguments.

Also "ei_last" and "ei_start" are pretty bad names for its arguments
anyways. To some extent I wonder if the caller should be responsible for
doing the cropping in the first place.
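
Something along these lines at the call site, i.e. (sketch only, reusing
your current argument names):

	u64 fstart = max(ei_start, start);
	u64 fend = min(ei_last, end);
	u64 addr = LMB_ERROR;

	if (fstart < fend)
		addr = lmb_find_area(fstart, fend, size, align);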

Cheers,
Ben.

>  extern void __init lmb_init(void);
>  extern void __init lmb_analyze(void);
>  extern long lmb_add(phys_addr_t base, phys_addr_t size);
> diff --git a/lib/lmb.c b/lib/lmb.c
> index 6d49a17..f917dbf 100644
> --- a/lib/lmb.c
> +++ b/lib/lmb.c
> @@ -155,6 +155,31 @@ static phys_addr_t __init lmb_find_base(phys_addr_t size, phys_addr_t align,
>  	return LMB_ERROR;
>  }
>  
> +u64 __init __weak __lmb_find_area(u64 ei_start, u64 ei_last, u64 start, u64 end,
> +				 u64 size, u64 align)
> +{
> +	u64 final_start, final_end;
> +	u64 mem;
> +
> +	final_start = max(ei_start, start);
> +	final_end = min(ei_last, end);
> +
> +	if (final_start >= final_end)
> +		return LMB_ERROR;
> +
> +	mem = lmb_find_base(size, align, final_start, final_end);
> +
> +	return mem;
> +}
> +
> +/*
> + * Find a free area with specified alignment in a specific range.
> + */
> +u64 __init __weak lmb_find_area(u64 start, u64 end, u64 size, u64 align)
> +{
> +	return lmb_find_base(size, align, start, end);
> +}
> +
>  static void __init_lmb lmb_remove_region(struct lmb_type *type, unsigned long r)
>  {
>  	unsigned long i;
> @@ -199,7 +224,7 @@ static int __init_lmb lmb_double_array(struct lmb_type *type)
>  		new_array = kmalloc(new_size, GFP_KERNEL);
>  		addr = new_array == NULL ? LMB_ERROR : __pa(new_array);
>  	} else
> -		addr = lmb_find_base(new_size, sizeof(struct lmb_region), 0, LMB_ALLOC_ACCESSIBLE);
> +		addr = lmb_find_area(0, lmb.current_limit, new_size, sizeof(struct lmb_region));
>  	if (addr == LMB_ERROR) {
>  		pr_err("lmb: Failed to double %s array from %ld to %ld entries !\n",
>  		       lmb_type_name(type), type->max, type->max * 2);



^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 05/35] x86, lmb: Add lmb_find_area_size()
  2010-05-14  0:19 ` [PATCH 05/35] x86, lmb: Add lmb_find_area_size() Yinghai Lu
@ 2010-05-14  2:20   ` Benjamin Herrenschmidt
  2010-05-14  6:28     ` Yinghai
  0 siblings, 1 reply; 99+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-14  2:20 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On Thu, 2010-05-13 at 17:19 -0700, Yinghai Lu wrote:
> size is returned according free range.
> Will be used to find free ranges for early_memtest and memory corruption check

Please provide a better explanation of what these functions do. It's
very unclear from the code (which looks like it could be a lot simpler),
and the name of the function is totally obscure as well.

How many times have we asked you to improve on your changeset comments,
at the -very-least- ? Explain what functions do and why they do it, and
when I say explain, I don't mean 2 lines of rot13. I mean actual
sentences that a human being can read and have a chance to understand.

Also, I would appreciate it if you picked up the habit of adding docbook
doco for any API function you add, even if it's in the x86 "internal"
file.
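
For instance, something in this direction (just a sketch of the shape,
the wording is up to you):

	/**
	 * lmb_find_area_size - find the next free range above @start
	 * @start: lowest physical address to consider
	 * @sizep: out parameter, size of the free range that was found
	 * @align: required alignment of the returned base address
	 *
	 * Returns the base of the next free range, or LMB_ERROR if
	 * there is none.
	 */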

Cheers,
Ben.

> Do not mess it up with mm/lmb.c yet.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  arch/x86/include/asm/lmb.h |    8 ++++
>  arch/x86/mm/Makefile       |    2 +
>  arch/x86/mm/lmb.c          |   88 ++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 98 insertions(+), 0 deletions(-)
>  create mode 100644 arch/x86/include/asm/lmb.h
>  create mode 100644 arch/x86/mm/lmb.c
> 
> diff --git a/arch/x86/include/asm/lmb.h b/arch/x86/include/asm/lmb.h
> new file mode 100644
> index 0000000..aa3a66e
> --- /dev/null
> +++ b/arch/x86/include/asm/lmb.h
> @@ -0,0 +1,8 @@
> +#ifndef _X86_LMB_H
> +#define _X86_LMB_H
> +
> +#define ARCH_DISCARD_LMB
> +
> +u64 lmb_find_area_size(u64 start, u64 *sizep, u64 align);
> +
> +#endif
> diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
> index a4c7683..8ab0505 100644
> --- a/arch/x86/mm/Makefile
> +++ b/arch/x86/mm/Makefile
> @@ -26,4 +26,6 @@ obj-$(CONFIG_NUMA)		+= numa.o numa_$(BITS).o
>  obj-$(CONFIG_K8_NUMA)		+= k8topology_64.o
>  obj-$(CONFIG_ACPI_NUMA)		+= srat_$(BITS).o
>  
> +obj-$(CONFIG_HAVE_LMB)		+= lmb.o
> +
>  obj-$(CONFIG_MEMTEST)		+= memtest.o
> diff --git a/arch/x86/mm/lmb.c b/arch/x86/mm/lmb.c
> new file mode 100644
> index 0000000..9d26eed
> --- /dev/null
> +++ b/arch/x86/mm/lmb.c
> @@ -0,0 +1,88 @@
> +#include <linux/kernel.h>
> +#include <linux/types.h>
> +#include <linux/init.h>
> +#include <linux/bitops.h>
> +#include <linux/lmb.h>
> +#include <linux/bootmem.h>
> +#include <linux/mm.h>
> +#include <linux/range.h>
> +
> +/* Check for already reserved areas */
> +static inline bool __init bad_addr_size(u64 *addrp, u64 *sizep, u64 align)
> +{
> +	int i;
> +	u64 addr = *addrp, last;
> +	u64 size = *sizep;
> +	bool changed = false;
> +again:
> +	last = addr + size;
> +	for (i = 0; i < lmb.reserved.cnt && lmb.reserved.regions[i].size; i++) {
> +		struct lmb_region *r = &lmb.reserved.regions[i];
> +		if (last > r->base && addr < r->base) {
> +			size = r->base - addr;
> +			changed = true;
> +			goto again;
> +		}
> +		if (last > (r->base + r->size) && addr < (r->base + r->size)) {
> +			addr = round_up(r->base + r->size, align);
> +			size = last - addr;
> +			changed = true;
> +			goto again;
> +		}
> +		if (last <= (r->base + r->size) && addr >= r->base) {
> +			(*sizep)++;
> +			return false;
> +		}
> +	}
> +	if (changed) {
> +		*addrp = addr;
> +		*sizep = size;
> +	}
> +	return changed;
> +}
> +
> +static u64 __init __lmb_find_area_size(u64 ei_start, u64 ei_last, u64 start,
> +			 u64 *sizep, u64 align)
> +{
> +	u64 addr, last;
> +
> +	addr = round_up(ei_start, align);
> +	if (addr < start)
> +		addr = round_up(start, align);
> +	if (addr >= ei_last)
> +		goto out;
> +	*sizep = ei_last - addr;
> +	while (bad_addr_size(&addr, sizep, align) && addr + *sizep <= ei_last)
> +		;
> +	last = addr + *sizep;
> +	if (last > ei_last)
> +		goto out;
> +
> +	return addr;
> +
> +out:
> +	return LMB_ERROR;
> +}
> +
> +/*
> + * Find next free range after *start
> + */
> +u64 __init lmb_find_area_size(u64 start, u64 *sizep, u64 align)
> +{
> +	int i;
> +
> +	for (i = 0; i < lmb.memory.cnt; i++) {
> +		u64 ei_start = lmb.memory.regions[i].base;
> +		u64 ei_last = ei_start + lmb.memory.regions[i].size;
> +		u64 addr;
> +
> +		addr = __lmb_find_area_size(ei_start, ei_last, start,
> +					 sizep, align);
> +
> +		if (addr != LMB_ERROR)
> +			return addr;
> +	}
> +
> +	return LMB_ERROR;
> +}
> +



^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 08/35] x86,lmb: Add lmb_reserve_area/lmb_free_area
  2010-05-14  0:19 ` [PATCH 08/35] x86,lmb: Add lmb_reserve_area/lmb_free_area Yinghai Lu
@ 2010-05-14  2:26   ` Benjamin Herrenschmidt
  2010-05-14  6:30     ` Yinghai
  0 siblings, 1 reply; 99+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-14  2:26 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On Thu, 2010-05-13 at 17:19 -0700, Yinghai Lu wrote:

>  #endif
> diff --git a/arch/x86/mm/lmb.c b/arch/x86/mm/lmb.c
> index 37a05e2..0dbe05b 100644
> --- a/arch/x86/mm/lmb.c
> +++ b/arch/x86/mm/lmb.c
> @@ -117,3 +117,30 @@ void __init lmb_to_bootmem(u64 start, u64 end)
>  	lmb.reserved.cnt = 0;
>  }
>  #endif
> +
> +void __init lmb_add_memory(u64 start, u64 end)
> +{
> +	lmb_add_region(&lmb.memory, start, end - start);
> +}

I completely fail to see the point of doing such a minor argument
conversion as an exported function, with a name that is certain to cause
confusion with the existing lmb_add().

In any case, the above should be done at the call sites. Just call
lmb_add(start, end-start). You also aren't consistent, since you do a
similar conversion using the _area suffix below, but not above.

> +void __init lmb_reserve_area(u64 start, u64 end, char *name)
> +{
> +	if (start == end)
> +		return;
> +
> +	if (WARN_ONCE(start > end, "lmb_reserve_area: wrong range [%#llx, %#llx]\n", start, end))
> +		return;
> +
> +	lmb_add_region(&lmb.reserved, start, end - start);
> +}

You seem to be fond of gratuitous bloat... 

> +void __init lmb_free_area(u64 start, u64 end)
> +{
> +	if (start == end)
> +		return;
> +
> +	if (WARN_ONCE(start > end, "lmb_free_area: wrong range [%#llx, %#llx]\n", start, end))
> +		return;
> +
> +	lmb_free(start, end - start);
> +}

And here again.

If you -really- think there's value in the prototype conversions, then
make those _area() variants static inlines, group them together in
the .h with a clear explanation saying something like "it's more
practical for some archs to use start/end rather than start/size".

I personally don't see why you are doing that tho.
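
Something like this, say (untested sketch):

	/*
	 * It's more practical for some archs to pass start/end rather
	 * than start/size, hence these trivial wrappers.
	 */
	static inline void lmb_reserve_area(u64 start, u64 end)
	{
		if (start < end)
			lmb_reserve(start, end - start);
	}

	static inline void lmb_free_area(u64 start, u64 end)
	{
		if (start < end)
			lmb_free(start, end - start);
	}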

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 11/35] lmb: Add find_memory_core_early()
  2010-05-14  0:19 ` [PATCH 11/35] lmb: Add find_memory_core_early() Yinghai Lu
@ 2010-05-14  2:29   ` Benjamin Herrenschmidt
  2010-05-14  6:34     ` Yinghai
  2010-05-14  2:30   ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 99+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-14  2:29 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On Thu, 2010-05-13 at 17:19 -0700, Yinghai Lu wrote:
> Walk the node ranges in early_node_map[] and use __lmb_find_area()
> to find a free range.
> 
> Will be used by lmb_find_area_node().
> 
> lmb_find_area_node() will be used to find the right buffer for NODE_DATA.

The prototype for this has no business being in lmb.h, under that name
at least.

Cheers,
Ben.

> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  include/linux/mm.h |    2 ++
>  mm/page_alloc.c    |   29 +++++++++++++++++++++++++++++
>  2 files changed, 31 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index fb19bb9..7774e1d 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1159,6 +1159,8 @@ extern void free_bootmem_with_active_regions(int nid,
>  						unsigned long max_low_pfn);
>  int add_from_early_node_map(struct range *range, int az,
>  				   int nr_range, int nid);
> +u64 __init find_memory_core_early(int nid, u64 size, u64 align,
> +					u64 goal, u64 limit);
>  void *__alloc_memory_core_early(int nodeid, u64 size, u64 align,
>  				 u64 goal, u64 limit);
>  typedef int (*work_fn_t)(unsigned long, unsigned long, void *);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d03c946..72afd94 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -21,6 +21,7 @@
>  #include <linux/pagemap.h>
>  #include <linux/jiffies.h>
>  #include <linux/bootmem.h>
> +#include <linux/lmb.h>
>  #include <linux/compiler.h>
>  #include <linux/kernel.h>
>  #include <linux/kmemcheck.h>
> @@ -3393,6 +3394,34 @@ void __init free_bootmem_with_active_regions(int nid,
>  	}
>  }
>  
> +#ifdef CONFIG_HAVE_LMB
> +u64 __init find_memory_core_early(int nid, u64 size, u64 align,
> +					u64 goal, u64 limit)
> +{
> +	int i;
> +
> +	/* Need to go over early_node_map to find out good range for node */
> +	for_each_active_range_index_in_nid(i, nid) {
> +		u64 addr;
> +		u64 ei_start, ei_last;
> +
> +		ei_last = early_node_map[i].end_pfn;
> +		ei_last <<= PAGE_SHIFT;
> +		ei_start = early_node_map[i].start_pfn;
> +		ei_start <<= PAGE_SHIFT;
> +		addr = __lmb_find_area(ei_start, ei_last,
> +					 goal, limit, size, align);
> +
> +		if (addr == LMB_ERROR)
> +			continue;
> +
> +		return addr;
> +	}
> +
> +	return -1ULL;
> +}
> +#endif
> +
>  int __init add_from_early_node_map(struct range *range, int az,
>  				   int nr_range, int nid)
>  {



^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 11/35] lmb: Add find_memory_core_early()
  2010-05-14  0:19 ` [PATCH 11/35] lmb: Add find_memory_core_early() Yinghai Lu
  2010-05-14  2:29   ` Benjamin Herrenschmidt
@ 2010-05-14  2:30   ` Benjamin Herrenschmidt
  2010-05-14  6:39     ` Yinghai
  1 sibling, 1 reply; 99+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-14  2:30 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On Thu, 2010-05-13 at 17:19 -0700, Yinghai Lu wrote:
> Walk the node ranges in early_node_map[] and use __lmb_find_area()
> to find a free range.
> 
> Will be used by lmb_find_area_node().
> 
> lmb_find_area_node() will be used to find the right buffer for NODE_DATA.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---

Oh, and this won't work on sparc. You should probably instead add an
lmb_find_in_nid() to lmb which shares code with lmb_alloc_nid().

However, why do you want to do this find + separate reserve again? Why
not lmb_alloc?

Cheers,
Ben.

>  include/linux/mm.h |    2 ++
>  mm/page_alloc.c    |   29 +++++++++++++++++++++++++++++
>  2 files changed, 31 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index fb19bb9..7774e1d 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1159,6 +1159,8 @@ extern void free_bootmem_with_active_regions(int nid,
>  						unsigned long max_low_pfn);
>  int add_from_early_node_map(struct range *range, int az,
>  				   int nr_range, int nid);
> +u64 __init find_memory_core_early(int nid, u64 size, u64 align,
> +					u64 goal, u64 limit);
>  void *__alloc_memory_core_early(int nodeid, u64 size, u64 align,
>  				 u64 goal, u64 limit);
>  typedef int (*work_fn_t)(unsigned long, unsigned long, void *);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d03c946..72afd94 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -21,6 +21,7 @@
>  #include <linux/pagemap.h>
>  #include <linux/jiffies.h>
>  #include <linux/bootmem.h>
> +#include <linux/lmb.h>
>  #include <linux/compiler.h>
>  #include <linux/kernel.h>
>  #include <linux/kmemcheck.h>
> @@ -3393,6 +3394,34 @@ void __init free_bootmem_with_active_regions(int nid,
>  	}
>  }
>  
> +#ifdef CONFIG_HAVE_LMB
> +u64 __init find_memory_core_early(int nid, u64 size, u64 align,
> +					u64 goal, u64 limit)
> +{
> +	int i;
> +
> +	/* Need to go over early_node_map to find out good range for node */
> +	for_each_active_range_index_in_nid(i, nid) {
> +		u64 addr;
> +		u64 ei_start, ei_last;
> +
> +		ei_last = early_node_map[i].end_pfn;
> +		ei_last <<= PAGE_SHIFT;
> +		ei_start = early_node_map[i].start_pfn;
> +		ei_start <<= PAGE_SHIFT;
> +		addr = __lmb_find_area(ei_start, ei_last,
> +					 goal, limit, size, align);
> +
> +		if (addr == LMB_ERROR)
> +			continue;
> +
> +		return addr;
> +	}
> +
> +	return -1ULL;
> +}
> +#endif
> +
>  int __init add_from_early_node_map(struct range *range, int az,
>  				   int nr_range, int nid)
>  {



^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 13/35] x86, lmb: Add lmb_free_memory_size()
  2010-05-14  0:19 ` [PATCH 13/35] x86, lmb: Add lmb_free_memory_size() Yinghai Lu
@ 2010-05-14  2:31   ` Benjamin Herrenschmidt
  2010-05-14  6:42     ` Yinghai
  0 siblings, 1 reply; 99+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-14  2:31 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On Thu, 2010-05-13 at 17:19 -0700, Yinghai Lu wrote:
> It will return the free memory size in the specified range.
> 
> We cannot use memory_size - reserved_size here, because some reserved
> areas may not be in the scope of lmb.memory.region.
> 
> Subtract lmb.reserved.region from lmb.memory.region to get the free
> range array, then count the size of all free ranges.

I remember having already told you that the naming sucks.

Also, you fail to explain what this is actually needed for.

Cheers,
Ben.

> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  arch/x86/include/asm/lmb.h |    1 +
>  arch/x86/mm/lmb.c          |   51 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 52 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/include/asm/lmb.h b/arch/x86/include/asm/lmb.h
> index 358d8a6..4fb94b5 100644
> --- a/arch/x86/include/asm/lmb.h
> +++ b/arch/x86/include/asm/lmb.h
> @@ -16,5 +16,6 @@ void lmb_register_active_regions(int nid, unsigned long start_pfn,
>  					 unsigned long last_pfn);
>  u64 lmb_hole_size(u64 start, u64 end);
>  u64 lmb_find_area_node(int nid, u64 start, u64 end, u64 size, u64 align);
> +u64 lmb_free_memory_size(u64 addr, u64 limit);
>  
>  #endif
> diff --git a/arch/x86/mm/lmb.c b/arch/x86/mm/lmb.c
> index c5fa1dd..6c69e99 100644
> --- a/arch/x86/mm/lmb.c
> +++ b/arch/x86/mm/lmb.c
> @@ -226,6 +226,57 @@ void __init lmb_to_bootmem(u64 start, u64 end)
>  }
>  #endif
>  
> +u64 __init lmb_free_memory_size(u64 addr, u64 limit)
> +{
> +	int i, count;
> +	struct range *range;
> +	int nr_range;
> +	u64 final_start, final_end;
> +	u64 free_size;
> +
> +	count = (lmb.reserved.cnt + lmb.memory.cnt) * 2;
> +
> +	range = find_range_array(count);
> +	nr_range = 0;
> +
> +	addr = PFN_UP(addr);
> +	limit = PFN_DOWN(limit);
> +
> +	for (i = 0; i < lmb.memory.cnt; i++) {
> +		struct lmb_region *r = &lmb.memory.regions[i];
> +
> +		final_start = PFN_UP(r->base);
> +		final_end = PFN_DOWN(r->base + r->size);
> +		if (final_start >= final_end)
> +			continue;
> +		if (final_start >= limit || final_end <= addr)
> +			continue;
> +
> +		nr_range = add_range(range, count, nr_range, final_start, final_end);
> +	}
> +	subtract_range(range, count, 0, addr);
> +	subtract_range(range, count, limit, -1ULL);
> +	for (i = 0; i < lmb.reserved.cnt; i++) {
> +		struct lmb_region *r = &lmb.reserved.regions[i];
> +
> +		final_start = PFN_DOWN(r->base);
> +		final_end = PFN_UP(r->base + r->size);
> +		if (final_start >= final_end)
> +			continue;
> +		if (final_start >= limit || final_end <= addr)
> +			continue;
> +
> +		subtract_range(range, count, final_start, final_end);
> +	}
> +	nr_range = clean_sort_range(range, count);
> +
> +	free_size = 0;
> +	for (i = 0; i < nr_range; i++)
> +		free_size += range[i].end - range[i].start;
> +
> +	return free_size << PAGE_SHIFT;
> +}
> +
>  void __init lmb_add_memory(u64 start, u64 end)
>  {
>  	lmb_add_region(&lmb.memory, start, end - start);



^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 14/35] x86, lmb: Add lmb_memory_size()
  2010-05-14  0:19 ` [PATCH 14/35] x86, lmb: Add lmb_memory_size() Yinghai Lu
@ 2010-05-14  2:31   ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 99+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-14  2:31 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On Thu, 2010-05-13 at 17:19 -0700, Yinghai Lu wrote:
> It will return the memory size in the specified range, according to
> lmb.memory.region.
> 
> Try to share some code with lmb_free_memory_size() by passing get_free to
> __lmb_memory_size().

Same comments as previous patch.

> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  arch/x86/include/asm/lmb.h |    1 +
>  arch/x86/mm/lmb.c          |   18 +++++++++++++++++-
>  2 files changed, 18 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/include/asm/lmb.h b/arch/x86/include/asm/lmb.h
> index 4fb94b5..dd42ac1 100644
> --- a/arch/x86/include/asm/lmb.h
> +++ b/arch/x86/include/asm/lmb.h
> @@ -17,5 +17,6 @@ void lmb_register_active_regions(int nid, unsigned long start_pfn,
>  u64 lmb_hole_size(u64 start, u64 end);
>  u64 lmb_find_area_node(int nid, u64 start, u64 end, u64 size, u64 align);
>  u64 lmb_free_memory_size(u64 addr, u64 limit);
> +u64 lmb_memory_size(u64 addr, u64 limit);
>  
>  #endif
> diff --git a/arch/x86/mm/lmb.c b/arch/x86/mm/lmb.c
> index 6c69e99..19a5f49 100644
> --- a/arch/x86/mm/lmb.c
> +++ b/arch/x86/mm/lmb.c
> @@ -226,7 +226,7 @@ void __init lmb_to_bootmem(u64 start, u64 end)
>  }
>  #endif
>  
> -u64 __init lmb_free_memory_size(u64 addr, u64 limit)
> +static u64 __init __lmb_memory_size(u64 addr, u64 limit, bool get_free)
>  {
>  	int i, count;
>  	struct range *range;
> @@ -256,6 +256,10 @@ u64 __init lmb_free_memory_size(u64 addr, u64 limit)
>  	}
>  	subtract_range(range, count, 0, addr);
>  	subtract_range(range, count, limit, -1ULL);
> +
> +	/* Subtract lmb.reserved.region in range ? */
> +	if (!get_free)
> +		goto sort_and_count_them;
>  	for (i = 0; i < lmb.reserved.cnt; i++) {
>  		struct lmb_region *r = &lmb.reserved.regions[i];
>  
> @@ -268,6 +272,8 @@ u64 __init lmb_free_memory_size(u64 addr, u64 limit)
>  
>  		subtract_range(range, count, final_start, final_end);
>  	}
> +
> +sort_and_count_them:
>  	nr_range = clean_sort_range(range, count);
>  
>  	free_size = 0;
> @@ -277,6 +283,16 @@ u64 __init lmb_free_memory_size(u64 addr, u64 limit)
>  	return free_size << PAGE_SHIFT;
>  }
>  
> +u64 __init lmb_free_memory_size(u64 addr, u64 limit)
> +{
> +	return __lmb_memory_size(addr, limit, true);
> +}
> +
> +u64 __init lmb_memory_size(u64 addr, u64 limit)
> +{
> +	return __lmb_memory_size(addr, limit, false);
> +}
> +
>  void __init lmb_add_memory(u64 start, u64 end)
>  {
>  	lmb_add_region(&lmb.memory, start, end - start);



^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 15/35] x86, lmb: Add lmb_reserve_area_overlap_ok()
  2010-05-14  0:19 ` [PATCH 15/35] x86, lmb: Add lmb_reserve_area_overlap_ok() Yinghai Lu
@ 2010-05-14  2:32   ` Benjamin Herrenschmidt
  2010-05-14  6:44     ` Yinghai
  0 siblings, 1 reply; 99+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-14  2:32 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On Thu, 2010-05-13 at 17:19 -0700, Yinghai Lu wrote:
> Some areas from firmware could be reserved several times by different
> callers.
> 
> If these areas overlap, we may end up with overlapping entries in
> lmb.reserved.
> 
> Try to free the areas first, before reserving them again.

I have already told you to make this a property of lmb_reserve() instead
of adding that function with a terrible name.

Cheers,
Ben.

> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  arch/x86/include/asm/lmb.h |    1 +
>  arch/x86/mm/lmb.c          |   18 ++++++++++++++++++
>  2 files changed, 19 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/include/asm/lmb.h b/arch/x86/include/asm/lmb.h
> index dd42ac1..9329e09 100644
> --- a/arch/x86/include/asm/lmb.h
> +++ b/arch/x86/include/asm/lmb.h
> @@ -7,6 +7,7 @@ u64 lmb_find_area_size(u64 start, u64 *sizep, u64 align);
>  void lmb_to_bootmem(u64 start, u64 end);
>  
>  void lmb_reserve_area(u64 start, u64 end, char *name);
> +void lmb_reserve_area_overlap_ok(u64 start, u64 end, char *name);
>  void lmb_free_area(u64 start, u64 end);
>  void lmb_add_memory(u64 start, u64 end);
>  struct range;
> diff --git a/arch/x86/mm/lmb.c b/arch/x86/mm/lmb.c
> index 19a5f49..1100c18 100644
> --- a/arch/x86/mm/lmb.c
> +++ b/arch/x86/mm/lmb.c
> @@ -309,6 +309,24 @@ void __init lmb_reserve_area(u64 start, u64 end, char *name)
>  	lmb_add_region(&lmb.reserved, start, end - start);
>  }
>  
> +/*
> + * Could be used to avoid having overlap entries in lmb.reserved.region.
> + *  Don't need to use it with area that is from lmb_find_area()
> + *  Only use it for the area that fw hidden area.
> + */
> +void __init lmb_reserve_area_overlap_ok(u64 start, u64 end, char *name)
> +{
> +	if (start == end)
> +		return;
> +
> +	if (WARN_ONCE(start > end, "lmb_reserve_area_overlap_ok: wrong range [%#llx, %#llx]\n", start, end))
> +		return;
> +
> +	/* Free that region at first */
> +	lmb_free(start, end - start);
> +	lmb_add_region(&lmb.reserved, start, end - start);
> +}
> +
>  void __init lmb_free_area(u64 start, u64 end)
>  {
>  	if (start == end)



^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 17/35] x86, lmb: Add x86 version of __lmb_find_area()
  2010-05-14  0:19 ` [PATCH 17/35] x86, lmb: Add x86 version of __lmb_find_area() Yinghai Lu
@ 2010-05-14  2:34   ` Benjamin Herrenschmidt
  2010-05-14  6:47     ` Yinghai
  0 siblings, 1 reply; 99+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-14  2:34 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On Thu, 2010-05-13 at 17:19 -0700, Yinghai Lu wrote:
> The generic version goes from high to low, and it seems it cannot find
> a suitably compact area.
> 
> The x86 version will go from goal to limit, just like the way we used
> to for early_res.
> 
> Use ARCH_FIND_LMB_AREA to select between them.

Why the heck ?

So LMB is designed to work top->down and now you replace lmb_find_area()
with a -completely different- implementation that goes bottom->up
without any explanation as to why you are doing so ?

top->down tends to be more efficient at keeping things less fragmented,
btw.

Cheers,
Ben.

> -v2: default to no
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  arch/x86/Kconfig  |    8 +++++
>  arch/x86/mm/lmb.c |   78 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 86 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index d80d2ab..36a5665 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -584,6 +584,14 @@ config PARAVIRT_DEBUG
>  	  Enable to debug paravirt_ops internals.  Specifically, BUG if
>  	  a paravirt_op is missing when it is called.
>  
> +config ARCH_LMB_FIND_AREA
> +	default n
> +	bool "Use x86 own lmb_find_area()"
> +	---help---
> +	  Use lmb_find_area() version instead of generic version, it get free
> +	  area up from low.
> +	  Generic one try to get free area down from limit.
> +
>  config NO_BOOTMEM
>  	default y
>  	bool "Disable Bootmem code"
> diff --git a/arch/x86/mm/lmb.c b/arch/x86/mm/lmb.c
> index c0c4220..cf9d488 100644
> --- a/arch/x86/mm/lmb.c
> +++ b/arch/x86/mm/lmb.c
> @@ -435,3 +435,81 @@ u64 __init lmb_hole_size(u64 start, u64 end)
>  	return end - start - ((u64)ram << PAGE_SHIFT);
>  }
>  
> +#ifdef CONFIG_ARCH_LMB_FIND_AREA
> +static int __init find_overlapped_early(u64 start, u64 end)
> +{
> +	int i;
> +	struct lmb_region *r;
> +
> +	for (i = 0; i < lmb.reserved.cnt && lmb.reserved.regions[i].size; i++) {
> +		r = &lmb.reserved.regions[i];
> +		if (end > r->base && start < (r->base + r->size))
> +			break;
> +	}
> +
> +	return i;
> +}
> +
> +/* Check for already reserved areas */
> +static inline bool __init bad_addr(u64 *addrp, u64 size, u64 align)
> +{
> +	int i;
> +	u64 addr = *addrp;
> +	bool changed = false;
> +	struct lmb_region *r;
> +again:
> +	i = find_overlapped_early(addr, addr + size);
> +	r = &lmb.reserved.regions[i];
> +	if (i < lmb.reserved.cnt && r->size) {
> +		*addrp = addr = round_up(r->base + r->size, align);
> +		changed = true;
> +		goto again;
> +	}
> +	return changed;
> +}
> +
> +u64 __init __lmb_find_area(u64 ei_start, u64 ei_last, u64 start, u64 end,
> +				 u64 size, u64 align)
> +{
> +	u64 addr, last;
> +
> +	addr = round_up(ei_start, align);
> +	if (addr < start)
> +		addr = round_up(start, align);
> +	if (addr >= ei_last)
> +		goto out;
> +	while (bad_addr(&addr, size, align) && addr+size <= ei_last)
> +		;
> +	last = addr + size;
> +	if (last > ei_last)
> +		goto out;
> +	if (last > end)
> +		goto out;
> +
> +	return addr;
> +
> +out:
> +	return LMB_ERROR;
> +}
> +
> +/*
> + * Find a free area with specified alignment in a specific range.
> + */
> +u64 __init lmb_find_area(u64 start, u64 end, u64 size, u64 align)
> +{
> +	int i;
> +
> +	for (i = 0; i < lmb.memory.cnt; i++) {
> +		u64 ei_start = lmb.memory.regions[i].base;
> +		u64 ei_last = ei_start + lmb.memory.regions[i].size;
> +		u64 addr;
> +
> +		addr = __lmb_find_area(ei_start, ei_last, start, end,
> +					 size, align);
> +
> +		if (addr != LMB_ERROR)
> +			return addr;
> +	}
> +	return LMB_ERROR;
> +}
> +#endif



^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 24/35] lmb: Move __alloc_memory_core_early() to nobootmem.c
  2010-05-14  0:19 ` [PATCH 24/35] lmb: Move __alloc_memory_core_early() to nobootmem.c Yinghai Lu
@ 2010-05-14  2:36   ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 99+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-14  2:36 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On Thu, 2010-05-13 at 17:19 -0700, Yinghai Lu wrote:
> We can remove the #ifdef in mm/page_alloc.c
> 
> and make that function static.

Don't forget that nid allocation using your early_node_map[] will not
work properly on sparc due to the scattered nature of page locality.

Cheers,
Ben.

> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  include/linux/mm.h |    2 --
>  mm/nobootmem.c     |   21 +++++++++++++++++++++
>  mm/page_alloc.c    |   24 ------------------------
>  3 files changed, 21 insertions(+), 26 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 7774e1d..2a14361 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1161,8 +1161,6 @@ int add_from_early_node_map(struct range *range, int az,
>  				   int nr_range, int nid);
>  u64 __init find_memory_core_early(int nid, u64 size, u64 align,
>  					u64 goal, u64 limit);
> -void *__alloc_memory_core_early(int nodeid, u64 size, u64 align,
> -				 u64 goal, u64 limit);
>  typedef int (*work_fn_t)(unsigned long, unsigned long, void *);
>  extern void work_with_active_regions(int nid, work_fn_t work_fn, void *data);
>  extern void sparse_memory_present_with_active_regions(int nid);
> diff --git a/mm/nobootmem.c b/mm/nobootmem.c
> index abaec96..e3cbde7 100644
> --- a/mm/nobootmem.c
> +++ b/mm/nobootmem.c
> @@ -40,6 +40,27 @@ unsigned long max_pfn;
>  unsigned long saved_max_pfn;
>  #endif
>  
> +static void * __init __alloc_memory_core_early(int nid, u64 size, u64 align,
> +					u64 goal, u64 limit)
> +{
> +	void *ptr;
> +
> +	u64 addr;
> +
> +	if (limit > lmb.current_limit)
> +		limit = lmb.current_limit;
> +
> +	addr = find_memory_core_early(nid, size, align, goal, limit);
> +
> +	if (addr == LMB_ERROR)
> +		return NULL;
> +
> +	ptr = phys_to_virt(addr);
> +	memset(ptr, 0, size);
> +	lmb_reserve_area(addr, addr + size, "BOOTMEM");
> +	return ptr;
> +}
> +
>  /*
>   * free_bootmem_late - free bootmem pages directly to page allocator
>   * @addr: starting address of the range
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 867a3a8..3449811 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3437,30 +3437,6 @@ int __init add_from_early_node_map(struct range *range, int az,
>  	return nr_range;
>  }
>  
> -#ifdef CONFIG_NO_BOOTMEM
> -void * __init __alloc_memory_core_early(int nid, u64 size, u64 align,
> -					u64 goal, u64 limit)
> -{
> -	void *ptr;
> -
> -	u64 addr;
> -
> -	if (limit > lmb.current_limit)
> -		limit = lmb.current_limit;
> -
> -	addr = find_memory_core_early(nid, size, align, goal, limit);
> -
> -	if (addr == LMB_ERROR)
> -		return NULL;
> -
> -	ptr = phys_to_virt(addr);
> -	memset(ptr, 0, size);
> -	lmb_reserve_area(addr, addr + size, "BOOTMEM");
> -	return ptr;
> -}
> -#endif
> -
> -
>  void __init work_with_active_regions(int nid, work_fn_t work_fn, void *data)
>  {
>  	int i;



^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 01/35] lmb: prepare x86 to use lmb to replace early_res
  2010-05-14  2:12   ` Benjamin Herrenschmidt
@ 2010-05-14  6:19     ` Yinghai
  2010-05-14  8:09       ` Benjamin Herrenschmidt
  2010-05-14  7:03     ` Yinghai
  1 sibling, 1 reply; 99+ messages in thread
From: Yinghai @ 2010-05-14  6:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On 05/13/2010 07:12 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2010-05-13 at 17:19 -0700, Yinghai Lu wrote:
>> 1. expose lmb_debug
>> 2. expose lmb_reserved_init_regions
>> 3. expose lmb_add_region
>> 4. protection for including linux/lmb.h in mm/page_alloc.c and mm/bootmem.c
>> 5. lmb_find_base() should return LMB_ERROR in one failing path.
>>    (this one cost me 3 hours !)
>> 6. move LMB_ERROR to lmb.h
> 
> Oh well, let's start somewhere...
> 
>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>> ---
>>  include/linux/lmb.h |    4 ++++
>>  lib/lmb.c           |   21 +++++++++------------
>>  2 files changed, 13 insertions(+), 12 deletions(-)
>>
>> diff --git a/include/linux/lmb.h b/include/linux/lmb.h
>> index 6f8c4bd..7987766 100644
>> --- a/include/linux/lmb.h
>> +++ b/include/linux/lmb.h
>> @@ -19,6 +19,7 @@
>>  #include <asm/lmb.h>
>>  
>>  #define INIT_LMB_REGIONS 128
>> +#define LMB_ERROR	(~(phys_addr_t)0)
> 
> Ok so this was meant to remain internal. You seem to want to expose a
> whole lot of LMB internals, I suppose for your new arch/x86/lmb.c and I
> really really don't like it.
> 
> If we expose LMB_ERROR then all lmb calls that can fail should return
> that. However, the API calls all return 0 instead. Changing that means
> fixing all callers.

OK, will stop using LMB_ERROR outside lib/lmb.c.

Will go back to using -1ULL for the x86 path.

> 
> We can't just have a mixed bag of result codes in stuff that is exposed.
> 
> If all you need LMB_ERROR for is to expose lmb_find_area() and
> lmb_add_region() then make the above __ and export a public variant of
> it that returns 0.
> 
> But that's not the right approach. The right thing to do I believe is to
> instead change LMB to use proper errno.h values.
> 
> For things like lmb_add_region(), return them as a negative int. For
> things that return a phys_addr_t as well with a proper casting macro
> since I -think- we can safely consider that phys addrs in the range
> -PAGE_SIZE..-1 can be error codes. Just like we do for PTR_ERR etc...
> 
> This should be a separate patch btw.
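
You mean something like the PTR_ERR style helpers? A rough sketch, just
to check I understand:

	/* phys addrs in -PAGE_SIZE..-1 carry a negative errno */
	#define LMB_IS_ERR_VALUE(x) \
		((phys_addr_t)(x) >= (phys_addr_t)-PAGE_SIZE)

	static inline long lmb_err(phys_addr_t addr)
	{
		return LMB_IS_ERR_VALUE(addr) ? (long)addr : 0;
	}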
> 
> I'm also not too happy with exposing lmb_add_region(). Why would you
> ever need to expose it ? Just call lmb_reserve() if you want to reserve
> something. lmb_add_region() is an internal function and has no business
> being used outside of the main lmb.c file.
> 
> Also:
> 
>>  	/* Calculate new doubled size */
>>  	old_size = type->max * sizeof(struct lmb_region);
>>  	new_size = old_size << 1;
>> @@ -206,7 +199,7 @@ static int lmb_double_array(struct lmb_type *type)
>>  		new_array = kmalloc(new_size, GFP_KERNEL);
>>  		addr = new_array == NULL ? LMB_ERROR : __pa(new_array);
>>  	} else
>> -		addr = lmb_find_base(new_size, sizeof(phys_addr_t), 0, LMB_ALLOC_ACCESSIBLE);
>> +		addr = lmb_find_base(new_size, sizeof(struct lmb_region), 0, LMB_ALLOC_ACCESSIBLE);
> 
> Why this change? Does it need to be aligned to the struct size? If you
> really want that and have a good justification, make this a separate
> patch and explain why you are doing that in the changeset comment.

Will drop that.

> 
>>  	if (addr == LMB_ERROR) {
>>  		pr_err("lmb: Failed to double %s array from %ld to %ld entries !\n",
>>  		       lmb_type_name(type), type->max, type->max * 2);
>> @@ -214,6 +207,10 @@ static int lmb_double_array(struct lmb_type *type)
>>  	}
>>  	new_array = __va(addr);
>>  
>> +	if (lmb_debug)
>> +		pr_info("lmb: %s array is doubled to %ld at %llx - %llx",
>> +			 lmb_type_name(type), type->max * 2, (u64)addr, (u64)addr + new_size);
>> +
>>  	/* Found space, we now need to move the array over before
>>  	 * we add the reserved region since it may be our reserved
>>  	 * array itself that is full.
>> @@ -249,7 +246,7 @@ extern int __weak lmb_memory_can_coalesce(phys_addr_t addr1, phys_addr_t size1,
>>  	return 1;
>>  }
> 
> Cheers,
> Ben.


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 03/35] lmb: Add ARCH_DISCARD_LMB to put lmb code to .init
  2010-05-14  2:14   ` Benjamin Herrenschmidt
@ 2010-05-14  6:21     ` Yinghai
  2010-05-14  8:10       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 99+ messages in thread
From: Yinghai @ 2010-05-14  6:21 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On 05/13/2010 07:14 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2010-05-13 at 17:19 -0700, Yinghai Lu wrote:
>> So those lmb bits can be released after the kernel is booted up.
>>
>> Arch code can define ARCH_DISCARD_LMB in asm/lmb.h;
>> __init_lmb will become __init, __initdata_lmb will become __initdata.
>>
>> x86 code will use that.
> 
> So you do not intend to use lmb after boot ? This will break the debugfs
> files unless you also remove those.

No, x86 doesn't use lmb after boot.

Yes, the debugfs files get removed as well.


...

>> @@ -695,7 +695,7 @@ static int __init early_lmb(char *p)
>>  }
>>  early_param("lmb", early_lmb);
>>  
>> -#ifdef CONFIG_DEBUG_FS
>> +#if defined(CONFIG_DEBUG_FS) && !defined(ARCH_DISCARD_LMB)
>>  
>>  static int lmb_debug_show(struct seq_file *m, void *private)
>>  {
> 

It will check ARCH_DISCARD_LMB.
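
The markers are defined along these lines (roughly):

	#ifdef ARCH_DISCARD_LMB
	#define __init_lmb	__init
	#define __initdata_lmb	__initdata
	#else
	#define __init_lmb
	#define __initdata_lmb
	#endif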

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 04/35] lmb: Add lmb_find_area()
  2010-05-14  2:16   ` Benjamin Herrenschmidt
@ 2010-05-14  6:25     ` Yinghai
  2010-05-14  8:12       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 99+ messages in thread
From: Yinghai @ 2010-05-14  6:25 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On 05/13/2010 07:16 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2010-05-13 at 17:19 -0700, Yinghai Lu wrote:
>> It is a wrapper for lmb_find_base().
>>
>> Make it easier for x86 to use lmb. (rebase)
>> x86 early_res uses a find/reserve pattern instead of alloc.
>>
>> -v2: Change name to lmb_find_area() according to Michael Ellerman
>> -v3: Add generic weak version __lmb_find_area()
>>      so keep the fallback path to the x86 version that handles from low
>>
>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>> ---
>>  include/linux/lmb.h |    4 ++++
>>  lib/lmb.c           |   27 ++++++++++++++++++++++++++-
>>  2 files changed, 30 insertions(+), 1 deletions(-)
>>
>> diff --git a/include/linux/lmb.h b/include/linux/lmb.h
>> index 0b073a3..3c23dc8 100644
>> --- a/include/linux/lmb.h
>> +++ b/include/linux/lmb.h
>> @@ -44,6 +44,10 @@ extern struct lmb lmb;
>>  extern int lmb_debug;
>>  extern struct lmb_region lmb_reserved_init_regions[];
>>  
>> +u64 __lmb_find_area(u64 ei_start, u64 ei_last, u64 start, u64 end,
>> +			u64 size, u64 align);
>> +u64 lmb_find_area(u64 start, u64 end, u64 size, u64 align);
> 
> See my comments about sorting out the return from that function.
> 
> Also, I don't understand the need for that __ version. It looks like
> something you should keep inside x86, I don't see the need for it in the
> generic LMB code, since it just does trivial cropping of the arguments.

Otherwise we'd need to export lmb_find_base() and LMB_ERROR.

> 
> Also "ei_last" and "ei_start" are pretty bad names for its arguments
> anyways. To some extent I wonder if the caller should be responsible for
> doing the cropping in the first place.

Yes, but we need that to keep the switch from early_res/nobootmem to
lmb/nobootmem smooth.

Will clean it up after early_res is replaced.

> 
> Cheers,
> Ben.
> 
>>  extern void __init lmb_init(void);
>>  extern void __init lmb_analyze(void);
>>  extern long lmb_add(phys_addr_t base, phys_addr_t size);
>> diff --git a/lib/lmb.c b/lib/lmb.c
>> index 6d49a17..f917dbf 100644
>> --- a/lib/lmb.c
>> +++ b/lib/lmb.c
>> @@ -155,6 +155,31 @@ static phys_addr_t __init lmb_find_base(phys_addr_t size, phys_addr_t align,
>>  	return LMB_ERROR;
>>  }
>>  
>> +u64 __init __weak __lmb_find_area(u64 ei_start, u64 ei_last, u64 start, u64 end,
>> +				 u64 size, u64 align)
>> +{
>> +	u64 final_start, final_end;
>> +	u64 mem;
>> +
>> +	final_start = max(ei_start, start);
>> +	final_end = min(ei_last, end);
>> +
>> +	if (final_start >= final_end)
>> +		return LMB_ERROR;
>> +
>> +	mem = lmb_find_base(size, align, final_start, final_end);
>> +
>> +	return mem;
>> +}
>> +
>> +/*
>> + * Find a free area with specified alignment in a specific range.
>> + */
>> +u64 __init __weak lmb_find_area(u64 start, u64 end, u64 size, u64 align)
>> +{
>> +	return lmb_find_base(size, align, start, end);
>> +}
>> +
>>  static void __init_lmb lmb_remove_region(struct lmb_type *type, unsigned long r)
>>  {
>>  	unsigned long i;
>> @@ -199,7 +224,7 @@ static int __init_lmb lmb_double_array(struct lmb_type *type)
>>  		new_array = kmalloc(new_size, GFP_KERNEL);
>>  		addr = new_array == NULL ? LMB_ERROR : __pa(new_array);
>>  	} else
>> -		addr = lmb_find_base(new_size, sizeof(struct lmb_region), 0, LMB_ALLOC_ACCESSIBLE);
>> +		addr = lmb_find_area(0, lmb.current_limit, new_size, sizeof(struct lmb_region));
>>  	if (addr == LMB_ERROR) {
>>  		pr_err("lmb: Failed to double %s array from %ld to %ld entries !\n",
>>  		       lmb_type_name(type), type->max, type->max * 2);
> 


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 05/35] x86, lmb: Add lmb_find_area_size()
  2010-05-14  2:20   ` Benjamin Herrenschmidt
@ 2010-05-14  6:28     ` Yinghai
  2010-05-14  8:13       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 99+ messages in thread
From: Yinghai @ 2010-05-14  6:28 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On 05/13/2010 07:20 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2010-05-13 at 17:19 -0700, Yinghai Lu wrote:
>> The size of the free range found is returned through *sizep.
>> Will be used to find free ranges for early_memtest and memory
>> corruption check.
> 
> Please provide a better explanation of what these functions do. It's
> very unclear from the code (which looks like it could be a lot simpler),
> and the name of the function is totally obscure as well.

This is just a line-by-line translation from the early_res version to
lmb.

Please focus on the lmb core at this point.
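
The usage pattern is the same as with the old early_res version; roughly
(sketch):

	u64 start = 0, size;

	for (;;) {
		u64 addr = lmb_find_area_size(start, &size, 1);

		if (addr == LMB_ERROR || !size)
			break;
		/* early_memtest scans the free range [addr, addr + size) */
		start = addr + size;
	}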

> 
> How many times have we asked you to improve your changeset comments, at
> the -very-least-? Explain what functions do and why they do it, and
> when I say explain, I don't mean 2 lines of rot13. I mean actual
> sentences that a human being can read and have a chance to understand.
> 
> Also, I would appreciate it if you picked up the habit of adding docbook
> doco for any API function you add, even if it's in the x86 "internal"
> file.
> 
> Cheers,
> Ben.
> 
>> Do not mix it up with mm/lmb.c yet.
>>
>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>> ---
>>  arch/x86/include/asm/lmb.h |    8 ++++
>>  arch/x86/mm/Makefile       |    2 +
>>  arch/x86/mm/lmb.c          |   88 ++++++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 98 insertions(+), 0 deletions(-)
>>  create mode 100644 arch/x86/include/asm/lmb.h
>>  create mode 100644 arch/x86/mm/lmb.c
>>
>> diff --git a/arch/x86/include/asm/lmb.h b/arch/x86/include/asm/lmb.h
>> new file mode 100644
>> index 0000000..aa3a66e
>> --- /dev/null
>> +++ b/arch/x86/include/asm/lmb.h
>> @@ -0,0 +1,8 @@
>> +#ifndef _X86_LMB_H
>> +#define _X86_LMB_H
>> +
>> +#define ARCH_DISCARD_LMB
>> +
>> +u64 lmb_find_area_size(u64 start, u64 *sizep, u64 align);
>> +
>> +#endif
>> diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
>> index a4c7683..8ab0505 100644
>> --- a/arch/x86/mm/Makefile
>> +++ b/arch/x86/mm/Makefile
>> @@ -26,4 +26,6 @@ obj-$(CONFIG_NUMA)		+= numa.o numa_$(BITS).o
>>  obj-$(CONFIG_K8_NUMA)		+= k8topology_64.o
>>  obj-$(CONFIG_ACPI_NUMA)		+= srat_$(BITS).o
>>  
>> +obj-$(CONFIG_HAVE_LMB)		+= lmb.o
>> +
>>  obj-$(CONFIG_MEMTEST)		+= memtest.o
>> diff --git a/arch/x86/mm/lmb.c b/arch/x86/mm/lmb.c
>> new file mode 100644
>> index 0000000..9d26eed
>> --- /dev/null
>> +++ b/arch/x86/mm/lmb.c
>> @@ -0,0 +1,88 @@
>> +#include <linux/kernel.h>
>> +#include <linux/types.h>
>> +#include <linux/init.h>
>> +#include <linux/bitops.h>
>> +#include <linux/lmb.h>
>> +#include <linux/bootmem.h>
>> +#include <linux/mm.h>
>> +#include <linux/range.h>
>> +
>> +/* Check for already reserved areas */
>> +static inline bool __init bad_addr_size(u64 *addrp, u64 *sizep, u64 align)
>> +{
>> +	int i;
>> +	u64 addr = *addrp, last;
>> +	u64 size = *sizep;
>> +	bool changed = false;
>> +again:
>> +	last = addr + size;
>> +	for (i = 0; i < lmb.reserved.cnt && lmb.reserved.regions[i].size; i++) {
>> +		struct lmb_region *r = &lmb.reserved.regions[i];
>> +		if (last > r->base && addr < r->base) {
>> +			size = r->base - addr;
>> +			changed = true;
>> +			goto again;
>> +		}
>> +		if (last > (r->base + r->size) && addr < (r->base + r->size)) {
>> +			addr = round_up(r->base + r->size, align);
>> +			size = last - addr;
>> +			changed = true;
>> +			goto again;
>> +		}
>> +		if (last <= (r->base + r->size) && addr >= r->base) {
>> +			(*sizep)++;
>> +			return false;
>> +		}
>> +	}
>> +	if (changed) {
>> +		*addrp = addr;
>> +		*sizep = size;
>> +	}
>> +	return changed;
>> +}
>> +
>> +static u64 __init __lmb_find_area_size(u64 ei_start, u64 ei_last, u64 start,
>> +			 u64 *sizep, u64 align)
>> +{
>> +	u64 addr, last;
>> +
>> +	addr = round_up(ei_start, align);
>> +	if (addr < start)
>> +		addr = round_up(start, align);
>> +	if (addr >= ei_last)
>> +		goto out;
>> +	*sizep = ei_last - addr;
>> +	while (bad_addr_size(&addr, sizep, align) && addr + *sizep <= ei_last)
>> +		;
>> +	last = addr + *sizep;
>> +	if (last > ei_last)
>> +		goto out;
>> +
>> +	return addr;
>> +
>> +out:
>> +	return LMB_ERROR;
>> +}
>> +
>> +/*
>> + * Find next free range after *start
>> + */
>> +u64 __init lmb_find_area_size(u64 start, u64 *sizep, u64 align)
>> +{
>> +	int i;
>> +
>> +	for (i = 0; i < lmb.memory.cnt; i++) {
>> +		u64 ei_start = lmb.memory.regions[i].base;
>> +		u64 ei_last = ei_start + lmb.memory.regions[i].size;
>> +		u64 addr;
>> +
>> +		addr = __lmb_find_area_size(ei_start, ei_last, start,
>> +					 sizep, align);
>> +
>> +		if (addr != LMB_ERROR)
>> +			return addr;
>> +	}
>> +
>> +	return LMB_ERROR;
>> +}
>> +
> 


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 08/35] x86,lmb: Add lmb_reserve_area/lmb_free_area
  2010-05-14  2:26   ` Benjamin Herrenschmidt
@ 2010-05-14  6:30     ` Yinghai
  2010-05-14  8:15       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 99+ messages in thread
From: Yinghai @ 2010-05-14  6:30 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On 05/13/2010 07:26 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2010-05-13 at 17:19 -0700, Yinghai Lu wrote:
> 
>>  #endif
>> diff --git a/arch/x86/mm/lmb.c b/arch/x86/mm/lmb.c
>> index 37a05e2..0dbe05b 100644
>> --- a/arch/x86/mm/lmb.c
>> +++ b/arch/x86/mm/lmb.c
>> @@ -117,3 +117,30 @@ void __init lmb_to_bootmem(u64 start, u64 end)
>>  	lmb.reserved.cnt = 0;
>>  }
>>  #endif
>> +
>> +void __init lmb_add_memory(u64 start, u64 end)
>> +{
>> +	lmb_add_region(&lmb.memory, start, end - start);
>> +}
> 
> I completely fail to see the point of doing such a minor argument
> conversion as an exported function, with a name that is certain to cause
> confusion with the existing lmb_add().
> 
> In any case, the above should be done at the call sites. Just call
> lmb_add(start, end-start). You also aren't consistent, since you do a
> similar conversion using the _area suffix below, but not above.
> 
>> +void __init lmb_reserve_area(u64 start, u64 end, char *name)
>> +{
>> +	if (start == end)
>> +		return;
>> +
>> +	if (WARN_ONCE(start > end, "lmb_reserve_area: wrong range [%#llx, %#llx]\n", start, end))
>> +		return;
>> +
>> +	lmb_add_region(&lmb.reserved, start, end - start);
>> +}
> 
> You seem to be fond of gratuitous bloat... 
> 
>> +void __init lmb_free_area(u64 start, u64 end)
>> +{
>> +	if (start == end)
>> +		return;
>> +
>> +	if (WARN_ONCE(start > end, "lmb_free_area: wrong range [%#llx, %#llx]\n", start, end))
>> +		return;
>> +
>> +	lmb_free(start, end - start);
>> +}
> 
> And here again.
> 
> If you -really- think there's value in the prototype conversions, then
> make those _area() variants static inlines, group them together in
> the .h with a clear explanation saying something like "it's more
> practical for some archs to use start/end rather than start/size".
> 
> I personally don't see why you are doing that tho.

It makes the rebase easier.

The previous version was using those APIs.

Also, I will add some debug printout to them.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 11/35] lmb: Add find_memory_core_early()
  2010-05-14  2:29   ` Benjamin Herrenschmidt
@ 2010-05-14  6:34     ` Yinghai
  2010-05-14  8:16       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 99+ messages in thread
From: Yinghai @ 2010-05-14  6:34 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On 05/13/2010 07:29 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2010-05-13 at 17:19 -0700, Yinghai Lu wrote:
>> Walk the node ranges in early_node_map[] and use __lmb_find_area()
>> to find a free range.
>>
>> Will be used by lmb_find_area_node().
>>
>> lmb_find_area_node() will be used to find the right buffer for NODE_DATA.
> 
> The prototype for this has no business being in lmb.h, under that name
> at least.

It is calling __lmb_find_area(), but it needs to use early_node_map[],
so the function is put here.
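
lmb_find_area_node() is then roughly (sketch, assuming it just falls
back to lmb_find_area() when the node has no room):

	u64 __init lmb_find_area_node(int nid, u64 start, u64 end,
					u64 size, u64 align)
	{
		u64 addr;

		addr = find_memory_core_early(nid, size, align, start, end);
		if (addr != -1ULL)
			return addr;

		/* fall back to any node */
		return lmb_find_area(start, end, size, align);
	}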

> 
> Cheers,
> Ben.
> 
>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>> ---
>>  include/linux/mm.h |    2 ++
>>  mm/page_alloc.c    |   29 +++++++++++++++++++++++++++++
>>  2 files changed, 31 insertions(+), 0 deletions(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index fb19bb9..7774e1d 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -1159,6 +1159,8 @@ extern void free_bootmem_with_active_regions(int nid,
>>  						unsigned long max_low_pfn);
>>  int add_from_early_node_map(struct range *range, int az,
>>  				   int nr_range, int nid);
>> +u64 __init find_memory_core_early(int nid, u64 size, u64 align,
>> +					u64 goal, u64 limit);
>>  void *__alloc_memory_core_early(int nodeid, u64 size, u64 align,
>>  				 u64 goal, u64 limit);
>>  typedef int (*work_fn_t)(unsigned long, unsigned long, void *);
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index d03c946..72afd94 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -21,6 +21,7 @@
>>  #include <linux/pagemap.h>
>>  #include <linux/jiffies.h>
>>  #include <linux/bootmem.h>
>> +#include <linux/lmb.h>
>>  #include <linux/compiler.h>
>>  #include <linux/kernel.h>
>>  #include <linux/kmemcheck.h>
>> @@ -3393,6 +3394,34 @@ void __init free_bootmem_with_active_regions(int nid,
>>  	}
>>  }
>>  
>> +#ifdef CONFIG_HAVE_LMB
>> +u64 __init find_memory_core_early(int nid, u64 size, u64 align,
>> +					u64 goal, u64 limit)
>> +{
>> +	int i;
>> +
>> +	/* Need to go over early_node_map to find out good range for node */
>> +	for_each_active_range_index_in_nid(i, nid) {
>> +		u64 addr;
>> +		u64 ei_start, ei_last;
>> +
>> +		ei_last = early_node_map[i].end_pfn;
>> +		ei_last <<= PAGE_SHIFT;
>> +		ei_start = early_node_map[i].start_pfn;
>> +		ei_start <<= PAGE_SHIFT;
>> +		addr = __lmb_find_area(ei_start, ei_last,
>> +					 goal, limit, size, align);
>> +
>> +		if (addr == LMB_ERROR)
>> +			continue;
>> +
>> +		return addr;
>> +	}
>> +
>> +	return -1ULL;
>> +}
>> +#endif
>> +
>>  int __init add_from_early_node_map(struct range *range, int az,
>>  				   int nr_range, int nid)
>>  {
> 


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 11/35] lmb: Add find_memory_core_early()
  2010-05-14  2:30   ` Benjamin Herrenschmidt
@ 2010-05-14  6:39     ` Yinghai
  2010-05-14  8:19       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 99+ messages in thread
From: Yinghai @ 2010-05-14  6:39 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On 05/13/2010 07:30 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2010-05-13 at 17:19 -0700, Yinghai Lu wrote:
>> Walk the node ranges in early_node_map[] and use __lmb_find_area()
>> to find a free range.
>>
>> Will be used by lmb_find_area_node().
>>
>> lmb_find_area_node() will be used to find the right buffer for NODE_DATA.
>>
>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>> ---
> 
> Oh, and this won't work on sparc. You should probably instead add an
> lmb_find_in_nid() to lmb which shares code with lmb_alloc_nid().

It should work on sparc...

+		if (addr == LMB_ERROR)
+			continue;

should do the trick.


> 
> However, why do you want to do this find + separate reserve again? Why
> not lmb_alloc?

To keep the switch smooth; maybe later we could try to replace them one
by one. Also, I could have lmb_reserve_area() take a name and print it
out to make debugging easier.

> 
> Cheers,
> Ben.
> 
>>  include/linux/mm.h |    2 ++
>>  mm/page_alloc.c    |   29 +++++++++++++++++++++++++++++
>>  2 files changed, 31 insertions(+), 0 deletions(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index fb19bb9..7774e1d 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -1159,6 +1159,8 @@ extern void free_bootmem_with_active_regions(int nid,
>>  						unsigned long max_low_pfn);
>>  int add_from_early_node_map(struct range *range, int az,
>>  				   int nr_range, int nid);
>> +u64 __init find_memory_core_early(int nid, u64 size, u64 align,
>> +					u64 goal, u64 limit);
>>  void *__alloc_memory_core_early(int nodeid, u64 size, u64 align,
>>  				 u64 goal, u64 limit);
>>  typedef int (*work_fn_t)(unsigned long, unsigned long, void *);
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index d03c946..72afd94 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -21,6 +21,7 @@
>>  #include <linux/pagemap.h>
>>  #include <linux/jiffies.h>
>>  #include <linux/bootmem.h>
>> +#include <linux/lmb.h>
>>  #include <linux/compiler.h>
>>  #include <linux/kernel.h>
>>  #include <linux/kmemcheck.h>
>> @@ -3393,6 +3394,34 @@ void __init free_bootmem_with_active_regions(int nid,
>>  	}
>>  }
>>  
>> +#ifdef CONFIG_HAVE_LMB
>> +u64 __init find_memory_core_early(int nid, u64 size, u64 align,
>> +					u64 goal, u64 limit)
>> +{
>> +	int i;
>> +
>> +	/* Need to go over early_node_map to find out good range for node */
>> +	for_each_active_range_index_in_nid(i, nid) {
>> +		u64 addr;
>> +		u64 ei_start, ei_last;
>> +
>> +		ei_last = early_node_map[i].end_pfn;
>> +		ei_last <<= PAGE_SHIFT;
>> +		ei_start = early_node_map[i].start_pfn;
>> +		ei_start <<= PAGE_SHIFT;
>> +		addr = __lmb_find_area(ei_start, ei_last,
>> +					 goal, limit, size, align);
>> +
>> +		if (addr == LMB_ERROR)
>> +			continue;
>> +
>> +		return addr;
>> +	}
>> +
>> +	return -1ULL;
>> +}
>> +#endif
>> +
>>  int __init add_from_early_node_map(struct range *range, int az,
>>  				   int nr_range, int nid)
>>  {
> 


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 13/35] x86, lmb: Add lmb_free_memory_size()
  2010-05-14  2:31   ` Benjamin Herrenschmidt
@ 2010-05-14  6:42     ` Yinghai
  2010-05-14  8:21       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 99+ messages in thread
From: Yinghai @ 2010-05-14  6:42 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On 05/13/2010 07:31 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2010-05-13 at 17:19 -0700, Yinghai Lu wrote:
>> It will return the free memory size in the specified range.
>>
>> We cannot use memory_size - reserved_size here, because some reserved
>> areas may not be in the scope of lmb.memory.region.
>>
>> Subtract lmb.reserved.region from lmb.memory.region to get the free
>> range array, then count the size of all free ranges.
> 
> I remember having already told you that the naming sucks.

Any suggestion?

I think this name is clear.

> 
> Also, you fail to explain what this is actually needed for.

It's needed by

[PATCH 21/35] x86, lmb: Use lmb_memory_size()/lmb_free_memory_size() to get correct dma_reserve
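
where the two are used together roughly like this (sketch from memory;
the exact limit used in that patch may differ):

	u64 limit = (u64)MAX_DMA32_PFN << PAGE_SHIFT;	/* low memory */
	unsigned long nr_pages = lmb_memory_size(0, limit) >> PAGE_SHIFT;
	unsigned long nr_free = lmb_free_memory_size(0, limit) >> PAGE_SHIFT;

	set_dma_reserve(nr_pages - nr_free);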


> 
> Cheers,
> Ben.
> 
>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>> ---
>>  arch/x86/include/asm/lmb.h |    1 +
>>  arch/x86/mm/lmb.c          |   51 ++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 52 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/lmb.h b/arch/x86/include/asm/lmb.h
>> index 358d8a6..4fb94b5 100644
>> --- a/arch/x86/include/asm/lmb.h
>> +++ b/arch/x86/include/asm/lmb.h
>> @@ -16,5 +16,6 @@ void lmb_register_active_regions(int nid, unsigned long start_pfn,
>>  					 unsigned long last_pfn);
>>  u64 lmb_hole_size(u64 start, u64 end);
>>  u64 lmb_find_area_node(int nid, u64 start, u64 end, u64 size, u64 align);
>> +u64 lmb_free_memory_size(u64 addr, u64 limit);
>>  
>>  #endif
>> diff --git a/arch/x86/mm/lmb.c b/arch/x86/mm/lmb.c
>> index c5fa1dd..6c69e99 100644
>> --- a/arch/x86/mm/lmb.c
>> +++ b/arch/x86/mm/lmb.c
>> @@ -226,6 +226,57 @@ void __init lmb_to_bootmem(u64 start, u64 end)
>>  }
>>  #endif
>>  
>> +u64 __init lmb_free_memory_size(u64 addr, u64 limit)
>> +{
>> +	int i, count;
>> +	struct range *range;
>> +	int nr_range;
>> +	u64 final_start, final_end;
>> +	u64 free_size;
>> +
>> +	count = (lmb.reserved.cnt + lmb.memory.cnt) * 2;
>> +
>> +	range = find_range_array(count);
>> +	nr_range = 0;
>> +
>> +	addr = PFN_UP(addr);
>> +	limit = PFN_DOWN(limit);
>> +
>> +	for (i = 0; i < lmb.memory.cnt; i++) {
>> +		struct lmb_region *r = &lmb.memory.regions[i];
>> +
>> +		final_start = PFN_UP(r->base);
>> +		final_end = PFN_DOWN(r->base + r->size);
>> +		if (final_start >= final_end)
>> +			continue;
>> +		if (final_start >= limit || final_end <= addr)
>> +			continue;
>> +
>> +		nr_range = add_range(range, count, nr_range, final_start, final_end);
>> +	}
>> +	subtract_range(range, count, 0, addr);
>> +	subtract_range(range, count, limit, -1ULL);
>> +	for (i = 0; i < lmb.reserved.cnt; i++) {
>> +		struct lmb_region *r = &lmb.reserved.regions[i];
>> +
>> +		final_start = PFN_DOWN(r->base);
>> +		final_end = PFN_UP(r->base + r->size);
>> +		if (final_start >= final_end)
>> +			continue;
>> +		if (final_start >= limit || final_end <= addr)
>> +			continue;
>> +
>> +		subtract_range(range, count, final_start, final_end);
>> +	}
>> +	nr_range = clean_sort_range(range, count);
>> +
>> +	free_size = 0;
>> +	for (i = 0; i < nr_range; i++)
>> +		free_size += range[i].end - range[i].start;
>> +
>> +	return free_size << PAGE_SHIFT;
>> +}
>> +
>>  void __init lmb_add_memory(u64 start, u64 end)
>>  {
>>  	lmb_add_region(&lmb.memory, start, end - start);
> 


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 15/35] x86, lmb: Add lmb_reserve_area_overlap_ok()
  2010-05-14  2:32   ` Benjamin Herrenschmidt
@ 2010-05-14  6:44     ` Yinghai
  2010-05-14  8:30       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 99+ messages in thread
From: Yinghai @ 2010-05-14  6:44 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On 05/13/2010 07:32 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2010-05-13 at 17:19 -0700, Yinghai Lu wrote:
>> Some areas from firmware could be reserved several times by different
>> callers.
>>
>> If these areas overlap, we may end up with overlapping entries in
>> lmb.reserved.
>>
>> Try to free the areas first, before reserving them again.
> 
> I have already told you to make this a property of lmb_reserve() instead
> of adding that function with a terrible name.

Make every lmb_reserve() call lmb_free() first?
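
i.e. something like this (sketch):

	long lmb_reserve(phys_addr_t base, phys_addr_t size)
	{
		/*
		 * Drop any overlapping pieces first, so we never end up
		 * with duplicate entries in lmb.reserved.
		 */
		lmb_free(base, size);
		return lmb_add_region(&lmb.reserved, base, size);
	}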

> 
> Cheers,
> Ben.
> 
>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>> ---
>>  arch/x86/include/asm/lmb.h |    1 +
>>  arch/x86/mm/lmb.c          |   18 ++++++++++++++++++
>>  2 files changed, 19 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/lmb.h b/arch/x86/include/asm/lmb.h
>> index dd42ac1..9329e09 100644
>> --- a/arch/x86/include/asm/lmb.h
>> +++ b/arch/x86/include/asm/lmb.h
>> @@ -7,6 +7,7 @@ u64 lmb_find_area_size(u64 start, u64 *sizep, u64 align);
>>  void lmb_to_bootmem(u64 start, u64 end);
>>  
>>  void lmb_reserve_area(u64 start, u64 end, char *name);
>> +void lmb_reserve_area_overlap_ok(u64 start, u64 end, char *name);
>>  void lmb_free_area(u64 start, u64 end);
>>  void lmb_add_memory(u64 start, u64 end);
>>  struct range;
>> diff --git a/arch/x86/mm/lmb.c b/arch/x86/mm/lmb.c
>> index 19a5f49..1100c18 100644
>> --- a/arch/x86/mm/lmb.c
>> +++ b/arch/x86/mm/lmb.c
>> @@ -309,6 +309,24 @@ void __init lmb_reserve_area(u64 start, u64 end, char *name)
>>  	lmb_add_region(&lmb.reserved, start, end - start);
>>  }
>>  
>> +/*
>> + * Could be used to avoid having overlap entries in lmb.reserved.region.
>> + *  Don't need to use it with area that is from lmb_find_area()
>> + *  Only use it for the area that fw hidden area.
>> + */
>> +void __init lmb_reserve_area_overlap_ok(u64 start, u64 end, char *name)
>> +{
>> +	if (start == end)
>> +		return;
>> +
>> +	if (WARN_ONCE(start > end, "lmb_reserve_area_overlap_ok: wrong range [%#llx, %#llx]\n", start, end))
>> +		return;
>> +
>> +	/* Free that region at first */
>> +	lmb_free(start, end - start);
>> +	lmb_add_region(&lmb.reserved, start, end - start);
>> +}
>> +
>>  void __init lmb_free_area(u64 start, u64 end)
>>  {
>>  	if (start == end)
> 


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 17/35] x86, lmb: Add x86 version of __lmb_find_area()
  2010-05-14  2:34   ` Benjamin Herrenschmidt
@ 2010-05-14  6:47     ` Yinghai
  2010-05-14  8:31       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 99+ messages in thread
From: Yinghai @ 2010-05-14  6:47 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On 05/13/2010 07:34 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2010-05-13 at 17:19 -0700, Yinghai Lu wrote:
>> The generic version goes from high to low, and it seems it cannot find
>> the right area compactly enough.
>>
>> The x86 version goes from goal to limit, just like the way we used
>> for early_res.
>>
>> Use ARCH_LMB_FIND_AREA to select between them.
> 
> Why the heck ?
> 
> So LMB is designed to work top->down and now you replace lmb_find_area()
> with a -completely different- implementation that goes bottom->up
> without any explanation as to why you are doing so ?
> 
> top->down tends to be more efficient at keeping things less fragmented,
> btw.

For safety: if there is any problem with top-to-down allocation, we can ask the user to check whether low-to-high works...

rather than just leaving the user with a broken kernel.

> 
> Cheers,
> Ben.
> 
>> -v2: default to no
>>
>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>> ---
>>  arch/x86/Kconfig  |    8 +++++
>>  arch/x86/mm/lmb.c |   78 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 86 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index d80d2ab..36a5665 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -584,6 +584,14 @@ config PARAVIRT_DEBUG
>>  	  Enable to debug paravirt_ops internals.  Specifically, BUG if
>>  	  a paravirt_op is missing when it is called.
>>  
>> +config ARCH_LMB_FIND_AREA
>> +	default n
>> +	bool "Use x86 own lmb_find_area()"
>> +	---help---
>> +	  Use the x86 lmb_find_area() version instead of the generic one; it
>> +	  gets a free area from the bottom up.
>> +	  The generic one tries to get a free area from the limit down.
>> +
>>  config NO_BOOTMEM
>>  	default y
>>  	bool "Disable Bootmem code"
>> diff --git a/arch/x86/mm/lmb.c b/arch/x86/mm/lmb.c
>> index c0c4220..cf9d488 100644
>> --- a/arch/x86/mm/lmb.c
>> +++ b/arch/x86/mm/lmb.c
>> @@ -435,3 +435,81 @@ u64 __init lmb_hole_size(u64 start, u64 end)
>>  	return end - start - ((u64)ram << PAGE_SHIFT);
>>  }
>>  
>> +#ifdef CONFIG_ARCH_LMB_FIND_AREA
>> +static int __init find_overlapped_early(u64 start, u64 end)
>> +{
>> +	int i;
>> +	struct lmb_region *r;
>> +
>> +	for (i = 0; i < lmb.reserved.cnt && lmb.reserved.regions[i].size; i++) {
>> +		r = &lmb.reserved.regions[i];
>> +		if (end > r->base && start < (r->base + r->size))
>> +			break;
>> +	}
>> +
>> +	return i;
>> +}
>> +
>> +/* Check for already reserved areas */
>> +static inline bool __init bad_addr(u64 *addrp, u64 size, u64 align)
>> +{
>> +	int i;
>> +	u64 addr = *addrp;
>> +	bool changed = false;
>> +	struct lmb_region *r;
>> +again:
>> +	i = find_overlapped_early(addr, addr + size);
>> +	r = &lmb.reserved.regions[i];
>> +	if (i < lmb.reserved.cnt && r->size) {
>> +		*addrp = addr = round_up(r->base + r->size, align);
>> +		changed = true;
>> +		goto again;
>> +	}
>> +	return changed;
>> +}
>> +
>> +u64 __init __lmb_find_area(u64 ei_start, u64 ei_last, u64 start, u64 end,
>> +				 u64 size, u64 align)
>> +{
>> +	u64 addr, last;
>> +
>> +	addr = round_up(ei_start, align);
>> +	if (addr < start)
>> +		addr = round_up(start, align);
>> +	if (addr >= ei_last)
>> +		goto out;
>> +	while (bad_addr(&addr, size, align) && addr+size <= ei_last)
>> +		;
>> +	last = addr + size;
>> +	if (last > ei_last)
>> +		goto out;
>> +	if (last > end)
>> +		goto out;
>> +
>> +	return addr;
>> +
>> +out:
>> +	return LMB_ERROR;
>> +}
>> +
>> +/*
>> + * Find a free area with specified alignment in a specific range.
>> + */
>> +u64 __init lmb_find_area(u64 start, u64 end, u64 size, u64 align)
>> +{
>> +	int i;
>> +
>> +	for (i = 0; i < lmb.memory.cnt; i++) {
>> +		u64 ei_start = lmb.memory.regions[i].base;
>> +		u64 ei_last = ei_start + lmb.memory.regions[i].size;
>> +		u64 addr;
>> +
>> +		addr = __lmb_find_area(ei_start, ei_last, start, end,
>> +					 size, align);
>> +
>> +		if (addr != LMB_ERROR)
>> +			return addr;
>> +	}
>> +	return LMB_ERROR;
>> +}
>> +#endif
> 


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 01/35] lmb: prepare x86 to use lmb to replace early_res
  2010-05-14  2:12   ` Benjamin Herrenschmidt
  2010-05-14  6:19     ` Yinghai
@ 2010-05-14  7:03     ` Yinghai
  1 sibling, 0 replies; 99+ messages in thread
From: Yinghai @ 2010-05-14  7:03 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

updated version is in

	git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-2.6-yinghai.git lmb

it is on top of tip/master + powerpc/lmb

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 01/35] lmb: prepare x86 to use lmb to replace early_res
  2010-05-14  6:19     ` Yinghai
@ 2010-05-14  8:09       ` Benjamin Herrenschmidt
  2010-05-14 16:23         ` Yinghai Lu
  2010-05-17 18:03         ` H. Peter Anvin
  0 siblings, 2 replies; 99+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-14  8:09 UTC (permalink / raw)
  To: Yinghai
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On Thu, 2010-05-13 at 23:19 -0700, Yinghai wrote:
> > If we expose LMB_ERROR then all lmb calls that can fail should return
> > that. However, the API calls all return 0 instead. Changing that means
> > fixing all callers.
> 
> OK, will stop using LMB_ERROR outside lib/lmb.c.
> 
> Will go back to using -1ULL for the x86 path.

No. That is not the point. Read the rest of my email !

We need to -sanitize- those errors. _Maybe_ exposing LMB_ERROR is the
right way to do so, but in that case, we need to make -all- function use
the same error code. Right now, some fail with 0 and some with
LMB_ERROR.

You are also not responding to my other comments such as:
 
> > I'm also not too happy with exposing lmb_add_region(). Why would you
> > ever need to expose it ? Just call lmb_reserve() if you want to reserve
> > something. lmb_add_region() is an internal function and has no business
> > being used outside of the main lmb.c file.

etc...

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 03/35] lmb: Add ARCH_DISCARD_LMB to put lmb code to .init
  2010-05-14  6:21     ` Yinghai
@ 2010-05-14  8:10       ` Benjamin Herrenschmidt
  2010-05-14 16:24         ` Yinghai Lu
  0 siblings, 1 reply; 99+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-14  8:10 UTC (permalink / raw)
  To: Yinghai
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On Thu, 2010-05-13 at 23:21 -0700, Yinghai wrote:
> > So you do not intend to use lmb after boot ? This will break the debugfs
> > files unless you also remove those.
> 
> No, x86 doesn't use lmb after boot.
> 
> Yes.

So you need some kind of CONFIG option to keep lmb after boot, at least
the data structures, for debugfs, or when not set, disable the debugfs
files.

Ben.



^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 04/35] lmb: Add lmb_find_area()
  2010-05-14  6:25     ` Yinghai
@ 2010-05-14  8:12       ` Benjamin Herrenschmidt
  2010-05-14 16:28         ` Yinghai Lu
  0 siblings, 1 reply; 99+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-14  8:12 UTC (permalink / raw)
  To: Yinghai
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On Thu, 2010-05-13 at 23:25 -0700, Yinghai wrote:
> >> +u64 __lmb_find_area(u64 ei_start, u64 ei_last, u64 start, u64 end,
> >> +                    u64 size, u64 align);
> >> +u64 lmb_find_area(u64 start, u64 end, u64 size, u64 align);
> > 
> > See my comments about sorting out the return from that function.
> > 
> > Also, I don't understand the need for that __ version. It looks like
> > something you should keep inside x86, I don't see the need for it in the
> > generic LMB code, since it just does trivial cropping of the arguments.
> 
> Otherwise we need to export lmb_find_base() and LMB_ERROR.

Well, then export lmb_find_base(), and just sanitize the result codes
> over all LMB. That's not -that- hard, it's not like there were gazillions
> of users yet. I don't have time now to do that myself before Monday.

> Yes, but I need that to keep the switch from early_res/nobootmem to
> lmb/nobootmem smooth.
> 
> Will clean it up after early_res is replaced.

Then make it inline inside the x86 stuff. But really, you should clean up
that result code. It's something on my TODO list for lmb that I haven't
had a chance to do yet, so please look at that or wait til next week so
I do it myself.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 05/35] x86, lmb: Add lmb_find_area_size()
  2010-05-14  6:28     ` Yinghai
@ 2010-05-14  8:13       ` Benjamin Herrenschmidt
  2010-05-14 16:33         ` Yinghai Lu
  0 siblings, 1 reply; 99+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-14  8:13 UTC (permalink / raw)
  To: Yinghai
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On Thu, 2010-05-13 at 23:28 -0700, Yinghai wrote:
> 
> This is just a line-by-line translation from the early_res version to the
> lmb version.
> 
> Please focus on the lmb core at this point.

Well, the problem is that you dig into the LMB core with those functions
which means I -will- break your stuff if/when I change it, for example
to use linked lists.

Besides, the code lacks comments and explanation in the changeset. So
please provide that, I'm sure Thomas and Peter also want to understand
what's going on in the x86 side of things.

Cheers,
Ben.


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 08/35] x86,lmb: Add lmb_reserve_area/lmb_free_area
  2010-05-14  6:30     ` Yinghai
@ 2010-05-14  8:15       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 99+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-14  8:15 UTC (permalink / raw)
  To: Yinghai
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On Thu, 2010-05-13 at 23:30 -0700, Yinghai wrote:
> make the rebase easier.
> 
> The previous version was using those APIs.
> 
> Also will add some debug printout with them.

No. Get rid of them.

If any of those checks/debug is worth having in the API it's worth
having it for all archs.

At -worst-, Thomas and Peter may be ok with having inline wrappers for
the argument style conversion in x86 lmb.h but I somewhat doubt it.

In the form of non-inline exported functions with such confusing naming
and semantics (and total lack of documentation or explanation), this is
just wrong.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 11/35] lmb: Add find_memory_core_early()
  2010-05-14  6:34     ` Yinghai
@ 2010-05-14  8:16       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 99+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-14  8:16 UTC (permalink / raw)
  To: Yinghai
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On Thu, 2010-05-13 at 23:34 -0700, Yinghai wrote:
> 
> it is calling __lmb_find_area()
> 
> but it needs to use early_node_map[], so I put this function here

The prototype still has nothing to do in lmb.h

You are just doing those horrible API conversions with slight name
changes, it's just insane. Just use the proper API or change it if it
doesn't suit you (and justify yourself), but stop doing all those weirdo
wrappers.

Ben.



^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 11/35] lmb: Add find_memory_core_early()
  2010-05-14  6:39     ` Yinghai
@ 2010-05-14  8:19       ` Benjamin Herrenschmidt
  2010-05-14  8:30         ` David Miller
  0 siblings, 1 reply; 99+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-14  8:19 UTC (permalink / raw)
  To: Yinghai
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On Thu, 2010-05-13 at 23:39 -0700, Yinghai wrote:
> > Oh and this won't work on sparc. You should probably instead add a
> > lmb_find_in_nid() to lmb which shares code with lmb_alloc_nid().
> 
> should work with sparc...
> 
> +               if (addr == LMB_ERROR)
> +                       continue;
> 
> should do the trick.

Ugh ?

OK, I'll have to let Davem deal with the fine point of the sparc bits,
but I think basically sparc has CONFIG_ARCH_POPULATES_NODE_MAP set, but
the way its NUMA affinity works, the early_node_map[] is crap, you
cannot rely on the ranges in there.

So it will "work" in the sense that you won't get errors, but the
allocations will -not- be node local. At least that's my understanding.

There's a reason we are adding node local allocations to LMB itself, so
it can deal with that during early boot, please use those and stop
trying to re-invent things slightly differently in ways that will not
work with various existing platforms. 

> Keep the switch smooth; maybe later we could try to replace them one by one.
> Also I could make lmb_reserve_area() take a name and print it out to
> make debugging easy.

I don't want those _area() variants. If you want to pass a name for
debugging purposes, then add it to lmb_reserve() or something like that
and fix the callers.

Cheers,
Ben.


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 13/35] x86, lmb: Add lmb_free_memory_size()
  2010-05-14  6:42     ` Yinghai
@ 2010-05-14  8:21       ` Benjamin Herrenschmidt
  2010-05-14 16:37         ` Yinghai Lu
  0 siblings, 1 reply; 99+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-14  8:21 UTC (permalink / raw)
  To: Yinghai
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On Thu, 2010-05-13 at 23:42 -0700, Yinghai wrote:
> On 05/13/2010 07:31 PM, Benjamin Herrenschmidt wrote:
> > On Thu, 2010-05-13 at 17:19 -0700, Yinghai Lu wrote:
> >> It will return the free memory size in the specified range.
> >>
> >> We cannot use memory_size - reserved_size here, because some reserved areas
> >> may not be in the scope of lmb.memory.region.
> >>
> >> Subtract lmb.reserved.region from lmb.memory.region to get the free range
> >> array, then count the size of all free ranges.
> > 
> > I remember having already told you that the naming sucks.
> 
> Any suggestion?
> 
> I think this name is clear.

Well, I did have suggestions yes. Among others, because you are
operating on a range, you should have the word "range" in your name. For
example, something like lmb_free_memory_in_range(). A bit long but at
least is self explanatory and doesn't lead to confusion.
 
> > Also, you fail to explain what this is actually needed for.
> 
> needed by 
> 
> [PATCH 21/35] x86, lmb: Use lmb_memory_size()/lmb_free_memory_size() to get correct dma_reserve

Ok, that's some x86ism that I would need to study more closely, but it's
fair enough to have those accessors if they are really needed. But
please, try to make an effort on the naming.

Cheers,
Ben.

> 
> > 
> > Cheers,
> > Ben.
> > 
> >> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> >> ---
> >>  arch/x86/include/asm/lmb.h |    1 +
> >>  arch/x86/mm/lmb.c          |   51 ++++++++++++++++++++++++++++++++++++++++++++
> >>  2 files changed, 52 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/arch/x86/include/asm/lmb.h b/arch/x86/include/asm/lmb.h
> >> index 358d8a6..4fb94b5 100644
> >> --- a/arch/x86/include/asm/lmb.h
> >> +++ b/arch/x86/include/asm/lmb.h
> >> @@ -16,5 +16,6 @@ void lmb_register_active_regions(int nid, unsigned long start_pfn,
> >>  					 unsigned long last_pfn);
> >>  u64 lmb_hole_size(u64 start, u64 end);
> >>  u64 lmb_find_area_node(int nid, u64 start, u64 end, u64 size, u64 align);
> >> +u64 lmb_free_memory_size(u64 addr, u64 limit);
> >>  
> >>  #endif
> >> diff --git a/arch/x86/mm/lmb.c b/arch/x86/mm/lmb.c
> >> index c5fa1dd..6c69e99 100644
> >> --- a/arch/x86/mm/lmb.c
> >> +++ b/arch/x86/mm/lmb.c
> >> @@ -226,6 +226,57 @@ void __init lmb_to_bootmem(u64 start, u64 end)
> >>  }
> >>  #endif
> >>  
> >> +u64 __init lmb_free_memory_size(u64 addr, u64 limit)
> >> +{
> >> +	int i, count;
> >> +	struct range *range;
> >> +	int nr_range;
> >> +	u64 final_start, final_end;
> >> +	u64 free_size;
> >> +
> >> +	count = (lmb.reserved.cnt + lmb.memory.cnt) * 2;
> >> +
> >> +	range = find_range_array(count);
> >> +	nr_range = 0;
> >> +
> >> +	addr = PFN_UP(addr);
> >> +	limit = PFN_DOWN(limit);
> >> +
> >> +	for (i = 0; i < lmb.memory.cnt; i++) {
> >> +		struct lmb_region *r = &lmb.memory.regions[i];
> >> +
> >> +		final_start = PFN_UP(r->base);
> >> +		final_end = PFN_DOWN(r->base + r->size);
> >> +		if (final_start >= final_end)
> >> +			continue;
> >> +		if (final_start >= limit || final_end <= addr)
> >> +			continue;
> >> +
> >> +		nr_range = add_range(range, count, nr_range, final_start, final_end);
> >> +	}
> >> +	subtract_range(range, count, 0, addr);
> >> +	subtract_range(range, count, limit, -1ULL);
> >> +	for (i = 0; i < lmb.reserved.cnt; i++) {
> >> +		struct lmb_region *r = &lmb.reserved.regions[i];
> >> +
> >> +		final_start = PFN_DOWN(r->base);
> >> +		final_end = PFN_UP(r->base + r->size);
> >> +		if (final_start >= final_end)
> >> +			continue;
> >> +		if (final_start >= limit || final_end <= addr)
> >> +			continue;
> >> +
> >> +		subtract_range(range, count, final_start, final_end);
> >> +	}
> >> +	nr_range = clean_sort_range(range, count);
> >> +
> >> +	free_size = 0;
> >> +	for (i = 0; i < nr_range; i++)
> >> +		free_size += range[i].end - range[i].start;
> >> +
> >> +	return free_size << PAGE_SHIFT;
> >> +}
> >> +
> >>  void __init lmb_add_memory(u64 start, u64 end)
> >>  {
> >>  	lmb_add_region(&lmb.memory, start, end - start);
> > 
> 



^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 15/35] x86, lmb: Add lmb_reserve_area_overlap_ok()
  2010-05-14  6:44     ` Yinghai
@ 2010-05-14  8:30       ` Benjamin Herrenschmidt
  2010-05-14 16:40         ` Yinghai Lu
  0 siblings, 1 reply; 99+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-14  8:30 UTC (permalink / raw)
  To: Yinghai
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On Thu, 2010-05-13 at 23:44 -0700, Yinghai wrote:
> On 05/13/2010 07:32 PM, Benjamin Herrenschmidt wrote:
> > On Thu, 2010-05-13 at 17:19 -0700, Yinghai Lu wrote:
> >> Some areas from firmware could be reserved several times from different callers.
> >>
> >> If these areas overlap, we may have overlapped entries in lmb.reserved.
> >>
> >> Try to free the area first, before reserving it again.
> > 
> > I have already told you to make this a property of lmb_reserve() instead
> > of adding that function with a terrible name.
> 
> Make every lmb_reserve() call lmb_free() first?

Either that, or make it check for collisions first, and if there's
one, call free and try again. A little bit more work, but I plan to
make it smarter at some stage, i.e., directly adjust the surrounding
ranges instead, which is not -that- hard to do.
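
Something like this completely untested sketch (assuming the internal
lmb_overlaps_region() helper, or an equivalent check) is what I mean:

/* Untested sketch: fold the overlap handling into lmb_reserve() itself,
 * so callers can never create overlapped entries in lmb.reserved. */
long lmb_reserve(phys_addr_t base, phys_addr_t size)
{
	struct lmb_type *_rgn = &lmb.reserved;

	BUG_ON(0 == size);

	/* Collision with an existing reservation?  Punch out the overlap
	 * first, then re-add the whole range as a single clean entry. */
	if (lmb_overlaps_region(_rgn, base, size) >= 0)
		lmb_free(base, size);

	return lmb_add_region(_rgn, base, size);
}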

Ben.

> > 
> > Cheers,
> > Ben.
> > 
> >> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> >> ---
> >>  arch/x86/include/asm/lmb.h |    1 +
> >>  arch/x86/mm/lmb.c          |   18 ++++++++++++++++++
> >>  2 files changed, 19 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/arch/x86/include/asm/lmb.h b/arch/x86/include/asm/lmb.h
> >> index dd42ac1..9329e09 100644
> >> --- a/arch/x86/include/asm/lmb.h
> >> +++ b/arch/x86/include/asm/lmb.h
> >> @@ -7,6 +7,7 @@ u64 lmb_find_area_size(u64 start, u64 *sizep, u64 align);
> >>  void lmb_to_bootmem(u64 start, u64 end);
> >>  
> >>  void lmb_reserve_area(u64 start, u64 end, char *name);
> >> +void lmb_reserve_area_overlap_ok(u64 start, u64 end, char *name);
> >>  void lmb_free_area(u64 start, u64 end);
> >>  void lmb_add_memory(u64 start, u64 end);
> >>  struct range;
> >> diff --git a/arch/x86/mm/lmb.c b/arch/x86/mm/lmb.c
> >> index 19a5f49..1100c18 100644
> >> --- a/arch/x86/mm/lmb.c
> >> +++ b/arch/x86/mm/lmb.c
> >> @@ -309,6 +309,24 @@ void __init lmb_reserve_area(u64 start, u64 end, char *name)
> >>  	lmb_add_region(&lmb.reserved, start, end - start);
> >>  }
> >>  
> >> +/*
> >> + * Can be used to avoid having overlapped entries in lmb.reserved.region.
> >> + *  No need to use it with an area that comes from lmb_find_area().
> >> + *  Only use it for areas that firmware has hidden.
> >> + */
> >> +void __init lmb_reserve_area_overlap_ok(u64 start, u64 end, char *name)
> >> +{
> >> +	if (start == end)
> >> +		return;
> >> +
> >> +	if (WARN_ONCE(start > end, "lmb_reserve_area_overlap_ok: wrong range [%#llx, %#llx]\n", start, end))
> >> +		return;
> >> +
> >> +	/* Free that region at first */
> >> +	lmb_free(start, end - start);
> >> +	lmb_add_region(&lmb.reserved, start, end - start);
> >> +}
> >> +
> >>  void __init lmb_free_area(u64 start, u64 end)
> >>  {
> >>  	if (start == end)
> > 
> 



^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 11/35] lmb: Add find_memory_core_early()
  2010-05-14  8:19       ` Benjamin Herrenschmidt
@ 2010-05-14  8:30         ` David Miller
  2010-05-14 16:44           ` Yinghai Lu
  0 siblings, 1 reply; 99+ messages in thread
From: David Miller @ 2010-05-14  8:30 UTC (permalink / raw)
  To: benh
  Cc: yinghai.lu, mingo, tglx, hpa, akpm, torvalds, hannes,
	linux-kernel, linux-arch

From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date: Fri, 14 May 2010 18:19:47 +1000

> OK, I'll have to let Davem deal with the fine point of the sparc bits,
> but I think basically sparc has CONFIG_ARCH_POPULATES_NODE_MAP set, but
> the way its NUMA affinity works, the early_node_map[] is crap, you
> cannot rely on the ranges in there.

Right, we can't use early_node_map[] on sparc, because the NUMA
mappings are far too granular to use that kind of representation.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 17/35] x86, lmb: Add x86 version of __lmb_find_area()
  2010-05-14  6:47     ` Yinghai
@ 2010-05-14  8:31       ` Benjamin Herrenschmidt
  2010-05-14 16:41         ` Yinghai Lu
  0 siblings, 1 reply; 99+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-14  8:31 UTC (permalink / raw)
  To: Yinghai
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On Thu, 2010-05-13 at 23:47 -0700, Yinghai wrote:

> For safety: if there is any problem with top-to-down allocation, we can
> ask the user to check whether low-to-high works...
> 
> rather than just leaving the user with a broken kernel.

Hrm. That's a bit gross. But ok, I don't care that much as long as your
x86 variant doesn't differ in semantics from the core lmb one, and I'll
let Peter and/or Thomas scream at you if they think it's gross
(hint: the code is)

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 01/35] lmb: prepare x86 to use lmb to replace early_res
  2010-05-14  8:09       ` Benjamin Herrenschmidt
@ 2010-05-14 16:23         ` Yinghai Lu
  2010-05-17 18:03         ` H. Peter Anvin
  1 sibling, 0 replies; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14 16:23 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On 05/14/2010 01:09 AM, Benjamin Herrenschmidt wrote:
> On Thu, 2010-05-13 at 23:19 -0700, Yinghai wrote:
>>> If we expose LMB_ERROR then all lmb calls that can fail should return
>>> that. However, the API calls all return 0 instead. Changing that means
>>> fixing all callers.
>>
>> OK, will stop using LMB_ERROR outside lib/lmb.c.
>>
>> Will go back to using -1ULL for the x86 path.
> No. That is not the point. Read the rest of my email !
>
> We need to -sanitize- those errors. _Maybe_ exposing LMB_ERROR is the
> right way to do so, but in that case, we need to make -all- function use
> the same error code. Right now, some fail with 0 and some with
> LMB_ERROR.
>   

Will check what the effects are of changing them all to LMB_ERROR.

> You are also not responding to my other comments such as:
> 
>>> I'm also not too happy with exposing lmb_add_region(). Why would you
>>> ever need to expose it ? Just call lmb_reserve() if you want to reserve
>>> something. lmb_add_region() is an internal function and has no business
>>> being used outside of the main lmb.c file.
Responded in another mail; and the updated version in the git tree dropped
that lmb_add_region exposure.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 03/35] lmb: Add ARCH_DISCARD_LMB to put lmb code to .init
  2010-05-14  8:10       ` Benjamin Herrenschmidt
@ 2010-05-14 16:24         ` Yinghai Lu
  0 siblings, 0 replies; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14 16:24 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On 05/14/2010 01:10 AM, Benjamin Herrenschmidt wrote:
> On Thu, 2010-05-13 at 23:21 -0700, Yinghai wrote:
>>> So you do not intend to use lmb after boot ? This will break the debugfs
>>> files unless you also remove those.
>>
>> No, x86 doesn't use lmb after boot.
>>
>> Yes.
> So you need some kind of CONFIG option to keep lmb after boot, at least
> the data structures, for debugfs, or when not set, disable the debugfs
> files.
>
Just disable the debugfs files for lmb if ARCH_DISCARD_LMB is defined in
the arch lmb.h.
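
For lib/lmb.c, something like this untested sketch (lmb_debug_fops is a
placeholder for whatever the real debugfs file operations end up being
called):

/* Untested sketch: compile the debugfs files out entirely when the
 * arch discards the lmb data after init. */
#ifndef ARCH_DISCARD_LMB
static int __init lmb_init_debugfs(void)
{
	struct dentry *root = debugfs_create_dir("lmb", NULL);

	if (!root)
		return -ENXIO;
	debugfs_create_file("memory", S_IRUGO, root,
			    &lmb.memory, &lmb_debug_fops);
	debugfs_create_file("reserved", S_IRUGO, root,
			    &lmb.reserved, &lmb_debug_fops);
	return 0;
}
late_initcall(lmb_init_debugfs);
#endif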

Thanks

Yinghai Lu

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 04/35] lmb: Add lmb_find_area()
  2010-05-14  8:12       ` Benjamin Herrenschmidt
@ 2010-05-14 16:28         ` Yinghai Lu
  0 siblings, 0 replies; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14 16:28 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On 05/14/2010 01:12 AM, Benjamin Herrenschmidt wrote:
> On Thu, 2010-05-13 at 23:25 -0700, Yinghai wrote:
>   
>>>> +u64 __lmb_find_area(u64 ei_start, u64 ei_last, u64 start, u64 end,
>>>> +                    u64 size, u64 align);
>>>> +u64 lmb_find_area(u64 start, u64 end, u64 size, u64 align);
>>>>         
>>> See my comments about sorting out the return from that function.
>>>
>>> Also, I don't understand the need for that __ version. It looks like
>>> something you should keep inside x86, I don't see the need for it in the
>>> generic LMB code, since it just does trivial cropping of the arguments.
>>
>> Otherwise we need to export lmb_find_base() and LMB_ERROR.
> Well, then export lmb_find_base(), and just sanitize the result codes
> over all LMB. That's not -that- hard, it's not like there were gazillions
> of users yet. I don't have time now to do that myself before Monday.
>
Will check that.

>> Yes, but I need that to keep the switch from early_res/nobootmem to
>> lmb/nobootmem smooth.
>>
>> Will clean it up after early_res is replaced.
>>     
> Then make it inline inside the x86 stuff. But really, you should clean up
> that result code. It's something on my TODO list for lmb that I haven't
> had a chance to do yet, so please look at that or wait til next week so
> I do it myself.
>
>   
Also need to include asm/lmb.h at the end of linux/lmb.h.

YH

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 05/35] x86, lmb: Add lmb_find_area_size()
  2010-05-14  8:13       ` Benjamin Herrenschmidt
@ 2010-05-14 16:33         ` Yinghai Lu
  2010-05-14 22:20           ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14 16:33 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On 05/14/2010 01:13 AM, Benjamin Herrenschmidt wrote:
> On Thu, 2010-05-13 at 23:28 -0700, Yinghai wrote:
>   
>> This is just a line-by-line translation from the early_res version to the
>> lmb version.
>>
>> please focus on lmb core at this point.
>>     
> Well, the problem is that you dig into the LMB core with those functions
> which means I -will- break your stuff if/when I change it, for example
> to use linked lists.
>
>   
Then you shouldn't even expose struct lmb, and should move the definition
from lmb.h to lmb.c.

> Besides, the code lacks comments and explanation in the changeset. So
> please provide that, I'm sure Thomas and Peter also want to understand
> what's going on in the x86 side of things.
>
>   


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 13/35] x86, lmb: Add lmb_free_memory_size()
  2010-05-14  8:21       ` Benjamin Herrenschmidt
@ 2010-05-14 16:37         ` Yinghai Lu
  2010-05-14 22:20           ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14 16:37 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On 05/14/2010 01:21 AM, Benjamin Herrenschmidt wrote:
> On Thu, 2010-05-13 at 23:42 -0700, Yinghai wrote:
>   
>> On 05/13/2010 07:31 PM, Benjamin Herrenschmidt wrote:
>>     
>>> On Thu, 2010-05-13 at 17:19 -0700, Yinghai Lu wrote:
>>>       
>>>> It will return the free memory size in the specified range.
>>>>
>>>> We cannot use memory_size - reserved_size here, because some reserved areas
>>>> may not be in the scope of lmb.memory.region.
>>>>
>>>> Subtract lmb.reserved.region from lmb.memory.region to get the free range
>>>> array, then count the size of all free ranges.
>>>>         
>>> I remember having already told you that the naming sucks.
>>>       
>> Any suggestion?
>>
>> I think this name is clear.
>>     
> Well, I did have suggestions yes. Among others, because you are
> operating on a range, you should have the word "range" in your name. For
> example, something like lmb_free_memory_in_range(). A bit long but at
> least is self explanatory and doesn't lead to confusion.
>   

lmb_memory_size_in_range()/lmb_free_memory_size_in_range()

or

lmb_memory_in_range()/lmb_free_memory_in_range()

?



>  
>   
>>> Also, you fail to explain what this is actually needed for.
>>>       
>> needed by 
>>
>> [PATCH 21/35] x86, lmb: Use lmb_memory_size()/lmb_free_memory_size() to get correct dma_reserve
>>     
> Ok, that's some x86ism that I would need to study more closely, but it's
> fair enough to have those accessors if they are really needed. But
> please, try to make an effort on the naming.
>
> Cheers,
> Ben.
>
>   
>>     
>>> Cheers,
>>> Ben.
>>>
>>>       
>>>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>>>> ---
>>>>  arch/x86/include/asm/lmb.h |    1 +
>>>>  arch/x86/mm/lmb.c          |   51 ++++++++++++++++++++++++++++++++++++++++++++
>>>>  2 files changed, 52 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/arch/x86/include/asm/lmb.h b/arch/x86/include/asm/lmb.h
>>>> index 358d8a6..4fb94b5 100644
>>>> --- a/arch/x86/include/asm/lmb.h
>>>> +++ b/arch/x86/include/asm/lmb.h
>>>> @@ -16,5 +16,6 @@ void lmb_register_active_regions(int nid, unsigned long start_pfn,
>>>>  					 unsigned long last_pfn);
>>>>  u64 lmb_hole_size(u64 start, u64 end);
>>>>  u64 lmb_find_area_node(int nid, u64 start, u64 end, u64 size, u64 align);
>>>> +u64 lmb_free_memory_size(u64 addr, u64 limit);
>>>>  
>>>>  #endif
>>>> diff --git a/arch/x86/mm/lmb.c b/arch/x86/mm/lmb.c
>>>> index c5fa1dd..6c69e99 100644
>>>> --- a/arch/x86/mm/lmb.c
>>>> +++ b/arch/x86/mm/lmb.c
>>>> @@ -226,6 +226,57 @@ void __init lmb_to_bootmem(u64 start, u64 end)
>>>>  }
>>>>  #endif
>>>>  
>>>> +u64 __init lmb_free_memory_size(u64 addr, u64 limit)
>>>> +{
>>>> +	int i, count;
>>>> +	struct range *range;
>>>> +	int nr_range;
>>>> +	u64 final_start, final_end;
>>>> +	u64 free_size;
>>>> +
>>>> +	count = (lmb.reserved.cnt + lmb.memory.cnt) * 2;
>>>> +
>>>> +	range = find_range_array(count);
>>>> +	nr_range = 0;
>>>> +
>>>> +	addr = PFN_UP(addr);
>>>> +	limit = PFN_DOWN(limit);
>>>> +
>>>> +	for (i = 0; i < lmb.memory.cnt; i++) {
>>>> +		struct lmb_region *r = &lmb.memory.regions[i];
>>>> +
>>>> +		final_start = PFN_UP(r->base);
>>>> +		final_end = PFN_DOWN(r->base + r->size);
>>>> +		if (final_start >= final_end)
>>>> +			continue;
>>>> +		if (final_start >= limit || final_end <= addr)
>>>> +			continue;
>>>> +
>>>> +		nr_range = add_range(range, count, nr_range, final_start, final_end);
>>>> +	}
>>>> +	subtract_range(range, count, 0, addr);
>>>> +	subtract_range(range, count, limit, -1ULL);
>>>> +	for (i = 0; i < lmb.reserved.cnt; i++) {
>>>> +		struct lmb_region *r = &lmb.reserved.regions[i];
>>>> +
>>>> +		final_start = PFN_DOWN(r->base);
>>>> +		final_end = PFN_UP(r->base + r->size);
>>>> +		if (final_start >= final_end)
>>>> +			continue;
>>>> +		if (final_start >= limit || final_end <= addr)
>>>> +			continue;
>>>> +
>>>> +		subtract_range(range, count, final_start, final_end);
>>>> +	}
>>>> +	nr_range = clean_sort_range(range, count);
>>>> +
>>>> +	free_size = 0;
>>>> +	for (i = 0; i < nr_range; i++)
>>>> +		free_size += range[i].end - range[i].start;
>>>> +
>>>> +	return free_size << PAGE_SHIFT;
>>>> +}
>>>> +
>>>>  void __init lmb_add_memory(u64 start, u64 end)
>>>>  {
>>>>  	lmb_add_region(&lmb.memory, start, end - start);
>>>>         
>>>       
>
>   


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 15/35] x86, lmb: Add lmb_reserve_area_overlap_ok()
  2010-05-14  8:30       ` Benjamin Herrenschmidt
@ 2010-05-14 16:40         ` Yinghai Lu
  2010-05-14 22:30           ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14 16:40 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On 05/14/2010 01:30 AM, Benjamin Herrenschmidt wrote:
> On Thu, 2010-05-13 at 23:44 -0700, Yinghai wrote:
>   
>> On 05/13/2010 07:32 PM, Benjamin Herrenschmidt wrote:
>>     
>>> On Thu, 2010-05-13 at 17:19 -0700, Yinghai Lu wrote:
>>>       
>>>> Some areas from firmware could be reserved several times from different callers.
>>>>
>>>> If these areas overlap, we may have overlapped entries in lmb.reserved.
>>>>
>>>> Try to free the area first, before reserving it again.
>>>>         
>>> I have already told you to make this a property of lmb_reserve() instead
>>> of adding that function with a terrible name.
>>>       
>> Make every lmb_reserve() call lmb_free() first?
>>     
> Either that, or make it check for collisions first, and if there's
> one, call free and try again. A little bit more work, but I plan to
> make it smarter at some stage, i.e., directly adjust the surrounding
> ranges instead, which is not -that- hard to do.
>
>   
Later, after this patchset; it has been hanging around too long
already.

YH

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 17/35] x86, lmb: Add x86 version of __lmb_find_area()
  2010-05-14  8:31       ` Benjamin Herrenschmidt
@ 2010-05-14 16:41         ` Yinghai Lu
  0 siblings, 0 replies; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14 16:41 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On 05/14/2010 01:31 AM, Benjamin Herrenschmidt wrote:
> On Thu, 2010-05-13 at 23:47 -0700, Yinghai wrote:
>
>   
>> For safety: if there is any problem with top-to-down allocation, we can
>> ask the user to check whether low-to-high works...
>>
>> rather than just leaving the user with a broken kernel.
>>     
> Hrm. That's a bit gross. But ok, I don't care that much as long as your
> x86 variant doesn't differ in semantics from the core lmb one, and I'll
> let Peter and/or Thomas scream at you if they think it's gross
> (hint: the code is)
>
>
>   
Those bits are a line-by-line translation from the current early_res code,
and they have been used for a while,

so they should be good as a fallback.

YH


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 11/35] lmb: Add find_memory_core_early()
  2010-05-14  8:30         ` David Miller
@ 2010-05-14 16:44           ` Yinghai Lu
  2010-05-14 22:34             ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 99+ messages in thread
From: Yinghai Lu @ 2010-05-14 16:44 UTC (permalink / raw)
  To: David Miller
  Cc: benh, mingo, tglx, hpa, akpm, torvalds, hannes, linux-kernel, linux-arch

On 05/14/2010 01:30 AM, David Miller wrote:
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Date: Fri, 14 May 2010 18:19:47 +1000
>
>   
>> OK, I'll have to let Davem deal with the fine point of the sparc bits,
>> but I think basically sparc has CONFIG_ARCH_POPULATES_NODE_MAP set, but
>> the way its NUMA affinity works, the early_node_map[] is crap, you
>> cannot rely on the ranges in there.
>>     
>   

> Right, we can't use early_node_map[] on sparc, because the NUMA
> mappings are far too granular to use that kind of representation.
>   
Good to know.

early_node_map[] doesn't have enough slots?

YH

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 05/35] x86, lmb: Add lmb_find_area_size()
  2010-05-14 16:33         ` Yinghai Lu
@ 2010-05-14 22:20           ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 99+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-14 22:20 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On Fri, 2010-05-14 at 09:33 -0700, Yinghai Lu wrote:
> On 05/14/2010 01:13 AM, Benjamin Herrenschmidt wrote:
> > On Thu, 2010-05-13 at 23:28 -0700, Yinghai wrote:
> >   
> >> This is just a line-by-line translation from the early_res version to
> >> the lmb version.
> >>
> >> please focus on lmb core at this point.
> >>     
> > Well, the problem is that you dig into the LMB core with those functions
> > which means I -will- break your stuff if/when I change it, for example
> > to use linked lists.
> >
> >   
> Then you shouldn't even expose struct lmb, and should move the definition
> from lmb.h to lmb.c.

Well, figure that it's been considered :-) The only reason it's in lmb.h
is so that we can write things like for_each_lmb() as inlines, but
we could just do lmb_first() and lmb_next() and move it all out of line.
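
Completely untested, but roughly like this; struct lmb_type could then
become private to lmb.c, with callers getting the pointer from an
out-of-line accessor:

/* In lmb.c: the region array stays private to this file. */
struct lmb_region *lmb_first(struct lmb_type *type)
{
	return type->cnt ? &type->regions[0] : NULL;
}

struct lmb_region *lmb_next(struct lmb_type *type, struct lmb_region *reg)
{
	reg++;
	return reg < type->regions + type->cnt ? reg : NULL;
}

/* In lmb.h: no knowledge of the array layout is needed. */
#define for_each_lmb(type, reg) \
	for (reg = lmb_first(type); reg; reg = lmb_next(type, reg))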

Ben.

> > Besides, the code lacks comments and explanation in the changeset. So
> > please provide that, I'm sure Thomas and Peter also want to understand
> > what's going on in the x86 side of things.
> >
> >   
> 



^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 13/35] x86, lmb: Add lmb_free_memory_size()
  2010-05-14 16:37         ` Yinghai Lu
@ 2010-05-14 22:20           ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 99+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-14 22:20 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On Fri, 2010-05-14 at 09:37 -0700, Yinghai Lu wrote:
> 
> lmb_memory_size_in_range()/lmb_free_memory_size_in_range()
> 
> or
> 
> lmb_memory_in_range()/lmb_free_memory_in_range() 

I think the "size" is unnecessary, go for the second.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 15/35] x86, lmb: Add lmb_reserve_area_overlap_ok()
  2010-05-14 16:40         ` Yinghai Lu
@ 2010-05-14 22:30           ` Benjamin Herrenschmidt
  2010-05-15  7:32             ` Ingo Molnar
  0 siblings, 1 reply; 99+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-14 22:30 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On Fri, 2010-05-14 at 09:40 -0700, Yinghai Lu wrote:
> Later, after this patchset; it has been hanging around too long
> already.

That's where I strongly disagree with Ingo :-) There's no such thing as
a patch set hanging around too long. Patches should go in when they are
ready and in good shape, not due to some kind of time bomb, which
results in tons of unfixable crap being merged, which is very bad
engineering.

Cheers,
Ben.


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 11/35] lmb: Add find_memory_core_early()
  2010-05-14 16:44           ` Yinghai Lu
@ 2010-05-14 22:34             ` Benjamin Herrenschmidt
  2010-05-14 23:51               ` lmb type features Yinghai
  0 siblings, 1 reply; 99+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-14 22:34 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: David Miller, mingo, tglx, hpa, akpm, torvalds, hannes,
	linux-kernel, linux-arch

On Fri, 2010-05-14 at 09:44 -0700, Yinghai Lu wrote:
> On 05/14/2010 01:30 AM, David Miller wrote:
> > From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > Date: Fri, 14 May 2010 18:19:47 +1000
> >
> >   
> >> OK, I'll have to let Davem deal with the fine point of the sparc bits,
> >> but I think basically sparc has CONFIG_ARCH_POPULATES_NODE_MAP set, but
> >> the way it's NUMA affinity works, the early_node_map[] is crap, you
> >> cannot rely on the ranges in there.
> >>     
> >   
> 
> > Right, we can't use early_node_map[] on sparc, because the NUMA
> > mappings are far too granular to use that kind of representation.
> >   
> good to know.
> 
> early_node_map[] doesn't have enough slots?

I think that's the case. sparc doesn't override MAX_ACTIVE_REGIONS,
which means you get the default which is 256 or 50 per node depending
on MAX_NUMNODES. My understanding is that this may not be enough.

Dave, what does it look like in practice tho ?

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 99+ messages in thread

* lmb type features.
  2010-05-14 22:34             ` Benjamin Herrenschmidt
@ 2010-05-14 23:51               ` Yinghai
  2010-05-17  0:46                 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 99+ messages in thread
From: Yinghai @ 2010-05-14 23:51 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: David Miller, mingo, tglx, hpa, akpm, torvalds, hannes,
	linux-kernel, linux-arch

I'd like you to make some changes to lmb_type and double_array():

struct lmb_type {
        unsigned long cnt;      /* number of regions */
        unsigned long max;      /* size of the allocated array */
        struct lmb_region *regions;
};

==>

struct lmb_type {
        unsigned long cnt;      /* number of regions */
        unsigned long max;      /* size of the allocated array */
	unsigned long features;
        struct lmb_region *regions;
};


then have 

#define LMB_ADD_MERGE (1<<0) 
#define LMB_ARRAY_DOUBLE (1<<1)

so before calling double_lmb_array(), it should check whether the feature
bit is set; otherwise it should panic with a clear message (see the sketch
after the usage notes below).

Usage:

For range replacement:

1. At an early stage, before lmb.reserved and lmb.memory are populated,
so lmb_find_base cannot be used yet.

2. For the bootmem replacement, when doing the range-set subtraction for the
final free-range list, we don't want to change lmb.reserved in the middle.
The caller should make sure to have big enough temporary lmb_regions in the
lmb_type.
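
A rough, untested sketch of the check I mean:

/* Untested sketch: refuse to grow an array whose owner did not opt in
 * to dynamic doubling, instead of silently allocating while the early
 * boot state cannot support it. */
static int double_lmb_array(struct lmb_type *type)
{
	if (!(type->features & LMB_ARRAY_DOUBLE))
		panic("lmb: region array full and LMB_ARRAY_DOUBLE not set\n");

	/* ... existing doubling logic via lmb_find_base() ... */
	return 0;
}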

Thanks

Yinghai


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 15/35] x86, lmb: Add lmb_reserve_area_overlap_ok()
  2010-05-14 22:30           ` Benjamin Herrenschmidt
@ 2010-05-15  7:32             ` Ingo Molnar
  2010-05-17  0:39               ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 99+ messages in thread
From: Ingo Molnar @ 2010-05-15  7:32 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Yinghai Lu, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch


* Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Fri, 2010-05-14 at 09:40 -0700, Yinghai Lu wrote:
> > later, after this patchset. this patchset is hanging around too long
> > already.
> 
> That's where I strongly disagree with Ingo :-) 
> There's no such thing as a patch set hanging around 
> too long. Patches should go in when they are ready 
> and in good shape, not due to some kind of time 
> bomb, which results in tons of unfixable crap being 
> merged, which is very bad engineering.

The thing i disagreed with was that the patches were 
posted in March and there was no progress for a long 
time. That kind of situation only leads to patches 
being piled up unreasonably and results in Yinghai 
wasting time and effort.

But now there's real progress: you posted patches and 
Yinghai is posting patches and is reacting to your 
review feedback. As long there's steady progress (and 
not just the steady decay of entropy destroying 
already created value) i'm a happy camper.

Btw., it would be nice to ready the LMB core bits for 
upstream for 2.6.35 if there's agreement about them - 
that will make the subsequent x86 patches much easier 
to merge.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 15/35] x86, lmb: Add lmb_reserve_area_overlap_ok()
  2010-05-15  7:32             ` Ingo Molnar
@ 2010-05-17  0:39               ` Benjamin Herrenschmidt
  2010-05-17  6:11                 ` Yinghai
  0 siblings, 1 reply; 99+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-17  0:39 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Yinghai Lu, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On Sat, 2010-05-15 at 09:32 +0200, Ingo Molnar wrote:
> 
> Btw., it would be nice to ready the LMB core bits for 
> upstream for 2.6.35 if there's agreement about them - 
> that will make the subsequent x86 patches much easier 
> to merge.

Well, 2.6.34 was released already. I think it's a bit premature. We need
to fix a few more things (the result codes for example) and do more
testing to ensure we didn't break other archs.

Also, Yinghai wants some more changes to the core that I need to discuss
with him to understand exactly what he wants and why :-)

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: lmb type features.
  2010-05-14 23:51               ` lmb type features Yinghai
@ 2010-05-17  0:46                 ` Benjamin Herrenschmidt
  2010-05-17  6:06                   ` Yinghai
  0 siblings, 1 reply; 99+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-17  0:46 UTC (permalink / raw)
  To: Yinghai
  Cc: David Miller, mingo, tglx, hpa, akpm, torvalds, hannes,
	linux-kernel, linux-arch

On Fri, 2010-05-14 at 16:51 -0700, Yinghai wrote:

 .../...

> #define LMB_ADD_MERGE (1<<0) 
> #define LMB_ARRAY_DOUBLE (1<<1)
> 
> so before calling double_lmb_array(), it should check whether the feature
> bit is set; otherwise it should panic with a clear message.
> 
> Usage:
> 
> for range replacement,
> 
> 1. At an early stage, before lmb.reserved and lmb.memory are populated,
> so lmb_find_base cannot be used yet.

Let me make sure I understand: You mean when doing all the memory
lmb_add() early during boot, we haven't done the various lmb_reserve()
for all potentially reserved areas and thus cannot rely on
double_lmb_array() doing the right thing ?

I think this is a good point. However, a better way to do that would
be to set the default alloc limit to 0 instead of LMB_ALLOC_ANYWHERE.

I haven't done that yet though I certainly intend to, but I'll need
to ensure all the archs using LMB set a decent limit at some stage. You
can in the meantime do it explicitly in x86.
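
For x86 that would look something like this (untested; current_limit and
lmb_set_current_limit() are only approximate names from my rework):

/* Untested sketch: with the default limit at 0, lmb_find_base() cannot
 * place anything (so double_lmb_array() cannot trigger) until the arch
 * declares how far it is safe to allocate. */
void __init lmb_init(void)
{
	/* ... */
	lmb.current_limit = 0;	/* instead of LMB_ALLOC_ANYWHERE */
}

/* x86 then raises the limit once its early reservations are in place: */
lmb_set_current_limit((u64)max_pfn_mapped << PAGE_SHIFT);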

Additionally, it should be possible in most cases to do all the critical
lmb_reserve() early, before lmb_add()'s, and thus remove the problem,
though that is indeed not the case today.

It would be nice to be able to extend the array for memory addition
since that would allow us to have much smaller static arrays in the
first place.

> 2. For the bootmem replacement, when doing the range-set subtraction for the
> final free-range list, we don't want to change lmb.reserved in the middle.
> The caller should make sure to have big enough temporary lmb_regions in the
> lmb_type.

Sorry, I'm not sure I grasped your explanation above. You mean when
transitioning from LMB to the page allocator, the page freeing needs to
be done after subtracting the reserved array from the memory, and that
subtraction might cause the arrays to increase in size, thus affecting
the reserved array ?

That could be solved by not doing the subtraction and doing things a
bit differently. You could have a single function that walks both arrays
at the same time, and calls a callback for all memory ranges it finds
that aren't reserved. Not -that- tricky to code.
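
Something like this untested sketch, assuming both arrays are kept sorted
by base and lmb.reserved has no overlapping entries:

/* Untested sketch: invoke fn(start, end) for every chunk of lmb.memory
 * that is not covered by lmb.reserved, without building a temporary
 * subtracted range array. */
void __init lmb_walk_free(void (*fn)(u64 start, u64 end))
{
	unsigned long i, j;

	for (i = 0; i < lmb.memory.cnt; i++) {
		struct lmb_region *m = &lmb.memory.regions[i];
		u64 cursor = m->base;
		u64 end = m->base + m->size;

		for (j = 0; j < lmb.reserved.cnt; j++) {
			struct lmb_region *r = &lmb.reserved.regions[j];
			u64 r_end = r->base + r->size;

			if (r_end <= cursor || r->base >= end)
				continue;	/* no overlap */
			if (r->base > cursor)
				fn(cursor, r->base);	/* gap before r */
			cursor = r_end;
			if (cursor >= end)
				break;
		}
		if (cursor < end)
			fn(cursor, end);	/* free tail of region */
	}
}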

Cheers,
Ben.

> Thanks
> 
> Yinghai
> 



^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: lmb type features.
  2010-05-17  0:46                 ` Benjamin Herrenschmidt
@ 2010-05-17  6:06                   ` Yinghai
  0 siblings, 0 replies; 99+ messages in thread
From: Yinghai @ 2010-05-17  6:06 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: David Miller, mingo, tglx, hpa, akpm, torvalds, hannes,
	linux-kernel, linux-arch

On 05/16/2010 05:46 PM, Benjamin Herrenschmidt wrote:
> On Fri, 2010-05-14 at 16:51 -0700, Yinghai wrote:
> 
>  .../...
> 
>> #define LMB_ADD_MERGE (1<<0) 
>> #define LMB_ARRAY_DOUBLE (1<<1)
>>
>> so before calling double_lmb_array(), it should check whether the feature
>> bit is set; otherwise it should panic with a clear message.
>>
>> Usage:
>>
>> for range replacement,
>>
>> 1. At an early stage, before lmb.reserved and lmb.memory are populated,
>> so lmb_find_base cannot be used yet.
> 
> Let me make sure I understand: You mean when doing all the memory
> lmb_add() early during boot, we haven't done the various lmb_reserve()
> for all potentially reserved areas and thus cannot rely on
> double_lmb_array() doing the right thing ?

Yes.

I'm thinking of using lmb_type to replace the struct used for MTRR trimming.

> 
> I think this is a good point. However, a better way to do that would
> be to set the default alloc limit to 0 instead of LMB_ALLOC_ANYWHERE.
> 
> I haven't done that yet though I certainly intend to, but I'll need
> to ensure all the archs using LMB set a decent limit at some stage. You
> can in the meantime do it explicitely in x86.
> 
> Additionally, it should be possible in most cases to do all the critical
> lmb_reserve() early, before lmb_add()'s, and thus remove the problem,
> though that is indeed not the case today.

The initial array size needs to be big enough to use lmb_reserve() early.

In my patchset, x86 already calls lmb_reserve() early, and later calls
lmb_add_memory() to fill lmb.


> 
> It would be nice to be able to extend the array for memory addition
> since that would allow us to have much smaller static arrays in the
> first place.

Not on x86; there we could put them in __init.

Should hotplug memory use the resource tree instead of lmb?


> 
>> 2. For the bootmem replacement, when doing the range-set subtraction for the
>> final free-range list, we don't want to change lmb.reserved in the middle.
>> The caller should make sure to have big enough temporary lmb_regions in the
>> lmb_type.
> 
> Sorry, I'm not sure I grasped your explanation above. You mean when
> transitioning from LMB to the page allocator, the page freeing needs to
> be done after subtracting the reserved array from the memory, and that
> subtraction might cause the arrays to increase in size, thus affecting
> the reserved array ?

right. 

> 
> That could be solved by not doing the subtraction and doing things a
> bit differently. You could have a single function that walks both arrays
> at the same time, and calls a callback for all memory ranges it finds
> that aren't reserved. Not -that- tricky to code.

But we need to make sure lmb.reserved doesn't have overlapping entries.

Will check that later.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 15/35] x86, lmb: Add lmb_reserve_area_overlap_ok()
  2010-05-17  0:39               ` Benjamin Herrenschmidt
@ 2010-05-17  6:11                 ` Yinghai
  2010-05-17  6:40                   ` H. Peter Anvin
  2010-05-17  7:24                   ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 99+ messages in thread
From: Yinghai @ 2010-05-17  6:11 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On 05/16/2010 05:39 PM, Benjamin Herrenschmidt wrote:
> On Sat, 2010-05-15 at 09:32 +0200, Ingo Molnar wrote:
>>
>> Btw., it would be nice to ready the LMB core bits for 
>> upstream for 2.6.35 if there's agreement about them - 
>> that will make the subsequent x86 patches much easier 
>> to merge.
> 
> Well, 2.6.34 was released already. I think it's a bit premature. We need
> to fix a few more things (the result codes for example) and do more
> testing to ensure we didn't break other archs.
> 
So it looks like your change will hit 2.6.35, and my x86 changes will hit 2.6.36?

That is too long.

YH

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 15/35] x86, lmb: Add lmb_reserve_area_overlap_ok()
  2010-05-17  6:11                 ` Yinghai
@ 2010-05-17  6:40                   ` H. Peter Anvin
  2010-05-17  7:24                   ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 99+ messages in thread
From: H. Peter Anvin @ 2010-05-17  6:40 UTC (permalink / raw)
  To: Yinghai
  Cc: Benjamin Herrenschmidt, Ingo Molnar, Thomas Gleixner,
	Andrew Morton, David Miller, Linus Torvalds, Johannes Weiner,
	linux-kernel, linux-arch

On 05/16/2010 11:11 PM, Yinghai wrote:
>
> so looks like your change will hit 2.6.35, and my x86 changes will hit 2.6.36?
> 
> that is too long.
> 

Too long for what?

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 15/35] x86, lmb: Add lmb_reserve_area_overlap_ok()
  2010-05-17  6:11                 ` Yinghai
  2010-05-17  6:40                   ` H. Peter Anvin
@ 2010-05-17  7:24                   ` Benjamin Herrenschmidt
  2010-05-17 17:18                     ` Yinghai
  1 sibling, 1 reply; 99+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-17  7:24 UTC (permalink / raw)
  To: Yinghai
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On Sun, 2010-05-16 at 23:11 -0700, Yinghai wrote:
> So it looks like your changes will hit 2.6.35, and my x86 changes will
> hit 2.6.36?
> 
> That is too long.

No. Both will hit 2.6.36. It's way too late to queue up such changes for
the 2.6.35 merge window which has already opened.

> Why would it be "too long"? I keep asking what the heck is going on
> with having a time bomb on those patches and have yet to get a
> satisfactory answer.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 15/35] x86, lmb: Add lmb_reserve_area_overlap_ok()
  2010-05-17  7:24                   ` Benjamin Herrenschmidt
@ 2010-05-17 17:18                     ` Yinghai
  2010-05-17 18:53                       ` H. Peter Anvin
  2010-05-17 22:01                       ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 99+ messages in thread
From: Yinghai @ 2010-05-17 17:18 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On 05/17/2010 12:24 AM, Benjamin Herrenschmidt wrote:
> On Sun, 2010-05-16 at 23:11 -0700, Yinghai wrote:
>> So it looks like your changes will hit 2.6.35, and my x86 changes will
>> hit 2.6.36?
>>
>> That is too long.
> 
> No. Both will hit 2.6.36. It's way too late to queue up such changes for
> the 2.6.35 merge window which has already opened.

I have a feeling that your new LMB code will hit 2.6.36, and the
x86 patches that use lmb will hit 2.6.37.

Otherwise it will create more merge conflicts between tip and lmb,
unless you put your lmb changes into tip?


> 
> Why would it be "too long"? I keep asking what the heck is going on
> with having a time bomb on those patches and have yet to get a
> satisfactory answer.

Why are you thinking there is a time bomb in the patches?
I even provide the option to use x86's own low-to-high allocation.
I really don't know where the time bomb is.

My -v14 patches only touch several lines of your core lmb code, and have
near-zero effect on the original lmb users.

YH

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 01/35] lmb: prepare x86 to use lmb to replace early_res
  2010-05-14  8:09       ` Benjamin Herrenschmidt
  2010-05-14 16:23         ` Yinghai Lu
@ 2010-05-17 18:03         ` H. Peter Anvin
  2010-05-17 22:02           ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 99+ messages in thread
From: H. Peter Anvin @ 2010-05-17 18:03 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Yinghai, Ingo Molnar, Thomas Gleixner, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On 05/14/2010 01:09 AM, Benjamin Herrenschmidt wrote:
> 
> No. That is not the point. Read the rest of my email!
> 
> We need to -sanitize- those errors. _Maybe_ exposing LMB_ERROR is the
> right way to do so, but in that case, we need to make -all- functions use
> the same error code. Right now, some fail with 0 and some with
> LMB_ERROR.
> 

Using errnos like the rest of the kernel seems like the right thing to
do, IMO.

	-hpa

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 15/35] x86, lmb: Add lmb_reserve_area_overlap_ok()
  2010-05-17 17:18                     ` Yinghai
@ 2010-05-17 18:53                       ` H. Peter Anvin
  2010-05-17 22:01                       ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 99+ messages in thread
From: H. Peter Anvin @ 2010-05-17 18:53 UTC (permalink / raw)
  To: Yinghai
  Cc: Benjamin Herrenschmidt, Ingo Molnar, Thomas Gleixner,
	Andrew Morton, David Miller, Linus Torvalds, Johannes Weiner,
	linux-kernel, linux-arch

On 05/17/2010 10:18 AM, Yinghai wrote:
>>
>> No. Both will hit 2.6.36. It's way too late to queue up such changes for
>> the 2.6.35 merge window which has already opened.
> 
> I have a feeling that your new LMB code will hit 2.6.36, and the
> x86 patches that use lmb will hit 2.6.37.
> 
> Otherwise it will create more merge conflicts between tip and lmb,
> unless you put your lmb changes into tip?
> 

We can arrange for some way of dealing with this problem... this is not
an issue.

>> Why would it be "too long"? I keep asking what the heck is going on
>> with having a time bomb on those patches and have yet to get a
>> satisfactory answer.
> 
> Why are you thinking there is a time bomb in the patches?

You're the one that keeps saying "that is too long", but without
motivating the hurry.

	-hpa

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 15/35] x86, lmb: Add lmb_reserve_area_overlap_ok()
  2010-05-17 17:18                     ` Yinghai
  2010-05-17 18:53                       ` H. Peter Anvin
@ 2010-05-17 22:01                       ` Benjamin Herrenschmidt
  2010-05-17 22:19                         ` Yinghai
                                           ` (2 more replies)
  1 sibling, 3 replies; 99+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-17 22:01 UTC (permalink / raw)
  To: Yinghai
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On Mon, 2010-05-17 at 10:18 -0700, Yinghai wrote:
> > No. Both will hit 2.6.36. It's way too late to queue up such changes
> > for the 2.6.35 merge window which has already opened.
> 
> I have a feeling that your new LMB code will hit 2.6.36, and the
> x86 patches that use lmb will hit 2.6.37.
> 
> Otherwise it will create more merge conflicts between tip and lmb,
> unless you put your lmb changes into tip?

There is no reason not to, I'll have them in a separate branch that Ingo
can pull. I think we can aim for 2.6.36 in one go provided that Peter
and Thomas are happy with the x86 side of things.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 01/35] lmb: prepare x86 to use lmb to replace early_res
  2010-05-17 18:03         ` H. Peter Anvin
@ 2010-05-17 22:02           ` Benjamin Herrenschmidt
  2010-05-17 22:12             ` H. Peter Anvin
  0 siblings, 1 reply; 99+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-17 22:02 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Yinghai, Ingo Molnar, Thomas Gleixner, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On Mon, 2010-05-17 at 11:03 -0700, H. Peter Anvin wrote:
> On 05/14/2010 01:09 AM, Benjamin Herrenschmidt wrote:
> > 
> > No. That is not the point. Read the rest of my email!
> > 
> > We need to -sanitize- those errors. _Maybe_ exposing LMB_ERROR is the
> > right way to do so, but in that case, we need to make -all- functions use
> > the same error code. Right now, some fail with 0 and some with
> > LMB_ERROR.
> > 
> 
> Using errnos like the rest of the kernel seems like the right thing to
> do, IMO.

Maybe. The allocator/find functions return a physical address. If we all
agree that a physical address between -PAGE_SIZE and 0 is never valid,
then we can overlay the negative errno codes like we do for pointers.

I'll have a look at that, it shouldn't be very hard.
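
Something like this, roughly (a standalone sketch; the helper names are
illustrative, not a proposed API, and it assumes a 64-bit phys_addr_t):

#include <errno.h>
#include <stdio.h>

typedef unsigned long long phys_addr_t;

#define PAGE_SIZE 4096ULL

/* the top PAGE_SIZE of the physical address space carries -errno */
static inline int lmb_addr_is_err(phys_addr_t addr)
{
	return addr >= (phys_addr_t)-PAGE_SIZE;
}

static inline phys_addr_t lmb_err(int err)	/* err is negative */
{
	return (phys_addr_t)(long long)err;
}

static inline int lmb_addr_to_err(phys_addr_t addr)
{
	return (int)(long long)addr;
}

int main(void)
{
	phys_addr_t addr = lmb_err(-ENOMEM);	/* a failed allocation */

	if (lmb_addr_is_err(addr))
		printf("lmb alloc failed: errno %d\n", -lmb_addr_to_err(addr));
	return 0;
}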

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 01/35] lmb: prepare x86 to use lmb to replace early_res
  2010-05-17 22:02           ` Benjamin Herrenschmidt
@ 2010-05-17 22:12             ` H. Peter Anvin
  0 siblings, 0 replies; 99+ messages in thread
From: H. Peter Anvin @ 2010-05-17 22:12 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Yinghai, Ingo Molnar, Thomas Gleixner, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On 05/17/2010 03:02 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2010-05-17 at 11:03 -0700, H. Peter Anvin wrote:
>> On 05/14/2010 01:09 AM, Benjamin Herrenschmidt wrote:
>>>
>>> No. That is not the point. Read the rest of my email!
>>>
>>> We need to -sanitize- those errors. _Maybe_ exposing LMB_ERROR is the
>>> right way to do so, but in that case, we need to make -all- functions use
>>> the same error code. Right now, some fail with 0 and some with
>>> LMB_ERROR.
>>>
>>
>> Using errnos like the rest of the kernel seems like the right thing to
>> do, IMO.
> 
> Maybe. The allocator/find functions return a physical address. If we all
> agree that a physical address between -PAGE_SIZE and 0 is never valid,
> then we can overlay the negative errno codes like we do for pointers.
> 
> I'll have a look at that, it shouldn't be very hard.
> 

For x86 with a 64-bit resource_t, this is always true (physical addresses
are never negative).  For x86 with a 32-bit resource_t, physical addresses
can be negative, but the top of the address space will always be occupied
by the boot ROM (it's a hardware constraint).

	-hpa

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 15/35] x86, lmb: Add lmb_reserve_area_overlap_ok()
  2010-05-17 22:01                       ` Benjamin Herrenschmidt
@ 2010-05-17 22:19                         ` Yinghai
  2010-05-17 22:26                         ` H. Peter Anvin
       [not found]                         ` <4C09A9EA.6060005@oracle.com>
  2 siblings, 0 replies; 99+ messages in thread
From: Yinghai @ 2010-05-17 22:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On 05/17/2010 03:01 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2010-05-17 at 10:18 -0700, Yinghai wrote:
>>> No. Both will hit 2.6.36. It's way too late to queue up such changes
>>> for the 2.6.35 merge window which has already opened.
>>
>> I have a feeling that your new LMB code will hit 2.6.36, and the
>> x86 patches that use lmb will hit 2.6.37.
>>
>> Otherwise it will create more merge conflicts between tip and lmb,
>> unless you put your lmb changes into tip?
> 
> There is no reason not to, I'll have them in a separate branch that Ingo
> can pull. I think we can aim for 2.6.36 in one go provided that Peter
> and Thomas are happy with the x86 side of things.

great.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 15/35] x86, lmb: Add lmb_reserve_area_overlap_ok()
  2010-05-17 22:01                       ` Benjamin Herrenschmidt
  2010-05-17 22:19                         ` Yinghai
@ 2010-05-17 22:26                         ` H. Peter Anvin
       [not found]                         ` <4C09A9EA.6060005@oracle.com>
  2 siblings, 0 replies; 99+ messages in thread
From: H. Peter Anvin @ 2010-05-17 22:26 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Yinghai, Ingo Molnar, Thomas Gleixner, Andrew Morton,
	David Miller, Linus Torvalds, Johannes Weiner, linux-kernel,
	linux-arch

On 05/17/2010 03:01 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2010-05-17 at 10:18 -0700, Yinghai wrote:
>>> No. Both will hit 2.6.36. It's way too late to queue up such changes
>>> for the 2.6.35 merge window which has already opened.
>>
>> I have a feeling that your new LMB code will hit 2.6.36, and the
>> x86 patches that use lmb will hit 2.6.37.
>>
>> Otherwise it will create more merge conflicts between tip and lmb,
>> unless you put your lmb changes into tip?
> 
> There is no reason not to, I'll have them in a separate branch that Ingo
> can pull. I think we can aim for 2.6.36 in one go provided that Peter
> and Thomas are happy with the x86 side of things.
> 

Yes, that's the sane thing to do at this point.

	-hpa

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 15/35] x86, lmb: Add lmb_reserve_area_overlap_ok()
       [not found]                             ` <4C16C928.2000406@kernel.org>
@ 2010-06-15  6:55                               ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 99+ messages in thread
From: Benjamin Herrenschmidt @ 2010-06-15  6:55 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, linux-mm,
	Paul Mundt, Russell King

On Mon, 2010-06-14 at 17:28 -0700, Yinghai Lu wrote:
> On 06/04/2010 06:47 PM, Benjamin Herrenschmidt wrote:
> > On Fri, 2010-06-04 at 18:35 -0700, Yinghai Lu wrote:
> ..
> >> Can you rebase powerpc/lmb, so we can put the lmb-for-x86 changes
> >> into tip and next?
> > 
> > I will. I've been kept busy with all sorts of emergencies and the merge
> > window, but I will do that and a couple of other things to it some time
> > next week.
> 
> Ping!

(Adding back the list)

I've updated the series (*).  It's just a rebase from the previous one,
plus one change: I don't allow resizing until after lmb_analyze() has run,
since on various platforms doing it too early, such as when constructing
the memory array, is risky, as we haven't yet done the necessary
lmb_reserve() of whatever regions are unsuitable for allocation.

We can improve on that later, maybe by doing those reservations early,
before we add memory, or whatever, but that can wait.
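
The guard itself is tiny; roughly (a sketch with guessed names, not the
exact patch):

struct lmb_region { unsigned long long base, size; };

struct lmb_type {
	unsigned long cnt, max;
	struct lmb_region *regions;
};

static int lmb_can_resize;	/* stays 0 until lmb_analyze() has run */

void lmb_analyze_sketch(void)
{
	/* ... recompute totals as the real lmb_analyze() does ... */
	lmb_can_resize = 1;	/* doubling is allowed from here on */
}

int lmb_double_array_sketch(struct lmb_type *type)
{
	if (!lmb_can_resize)
		return -1;	/* too early: the unsuitable regions are
				 * not reserved yet, so the new array
				 * could be allocated on top of them */
	/* ... allocate a 2 * type->max array, copy, switch over ... */
	type->max *= 2;		/* stand-in for the real reallocation */
	return 0;
}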

Yinghai, is there any other change you want me to make to the core?

Another thing to add at some stage, for ARM, will be a default alloc base
in addition to the limit, one that constrains "standard" allocations.
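
In the allocator that would just become a second bound, something like
(a sketch; the field and helper names here are hypothetical):

typedef unsigned long long phys_addr_t;

struct lmb_alloc_window {
	phys_addr_t default_alloc_base;		/* new: lower bound */
	phys_addr_t default_alloc_limit;	/* existing: upper cap */
};

/* assumed to exist in some form: find a free range inside [start, end) */
extern phys_addr_t lmb_find_in_range(phys_addr_t size, phys_addr_t align,
				     phys_addr_t start, phys_addr_t end);

phys_addr_t lmb_alloc_sketch(const struct lmb_alloc_window *w,
			     phys_addr_t size, phys_addr_t align)
{
	return lmb_find_in_range(size, align,
				 w->default_alloc_base,
				 w->default_alloc_limit);
}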

(*) Usual place:

  git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git lmb

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 99+ messages in thread

end of thread, other threads:[~2010-06-15  6:56 UTC | newest]

Thread overview: 99+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-05-14  0:19 [PATCH -v16 00/35] Use lmb with x86 Yinghai Lu
2010-05-14  0:19 ` [PATCH 01/35] lmb: prepare x86 to use lmb to replace early_res Yinghai Lu
2010-05-14  2:12   ` Benjamin Herrenschmidt
2010-05-14  6:19     ` Yinghai
2010-05-14  8:09       ` Benjamin Herrenschmidt
2010-05-14 16:23         ` Yinghai Lu
2010-05-17 18:03         ` H. Peter Anvin
2010-05-17 22:02           ` Benjamin Herrenschmidt
2010-05-17 22:12             ` H. Peter Anvin
2010-05-14  7:03     ` Yinghai
2010-05-14  0:19 ` [PATCH 02/35] lmb: Prepare to include linux/lmb.h in core file Yinghai Lu
2010-05-14  0:19 ` [PATCH 03/35] lmb: Add ARCH_DISCARD_LMB to put lmb code to .init Yinghai Lu
2010-05-14  2:14   ` Benjamin Herrenschmidt
2010-05-14  6:21     ` Yinghai
2010-05-14  8:10       ` Benjamin Herrenschmidt
2010-05-14 16:24         ` Yinghai Lu
2010-05-14  0:19 ` [PATCH 04/35] lmb: Add lmb_find_area() Yinghai Lu
2010-05-14  2:16   ` Benjamin Herrenschmidt
2010-05-14  6:25     ` Yinghai
2010-05-14  8:12       ` Benjamin Herrenschmidt
2010-05-14 16:28         ` Yinghai Lu
2010-05-14  0:19 ` [PATCH 05/35] x86, lmb: Add lmb_find_area_size() Yinghai Lu
2010-05-14  2:20   ` Benjamin Herrenschmidt
2010-05-14  6:28     ` Yinghai
2010-05-14  8:13       ` Benjamin Herrenschmidt
2010-05-14 16:33         ` Yinghai Lu
2010-05-14 22:20           ` Benjamin Herrenschmidt
2010-05-14  0:19 ` [PATCH 06/35] bootmem, x86: Add weak version of reserve_bootmem_generic Yinghai Lu
2010-05-14  0:19 ` [PATCH 07/35] x86, lmb: Add lmb_to_bootmem() Yinghai Lu
2010-05-14  0:19 ` [PATCH 08/35] x86,lmb: Add lmb_reserve_area/lmb_free_area Yinghai Lu
2010-05-14  2:26   ` Benjamin Herrenschmidt
2010-05-14  6:30     ` Yinghai
2010-05-14  8:15       ` Benjamin Herrenschmidt
2010-05-14  0:19 ` [PATCH 09/35] x86, lmb: Add get_free_all_memory_range() Yinghai Lu
2010-05-14  0:19 ` [PATCH 10/35] x86, lmb: Add lmb_register_active_regions() and lmb_hole_size() Yinghai Lu
2010-05-14  0:19 ` [PATCH 11/35] lmb: Add find_memory_core_early() Yinghai Lu
2010-05-14  2:29   ` Benjamin Herrenschmidt
2010-05-14  6:34     ` Yinghai
2010-05-14  8:16       ` Benjamin Herrenschmidt
2010-05-14  2:30   ` Benjamin Herrenschmidt
2010-05-14  6:39     ` Yinghai
2010-05-14  8:19       ` Benjamin Herrenschmidt
2010-05-14  8:30         ` David Miller
2010-05-14 16:44           ` Yinghai Lu
2010-05-14 22:34             ` Benjamin Herrenschmidt
2010-05-14 23:51               ` lmb type features Yinghai
2010-05-17  0:46                 ` Benjamin Herrenschmidt
2010-05-17  6:06                   ` Yinghai
2010-05-14  0:19 ` [PATCH 12/35] x86, lmb: Add lmb_find_area_node() Yinghai Lu
2010-05-14  0:19 ` [PATCH 13/35] x86, lmb: Add lmb_free_memory_size() Yinghai Lu
2010-05-14  2:31   ` Benjamin Herrenschmidt
2010-05-14  6:42     ` Yinghai
2010-05-14  8:21       ` Benjamin Herrenschmidt
2010-05-14 16:37         ` Yinghai Lu
2010-05-14 22:20           ` Benjamin Herrenschmidt
2010-05-14  0:19 ` [PATCH 14/35] x86, lmb: Add lmb_memory_size() Yinghai Lu
2010-05-14  2:31   ` Benjamin Herrenschmidt
2010-05-14  0:19 ` [PATCH 15/35] x86, lmb: Add lmb_reserve_area_overlap_ok() Yinghai Lu
2010-05-14  2:32   ` Benjamin Herrenschmidt
2010-05-14  6:44     ` Yinghai
2010-05-14  8:30       ` Benjamin Herrenschmidt
2010-05-14 16:40         ` Yinghai Lu
2010-05-14 22:30           ` Benjamin Herrenschmidt
2010-05-15  7:32             ` Ingo Molnar
2010-05-17  0:39               ` Benjamin Herrenschmidt
2010-05-17  6:11                 ` Yinghai
2010-05-17  6:40                   ` H. Peter Anvin
2010-05-17  7:24                   ` Benjamin Herrenschmidt
2010-05-17 17:18                     ` Yinghai
2010-05-17 18:53                       ` H. Peter Anvin
2010-05-17 22:01                       ` Benjamin Herrenschmidt
2010-05-17 22:19                         ` Yinghai
2010-05-17 22:26                         ` H. Peter Anvin
     [not found]                         ` <4C09A9EA.6060005@oracle.com>
     [not found]                           ` <1275702466.1931.1425.camel@pasglop>
     [not found]                             ` <4C16C928.2000406@kernel.org>
2010-06-15  6:55                               ` Benjamin Herrenschmidt
2010-05-14  0:19 ` [PATCH 16/35] x86, lmb: Use lmb_debug to control debug message print out Yinghai Lu
2010-05-14  0:19 ` [PATCH 17/35] x86, lmb: Add x86 version of __lmb_find_area() Yinghai Lu
2010-05-14  2:34   ` Benjamin Herrenschmidt
2010-05-14  6:47     ` Yinghai
2010-05-14  8:31       ` Benjamin Herrenschmidt
2010-05-14 16:41         ` Yinghai Lu
2010-05-14  0:19 ` [PATCH 18/35] x86: Use lmb to replace early_res Yinghai Lu
2010-05-14  0:19 ` [PATCH 19/35] x86: Replace e820_/_early string with lmb_ Yinghai Lu
2010-05-14  0:19 ` [PATCH 20/35] x86: Remove not used early_res code Yinghai Lu
2010-05-14  0:19 ` [PATCH 21/35] x86, lmb: Use lmb_memory_size()/lmb_free_memory_size() to get correct dma_reserve Yinghai Lu
2010-05-14  0:19 ` [PATCH 22/35] bootmem: Add nobootmem.c to reduce the #ifdef Yinghai Lu
2010-05-14  0:19 ` [PATCH 23/35] mm: move contig_page_data define to bootmem.c/nobootmem.c Yinghai Lu
2010-05-14  0:19 ` [PATCH 24/35] lmb: Move __alloc_memory_core_early() to nobootmem.c Yinghai Lu
2010-05-14  2:36   ` Benjamin Herrenschmidt
2010-05-14  0:19 ` [PATCH 25/35] x86: Have nobootmem version setup_bootmem_allocator() Yinghai Lu
2010-05-14  0:19 ` [PATCH 26/35] x86: Put 64 bit numa node memmap above 16M Yinghai Lu
2010-05-14  0:19 ` [PATCH 27/35] swiotlb: Use page alignment for early buffer allocation Yinghai Lu
2010-05-14  0:19 ` [PATCH 28/35] x86: Add sanitize_e820_map() Yinghai Lu
2010-05-14  0:19 ` [PATCH 29/35] x86: Change e820_saved to __initdata Yinghai Lu
2010-05-14  0:19 ` [PATCH 30/35] x86: Align e820 ram range to page Yinghai Lu
2010-05-14  0:19 ` [PATCH 31/35] x86: Use walk_system_ram_range() instead of e820_any_mapped() in agp path Yinghai Lu
2010-05-14  0:19 ` [PATCH 32/35] x86: Add get_centaur_ram_top() Yinghai Lu
2010-05-14  0:19 ` [PATCH 33/35] x86: Change e820_any_mapped() to __init Yinghai Lu
2010-05-14  0:19 ` [PATCH 34/35] x86: Use walk_system_ram_range() instead of referring e820.map directly for tboot Yinghai Lu
2010-05-14  0:19 ` [PATCH 35/35] x86: make e820 to be __initdata Yinghai Lu
