All of lore.kernel.org
* [PATCH v2 00/20] x86, ACPI, numa: Parse numa info early
@ 2013-03-10  6:44 Yinghai Lu
  2013-03-10  6:44 ` [PATCH v2 01/20] x86: Change get_ramdisk_image() to global Yinghai Lu
                   ` (19 more replies)
  0 siblings, 20 replies; 45+ messages in thread
From: Yinghai Lu @ 2013-03-10  6:44 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Tejun Heo, Thomas Renninger, Tang Chen
  Cc: linux-kernel, Yinghai Lu

One commit that tried to parse SRAT early got reverted before v3.9-rc1:

| commit e8d1955258091e4c92d5a975ebd7fd8a98f5d30f
| Author: Tang Chen <tangchen@cn.fujitsu.com>
| Date:   Fri Feb 22 16:33:44 2013 -0800
|
|    acpi, memory-hotplug: parse SRAT before memblock is ready

It broke several things, like the acpi table override and the fallback paths.

This patchset is a clean implementation that parses numa info early:
1. keep the acpi table initrd override working by splitting finding from copying.
   finding is done at the head_32.S and head64.c stage:
        in head_32.S, the initrd is accessed in 32bit flat mode via its phys addr.
        in head64.c, the initrd is accessed via the kernel low mapping address,
        with the help of the #PF-handler-set page table.
   copying is done with early_ioremap just after memblock is set up.
2. keep the fallback paths working: numaq, ACPI, amd_numa and dummy.
   separate initmem_init into two stages.
   early_initmem_init only extracts numa info early into numa_meminfo.
   initmem_init keeps the slit and emulation handling.
3. keep the rest of the old code flow untouched, like relocate_initrd and initmem_init.
   early_initmem_init takes the old init_mem_mapping position.
   it calls early_x86_numa_init and init_mem_mapping for every node.
   For 64bit, we avoid having a size limit on the initrd, as relocate_initrd
   still runs after init_mem_mapping for all memory.
4. the last patch tries to put the page table on the local node's ram, so that
   memory hotplug will be happy.

In short, early_initmem_init parses numa info early and calls
init_mem_mapping to set up the page table for every node's memory.
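
A rough sketch of the resulting setup ordering (simplified pseudocode, not
the literal code in the series):

	setup_arch():
		...
		early_initmem_init();	/* parses numa info early and calls
					   init_mem_mapping() for every node */
		...
		reserve_initrd();	/* relocate_initrd() unchanged, still
					   after all memory is mapped */
		...
		initmem_init();		/* slit and numa emulation handling */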

The patchset can be found at:
        git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-mm

It is based on today's Linus tree.

-v2: Address tj's review and split the patches into smaller ones.

Thanks

Yinghai

Yinghai Lu (20):
  x86: Change get_ramdisk_image() to global
  x86, microcode: Use common get_ramdisk_image()
  x86, ACPI, mm: Kill max_low_pfn_mapped
  x86, ACPI: Increase override tables number limit
  x86, ACPI: Split acpi_initrd_override to find/copy two functions
  x86, ACPI: Store override acpi tables phys addr in cpio files info array
  x86, ACPI: Make acpi_initrd_override_find work with 32bit flat mode
  x86, ACPI: Find acpi tables in initrd early from head_32.S/head64.c
  x86, mm, numa: Move two functions calling on successful path later
  x86, mm, numa: Call numa_meminfo_cover_memory() checking early
  x86, mm, numa: Move node_map_pfn alignment() to x86
  x86, mm, numa: Use numa_meminfo to check node_map_pfn alignment
  x86, mm, numa: Set memblock nid later
  x86, mm, numa: Move node_possible_map setting later
  x86, mm, numa: Move emulation handling down.
  x86, ACPI, numa, ia64: split SLIT handling out
  x86, mm, numa: Add early_initmem_init() stub
  x86, mm: Parse numa info early
  x86, mm: Make init_mem_mapping be able to be called several times
  x86, mm, numa: Put pagetable on local node ram for 64bit

 arch/ia64/kernel/setup.c                |    4 +-
 arch/x86/include/asm/acpi.h             |    3 +-
 arch/x86/include/asm/page_types.h       |    2 +-
 arch/x86/include/asm/pgtable.h          |    2 +-
 arch/x86/include/asm/setup.h            |    9 ++
 arch/x86/kernel/head64.c                |    2 +
 arch/x86/kernel/head_32.S               |    4 +
 arch/x86/kernel/microcode_intel_early.c |    8 +-
 arch/x86/kernel/setup.c                 |   86 ++++++-----
 arch/x86/mm/init.c                      |   88 +++++++-----
 arch/x86/mm/numa.c                      |  240 ++++++++++++++++++++++++-------
 arch/x86/mm/numa_emulation.c            |    2 +-
 arch/x86/mm/numa_internal.h             |    2 +
 arch/x86/mm/srat.c                      |   11 +-
 drivers/acpi/numa.c                     |   13 +-
 drivers/acpi/osl.c                      |  134 +++++++++++------
 include/linux/acpi.h                    |   20 +--
 include/linux/mm.h                      |    3 -
 mm/page_alloc.c                         |   52 +------
 19 files changed, 445 insertions(+), 240 deletions(-)

-- 
1.7.10.4


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v2 01/20] x86: Change get_ramdisk_image() to global
  2013-03-10  6:44 [PATCH v2 00/20] x86, ACPI, numa: Parse numa info early Yinghai Lu
@ 2013-03-10  6:44 ` Yinghai Lu
  2013-03-10  6:44 ` [PATCH v2 02/20] x86, microcode: Use common get_ramdisk_image() Yinghai Lu
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 45+ messages in thread
From: Yinghai Lu @ 2013-03-10  6:44 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Tejun Heo, Thomas Renninger, Tang Chen
  Cc: linux-kernel, Yinghai Lu

We need to use get_ramdisk_image() for the early microcode updating in
another file, so change it to a global function.

Also make it take a boot_params pointer, as the head_32.S path needs to
access boot_params via its phys address while in 32bit flat mode.
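
For illustration, a hedged sketch of how a 32bit flat-mode caller is
expected to use the new parameter (the same pattern appears later in this
series, in the head_32.S/head64.c patch):

	struct boot_params *bp;
	u64 ramdisk_image, ramdisk_size;

	/* in 32bit flat mode, globals must be reached via their phys address */
	bp = (struct boot_params *)__pa_symbol(&boot_params);
	ramdisk_image = get_ramdisk_image(bp);
	ramdisk_size  = get_ramdisk_size(bp);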

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/include/asm/setup.h |    3 +++
 arch/x86/kernel/setup.c      |   28 ++++++++++++++--------------
 2 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index b7bf350..4f71d48 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -106,6 +106,9 @@ void *extend_brk(size_t size, size_t align);
 	RESERVE_BRK(name, sizeof(type) * entries)
 
 extern void probe_roms(void);
+u64 get_ramdisk_image(struct boot_params *bp);
+u64 get_ramdisk_size(struct boot_params *bp);
+
 #ifdef __i386__
 
 void __init i386_start_kernel(void);
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 90d8cc9..1629577 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -300,19 +300,19 @@ static void __init reserve_brk(void)
 
 #ifdef CONFIG_BLK_DEV_INITRD
 
-static u64 __init get_ramdisk_image(void)
+u64 __init get_ramdisk_image(struct boot_params *bp)
 {
-	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
+	u64 ramdisk_image = bp->hdr.ramdisk_image;
 
-	ramdisk_image |= (u64)boot_params.ext_ramdisk_image << 32;
+	ramdisk_image |= (u64)bp->ext_ramdisk_image << 32;
 
 	return ramdisk_image;
 }
-static u64 __init get_ramdisk_size(void)
+u64 __init get_ramdisk_size(struct boot_params *bp)
 {
-	u64 ramdisk_size = boot_params.hdr.ramdisk_size;
+	u64 ramdisk_size = bp->hdr.ramdisk_size;
 
-	ramdisk_size |= (u64)boot_params.ext_ramdisk_size << 32;
+	ramdisk_size |= (u64)bp->ext_ramdisk_size << 32;
 
 	return ramdisk_size;
 }
@@ -321,8 +321,8 @@ static u64 __init get_ramdisk_size(void)
 static void __init relocate_initrd(void)
 {
 	/* Assume only end is not page aligned */
-	u64 ramdisk_image = get_ramdisk_image();
-	u64 ramdisk_size  = get_ramdisk_size();
+	u64 ramdisk_image = get_ramdisk_image(&boot_params);
+	u64 ramdisk_size  = get_ramdisk_size(&boot_params);
 	u64 area_size     = PAGE_ALIGN(ramdisk_size);
 	u64 ramdisk_here;
 	unsigned long slop, clen, mapaddr;
@@ -361,8 +361,8 @@ static void __init relocate_initrd(void)
 		ramdisk_size  -= clen;
 	}
 
-	ramdisk_image = get_ramdisk_image();
-	ramdisk_size  = get_ramdisk_size();
+	ramdisk_image = get_ramdisk_image(&boot_params);
+	ramdisk_size  = get_ramdisk_size(&boot_params);
 	printk(KERN_INFO "Move RAMDISK from [mem %#010llx-%#010llx] to"
 		" [mem %#010llx-%#010llx]\n",
 		ramdisk_image, ramdisk_image + ramdisk_size - 1,
@@ -372,8 +372,8 @@ static void __init relocate_initrd(void)
 static void __init early_reserve_initrd(void)
 {
 	/* Assume only end is not page aligned */
-	u64 ramdisk_image = get_ramdisk_image();
-	u64 ramdisk_size  = get_ramdisk_size();
+	u64 ramdisk_image = get_ramdisk_image(&boot_params);
+	u64 ramdisk_size  = get_ramdisk_size(&boot_params);
 	u64 ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
 
 	if (!boot_params.hdr.type_of_loader ||
@@ -385,8 +385,8 @@ static void __init early_reserve_initrd(void)
 static void __init reserve_initrd(void)
 {
 	/* Assume only end is not page aligned */
-	u64 ramdisk_image = get_ramdisk_image();
-	u64 ramdisk_size  = get_ramdisk_size();
+	u64 ramdisk_image = get_ramdisk_image(&boot_params);
+	u64 ramdisk_size  = get_ramdisk_size(&boot_params);
 	u64 ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
 	u64 mapped_size;
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v2 02/20] x86, microcode: Use common get_ramdisk_image()
  2013-03-10  6:44 [PATCH v2 00/20] x86, ACPI, numa: Parse numa info early Yinghai Lu
  2013-03-10  6:44 ` [PATCH v2 01/20] x86: Change get_ramdisk_image() to global Yinghai Lu
@ 2013-03-10  6:44 ` Yinghai Lu
  2013-04-04 17:48   ` Tejun Heo
  2013-03-10  6:44   ` Yinghai Lu
                   ` (17 subsequent siblings)
  19 siblings, 1 reply; 45+ messages in thread
From: Yinghai Lu @ 2013-03-10  6:44 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Tejun Heo, Thomas Renninger, Tang Chen
  Cc: linux-kernel, Yinghai Lu, Fenghua Yu

Use the common get_ramdisk_image() to get the ramdisk start phys address.

We need this to get the correct ramdisk address for a 64bit bzImage, whose
initrd can be loaded above 4G by kexec-tools.
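
As a reminder (hedged; the helper itself comes from the previous patch), the
common helper combines the legacy and extended header fields, which is what
makes an initrd loaded above 4G addressable:

	ramdisk_image  = bp->hdr.ramdisk_image;
	ramdisk_image |= (u64)bp->ext_ramdisk_image << 32;	/* bits above 4G */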

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/kernel/microcode_intel_early.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/microcode_intel_early.c b/arch/x86/kernel/microcode_intel_early.c
index 7890bc8..a8df75f 100644
--- a/arch/x86/kernel/microcode_intel_early.c
+++ b/arch/x86/kernel/microcode_intel_early.c
@@ -742,8 +742,8 @@ load_ucode_intel_bsp(void)
 	struct boot_params *boot_params_p;
 
 	boot_params_p = (struct boot_params *)__pa_symbol(&boot_params);
-	ramdisk_image = boot_params_p->hdr.ramdisk_image;
-	ramdisk_size  = boot_params_p->hdr.ramdisk_size;
+	ramdisk_image = get_ramdisk_image(boot_params_p);
+	ramdisk_size  = get_ramdisk_size(boot_params_p);
 	initrd_start_early = ramdisk_image;
 	initrd_end_early = initrd_start_early + ramdisk_size;
 
@@ -752,8 +752,8 @@ load_ucode_intel_bsp(void)
 		(unsigned long *)__pa_symbol(&mc_saved_in_initrd),
 		initrd_start_early, initrd_end_early, &uci);
 #else
-	ramdisk_image = boot_params.hdr.ramdisk_image;
-	ramdisk_size  = boot_params.hdr.ramdisk_size;
+	ramdisk_image = get_ramdisk_image(&boot_params);
+	ramdisk_size  = get_ramdisk_size(&boot_params);
 	initrd_start_early = ramdisk_image + PAGE_OFFSET;
 	initrd_end_early = initrd_start_early + ramdisk_size;
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v2 03/20] x86, ACPI, mm: Kill max_low_pfn_mapped
  2013-03-10  6:44 [PATCH v2 00/20] x86, ACPI, numa: Parse numa info early Yinghai Lu
@ 2013-03-10  6:44   ` Yinghai Lu
  2013-03-10  6:44 ` [PATCH v2 02/20] x86, microcode: Use common get_ramdisk_image() Yinghai Lu
                     ` (18 subsequent siblings)
  19 siblings, 0 replies; 45+ messages in thread
From: Yinghai Lu @ 2013-03-10  6:44 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Tejun Heo, Thomas Renninger, Tang Chen
  Cc: Daniel Vetter, Jacob Shin, linux-kernel, dri-devel,
	Rafael J. Wysocki, linux-acpi, Yinghai Lu

Now that we have the arch_pfn_mapped array, max_low_pfn_mapped should not
be used anymore.

Users should use arch_pfn_mapped or just 1UL<<(32-PAGE_SHIFT) instead.

The only user left is ACPI_INITRD_TABLE_OVERRIDE, and it should not use
max_low_pfn_mapped anyway, as the later accesses go through early_ioremap().
Change it to try below 4G first and then above 4G.
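
For code that still needs an "is this low range direct-mapped" check, a
hedged, illustrative alternative using the tracked pfn_mapped ranges:

	/* illustrative only, not part of this patch */
	unsigned long low_end = min(end_pfn, 1UL << (32 - PAGE_SHIFT));

	if (pfn_range_is_mapped(start_pfn, low_end)) {
		/* the range below 4G is covered by the direct mapping */
	}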

-v2: Leave max_low_pfn_mapped alone in the i915 code, according to tj.

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Airlie <airlied@linux.ie>
Cc: Jacob Shin <jacob.shin@amd.com>
Cc: linux-acpi@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
---
 arch/x86/include/asm/page_types.h |    1 -
 arch/x86/kernel/setup.c           |    4 +---
 arch/x86/mm/init.c                |    4 ----
 drivers/acpi/osl.c                |   10 +++++++---
 4 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index 54c9787..b012b82 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -43,7 +43,6 @@
 
 extern int devmem_is_allowed(unsigned long pagenr);
 
-extern unsigned long max_low_pfn_mapped;
 extern unsigned long max_pfn_mapped;
 
 static inline phys_addr_t get_max_mapped(void)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 1629577..e75c6e6 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -113,13 +113,11 @@
 #include <asm/prom.h>
 
 /*
- * max_low_pfn_mapped: highest direct mapped pfn under 4GB
- * max_pfn_mapped:     highest direct mapped pfn over 4GB
+ * max_pfn_mapped:     highest direct mapped pfn
  *
  * The direct mapping only covers E820_RAM regions, so the ranges and gaps are
  * represented by pfn_mapped
  */
-unsigned long max_low_pfn_mapped;
 unsigned long max_pfn_mapped;
 
 #ifdef CONFIG_DMI
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 59b7fc4..abcc241 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -313,10 +313,6 @@ static void add_pfn_range_mapped(unsigned long start_pfn, unsigned long end_pfn)
 	nr_pfn_mapped = clean_sort_range(pfn_mapped, E820_X_MAX);
 
 	max_pfn_mapped = max(max_pfn_mapped, end_pfn);
-
-	if (start_pfn < (1UL<<(32-PAGE_SHIFT)))
-		max_low_pfn_mapped = max(max_low_pfn_mapped,
-					 min(end_pfn, 1UL<<(32-PAGE_SHIFT)));
 }
 
 bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn)
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 586e7e9..c08cdb6 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -624,9 +624,13 @@ void __init acpi_initrd_override(void *data, size_t size)
 	if (table_nr == 0)
 		return;
 
-	acpi_tables_addr =
-		memblock_find_in_range(0, max_low_pfn_mapped << PAGE_SHIFT,
-				       all_tables_size, PAGE_SIZE);
+	/* under 4G at first, then above 4G */
+	acpi_tables_addr = memblock_find_in_range(0, (1ULL<<32) - 1,
+					all_tables_size, PAGE_SIZE);
+	if (!acpi_tables_addr)
+		acpi_tables_addr = memblock_find_in_range(0,
+					~(phys_addr_t)0,
+					all_tables_size, PAGE_SIZE);
 	if (!acpi_tables_addr) {
 		WARN_ON(1);
 		return;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 45+ messages in thread


* [PATCH v2 04/20] x86, ACPI: Increase override tables number limit
  2013-03-10  6:44 [PATCH v2 00/20] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (2 preceding siblings ...)
  2013-03-10  6:44   ` Yinghai Lu
@ 2013-03-10  6:44 ` Yinghai Lu
  2013-04-04 17:50   ` Tejun Heo
  2013-03-10  6:44 ` [PATCH v2 05/20] x86, ACPI: Split acpi_initrd_override to find/copy two functions Yinghai Lu
                   ` (15 subsequent siblings)
  19 siblings, 1 reply; 45+ messages in thread
From: Yinghai Lu @ 2013-03-10  6:44 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Tejun Heo, Thomas Renninger, Tang Chen
  Cc: linux-kernel, Yinghai Lu, Rafael J. Wysocki, linux-acpi

The number of acpi override tables in the initrd is currently limited to 10,
which is too small. 64 should be good enough, as we have 35 signatures and
could have several SSDTs.

Two problems in the current code prevent us from increasing the limit:
1. The cpio file info array is put on the stack; as every element is 32
   bytes, we could run out of stack space if we grow the array to 64 entries.
   We can move it off the stack by making it a global in the __initdata
   section.
2. early_ioremap can only remap 256k at a time. The current code maps all
   10 tables at once; if we increase the limit, the total size could exceed
   256k and early_ioremap would fail.
   We can map the tables one by one during copying, instead of mapping them
   all at once.

-v2: According to tj, split this out into a separate patch; also rename
     the array to acpi_initrd_files.

Signed-off-by: Yinghai <yinghai@kernel.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: linux-acpi@vger.kernel.org
---
 drivers/acpi/osl.c |   21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index c08cdb6..8aaf721 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -569,8 +569,8 @@ static const char * const table_sigs[] = {
 
 #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
 
-/* Must not increase 10 or needs code modification below */
-#define ACPI_OVERRIDE_TABLES 10
+#define ACPI_OVERRIDE_TABLES 64
+static struct cpio_data __initdata acpi_initrd_files[ACPI_OVERRIDE_TABLES];
 
 void __init acpi_initrd_override(void *data, size_t size)
 {
@@ -579,7 +579,6 @@ void __init acpi_initrd_override(void *data, size_t size)
 	struct acpi_table_header *table;
 	char cpio_path[32] = "kernel/firmware/acpi/";
 	struct cpio_data file;
-	struct cpio_data early_initrd_files[ACPI_OVERRIDE_TABLES];
 	char *p;
 
 	if (data == NULL || size == 0)
@@ -617,8 +616,8 @@ void __init acpi_initrd_override(void *data, size_t size)
 			table->signature, cpio_path, file.name, table->length);
 
 		all_tables_size += table->length;
-		early_initrd_files[table_nr].data = file.data;
-		early_initrd_files[table_nr].size = file.size;
+		acpi_initrd_files[table_nr].data = file.data;
+		acpi_initrd_files[table_nr].size = file.size;
 		table_nr++;
 	}
 	if (table_nr == 0)
@@ -648,14 +647,14 @@ void __init acpi_initrd_override(void *data, size_t size)
 	memblock_reserve(acpi_tables_addr, acpi_tables_addr + all_tables_size);
 	arch_reserve_mem_area(acpi_tables_addr, all_tables_size);
 
-	p = early_ioremap(acpi_tables_addr, all_tables_size);
-
 	for (no = 0; no < table_nr; no++) {
-		memcpy(p + total_offset, early_initrd_files[no].data,
-		       early_initrd_files[no].size);
-		total_offset += early_initrd_files[no].size;
+		phys_addr_t size = acpi_initrd_files[no].size;
+
+		p = early_ioremap(acpi_tables_addr + total_offset, size);
+		memcpy(p, acpi_initrd_files[no].data, size);
+		early_iounmap(p, size);
+		total_offset += size;
 	}
-	early_iounmap(p, all_tables_size);
 }
 #endif /* CONFIG_ACPI_INITRD_TABLE_OVERRIDE */
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v2 05/20] x86, ACPI: Split acpi_initrd_override to find/copy two functions
  2013-03-10  6:44 [PATCH v2 00/20] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (3 preceding siblings ...)
  2013-03-10  6:44 ` [PATCH v2 04/20] x86, ACPI: Increase override tables number limit Yinghai Lu
@ 2013-03-10  6:44 ` Yinghai Lu
  2013-04-04 18:07   ` Tejun Heo
  2013-03-10  6:44 ` [PATCH v2 06/20] x86, ACPI: Store override acpi tables phys addr in cpio files info array Yinghai Lu
                   ` (14 subsequent siblings)
  19 siblings, 1 reply; 45+ messages in thread
From: Yinghai Lu @ 2013-03-10  6:44 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Tejun Heo, Thomas Renninger, Tang Chen
  Cc: linux-kernel, Yinghai Lu, Pekka Enberg, Jacob Shin,
	Rafael J. Wysocki, linux-acpi

To parse SRAT early, we need to move acpi table probing earlier.
acpi_initrd_table_override happens before acpi table probing, so we need to
move it earlier too.

In the current code, acpi_initrd_table_override runs after init_mem_mapping
and relocate_initrd(), so it can scan the initrd and copy the acpi tables
using the kernel virtual address of the initrd.
The copying needs to happen after memblock is ready, because it needs to
allocate a buffer for the new acpi tables.

So we have to split that function into separate find and copy functions.
Finding should happen as early as possible; copying should happen after
memblock is ready.

Finding can be done in head_32.S and head64.c, just like the early microcode
scanning. In head_32.S we are in 32bit flat mode, so we don't need to set up
a page table to access the initrd. In head64.c, the #PF-handler-set page
table lets us access the initrd through the kernel low mapping address.

Copying can be done just after memblock is ready and before probing the acpi
tables; we need early_ioremap to access both the source and target ranges,
as init_mem_mapping has not been called yet.
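
A short sketch of the call sequence after the split (matching the setup.c
hunk below; a later patch adds an is_phys argument and moves the find call
into head_32.S/head64.c):

	/* as early as possible; here still right after reserve_initrd(): */
	acpi_initrd_override_find((void *)initrd_start, initrd_end - initrd_start);

	/* after memblock is ready, so the copy can allocate its buffer: */
	acpi_initrd_override_copy();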

Also move the two function declarations down to avoid an #ifdef in setup.c.

ACPI_INITRD_TABLE_OVERRIDE depends on ACPI and BLK_DEV_INITRD, so the
declarations can move out from under the #ifdef CONFIG_ACPI protection.

-v2: Split one patch out according to tj;
     also don't pass table_nr around.

Signed-off-by: Yinghai <yinghai@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Jacob Shin <jacob.shin@amd.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: linux-acpi@vger.kernel.org
---
 arch/x86/kernel/setup.c |    6 +++---
 drivers/acpi/osl.c      |   18 +++++++++++++-----
 include/linux/acpi.h    |   16 ++++++++--------
 3 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index e75c6e6..d0cc176 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1092,9 +1092,9 @@ void __init setup_arch(char **cmdline_p)
 
 	reserve_initrd();
 
-#if defined(CONFIG_ACPI) && defined(CONFIG_BLK_DEV_INITRD)
-	acpi_initrd_override((void *)initrd_start, initrd_end - initrd_start);
-#endif
+	acpi_initrd_override_find((void *)initrd_start,
+					initrd_end - initrd_start);
+	acpi_initrd_override_copy();
 
 	reserve_crashkernel();
 
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 8aaf721..d66ae0e 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -572,14 +572,13 @@ static const char * const table_sigs[] = {
 #define ACPI_OVERRIDE_TABLES 64
 static struct cpio_data __initdata acpi_initrd_files[ACPI_OVERRIDE_TABLES];
 
-void __init acpi_initrd_override(void *data, size_t size)
+void __init acpi_initrd_override_find(void *data, size_t size)
 {
-	int sig, no, table_nr = 0, total_offset = 0;
+	int sig, no, table_nr = 0;
 	long offset = 0;
 	struct acpi_table_header *table;
 	char cpio_path[32] = "kernel/firmware/acpi/";
 	struct cpio_data file;
-	char *p;
 
 	if (data == NULL || size == 0)
 		return;
@@ -620,7 +619,14 @@ void __init acpi_initrd_override(void *data, size_t size)
 		acpi_initrd_files[table_nr].size = file.size;
 		table_nr++;
 	}
-	if (table_nr == 0)
+}
+
+void __init acpi_initrd_override_copy(void)
+{
+	int no, total_offset = 0;
+	char *p;
+
+	if (!all_tables_size)
 		return;
 
 	/* under 4G at first, then above 4G */
@@ -647,9 +653,11 @@ void __init acpi_initrd_override(void *data, size_t size)
 	memblock_reserve(acpi_tables_addr, acpi_tables_addr + all_tables_size);
 	arch_reserve_mem_area(acpi_tables_addr, all_tables_size);
 
-	for (no = 0; no < table_nr; no++) {
+	for (no = 0; no < ACPI_OVERRIDE_TABLES; no++) {
 		phys_addr_t size = acpi_initrd_files[no].size;
 
+		if (!size)
+			break;
 		p = early_ioremap(acpi_tables_addr + total_offset, size);
 		memcpy(p, acpi_initrd_files[no].data, size);
 		early_iounmap(p, size);
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index bcbdd74..1654a241 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -79,14 +79,6 @@ typedef int (*acpi_tbl_table_handler)(struct acpi_table_header *table);
 typedef int (*acpi_tbl_entry_handler)(struct acpi_subtable_header *header,
 				      const unsigned long end);
 
-#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
-void acpi_initrd_override(void *data, size_t size);
-#else
-static inline void acpi_initrd_override(void *data, size_t size)
-{
-}
-#endif
-
 char * __acpi_map_table (unsigned long phys_addr, unsigned long size);
 void __acpi_unmap_table(char *map, unsigned long size);
 int early_acpi_boot_init(void);
@@ -485,6 +477,14 @@ static inline bool acpi_driver_match_device(struct device *dev,
 
 #endif	/* !CONFIG_ACPI */
 
+#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
+void acpi_initrd_override_find(void *data, size_t size);
+void acpi_initrd_override_copy(void);
+#else
+static inline void acpi_initrd_override_find(void *data, size_t size) { }
+static inline void acpi_initrd_override_copy(void) { }
+#endif
+
 #ifdef CONFIG_ACPI
 void acpi_os_set_prepare_sleep(int (*func)(u8 sleep_state,
 			       u32 pm1a_ctrl,  u32 pm1b_ctrl));
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v2 06/20] x86, ACPI: Store override acpi tables phys addr in cpio files info array
  2013-03-10  6:44 [PATCH v2 00/20] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (4 preceding siblings ...)
  2013-03-10  6:44 ` [PATCH v2 05/20] x86, ACPI: Split acpi_initrd_override to find/copy two functions Yinghai Lu
@ 2013-03-10  6:44 ` Yinghai Lu
  2013-04-04 18:27   ` Tejun Heo
  2013-03-10  6:44 ` [PATCH v2 07/20] x86, ACPI: Make acpi_initrd_override_find work with 32bit flat mode Yinghai Lu
                   ` (13 subsequent siblings)
  19 siblings, 1 reply; 45+ messages in thread
From: Yinghai Lu @ 2013-03-10  6:44 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Tejun Heo, Thomas Renninger, Tang Chen
  Cc: linux-kernel, Yinghai Lu, Rafael J. Wysocki, linux-acpi

On 32bit we will find the tables using phys addresses while in 32bit flat
mode in head_32.S, because at that point we don't need a page table to
access the initrd.

For copying we can use early_ioremap() with the phys address directly,
before the mem mapping is set up.

To keep 32bit and 64bit consistent, use phys addresses for both.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: linux-acpi@vger.kernel.org
---
 drivers/acpi/osl.c |   14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index d66ae0e..54bcc37 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -615,7 +615,7 @@ void __init acpi_initrd_override_find(void *data, size_t size)
 			table->signature, cpio_path, file.name, table->length);
 
 		all_tables_size += table->length;
-		acpi_initrd_files[table_nr].data = file.data;
+		acpi_initrd_files[table_nr].data = (void *)__pa(file.data);
 		acpi_initrd_files[table_nr].size = file.size;
 		table_nr++;
 	}
@@ -624,7 +624,7 @@ void __init acpi_initrd_override_find(void *data, size_t size)
 void __init acpi_initrd_override_copy(void)
 {
 	int no, total_offset = 0;
-	char *p;
+	char *p, *q;
 
 	if (!all_tables_size)
 		return;
@@ -654,12 +654,20 @@ void __init acpi_initrd_override_copy(void)
 	arch_reserve_mem_area(acpi_tables_addr, all_tables_size);
 
 	for (no = 0; no < ACPI_OVERRIDE_TABLES; no++) {
+		/*
+		 * Have to use unsigned long here, otherwise 32bit spits a
+		 * warning; unsigned long is fine, as the bootloader would not
+		 * load the initrd above 4G for a 32bit kernel.
+		 */
+		unsigned long addr = (unsigned long)acpi_initrd_files[no].data;
 		phys_addr_t size = acpi_initrd_files[no].size;
 
 		if (!size)
 			break;
+		q = early_ioremap(addr, size);
 		p = early_ioremap(acpi_tables_addr + total_offset, size);
-		memcpy(p, acpi_initrd_files[no].data, size);
+		memcpy(p, q, size);
+		early_iounmap(q, size);
 		early_iounmap(p, size);
 		total_offset += size;
 	}
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v2 07/20] x86, ACPI: Make acpi_initrd_override_find work with 32bit flat mode
  2013-03-10  6:44 [PATCH v2 00/20] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (5 preceding siblings ...)
  2013-03-10  6:44 ` [PATCH v2 06/20] x86, ACPI: Store override acpi tables phys addr in cpio files info array Yinghai Lu
@ 2013-03-10  6:44 ` Yinghai Lu
  2013-04-04 18:35   ` Tejun Heo
  2013-03-10  6:44 ` [PATCH v2 08/20] x86, ACPI: Find acpi tables in initrd early from head_32.S/head64.c Yinghai Lu
                   ` (12 subsequent siblings)
  19 siblings, 1 reply; 45+ messages in thread
From: Yinghai Lu @ 2013-03-10  6:44 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Tejun Heo, Thomas Renninger, Tang Chen
  Cc: linux-kernel, Yinghai Lu, Pekka Enberg, Jacob Shin,
	Rafael J. Wysocki, linux-acpi

For the finding on 32bit, it is easy to access the initrd in 32bit flat
mode, as we don't need to set up a page table.

That is done from head_32.S, and the microcode updating code already uses
this trick.

We need to change acpi_initrd_override_find to use phys addresses when
accessing global variables.

Pass is_phys to the function, as on 32bit we cannot tell from the address
alone whether it is a phys or a virtual address: the boot loader could load
the initrd above max_low_pfn.

Don't call printk, as it uses global variables; delay the printing until
the copying stage.

Change table_sigs to live on the stack instead; otherwise it gets too messy
to convert the string array to phys addresses while keeping the offset
calculation correct. Its size is about 36x4 bytes, small enough to sit on
the stack.

Also remove the "continue" from the INVALID_TABLE macro to make the code
more readable.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Jacob Shin <jacob.shin@amd.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: linux-acpi@vger.kernel.org
---
 arch/x86/kernel/setup.c |    2 +-
 drivers/acpi/osl.c      |   85 ++++++++++++++++++++++++++++++++---------------
 include/linux/acpi.h    |    5 +--
 3 files changed, 63 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d0cc176..16a703f 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1093,7 +1093,7 @@ void __init setup_arch(char **cmdline_p)
 	reserve_initrd();
 
 	acpi_initrd_override_find((void *)initrd_start,
-					initrd_end - initrd_start);
+					initrd_end - initrd_start, false);
 	acpi_initrd_override_copy();
 
 	reserve_crashkernel();
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 54bcc37..611ca9b 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -551,38 +551,54 @@ u8 __init acpi_table_checksum(u8 *buffer, u32 length)
 	return sum;
 }
 
-/* All but ACPI_SIG_RSDP and ACPI_SIG_FACS: */
-static const char * const table_sigs[] = {
-	ACPI_SIG_BERT, ACPI_SIG_CPEP, ACPI_SIG_ECDT, ACPI_SIG_EINJ,
-	ACPI_SIG_ERST, ACPI_SIG_HEST, ACPI_SIG_MADT, ACPI_SIG_MSCT,
-	ACPI_SIG_SBST, ACPI_SIG_SLIT, ACPI_SIG_SRAT, ACPI_SIG_ASF,
-	ACPI_SIG_BOOT, ACPI_SIG_DBGP, ACPI_SIG_DMAR, ACPI_SIG_HPET,
-	ACPI_SIG_IBFT, ACPI_SIG_IVRS, ACPI_SIG_MCFG, ACPI_SIG_MCHI,
-	ACPI_SIG_SLIC, ACPI_SIG_SPCR, ACPI_SIG_SPMI, ACPI_SIG_TCPA,
-	ACPI_SIG_UEFI, ACPI_SIG_WAET, ACPI_SIG_WDAT, ACPI_SIG_WDDT,
-	ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT,
-	ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, NULL };
-
 /* Non-fatal errors: Affected tables/files are ignored */
 #define INVALID_TABLE(x, path, name)					\
-	{ pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); continue; }
+	do { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); } while (0)
 
 #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
 
 #define ACPI_OVERRIDE_TABLES 64
 static struct cpio_data __initdata acpi_initrd_files[ACPI_OVERRIDE_TABLES];
 
-void __init acpi_initrd_override_find(void *data, size_t size)
+/*
+ * acpi_initrd_override_find() is called from head_32.S and head64.c.
+ * The head_32.S calling path runs in 32bit flat mode, so we can access
+ * the initrd early without setting up a pagetable or relocating the
+ * initrd. To access global variables we need phys addresses instead of
+ * kernel virtual addresses; the table_sigs string array is kept on the
+ * stack so it does not need converting.
+ * Also don't call printk, as it uses global variables.
+ */
+void __init acpi_initrd_override_find(void *data, size_t size, bool is_phys)
 {
 	int sig, no, table_nr = 0;
 	long offset = 0;
 	struct acpi_table_header *table;
 	char cpio_path[32] = "kernel/firmware/acpi/";
 	struct cpio_data file;
+	struct cpio_data *files = acpi_initrd_files;
+	int *all_tables_size_p = &all_tables_size;
+
+	/* All but ACPI_SIG_RSDP and ACPI_SIG_FACS: */
+	char *table_sigs[] = {
+		ACPI_SIG_BERT, ACPI_SIG_CPEP, ACPI_SIG_ECDT, ACPI_SIG_EINJ,
+		ACPI_SIG_ERST, ACPI_SIG_HEST, ACPI_SIG_MADT, ACPI_SIG_MSCT,
+		ACPI_SIG_SBST, ACPI_SIG_SLIT, ACPI_SIG_SRAT, ACPI_SIG_ASF,
+		ACPI_SIG_BOOT, ACPI_SIG_DBGP, ACPI_SIG_DMAR, ACPI_SIG_HPET,
+		ACPI_SIG_IBFT, ACPI_SIG_IVRS, ACPI_SIG_MCFG, ACPI_SIG_MCHI,
+		ACPI_SIG_SLIC, ACPI_SIG_SPCR, ACPI_SIG_SPMI, ACPI_SIG_TCPA,
+		ACPI_SIG_UEFI, ACPI_SIG_WAET, ACPI_SIG_WDAT, ACPI_SIG_WDDT,
+		ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT,
+		ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, NULL };
 
 	if (data == NULL || size == 0)
 		return;
 
+	if (is_phys) {
+		files = (struct cpio_data *)__pa_symbol(acpi_initrd_files);
+		all_tables_size_p = (int *)__pa_symbol(&all_tables_size);
+	}
+
 	for (no = 0; no < ACPI_OVERRIDE_TABLES; no++) {
 		file = find_cpio_data(cpio_path, data, size, &offset);
 		if (!file.data)
@@ -591,9 +607,12 @@ void __init acpi_initrd_override_find(void *data, size_t size)
 		data += offset;
 		size -= offset;
 
-		if (file.size < sizeof(struct acpi_table_header))
-			INVALID_TABLE("Table smaller than ACPI header",
+		if (file.size < sizeof(struct acpi_table_header)) {
+			if (!is_phys)
+				INVALID_TABLE("Table smaller than ACPI header",
 				      cpio_path, file.name);
+			continue;
+		}
 
 		table = file.data;
 
@@ -601,22 +620,33 @@ void __init acpi_initrd_override_find(void *data, size_t size)
 			if (!memcmp(table->signature, table_sigs[sig], 4))
 				break;
 
-		if (!table_sigs[sig])
-			INVALID_TABLE("Unknown signature",
+		if (!table_sigs[sig]) {
+			if (!is_phys)
+				 INVALID_TABLE("Unknown signature",
 				      cpio_path, file.name);
-		if (file.size != table->length)
-			INVALID_TABLE("File length does not match table length",
+			continue;
+		}
+		if (file.size != table->length) {
+			if (!is_phys)
+				INVALID_TABLE("File length does not match table length",
 				      cpio_path, file.name);
-		if (acpi_table_checksum(file.data, table->length))
-			INVALID_TABLE("Bad table checksum",
+			continue;
+		}
+		if (acpi_table_checksum(file.data, table->length)) {
+			if (!is_phys)
+				INVALID_TABLE("Bad table checksum",
 				      cpio_path, file.name);
+			continue;
+		}
 
-		pr_info("%4.4s ACPI table found in initrd [%s%s][0x%x]\n",
+		if (!is_phys)
+			pr_info("%4.4s ACPI table found in initrd [%s%s][0x%x]\n",
 			table->signature, cpio_path, file.name, table->length);
 
-		all_tables_size += table->length;
-		acpi_initrd_files[table_nr].data = (void *)__pa(file.data);
-		acpi_initrd_files[table_nr].size = file.size;
+		(*all_tables_size_p) += table->length;
+		files[table_nr].data = is_phys ?
+					    file.data : (void *)__pa(file.data);
+		files[table_nr].size = file.size;
 		table_nr++;
 	}
 }
@@ -666,6 +696,9 @@ void __init acpi_initrd_override_copy(void)
 			break;
 		q = early_ioremap(addr, size);
 		p = early_ioremap(acpi_tables_addr + total_offset, size);
+		pr_info("%4.4s ACPI table found in initrd [%#010llx-%#010llx]\n",
+				((struct acpi_table_header *)q)->signature,
+				(u64)addr, (u64)(addr + size - 1));
 		memcpy(p, q, size);
 		early_iounmap(q, size);
 		early_iounmap(p, size);
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 1654a241..4b943e6 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -478,10 +478,11 @@ static inline bool acpi_driver_match_device(struct device *dev,
 #endif	/* !CONFIG_ACPI */
 
 #ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
-void acpi_initrd_override_find(void *data, size_t size);
+void acpi_initrd_override_find(void *data, size_t size, bool is_phys);
 void acpi_initrd_override_copy(void);
 #else
-static inline void acpi_initrd_override_find(void *data, size_t size) { }
+static inline void acpi_initrd_override_find(void *data, size_t size,
+						 bool is_phys) { }
 static inline void acpi_initrd_override_copy(void) { }
 #endif
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v2 08/20] x86, ACPI: Find acpi tables in initrd early from head_32.S/head64.c
  2013-03-10  6:44 [PATCH v2 00/20] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (6 preceding siblings ...)
  2013-03-10  6:44 ` [PATCH v2 07/20] x86, ACPI: Make acpi_initrd_override_find work with 32bit flat mode Yinghai Lu
@ 2013-03-10  6:44 ` Yinghai Lu
  2013-03-10 10:25   ` Pekka Enberg
  2013-03-10  6:44 ` [PATCH v2 09/20] x86, mm, numa: Move two functions calling on successful path later Yinghai Lu
                   ` (11 subsequent siblings)
  19 siblings, 1 reply; 45+ messages in thread
From: Yinghai Lu @ 2013-03-10  6:44 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Tejun Heo, Thomas Renninger, Tang Chen
  Cc: linux-kernel, Yinghai Lu, Pekka Enberg, Jacob Shin,
	Rafael J. Wysocki, linux-acpi

head64.c can use the #PF-handler-set page table to access the initrd before
the init mem mapping is set up and the initrd is relocated.

head_32.S can use 32bit flat mode to access the initrd before the init mem
mapping is set up and the initrd is relocated.

That makes 32bit and 64bit more consistent.

-v2: use an inline function in the header file instead, according to tj.
     We still need to keep the #ifdef around the head_32.S call to avoid a
     compile error.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Jacob Shin <jacob.shin@amd.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: linux-acpi@vger.kernel.org
---
 arch/x86/include/asm/setup.h |    6 ++++++
 arch/x86/kernel/head64.c     |    2 ++
 arch/x86/kernel/head_32.S    |    4 ++++
 arch/x86/kernel/setup.c      |   30 ++++++++++++++++++++++++++++--
 4 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index 4f71d48..6f885b7 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -42,6 +42,12 @@ extern void visws_early_detect(void);
 static inline void visws_early_detect(void) { }
 #endif
 
+#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
+void x86_acpi_override_find(void);
+#else
+static inline void x86_acpi_override_find(void) { }
+#endif
+
 extern unsigned long saved_video_mode;
 
 extern void reserve_standard_io_resources(void);
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index c5e403f..a31bc63 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -174,6 +174,8 @@ void __init x86_64_start_kernel(char * real_mode_data)
 	if (console_loglevel == 10)
 		early_printk("Kernel alive\n");
 
+	x86_acpi_override_find();
+
 	clear_page(init_level4_pgt);
 	/* set init_level4_pgt kernel high mapping*/
 	init_level4_pgt[511] = early_level4_pgt[511];
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 73afd11..ca08f0e 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -149,6 +149,10 @@ ENTRY(startup_32)
 	call load_ucode_bsp
 #endif
 
+#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
+	call x86_acpi_override_find
+#endif
+
 /*
  * Initialize page tables.  This creates a PDE and a set of page
  * tables, which are located immediately beyond __brk_base.  The variable
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 16a703f..b067663 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -424,6 +424,34 @@ static void __init reserve_initrd(void)
 }
 #endif /* CONFIG_BLK_DEV_INITRD */
 
+#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
+void __init x86_acpi_override_find(void)
+{
+	unsigned long ramdisk_image, ramdisk_size;
+	unsigned char *p = NULL;
+
+#ifdef CONFIG_X86_32
+	struct boot_params *boot_params_p;
+
+	/*
+	 * The 32bit path comes from head_32.S, in 32bit flat mode, so we
+	 * need to use phys addresses to access global variables.
+	 */
+	boot_params_p = (struct boot_params *)__pa_symbol(&boot_params);
+	ramdisk_image = get_ramdisk_image(boot_params_p);
+	ramdisk_size  = get_ramdisk_size(boot_params_p);
+	p = (unsigned char *)ramdisk_image;
+	acpi_initrd_override_find(p, ramdisk_size, true);
+#else
+	ramdisk_image = get_ramdisk_image(&boot_params);
+	ramdisk_size  = get_ramdisk_size(&boot_params);
+	if (ramdisk_image)
+		p = __va(ramdisk_image);
+	acpi_initrd_override_find(p, ramdisk_size, false);
+#endif
+}
+#endif
+
 static void __init parse_setup_data(void)
 {
 	struct setup_data *data;
@@ -1092,8 +1120,6 @@ void __init setup_arch(char **cmdline_p)
 
 	reserve_initrd();
 
-	acpi_initrd_override_find((void *)initrd_start,
-					initrd_end - initrd_start, false);
 	acpi_initrd_override_copy();
 
 	reserve_crashkernel();
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v2 09/20] x86, mm, numa: Move two functions calling on successful path later
  2013-03-10  6:44 [PATCH v2 00/20] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (7 preceding siblings ...)
  2013-03-10  6:44 ` [PATCH v2 08/20] x86, ACPI: Find acpi tables in initrd early from head_32.S/head64.c Yinghai Lu
@ 2013-03-10  6:44 ` Yinghai Lu
  2013-03-10  6:44 ` [PATCH v2 10/20] x86, mm, numa: Call numa_meminfo_cover_memory() checking early Yinghai Lu
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 45+ messages in thread
From: Yinghai Lu @ 2013-03-10  6:44 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Tejun Heo, Thomas Renninger, Tang Chen
  Cc: linux-kernel, Yinghai Lu

We need to have the numa info ready before init_mem_mapping, so that we can
call init_mem_mapping per node and can also trim node memory ranges to a
large alignment.

The current numa parsing needs to allocate buffers, so it has to be called
after init_mem_mapping.

So split numa info parsing into two stages: the early stage runs before
init_mem_mapping and must not need to allocate buffers.

In the end we will have early_initmem_init() and initmem_init().

This is the first patch of that separation.

setup_node_data() and numa_init_array() are only called on the successful
path, so we can move those calls into x86_numa_init(). That also makes
numa_init() smaller and more readable.
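
A hedged sketch of the resulting structure (names match the diff below;
early_initmem_init() only arrives later in the series):

	static void __init early_x86_numa_init(void)
	{
		/* numa_init(numaq/acpi/amd/dummy): parsing only; no
		   setup_node_data()/numa_init_array() here any more */
	}

	void __init x86_numa_init(void)
	{
		early_x86_numa_init();
		/* register nodes: setup_node_data() per possible node,
		   then numa_init_array() */
	}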

-v2: remove the online_node_map clearing in numa_init(), as it is only
     set in setup_node_data(), at the end of the successful path.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/mm/numa.c |   69 +++++++++++++++++++++++++++++-----------------------
 1 file changed, 39 insertions(+), 30 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 72fe01e..d545638 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -480,7 +480,7 @@ static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
 static int __init numa_register_memblks(struct numa_meminfo *mi)
 {
 	unsigned long uninitialized_var(pfn_align);
-	int i, nid;
+	int i;
 
 	/* Account for nodes with cpus and no memory */
 	node_possible_map = numa_nodes_parsed;
@@ -509,24 +509,6 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
 	if (!numa_meminfo_cover_memory(mi))
 		return -EINVAL;
 
-	/* Finally register nodes. */
-	for_each_node_mask(nid, node_possible_map) {
-		u64 start = PFN_PHYS(max_pfn);
-		u64 end = 0;
-
-		for (i = 0; i < mi->nr_blks; i++) {
-			if (nid != mi->blk[i].nid)
-				continue;
-			start = min(mi->blk[i].start, start);
-			end = max(mi->blk[i].end, end);
-		}
-
-		if (start < end)
-			setup_node_data(nid, start, end);
-	}
-
-	/* Dump memblock with node info and return. */
-	memblock_dump_all();
 	return 0;
 }
 
@@ -562,7 +544,6 @@ static int __init numa_init(int (*init_func)(void))
 
 	nodes_clear(numa_nodes_parsed);
 	nodes_clear(node_possible_map);
-	nodes_clear(node_online_map);
 	memset(&numa_meminfo, 0, sizeof(numa_meminfo));
 	WARN_ON(memblock_set_node(0, ULLONG_MAX, MAX_NUMNODES));
 	numa_reset_distance();
@@ -580,15 +561,6 @@ static int __init numa_init(int (*init_func)(void))
 	if (ret < 0)
 		return ret;
 
-	for (i = 0; i < nr_cpu_ids; i++) {
-		int nid = early_cpu_to_node(i);
-
-		if (nid == NUMA_NO_NODE)
-			continue;
-		if (!node_online(nid))
-			numa_clear_node(i);
-	}
-	numa_init_array();
 	return 0;
 }
 
@@ -621,7 +593,7 @@ static int __init dummy_numa_init(void)
  * last fallback is dummy single node config encomapssing whole memory and
  * never fails.
  */
-void __init x86_numa_init(void)
+static void __init early_x86_numa_init(void)
 {
 	if (!numa_off) {
 #ifdef CONFIG_X86_NUMAQ
@@ -641,6 +613,43 @@ void __init x86_numa_init(void)
 	numa_init(dummy_numa_init);
 }
 
+void __init x86_numa_init(void)
+{
+	int i, nid;
+	struct numa_meminfo *mi = &numa_meminfo;
+
+	early_x86_numa_init();
+
+	/* Finally register nodes. */
+	for_each_node_mask(nid, node_possible_map) {
+		u64 start = PFN_PHYS(max_pfn);
+		u64 end = 0;
+
+		for (i = 0; i < mi->nr_blks; i++) {
+			if (nid != mi->blk[i].nid)
+				continue;
+			start = min(mi->blk[i].start, start);
+			end = max(mi->blk[i].end, end);
+		}
+
+		if (start < end)
+			setup_node_data(nid, start, end); /* online is set */
+	}
+
+	/* Dump memblock with node info */
+	memblock_dump_all();
+
+	for (i = 0; i < nr_cpu_ids; i++) {
+		int nid = early_cpu_to_node(i);
+
+		if (nid == NUMA_NO_NODE)
+			continue;
+		if (!node_online(nid))
+			numa_clear_node(i);
+	}
+	numa_init_array();
+}
+
 static __init int find_near_online_node(int node)
 {
 	int n, val;
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v2 10/20] x86, mm, numa: Call numa_meminfo_cover_memory() checking early
  2013-03-10  6:44 [PATCH v2 00/20] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (8 preceding siblings ...)
  2013-03-10  6:44 ` [PATCH v2 09/20] x86, mm, numa: Move two functions calling on successful path later Yinghai Lu
@ 2013-03-10  6:44 ` Yinghai Lu
  2013-03-10  6:44 ` [PATCH v2 11/20] x86, mm, numa: Move node_map_pfn alignment() to x86 Yinghai Lu
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 45+ messages in thread
From: Yinghai Lu @ 2013-03-10  6:44 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Tejun Heo, Thomas Renninger, Tang Chen
  Cc: linux-kernel, Yinghai Lu

For the separation, we need to set the memblock nid later, as doing so can
change the memblock array and possibly double the memblock.memory array,
which would need to allocate a buffer.

We do not need the nid in memblock to find out absent pages, so we can move
the numa_meminfo_cover_memory() check earlier.

Also change __absent_pages_in_range() to static and use
absent_pages_in_range() directly.

Later we can set the memblock nid just once, on the successful path.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/mm/numa.c |    7 ++++---
 include/linux/mm.h |    2 --
 mm/page_alloc.c    |    2 +-
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index d545638..b7173f6 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -460,7 +460,7 @@ static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
 		u64 s = mi->blk[i].start >> PAGE_SHIFT;
 		u64 e = mi->blk[i].end >> PAGE_SHIFT;
 		numaram += e - s;
-		numaram -= __absent_pages_in_range(mi->blk[i].nid, s, e);
+		numaram -= absent_pages_in_range(s, e);
 		if ((s64)numaram < 0)
 			numaram = 0;
 	}
@@ -488,6 +488,9 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
 	if (WARN_ON(nodes_empty(node_possible_map)))
 		return -EINVAL;
 
+	if (!numa_meminfo_cover_memory(mi))
+		return -EINVAL;
+
 	for (i = 0; i < mi->nr_blks; i++) {
 		struct numa_memblk *mb = &mi->blk[i];
 		memblock_set_node(mb->start, mb->end - mb->start, mb->nid);
@@ -506,8 +509,6 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
 		return -EINVAL;
 	}
 #endif
-	if (!numa_meminfo_cover_memory(mi))
-		return -EINVAL;
 
 	return 0;
 }
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7acc9dc..2ae2050 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1324,8 +1324,6 @@ extern void free_initmem(void);
  */
 extern void free_area_init_nodes(unsigned long *max_zone_pfn);
 unsigned long node_map_pfn_alignment(void);
-unsigned long __absent_pages_in_range(int nid, unsigned long start_pfn,
-						unsigned long end_pfn);
 extern unsigned long absent_pages_in_range(unsigned long start_pfn,
 						unsigned long end_pfn);
 extern void get_pfn_range_for_nid(unsigned int nid,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8fcced7..580d919 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4356,7 +4356,7 @@ static unsigned long __meminit zone_spanned_pages_in_node(int nid,
  * Return the number of holes in a range on a node. If nid is MAX_NUMNODES,
  * then all holes in the requested range will be accounted for.
  */
-unsigned long __meminit __absent_pages_in_range(int nid,
+static unsigned long __meminit __absent_pages_in_range(int nid,
 				unsigned long range_start_pfn,
 				unsigned long range_end_pfn)
 {
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v2 11/20] x86, mm, numa: Move node_map_pfn alignment() to x86
  2013-03-10  6:44 [PATCH v2 00/20] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (9 preceding siblings ...)
  2013-03-10  6:44 ` [PATCH v2 10/20] x86, mm, numa: Call numa_meminfo_cover_memory() checking early Yinghai Lu
@ 2013-03-10  6:44 ` Yinghai Lu
  2013-03-10  6:44 ` [PATCH v2 12/20] x86, mm, numa: Use numa_meminfo to check node_map_pfn alignment Yinghai Lu
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 45+ messages in thread
From: Yinghai Lu @ 2013-03-10  6:44 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Tejun Heo, Thomas Renninger, Tang Chen
  Cc: linux-kernel, Yinghai Lu

Move node_map_pfn_alignment() to arch/x86/mm, as there is no other user
of it.

It will be updated to use numa_meminfo instead of memblock.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/mm/numa.c |   50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/mm.h |    1 -
 mm/page_alloc.c    |   50 --------------------------------------------------
 3 files changed, 50 insertions(+), 51 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index b7173f6..24155b2 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -477,6 +477,56 @@ static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
 	return true;
 }
 
+/**
+ * node_map_pfn_alignment - determine the maximum internode alignment
+ *
+ * This function should be called after node map is populated and sorted.
+ * It calculates the maximum power of two alignment which can distinguish
+ * all the nodes.
+ *
+ * For example, if all nodes are 1GiB and aligned to 1GiB, the return value
+ * would indicate 1GiB alignment with (1 << (30 - PAGE_SHIFT)).  If the
+ * nodes are shifted by 256MiB, 256MiB.  Note that if only the last node is
+ * shifted, 1GiB is enough and this function will indicate so.
+ *
+ * This is used to test whether pfn -> nid mapping of the chosen memory
+ * model has fine enough granularity to avoid incorrect mapping for the
+ * populated node map.
+ *
+ * Returns the determined alignment in pfn's.  0 if there is no alignment
+ * requirement (single node).
+ */
+unsigned long __init node_map_pfn_alignment(void)
+{
+	unsigned long accl_mask = 0, last_end = 0;
+	unsigned long start, end, mask;
+	int last_nid = -1;
+	int i, nid;
+
+	for_each_mem_pfn_range(i, MAX_NUMNODES, &start, &end, &nid) {
+		if (!start || last_nid < 0 || last_nid == nid) {
+			last_nid = nid;
+			last_end = end;
+			continue;
+		}
+
+		/*
+		 * Start with a mask granular enough to pin-point to the
+		 * start pfn and tick off bits one-by-one until it becomes
+		 * too coarse to separate the current node from the last.
+		 */
+		mask = ~((1 << __ffs(start)) - 1);
+		while (mask && last_end <= (start & (mask << 1)))
+			mask <<= 1;
+
+		/* accumulate all internode masks */
+		accl_mask |= mask;
+	}
+
+	/* convert mask to number of pages */
+	return ~accl_mask + 1;
+}
+
 static int __init numa_register_memblks(struct numa_meminfo *mi)
 {
 	unsigned long uninitialized_var(pfn_align);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2ae2050..1c79b10 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1323,7 +1323,6 @@ extern void free_initmem(void);
  * CONFIG_HAVE_MEMBLOCK_NODE_MAP.
  */
 extern void free_area_init_nodes(unsigned long *max_zone_pfn);
-unsigned long node_map_pfn_alignment(void);
 extern unsigned long absent_pages_in_range(unsigned long start_pfn,
 						unsigned long end_pfn);
 extern void get_pfn_range_for_nid(unsigned int nid,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 580d919..f368db4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4725,56 +4725,6 @@ static inline void setup_nr_node_ids(void)
 }
 #endif
 
-/**
- * node_map_pfn_alignment - determine the maximum internode alignment
- *
- * This function should be called after node map is populated and sorted.
- * It calculates the maximum power of two alignment which can distinguish
- * all the nodes.
- *
- * For example, if all nodes are 1GiB and aligned to 1GiB, the return value
- * would indicate 1GiB alignment with (1 << (30 - PAGE_SHIFT)).  If the
- * nodes are shifted by 256MiB, 256MiB.  Note that if only the last node is
- * shifted, 1GiB is enough and this function will indicate so.
- *
- * This is used to test whether pfn -> nid mapping of the chosen memory
- * model has fine enough granularity to avoid incorrect mapping for the
- * populated node map.
- *
- * Returns the determined alignment in pfn's.  0 if there is no alignment
- * requirement (single node).
- */
-unsigned long __init node_map_pfn_alignment(void)
-{
-	unsigned long accl_mask = 0, last_end = 0;
-	unsigned long start, end, mask;
-	int last_nid = -1;
-	int i, nid;
-
-	for_each_mem_pfn_range(i, MAX_NUMNODES, &start, &end, &nid) {
-		if (!start || last_nid < 0 || last_nid == nid) {
-			last_nid = nid;
-			last_end = end;
-			continue;
-		}
-
-		/*
-		 * Start with a mask granular enough to pin-point to the
-		 * start pfn and tick off bits one-by-one until it becomes
-		 * too coarse to separate the current node from the last.
-		 */
-		mask = ~((1 << __ffs(start)) - 1);
-		while (mask && last_end <= (start & (mask << 1)))
-			mask <<= 1;
-
-		/* accumulate all internode masks */
-		accl_mask |= mask;
-	}
-
-	/* convert mask to number of pages */
-	return ~accl_mask + 1;
-}
-
 /* Find the lowest pfn for a node */
 static unsigned long __init find_min_pfn_for_node(int nid)
 {
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v2 12/20] x86, mm, numa: Use numa_meminfo to check node_map_pfn alignment
  2013-03-10  6:44 [PATCH v2 00/20] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (10 preceding siblings ...)
  2013-03-10  6:44 ` [PATCH v2 11/20] x86, mm, numa: Move node_map_pfn alignment() to x86 Yinghai Lu
@ 2013-03-10  6:44 ` Yinghai Lu
  2013-03-10  6:44 ` [PATCH v2 13/20] x86, mm, numa: Set memblock nid later Yinghai Lu
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 45+ messages in thread
From: Yinghai Lu @ 2013-03-10  6:44 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Tejun Heo, Thomas Renninger, Tang Chen
  Cc: linux-kernel, Yinghai Lu

We can use numa_meminfo directly instead of the memblock nid.

That way, setting the memblock nid can be moved down and done only once,
on the successful path.

-v2: as tj suggested, split the moving into a separate patch.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/mm/numa.c |   30 +++++++++++++++++++-----------
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 24155b2..fcaeba9 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -496,14 +496,18 @@ static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
  * Returns the determined alignment in pfn's.  0 if there is no alignment
  * requirement (single node).
  */
-unsigned long __init node_map_pfn_alignment(void)
+#ifdef NODE_NOT_IN_PAGE_FLAGS
+static unsigned long __init node_map_pfn_alignment(struct numa_meminfo *mi)
 {
 	unsigned long accl_mask = 0, last_end = 0;
 	unsigned long start, end, mask;
 	int last_nid = -1;
 	int i, nid;
 
-	for_each_mem_pfn_range(i, MAX_NUMNODES, &start, &end, &nid) {
+	for (i = 0; i < mi->nr_blks; i++) {
+		start = mi->blk[i].start >> PAGE_SHIFT;
+		end = mi->blk[i].end >> PAGE_SHIFT;
+		nid = mi->blk[i].nid;
 		if (!start || last_nid < 0 || last_nid == nid) {
 			last_nid = nid;
 			last_end = end;
@@ -526,10 +530,16 @@ unsigned long __init node_map_pfn_alignment(void)
 	/* convert mask to number of pages */
 	return ~accl_mask + 1;
 }
+#else
+static unsigned long __init node_map_pfn_alignment(struct numa_meminfo *mi)
+{
+	return 0;
+}
+#endif
 
 static int __init numa_register_memblks(struct numa_meminfo *mi)
 {
-	unsigned long uninitialized_var(pfn_align);
+	unsigned long pfn_align;
 	int i;
 
 	/* Account for nodes with cpus and no memory */
@@ -541,24 +551,22 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
 	if (!numa_meminfo_cover_memory(mi))
 		return -EINVAL;
 
-	for (i = 0; i < mi->nr_blks; i++) {
-		struct numa_memblk *mb = &mi->blk[i];
-		memblock_set_node(mb->start, mb->end - mb->start, mb->nid);
-	}
-
 	/*
 	 * If sections array is gonna be used for pfn -> nid mapping, check
 	 * whether its granularity is fine enough.
 	 */
-#ifdef NODE_NOT_IN_PAGE_FLAGS
-	pfn_align = node_map_pfn_alignment();
+	pfn_align = node_map_pfn_alignment(mi);
 	if (pfn_align && pfn_align < PAGES_PER_SECTION) {
 		printk(KERN_WARNING "Node alignment %LuMB < min %LuMB, rejecting NUMA config\n",
 		       PFN_PHYS(pfn_align) >> 20,
 		       PFN_PHYS(PAGES_PER_SECTION) >> 20);
 		return -EINVAL;
 	}
-#endif
+
+	for (i = 0; i < mi->nr_blks; i++) {
+		struct numa_memblk *mb = &mi->blk[i];
+		memblock_set_node(mb->start, mb->end - mb->start, mb->nid);
+	}
 
 	return 0;
 }
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v2 13/20] x86, mm, numa: Set memblock nid later
  2013-03-10  6:44 [PATCH v2 00/20] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (11 preceding siblings ...)
  2013-03-10  6:44 ` [PATCH v2 12/20] x86, mm, numa: Use numa_meminfo to check node_map_pfn alignment Yinghai Lu
@ 2013-03-10  6:44 ` Yinghai Lu
  2013-03-10  6:44 ` [PATCH v2 14/20] x86, mm, numa: Move node_possible_map setting later Yinghai Lu
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 45+ messages in thread
From: Yinghai Lu @ 2013-03-10  6:44 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Tejun Heo, Thomas Renninger, Tang Chen
  Cc: linux-kernel, Yinghai Lu

For the separation, we need to set the memblock nid later: doing so can
change the memblock array, and possibly double the memblock.memory array,
which needs to allocate a buffer.

Only set the memblock nid once, on the successful path.

Also rename numa_register_memblks() to numa_check_memblks(), now that the
code setting the memblock nid has been moved out.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/mm/numa.c |   16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index fcaeba9..e2ddcbd 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -537,10 +537,9 @@ static unsigned long __init node_map_pfn_alignment(struct numa_meminfo *mi)
 }
 #endif
 
-static int __init numa_register_memblks(struct numa_meminfo *mi)
+static int __init numa_check_memblks(struct numa_meminfo *mi)
 {
 	unsigned long pfn_align;
-	int i;
 
 	/* Account for nodes with cpus and no memory */
 	node_possible_map = numa_nodes_parsed;
@@ -563,11 +562,6 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
 		return -EINVAL;
 	}
 
-	for (i = 0; i < mi->nr_blks; i++) {
-		struct numa_memblk *mb = &mi->blk[i];
-		memblock_set_node(mb->start, mb->end - mb->start, mb->nid);
-	}
-
 	return 0;
 }
 
@@ -604,7 +598,6 @@ static int __init numa_init(int (*init_func)(void))
 	nodes_clear(numa_nodes_parsed);
 	nodes_clear(node_possible_map);
 	memset(&numa_meminfo, 0, sizeof(numa_meminfo));
-	WARN_ON(memblock_set_node(0, ULLONG_MAX, MAX_NUMNODES));
 	numa_reset_distance();
 
 	ret = init_func();
@@ -616,7 +609,7 @@ static int __init numa_init(int (*init_func)(void))
 
 	numa_emulation(&numa_meminfo, numa_distance_cnt);
 
-	ret = numa_register_memblks(&numa_meminfo);
+	ret = numa_check_memblks(&numa_meminfo);
 	if (ret < 0)
 		return ret;
 
@@ -679,6 +672,11 @@ void __init x86_numa_init(void)
 
 	early_x86_numa_init();
 
+	for (i = 0; i < mi->nr_blks; i++) {
+		struct numa_memblk *mb = &mi->blk[i];
+		memblock_set_node(mb->start, mb->end - mb->start, mb->nid);
+	}
+
 	/* Finally register nodes. */
 	for_each_node_mask(nid, node_possible_map) {
 		u64 start = PFN_PHYS(max_pfn);
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v2 14/20] x86, mm, numa: Move node_possible_map setting later
  2013-03-10  6:44 [PATCH v2 00/20] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (12 preceding siblings ...)
  2013-03-10  6:44 ` [PATCH v2 13/20] x86, mm, numa: Set memblock nid later Yinghai Lu
@ 2013-03-10  6:44 ` Yinghai Lu
  2013-03-10  6:44 ` [PATCH v2 15/20] x86, mm, numa: Move emulation handling down Yinghai Lu
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 45+ messages in thread
From: Yinghai Lu @ 2013-03-10  6:44 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Tejun Heo, Thomas Renninger, Tang Chen
  Cc: linux-kernel, Yinghai Lu

Move node_possible_map handling out of numa_check_memblks() to avoid
side effects in numa_check_memblks().

Set it only once, on the successful path, instead of resetting it in
numa_init() every time.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/mm/numa.c |   11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index e2ddcbd..1d5fa08 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -539,12 +539,13 @@ static unsigned long __init node_map_pfn_alignment(struct numa_meminfo *mi)
 
 static int __init numa_check_memblks(struct numa_meminfo *mi)
 {
+	nodemask_t nodes_parsed;
 	unsigned long pfn_align;
 
 	/* Account for nodes with cpus and no memory */
-	node_possible_map = numa_nodes_parsed;
-	numa_nodemask_from_meminfo(&node_possible_map, mi);
-	if (WARN_ON(nodes_empty(node_possible_map)))
+	nodes_parsed = numa_nodes_parsed;
+	numa_nodemask_from_meminfo(&nodes_parsed, mi);
+	if (WARN_ON(nodes_empty(nodes_parsed)))
 		return -EINVAL;
 
 	if (!numa_meminfo_cover_memory(mi))
@@ -596,7 +597,6 @@ static int __init numa_init(int (*init_func)(void))
 		set_apicid_to_node(i, NUMA_NO_NODE);
 
 	nodes_clear(numa_nodes_parsed);
-	nodes_clear(node_possible_map);
 	memset(&numa_meminfo, 0, sizeof(numa_meminfo));
 	numa_reset_distance();
 
@@ -672,6 +672,9 @@ void __init x86_numa_init(void)
 
 	early_x86_numa_init();
 
+	node_possible_map = numa_nodes_parsed;
+	numa_nodemask_from_meminfo(&node_possible_map, mi);
+
 	for (i = 0; i < mi->nr_blks; i++) {
 		struct numa_memblk *mb = &mi->blk[i];
 		memblock_set_node(mb->start, mb->end - mb->start, mb->nid);
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v2 15/20] x86, mm, numa: Move emulation handling down.
  2013-03-10  6:44 [PATCH v2 00/20] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (13 preceding siblings ...)
  2013-03-10  6:44 ` [PATCH v2 14/20] x86, mm, numa: Move node_possible_map setting later Yinghai Lu
@ 2013-03-10  6:44 ` Yinghai Lu
  2013-03-10  6:44   ` Yinghai Lu
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 45+ messages in thread
From: Yinghai Lu @ 2013-03-10  6:44 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Tejun Heo, Thomas Renninger, Tang Chen
  Cc: linux-kernel, Yinghai Lu, David Rientjes

It needs to allocate buffers for the new numa_meminfo and the distance
matrix, so move it down.

This also changes the behavior:
before this patch, if the user passed bad data on the command line, we
would fall back to the next numa probing method or disable numa.
after this patch, if the user passes bad data on the command line, we
stay with the numa info probed before, e.g. from acpi srat or amd_numa.

We need to call numa_check_memblks() to reject wrong user input early,
so keep the original numa_meminfo unchanged.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
---
 arch/x86/mm/numa.c           |    6 +++---
 arch/x86/mm/numa_emulation.c |    2 +-
 arch/x86/mm/numa_internal.h  |    2 ++
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 1d5fa08..90fd123 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -537,7 +537,7 @@ static unsigned long __init node_map_pfn_alignment(struct numa_meminfo *mi)
 }
 #endif
 
-static int __init numa_check_memblks(struct numa_meminfo *mi)
+int __init numa_check_memblks(struct numa_meminfo *mi)
 {
 	nodemask_t nodes_parsed;
 	unsigned long pfn_align;
@@ -607,8 +607,6 @@ static int __init numa_init(int (*init_func)(void))
 	if (ret < 0)
 		return ret;
 
-	numa_emulation(&numa_meminfo, numa_distance_cnt);
-
 	ret = numa_check_memblks(&numa_meminfo);
 	if (ret < 0)
 		return ret;
@@ -672,6 +670,8 @@ void __init x86_numa_init(void)
 
 	early_x86_numa_init();
 
+	numa_emulation(&numa_meminfo, numa_distance_cnt);
+
 	node_possible_map = numa_nodes_parsed;
 	numa_nodemask_from_meminfo(&node_possible_map, mi);
 
diff --git a/arch/x86/mm/numa_emulation.c b/arch/x86/mm/numa_emulation.c
index dbbbb47..5a0433d 100644
--- a/arch/x86/mm/numa_emulation.c
+++ b/arch/x86/mm/numa_emulation.c
@@ -348,7 +348,7 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
 	if (ret < 0)
 		goto no_emu;
 
-	if (numa_cleanup_meminfo(&ei) < 0) {
+	if (numa_cleanup_meminfo(&ei) < 0 || numa_check_memblks(&ei) < 0) {
 		pr_warning("NUMA: Warning: constructed meminfo invalid, disabling emulation\n");
 		goto no_emu;
 	}
diff --git a/arch/x86/mm/numa_internal.h b/arch/x86/mm/numa_internal.h
index ad86ec9..bb2fbcc 100644
--- a/arch/x86/mm/numa_internal.h
+++ b/arch/x86/mm/numa_internal.h
@@ -21,6 +21,8 @@ void __init numa_reset_distance(void);
 
 void __init x86_numa_init(void);
 
+int __init numa_check_memblks(struct numa_meminfo *mi);
+
 #ifdef CONFIG_NUMA_EMU
 void __init numa_emulation(struct numa_meminfo *numa_meminfo,
 			   int numa_dist_cnt);
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v2 16/20] x86, ACPI, numa, ia64: split SLIT handling out
  2013-03-10  6:44 [PATCH v2 00/20] x86, ACPI, numa: Parse numa info early Yinghai Lu
@ 2013-03-10  6:44   ` Yinghai Lu
  2013-03-10  6:44 ` [PATCH v2 02/20] x86, microcode: Use common get_ramdisk_image() Yinghai Lu
                     ` (18 subsequent siblings)
  19 siblings, 0 replies; 45+ messages in thread
From: Yinghai Lu @ 2013-03-10  6:44 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Tejun Heo, Thomas Renninger, Tang Chen
  Cc: linux-kernel, Yinghai Lu, Rafael J. Wysocki, linux-acpi,
	Tony Luck, Fenghua Yu, linux-ia64

We need to handle SLIT later, as it needs to allocate a buffer for the
distance matrix. Also, we do not need SLIT info before init_mem_mapping().

So move SLIT parsing later.

x86_acpi_numa_init() becomes x86_acpi_numa_init_srat()/x86_acpi_numa_init_slit().

It should not break ia64: acpi_numa_init is replaced with
acpi_numa_init_srat/acpi_numa_init_slit/acpi_numa_arch_fixup.

-v2: Change names to acpi_numa_init_srat/acpi_numa_init_slit according to tj.
     Remove the reset_numa_distance() in numa_init(), as now we only set
     the distance in SLIT handling.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: linux-acpi@vger.kernel.org
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: linux-ia64@vger.kernel.org
---
 arch/ia64/kernel/setup.c    |    4 +++-
 arch/x86/include/asm/acpi.h |    3 ++-
 arch/x86/mm/numa.c          |   14 ++++++++++++--
 arch/x86/mm/srat.c          |   11 +++++++----
 drivers/acpi/numa.c         |   13 +++++++------
 include/linux/acpi.h        |    3 ++-
 6 files changed, 33 insertions(+), 15 deletions(-)

diff --git a/arch/ia64/kernel/setup.c b/arch/ia64/kernel/setup.c
index 2029cc0..6a2efb5 100644
--- a/arch/ia64/kernel/setup.c
+++ b/arch/ia64/kernel/setup.c
@@ -558,7 +558,9 @@ setup_arch (char **cmdline_p)
 	acpi_table_init();
 	early_acpi_boot_init();
 # ifdef CONFIG_ACPI_NUMA
-	acpi_numa_init();
+	acpi_numa_init_srat();
+	acpi_numa_init_slit();
+	acpi_numa_arch_fixup();
 #  ifdef CONFIG_ACPI_HOTPLUG_CPU
 	prefill_possible_map();
 #  endif
diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
index b31bf97..651db0b 100644
--- a/arch/x86/include/asm/acpi.h
+++ b/arch/x86/include/asm/acpi.h
@@ -178,7 +178,8 @@ static inline void disable_acpi(void) { }
 
 #ifdef CONFIG_ACPI_NUMA
 extern int acpi_numa;
-extern int x86_acpi_numa_init(void);
+int x86_acpi_numa_init_srat(void);
+void x86_acpi_numa_init_slit(void);
 #endif /* CONFIG_ACPI_NUMA */
 
 #define acpi_unlazy_tlb(x)	leave_mm(x)
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 90fd123..182e085 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -598,7 +598,6 @@ static int __init numa_init(int (*init_func)(void))
 
 	nodes_clear(numa_nodes_parsed);
 	memset(&numa_meminfo, 0, sizeof(numa_meminfo));
-	numa_reset_distance();
 
 	ret = init_func();
 	if (ret < 0)
@@ -636,6 +635,10 @@ static int __init dummy_numa_init(void)
 	return 0;
 }
 
+#ifdef CONFIG_ACPI_NUMA
+static bool srat_used __initdata;
+#endif
+
 /**
  * x86_numa_init - Initialize NUMA
  *
@@ -651,8 +654,10 @@ static void __init early_x86_numa_init(void)
 			return;
 #endif
 #ifdef CONFIG_ACPI_NUMA
-		if (!numa_init(x86_acpi_numa_init))
+		if (!numa_init(x86_acpi_numa_init_srat)) {
+			srat_used = true;
 			return;
+		}
 #endif
 #ifdef CONFIG_AMD_NUMA
 		if (!numa_init(amd_numa_init))
@@ -670,6 +675,11 @@ void __init x86_numa_init(void)
 
 	early_x86_numa_init();
 
+#ifdef CONFIG_ACPI_NUMA
+	if (srat_used)
+		x86_acpi_numa_init_slit();
+#endif
+
 	numa_emulation(&numa_meminfo, numa_distance_cnt);
 
 	node_possible_map = numa_nodes_parsed;
diff --git a/arch/x86/mm/srat.c b/arch/x86/mm/srat.c
index cdd0da9..443f9ef 100644
--- a/arch/x86/mm/srat.c
+++ b/arch/x86/mm/srat.c
@@ -185,14 +185,17 @@ out_err:
 	return -1;
 }
 
-void __init acpi_numa_arch_fixup(void) {}
-
-int __init x86_acpi_numa_init(void)
+int __init x86_acpi_numa_init_srat(void)
 {
 	int ret;
 
-	ret = acpi_numa_init();
+	ret = acpi_numa_init_srat();
 	if (ret < 0)
 		return ret;
 	return srat_disabled() ? -EINVAL : 0;
 }
+
+void __init x86_acpi_numa_init_slit(void)
+{
+	acpi_numa_init_slit();
+}
diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
index 33e609f..6460db4 100644
--- a/drivers/acpi/numa.c
+++ b/drivers/acpi/numa.c
@@ -282,7 +282,7 @@ acpi_table_parse_srat(enum acpi_srat_type id,
 					    handler, max_entries);
 }
 
-int __init acpi_numa_init(void)
+int __init acpi_numa_init_srat(void)
 {
 	int cnt = 0;
 
@@ -303,11 +303,6 @@ int __init acpi_numa_init(void)
 					    NR_NODE_MEMBLKS);
 	}
 
-	/* SLIT: System Locality Information Table */
-	acpi_table_parse(ACPI_SIG_SLIT, acpi_parse_slit);
-
-	acpi_numa_arch_fixup();
-
 	if (cnt < 0)
 		return cnt;
 	else if (!parsed_numa_memblks)
@@ -315,6 +310,12 @@ int __init acpi_numa_init(void)
 	return 0;
 }
 
+void __init acpi_numa_init_slit(void)
+{
+	/* SLIT: System Locality Information Table */
+	acpi_table_parse(ACPI_SIG_SLIT, acpi_parse_slit);
+}
+
 int acpi_get_pxm(acpi_handle h)
 {
 	unsigned long long pxm;
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 4b943e6..4a78235 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -85,7 +85,8 @@ int early_acpi_boot_init(void);
 int acpi_boot_init (void);
 void acpi_boot_table_init (void);
 int acpi_mps_check (void);
-int acpi_numa_init (void);
+int acpi_numa_init_srat(void);
+void acpi_numa_init_slit(void);
 
 int acpi_table_init (void);
 int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v2 17/20] x86, mm, numa: Add early_initmem_init() stub
  2013-03-10  6:44 [PATCH v2 00/20] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (15 preceding siblings ...)
  2013-03-10  6:44   ` Yinghai Lu
@ 2013-03-10  6:44 ` Yinghai Lu
  2013-03-10  6:44 ` [PATCH v2 18/20] x86, mm: Parse numa info early Yinghai Lu
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 45+ messages in thread
From: Yinghai Lu @ 2013-03-10  6:44 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Tejun Heo, Thomas Renninger, Tang Chen
  Cc: linux-kernel, Yinghai Lu, Pekka Enberg, Jacob Shin

early_initmem_init() calls early_x86_numa_init() to parse numa info early.

A later patch will call init_mem_mapping() for every node from there.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Jacob Shin <jacob.shin@amd.com>
---
 arch/x86/include/asm/page_types.h |    1 +
 arch/x86/kernel/setup.c           |    1 +
 arch/x86/mm/init.c                |    6 ++++++
 arch/x86/mm/numa.c                |    7 +++++--
 4 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index b012b82..d04dd8c 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -55,6 +55,7 @@ bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn);
 extern unsigned long init_memory_mapping(unsigned long start,
 					 unsigned long end);
 
+void early_initmem_init(void);
 extern void initmem_init(void);
 
 #endif	/* !__ASSEMBLY__ */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index b067663..626bc9f 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1135,6 +1135,7 @@ void __init setup_arch(char **cmdline_p)
 
 	early_acpi_boot_init();
 
+	early_initmem_init();
 	initmem_init();
 	memblock_find_dma_reserve();
 
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index abcc241..28b294f 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -450,6 +450,12 @@ void __init init_mem_mapping(void)
 	early_memtest(0, max_pfn_mapped << PAGE_SHIFT);
 }
 
+#ifndef CONFIG_NUMA
+void __init early_initmem_init(void)
+{
+}
+#endif
+
 /*
  * devmem_is_allowed() checks to see if /dev/mem access to a certain address
  * is valid. The argument is a physical page number.
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 182e085..c2d4653 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -668,13 +668,16 @@ static void __init early_x86_numa_init(void)
 	numa_init(dummy_numa_init);
 }
 
+void __init early_initmem_init(void)
+{
+	early_x86_numa_init();
+}
+
 void __init x86_numa_init(void)
 {
 	int i, nid;
 	struct numa_meminfo *mi = &numa_meminfo;
 
-	early_x86_numa_init();
-
 #ifdef CONFIG_ACPI_NUMA
 	if (srat_used)
 		x86_acpi_numa_init_slit();
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v2 18/20] x86, mm: Parse numa info early
  2013-03-10  6:44 [PATCH v2 00/20] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (16 preceding siblings ...)
  2013-03-10  6:44 ` [PATCH v2 17/20] x86, mm, numa: Add early_initmem_init() stub Yinghai Lu
@ 2013-03-10  6:44 ` Yinghai Lu
  2013-03-10  6:44 ` [PATCH v2 19/20] x86, mm: Make init_mem_mapping be able to be called several times Yinghai Lu
  2013-03-10  6:44 ` [PATCH v2 20/20] x86, mm, numa: Put pagetable on local node ram for 64bit Yinghai Lu
  19 siblings, 0 replies; 45+ messages in thread
From: Yinghai Lu @ 2013-03-10  6:44 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Tejun Heo, Thomas Renninger, Tang Chen
  Cc: linux-kernel, Yinghai Lu, Pekka Enberg, Jacob Shin

Parsing numa info has now been separated into two functions.

early_initmem_init() only parses info into numa_meminfo and
nodes_parsed, and still keeps the numaq, acpi_numa, amd_numa, dummy
fallback sequence working.

SLIT and numa emulation handling are still left in initmem_init().

Call early_initmem_init() before init_mem_mapping() to prepare for
using the numa info there.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Jacob Shin <jacob.shin@amd.com>
---
 arch/x86/kernel/setup.c |   24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 626bc9f..86e1ec0 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1098,13 +1098,21 @@ void __init setup_arch(char **cmdline_p)
 	trim_platform_memory_ranges();
 	trim_low_memory_range();
 
+	/*
+	 * Parse the ACPI tables for possible boot-time SMP configuration.
+	 */
+	acpi_initrd_override_copy();
+	acpi_boot_table_init();
+	early_acpi_boot_init();
+	early_initmem_init();
 	init_mem_mapping();
-
+	memblock.current_limit = get_max_mapped();
 	early_trap_pf_init();
 
+	reserve_initrd();
+
 	setup_real_mode();
 
-	memblock.current_limit = get_max_mapped();
 	dma_contiguous_reserve(0);
 
 	/*
@@ -1118,24 +1126,12 @@ void __init setup_arch(char **cmdline_p)
 	/* Allocate bigger log buffer */
 	setup_log_buf(1);
 
-	reserve_initrd();
-
-	acpi_initrd_override_copy();
-
 	reserve_crashkernel();
 
 	vsmp_init();
 
 	io_delay_init();
 
-	/*
-	 * Parse the ACPI tables for possible boot-time SMP configuration.
-	 */
-	acpi_boot_table_init();
-
-	early_acpi_boot_init();
-
-	early_initmem_init();
 	initmem_init();
 	memblock_find_dma_reserve();
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v2 19/20] x86, mm: Make init_mem_mapping be able to be called several times
  2013-03-10  6:44 [PATCH v2 00/20] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (17 preceding siblings ...)
  2013-03-10  6:44 ` [PATCH v2 18/20] x86, mm: Parse numa info early Yinghai Lu
@ 2013-03-10  6:44 ` Yinghai Lu
  2013-03-11 13:16   ` Konrad Rzeszutek Wilk
  2013-03-10  6:44 ` [PATCH v2 20/20] x86, mm, numa: Put pagetable on local node ram for 64bit Yinghai Lu
  19 siblings, 1 reply; 45+ messages in thread
From: Yinghai Lu @ 2013-03-10  6:44 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Tejun Heo, Thomas Renninger, Tang Chen
  Cc: linux-kernel, Yinghai Lu, Pekka Enberg, Jacob Shin,
	Konrad Rzeszutek Wilk

Prepare to put page tables on local nodes.

Move the call to init_mem_mapping() into early_initmem_init().

Rework alloc_low_pages() to allocate page tables in the following order:
	BRK, local node, low range

Still only load_cr3 once, otherwise we would break xen 64bit again.
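
A condensed sketch of that fallback order (not the patch itself; brk_alloc()
and range_alloc() are hypothetical helpers standing in for the BRK path and
memblock_find_in_range(), and the pfn-range variables are the ones this
patch introduces):

	if (pgt_buf_end + num <= pgt_buf_top && can_use_brk_pgt)
		return brk_alloc(num);				/* 1. BRK */

	if (local_min_pfn_mapped < local_max_pfn_mapped)	/* 2. local node */
		return range_alloc(local_min_pfn_mapped,
				   local_max_pfn_mapped, num);

	if (low_min_pfn_mapped < low_max_pfn_mapped)		/* 3. low range */
		return range_alloc(low_min_pfn_mapped,
				   low_max_pfn_mapped, num);

	panic("alloc_low_page: ran out of memory");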

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Jacob Shin <jacob.shin@amd.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/include/asm/pgtable.h |    2 +-
 arch/x86/kernel/setup.c        |    1 -
 arch/x86/mm/init.c             |   88 ++++++++++++++++++++++++----------------
 arch/x86/mm/numa.c             |   24 +++++++++++
 4 files changed, 79 insertions(+), 36 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 1e67223..868687c 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -621,7 +621,7 @@ static inline int pgd_none(pgd_t pgd)
 #ifndef __ASSEMBLY__
 
 extern int direct_gbpages;
-void init_mem_mapping(void);
+void init_mem_mapping(unsigned long begin, unsigned long end);
 void early_alloc_pgt_buf(void);
 
 /* local pte updates need not use xchg for locking */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 86e1ec0..1cdc1a7 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1105,7 +1105,6 @@ void __init setup_arch(char **cmdline_p)
 	acpi_boot_table_init();
 	early_acpi_boot_init();
 	early_initmem_init();
-	init_mem_mapping();
 	memblock.current_limit = get_max_mapped();
 	early_trap_pf_init();
 
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 28b294f..8d0007a 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -24,7 +24,10 @@ static unsigned long __initdata pgt_buf_start;
 static unsigned long __initdata pgt_buf_end;
 static unsigned long __initdata pgt_buf_top;
 
-static unsigned long min_pfn_mapped;
+static unsigned long low_min_pfn_mapped;
+static unsigned long low_max_pfn_mapped;
+static unsigned long local_min_pfn_mapped;
+static unsigned long local_max_pfn_mapped;
 
 static bool __initdata can_use_brk_pgt = true;
 
@@ -52,10 +55,17 @@ __ref void *alloc_low_pages(unsigned int num)
 
 	if ((pgt_buf_end + num) > pgt_buf_top || !can_use_brk_pgt) {
 		unsigned long ret;
-		if (min_pfn_mapped >= max_pfn_mapped)
-			panic("alloc_low_page: ran out of memory");
-		ret = memblock_find_in_range(min_pfn_mapped << PAGE_SHIFT,
-					max_pfn_mapped << PAGE_SHIFT,
+		if (local_min_pfn_mapped >= local_max_pfn_mapped) {
+			if (low_min_pfn_mapped >= low_max_pfn_mapped)
+				panic("alloc_low_page: ran out of memory");
+			ret = memblock_find_in_range(
+					low_min_pfn_mapped << PAGE_SHIFT,
+					low_max_pfn_mapped << PAGE_SHIFT,
+					PAGE_SIZE * num , PAGE_SIZE);
+		} else
+			ret = memblock_find_in_range(
+					local_min_pfn_mapped << PAGE_SHIFT,
+					local_max_pfn_mapped << PAGE_SHIFT,
 					PAGE_SIZE * num , PAGE_SIZE);
 		if (!ret)
 			panic("alloc_low_page: can not alloc memory");
@@ -387,60 +397,75 @@ static unsigned long __init init_range_memory_mapping(
 
 /* (PUD_SHIFT-PMD_SHIFT)/2 */
 #define STEP_SIZE_SHIFT 5
-void __init init_mem_mapping(void)
+void __init init_mem_mapping(unsigned long begin, unsigned long end)
 {
-	unsigned long end, real_end, start, last_start;
+	unsigned long real_end, start, last_start;
 	unsigned long step_size;
 	unsigned long addr;
 	unsigned long mapped_ram_size = 0;
 	unsigned long new_mapped_ram_size;
+	bool is_low = false;
+
+	if (!begin) {
+		probe_page_size_mask();
+		/* the ISA range is always mapped regardless of memory holes */
+		init_memory_mapping(0, ISA_END_ADDRESS);
+		begin = ISA_END_ADDRESS;
+		is_low = true;
+	}
 
-	probe_page_size_mask();
-
-#ifdef CONFIG_X86_64
-	end = max_pfn << PAGE_SHIFT;
-#else
-	end = max_low_pfn << PAGE_SHIFT;
-#endif
-
-	/* the ISA range is always mapped regardless of memory holes */
-	init_memory_mapping(0, ISA_END_ADDRESS);
+	if (begin >= end)
+		return;
 
 	/* xen has big range in reserved near end of ram, skip it at first.*/
-	addr = memblock_find_in_range(ISA_END_ADDRESS, end, PMD_SIZE, PMD_SIZE);
+	addr = memblock_find_in_range(begin, end, PMD_SIZE, PMD_SIZE);
 	real_end = addr + PMD_SIZE;
 
 	/* step_size need to be small so pgt_buf from BRK could cover it */
 	step_size = PMD_SIZE;
-	max_pfn_mapped = 0; /* will get exact value next */
-	min_pfn_mapped = real_end >> PAGE_SHIFT;
+	local_max_pfn_mapped = begin >> PAGE_SHIFT;
+	local_min_pfn_mapped = real_end >> PAGE_SHIFT;
 	last_start = start = real_end;
-	while (last_start > ISA_END_ADDRESS) {
+	while (last_start > begin) {
 		if (last_start > step_size) {
 			start = round_down(last_start - 1, step_size);
-			if (start < ISA_END_ADDRESS)
-				start = ISA_END_ADDRESS;
+			if (start < begin)
+				start = begin;
 		} else
-			start = ISA_END_ADDRESS;
+			start = begin;
 		new_mapped_ram_size = init_range_memory_mapping(start,
 							last_start);
+		if ((last_start >> PAGE_SHIFT) > local_max_pfn_mapped)
+			local_max_pfn_mapped = last_start >> PAGE_SHIFT;
+		local_min_pfn_mapped = start >> PAGE_SHIFT;
 		last_start = start;
-		min_pfn_mapped = last_start >> PAGE_SHIFT;
 		/* only increase step_size after big range get mapped */
 		if (new_mapped_ram_size > mapped_ram_size)
 			step_size <<= STEP_SIZE_SHIFT;
 		mapped_ram_size += new_mapped_ram_size;
 	}
 
-	if (real_end < end)
+	if (real_end < end) {
 		init_range_memory_mapping(real_end, end);
+		if ((end >> PAGE_SHIFT) > local_max_pfn_mapped)
+			local_max_pfn_mapped = end >> PAGE_SHIFT;
+	}
 
+	if (is_low) {
+		low_min_pfn_mapped = local_min_pfn_mapped;
+		low_max_pfn_mapped = local_max_pfn_mapped;
+	}
+}
+
+#ifndef CONFIG_NUMA
+void __init early_initmem_init(void)
+{
 #ifdef CONFIG_X86_64
-	if (max_pfn > max_low_pfn) {
-		/* can we preseve max_low_pfn ?*/
+	init_mem_mapping(0, max_pfn << PAGE_SHIFT);
+	if (max_pfn > max_low_pfn)
 		max_low_pfn = max_pfn;
-	}
 #else
+	init_mem_mapping(0, max_low_pfn << PAGE_SHIFT);
 	early_ioremap_page_table_range_init();
 #endif
 
@@ -449,11 +474,6 @@ void __init init_mem_mapping(void)
 
 	early_memtest(0, max_pfn_mapped << PAGE_SHIFT);
 }
-
-#ifndef CONFIG_NUMA
-void __init early_initmem_init(void)
-{
-}
 #endif
 
 /*
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index c2d4653..d3eb0c9 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -17,8 +17,10 @@
 #include <asm/dma.h>
 #include <asm/acpi.h>
 #include <asm/amd_nb.h>
+#include <asm/tlbflush.h>
 
 #include "numa_internal.h"
+#include "mm_internal.h"
 
 int __initdata numa_off;
 nodemask_t numa_nodes_parsed __initdata;
@@ -668,9 +670,31 @@ static void __init early_x86_numa_init(void)
 	numa_init(dummy_numa_init);
 }
 
+#ifdef CONFIG_X86_64
+static void __init early_x86_numa_init_mapping(void)
+{
+	init_mem_mapping(0, max_pfn << PAGE_SHIFT);
+	if (max_pfn > max_low_pfn)
+		max_low_pfn = max_pfn;
+}
+#else
+static void __init early_x86_numa_init_mapping(void)
+{
+	init_mem_mapping(0, max_low_pfn << PAGE_SHIFT);
+	early_ioremap_page_table_range_init();
+}
+#endif
+
 void __init early_initmem_init(void)
 {
 	early_x86_numa_init();
+
+	early_x86_numa_init_mapping();
+
+	load_cr3(swapper_pg_dir);
+	__flush_tlb_all();
+
+	early_memtest(0, max_pfn_mapped<<PAGE_SHIFT);
 }
 
 void __init x86_numa_init(void)
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v2 20/20] x86, mm, numa: Put pagetable on local node ram for 64bit
  2013-03-10  6:44 [PATCH v2 00/20] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (18 preceding siblings ...)
  2013-03-10  6:44 ` [PATCH v2 19/20] x86, mm: Make init_mem_mapping be able to be called several times Yinghai Lu
@ 2013-03-10  6:44 ` Yinghai Lu
  2013-03-11  5:49   ` Tang Chen
  19 siblings, 1 reply; 45+ messages in thread
From: Yinghai Lu @ 2013-03-10  6:44 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Tejun Heo, Thomas Renninger, Tang Chen
  Cc: linux-kernel, Yinghai Lu, Pekka Enberg, Jacob Shin,
	Konrad Rzeszutek Wilk

If a node with ram is hotpluggable, the node-local memory for page tables
and vmemmap should be on that node's ram.

This patch is a kind of refresh of
| commit 1411e0ec3123ae4c4ead6bfc9fe3ee5a3ae5c327
| Date:   Mon Dec 27 16:48:17 2010 -0800
|
|    x86-64, numa: Put pgtable to local node memory
which was reverted before.

We have reason to reintroduce it to make memory hotplug work.

Call init_mem_mapping() in early_initmem_init() for every node.
alloc_low_pages() will allocate page tables in the following order:
	BRK, local node, low range
So page tables will end up in the low range or on local nodes.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Jacob Shin <jacob.shin@amd.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/mm/numa.c |   34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index d3eb0c9..11acdf6 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -673,7 +673,39 @@ static void __init early_x86_numa_init(void)
 #ifdef CONFIG_X86_64
 static void __init early_x86_numa_init_mapping(void)
 {
-	init_mem_mapping(0, max_pfn << PAGE_SHIFT);
+	unsigned long last_start = 0, last_end = 0;
+	struct numa_meminfo *mi = &numa_meminfo;
+	unsigned long start, end;
+	int last_nid = -1;
+	int i, nid;
+
+	for (i = 0; i < mi->nr_blks; i++) {
+		nid   = mi->blk[i].nid;
+		start = mi->blk[i].start;
+		end   = mi->blk[i].end;
+
+		if (last_nid == nid) {
+			last_end = end;
+			continue;
+		}
+
+		/* other nid now */
+		if (last_nid >= 0) {
+			printk(KERN_DEBUG "Node %d: [mem %#016lx-%#016lx]\n",
+					last_nid, last_start, last_end - 1);
+			init_mem_mapping(last_start, last_end);
+		}
+
+		/* for next nid */
+		last_nid   = nid;
+		last_start = start;
+		last_end   = end;
+	}
+	/* last one */
+	printk(KERN_DEBUG "Node %d: [mem %#016lx-%#016lx]\n",
+			last_nid, last_start, last_end - 1);
+	init_mem_mapping(last_start, last_end);
+
 	if (max_pfn > max_low_pfn)
 		max_low_pfn = max_pfn;
 }
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 08/20] x86, ACPI: Find acpi tables in initrd early from head_32.S/head64.c
  2013-03-10  6:44 ` [PATCH v2 08/20] x86, ACPI: Find acpi tables in initrd early from head_32.S/head64.c Yinghai Lu
@ 2013-03-10 10:25   ` Pekka Enberg
  2013-03-10 16:47     ` Yinghai Lu
  2013-04-04 20:25     ` H. Peter Anvin
  0 siblings, 2 replies; 45+ messages in thread
From: Pekka Enberg @ 2013-03-10 10:25 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Tejun Heo, Thomas Renninger, Tang Chen, linux-kernel, Jacob Shin,
	Rafael J. Wysocki, linux-acpi

On Sun, Mar 10, 2013 at 8:44 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> +void __init x86_acpi_override_find(void)
> +{
> +       unsigned long ramdisk_image, ramdisk_size;
> +       unsigned char *p = NULL;
> +
> +#ifdef CONFIG_X86_32
> +       struct boot_params *boot_params_p;
> +
> +       /*
> +        * 32bit is from head_32.S, and it is 32bit flat mode.
> +        * So need to use phys address to access global variables.
> +        */
> +       boot_params_p = (struct boot_params *)__pa_symbol(&boot_params);
> +       ramdisk_image = get_ramdisk_image(boot_params_p);
> +       ramdisk_size  = get_ramdisk_size(boot_params_p);
> +       p = (unsigned char *)ramdisk_image;
> +       acpi_initrd_override_find(p, ramdisk_size, true);
> +#else
> +       ramdisk_image = get_ramdisk_image(&boot_params);
> +       ramdisk_size  = get_ramdisk_size(&boot_params);
> +       if (ramdisk_image)
> +               p = __va(ramdisk_image);
> +       acpi_initrd_override_find(p, ramdisk_size, false);
> +#endif
> +}
> +#endif

What is preventing us from making the 64-bit variant also work in flat
mode, to make the code consistent and not hide the differences under
the rug? What am I missing here?

                        Pekka

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 08/20] x86, ACPI: Find acpi tables in initrd early from head_32.S/head64.c
  2013-03-10 10:25   ` Pekka Enberg
@ 2013-03-10 16:47     ` Yinghai Lu
  2013-03-10 17:42       ` H. Peter Anvin
  2013-04-04 20:25     ` H. Peter Anvin
  1 sibling, 1 reply; 45+ messages in thread
From: Yinghai Lu @ 2013-03-10 16:47 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Tejun Heo, Thomas Renninger, Tang Chen, linux-kernel, Jacob Shin,
	Rafael J. Wysocki, linux-acpi

On Sun, Mar 10, 2013 at 3:25 AM, Pekka Enberg <penberg@kernel.org> wrote:
>
> What is preventing us from making the 64-bit variant also work in flat
> mode to make the code consistent and not hiding the differences under
> the rug? What am I missing here?

The boot loader could start the kernel in 64bit mode directly, from
arch/x86/boot/compressed/head_64.S::startup_64.

The initrd can be loaded by a 64bit bootloader above 4G.

So even if we switched back to 32bit flat mode, we still could not access
that initrd directly.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 08/20] x86, ACPI: Find acpi tables in initrd early from head_32.S/head64.c
  2013-03-10 16:47     ` Yinghai Lu
@ 2013-03-10 17:42       ` H. Peter Anvin
  0 siblings, 0 replies; 45+ messages in thread
From: H. Peter Anvin @ 2013-03-10 17:42 UTC (permalink / raw)
  To: Yinghai Lu, Pekka Enberg
  Cc: Thomas Gleixner, Ingo Molnar, Andrew Morton, Tejun Heo,
	Thomas Renninger, Tang Chen, linux-kernel, Jacob Shin,
	Rafael J. Wysocki, linux-acpi

There is no 64-bit flat mode.  We use a #PF handler to emulate one by creating page tables on the fly.
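
A conceptual sketch of that on-demand mapping (not the actual head64.c code;
early_fixup_pf(), map_early_pmd() and early_pgt_page() are hypothetical
names, and phys_base handling is ignored):

	/* Map the faulting kernel address with a 2MiB page taken from a
	 * small static pool of page-table pages, then let the faulting
	 * instruction retry.  This is what lets the initrd be read through
	 * the normal kernel mapping before the real page tables exist. */
	static int __init early_fixup_pf(unsigned long address)
	{
		unsigned long phys;

		if (address < __PAGE_OFFSET)	/* only the direct mapping */
			return -1;

		phys = address - __PAGE_OFFSET;
		map_early_pmd(address & PMD_MASK, phys & PMD_MASK,
			      early_pgt_page());
		return 0;
	}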

Yinghai Lu <yinghai@kernel.org> wrote:

>On Sun, Mar 10, 2013 at 3:25 AM, Pekka Enberg <penberg@kernel.org>
>wrote:
>>
>> What is preventing us from making the 64-bit variant also work in
>flat
>> mode to make the code consistent and not hiding the differences under
>> the rug? What am I missing here?
>
>Boot loader could start kernel from 64bit directly from
>from arch/x86/boot/compressed/head_64.s::startup_64.
>
>initrd can be loaded by 64bit bootloader above 4G.
>
>So we even switch back to 32bit flat mode, we still can not access
>those initrd
>directly.
>
>Thanks
>
>Yinghai

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 20/20] x86, mm, numa: Put pagetable on local node ram for 64bit
  2013-03-10  6:44 ` [PATCH v2 20/20] x86, mm, numa: Put pagetable on local node ram for 64bit Yinghai Lu
@ 2013-03-11  5:49   ` Tang Chen
  2013-03-11  6:29     ` Yinghai Lu
  0 siblings, 1 reply; 45+ messages in thread
From: Tang Chen @ 2013-03-11  5:49 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Tejun Heo, Thomas Renninger, linux-kernel, Pekka Enberg,
	Jacob Shin, Konrad Rzeszutek Wilk

Hi Yinghai,

Please see below. :)

On 03/10/2013 02:44 PM, Yinghai Lu wrote:
> If node with ram is hotplugable, local node mem for page table and vmemmap
> should be on that node ram.
>
> This patch is some kind of refreshment of
> | commit 1411e0ec3123ae4c4ead6bfc9fe3ee5a3ae5c327
> | Date:   Mon Dec 27 16:48:17 2010 -0800
> |
> |    x86-64, numa: Put pgtable to local node memory
> That was reverted before.
>
> We have reason to reintroduce it to make memory hotplug work.
>
> Calling init_mem_mapping in early_initmem_init for every node.
> alloc_low_pages will alloc page table in following order:
> 	BRK, local node, low range
> So page table will be on low range or local nodes.
>
> Signed-off-by: Yinghai Lu<yinghai@kernel.org>
> Cc: Pekka Enberg<penberg@kernel.org>
> Cc: Jacob Shin<jacob.shin@amd.com>
> Cc: Konrad Rzeszutek Wilk<konrad.wilk@oracle.com>
> ---
>   arch/x86/mm/numa.c |   34 +++++++++++++++++++++++++++++++++-
>   1 file changed, 33 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index d3eb0c9..11acdf6 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -673,7 +673,39 @@ static void __init early_x86_numa_init(void)
>   #ifdef CONFIG_X86_64
>   static void __init early_x86_numa_init_mapping(void)
>   {
> -	init_mem_mapping(0, max_pfn<<  PAGE_SHIFT);
> +	unsigned long last_start = 0, last_end = 0;
> +	struct numa_meminfo *mi =&numa_meminfo;
> +	unsigned long start, end;
> +	int last_nid = -1;
> +	int i, nid;
> +
> +	for (i = 0; i<  mi->nr_blks; i++) {
> +		nid   = mi->blk[i].nid;
> +		start = mi->blk[i].start;
> +		end   = mi->blk[i].end;
> +
> +		if (last_nid == nid) {
> +			last_end = end;
> +			continue;
> +		}
> +
> +		/* other nid now */
> +		if (last_nid>= 0) {
> +			printk(KERN_DEBUG "Node %d: [mem %#016lx-%#016lx]\n",
> +					last_nid, last_start, last_end - 1);
> +			init_mem_mapping(last_start, last_end);

IIUC, we call init_mem_mapping() for each node's ranges. At the start of each call,
         local_max_pfn_mapped = begin >> PAGE_SHIFT;
         local_min_pfn_mapped = real_end >> PAGE_SHIFT;
which means
	local_min_pfn_mapped >= local_max_pfn_mapped
right ?

So, the first page allocated by alloc_low_pages() is not on local node, 
right ?
Furthermore, the first page of pagetable is not on local node, right ?

BTW, I'm reading your code, and doing necessary hot-add and hot-remove 
changes now.

Thanks. :)

> +		}
> +
> +		/* for next nid */
> +		last_nid   = nid;
> +		last_start = start;
> +		last_end   = end;
> +	}
> +	/* last one */
> +	printk(KERN_DEBUG "Node %d: [mem %#016lx-%#016lx]\n",
> +			last_nid, last_start, last_end - 1);
> +	init_mem_mapping(last_start, last_end);
> +
>   	if (max_pfn>  max_low_pfn)
>   		max_low_pfn = max_pfn;
>   }

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 20/20] x86, mm, numa: Put pagetable on local node ram for 64bit
  2013-03-11  5:49   ` Tang Chen
@ 2013-03-11  6:29     ` Yinghai Lu
  0 siblings, 0 replies; 45+ messages in thread
From: Yinghai Lu @ 2013-03-11  6:29 UTC (permalink / raw)
  To: Tang Chen
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Tejun Heo, Thomas Renninger, linux-kernel, Pekka Enberg,
	Jacob Shin, Konrad Rzeszutek Wilk

On Sun, Mar 10, 2013 at 10:49 PM, Tang Chen <tangchen@cn.fujitsu.com> wrote:
> On 03/10/2013 02:44 PM, Yinghai Lu wrote:
>>
>> Calling init_mem_mapping in early_initmem_init for every node.
>> alloc_low_pages will alloc page table in following order:
>>         BRK, local node, low range
>> So page table will be on low range or local nodes.
...
> IIUC, we call init_mem_mapping() for each node ranges. In the first time,
>         local_max_pfn_mapped = begin >> PAGE_SHIFT;
>         local_min_pfn_mapped = real_end >> PAGE_SHIFT;
> which means
>         local_min_pfn_mapped >= local_max_pfn_mapped
> right ?
>
> So, the first page allocated by alloc_low_pages() is not on local node,
> right ?

It is from the BRK, which is part of the kernel image.

> Furthermore, the first page of pagetable is not on local node, right ?

It is in the BRK for the node with start = 0.

For other nodes, it is from the low range, aka the node with start = 0.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 19/20] x86, mm: Make init_mem_mapping be able to be called several times
  2013-03-10  6:44 ` [PATCH v2 19/20] x86, mm: Make init_mem_mapping be able to be called several times Yinghai Lu
@ 2013-03-11 13:16   ` Konrad Rzeszutek Wilk
  2013-03-11 20:28     ` Yinghai Lu
  0 siblings, 1 reply; 45+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-11 13:16 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Tejun Heo, Thomas Renninger, Tang Chen, linux-kernel,
	Pekka Enberg, Jacob Shin

On Sat, Mar 09, 2013 at 10:44:46PM -0800, Yinghai Lu wrote:
> Prepare to put page table on local nodes.
> 
> Move calling of init_mem_mapping to early_initmem_init.
> 
> Rework alloc_low_pages to alloc page table in following order:
> 	BRK, local node, low range
> 
> Still only load_cr3 one time, otherwise we would break xen 64bit again.
> 

We could also fix that. Now that the regression storm has passed
and I am able to spend some time on it, we could make it a bit more
resistant.

> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: Jacob Shin <jacob.shin@amd.com>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  arch/x86/include/asm/pgtable.h |    2 +-
>  arch/x86/kernel/setup.c        |    1 -
>  arch/x86/mm/init.c             |   88 ++++++++++++++++++++++++----------------
>  arch/x86/mm/numa.c             |   24 +++++++++++
>  4 files changed, 79 insertions(+), 36 deletions(-)
> 
> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index 1e67223..868687c 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -621,7 +621,7 @@ static inline int pgd_none(pgd_t pgd)
>  #ifndef __ASSEMBLY__
>  
>  extern int direct_gbpages;
> -void init_mem_mapping(void);
> +void init_mem_mapping(unsigned long begin, unsigned long end);
>  void early_alloc_pgt_buf(void);
>  
>  /* local pte updates need not use xchg for locking */
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 86e1ec0..1cdc1a7 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -1105,7 +1105,6 @@ void __init setup_arch(char **cmdline_p)
>  	acpi_boot_table_init();
>  	early_acpi_boot_init();
>  	early_initmem_init();
> -	init_mem_mapping();
>  	memblock.current_limit = get_max_mapped();
>  	early_trap_pf_init();
>  
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index 28b294f..8d0007a 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -24,7 +24,10 @@ static unsigned long __initdata pgt_buf_start;
>  static unsigned long __initdata pgt_buf_end;
>  static unsigned long __initdata pgt_buf_top;
>  
> -static unsigned long min_pfn_mapped;
> +static unsigned long low_min_pfn_mapped;
> +static unsigned long low_max_pfn_mapped;
> +static unsigned long local_min_pfn_mapped;
> +static unsigned long local_max_pfn_mapped;
>  
>  static bool __initdata can_use_brk_pgt = true;
>  
> @@ -52,10 +55,17 @@ __ref void *alloc_low_pages(unsigned int num)
>  
>  	if ((pgt_buf_end + num) > pgt_buf_top || !can_use_brk_pgt) {
>  		unsigned long ret;
> -		if (min_pfn_mapped >= max_pfn_mapped)
> -			panic("alloc_low_page: ran out of memory");
> -		ret = memblock_find_in_range(min_pfn_mapped << PAGE_SHIFT,
> -					max_pfn_mapped << PAGE_SHIFT,
> +		if (local_min_pfn_mapped >= local_max_pfn_mapped) {
> +			if (low_min_pfn_mapped >= low_max_pfn_mapped)
> +				panic("alloc_low_page: ran out of memory");
> +			ret = memblock_find_in_range(
> +					low_min_pfn_mapped << PAGE_SHIFT,
> +					low_max_pfn_mapped << PAGE_SHIFT,
> +					PAGE_SIZE * num , PAGE_SIZE);
> +		} else
> +			ret = memblock_find_in_range(
> +					local_min_pfn_mapped << PAGE_SHIFT,
> +					local_max_pfn_mapped << PAGE_SHIFT,
>  					PAGE_SIZE * num , PAGE_SIZE);
>  		if (!ret)
>  			panic("alloc_low_page: can not alloc memory");
> @@ -387,60 +397,75 @@ static unsigned long __init init_range_memory_mapping(
>  
>  /* (PUD_SHIFT-PMD_SHIFT)/2 */
>  #define STEP_SIZE_SHIFT 5
> -void __init init_mem_mapping(void)
> +void __init init_mem_mapping(unsigned long begin, unsigned long end)
>  {
> -	unsigned long end, real_end, start, last_start;
> +	unsigned long real_end, start, last_start;
>  	unsigned long step_size;
>  	unsigned long addr;
>  	unsigned long mapped_ram_size = 0;
>  	unsigned long new_mapped_ram_size;
> +	bool is_low = false;
> +
> +	if (!begin) {
> +		probe_page_size_mask();
> +		/* the ISA range is always mapped regardless of memory holes */
> +		init_memory_mapping(0, ISA_END_ADDRESS);
> +		begin = ISA_END_ADDRESS;
> +		is_low = true;
> +	}
>  
> -	probe_page_size_mask();
> -
> -#ifdef CONFIG_X86_64
> -	end = max_pfn << PAGE_SHIFT;
> -#else
> -	end = max_low_pfn << PAGE_SHIFT;
> -#endif
> -
> -	/* the ISA range is always mapped regardless of memory holes */
> -	init_memory_mapping(0, ISA_END_ADDRESS);
> +	if (begin >= end)
> +		return;
>  
>  	/* xen has big range in reserved near end of ram, skip it at first.*/
> -	addr = memblock_find_in_range(ISA_END_ADDRESS, end, PMD_SIZE, PMD_SIZE);
> +	addr = memblock_find_in_range(begin, end, PMD_SIZE, PMD_SIZE);
>  	real_end = addr + PMD_SIZE;
>  
>  	/* step_size need to be small so pgt_buf from BRK could cover it */
>  	step_size = PMD_SIZE;
> -	max_pfn_mapped = 0; /* will get exact value next */
> -	min_pfn_mapped = real_end >> PAGE_SHIFT;
> +	local_max_pfn_mapped = begin >> PAGE_SHIFT;
> +	local_min_pfn_mapped = real_end >> PAGE_SHIFT;
>  	last_start = start = real_end;
> -	while (last_start > ISA_END_ADDRESS) {
> +	while (last_start > begin) {
>  		if (last_start > step_size) {
>  			start = round_down(last_start - 1, step_size);
> -			if (start < ISA_END_ADDRESS)
> -				start = ISA_END_ADDRESS;
> +			if (start < begin)
> +				start = begin;
>  		} else
> -			start = ISA_END_ADDRESS;
> +			start = begin;
>  		new_mapped_ram_size = init_range_memory_mapping(start,
>  							last_start);
> +		if ((last_start >> PAGE_SHIFT) > local_max_pfn_mapped)
> +			local_max_pfn_mapped = last_start >> PAGE_SHIFT;
> +		local_min_pfn_mapped = start >> PAGE_SHIFT;
>  		last_start = start;
> -		min_pfn_mapped = last_start >> PAGE_SHIFT;
>  		/* only increase step_size after big range get mapped */
>  		if (new_mapped_ram_size > mapped_ram_size)
>  			step_size <<= STEP_SIZE_SHIFT;
>  		mapped_ram_size += new_mapped_ram_size;
>  	}
>  
> -	if (real_end < end)
> +	if (real_end < end) {
>  		init_range_memory_mapping(real_end, end);
> +		if ((end >> PAGE_SHIFT) > local_max_pfn_mapped)
> +			local_max_pfn_mapped = end >> PAGE_SHIFT;
> +	}
>  
> +	if (is_low) {
> +		low_min_pfn_mapped = local_min_pfn_mapped;
> +		low_max_pfn_mapped = local_max_pfn_mapped;
> +	}
> +}
> +
> +#ifndef CONFIG_NUMA
> +void __init early_initmem_init(void)
> +{
>  #ifdef CONFIG_X86_64
> -	if (max_pfn > max_low_pfn) {
> -		/* can we preseve max_low_pfn ?*/
> +	init_mem_mapping(0, max_pfn << PAGE_SHIFT);
> +	if (max_pfn > max_low_pfn)
>  		max_low_pfn = max_pfn;
> -	}
>  #else
> +	init_mem_mapping(0, max_low_pfn << PAGE_SHIFT);
>  	early_ioremap_page_table_range_init();
>  #endif
>  
> @@ -449,11 +474,6 @@ void __init init_mem_mapping(void)
>  
>  	early_memtest(0, max_pfn_mapped << PAGE_SHIFT);
>  }
> -
> -#ifndef CONFIG_NUMA
> -void __init early_initmem_init(void)
> -{
> -}
>  #endif
>  
>  /*
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index c2d4653..d3eb0c9 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -17,8 +17,10 @@
>  #include <asm/dma.h>
>  #include <asm/acpi.h>
>  #include <asm/amd_nb.h>
> +#include <asm/tlbflush.h>
>  
>  #include "numa_internal.h"
> +#include "mm_internal.h"
>  
>  int __initdata numa_off;
>  nodemask_t numa_nodes_parsed __initdata;
> @@ -668,9 +670,31 @@ static void __init early_x86_numa_init(void)
>  	numa_init(dummy_numa_init);
>  }
>  
> +#ifdef CONFIG_X86_64
> +static void __init early_x86_numa_init_mapping(void)
> +{
> +	init_mem_mapping(0, max_pfn << PAGE_SHIFT);
> +	if (max_pfn > max_low_pfn)
> +		max_low_pfn = max_pfn;
> +}
> +#else
> +static void __init early_x86_numa_init_mapping(void)
> +{
> +	init_mem_mapping(0, max_low_pfn << PAGE_SHIFT);
> +	early_ioremap_page_table_range_init();
> +}
> +#endif
> +
>  void __init early_initmem_init(void)
>  {
>  	early_x86_numa_init();
> +
> +	early_x86_numa_init_mapping();
> +
> +	load_cr3(swapper_pg_dir);
> +	__flush_tlb_all();
> +
> +	early_memtest(0, max_pfn_mapped<<PAGE_SHIFT);
>  }
>  
>  void __init x86_numa_init(void)
> -- 
> 1.7.10.4
> 


* Re: [PATCH v2 19/20] x86, mm: Make init_mem_mapping be able to be called several times
  2013-03-11 13:16   ` Konrad Rzeszutek Wilk
@ 2013-03-11 20:28     ` Yinghai Lu
  0 siblings, 0 replies; 45+ messages in thread
From: Yinghai Lu @ 2013-03-11 20:28 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Tejun Heo, Thomas Renninger, Tang Chen, linux-kernel,
	Pekka Enberg, Jacob Shin

On Mon, Mar 11, 2013 at 6:16 AM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Sat, Mar 09, 2013 at 10:44:46PM -0800, Yinghai Lu wrote:
>> Prepare to put page table on local nodes.
>>
>> Move calling of init_mem_mapping to early_initmem_init.
>>
>> Rework alloc_low_pages to alloc page table in following order:
>>       BRK, local node, low range
>>
>> Still only load_cr3 one time, otherwise we would break xen 64bit again.
>>
>
> We could also fix that. Now that the regression storm has passed
> and I am able to spend some time on it we could make it a bit more
> resistant.

Never mind, we should only need to call load_cr3 one time,
as init_memory_mapping itself flushes the TLB every time on 64-bit.
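
A rough sketch of the flow being described, for illustration only (the
per-node loop and node_start/node_end are simplified placeholders, not
the patch code):

	for_each_node_mask(nid, node_possible_map) {
		/* map this node's RAM; page tables get allocated near it */
		init_mem_mapping(node_start, node_end);
		/* on 64-bit this reaches kernel_physical_mapping_init(),
		 * which already finishes with __flush_tlb_all() */
	}
	load_cr3(swapper_pg_dir);	/* still done only once */
	__flush_tlb_all();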

Thanks

Yinghai


* Re: [PATCH v2 03/20] x86, ACPI, mm: Kill max_low_pfn_mapped
  2013-03-10  6:44   ` Yinghai Lu
  (?)
@ 2013-04-04 17:36   ` Tejun Heo
  2013-04-04 18:20     ` Yinghai Lu
  -1 siblings, 1 reply; 45+ messages in thread
From: Tejun Heo @ 2013-04-04 17:36 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel, Rafael J. Wysocki,
	Daniel Vetter, David Airlie, Jacob Shin, linux-acpi, dri-devel

Hello,

On Sat, Mar 09, 2013 at 10:44:30PM -0800, Yinghai Lu wrote:
> Now we have arch_pfn_mapped array, and max_low_pfn_mapped should not
> be used anymore.
> 
> User should use arch_pfn_mapped or just 1UL<<(32-PAGE_SHIFT) instead.
> 
> Only user is ACPI_INITRD_TABLE_OVERRIDE, and it should not use that,
> as later accessing is using early_ioremap(). Change to try to 4G below
> and then 4G above.
...
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index 586e7e9..c08cdb6 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -624,9 +624,13 @@ void __init acpi_initrd_override(void *data, size_t size)
>  	if (table_nr == 0)
>  		return;
>  
> -	acpi_tables_addr =
> -		memblock_find_in_range(0, max_low_pfn_mapped << PAGE_SHIFT,
> -				       all_tables_size, PAGE_SIZE);
> +	/* under 4G at first, then above 4G */
> +	acpi_tables_addr = memblock_find_in_range(0, (1ULL<<32) - 1,
> +					all_tables_size, PAGE_SIZE);
> +	if (!acpi_tables_addr)
> +		acpi_tables_addr = memblock_find_in_range(0,
> +					~(phys_addr_t)0,
> +					all_tables_size, PAGE_SIZE);

So, it's changing the allocation from <=4G to <=4G first and then >4G.
The only explanation given is "as later accessing is using
early_ioremap()", but I can't see why that can be a reason for that.
early_ioremap() doesn't care whether the given physaddr is under 4G or
not, it unconditionally maps it into fixmap, so whether the allocated
address is below or above 4G doesn't make any difference.
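
For illustration, the copy path only ever touches the tables through the
fixmap mapping, so the same few lines work for any physical address (a
sketch based on the quoted patch; src stands in for the mapped source
data):

	/* same code whether acpi_tables_addr is below or above 4G */
	p = early_ioremap(acpi_tables_addr + total_offset, size);
	memcpy(p, src, size);
	early_iounmap(p, size);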

Changing the allowed range of the allocation should be a separate
patch.  It has some chance of its own breakage and the change itself
isn't really related to this one.

Please try to elaborate the reasoning behind "why", so that readers of
the description don't have to deduce (oh well, guess) your intentions
behind the changes.  As much as it would help the readers, it'd also
help you even more as you would have had to explicitly write something
like "the table is accessed with early_ioremap() so the address
doesn't need to be restricted under 4G; however, to avoid unnecessary
remappings, first try <= 4G and then > 4G."  Then, you would be
compelled to check whether the statement you explicitly wrote is true,
which isn't in this case and you would also realize that the change
isn't trivial and doesn't really belong with this patch.  By not doing
the due diligence, you're offloading what you should have done to
others, which isn't very nice.

I think the descriptions are better in this posting than the last time
but it's still lacking, so please put more effort into describing
the changes and reasoning behind them.

Thanks.

-- 
tejun


* Re: [PATCH v2 02/20] x86, microcode: Use common get_ramdisk_image()
  2013-03-10  6:44 ` [PATCH v2 02/20] x86, microcode: Use common get_ramdisk_image() Yinghai Lu
@ 2013-04-04 17:48   ` Tejun Heo
  2013-04-04 17:59     ` Yinghai Lu
  0 siblings, 1 reply; 45+ messages in thread
From: Tejun Heo @ 2013-04-04 17:48 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel, Fenghua Yu

On Sat, Mar 09, 2013 at 10:44:29PM -0800, Yinghai Lu wrote:
> Use common get_ramdisk_image() to get ramdisk start phys address.
> 
> We need this to get correct ramdisk address for 64bit bzImage that
> initrd can be loaded above 4G by kexec-tools.

Is this a bug fix?  Can it actually happen?

> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Fenghua Yu <fenghua.yu@intel.com>

For 01 and 02

 Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun


* Re: [PATCH v2 04/20] x86, ACPI: Increase override tables number limit
  2013-03-10  6:44 ` [PATCH v2 04/20] x86, ACPI: Increase override tables number limit Yinghai Lu
@ 2013-04-04 17:50   ` Tejun Heo
  2013-04-04 18:03     ` Yinghai Lu
  0 siblings, 1 reply; 45+ messages in thread
From: Tejun Heo @ 2013-04-04 17:50 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel, Rafael J. Wysocki,
	linux-acpi

On Sat, Mar 09, 2013 at 10:44:31PM -0800, Yinghai Lu wrote:
> Current acpi tables in initrd is limited to 10, that is too small.
> 64 should be good enough as we have 35 sigs and could have several
> SSDT.
> 
> Two problems in current code prevent us from increasing limit:
> 1. that cpio file info array is put in stack, as every element is 32
>    bytes, could run out of stack if we have that array size to 64.
>    We can move it out from stack, and make it as global and put it in
>    __initdata section.
> 2. early_ioremap only can remap 256k one time. Current code is mapping
>    10 tables one time. If we increase that limit, whole size could be
>    more than 256k, early_ioremap will fail with that.
>    We can map table one by one during copying, instead of mapping
>    all them one time.
> 
> -v2: According to tj, split it out to separated patch, also
>      rename array name to acpi_initrd_files.
> 
> Signed-off-by: Yinghai <yinghai@kernel.org>
> Cc: Rafael J. Wysocki <rjw@sisk.pl>
> Cc: linux-acpi@vger.kernel.org

Acked-by: Tejun Heo <tj@kernel.org>

> @@ -648,14 +647,14 @@ void __init acpi_initrd_override(void *data, size_t size)
>  	memblock_reserve(acpi_tables_addr, acpi_tables_addr + all_tables_size);
>  	arch_reserve_mem_area(acpi_tables_addr, all_tables_size);
>  
> -	p = early_ioremap(acpi_tables_addr, all_tables_size);
> -

It'd be nice to have a brief comment here explaining why we're mapping
each table separately.

>  	for (no = 0; no < table_nr; no++) {
> -		memcpy(p + total_offset, early_initrd_files[no].data,
> -		       early_initrd_files[no].size);
> -		total_offset += early_initrd_files[no].size;
> +		phys_addr_t size = acpi_initrd_files[no].size;
> +
> +		p = early_ioremap(acpi_tables_addr + total_offset, size);
> +		memcpy(p, acpi_initrd_files[no].data, size);
> +		early_iounmap(p, size);
> +		total_offset += size;
>  	}
> -	early_iounmap(p, all_tables_size);

Thanks.

-- 
tejun


* Re: [PATCH v2 02/20] x86, microcode: Use common get_ramdisk_image()
  2013-04-04 17:48   ` Tejun Heo
@ 2013-04-04 17:59     ` Yinghai Lu
  0 siblings, 0 replies; 45+ messages in thread
From: Yinghai Lu @ 2013-04-04 17:59 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, Linux Kernel Mailing List,
	Fenghua Yu

On Thu, Apr 4, 2013 at 10:48 AM, Tejun Heo <tj@kernel.org> wrote:
> On Sat, Mar 09, 2013 at 10:44:29PM -0800, Yinghai Lu wrote:
>> Use common get_ramdisk_image() to get ramdisk start phys address.
>>
>> We need this to get correct ramdisk address for 64bit bzImage that
>> initrd can be loaded above 4G by kexec-tools.
>
> Is this a bug fix?  Can it actually happen?

Yes, it could happen.
When the second kernel has early microcode updating support, it would
search the wrong place for the ramdisk.
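
For reference, a sketch of what the common helper does (simplified, not
quoted from the patch): reading only boot_params.hdr.ramdisk_image drops
the upper 32 bits that kexec-tools fills in when it places the initrd
above 4G.

u64 __init get_ramdisk_image(void)
{
	u64 ramdisk_image = boot_params.hdr.ramdisk_image;

	ramdisk_image |= (u64)boot_params.ext_ramdisk_image << 32;

	return ramdisk_image;
}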

Thanks

Yinghai


* Re: [PATCH v2 04/20] x86, ACPI: Increase override tables number limit
  2013-04-04 17:50   ` Tejun Heo
@ 2013-04-04 18:03     ` Yinghai Lu
  0 siblings, 0 replies; 45+ messages in thread
From: Yinghai Lu @ 2013-04-04 18:03 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, Linux Kernel Mailing List,
	Rafael J. Wysocki, ACPI Devel Maling List

On Thu, Apr 4, 2013 at 10:50 AM, Tejun Heo <tj@kernel.org> wrote:
> On Sat, Mar 09, 2013 at 10:44:31PM -0800, Yinghai Lu wrote:

>> @@ -648,14 +647,14 @@ void __init acpi_initrd_override(void *data, size_t size)
>>       memblock_reserve(acpi_tables_addr, acpi_tables_addr + all_tables_size);
>>       arch_reserve_mem_area(acpi_tables_addr, all_tables_size);
>>
>> -     p = early_ioremap(acpi_tables_addr, all_tables_size);
>> -
>
> It'd be nice to have a brief comment here explaining why we're mapping
> each table separately.

ok, will copy the lines from the changelog into a comment.

Thanks

Yinghai


* Re: [PATCH v2 05/20] x86, ACPI: Split acpi_initrd_override to find/copy two functions
  2013-03-10  6:44 ` [PATCH v2 05/20] x86, ACPI: Split acpi_initrd_override to find/copy two functions Yinghai Lu
@ 2013-04-04 18:07   ` Tejun Heo
  2013-04-04 19:29     ` Yinghai Lu
  0 siblings, 1 reply; 45+ messages in thread
From: Tejun Heo @ 2013-04-04 18:07 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel, Pekka Enberg,
	Jacob Shin, Rafael J. Wysocki, linux-acpi

On Sat, Mar 09, 2013 at 10:44:32PM -0800, Yinghai Lu wrote:
> To parse srat early, we need to move acpi table probing early.
> acpi_initrd_table_override is before acpi table probing. So we need to
> move it early too.
> 
> Current code acpi_initrd_table_override is after init_mem_mapping and
> relocate_initrd(), so it can scan initrd and copy acpi tables with kernel
> virtual address of initrd.
> Copying need to be after memblock is ready, because it need to allocate
> buffer for new acpi tables.
> 
> So we have to split that function to find and copy two functions.
> Find should be as early as possible. Copy should be after memblock is ready.
> 
> Finding could be done in head_32.S and head64.c, just like microcode
> early scanning. In head_32.S, it is 32bit flat mode, we don't
> need to set page table to access it. In head64.c, #PF set page table
> could help us access initrd with kernel low mapping address.
> 
> Copying could be done just after memblock is ready and before probing
> acpi tables, and we need to early_ioremap to access source and target
> range, as init_mem_mapping is not called yet.
> 
> Also move down two functions declaration to avoid #ifdef in setup.c
> 
> ACPI_INITRD_TABLE_OVERRIDE depends one ACPI and BLK_DEV_INITRD.
> So could move declaration out from #ifdef CONFIG_ACPI protection.

Heh, I couldn't really follow the above.  How about something like the
following.

 While a dummy version of acpi_initrd_override() was defined when
 !CONFIG_ACPI_INITRD_TABLE_OVERRIDE, the prototype and dummy version
 were conditionalized inside CONFIG_ACPI.  This forced setup_arch() to
 have its own #ifdefs around acpi_initrd_override() as otherwise build
 would fail when !CONFIG_ACPI.  Move the prototypes and dummy
 implementations of the newly split functions below CONFIG_ACPI block
 in acpi.h so that we can do away with #ifdefs in its user.
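
A sketch of the header arrangement described above (the exact
declarations are assumed here for illustration; the prototypes come from
the find/copy split in this patch):

/* outside the #ifdef CONFIG_ACPI block, so setup_arch() can call
 * these without any #ifdef of its own */
#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
void acpi_initrd_override_find(void *data, size_t size);
void acpi_initrd_override_copy(void);
#else
static inline void acpi_initrd_override_find(void *data, size_t size) { }
static inline void acpi_initrd_override_copy(void) { }
#endif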

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun


* Re: [PATCH v2 03/20] x86, ACPI, mm: Kill max_low_pfn_mapped
  2013-04-04 17:36   ` Tejun Heo
@ 2013-04-04 18:20     ` Yinghai Lu
  0 siblings, 0 replies; 45+ messages in thread
From: Yinghai Lu @ 2013-04-04 18:20 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, Linux Kernel Mailing List,
	Rafael J. Wysocki, Daniel Vetter, David Airlie, Jacob Shin,
	ACPI Devel Maling List, DRI mailing list

On Thu, Apr 4, 2013 at 10:36 AM, Tejun Heo <tj@kernel.org> wrote:
> Hello,
>
> On Sat, Mar 09, 2013 at 10:44:30PM -0800, Yinghai Lu wrote:
>> Now we have arch_pfn_mapped array, and max_low_pfn_mapped should not
>> be used anymore.
>>
>> User should use arch_pfn_mapped or just 1UL<<(32-PAGE_SHIFT) instead.
>>
>> Only user is ACPI_INITRD_TABLE_OVERRIDE, and it should not use that,
>> as later accessing is using early_ioremap(). Change to try to 4G below
>> and then 4G above.
> ...
>> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
>> index 586e7e9..c08cdb6 100644
>> --- a/drivers/acpi/osl.c
>> +++ b/drivers/acpi/osl.c
>> @@ -624,9 +624,13 @@ void __init acpi_initrd_override(void *data, size_t size)
>>       if (table_nr == 0)
>>               return;
>>
>> -     acpi_tables_addr =
>> -             memblock_find_in_range(0, max_low_pfn_mapped << PAGE_SHIFT,
>> -                                    all_tables_size, PAGE_SIZE);
>> +     /* under 4G at first, then above 4G */
>> +     acpi_tables_addr = memblock_find_in_range(0, (1ULL<<32) - 1,
>> +                                     all_tables_size, PAGE_SIZE);
>> +     if (!acpi_tables_addr)
>> +             acpi_tables_addr = memblock_find_in_range(0,
>> +                                     ~(phys_addr_t)0,
>> +                                     all_tables_size, PAGE_SIZE);
>
> So, it's changing the allocation from <=4G to <=4G first and then >4G.
> The only explanation given is "as later accessing is using
> early_ioremap()", but I can't see why that can be a reason for that.
> early_ioremap() doesn't care whether the given physaddr is under 4G or
> not, it unconditionally maps it into fixmap, so whether the allocated
> address is below or above 4G doesn't make any difference.
>
> Changing the allowed range of the allocation should be a separate
> patch.  It has some chance of its own breakage and the change itself
> isn't really related to this one.

Ok, will separate that "try above 4G" into another patch.

>
> Please try to elaborate the reasoning behind "why", so that readers of
> the description don't have to deduce (oh well, guess) your intentions
> behind the changes.  As much as it would help the readers, it'd also
> help you even more as you would have had to explicitly write something
> like "the table is accessed with early_ioremap() so the address
> doesn't need to be restricted under 4G; however, to avoid unnecessary
> remappings, first try <= 4G and then > 4G."  Then, you would be
> compelled to check whether the statement you explicitly wrote is true,
> which isn't in this case and you would also realize that the change
> isn't trivial and doesn't really belong with this patch.  By not doing
> the due diligence, you're offloading what you should have done to
> others, which isn't very nice.
>
> I think the descriptions are better in this posting than the last time
> but it's still lacking, so please put more effort into describing
> the changes and reasoning behind them.

ok.

Thanks a lot.

Yinghai


* Re: [PATCH v2 06/20] x86, ACPI: Store override acpi tables phys addr in cpio files info array
  2013-03-10  6:44 ` [PATCH v2 06/20] x86, ACPI: Store override acpi tables phys addr in cpio files info array Yinghai Lu
@ 2013-04-04 18:27   ` Tejun Heo
  2013-04-04 18:30     ` Tejun Heo
  2013-04-04 20:03     ` Yinghai Lu
  0 siblings, 2 replies; 45+ messages in thread
From: Tejun Heo @ 2013-04-04 18:27 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel, Rafael J. Wysocki,
	linux-acpi

On Sat, Mar 09, 2013 at 10:44:33PM -0800, Yinghai Lu wrote:
> In 32bit we will find table with phys address during 32bit flat mode
> in head_32.S, because at that time we don't need set page table to
> access initrd.
> 
> For copying we could use early_ioremap() with phys directly before mem mapping
> is set.
> 
> To keep 32bit and 64bit consistent, use phys_addr for all.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Rafael J. Wysocki <rjw@sisk.pl>
> Cc: linux-acpi@vger.kernel.org
> ---
>  drivers/acpi/osl.c |   14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index d66ae0e..54bcc37 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -615,7 +615,7 @@ void __init acpi_initrd_override_find(void *data, size_t size)
>  			table->signature, cpio_path, file.name, table->length);
>  
>  		all_tables_size += table->length;
> -		acpi_initrd_files[table_nr].data = file.data;
> +		acpi_initrd_files[table_nr].data = (void *)__pa(file.data);
>  		acpi_initrd_files[table_nr].size = file.size;
>  		table_nr++;
>  	}
> @@ -624,7 +624,7 @@ void __init acpi_initrd_override_find(void *data, size_t size)
>  void __init acpi_initrd_override_copy(void)
>  {
>  	int no, total_offset = 0;
> -	char *p;
> +	char *p, *q;
>  
>  	if (!all_tables_size)
>  		return;
> @@ -654,12 +654,20 @@ void __init acpi_initrd_override_copy(void)
>  	arch_reserve_mem_area(acpi_tables_addr, all_tables_size);
>  
>  	for (no = 0; no < ACPI_OVERRIDE_TABLES; no++) {
> +		/*
> +		 * have to use unsigned long, otherwise 32bit spit warning
> +		 * and it is ok to unsigned long, as bootloader would not
> +		 * load initrd above 4G for 32bit kernel.
> +		 */
> +		unsigned long addr = (unsigned long)acpi_initrd_files[no].data;

I can't say I like this.  It's stuffing phys_addr_t into void *.  It
might work okay but the code is a bit misleading / confusing.  "void
*" shouldn't contain a physical address.  Maybe the alternatives are
uglier, I don't know.  If you can think of a reasonable way to not do
this, it would be great.

Thanks.

-- 
tejun


* Re: [PATCH v2 06/20] x86, ACPI: Store override acpi tables phys addr in cpio files info array
  2013-04-04 18:27   ` Tejun Heo
@ 2013-04-04 18:30     ` Tejun Heo
  2013-04-04 19:40       ` Yinghai Lu
  2013-04-04 20:03     ` Yinghai Lu
  1 sibling, 1 reply; 45+ messages in thread
From: Tejun Heo @ 2013-04-04 18:30 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel, Rafael J. Wysocki,
	linux-acpi

On Thu, Apr 04, 2013 at 11:27:42AM -0700, Tejun Heo wrote:
> > +		/*
> > +		 * have to use unsigned long, otherwise 32bit spit warning
> > +		 * and it is ok to unsigned long, as bootloader would not
> > +		 * load initrd above 4G for 32bit kernel.
> > +		 */
> > +		unsigned long addr = (unsigned long)acpi_initrd_files[no].data;
> 
> I can't say I like this.  It's stuffing phys_addr_t into void *.  It
> might work okay but the code is a bit misleading / confusing.  "void
> *" shouldn't contain a physical address.  Maybe the alternatives are
> uglier, I don't know.  If you can think of a reasonable way to not do
> this, it would be great.

Also, the comment contradicts what you wrote in the next patch.

  Boot loader could load initrd above max_low_pfn.

Hmmm?

-- 
tejun


* Re: [PATCH v2 07/20] x86, ACPI: Make acpi_initrd_override_find work with 32bit flat mode
  2013-03-10  6:44 ` [PATCH v2 07/20] x86, ACPI: Make acpi_initrd_override_find work with 32bit flat mode Yinghai Lu
@ 2013-04-04 18:35   ` Tejun Heo
  2013-04-04 20:22     ` Yinghai Lu
  0 siblings, 1 reply; 45+ messages in thread
From: Tejun Heo @ 2013-04-04 18:35 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel, Pekka Enberg,
	Jacob Shin, Rafael J. Wysocki, linux-acpi

Hello,

On Sat, Mar 09, 2013 at 10:44:34PM -0800, Yinghai Lu wrote:
> For finding with 32bit, it would be easy to access initrd in 32bit
> flat mode, as we don't need to set page table.
> 
> That is from head_32.S, and microcode updating already use this trick.
> 
> Need to change acpi_initrd_override_find to use phys to access global
> variables.
> 
> Pass is_phys in the function, as we can not use address to decide if it
> is phys or virtual address on 32 bit. Boot loader could load initrd above
> max_low_pfn.
> 
> Don't call printk as it uses global variables, so delay print later
> during copying.
> 
> Change table_sigs to use stack instead, otherwise it is too messy to change
> string array to phys and still keep offset calculating correct.
> That size is about 36x4 bytes, and it is small to settle in stack.
> 
> Also remove "continue" in the macro to make code more readable.

It'd be nice if the error message could be stored somewhere and then
printed out after the system is in proper address mode if that isn't
too complex to achieve.  If it gets too messy, no need to bother.

Thanks.

-- 
tejun


* Re: [PATCH v2 05/20] x86, ACPI: Split acpi_initrd_override to find/copy two functions
  2013-04-04 18:07   ` Tejun Heo
@ 2013-04-04 19:29     ` Yinghai Lu
  0 siblings, 0 replies; 45+ messages in thread
From: Yinghai Lu @ 2013-04-04 19:29 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, Linux Kernel Mailing List,
	Pekka Enberg, Jacob Shin, Rafael J. Wysocki,
	ACPI Devel Maling List

On Thu, Apr 4, 2013 at 11:07 AM, Tejun Heo <tj@kernel.org> wrote:
>>
>> Also move down two functions declaration to avoid #ifdef in setup.c
>>
>> ACPI_INITRD_TABLE_OVERRIDE depends one ACPI and BLK_DEV_INITRD.
>> So could move declaration out from #ifdef CONFIG_ACPI protection.
>
> Heh, I couldn't really follow the above.  How about something like the
> following.
>
>  While a dummy version of acpi_initrd_override() was defined when
>  !CONFIG_ACPI_INITRD_TABLE_OVERRIDE, the prototype and dummy version
>  were conditionalized inside CONFIG_ACPI.  This forced setup_arch() to
>  have its own #ifdefs around acpi_initrd_override() as otherwise build
>  would fail when !CONFIG_ACPI.  Move the prototypes and dummy
>  implementations of the newly split functions below CONFIG_ACPI block
>  in acpi.h so that we can do away with #ifdefs in its user.

Will update the changelog with your changes.

Thanks

Yinghai


* Re: [PATCH v2 06/20] x86, ACPI: Store override acpi tables phys addr in cpio files info array
  2013-04-04 18:30     ` Tejun Heo
@ 2013-04-04 19:40       ` Yinghai Lu
  0 siblings, 0 replies; 45+ messages in thread
From: Yinghai Lu @ 2013-04-04 19:40 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, Linux Kernel Mailing List,
	Rafael J. Wysocki, ACPI Devel Maling List

On Thu, Apr 4, 2013 at 11:30 AM, Tejun Heo <tj@kernel.org> wrote:
> Also the comment contradicts with what you wrote in the next patch.
>
>   Boot loader could load initrd above max_low_pfn.

It does not contradict:
this patch says the bootloader would not load the initrd above 4G for a
32bit kernel, and max_low_pfn is below 4G, so an initrd above max_low_pfn
can still be under 4G.

Thanks

Yinghai


* Re: [PATCH v2 06/20] x86, ACPI: Store override acpi tables phys addr in cpio files info array
  2013-04-04 18:27   ` Tejun Heo
  2013-04-04 18:30     ` Tejun Heo
@ 2013-04-04 20:03     ` Yinghai Lu
  1 sibling, 0 replies; 45+ messages in thread
From: Yinghai Lu @ 2013-04-04 20:03 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, Linux Kernel Mailing List,
	Rafael J. Wysocki, ACPI Devel Maling List

[-- Attachment #1: Type: text/plain, Size: 925 bytes --]

On Thu, Apr 4, 2013 at 11:27 AM, Tejun Heo <tj@kernel.org> wrote:
>>       for (no = 0; no < ACPI_OVERRIDE_TABLES; no++) {
>> +             /*
>> +              * have to use unsigned long, otherwise 32bit spit warning
>> +              * and it is ok to unsigned long, as bootloader would not
>> +              * load initrd above 4G for 32bit kernel.
>> +              */
>> +             unsigned long addr = (unsigned long)acpi_initrd_files[no].data;
>
> I can't say I like this.  It's stuffing phys_addr_t into void *.  It
> might work okay but the code is a bit misleading / confusing.  "void
> *" shouldn't contain a physical address.  Maybe the alternatives are
> uglier, I don't know.  If you can think of a reasonable way to not do
> this, it would be great.

Please check if you are happy with the attached patch.

-v2: introduce file_pos to save phys address instead of abusing cpio_data
        that tj is not happy with.

[-- Attachment #2: fix_acpi_override_2.patch --]
[-- Type: application/octet-stream, Size: 2390 bytes --]

Subject: [PATCH] x86, ACPI: Store override acpi tables phys addr in cpio files info array

In 32bit we will find table with phys address during 32bit flat mode
in head_32.S, because at that time we don't need set page table to
access initrd.

For copying we could use early_ioremap() with phys directly before mem mapping
is set.

To keep 32bit and 64bit consistent, use phys_addr for all.

-v2: introduce file_pos to save phys address instead of abusing cpio_data
	that tj is not happy with.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: linux-acpi@vger.kernel.org

---
 drivers/acpi/osl.c |   15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

Index: linux-2.6/drivers/acpi/osl.c
===================================================================
--- linux-2.6.orig/drivers/acpi/osl.c
+++ linux-2.6/drivers/acpi/osl.c
@@ -570,7 +570,11 @@ static const char * const table_sigs[] =
 #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
 
 #define ACPI_OVERRIDE_TABLES 64
-static struct cpio_data __initdata acpi_initrd_files[ACPI_OVERRIDE_TABLES];
+struct file_pos {
+	phys_addr_t data;
+	phys_addr_t size;
+};
+static struct file_pos __initdata acpi_initrd_files[ACPI_OVERRIDE_TABLES];
 
 void __init acpi_initrd_override_find(void *data, size_t size)
 {
@@ -615,7 +619,7 @@ void __init acpi_initrd_override_find(vo
 			table->signature, cpio_path, file.name, table->length);
 
 		all_tables_size += table->length;
-		acpi_initrd_files[table_nr].data = file.data;
+		acpi_initrd_files[table_nr].data = __pa_nodebug(file.data);
 		acpi_initrd_files[table_nr].size = file.size;
 		table_nr++;
 	}
@@ -624,7 +628,7 @@ void __init acpi_initrd_override_find(vo
 void __init acpi_initrd_override_copy(void)
 {
 	int no, total_offset = 0;
-	char *p;
+	char *p, *q;
 
 	if (!all_tables_size)
 		return;
@@ -659,12 +663,15 @@ void __init acpi_initrd_override_copy(vo
 	 * one by one during copying.
 	 */
 	for (no = 0; no < ACPI_OVERRIDE_TABLES; no++) {
+		phys_addr_t addr = acpi_initrd_files[no].data;
 		phys_addr_t size = acpi_initrd_files[no].size;
 
 		if (!size)
 			break;
+		q = early_ioremap(addr, size);
 		p = early_ioremap(acpi_tables_addr + total_offset, size);
-		memcpy(p, acpi_initrd_files[no].data, size);
+		memcpy(p, q, size);
+		early_iounmap(q, size);
 		early_iounmap(p, size);
 		total_offset += size;
 	}


* Re: [PATCH v2 07/20] x86, ACPI: Make acpi_initrd_override_find work with 32bit flat mode
  2013-04-04 18:35   ` Tejun Heo
@ 2013-04-04 20:22     ` Yinghai Lu
  0 siblings, 0 replies; 45+ messages in thread
From: Yinghai Lu @ 2013-04-04 20:22 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, Linux Kernel Mailing List,
	Pekka Enberg, Jacob Shin, Rafael J. Wysocki,
	ACPI Devel Maling List

On Thu, Apr 4, 2013 at 11:35 AM, Tejun Heo <tj@kernel.org> wrote:
>
> It'd be nice if the error message can be stored somewhere and then
> printed out after the system is in proper address mode if that isn't
> too complex to achieve.  If it gets too messy, no need to bother.

Maybe not necessary, as later during copying another printout
is added there for each successful one.

@@ -670,6 +700,9 @@ void __init acpi_initrd_override_copy(vo
                        break;
                q = early_ioremap(addr, size);
                p = early_ioremap(acpi_tables_addr + total_offset, size);
+               pr_info("%4.4s ACPI table found in initrd [%#010llx-%#010llx]\n",
+                               ((struct acpi_table_header *)q)->signature,
+                               (u64)addr, (u64)(addr + size - 1));

Thanks

Yinghai


* Re: [PATCH v2 08/20] x86, ACPI: Find acpi tables in initrd early from head_32.S/head64.c
  2013-03-10 10:25   ` Pekka Enberg
  2013-03-10 16:47     ` Yinghai Lu
@ 2013-04-04 20:25     ` H. Peter Anvin
  1 sibling, 0 replies; 45+ messages in thread
From: H. Peter Anvin @ 2013-04-04 20:25 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Yinghai Lu, Thomas Gleixner, Ingo Molnar, Andrew Morton,
	Tejun Heo, Thomas Renninger, Tang Chen, linux-kernel, Jacob Shin,
	Rafael J. Wysocki, linux-acpi

On 03/10/2013 03:25 AM, Pekka Enberg wrote:
> 
> What is preventing us from making the 64-bit variant also work in flat
> mode to make the code consistent and not hiding the differences under
> the rug? What am I missing here?
> 

There is no such thing as "flat mode" in 64-bit mode.  We use a #PF
handler to emulate it, but we add the normal kernel offset when doing so.

In the 32-bit case the problem is that the kernel offset is not
available while in linear mode.  It *could* be created using segment
bases, but that would break Xen, I'm pretty sure, and possibly some
other too-clever environments.

	-hpa
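
An illustrative sketch of what that means for the 32-bit find path (not
taken from the patches; table_nr is an assumed example global, is_phys
is the flag the changelog mentions): in linear mode a global's link-time
address has to be converted to its physical address by hand before it
can be dereferenced.

	int *nr = &table_nr;

	if (is_phys)	/* called from head_32.S, before paging is enabled */
		nr = (int *)__pa_symbol(&table_nr);

	(*nr)++;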




Thread overview: 45+ messages
2013-03-10  6:44 [PATCH v2 00/20] x86, ACPI, numa: Parse numa info early Yinghai Lu
2013-03-10  6:44 ` [PATCH v2 01/20] x86: Change get_ramdisk_image() to global Yinghai Lu
2013-03-10  6:44 ` [PATCH v2 02/20] x86, microcode: Use common get_ramdisk_image() Yinghai Lu
2013-04-04 17:48   ` Tejun Heo
2013-04-04 17:59     ` Yinghai Lu
2013-03-10  6:44 ` [PATCH v2 03/20] x86, ACPI, mm: Kill max_low_pfn_mapped Yinghai Lu
2013-03-10  6:44   ` Yinghai Lu
2013-04-04 17:36   ` Tejun Heo
2013-04-04 18:20     ` Yinghai Lu
2013-03-10  6:44 ` [PATCH v2 04/20] x86, ACPI: Increase override tables number limit Yinghai Lu
2013-04-04 17:50   ` Tejun Heo
2013-04-04 18:03     ` Yinghai Lu
2013-03-10  6:44 ` [PATCH v2 05/20] x86, ACPI: Split acpi_initrd_override to find/copy two functions Yinghai Lu
2013-04-04 18:07   ` Tejun Heo
2013-04-04 19:29     ` Yinghai Lu
2013-03-10  6:44 ` [PATCH v2 06/20] x86, ACPI: Store override acpi tables phys addr in cpio files info array Yinghai Lu
2013-04-04 18:27   ` Tejun Heo
2013-04-04 18:30     ` Tejun Heo
2013-04-04 19:40       ` Yinghai Lu
2013-04-04 20:03     ` Yinghai Lu
2013-03-10  6:44 ` [PATCH v2 07/20] x86, ACPI: Make acpi_initrd_override_find work with 32bit flat mode Yinghai Lu
2013-04-04 18:35   ` Tejun Heo
2013-04-04 20:22     ` Yinghai Lu
2013-03-10  6:44 ` [PATCH v2 08/20] x86, ACPI: Find acpi tables in initrd early from head_32.S/head64.c Yinghai Lu
2013-03-10 10:25   ` Pekka Enberg
2013-03-10 16:47     ` Yinghai Lu
2013-03-10 17:42       ` H. Peter Anvin
2013-04-04 20:25     ` H. Peter Anvin
2013-03-10  6:44 ` [PATCH v2 09/20] x86, mm, numa: Move two functions calling on successful path later Yinghai Lu
2013-03-10  6:44 ` [PATCH v2 10/20] x86, mm, numa: Call numa_meminfo_cover_memory() checking early Yinghai Lu
2013-03-10  6:44 ` [PATCH v2 11/20] x86, mm, numa: Move node_map_pfn alignment() to x86 Yinghai Lu
2013-03-10  6:44 ` [PATCH v2 12/20] x86, mm, numa: Use numa_meminfo to check node_map_pfn alignment Yinghai Lu
2013-03-10  6:44 ` [PATCH v2 13/20] x86, mm, numa: Set memblock nid later Yinghai Lu
2013-03-10  6:44 ` [PATCH v2 14/20] x86, mm, numa: Move node_possible_map setting later Yinghai Lu
2013-03-10  6:44 ` [PATCH v2 15/20] x86, mm, numa: Move emulation handling down Yinghai Lu
2013-03-10  6:44 ` [PATCH v2 16/20] x86, ACPI, numa, ia64: split SLIT handling out Yinghai Lu
2013-03-10  6:44   ` Yinghai Lu
2013-03-10  6:44 ` [PATCH v2 17/20] x86, mm, numa: Add early_initmem_init() stub Yinghai Lu
2013-03-10  6:44 ` [PATCH v2 18/20] x86, mm: Parse numa info early Yinghai Lu
2013-03-10  6:44 ` [PATCH v2 19/20] x86, mm: Make init_mem_mapping be able to be called several times Yinghai Lu
2013-03-11 13:16   ` Konrad Rzeszutek Wilk
2013-03-11 20:28     ` Yinghai Lu
2013-03-10  6:44 ` [PATCH v2 20/20] x86, mm, numa: Put pagetable on local node ram for 64bit Yinghai Lu
2013-03-11  5:49   ` Tang Chen
2013-03-11  6:29     ` Yinghai Lu
