linux-kernel.vger.kernel.org archive mirror
* [PATCH v5 00/22] x86, ACPI, numa: Parse numa info early
@ 2013-06-15  0:56 Yinghai Lu
  2013-06-15  0:56 ` [PATCH v5 01/22] x86: Change get_ramdisk_image() to global Yinghai Lu
                   ` (21 more replies)
  0 siblings, 22 replies; 31+ messages in thread
From: Yinghai Lu @ 2013-06-15  0:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Andrew Morton, Tejun Heo,
	Thomas Renninger, Tang Chen, linux-kernel, Yinghai Lu

One commit that tried to parse SRAT early was reverted before v3.9-rc1.

| commit e8d1955258091e4c92d5a975ebd7fd8a98f5d30f
| Author: Tang Chen <tangchen@cn.fujitsu.com>
| Date:   Fri Feb 22 16:33:44 2013 -0800
|
|    acpi, memory-hotplug: parse SRAT before memblock is ready

It broke several things, such as the acpi override and the fallback path.

This patchset is a clean implementation that parses numa info early.
1. Keep the acpi table initrd override working by splitting finding from
   copying.
   Finding is done at the head_32.S and head64.c stage:
        in head_32.S, initrd is accessed in 32bit flat mode with phys addr.
        in head64.c, initrd is accessed via kernel low mapping address
        with the help of the #PF-set page table.
   Copying is done with early_ioremap just after memblock is set up.
2. Keep the fallback path working: numaq, ACPI, amd_numa and dummy.
   Separate initmem_init into two stages.
   early_initmem_init will only extract numa info early into numa_meminfo.
   initmem_init will keep slit and emulation handling.
3. Keep other old code flow untouched, like relocate_initrd and initmem_init.
   early_initmem_init will take the old init_mem_mapping position.
   It calls early_x86_numa_init and then init_mem_mapping for every node.
   For 64bit, we avoid having a size limit on initrd, as relocate_initrd
   is still after init_mem_mapping for all memory.
4. The last patch will try to put the page table on the local node, so that
   memory hotplug will be happy.

In short, early_initmem_init will parse numa info early and call
init_mem_mapping to set the page table for every node's mem.

The series can be found at:
        git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-mm

and it is based on today's Linus tree.

-v2: Address tj's review and split patches into smaller ones.
-v3: Add some Acked-by from tj, also stop abusing cpio_data for acpi_files info.
-v4: Fix one typo found by Tang Chen.
     Also added tested-by from Thomas Renninger and Tony.
-v5: Rebase to v3.10-rc3, and add tested-by from Tang Chen.
     Fix one warning for 32bit found by Fengguang.
     Resend as Tang's rebase seems broken and fails the compile test
       in Fengguang's test bots.

Thanks

Yinghai

Yinghai Lu (22):
  x86: Change get_ramdisk_image() to global
  x86, microcode: Use common get_ramdisk_image()
  x86, ACPI, mm: Kill max_low_pfn_mapped
  x86, ACPI: Search buffer above 4G in second try for acpi override tables
  x86, ACPI: Increase override tables number limit
  x86, ACPI: Split acpi_initrd_override to find/copy two functions
  x86, ACPI: Store override acpi tables phys addr in cpio files info array
  x86, ACPI: Make acpi_initrd_override_find work with 32bit flat mode
  x86, ACPI: Find acpi tables in initrd early from head_32.S/head64.c
  x86, mm, numa: Move two functions calling on successful path later
  x86, mm, numa: Call numa_meminfo_cover_memory() checking early
  x86, mm, numa: Move node_map_pfn alignment() to x86
  x86, mm, numa: Use numa_meminfo to check node_map_pfn alignment
  x86, mm, numa: Set memblock nid later
  x86, mm, numa: Move node_possible_map setting later
  x86, mm, numa: Move emulation handling down.
  x86, ACPI, numa, ia64: split SLIT handling out
  x86, mm, numa: Add early_initmem_init() stub
  x86, mm: Parse numa info early
  x86, mm: Add comments for step_size shift
  x86, mm: Make init_mem_mapping be able to be called several times
  x86, mm, numa: Put pagetable on local node ram for 64bit

 arch/ia64/kernel/setup.c                |   4 +-
 arch/x86/include/asm/acpi.h             |   3 +-
 arch/x86/include/asm/page_types.h       |   2 +-
 arch/x86/include/asm/pgtable.h          |   2 +-
 arch/x86/include/asm/setup.h            |   9 ++
 arch/x86/kernel/head64.c                |   2 +
 arch/x86/kernel/head_32.S               |   4 +
 arch/x86/kernel/microcode_intel_early.c |   8 +-
 arch/x86/kernel/setup.c                 |  86 +++++++-----
 arch/x86/mm/init.c                      | 122 ++++++++++------
 arch/x86/mm/numa.c                      | 240 +++++++++++++++++++++++++-------
 arch/x86/mm/numa_emulation.c            |   2 +-
 arch/x86/mm/numa_internal.h             |   2 +
 arch/x86/mm/srat.c                      |  11 +-
 drivers/acpi/numa.c                     |  13 +-
 drivers/acpi/osl.c                      | 139 ++++++++++++------
 include/linux/acpi.h                    |  20 +--
 include/linux/mm.h                      |   3 -
 mm/page_alloc.c                         |  52 +------
 19 files changed, 477 insertions(+), 247 deletions(-)

-- 
1.8.1.4



* [PATCH v5 01/22] x86: Change get_ramdisk_image() to global
  2013-06-15  0:56 [PATCH v5 00/22] x86, ACPI, numa: Parse numa info early Yinghai Lu
@ 2013-06-15  0:56 ` Yinghai Lu
  2013-06-15  0:56 ` [PATCH v5 02/22] x86, microcode: Use common get_ramdisk_image() Yinghai Lu
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 31+ messages in thread
From: Yinghai Lu @ 2013-06-15  0:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Andrew Morton, Tejun Heo,
	Thomas Renninger, Tang Chen, linux-kernel, Yinghai Lu

get_ramdisk_image() is needed by the early microcode updating code in
another file, so change it to global.

Also make it take a boot_params pointer, as head_32.S needs to access it via
phys address during 32bit flat mode.
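
As a rough userspace illustration (not part of the patch), the way the low
and high halves of the initrd address and size are combined can be sketched
as below. The struct and function names with a "sketch" prefix or suffix are
simplified stand-ins invented for this example; only the field names come
from the diff.

```c
#include <stdint.h>

/* Simplified stand-ins for the kernel structures; only the fields used here. */
struct setup_header_sketch {
	uint32_t ramdisk_image;	/* low 32 bits of the initrd address */
	uint32_t ramdisk_size;	/* low 32 bits of the initrd size */
};

struct boot_params_sketch {
	struct setup_header_sketch hdr;
	uint32_t ext_ramdisk_image;	/* high 32 bits of the address */
	uint32_t ext_ramdisk_size;	/* high 32 bits of the size */
};

/* Takes a pointer, so the caller decides how boot_params is addressed. */
static uint64_t sketch_ramdisk_image(const struct boot_params_sketch *bp)
{
	uint64_t image = bp->hdr.ramdisk_image;

	image |= (uint64_t)bp->ext_ramdisk_image << 32;
	return image;
}

static uint64_t sketch_ramdisk_size(const struct boot_params_sketch *bp)
{
	uint64_t size = bp->hdr.ramdisk_size;

	size |= (uint64_t)bp->ext_ramdisk_size << 32;
	return size;
}
```

Passing the pointer explicitly is what lets one caller hand in a virtual
address and another hand in a physical one without the helper caring.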

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Tested-by: Thomas Renninger <trenn@suse.de>
Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/include/asm/setup.h |  3 +++
 arch/x86/kernel/setup.c      | 28 ++++++++++++++--------------
 2 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index b7bf350..4f71d48 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -106,6 +106,9 @@ void *extend_brk(size_t size, size_t align);
 	RESERVE_BRK(name, sizeof(type) * entries)
 
 extern void probe_roms(void);
+u64 get_ramdisk_image(struct boot_params *bp);
+u64 get_ramdisk_size(struct boot_params *bp);
+
 #ifdef __i386__
 
 void __init i386_start_kernel(void);
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 56f7fcf..66ab495 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -297,19 +297,19 @@ static void __init reserve_brk(void)
 
 #ifdef CONFIG_BLK_DEV_INITRD
 
-static u64 __init get_ramdisk_image(void)
+u64 __init get_ramdisk_image(struct boot_params *bp)
 {
-	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
+	u64 ramdisk_image = bp->hdr.ramdisk_image;
 
-	ramdisk_image |= (u64)boot_params.ext_ramdisk_image << 32;
+	ramdisk_image |= (u64)bp->ext_ramdisk_image << 32;
 
 	return ramdisk_image;
 }
-static u64 __init get_ramdisk_size(void)
+u64 __init get_ramdisk_size(struct boot_params *bp)
 {
-	u64 ramdisk_size = boot_params.hdr.ramdisk_size;
+	u64 ramdisk_size = bp->hdr.ramdisk_size;
 
-	ramdisk_size |= (u64)boot_params.ext_ramdisk_size << 32;
+	ramdisk_size |= (u64)bp->ext_ramdisk_size << 32;
 
 	return ramdisk_size;
 }
@@ -318,8 +318,8 @@ static u64 __init get_ramdisk_size(void)
 static void __init relocate_initrd(void)
 {
 	/* Assume only end is not page aligned */
-	u64 ramdisk_image = get_ramdisk_image();
-	u64 ramdisk_size  = get_ramdisk_size();
+	u64 ramdisk_image = get_ramdisk_image(&boot_params);
+	u64 ramdisk_size  = get_ramdisk_size(&boot_params);
 	u64 area_size     = PAGE_ALIGN(ramdisk_size);
 	u64 ramdisk_here;
 	unsigned long slop, clen, mapaddr;
@@ -358,8 +358,8 @@ static void __init relocate_initrd(void)
 		ramdisk_size  -= clen;
 	}
 
-	ramdisk_image = get_ramdisk_image();
-	ramdisk_size  = get_ramdisk_size();
+	ramdisk_image = get_ramdisk_image(&boot_params);
+	ramdisk_size  = get_ramdisk_size(&boot_params);
 	printk(KERN_INFO "Move RAMDISK from [mem %#010llx-%#010llx] to"
 		" [mem %#010llx-%#010llx]\n",
 		ramdisk_image, ramdisk_image + ramdisk_size - 1,
@@ -369,8 +369,8 @@ static void __init relocate_initrd(void)
 static void __init early_reserve_initrd(void)
 {
 	/* Assume only end is not page aligned */
-	u64 ramdisk_image = get_ramdisk_image();
-	u64 ramdisk_size  = get_ramdisk_size();
+	u64 ramdisk_image = get_ramdisk_image(&boot_params);
+	u64 ramdisk_size  = get_ramdisk_size(&boot_params);
 	u64 ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
 
 	if (!boot_params.hdr.type_of_loader ||
@@ -382,8 +382,8 @@ static void __init early_reserve_initrd(void)
 static void __init reserve_initrd(void)
 {
 	/* Assume only end is not page aligned */
-	u64 ramdisk_image = get_ramdisk_image();
-	u64 ramdisk_size  = get_ramdisk_size();
+	u64 ramdisk_image = get_ramdisk_image(&boot_params);
+	u64 ramdisk_size  = get_ramdisk_size(&boot_params);
 	u64 ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
 	u64 mapped_size;
 
-- 
1.8.1.4



* [PATCH v5 02/22] x86, microcode: Use common get_ramdisk_image()
  2013-06-15  0:56 [PATCH v5 00/22] x86, ACPI, numa: Parse numa info early Yinghai Lu
  2013-06-15  0:56 ` [PATCH v5 01/22] x86: Change get_ramdisk_image() to global Yinghai Lu
@ 2013-06-15  0:56 ` Yinghai Lu
  2013-06-15  0:56 ` [PATCH v5 03/22] x86, ACPI, mm: Kill max_low_pfn_mapped Yinghai Lu
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 31+ messages in thread
From: Yinghai Lu @ 2013-06-15  0:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Andrew Morton, Tejun Heo,
	Thomas Renninger, Tang Chen, linux-kernel, Yinghai Lu,
	Fenghua Yu

Use the common get_ramdisk_image() to get the ramdisk start phys address.

We need this to get the correct ramdisk address for a 64bit bzImage whose
initrd can be loaded above 4G by kexec-tools.

-v2: fix one typo found by Tang Chen

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
Tested-by: Thomas Renninger <trenn@suse.de>
Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/kernel/microcode_intel_early.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/microcode_intel_early.c b/arch/x86/kernel/microcode_intel_early.c
index 2e9e128..54575a9 100644
--- a/arch/x86/kernel/microcode_intel_early.c
+++ b/arch/x86/kernel/microcode_intel_early.c
@@ -743,8 +743,8 @@ load_ucode_intel_bsp(void)
 	struct boot_params *boot_params_p;
 
 	boot_params_p = (struct boot_params *)__pa_nodebug(&boot_params);
-	ramdisk_image = boot_params_p->hdr.ramdisk_image;
-	ramdisk_size  = boot_params_p->hdr.ramdisk_size;
+	ramdisk_image = get_ramdisk_image(boot_params_p);
+	ramdisk_size  = get_ramdisk_size(boot_params_p);
 	initrd_start_early = ramdisk_image;
 	initrd_end_early = initrd_start_early + ramdisk_size;
 
@@ -753,8 +753,8 @@ load_ucode_intel_bsp(void)
 		(unsigned long *)__pa_nodebug(&mc_saved_in_initrd),
 		initrd_start_early, initrd_end_early, &uci);
 #else
-	ramdisk_image = boot_params.hdr.ramdisk_image;
-	ramdisk_size  = boot_params.hdr.ramdisk_size;
+	ramdisk_image = get_ramdisk_image(&boot_params);
+	ramdisk_size  = get_ramdisk_size(&boot_params);
 	initrd_start_early = ramdisk_image + PAGE_OFFSET;
 	initrd_end_early = initrd_start_early + ramdisk_size;
 
-- 
1.8.1.4



* [PATCH v5 03/22] x86, ACPI, mm: Kill max_low_pfn_mapped
  2013-06-15  0:56 [PATCH v5 00/22] x86, ACPI, numa: Parse numa info early Yinghai Lu
  2013-06-15  0:56 ` [PATCH v5 01/22] x86: Change get_ramdisk_image() to global Yinghai Lu
  2013-06-15  0:56 ` [PATCH v5 02/22] x86, microcode: Use common get_ramdisk_image() Yinghai Lu
@ 2013-06-15  0:56 ` Yinghai Lu
  2013-06-17 23:19   ` Toshi Kani
  2013-06-15  0:56 ` [PATCH v5 04/22] x86, ACPI: Search buffer above 4G in second try for acpi override tables Yinghai Lu
                   ` (18 subsequent siblings)
  21 siblings, 1 reply; 31+ messages in thread
From: Yinghai Lu @ 2013-06-15  0:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Andrew Morton, Tejun Heo,
	Thomas Renninger, Tang Chen, linux-kernel, Yinghai Lu,
	Rafael J. Wysocki, Jacob Shin, Pekka Enberg, linux-acpi

Now that we have the arch_pfn_mapped array, max_low_pfn_mapped should not
be used anymore.

Users should use arch_pfn_mapped or just 1UL<<(32-PAGE_SHIFT) instead.

The only user is ACPI_INITRD_TABLE_OVERRIDE, and it should not use that,
as later accesses use early_ioremap(). We can change it to use
1UL<<(32-PAGE_SHIFT), aka under 4G.
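
To make the "1UL<<(32-PAGE_SHIFT) is 4G" arithmetic concrete, here is a small
userspace sketch. It assumes 4 KiB pages (a page shift of 12); the SKETCH_ and
sketch_ names are invented for this example and are not kernel symbols.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* First page frame number at the 4G boundary. */
static uint64_t sketch_pfn_4g(void)
{
	return UINT64_C(1) << (32 - SKETCH_PAGE_SHIFT);
}

/* Converts a page frame number back to a physical address. */
static uint64_t sketch_pfn_to_phys(uint64_t pfn)
{
	return pfn << SKETCH_PAGE_SHIFT;
}
```

With a page shift of 12 the boundary pfn is 0x100000, and shifting it back
gives 0x100000000, i.e. exactly 4G.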

-v2: Leave alone max_low_pfn_mapped in i915 code according to tj.

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Jacob Shin <jacob.shin@amd.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: linux-acpi@vger.kernel.org
Tested-by: Thomas Renninger <trenn@suse.de>
Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/include/asm/page_types.h | 1 -
 arch/x86/kernel/setup.c           | 4 +---
 arch/x86/mm/init.c                | 4 ----
 drivers/acpi/osl.c                | 6 +++---
 4 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index 54c9787..b012b82 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -43,7 +43,6 @@
 
 extern int devmem_is_allowed(unsigned long pagenr);
 
-extern unsigned long max_low_pfn_mapped;
 extern unsigned long max_pfn_mapped;
 
 static inline phys_addr_t get_max_mapped(void)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 66ab495..6ca5f2c 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -112,13 +112,11 @@
 #include <asm/prom.h>
 
 /*
- * max_low_pfn_mapped: highest direct mapped pfn under 4GB
- * max_pfn_mapped:     highest direct mapped pfn over 4GB
+ * max_pfn_mapped:     highest direct mapped pfn
  *
  * The direct mapping only covers E820_RAM regions, so the ranges and gaps are
  * represented by pfn_mapped
  */
-unsigned long max_low_pfn_mapped;
 unsigned long max_pfn_mapped;
 
 #ifdef CONFIG_DMI
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index eaac174..8554656 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -313,10 +313,6 @@ static void add_pfn_range_mapped(unsigned long start_pfn, unsigned long end_pfn)
 	nr_pfn_mapped = clean_sort_range(pfn_mapped, E820_X_MAX);
 
 	max_pfn_mapped = max(max_pfn_mapped, end_pfn);
-
-	if (start_pfn < (1UL<<(32-PAGE_SHIFT)))
-		max_low_pfn_mapped = max(max_low_pfn_mapped,
-					 min(end_pfn, 1UL<<(32-PAGE_SHIFT)));
 }
 
 bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn)
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index e721863..93e3194 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -624,9 +624,9 @@ void __init acpi_initrd_override(void *data, size_t size)
 	if (table_nr == 0)
 		return;
 
-	acpi_tables_addr =
-		memblock_find_in_range(0, max_low_pfn_mapped << PAGE_SHIFT,
-				       all_tables_size, PAGE_SIZE);
+	/* under 4G at first, then above 4G */
+	acpi_tables_addr = memblock_find_in_range(0, (1ULL<<32) - 1,
+					all_tables_size, PAGE_SIZE);
 	if (!acpi_tables_addr) {
 		WARN_ON(1);
 		return;
-- 
1.8.1.4



* [PATCH v5 04/22] x86, ACPI: Search buffer above 4G in second try for acpi override tables
  2013-06-15  0:56 [PATCH v5 00/22] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (2 preceding siblings ...)
  2013-06-15  0:56 ` [PATCH v5 03/22] x86, ACPI, mm: Kill max_low_pfn_mapped Yinghai Lu
@ 2013-06-15  0:56 ` Yinghai Lu
  2013-06-17 23:22   ` Toshi Kani
  2013-06-15  0:56 ` [PATCH v5 05/22] x86, ACPI: Increase override tables number limit Yinghai Lu
                   ` (17 subsequent siblings)
  21 siblings, 1 reply; 31+ messages in thread
From: Yinghai Lu @ 2013-06-15  0:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Andrew Morton, Tejun Heo,
	Thomas Renninger, Tang Chen, linux-kernel, Yinghai Lu,
	Rafael J. Wysocki, linux-acpi

Currently we only search for the override acpi table buffer under 4G.
In some cases, e.g. when the user uses memmap to exclude all low ram,
we may not find a range for it under 4G.

Do a second try to search above 4G.
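
The two-try search can be sketched in userspace as below. This is only an
illustration under stated assumptions: struct free_range and both sketch_
functions are hypothetical stand-ins for memblock, alignment is ignored, and
the search order is simply lowest-address-first, which is not necessarily
what the real allocator does.

```c
#include <stddef.h>
#include <stdint.h>

/* One free physical range, [start, end). Hypothetical stand-in for memblock. */
struct free_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Returns the lowest address in [lo, hi) where `size` bytes fit inside a
 * free range, or 0 if nothing fits. As in the patch, 0 means failure, so
 * this sketch cannot report a hit at address 0.
 */
static uint64_t sketch_find_in_range(const struct free_range *r, size_t n,
				     uint64_t lo, uint64_t hi, uint64_t size)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t s = r[i].start > lo ? r[i].start : lo;
		uint64_t e = r[i].end < hi ? r[i].end : hi;

		if (s < e && e - s >= size)
			return s;
	}
	return 0;
}

/* First try under 4G, then fall back to the whole address space. */
static uint64_t sketch_find_table_buffer(const struct free_range *r, size_t n,
					 uint64_t size)
{
	uint64_t addr = sketch_find_in_range(r, n, 0, UINT64_C(1) << 32, size);

	if (!addr)
		addr = sketch_find_in_range(r, n, 0, UINT64_MAX, size);
	return addr;
}
```

Trying the low range first keeps the buffer under 4G whenever possible, and
the fallback only matters for setups where no low ram is left.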

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: linux-acpi@vger.kernel.org
Tested-by: Thomas Renninger <trenn@suse.de>
Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 drivers/acpi/osl.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 93e3194..42c48fc 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -627,6 +627,10 @@ void __init acpi_initrd_override(void *data, size_t size)
 	/* under 4G at first, then above 4G */
 	acpi_tables_addr = memblock_find_in_range(0, (1ULL<<32) - 1,
 					all_tables_size, PAGE_SIZE);
+	if (!acpi_tables_addr)
+		acpi_tables_addr = memblock_find_in_range(0,
+					~(phys_addr_t)0,
+					all_tables_size, PAGE_SIZE);
 	if (!acpi_tables_addr) {
 		WARN_ON(1);
 		return;
-- 
1.8.1.4



* [PATCH v5 05/22] x86, ACPI: Increase override tables number limit
  2013-06-15  0:56 [PATCH v5 00/22] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (3 preceding siblings ...)
  2013-06-15  0:56 ` [PATCH v5 04/22] x86, ACPI: Search buffer above 4G in second try for acpi override tables Yinghai Lu
@ 2013-06-15  0:56 ` Yinghai Lu
  2013-06-17 23:35   ` Toshi Kani
  2013-06-15  0:56 ` [PATCH v5 06/22] x86, ACPI: Split acpi_initrd_override to find/copy two functions Yinghai Lu
                   ` (16 subsequent siblings)
  21 siblings, 1 reply; 31+ messages in thread
From: Yinghai Lu @ 2013-06-15  0:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Andrew Morton, Tejun Heo,
	Thomas Renninger, Tang Chen, linux-kernel, Yinghai Lu,
	Rafael J. Wysocki, linux-acpi

The number of acpi tables in initrd is currently limited to 10, which is
too small. 64 should be good enough, as we have 35 sigs and could have
several SSDTs.

Two problems in the current code prevent us from increasing the limit:
1. The cpio file info array is put on the stack. As every element is 32
   bytes, we could run out of stack if we grow that array to 64 entries.
   We can move it off the stack, make it global and put it in the
   __initdata section.
2. early_ioremap can only remap 256k at a time. Current code maps all
   10 tables at once. If we increase that limit, the whole size could be
   more than 256k, and early_ioremap will fail.
   We can map the tables one by one during copying, instead of mapping
   all of them at once.
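
The second point can be illustrated with a small userspace sketch. The
sketch_map() helper below is a hypothetical stand-in for early_ioremap() that
only models the size limit (256k, as stated above); nothing is really
remapped, and all names are invented for this example.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_MAP_LIMIT (256 * 1024)	/* assumed per-mapping limit */

struct sketch_file {
	const unsigned char *data;
	size_t size;
};

/*
 * Hypothetical stand-in for early_ioremap(): refuses windows larger than
 * the limit, otherwise just hands the pointer back.
 */
static unsigned char *sketch_map(unsigned char *addr, size_t size)
{
	return size <= SKETCH_MAP_LIMIT ? addr : NULL;
}

/*
 * Copies the files back to back into dst, mapping one file at a time.
 * Returns the number of bytes copied, or 0 if a single file exceeds the
 * mapping limit.
 */
static size_t sketch_copy_tables(unsigned char *dst,
				 const struct sketch_file *files, size_t nr)
{
	size_t total_offset = 0;

	for (size_t no = 0; no < nr; no++) {
		unsigned char *p = sketch_map(dst + total_offset, files[no].size);

		if (!p)
			return 0;
		memcpy(p, files[no].data, files[no].size);
		/* the kernel code would unmap the window here */
		total_offset += files[no].size;
	}
	return total_offset;
}
```

Mapping the whole destination at once fails as soon as the tables add up to
more than the limit, while the per-table loop only requires each individual
table to fit.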

-v2: According to tj, split it out into a separate patch, also
     rename the array to acpi_initrd_files.
-v3: Add some comments about mapping table one by one during copying
     per tj.

Signed-off-by: Yinghai <yinghai@kernel.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: linux-acpi@vger.kernel.org
Acked-by: Tejun Heo <tj@kernel.org>
Tested-by: Thomas Renninger <trenn@suse.de>
Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 drivers/acpi/osl.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 42c48fc..c4ea2b7 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -569,8 +569,8 @@ static const char * const table_sigs[] = {
 
 #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
 
-/* Must not increase 10 or needs code modification below */
-#define ACPI_OVERRIDE_TABLES 10
+#define ACPI_OVERRIDE_TABLES 64
+static struct cpio_data __initdata acpi_initrd_files[ACPI_OVERRIDE_TABLES];
 
 void __init acpi_initrd_override(void *data, size_t size)
 {
@@ -579,7 +579,6 @@ void __init acpi_initrd_override(void *data, size_t size)
 	struct acpi_table_header *table;
 	char cpio_path[32] = "kernel/firmware/acpi/";
 	struct cpio_data file;
-	struct cpio_data early_initrd_files[ACPI_OVERRIDE_TABLES];
 	char *p;
 
 	if (data == NULL || size == 0)
@@ -617,8 +616,8 @@ void __init acpi_initrd_override(void *data, size_t size)
 			table->signature, cpio_path, file.name, table->length);
 
 		all_tables_size += table->length;
-		early_initrd_files[table_nr].data = file.data;
-		early_initrd_files[table_nr].size = file.size;
+		acpi_initrd_files[table_nr].data = file.data;
+		acpi_initrd_files[table_nr].size = file.size;
 		table_nr++;
 	}
 	if (table_nr == 0)
@@ -648,14 +647,19 @@ void __init acpi_initrd_override(void *data, size_t size)
 	memblock_reserve(acpi_tables_addr, all_tables_size);
 	arch_reserve_mem_area(acpi_tables_addr, all_tables_size);
 
-	p = early_ioremap(acpi_tables_addr, all_tables_size);
-
+	/*
+	 * early_ioremap only can remap 256k one time. If we map all
+	 * tables one time, we will hit the limit. Need to map table
+	 * one by one during copying.
+	 */
 	for (no = 0; no < table_nr; no++) {
-		memcpy(p + total_offset, early_initrd_files[no].data,
-		       early_initrd_files[no].size);
-		total_offset += early_initrd_files[no].size;
+		phys_addr_t size = acpi_initrd_files[no].size;
+
+		p = early_ioremap(acpi_tables_addr + total_offset, size);
+		memcpy(p, acpi_initrd_files[no].data, size);
+		early_iounmap(p, size);
+		total_offset += size;
 	}
-	early_iounmap(p, all_tables_size);
 }
 #endif /* CONFIG_ACPI_INITRD_TABLE_OVERRIDE */
 
-- 
1.8.1.4



* [PATCH v5 06/22] x86, ACPI: Split acpi_initrd_override to find/copy two functions
  2013-06-15  0:56 [PATCH v5 00/22] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (4 preceding siblings ...)
  2013-06-15  0:56 ` [PATCH v5 05/22] x86, ACPI: Increase override tables number limit Yinghai Lu
@ 2013-06-15  0:56 ` Yinghai Lu
  2013-06-18  0:24   ` Toshi Kani
  2013-06-15  0:56 ` [PATCH v5 07/22] x86, ACPI: Store override acpi tables phys addr in cpio files info array Yinghai Lu
                   ` (15 subsequent siblings)
  21 siblings, 1 reply; 31+ messages in thread
From: Yinghai Lu @ 2013-06-15  0:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Andrew Morton, Tejun Heo,
	Thomas Renninger, Tang Chen, linux-kernel, Yinghai Lu,
	Pekka Enberg, Jacob Shin, Rafael J. Wysocki, linux-acpi

To parse srat early, we need to move acpi table probing early.
acpi_initrd_override() runs before acpi table probing, so we need to
move it early too.

In the current code, acpi_initrd_override() is called after
init_mem_mapping() and relocate_initrd(), so it can scan initrd and copy
acpi tables using the kernel virtual address of initrd.
Copying needs to happen after memblock is ready, because it needs to
allocate a buffer for the new acpi tables.

So we have to split that function into two functions, find and copy.
Find should be as early as possible. Copy should be after memblock is ready.

Finding could be done in head_32.S and head64.c, just like microcode
early scanning. In head_32.S, it is 32bit flat mode, so we don't
need to set up a page table to access initrd. In head64.c, the #PF-set
page table could help us access initrd with the kernel low mapping address.

Copying could be done just after memblock is ready and before probing
acpi tables, and we need early_ioremap to access the source and target
ranges, as init_mem_mapping is not called yet.
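
The find/copy split described above can be sketched in userspace as follows.
This is a simplified illustration, not the patch itself: the sketch_ names
are invented, plain pointers stand in for initrd addresses, and the caller
supplies the destination buffer instead of allocating it from memblock.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_TABLES 64

struct sketch_entry {
	const unsigned char *data;
	size_t size;
};

/* Filled by the find step, consumed by the copy step. */
static struct sketch_entry sketch_found[SKETCH_MAX_TABLES];
static size_t sketch_all_size;

/* Step 1: only record where each table lives; no allocation is needed. */
static void sketch_find(const struct sketch_entry *tables, size_t nr)
{
	for (size_t i = 0; i < nr && i < SKETCH_MAX_TABLES; i++) {
		sketch_found[i] = tables[i];
		sketch_all_size += tables[i].size;
	}
}

/* Step 2: runs later, once a destination buffer is available. */
static size_t sketch_copy(unsigned char *dst, size_t dst_size)
{
	size_t off = 0;

	if (!sketch_all_size || dst_size < sketch_all_size)
		return 0;
	for (size_t no = 0; no < SKETCH_MAX_TABLES; no++) {
		if (!sketch_found[no].size)
			break;	/* a zero size marks the end of the list */
		memcpy(dst + off, sketch_found[no].data, sketch_found[no].size);
		off += sketch_found[no].size;
	}
	return off;
}
```

Because the copy step walks the array until it sees a zero size, the two
steps share state only through the static array and the total size, so the
table count does not have to be passed around.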

While a dummy version of acpi_initrd_override() was defined when
!CONFIG_ACPI_INITRD_TABLE_OVERRIDE, the prototype and dummy version
were conditionalized inside CONFIG_ACPI.  This forced setup_arch() to
have its own #ifdefs around acpi_initrd_override() as otherwise build
would fail when !CONFIG_ACPI.  Move the prototypes and dummy
implementations of the newly split functions below CONFIG_ACPI block
in acpi.h so that we can do away with #ifdefs in its user.

-v2: Split one patch out according to tj.
     Also don't pass table_nr around.
-v3: Add tj's changelog about moving down to #ifdef in acpi.h to
     avoid #ifdef in setup.c

Signed-off-by: Yinghai <yinghai@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Jacob Shin <jacob.shin@amd.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: linux-acpi@vger.kernel.org
Acked-by: Tejun Heo <tj@kernel.org>
Tested-by: Thomas Renninger <trenn@suse.de>
Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/kernel/setup.c |  6 +++---
 drivers/acpi/osl.c      | 18 +++++++++++++-----
 include/linux/acpi.h    | 16 ++++++++--------
 3 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 6ca5f2c..42f584c 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1119,9 +1119,9 @@ void __init setup_arch(char **cmdline_p)
 
 	reserve_initrd();
 
-#if defined(CONFIG_ACPI) && defined(CONFIG_BLK_DEV_INITRD)
-	acpi_initrd_override((void *)initrd_start, initrd_end - initrd_start);
-#endif
+	acpi_initrd_override_find((void *)initrd_start,
+					initrd_end - initrd_start);
+	acpi_initrd_override_copy();
 
 	reserve_crashkernel();
 
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index c4ea2b7..fea73af 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -572,14 +572,13 @@ static const char * const table_sigs[] = {
 #define ACPI_OVERRIDE_TABLES 64
 static struct cpio_data __initdata acpi_initrd_files[ACPI_OVERRIDE_TABLES];
 
-void __init acpi_initrd_override(void *data, size_t size)
+void __init acpi_initrd_override_find(void *data, size_t size)
 {
-	int sig, no, table_nr = 0, total_offset = 0;
+	int sig, no, table_nr = 0;
 	long offset = 0;
 	struct acpi_table_header *table;
 	char cpio_path[32] = "kernel/firmware/acpi/";
 	struct cpio_data file;
-	char *p;
 
 	if (data == NULL || size == 0)
 		return;
@@ -620,7 +619,14 @@ void __init acpi_initrd_override(void *data, size_t size)
 		acpi_initrd_files[table_nr].size = file.size;
 		table_nr++;
 	}
-	if (table_nr == 0)
+}
+
+void __init acpi_initrd_override_copy(void)
+{
+	int no, total_offset = 0;
+	char *p;
+
+	if (!all_tables_size)
 		return;
 
 	/* under 4G at first, then above 4G */
@@ -652,9 +658,11 @@ void __init acpi_initrd_override(void *data, size_t size)
 	 * tables one time, we will hit the limit. Need to map table
 	 * one by one during copying.
 	 */
-	for (no = 0; no < table_nr; no++) {
+	for (no = 0; no < ACPI_OVERRIDE_TABLES; no++) {
 		phys_addr_t size = acpi_initrd_files[no].size;
 
+		if (!size)
+			break;
 		p = early_ioremap(acpi_tables_addr + total_offset, size);
 		memcpy(p, acpi_initrd_files[no].data, size);
 		early_iounmap(p, size);
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 17b5b59..8dd917b 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -79,14 +79,6 @@ typedef int (*acpi_tbl_table_handler)(struct acpi_table_header *table);
 typedef int (*acpi_tbl_entry_handler)(struct acpi_subtable_header *header,
 				      const unsigned long end);
 
-#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
-void acpi_initrd_override(void *data, size_t size);
-#else
-static inline void acpi_initrd_override(void *data, size_t size)
-{
-}
-#endif
-
 char * __acpi_map_table (unsigned long phys_addr, unsigned long size);
 void __acpi_unmap_table(char *map, unsigned long size);
 int early_acpi_boot_init(void);
@@ -476,6 +468,14 @@ static inline bool acpi_driver_match_device(struct device *dev,
 
 #endif	/* !CONFIG_ACPI */
 
+#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
+void acpi_initrd_override_find(void *data, size_t size);
+void acpi_initrd_override_copy(void);
+#else
+static inline void acpi_initrd_override_find(void *data, size_t size) { }
+static inline void acpi_initrd_override_copy(void) { }
+#endif
+
 #ifdef CONFIG_ACPI
 void acpi_os_set_prepare_sleep(int (*func)(u8 sleep_state,
 			       u32 pm1a_ctrl,  u32 pm1b_ctrl));
-- 
1.8.1.4



* [PATCH v5 07/22] x86, ACPI: Store override acpi tables phys addr in cpio files info array
  2013-06-15  0:56 [PATCH v5 00/22] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (5 preceding siblings ...)
  2013-06-15  0:56 ` [PATCH v5 06/22] x86, ACPI: Split acpi_initrd_override to find/copy two functions Yinghai Lu
@ 2013-06-15  0:56 ` Yinghai Lu
  2013-06-15  0:56 ` [PATCH v5 08/22] x86, ACPI: Make acpi_initrd_override_find work with 32bit flat mode Yinghai Lu
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 31+ messages in thread
From: Yinghai Lu @ 2013-06-15  0:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Andrew Morton, Tejun Heo,
	Thomas Renninger, Tang Chen, linux-kernel, Yinghai Lu,
	Rafael J. Wysocki, linux-acpi

On 32bit we will find the tables with phys address during 32bit flat mode
in head_32.S, because at that time we don't need to set up a page table to
access initrd.

For copying we could use early_ioremap() with the phys address directly,
before mem mapping is set.

To keep 32bit and 64bit consistent, use phys_addr for all.

-v2: introduce file_pos to save the phys address instead of abusing
	cpio_data, which tj is not happy with.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: linux-acpi@vger.kernel.org
Tested-by: Thomas Renninger <trenn@suse.de>
Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 drivers/acpi/osl.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index fea73af..3a307ec 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -570,7 +570,11 @@ static const char * const table_sigs[] = {
 #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
 
 #define ACPI_OVERRIDE_TABLES 64
-static struct cpio_data __initdata acpi_initrd_files[ACPI_OVERRIDE_TABLES];
+struct file_pos {
+	phys_addr_t data;
+	phys_addr_t size;
+};
+static struct file_pos __initdata acpi_initrd_files[ACPI_OVERRIDE_TABLES];
 
 void __init acpi_initrd_override_find(void *data, size_t size)
 {
@@ -615,7 +619,7 @@ void __init acpi_initrd_override_find(void *data, size_t size)
 			table->signature, cpio_path, file.name, table->length);
 
 		all_tables_size += table->length;
-		acpi_initrd_files[table_nr].data = file.data;
+		acpi_initrd_files[table_nr].data = __pa_nodebug(file.data);
 		acpi_initrd_files[table_nr].size = file.size;
 		table_nr++;
 	}
@@ -624,7 +628,7 @@ void __init acpi_initrd_override_find(void *data, size_t size)
 void __init acpi_initrd_override_copy(void)
 {
 	int no, total_offset = 0;
-	char *p;
+	char *p, *q;
 
 	if (!all_tables_size)
 		return;
@@ -659,12 +663,15 @@ void __init acpi_initrd_override_copy(void)
 	 * one by one during copying.
 	 */
 	for (no = 0; no < ACPI_OVERRIDE_TABLES; no++) {
+		phys_addr_t addr = acpi_initrd_files[no].data;
 		phys_addr_t size = acpi_initrd_files[no].size;
 
 		if (!size)
 			break;
+		q = early_ioremap(addr, size);
 		p = early_ioremap(acpi_tables_addr + total_offset, size);
-		memcpy(p, acpi_initrd_files[no].data, size);
+		memcpy(p, q, size);
+		early_iounmap(q, size);
 		early_iounmap(p, size);
 		total_offset += size;
 	}
-- 
1.8.1.4



* [PATCH v5 08/22] x86, ACPI: Make acpi_initrd_override_find work with 32bit flat mode
  2013-06-15  0:56 [PATCH v5 00/22] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (6 preceding siblings ...)
  2013-06-15  0:56 ` [PATCH v5 07/22] x86, ACPI: Store override acpi tables phys addr in cpio files info array Yinghai Lu
@ 2013-06-15  0:56 ` Yinghai Lu
  2013-06-15  0:56 ` [PATCH v5 09/22] x86, ACPI: Find acpi tables in initrd early from head_32.S/head64.c Yinghai Lu
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 31+ messages in thread
From: Yinghai Lu @ 2013-06-15  0:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Andrew Morton, Tejun Heo,
	Thomas Renninger, Tang Chen, linux-kernel, Yinghai Lu,
	Pekka Enberg, Jacob Shin, Rafael J. Wysocki, linux-acpi

On 32-bit, the finding step can easily access the initrd in 32-bit
flat mode, as no page table needs to be set up.

That is done from head_32.S; early microcode updating already uses this
trick.

acpi_initrd_override_find() needs to use physical addresses to access
global variables.

Pass is_phys to the function, as on 32-bit we cannot tell from the address
alone whether it is physical or virtual: the boot loader could load the
initrd above max_low_pfn.
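
To illustrate why the caller has to say which kind of address it is passing, here is a minimal userspace sketch. It is not code from this patch: SKETCH_PAGE_OFFSET, sketch_pa() and sketch_table_phys() are hypothetical names, and the 3G/1G split is an assumption made only for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for illustration: 3G/1G split, lowmem mapped at 0xC0000000. */
#define SKETCH_PAGE_OFFSET 0xC0000000u

/* Hypothetical stand-in for __pa_nodebug(): lowmem virtual -> physical. */
static uint32_t sketch_pa(uint32_t vaddr)
{
	return vaddr - SKETCH_PAGE_OFFSET;
}

/*
 * Hypothetical helper mirroring the is_phys choice: return the physical
 * address to record for a table found at 'ptr'.  The numeric value of
 * 'ptr' is not inspected; only the caller-supplied flag decides.
 */
static uint32_t sketch_table_phys(uint32_t ptr, bool is_phys)
{
	return is_phys ? ptr : sketch_pa(ptr);
}
```

With these assumed values, the pointer 0xC1000000 yields 0x01000000 when treated as virtual and stays 0xC1000000 when treated as physical, so the value alone cannot settle the question.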

Don't call printk, as it uses global variables; delay the printing until
the copying stage.

Move table_sigs onto the stack; otherwise it is too messy to convert the
string array to physical addresses and still keep the offset calculation
correct. Its size is about 36x4 bytes, which is small enough to live on
the stack.

Also remove "continue" from the INVALID_TABLE macro to make the code more
readable.

-v2: add (size_t) cast as suggested by hpa to fix a compile warning
	found by Fengguang Wu.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Jacob Shin <jacob.shin@amd.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: linux-acpi@vger.kernel.org
Tested-by: Thomas Renninger <trenn@suse.de>
Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/kernel/setup.c |  2 +-
 drivers/acpi/osl.c      | 86 ++++++++++++++++++++++++++++++++++---------------
 include/linux/acpi.h    |  5 +--
 3 files changed, 64 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 42f584c..142e042 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1120,7 +1120,7 @@ void __init setup_arch(char **cmdline_p)
 	reserve_initrd();
 
 	acpi_initrd_override_find((void *)initrd_start,
-					initrd_end - initrd_start);
+					initrd_end - initrd_start, false);
 	acpi_initrd_override_copy();
 
 	reserve_crashkernel();
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 3a307ec..3b2beac 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -551,21 +551,9 @@ u8 __init acpi_table_checksum(u8 *buffer, u32 length)
 	return sum;
 }
 
-/* All but ACPI_SIG_RSDP and ACPI_SIG_FACS: */
-static const char * const table_sigs[] = {
-	ACPI_SIG_BERT, ACPI_SIG_CPEP, ACPI_SIG_ECDT, ACPI_SIG_EINJ,
-	ACPI_SIG_ERST, ACPI_SIG_HEST, ACPI_SIG_MADT, ACPI_SIG_MSCT,
-	ACPI_SIG_SBST, ACPI_SIG_SLIT, ACPI_SIG_SRAT, ACPI_SIG_ASF,
-	ACPI_SIG_BOOT, ACPI_SIG_DBGP, ACPI_SIG_DMAR, ACPI_SIG_HPET,
-	ACPI_SIG_IBFT, ACPI_SIG_IVRS, ACPI_SIG_MCFG, ACPI_SIG_MCHI,
-	ACPI_SIG_SLIC, ACPI_SIG_SPCR, ACPI_SIG_SPMI, ACPI_SIG_TCPA,
-	ACPI_SIG_UEFI, ACPI_SIG_WAET, ACPI_SIG_WDAT, ACPI_SIG_WDDT,
-	ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT,
-	ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, NULL };
-
 /* Non-fatal errors: Affected tables/files are ignored */
 #define INVALID_TABLE(x, path, name)					\
-	{ pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); continue; }
+	do { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); } while (0)
 
 #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
 
@@ -576,17 +564,45 @@ struct file_pos {
 };
 static struct file_pos __initdata acpi_initrd_files[ACPI_OVERRIDE_TABLES];
 
-void __init acpi_initrd_override_find(void *data, size_t size)
+/*
+ * acpi_initrd_override_find() is called from head_32.S and head64.c.
+ * head_32.S calling path is with 32bit flat mode, so we can access
+ * initrd early without setting pagetable or relocating initrd. For
+ * global variables accessing, we need to use phys address instead of
+ * kernel virtual address, try to put table_sigs string array in stack,
+ * so avoid switching for it.
+ * Also don't call printk as it uses global variables.
+ */
+void __init acpi_initrd_override_find(void *data, size_t size, bool is_phys)
 {
 	int sig, no, table_nr = 0;
 	long offset = 0;
 	struct acpi_table_header *table;
 	char cpio_path[32] = "kernel/firmware/acpi/";
 	struct cpio_data file;
+	struct file_pos *files = acpi_initrd_files;
+	int *all_tables_size_p = &all_tables_size;
+
+	/* All but ACPI_SIG_RSDP and ACPI_SIG_FACS: */
+	char *table_sigs[] = {
+		ACPI_SIG_BERT, ACPI_SIG_CPEP, ACPI_SIG_ECDT, ACPI_SIG_EINJ,
+		ACPI_SIG_ERST, ACPI_SIG_HEST, ACPI_SIG_MADT, ACPI_SIG_MSCT,
+		ACPI_SIG_SBST, ACPI_SIG_SLIT, ACPI_SIG_SRAT, ACPI_SIG_ASF,
+		ACPI_SIG_BOOT, ACPI_SIG_DBGP, ACPI_SIG_DMAR, ACPI_SIG_HPET,
+		ACPI_SIG_IBFT, ACPI_SIG_IVRS, ACPI_SIG_MCFG, ACPI_SIG_MCHI,
+		ACPI_SIG_SLIC, ACPI_SIG_SPCR, ACPI_SIG_SPMI, ACPI_SIG_TCPA,
+		ACPI_SIG_UEFI, ACPI_SIG_WAET, ACPI_SIG_WDAT, ACPI_SIG_WDDT,
+		ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT,
+		ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, NULL };
 
 	if (data == NULL || size == 0)
 		return;
 
+	if (is_phys) {
+		files = (struct file_pos *)__pa_symbol(acpi_initrd_files);
+		all_tables_size_p = (int *)__pa_symbol(&all_tables_size);
+	}
+
 	for (no = 0; no < ACPI_OVERRIDE_TABLES; no++) {
 		file = find_cpio_data(cpio_path, data, size, &offset);
 		if (!file.data)
@@ -595,9 +611,12 @@ void __init acpi_initrd_override_find(void *data, size_t size)
 		data += offset;
 		size -= offset;
 
-		if (file.size < sizeof(struct acpi_table_header))
-			INVALID_TABLE("Table smaller than ACPI header",
+		if (file.size < sizeof(struct acpi_table_header)) {
+			if (!is_phys)
+				INVALID_TABLE("Table smaller than ACPI header",
 				      cpio_path, file.name);
+			continue;
+		}
 
 		table = file.data;
 
@@ -605,22 +624,34 @@ void __init acpi_initrd_override_find(void *data, size_t size)
 			if (!memcmp(table->signature, table_sigs[sig], 4))
 				break;
 
-		if (!table_sigs[sig])
-			INVALID_TABLE("Unknown signature",
+		if (!table_sigs[sig]) {
+			if (!is_phys)
+				 INVALID_TABLE("Unknown signature",
 				      cpio_path, file.name);
-		if (file.size != table->length)
-			INVALID_TABLE("File length does not match table length",
+			continue;
+		}
+		if (file.size != table->length) {
+			if (!is_phys)
+				INVALID_TABLE("File length does not match table length",
 				      cpio_path, file.name);
-		if (acpi_table_checksum(file.data, table->length))
-			INVALID_TABLE("Bad table checksum",
+			continue;
+		}
+		if (acpi_table_checksum(file.data, table->length)) {
+			if (!is_phys)
+				INVALID_TABLE("Bad table checksum",
 				      cpio_path, file.name);
+			continue;
+		}
 
-		pr_info("%4.4s ACPI table found in initrd [%s%s][0x%x]\n",
+		if (!is_phys)
+			pr_info("%4.4s ACPI table found in initrd [%s%s][0x%x]\n",
 			table->signature, cpio_path, file.name, table->length);
 
-		all_tables_size += table->length;
-		acpi_initrd_files[table_nr].data = __pa_nodebug(file.data);
-		acpi_initrd_files[table_nr].size = file.size;
+		(*all_tables_size_p) += table->length;
+		files[table_nr].data = is_phys ?
+						(phys_addr_t)(size_t)file.data :
+						__pa_nodebug(file.data);
+		files[table_nr].size = file.size;
 		table_nr++;
 	}
 }
@@ -670,6 +701,9 @@ void __init acpi_initrd_override_copy(void)
 			break;
 		q = early_ioremap(addr, size);
 		p = early_ioremap(acpi_tables_addr + total_offset, size);
+		pr_info("%4.4s ACPI table found in initrd [%#010llx-%#010llx]\n",
+				((struct acpi_table_header *)q)->signature,
+				(u64)addr, (u64)(addr + size - 1));
 		memcpy(p, q, size);
 		early_iounmap(q, size);
 		early_iounmap(p, size);
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 8dd917b..4e3731b 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -469,10 +469,11 @@ static inline bool acpi_driver_match_device(struct device *dev,
 #endif	/* !CONFIG_ACPI */
 
 #ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
-void acpi_initrd_override_find(void *data, size_t size);
+void acpi_initrd_override_find(void *data, size_t size, bool is_phys);
 void acpi_initrd_override_copy(void);
 #else
-static inline void acpi_initrd_override_find(void *data, size_t size) { }
+static inline void acpi_initrd_override_find(void *data, size_t size,
+						 bool is_phys) { }
 static inline void acpi_initrd_override_copy(void) { }
 #endif
 
-- 
1.8.1.4



* [PATCH v5 09/22] x86, ACPI: Find acpi tables in initrd early from head_32.S/head64.c
  2013-06-15  0:56 [PATCH v5 00/22] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (7 preceding siblings ...)
  2013-06-15  0:56 ` [PATCH v5 08/22] x86, ACPI: Make acpi_initrd_override_find work with 32bit flat mode Yinghai Lu
@ 2013-06-15  0:56 ` Yinghai Lu
  2013-06-15  0:56 ` [PATCH v5 10/22] x86, mm, numa: Move two functions calling on successful path later Yinghai Lu
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 31+ messages in thread
From: Yinghai Lu @ 2013-06-15  0:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Andrew Morton, Tejun Heo,
	Thomas Renninger, Tang Chen, linux-kernel, Yinghai Lu,
	Pekka Enberg, Jacob Shin, Rafael J. Wysocki, linux-acpi

head64.c can use the page table set up by the #PF handler to access the
initrd before memory mapping is initialized and the initrd is relocated.

head_32.S can use 32-bit flat mode to access the initrd before memory
mapping is initialized and the initrd is relocated.

That makes 32-bit and 64-bit more consistent.

-v2: use an inline function in the header file instead, as suggested by tj.
     Still need to keep the #ifdef in head_32.S to avoid a compile error.
-v3: move reserve_initrd() down after acpi_initrd_override_copy(),
     to make sure we are using the right address.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Jacob Shin <jacob.shin@amd.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: linux-acpi@vger.kernel.org
Tested-by: Thomas Renninger <trenn@suse.de>
Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/include/asm/setup.h |  6 ++++++
 arch/x86/kernel/head64.c     |  2 ++
 arch/x86/kernel/head_32.S    |  4 ++++
 arch/x86/kernel/setup.c      | 34 ++++++++++++++++++++++++++++++----
 4 files changed, 42 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index 4f71d48..6f885b7 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -42,6 +42,12 @@ extern void visws_early_detect(void);
 static inline void visws_early_detect(void) { }
 #endif
 
+#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
+void x86_acpi_override_find(void);
+#else
+static inline void x86_acpi_override_find(void) { }
+#endif
+
 extern unsigned long saved_video_mode;
 
 extern void reserve_standard_io_resources(void);
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 55b6761..229b281 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -175,6 +175,8 @@ void __init x86_64_start_kernel(char * real_mode_data)
 	if (console_loglevel == 10)
 		early_printk("Kernel alive\n");
 
+	x86_acpi_override_find();
+
 	clear_page(init_level4_pgt);
 	/* set init_level4_pgt kernel high mapping*/
 	init_level4_pgt[511] = early_level4_pgt[511];
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 73afd11..ca08f0e 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -149,6 +149,10 @@ ENTRY(startup_32)
 	call load_ucode_bsp
 #endif
 
+#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
+	call x86_acpi_override_find
+#endif
+
 /*
  * Initialize page tables.  This creates a PDE and a set of page
  * tables, which are located immediately beyond __brk_base.  The variable
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 142e042..d11b1b7 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -421,6 +421,34 @@ static void __init reserve_initrd(void)
 }
 #endif /* CONFIG_BLK_DEV_INITRD */
 
+#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
+void __init x86_acpi_override_find(void)
+{
+	unsigned long ramdisk_image, ramdisk_size;
+	unsigned char *p = NULL;
+
+#ifdef CONFIG_X86_32
+	struct boot_params *boot_params_p;
+
+	/*
+	 * 32bit is from head_32.S, and it is 32bit flat mode.
+	 * So need to use phys address to access global variables.
+	 */
+	boot_params_p = (struct boot_params *)__pa_nodebug(&boot_params);
+	ramdisk_image = get_ramdisk_image(boot_params_p);
+	ramdisk_size  = get_ramdisk_size(boot_params_p);
+	p = (unsigned char *)ramdisk_image;
+	acpi_initrd_override_find(p, ramdisk_size, true);
+#else
+	ramdisk_image = get_ramdisk_image(&boot_params);
+	ramdisk_size  = get_ramdisk_size(&boot_params);
+	if (ramdisk_image)
+		p = __va(ramdisk_image);
+	acpi_initrd_override_find(p, ramdisk_size, false);
+#endif
+}
+#endif
+
 static void __init parse_setup_data(void)
 {
 	struct setup_data *data;
@@ -1117,12 +1145,10 @@ void __init setup_arch(char **cmdline_p)
 	/* Allocate bigger log buffer */
 	setup_log_buf(1);
 
-	reserve_initrd();
-
-	acpi_initrd_override_find((void *)initrd_start,
-					initrd_end - initrd_start, false);
 	acpi_initrd_override_copy();
 
+	reserve_initrd();
+
 	reserve_crashkernel();
 
 	vsmp_init();
-- 
1.8.1.4



* [PATCH v5 10/22] x86, mm, numa: Move two functions calling on successful path later
  2013-06-15  0:56 [PATCH v5 00/22] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (8 preceding siblings ...)
  2013-06-15  0:56 ` [PATCH v5 09/22] x86, ACPI: Find acpi tables in initrd early from head_32.S/head64.c Yinghai Lu
@ 2013-06-15  0:56 ` Yinghai Lu
  2013-06-15  0:56 ` [PATCH v5 11/22] x86, mm, numa: Call numa_meminfo_cover_memory() checking early Yinghai Lu
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 31+ messages in thread
From: Yinghai Lu @ 2013-06-15  0:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Andrew Morton, Tejun Heo,
	Thomas Renninger, Tang Chen, linux-kernel, Yinghai Lu

We need to have numa info ready before init_mem_mapping, so that we
can call init_mem_mapping per node and also trim node memory ranges to
a big alignment.

Current numa parsing needs to allocate some buffers, and so needs to be
called after init_mem_mapping.

So split parsing numa info into two stages; the early one runs before
init_mem_mapping and should not need to allocate buffers.

In the end we will have early_initmem_init() and initmem_init().

This is the first patch for the separation.

setup_node_data() and numa_init_array() are only called on the successful
path, so we can move the calls to x86_numa_init(). That also makes
numa_init() smaller and more readable.
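
The node registration step that moves into x86_numa_init() computes, for each node, the span covering all of its blocks. A minimal userspace sketch of that min/max walk follows. It is not code from this patch: struct sketch_blk and sketch_node_span() are hypothetical names, and max_addr stands in for PFN_PHYS(max_pfn).

```c
#include <stdint.h>

/* Hypothetical stand-in for one numa_meminfo block. */
struct sketch_blk {
	uint64_t start;
	uint64_t end;
	int nid;
};

/*
 * Find the lowest start and highest end over all blocks of node 'nid'.
 * Returns 1 and fills *startp and *endp if the node has memory, 0 otherwise
 * (the case where the kernel loop would skip setup_node_data()).
 */
static int sketch_node_span(const struct sketch_blk *blk, int nr_blks,
			    int nid, uint64_t max_addr,
			    uint64_t *startp, uint64_t *endp)
{
	uint64_t start = max_addr;
	uint64_t end = 0;
	int i;

	for (i = 0; i < nr_blks; i++) {
		if (blk[i].nid != nid)
			continue;
		if (blk[i].start < start)
			start = blk[i].start;
		if (blk[i].end > end)
			end = blk[i].end;
	}

	if (start >= end)
		return 0;

	*startp = start;
	*endp = end;
	return 1;
}
```

A node whose blocks are interleaved with another node's blocks still gets a single span from its lowest start to its highest end, as in the loop being moved.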

-v2: remove the node_online_map clearing in numa_init(), as it is only
     set in setup_node_data() at the end of the successful path.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/mm/numa.c | 69 ++++++++++++++++++++++++++++++------------------------
 1 file changed, 39 insertions(+), 30 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index a71c4e2..07ae800 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -477,7 +477,7 @@ static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
 static int __init numa_register_memblks(struct numa_meminfo *mi)
 {
 	unsigned long uninitialized_var(pfn_align);
-	int i, nid;
+	int i;
 
 	/* Account for nodes with cpus and no memory */
 	node_possible_map = numa_nodes_parsed;
@@ -506,24 +506,6 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
 	if (!numa_meminfo_cover_memory(mi))
 		return -EINVAL;
 
-	/* Finally register nodes. */
-	for_each_node_mask(nid, node_possible_map) {
-		u64 start = PFN_PHYS(max_pfn);
-		u64 end = 0;
-
-		for (i = 0; i < mi->nr_blks; i++) {
-			if (nid != mi->blk[i].nid)
-				continue;
-			start = min(mi->blk[i].start, start);
-			end = max(mi->blk[i].end, end);
-		}
-
-		if (start < end)
-			setup_node_data(nid, start, end);
-	}
-
-	/* Dump memblock with node info and return. */
-	memblock_dump_all();
 	return 0;
 }
 
@@ -559,7 +541,6 @@ static int __init numa_init(int (*init_func)(void))
 
 	nodes_clear(numa_nodes_parsed);
 	nodes_clear(node_possible_map);
-	nodes_clear(node_online_map);
 	memset(&numa_meminfo, 0, sizeof(numa_meminfo));
 	WARN_ON(memblock_set_node(0, ULLONG_MAX, MAX_NUMNODES));
 	numa_reset_distance();
@@ -577,15 +558,6 @@ static int __init numa_init(int (*init_func)(void))
 	if (ret < 0)
 		return ret;
 
-	for (i = 0; i < nr_cpu_ids; i++) {
-		int nid = early_cpu_to_node(i);
-
-		if (nid == NUMA_NO_NODE)
-			continue;
-		if (!node_online(nid))
-			numa_clear_node(i);
-	}
-	numa_init_array();
 	return 0;
 }
 
@@ -618,7 +590,7 @@ static int __init dummy_numa_init(void)
  * last fallback is dummy single node config encomapssing whole memory and
  * never fails.
  */
-void __init x86_numa_init(void)
+static void __init early_x86_numa_init(void)
 {
 	if (!numa_off) {
 #ifdef CONFIG_X86_NUMAQ
@@ -638,6 +610,43 @@ void __init x86_numa_init(void)
 	numa_init(dummy_numa_init);
 }
 
+void __init x86_numa_init(void)
+{
+	int i, nid;
+	struct numa_meminfo *mi = &numa_meminfo;
+
+	early_x86_numa_init();
+
+	/* Finally register nodes. */
+	for_each_node_mask(nid, node_possible_map) {
+		u64 start = PFN_PHYS(max_pfn);
+		u64 end = 0;
+
+		for (i = 0; i < mi->nr_blks; i++) {
+			if (nid != mi->blk[i].nid)
+				continue;
+			start = min(mi->blk[i].start, start);
+			end = max(mi->blk[i].end, end);
+		}
+
+		if (start < end)
+			setup_node_data(nid, start, end); /* online is set */
+	}
+
+	/* Dump memblock with node info */
+	memblock_dump_all();
+
+	for (i = 0; i < nr_cpu_ids; i++) {
+		int nid = early_cpu_to_node(i);
+
+		if (nid == NUMA_NO_NODE)
+			continue;
+		if (!node_online(nid))
+			numa_clear_node(i);
+	}
+	numa_init_array();
+}
+
 static __init int find_near_online_node(int node)
 {
 	int n, val;
-- 
1.8.1.4



* [PATCH v5 11/22] x86, mm, numa: Call numa_meminfo_cover_memory() checking early
  2013-06-15  0:56 [PATCH v5 00/22] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (9 preceding siblings ...)
  2013-06-15  0:56 ` [PATCH v5 10/22] x86, mm, numa: Move two functions calling on successful path later Yinghai Lu
@ 2013-06-15  0:56 ` Yinghai Lu
  2013-06-15  0:56 ` [PATCH v5 12/22] x86, mm, numa: Move node_map_pfn alignment() to x86 Yinghai Lu
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 31+ messages in thread
From: Yinghai Lu @ 2013-06-15  0:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Andrew Morton, Tejun Heo,
	Thomas Renninger, Tang Chen, linux-kernel, Yinghai Lu

For the separation, we need to set the memblock nid later, as doing so
could change the memblock array and possibly double the memblock.memory
array, which would need to allocate a buffer.

We do not need to use the nid in memblock to find out absent pages.
So we can move the numa_meminfo_cover_memory() call earlier.

Also, __absent_pages_in_range() can be made static and
absent_pages_in_range() used directly.

Later we can set the memblock nid only once, on the successful path.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/mm/numa.c | 7 ++++---
 include/linux/mm.h | 2 --
 mm/page_alloc.c    | 2 +-
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 07ae800..1bb565d 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -457,7 +457,7 @@ static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
 		u64 s = mi->blk[i].start >> PAGE_SHIFT;
 		u64 e = mi->blk[i].end >> PAGE_SHIFT;
 		numaram += e - s;
-		numaram -= __absent_pages_in_range(mi->blk[i].nid, s, e);
+		numaram -= absent_pages_in_range(s, e);
 		if ((s64)numaram < 0)
 			numaram = 0;
 	}
@@ -485,6 +485,9 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
 	if (WARN_ON(nodes_empty(node_possible_map)))
 		return -EINVAL;
 
+	if (!numa_meminfo_cover_memory(mi))
+		return -EINVAL;
+
 	for (i = 0; i < mi->nr_blks; i++) {
 		struct numa_memblk *mb = &mi->blk[i];
 		memblock_set_node(mb->start, mb->end - mb->start, mb->nid);
@@ -503,8 +506,6 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
 		return -EINVAL;
 	}
 #endif
-	if (!numa_meminfo_cover_memory(mi))
-		return -EINVAL;
 
 	return 0;
 }
diff --git a/include/linux/mm.h b/include/linux/mm.h
index e0c8528..28e9470 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1385,8 +1385,6 @@ static inline unsigned long free_initmem_default(int poison)
  */
 extern void free_area_init_nodes(unsigned long *max_zone_pfn);
 unsigned long node_map_pfn_alignment(void);
-unsigned long __absent_pages_in_range(int nid, unsigned long start_pfn,
-						unsigned long end_pfn);
 extern unsigned long absent_pages_in_range(unsigned long start_pfn,
 						unsigned long end_pfn);
 extern void get_pfn_range_for_nid(unsigned int nid,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 378a15b..c427f46 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4395,7 +4395,7 @@ static unsigned long __meminit zone_spanned_pages_in_node(int nid,
  * Return the number of holes in a range on a node. If nid is MAX_NUMNODES,
  * then all holes in the requested range will be accounted for.
  */
-unsigned long __meminit __absent_pages_in_range(int nid,
+static unsigned long __meminit __absent_pages_in_range(int nid,
 				unsigned long range_start_pfn,
 				unsigned long range_end_pfn)
 {
-- 
1.8.1.4



* [PATCH v5 12/22] x86, mm, numa: Move node_map_pfn alignment() to x86
  2013-06-15  0:56 [PATCH v5 00/22] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (10 preceding siblings ...)
  2013-06-15  0:56 ` [PATCH v5 11/22] x86, mm, numa: Call numa_meminfo_cover_memory() checking early Yinghai Lu
@ 2013-06-15  0:56 ` Yinghai Lu
  2013-06-15  0:56 ` [PATCH v5 13/22] x86, mm, numa: Use numa_meminfo to check node_map_pfn alignment Yinghai Lu
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 31+ messages in thread
From: Yinghai Lu @ 2013-06-15  0:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Andrew Morton, Tejun Heo,
	Thomas Renninger, Tang Chen, linux-kernel, Yinghai Lu

Move node_map_pfn_alignment() to arch/x86/mm, as there is no other user
of it.

A later patch will update it to use numa_meminfo instead of memblock.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/mm/numa.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/mm.h |  1 -
 mm/page_alloc.c    | 50 --------------------------------------------------
 3 files changed, 50 insertions(+), 51 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 1bb565d..10c6240 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -474,6 +474,56 @@ static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
 	return true;
 }
 
+/**
+ * node_map_pfn_alignment - determine the maximum internode alignment
+ *
+ * This function should be called after node map is populated and sorted.
+ * It calculates the maximum power of two alignment which can distinguish
+ * all the nodes.
+ *
+ * For example, if all nodes are 1GiB and aligned to 1GiB, the return value
+ * would indicate 1GiB alignment with (1 << (30 - PAGE_SHIFT)).  If the
+ * nodes are shifted by 256MiB, 256MiB.  Note that if only the last node is
+ * shifted, 1GiB is enough and this function will indicate so.
+ *
+ * This is used to test whether pfn -> nid mapping of the chosen memory
+ * model has fine enough granularity to avoid incorrect mapping for the
+ * populated node map.
+ *
+ * Returns the determined alignment in pfn's.  0 if there is no alignment
+ * requirement (single node).
+ */
+unsigned long __init node_map_pfn_alignment(void)
+{
+	unsigned long accl_mask = 0, last_end = 0;
+	unsigned long start, end, mask;
+	int last_nid = -1;
+	int i, nid;
+
+	for_each_mem_pfn_range(i, MAX_NUMNODES, &start, &end, &nid) {
+		if (!start || last_nid < 0 || last_nid == nid) {
+			last_nid = nid;
+			last_end = end;
+			continue;
+		}
+
+		/*
+		 * Start with a mask granular enough to pin-point to the
+		 * start pfn and tick off bits one-by-one until it becomes
+		 * too coarse to separate the current node from the last.
+		 */
+		mask = ~((1 << __ffs(start)) - 1);
+		while (mask && last_end <= (start & (mask << 1)))
+			mask <<= 1;
+
+		/* accumulate all internode masks */
+		accl_mask |= mask;
+	}
+
+	/* convert mask to number of pages */
+	return ~accl_mask + 1;
+}
+
 static int __init numa_register_memblks(struct numa_meminfo *mi)
 {
 	unsigned long uninitialized_var(pfn_align);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 28e9470..b827743 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1384,7 +1384,6 @@ static inline unsigned long free_initmem_default(int poison)
  * CONFIG_HAVE_MEMBLOCK_NODE_MAP.
  */
 extern void free_area_init_nodes(unsigned long *max_zone_pfn);
-unsigned long node_map_pfn_alignment(void);
 extern unsigned long absent_pages_in_range(unsigned long start_pfn,
 						unsigned long end_pfn);
 extern void get_pfn_range_for_nid(unsigned int nid,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c427f46..28c4a97 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4760,56 +4760,6 @@ void __init setup_nr_node_ids(void)
 }
 #endif
 
-/**
- * node_map_pfn_alignment - determine the maximum internode alignment
- *
- * This function should be called after node map is populated and sorted.
- * It calculates the maximum power of two alignment which can distinguish
- * all the nodes.
- *
- * For example, if all nodes are 1GiB and aligned to 1GiB, the return value
- * would indicate 1GiB alignment with (1 << (30 - PAGE_SHIFT)).  If the
- * nodes are shifted by 256MiB, 256MiB.  Note that if only the last node is
- * shifted, 1GiB is enough and this function will indicate so.
- *
- * This is used to test whether pfn -> nid mapping of the chosen memory
- * model has fine enough granularity to avoid incorrect mapping for the
- * populated node map.
- *
- * Returns the determined alignment in pfn's.  0 if there is no alignment
- * requirement (single node).
- */
-unsigned long __init node_map_pfn_alignment(void)
-{
-	unsigned long accl_mask = 0, last_end = 0;
-	unsigned long start, end, mask;
-	int last_nid = -1;
-	int i, nid;
-
-	for_each_mem_pfn_range(i, MAX_NUMNODES, &start, &end, &nid) {
-		if (!start || last_nid < 0 || last_nid == nid) {
-			last_nid = nid;
-			last_end = end;
-			continue;
-		}
-
-		/*
-		 * Start with a mask granular enough to pin-point to the
-		 * start pfn and tick off bits one-by-one until it becomes
-		 * too coarse to separate the current node from the last.
-		 */
-		mask = ~((1 << __ffs(start)) - 1);
-		while (mask && last_end <= (start & (mask << 1)))
-			mask <<= 1;
-
-		/* accumulate all internode masks */
-		accl_mask |= mask;
-	}
-
-	/* convert mask to number of pages */
-	return ~accl_mask + 1;
-}
-
 /* Find the lowest pfn for a node */
 static unsigned long __init find_min_pfn_for_node(int nid)
 {
-- 
1.8.1.4



* [PATCH v5 13/22] x86, mm, numa: Use numa_meminfo to check node_map_pfn alignment
  2013-06-15  0:56 [PATCH v5 00/22] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (11 preceding siblings ...)
  2013-06-15  0:56 ` [PATCH v5 12/22] x86, mm, numa: Move node_map_pfn alignment() to x86 Yinghai Lu
@ 2013-06-15  0:56 ` Yinghai Lu
  2013-06-15  0:56 ` [PATCH v5 14/22] x86, mm, numa: Set memblock nid later Yinghai Lu
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 31+ messages in thread
From: Yinghai Lu @ 2013-06-15  0:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Andrew Morton, Tejun Heo,
	Thomas Renninger, Tang Chen, linux-kernel, Yinghai Lu

We can use numa_meminfo directly instead of the memblock nid.

So we can move setting the memblock nid down and do it only once,
on the successful path.
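
As a rough userspace illustration of the alignment walk over numa_meminfo-style blocks, here is a minimal sketch. It is not the kernel code: struct align_blk, SKETCH_PAGE_SHIFT (4KiB pages assumed) and sketch_pfn_alignment() are hypothetical names, and the lowest set bit is found with plain arithmetic instead of __ffs(). The loop otherwise mirrors the function being converted, including that last_nid and last_end are only updated on the "continue" branch.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4KiB pages */

/* Hypothetical stand-in for one numa_meminfo block (byte addresses). */
struct align_blk {
	uint64_t start;
	uint64_t end;
	int nid;
};

/* Returns the internode alignment in pfns, or 0 if there is no requirement. */
static uint64_t sketch_pfn_alignment(const struct align_blk *blk, int nr_blks)
{
	uint64_t accl_mask = 0, last_end = 0;
	uint64_t start, end, mask, lowbit;
	int last_nid = -1;
	int i, nid;

	for (i = 0; i < nr_blks; i++) {
		start = blk[i].start >> SKETCH_PAGE_SHIFT;
		end = blk[i].end >> SKETCH_PAGE_SHIFT;
		nid = blk[i].nid;

		if (!start || last_nid < 0 || last_nid == nid) {
			last_nid = nid;
			last_end = end;
			continue;
		}

		/* Isolate the lowest set bit of start (start is non-zero here). */
		lowbit = start & (~start + 1);

		/*
		 * Begin with a mask fine enough to pin-point the start pfn,
		 * then coarsen it while it still separates the two nodes.
		 */
		mask = ~(lowbit - 1);
		while (mask && last_end <= (start & (mask << 1)))
			mask <<= 1;

		/* Accumulate all internode masks. */
		accl_mask |= mask;
	}

	/* Convert the mask to a number of pages. */
	return ~accl_mask + 1;
}
```

With the assumed 4KiB pages, two 1GiB nodes aligned to 1GiB give 1 << 18 pfns, and a single node gives 0, which matches the examples in the function's kernel-doc comment.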

-v2: as suggested by tj, separate the move into another patch.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/mm/numa.c | 30 +++++++++++++++++++-----------
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 10c6240..cff565a 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -493,14 +493,18 @@ static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
  * Returns the determined alignment in pfn's.  0 if there is no alignment
  * requirement (single node).
  */
-unsigned long __init node_map_pfn_alignment(void)
+#ifdef NODE_NOT_IN_PAGE_FLAGS
+static unsigned long __init node_map_pfn_alignment(struct numa_meminfo *mi)
 {
 	unsigned long accl_mask = 0, last_end = 0;
 	unsigned long start, end, mask;
 	int last_nid = -1;
 	int i, nid;
 
-	for_each_mem_pfn_range(i, MAX_NUMNODES, &start, &end, &nid) {
+	for (i = 0; i < mi->nr_blks; i++) {
+		start = mi->blk[i].start >> PAGE_SHIFT;
+		end = mi->blk[i].end >> PAGE_SHIFT;
+		nid = mi->blk[i].nid;
 		if (!start || last_nid < 0 || last_nid == nid) {
 			last_nid = nid;
 			last_end = end;
@@ -523,10 +527,16 @@ unsigned long __init node_map_pfn_alignment(void)
 	/* convert mask to number of pages */
 	return ~accl_mask + 1;
 }
+#else
+static unsigned long __init node_map_pfn_alignment(struct numa_meminfo *mi)
+{
+	return 0;
+}
+#endif
 
 static int __init numa_register_memblks(struct numa_meminfo *mi)
 {
-	unsigned long uninitialized_var(pfn_align);
+	unsigned long pfn_align;
 	int i;
 
 	/* Account for nodes with cpus and no memory */
@@ -538,24 +548,22 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
 	if (!numa_meminfo_cover_memory(mi))
 		return -EINVAL;
 
-	for (i = 0; i < mi->nr_blks; i++) {
-		struct numa_memblk *mb = &mi->blk[i];
-		memblock_set_node(mb->start, mb->end - mb->start, mb->nid);
-	}
-
 	/*
 	 * If sections array is gonna be used for pfn -> nid mapping, check
 	 * whether its granularity is fine enough.
 	 */
-#ifdef NODE_NOT_IN_PAGE_FLAGS
-	pfn_align = node_map_pfn_alignment();
+	pfn_align = node_map_pfn_alignment(mi);
 	if (pfn_align && pfn_align < PAGES_PER_SECTION) {
 		printk(KERN_WARNING "Node alignment %LuMB < min %LuMB, rejecting NUMA config\n",
 		       PFN_PHYS(pfn_align) >> 20,
 		       PFN_PHYS(PAGES_PER_SECTION) >> 20);
 		return -EINVAL;
 	}
-#endif
+
+	for (i = 0; i < mi->nr_blks; i++) {
+		struct numa_memblk *mb = &mi->blk[i];
+		memblock_set_node(mb->start, mb->end - mb->start, mb->nid);
+	}
 
 	return 0;
 }
-- 
1.8.1.4



* [PATCH v5 14/22] x86, mm, numa: Set memblock nid later
  2013-06-15  0:56 [PATCH v5 00/22] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (12 preceding siblings ...)
  2013-06-15  0:56 ` [PATCH v5 13/22] x86, mm, numa: Use numa_meminfo to check node_map_pfn alignment Yinghai Lu
@ 2013-06-15  0:56 ` Yinghai Lu
  2013-06-15  0:56 ` [PATCH v5 15/22] x86, mm, numa: Move node_possible_map setting later Yinghai Lu
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 31+ messages in thread
From: Yinghai Lu @ 2013-06-15  0:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Andrew Morton, Tejun Heo,
	Thomas Renninger, Tang Chen, linux-kernel, Yinghai Lu

To separate parsing from registration, the memblock nid has to be set
later: memblock_set_node() can change the memblock array and possibly
double the memblock.memory array, which requires allocating a buffer.

Set the memblock nid only once, on the successful path.

Also rename numa_register_memblks() to numa_check_memblks() now that
the code that sets the memblock nid has been moved out of it.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/mm/numa.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index cff565a..e448b6f 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -534,10 +534,9 @@ static unsigned long __init node_map_pfn_alignment(struct numa_meminfo *mi)
 }
 #endif
 
-static int __init numa_register_memblks(struct numa_meminfo *mi)
+static int __init numa_check_memblks(struct numa_meminfo *mi)
 {
 	unsigned long pfn_align;
-	int i;
 
 	/* Account for nodes with cpus and no memory */
 	node_possible_map = numa_nodes_parsed;
@@ -560,11 +559,6 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
 		return -EINVAL;
 	}
 
-	for (i = 0; i < mi->nr_blks; i++) {
-		struct numa_memblk *mb = &mi->blk[i];
-		memblock_set_node(mb->start, mb->end - mb->start, mb->nid);
-	}
-
 	return 0;
 }
 
@@ -601,7 +595,6 @@ static int __init numa_init(int (*init_func)(void))
 	nodes_clear(numa_nodes_parsed);
 	nodes_clear(node_possible_map);
 	memset(&numa_meminfo, 0, sizeof(numa_meminfo));
-	WARN_ON(memblock_set_node(0, ULLONG_MAX, MAX_NUMNODES));
 	numa_reset_distance();
 
 	ret = init_func();
@@ -613,7 +606,7 @@ static int __init numa_init(int (*init_func)(void))
 
 	numa_emulation(&numa_meminfo, numa_distance_cnt);
 
-	ret = numa_register_memblks(&numa_meminfo);
+	ret = numa_check_memblks(&numa_meminfo);
 	if (ret < 0)
 		return ret;
 
@@ -676,6 +669,11 @@ void __init x86_numa_init(void)
 
 	early_x86_numa_init();
 
+	for (i = 0; i < mi->nr_blks; i++) {
+		struct numa_memblk *mb = &mi->blk[i];
+		memblock_set_node(mb->start, mb->end - mb->start, mb->nid);
+	}
+
 	/* Finally register nodes. */
 	for_each_node_mask(nid, node_possible_map) {
 		u64 start = PFN_PHYS(max_pfn);
-- 
1.8.1.4


* [PATCH v5 15/22] x86, mm, numa: Move node_possible_map setting later
  2013-06-15  0:56 [PATCH v5 00/22] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (13 preceding siblings ...)
  2013-06-15  0:56 ` [PATCH v5 14/22] x86, mm, numa: Set memblock nid later Yinghai Lu
@ 2013-06-15  0:56 ` Yinghai Lu
  2013-06-15  0:56 ` [PATCH v5 16/22] x86, mm, numa: Move emulation handling down Yinghai Lu
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 31+ messages in thread
From: Yinghai Lu @ 2013-06-15  0:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Andrew Morton, Tejun Heo,
	Thomas Renninger, Tang Chen, linux-kernel, Yinghai Lu

Move node_possible_map handling out of numa_check_memblks() so that
numa_check_memblks() has no side effects.

Set node_possible_map only once, on the successful path, instead of
resetting it in numa_init() every time.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/mm/numa.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index e448b6f..da2ebab 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -536,12 +536,13 @@ static unsigned long __init node_map_pfn_alignment(struct numa_meminfo *mi)
 
 static int __init numa_check_memblks(struct numa_meminfo *mi)
 {
+	nodemask_t nodes_parsed;
 	unsigned long pfn_align;
 
 	/* Account for nodes with cpus and no memory */
-	node_possible_map = numa_nodes_parsed;
-	numa_nodemask_from_meminfo(&node_possible_map, mi);
-	if (WARN_ON(nodes_empty(node_possible_map)))
+	nodes_parsed = numa_nodes_parsed;
+	numa_nodemask_from_meminfo(&nodes_parsed, mi);
+	if (WARN_ON(nodes_empty(nodes_parsed)))
 		return -EINVAL;
 
 	if (!numa_meminfo_cover_memory(mi))
@@ -593,7 +594,6 @@ static int __init numa_init(int (*init_func)(void))
 		set_apicid_to_node(i, NUMA_NO_NODE);
 
 	nodes_clear(numa_nodes_parsed);
-	nodes_clear(node_possible_map);
 	memset(&numa_meminfo, 0, sizeof(numa_meminfo));
 	numa_reset_distance();
 
@@ -669,6 +669,9 @@ void __init x86_numa_init(void)
 
 	early_x86_numa_init();
 
+	node_possible_map = numa_nodes_parsed;
+	numa_nodemask_from_meminfo(&node_possible_map, mi);
+
 	for (i = 0; i < mi->nr_blks; i++) {
 		struct numa_memblk *mb = &mi->blk[i];
 		memblock_set_node(mb->start, mb->end - mb->start, mb->nid);
-- 
1.8.1.4


* [PATCH v5 16/22] x86, mm, numa: Move emulation handling down.
  2013-06-15  0:56 [PATCH v5 00/22] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (14 preceding siblings ...)
  2013-06-15  0:56 ` [PATCH v5 15/22] x86, mm, numa: Move node_possible_map setting later Yinghai Lu
@ 2013-06-15  0:56 ` Yinghai Lu
  2013-06-15  0:56 ` [PATCH v5 17/22] x86, ACPI, numa, ia64: split SLIT handling out Yinghai Lu
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 31+ messages in thread
From: Yinghai Lu @ 2013-06-15  0:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Andrew Morton, Tejun Heo,
	Thomas Renninger, Tang Chen, linux-kernel, Yinghai Lu,
	David Rientjes

numa_emulation() needs to allocate buffers for the new numa_meminfo and
the distance matrix, so move it down.

This also changes the behavior:
Before this patch, if the user passes wrong data on the command line,
the kernel falls back to the next numa probing method or disables numa.
After this patch, if the user passes wrong data on the command line, the
kernel stays with the numa info from the earlier probing, such as ACPI
SRAT or amd_numa.

We need to call numa_check_memblks() to reject wrong user input early,
so that the original numa_meminfo is left unchanged.
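
The "check first, then commit" behavior described above can be sketched in
user space as follows. This is only an illustration under stated
assumptions: struct meminfo, check_memblks() and apply_emulation() are
hypothetical stand-ins, not the kernel's numa_meminfo, numa_check_memblks()
or numa_emulation().

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLKS 8	/* hypothetical limit for this sketch */

struct blk { unsigned long start, end; int nid; };
struct meminfo { int nr_blks; struct blk blk[MAX_BLKS]; };

/* Stand-in check: every block must be non-empty and carry a valid nid. */
static int check_memblks(const struct meminfo *mi)
{
	int i;

	if (mi->nr_blks <= 0 || mi->nr_blks > MAX_BLKS)
		return -1;
	for (i = 0; i < mi->nr_blks; i++) {
		if (mi->blk[i].start >= mi->blk[i].end || mi->blk[i].nid < 0)
			return -1;
	}
	return 0;
}

/* Commit the emulated layout only if it passes; else keep *probed as is. */
static bool apply_emulation(struct meminfo *probed,
			    const struct meminfo *emulated)
{
	if (check_memblks(emulated) < 0)
		return false;	/* bad user input: probed info survives */
	*probed = *emulated;
	return true;
}
```

With this shape, a rejected emulation request costs nothing: the caller
simply continues with the layout that was probed earlier.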

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/mm/numa.c           | 6 +++---
 arch/x86/mm/numa_emulation.c | 2 +-
 arch/x86/mm/numa_internal.h  | 2 ++
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index da2ebab..3254f22 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -534,7 +534,7 @@ static unsigned long __init node_map_pfn_alignment(struct numa_meminfo *mi)
 }
 #endif
 
-static int __init numa_check_memblks(struct numa_meminfo *mi)
+int __init numa_check_memblks(struct numa_meminfo *mi)
 {
 	nodemask_t nodes_parsed;
 	unsigned long pfn_align;
@@ -604,8 +604,6 @@ static int __init numa_init(int (*init_func)(void))
 	if (ret < 0)
 		return ret;
 
-	numa_emulation(&numa_meminfo, numa_distance_cnt);
-
 	ret = numa_check_memblks(&numa_meminfo);
 	if (ret < 0)
 		return ret;
@@ -669,6 +667,8 @@ void __init x86_numa_init(void)
 
 	early_x86_numa_init();
 
+	numa_emulation(&numa_meminfo, numa_distance_cnt);
+
 	node_possible_map = numa_nodes_parsed;
 	numa_nodemask_from_meminfo(&node_possible_map, mi);
 
diff --git a/arch/x86/mm/numa_emulation.c b/arch/x86/mm/numa_emulation.c
index dbbbb47..5a0433d 100644
--- a/arch/x86/mm/numa_emulation.c
+++ b/arch/x86/mm/numa_emulation.c
@@ -348,7 +348,7 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
 	if (ret < 0)
 		goto no_emu;
 
-	if (numa_cleanup_meminfo(&ei) < 0) {
+	if (numa_cleanup_meminfo(&ei) < 0 || numa_check_memblks(&ei) < 0) {
 		pr_warning("NUMA: Warning: constructed meminfo invalid, disabling emulation\n");
 		goto no_emu;
 	}
diff --git a/arch/x86/mm/numa_internal.h b/arch/x86/mm/numa_internal.h
index ad86ec9..bb2fbcc 100644
--- a/arch/x86/mm/numa_internal.h
+++ b/arch/x86/mm/numa_internal.h
@@ -21,6 +21,8 @@ void __init numa_reset_distance(void);
 
 void __init x86_numa_init(void);
 
+int __init numa_check_memblks(struct numa_meminfo *mi);
+
 #ifdef CONFIG_NUMA_EMU
 void __init numa_emulation(struct numa_meminfo *numa_meminfo,
 			   int numa_dist_cnt);
-- 
1.8.1.4


* [PATCH v5 17/22] x86, ACPI, numa, ia64: split SLIT handling out
  2013-06-15  0:56 [PATCH v5 00/22] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (15 preceding siblings ...)
  2013-06-15  0:56 ` [PATCH v5 16/22] x86, mm, numa: Move emulation handling down Yinghai Lu
@ 2013-06-15  0:56 ` Yinghai Lu
  2013-06-15  0:56 ` [PATCH v5 18/22] x86, mm, numa: Add early_initmem_init() stub Yinghai Lu
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 31+ messages in thread
From: Yinghai Lu @ 2013-06-15  0:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Andrew Morton, Tejun Heo,
	Thomas Renninger, Tang Chen, linux-kernel, Yinghai Lu,
	Rafael J. Wysocki, linux-acpi, Tony Luck, Fenghua Yu, linux-ia64

We need to handle SLIT later, as it needs to allocate a buffer for the
distance matrix. Also, we do not need SLIT info before init_mem_mapping.

So move SLIT parsing later.

x86_acpi_numa_init() becomes x86_acpi_numa_init_srat() and
x86_acpi_numa_init_slit().

This should not break ia64, which replaces acpi_numa_init() with
acpi_numa_init_srat(), acpi_numa_init_slit() and acpi_numa_arch_fixup().

-v2: Change names to acpi_numa_init_srat/acpi_numa_init_slit as suggested
     by tj.
     Remove the numa_reset_distance() call in numa_init(), as we now only
     set the distance in SLIT handling.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: linux-acpi@vger.kernel.org
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: linux-ia64@vger.kernel.org
Tested-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/ia64/kernel/setup.c    |  4 +++-
 arch/x86/include/asm/acpi.h |  3 ++-
 arch/x86/mm/numa.c          | 14 ++++++++++++--
 arch/x86/mm/srat.c          | 11 +++++++----
 drivers/acpi/numa.c         | 13 +++++++------
 include/linux/acpi.h        |  3 ++-
 6 files changed, 33 insertions(+), 15 deletions(-)

diff --git a/arch/ia64/kernel/setup.c b/arch/ia64/kernel/setup.c
index 13bfdd2..5f7db4a 100644
--- a/arch/ia64/kernel/setup.c
+++ b/arch/ia64/kernel/setup.c
@@ -558,7 +558,9 @@ setup_arch (char **cmdline_p)
 	acpi_table_init();
 	early_acpi_boot_init();
 # ifdef CONFIG_ACPI_NUMA
-	acpi_numa_init();
+	acpi_numa_init_srat();
+	acpi_numa_init_slit();
+	acpi_numa_arch_fixup();
 #  ifdef CONFIG_ACPI_HOTPLUG_CPU
 	prefill_possible_map();
 #  endif
diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
index b31bf97..651db0b 100644
--- a/arch/x86/include/asm/acpi.h
+++ b/arch/x86/include/asm/acpi.h
@@ -178,7 +178,8 @@ static inline void disable_acpi(void) { }
 
 #ifdef CONFIG_ACPI_NUMA
 extern int acpi_numa;
-extern int x86_acpi_numa_init(void);
+int x86_acpi_numa_init_srat(void);
+void x86_acpi_numa_init_slit(void);
 #endif /* CONFIG_ACPI_NUMA */
 
 #define acpi_unlazy_tlb(x)	leave_mm(x)
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 3254f22..630e09f 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -595,7 +595,6 @@ static int __init numa_init(int (*init_func)(void))
 
 	nodes_clear(numa_nodes_parsed);
 	memset(&numa_meminfo, 0, sizeof(numa_meminfo));
-	numa_reset_distance();
 
 	ret = init_func();
 	if (ret < 0)
@@ -633,6 +632,10 @@ static int __init dummy_numa_init(void)
 	return 0;
 }
 
+#ifdef CONFIG_ACPI_NUMA
+static bool srat_used __initdata;
+#endif
+
 /**
  * x86_numa_init - Initialize NUMA
  *
@@ -648,8 +651,10 @@ static void __init early_x86_numa_init(void)
 			return;
 #endif
 #ifdef CONFIG_ACPI_NUMA
-		if (!numa_init(x86_acpi_numa_init))
+		if (!numa_init(x86_acpi_numa_init_srat)) {
+			srat_used = true;
 			return;
+		}
 #endif
 #ifdef CONFIG_AMD_NUMA
 		if (!numa_init(amd_numa_init))
@@ -667,6 +672,11 @@ void __init x86_numa_init(void)
 
 	early_x86_numa_init();
 
+#ifdef CONFIG_ACPI_NUMA
+	if (srat_used)
+		x86_acpi_numa_init_slit();
+#endif
+
 	numa_emulation(&numa_meminfo, numa_distance_cnt);
 
 	node_possible_map = numa_nodes_parsed;
diff --git a/arch/x86/mm/srat.c b/arch/x86/mm/srat.c
index cdd0da9..443f9ef 100644
--- a/arch/x86/mm/srat.c
+++ b/arch/x86/mm/srat.c
@@ -185,14 +185,17 @@ out_err:
 	return -1;
 }
 
-void __init acpi_numa_arch_fixup(void) {}
-
-int __init x86_acpi_numa_init(void)
+int __init x86_acpi_numa_init_srat(void)
 {
 	int ret;
 
-	ret = acpi_numa_init();
+	ret = acpi_numa_init_srat();
 	if (ret < 0)
 		return ret;
 	return srat_disabled() ? -EINVAL : 0;
 }
+
+void __init x86_acpi_numa_init_slit(void)
+{
+	acpi_numa_init_slit();
+}
diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
index 33e609f..6460db4 100644
--- a/drivers/acpi/numa.c
+++ b/drivers/acpi/numa.c
@@ -282,7 +282,7 @@ acpi_table_parse_srat(enum acpi_srat_type id,
 					    handler, max_entries);
 }
 
-int __init acpi_numa_init(void)
+int __init acpi_numa_init_srat(void)
 {
 	int cnt = 0;
 
@@ -303,11 +303,6 @@ int __init acpi_numa_init(void)
 					    NR_NODE_MEMBLKS);
 	}
 
-	/* SLIT: System Locality Information Table */
-	acpi_table_parse(ACPI_SIG_SLIT, acpi_parse_slit);
-
-	acpi_numa_arch_fixup();
-
 	if (cnt < 0)
 		return cnt;
 	else if (!parsed_numa_memblks)
@@ -315,6 +310,12 @@ int __init acpi_numa_init(void)
 	return 0;
 }
 
+void __init acpi_numa_init_slit(void)
+{
+	/* SLIT: System Locality Information Table */
+	acpi_table_parse(ACPI_SIG_SLIT, acpi_parse_slit);
+}
+
 int acpi_get_pxm(acpi_handle h)
 {
 	unsigned long long pxm;
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 4e3731b..92463b5 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -85,7 +85,8 @@ int early_acpi_boot_init(void);
 int acpi_boot_init (void);
 void acpi_boot_table_init (void);
 int acpi_mps_check (void);
-int acpi_numa_init (void);
+int acpi_numa_init_srat(void);
+void acpi_numa_init_slit(void);
 
 int acpi_table_init (void);
 int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
-- 
1.8.1.4


* [PATCH v5 18/22] x86, mm, numa: Add early_initmem_init() stub
  2013-06-15  0:56 [PATCH v5 00/22] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (16 preceding siblings ...)
  2013-06-15  0:56 ` [PATCH v5 17/22] x86, ACPI, numa, ia64: split SLIT handling out Yinghai Lu
@ 2013-06-15  0:56 ` Yinghai Lu
  2013-06-15  0:56 ` [PATCH v5 19/22] x86, mm: Parse numa info early Yinghai Lu
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 31+ messages in thread
From: Yinghai Lu @ 2013-06-15  0:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Andrew Morton, Tejun Heo,
	Thomas Renninger, Tang Chen, linux-kernel, Yinghai Lu,
	Pekka Enberg, Jacob Shin

early_initmem_init() calls early_x86_numa_init() to parse numa info early.

A later patch will call init_mem_mapping() for each node from it.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Jacob Shin <jacob.shin@amd.com>
Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/include/asm/page_types.h | 1 +
 arch/x86/kernel/setup.c           | 1 +
 arch/x86/mm/init.c                | 6 ++++++
 arch/x86/mm/numa.c                | 7 +++++--
 4 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index b012b82..d04dd8c 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -55,6 +55,7 @@ bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn);
 extern unsigned long init_memory_mapping(unsigned long start,
 					 unsigned long end);
 
+void early_initmem_init(void);
 extern void initmem_init(void);
 
 #endif	/* !__ASSEMBLY__ */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d11b1b7..301165e 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1162,6 +1162,7 @@ void __init setup_arch(char **cmdline_p)
 
 	early_acpi_boot_init();
 
+	early_initmem_init();
 	initmem_init();
 	memblock_find_dma_reserve();
 
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 8554656..3c21f16 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -467,6 +467,12 @@ void __init init_mem_mapping(void)
 	early_memtest(0, max_pfn_mapped << PAGE_SHIFT);
 }
 
+#ifndef CONFIG_NUMA
+void __init early_initmem_init(void)
+{
+}
+#endif
+
 /*
  * devmem_is_allowed() checks to see if /dev/mem access to a certain address
  * is valid. The argument is a physical page number.
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 630e09f..7d76936 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -665,13 +665,16 @@ static void __init early_x86_numa_init(void)
 	numa_init(dummy_numa_init);
 }
 
+void __init early_initmem_init(void)
+{
+	early_x86_numa_init();
+}
+
 void __init x86_numa_init(void)
 {
 	int i, nid;
 	struct numa_meminfo *mi = &numa_meminfo;
 
-	early_x86_numa_init();
-
 #ifdef CONFIG_ACPI_NUMA
 	if (srat_used)
 		x86_acpi_numa_init_slit();
-- 
1.8.1.4


* [PATCH v5 19/22] x86, mm: Parse numa info early
  2013-06-15  0:56 [PATCH v5 00/22] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (17 preceding siblings ...)
  2013-06-15  0:56 ` [PATCH v5 18/22] x86, mm, numa: Add early_initmem_init() stub Yinghai Lu
@ 2013-06-15  0:56 ` Yinghai Lu
  2013-06-15  0:56 ` [PATCH v5 20/22] x86, mm: Add comments for step_size shift Yinghai Lu
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 31+ messages in thread
From: Yinghai Lu @ 2013-06-15  0:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Andrew Morton, Tejun Heo,
	Thomas Renninger, Tang Chen, linux-kernel, Yinghai Lu,
	Pekka Enberg, Jacob Shin

Parsing numa info has now been separated into two functions.

early_initmem_init() only parses info into numa_meminfo and
numa_nodes_parsed. It still keeps the numaq, acpi_numa, amd_numa, dummy
fallback sequence working.

SLIT and numa emulation handling are still left in initmem_init().

Call early_initmem_init() before init_mem_mapping() to prepare
to use the numa info with it.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Jacob Shin <jacob.shin@amd.com>
Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/kernel/setup.c | 24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 301165e..fd0d5be 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1125,13 +1125,21 @@ void __init setup_arch(char **cmdline_p)
 	trim_platform_memory_ranges();
 	trim_low_memory_range();
 
+	/*
+	 * Parse the ACPI tables for possible boot-time SMP configuration.
+	 */
+	acpi_initrd_override_copy();
+	acpi_boot_table_init();
+	early_acpi_boot_init();
+	early_initmem_init();
 	init_mem_mapping();
-
+	memblock.current_limit = get_max_mapped();
 	early_trap_pf_init();
 
+	reserve_initrd();
+
 	setup_real_mode();
 
-	memblock.current_limit = get_max_mapped();
 	dma_contiguous_reserve(0);
 
 	/*
@@ -1145,24 +1153,12 @@ void __init setup_arch(char **cmdline_p)
 	/* Allocate bigger log buffer */
 	setup_log_buf(1);
 
-	acpi_initrd_override_copy();
-
-	reserve_initrd();
-
 	reserve_crashkernel();
 
 	vsmp_init();
 
 	io_delay_init();
 
-	/*
-	 * Parse the ACPI tables for possible boot-time SMP configuration.
-	 */
-	acpi_boot_table_init();
-
-	early_acpi_boot_init();
-
-	early_initmem_init();
 	initmem_init();
 	memblock_find_dma_reserve();
 
-- 
1.8.1.4


* [PATCH v5 20/22] x86, mm: Add comments for step_size shift
  2013-06-15  0:56 [PATCH v5 00/22] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (18 preceding siblings ...)
  2013-06-15  0:56 ` [PATCH v5 19/22] x86, mm: Parse numa info early Yinghai Lu
@ 2013-06-15  0:56 ` Yinghai Lu
  2013-06-15  0:56 ` [PATCH v5 21/22] x86, mm: Make init_mem_mapping be able to be called several times Yinghai Lu
  2013-06-15  0:56 ` [PATCH v5 22/22] x86, mm, numa: Put pagetable on local node ram for 64bit Yinghai Lu
  21 siblings, 0 replies; 31+ messages in thread
From: Yinghai Lu @ 2013-06-15  0:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Andrew Morton, Tejun Heo,
	Thomas Renninger, Tang Chen, linux-kernel, Yinghai Lu

As requested by hpa, add comments explaining why we choose 5 for
the step size shift.
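
As a rough illustration of the growth this shift produces, the following
user-space sketch replays the helper with fixed 64-bit arithmetic. The
PMD_SIZE_2M macro and the use of uint64_t are assumptions made for the
demo; the kernel code uses unsigned long and its own PMD_SIZE.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for this sketch: a 2M initial step, as in the patch comment. */
#define PMD_SIZE_2M (UINT64_C(1) << 21)

/* Grow step_size by a shift of 5; keep the old value if the shift wraps. */
static uint64_t get_new_step_size(uint64_t step_size)
{
	uint64_t new_step_size = step_size << 5;

	if (new_step_size > step_size)
		step_size = new_step_size;

	return step_size;
}
```

Starting from 2M, successive calls give 64M and then 2G, so each round
maps 32 times more than the previous one. Once the shifted value would no
longer fit in 64 bits, the step simply stops growing.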

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/mm/init.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 3c21f16..5f38e72 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -395,8 +395,23 @@ static unsigned long __init init_range_memory_mapping(
 	return mapped_ram_size;
 }
 
-/* (PUD_SHIFT-PMD_SHIFT)/2 */
-#define STEP_SIZE_SHIFT 5
+static unsigned long __init get_new_step_size(unsigned long step_size)
+{
+	/*
+	 * The initial mapped size is PMD_SIZE, aka 2M.
+	 * We cannot set step_size to PUD_SIZE, aka 1G, yet.
+	 * In the worst case, when the 1G range crosses a 1G boundary and
+	 * PG_LEVEL_2M is not set, we will need 1+1+512 pages (aka 2M + 8k)
+	 * to map the 1G range with PTEs. Use 5 as the shift for now.
+	 */
+	unsigned long new_step_size = step_size << 5;
+
+	if (new_step_size > step_size)
+		step_size = new_step_size;
+
+	return  step_size;
+}
+
 void __init init_mem_mapping(void)
 {
 	unsigned long end, real_end, start, last_start;
@@ -445,7 +460,7 @@ void __init init_mem_mapping(void)
 		min_pfn_mapped = last_start >> PAGE_SHIFT;
 		/* only increase step_size after big range get mapped */
 		if (new_mapped_ram_size > mapped_ram_size)
-			step_size <<= STEP_SIZE_SHIFT;
+			step_size = get_new_step_size(step_size);
 		mapped_ram_size += new_mapped_ram_size;
 	}
 
-- 
1.8.1.4


* [PATCH v5 21/22] x86, mm: Make init_mem_mapping be able to be called several times
  2013-06-15  0:56 [PATCH v5 00/22] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (19 preceding siblings ...)
  2013-06-15  0:56 ` [PATCH v5 20/22] x86, mm: Add comments for step_size shift Yinghai Lu
@ 2013-06-15  0:56 ` Yinghai Lu
  2013-06-15  0:56 ` [PATCH v5 22/22] x86, mm, numa: Put pagetable on local node ram for 64bit Yinghai Lu
  21 siblings, 0 replies; 31+ messages in thread
From: Yinghai Lu @ 2013-06-15  0:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Andrew Morton, Tejun Heo,
	Thomas Renninger, Tang Chen, linux-kernel, Yinghai Lu,
	Pekka Enberg, Jacob Shin, Konrad Rzeszutek Wilk

Prepare to put page tables on local nodes.

Move the call to init_mem_mapping() into early_initmem_init().

Rework alloc_low_pages() to allocate page tables in the following order:
	BRK, local node, low range
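
The fallback order can be sketched in isolation as below. This is a
simplified user-space model, not the patch's code: enum pgt_source,
struct pfn_range and pick_pgt_source() are hypothetical names, and the
brk_has_room flag stands in for the pgt_buf and can_use_brk_pgt test.

```c
#include <assert.h>
#include <stdbool.h>

enum pgt_source { PGT_FROM_BRK, PGT_FROM_LOCAL, PGT_FROM_LOW, PGT_NONE };

/* Half-open pfn range [min_pfn, max_pfn) of already mapped memory. */
struct pfn_range { unsigned long min_pfn, max_pfn; };

/* A range is usable only if non-empty, mirroring the min >= max checks. */
static bool range_usable(struct pfn_range r)
{
	return r.min_pfn < r.max_pfn;
}

/* Decide where the next page-table page should come from. */
static enum pgt_source pick_pgt_source(bool brk_has_room,
				       struct pfn_range local,
				       struct pfn_range low)
{
	if (brk_has_room)
		return PGT_FROM_BRK;
	if (range_usable(local))
		return PGT_FROM_LOCAL;
	if (range_usable(low))
		return PGT_FROM_LOW;
	return PGT_NONE;	/* the kernel code panics at this point */
}
```

The low range acts as a safety net: it is only consulted while nothing on
the local node has been mapped yet.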

load_cr3() is still called only once.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Jacob Shin <jacob.shin@amd.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/include/asm/pgtable.h |   2 +-
 arch/x86/kernel/setup.c        |   1 -
 arch/x86/mm/init.c             | 101 +++++++++++++++++++++++++----------------
 arch/x86/mm/numa.c             |  24 ++++++++++
 4 files changed, 88 insertions(+), 40 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 1e67223..868687c 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -621,7 +621,7 @@ static inline int pgd_none(pgd_t pgd)
 #ifndef __ASSEMBLY__
 
 extern int direct_gbpages;
-void init_mem_mapping(void);
+void init_mem_mapping(unsigned long begin, unsigned long end);
 void early_alloc_pgt_buf(void);
 
 /* local pte updates need not use xchg for locking */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index fd0d5be..9ccbd60 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1132,7 +1132,6 @@ void __init setup_arch(char **cmdline_p)
 	acpi_boot_table_init();
 	early_acpi_boot_init();
 	early_initmem_init();
-	init_mem_mapping();
 	memblock.current_limit = get_max_mapped();
 	early_trap_pf_init();
 
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 5f38e72..21b1653 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -24,7 +24,10 @@ static unsigned long __initdata pgt_buf_start;
 static unsigned long __initdata pgt_buf_end;
 static unsigned long __initdata pgt_buf_top;
 
-static unsigned long min_pfn_mapped;
+static unsigned long low_min_pfn_mapped;
+static unsigned long low_max_pfn_mapped;
+static unsigned long local_min_pfn_mapped;
+static unsigned long local_max_pfn_mapped;
 
 static bool __initdata can_use_brk_pgt = true;
 
@@ -52,10 +55,17 @@ __ref void *alloc_low_pages(unsigned int num)
 
 	if ((pgt_buf_end + num) > pgt_buf_top || !can_use_brk_pgt) {
 		unsigned long ret;
-		if (min_pfn_mapped >= max_pfn_mapped)
-			panic("alloc_low_page: ran out of memory");
-		ret = memblock_find_in_range(min_pfn_mapped << PAGE_SHIFT,
-					max_pfn_mapped << PAGE_SHIFT,
+		if (local_min_pfn_mapped >= local_max_pfn_mapped) {
+			if (low_min_pfn_mapped >= low_max_pfn_mapped)
+				panic("alloc_low_page: ran out of memory");
+			ret = memblock_find_in_range(
+					low_min_pfn_mapped << PAGE_SHIFT,
+					low_max_pfn_mapped << PAGE_SHIFT,
+					PAGE_SIZE * num , PAGE_SIZE);
+		} else
+			ret = memblock_find_in_range(
+					local_min_pfn_mapped << PAGE_SHIFT,
+					local_max_pfn_mapped << PAGE_SHIFT,
 					PAGE_SIZE * num , PAGE_SIZE);
 		if (!ret)
 			panic("alloc_low_page: can not alloc memory");
@@ -412,67 +422,87 @@ static unsigned long __init get_new_step_size(unsigned long step_size)
 	return  step_size;
 }
 
-void __init init_mem_mapping(void)
+void __init init_mem_mapping(unsigned long begin, unsigned long end)
 {
-	unsigned long end, real_end, start, last_start;
+	unsigned long real_end, start, last_start;
 	unsigned long step_size;
 	unsigned long addr;
 	unsigned long mapped_ram_size = 0;
 	unsigned long new_mapped_ram_size;
+	bool is_low = false;
+
+	if (!begin) {
+		probe_page_size_mask();
+		/* the ISA range is always mapped regardless of memory holes */
+		init_memory_mapping(0, ISA_END_ADDRESS);
+		begin = ISA_END_ADDRESS;
+		is_low = true;
+	}
 
-	probe_page_size_mask();
-
-#ifdef CONFIG_X86_64
-	end = max_pfn << PAGE_SHIFT;
-#else
-	end = max_low_pfn << PAGE_SHIFT;
-#endif
-
-	/* the ISA range is always mapped regardless of memory holes */
-	init_memory_mapping(0, ISA_END_ADDRESS);
+	if (begin >= end)
+		return;
 
 	/* xen has big range in reserved near end of ram, skip it at first.*/
-	addr = memblock_find_in_range(ISA_END_ADDRESS, end, PMD_SIZE, PMD_SIZE);
+	addr = memblock_find_in_range(begin, end, PMD_SIZE, PMD_SIZE);
 	real_end = addr + PMD_SIZE;
 
 	/* step_size need to be small so pgt_buf from BRK could cover it */
 	step_size = PMD_SIZE;
-	max_pfn_mapped = 0; /* will get exact value next */
-	min_pfn_mapped = real_end >> PAGE_SHIFT;
+	local_max_pfn_mapped = begin >> PAGE_SHIFT;
+	local_min_pfn_mapped = real_end >> PAGE_SHIFT;
 	last_start = start = real_end;
 
 	/*
-	 * We start from the top (end of memory) and go to the bottom.
-	 * The memblock_find_in_range() gets us a block of RAM from the
-	 * end of RAM in [min_pfn_mapped, max_pfn_mapped) used as new pages
-	 * for page table.
+	 * alloc_low_pages() will allocate pagetable pages in the following
+	 * order:
+	 *      BRK, local node, low range
+	 *
+	 * That means it will first use up all the BRK memory, then try to get
+	 * us a block of RAM from [local_min_pfn_mapped, local_max_pfn_mapped)
+	 * used as new pagetable pages. If no memory on the local node has
+	 * been mapped, it will allocate memory from
+	 * [low_min_pfn_mapped, low_max_pfn_mapped).
 	 */
-	while (last_start > ISA_END_ADDRESS) {
+	while (last_start > begin) {
 		if (last_start > step_size) {
 			start = round_down(last_start - 1, step_size);
-			if (start < ISA_END_ADDRESS)
-				start = ISA_END_ADDRESS;
+			if (start < begin)
+				start = begin;
 		} else
-			start = ISA_END_ADDRESS;
+			start = begin;
 		new_mapped_ram_size = init_range_memory_mapping(start,
 							last_start);
+		if ((last_start >> PAGE_SHIFT) > local_max_pfn_mapped)
+			local_max_pfn_mapped = last_start >> PAGE_SHIFT;
+		local_min_pfn_mapped = start >> PAGE_SHIFT;
 		last_start = start;
-		min_pfn_mapped = last_start >> PAGE_SHIFT;
 		/* only increase step_size after big range get mapped */
 		if (new_mapped_ram_size > mapped_ram_size)
 			step_size = get_new_step_size(step_size);
 		mapped_ram_size += new_mapped_ram_size;
 	}
 
-	if (real_end < end)
+	if (real_end < end) {
 		init_range_memory_mapping(real_end, end);
+		if ((end >> PAGE_SHIFT) > local_max_pfn_mapped)
+			local_max_pfn_mapped = end >> PAGE_SHIFT;
+	}
 
+	if (is_low) {
+		low_min_pfn_mapped = local_min_pfn_mapped;
+		low_max_pfn_mapped = local_max_pfn_mapped;
+	}
+}
+
+#ifndef CONFIG_NUMA
+void __init early_initmem_init(void)
+{
 #ifdef CONFIG_X86_64
-	if (max_pfn > max_low_pfn) {
-		/* can we preseve max_low_pfn ?*/
+	init_mem_mapping(0, max_pfn << PAGE_SHIFT);
+	if (max_pfn > max_low_pfn)
 		max_low_pfn = max_pfn;
-	}
 #else
+	init_mem_mapping(0, max_low_pfn << PAGE_SHIFT);
 	early_ioremap_page_table_range_init();
 #endif
 
@@ -481,11 +511,6 @@ void __init init_mem_mapping(void)
 
 	early_memtest(0, max_pfn_mapped << PAGE_SHIFT);
 }
-
-#ifndef CONFIG_NUMA
-void __init early_initmem_init(void)
-{
-}
 #endif
 
 /*
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 7d76936..9b18ee8 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -17,8 +17,10 @@
 #include <asm/dma.h>
 #include <asm/acpi.h>
 #include <asm/amd_nb.h>
+#include <asm/tlbflush.h>
 
 #include "numa_internal.h"
+#include "mm_internal.h"
 
 int __initdata numa_off;
 nodemask_t numa_nodes_parsed __initdata;
@@ -665,9 +667,31 @@ static void __init early_x86_numa_init(void)
 	numa_init(dummy_numa_init);
 }
 
+#ifdef CONFIG_X86_64
+static void __init early_x86_numa_init_mapping(void)
+{
+	init_mem_mapping(0, max_pfn << PAGE_SHIFT);
+	if (max_pfn > max_low_pfn)
+		max_low_pfn = max_pfn;
+}
+#else
+static void __init early_x86_numa_init_mapping(void)
+{
+	init_mem_mapping(0, max_low_pfn << PAGE_SHIFT);
+	early_ioremap_page_table_range_init();
+}
+#endif
+
 void __init early_initmem_init(void)
 {
 	early_x86_numa_init();
+
+	early_x86_numa_init_mapping();
+
+	load_cr3(swapper_pg_dir);
+	__flush_tlb_all();
+
+	early_memtest(0, max_pfn_mapped<<PAGE_SHIFT);
 }
 
 void __init x86_numa_init(void)
-- 
1.8.1.4


* [PATCH v5 22/22] x86, mm, numa: Put pagetable on local node ram for 64bit
  2013-06-15  0:56 [PATCH v5 00/22] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (20 preceding siblings ...)
  2013-06-15  0:56 ` [PATCH v5 21/22] x86, mm: Make init_mem_mapping be able to be called several times Yinghai Lu
@ 2013-06-15  0:56 ` Yinghai Lu
  21 siblings, 0 replies; 31+ messages in thread
From: Yinghai Lu @ 2013-06-15  0:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Andrew Morton, Tejun Heo,
	Thomas Renninger, Tang Chen, linux-kernel, Yinghai Lu,
	Pekka Enberg, Jacob Shin, Konrad Rzeszutek Wilk

If a node with RAM is hotpluggable, the memory for that node's page tables
and vmemmap should be allocated from that node's RAM.

This patch is a refresh of
| commit 1411e0ec3123ae4c4ead6bfc9fe3ee5a3ae5c327
| Date:   Mon Dec 27 16:48:17 2010 -0800
|
|    x86-64, numa: Put pgtable to local node memory
which was reverted earlier.

We now have a reason to reintroduce it: to make memory hotplug work.

Call init_mem_mapping() in early_initmem_init() for every node.
alloc_low_pages() will allocate page tables in the following order:
	BRK, local node, low range
So the page tables will be in the low range or on the local node.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Jacob Shin <jacob.shin@amd.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/mm/numa.c | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 9b18ee8..5adf803 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -670,7 +670,39 @@ static void __init early_x86_numa_init(void)
 #ifdef CONFIG_X86_64
 static void __init early_x86_numa_init_mapping(void)
 {
-	init_mem_mapping(0, max_pfn << PAGE_SHIFT);
+	unsigned long last_start = 0, last_end = 0;
+	struct numa_meminfo *mi = &numa_meminfo;
+	unsigned long start, end;
+	int last_nid = -1;
+	int i, nid;
+
+	for (i = 0; i < mi->nr_blks; i++) {
+		nid   = mi->blk[i].nid;
+		start = mi->blk[i].start;
+		end   = mi->blk[i].end;
+
+		if (last_nid == nid) {
+			last_end = end;
+			continue;
+		}
+
+		/* other nid now */
+		if (last_nid >= 0) {
+			printk(KERN_DEBUG "Node %d: [mem %#016lx-%#016lx]\n",
+					last_nid, last_start, last_end - 1);
+			init_mem_mapping(last_start, last_end);
+		}
+
+		/* for next nid */
+		last_nid   = nid;
+		last_start = start;
+		last_end   = end;
+	}
+	/* last one */
+	printk(KERN_DEBUG "Node %d: [mem %#016lx-%#016lx]\n",
+			last_nid, last_start, last_end - 1);
+	init_mem_mapping(last_start, last_end);
+
 	if (max_pfn > max_low_pfn)
 		max_low_pfn = max_pfn;
 }
-- 
1.8.1.4
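
The per-node loop in the patch above can be tried out in user space. What
follows is a minimal sketch, not the kernel code: struct blk, struct
node_range and coalesce_by_node() are hypothetical names, and the ranges
are collected into an array where the patch calls init_mem_mapping().
It merges each run of consecutive blocks with the same nid into one range,
as the patch does by updating last_end.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one numa_meminfo block: [start, end) on nid. */
struct blk {
	int nid;
	unsigned long long start;
	unsigned long long end;
};

/* One merged range per run of same-nid blocks. */
struct node_range {
	int nid;
	unsigned long long start;
	unsigned long long end;
};

/*
 * Walk the blocks in order and merge consecutive blocks that share a nid.
 * Returns the number of ranges written to out (at most out_max).
 * In the patch, each resulting range is passed to init_mem_mapping().
 */
static size_t coalesce_by_node(const struct blk *blks, size_t nr,
			       struct node_range *out, size_t out_max)
{
	size_t n = 0;
	size_t i;

	assert(out != NULL || out_max == 0);

	for (i = 0; i < nr; i++) {
		/* Same node as the previous block: extend the current range. */
		if (n > 0 && out[n - 1].nid == blks[i].nid) {
			out[n - 1].end = blks[i].end;
			continue;
		}
		if (n == out_max)
			break;
		/* A different node: start a new range. */
		out[n].nid = blks[i].nid;
		out[n].start = blks[i].start;
		out[n].end = blks[i].end;
		n++;
	}
	return n;
}
```

Two details are worth noting. A merged range spans any hole between two
blocks of the same node, exactly as last_start/last_end do in the patch.
The sketch also returns zero ranges for an empty block list, a case the
posted loop does not special-case.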


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 03/22] x86, ACPI, mm: Kill max_low_pfn_mapped
  2013-06-15  0:56 ` [PATCH v5 03/22] x86, ACPI, mm: Kill max_low_pfn_mapped Yinghai Lu
@ 2013-06-17 23:19   ` Toshi Kani
  2013-06-17 23:36     ` Yinghai Lu
  0 siblings, 1 reply; 31+ messages in thread
From: Toshi Kani @ 2013-06-17 23:19 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Andrew Morton,
	Tejun Heo, Thomas Renninger, Tang Chen, linux-kernel,
	Rafael J. Wysocki, Jacob Shin, Pekka Enberg, linux-acpi

On Fri, 2013-06-14 at 17:56 -0700, Yinghai Lu wrote:
> Now we have arch_pfn_mapped array, and max_low_pfn_mapped should not
> be used anymore.

Can you describe why max_low_pfn_mapped should not be used?  Is this to
allow moving the code of acpi_initrd_override() up before
init_mem_mapping() in a succeeding patch, or is there also another
reason behind it?  Also, I think arch_pfn_mapped should be pfn_mapped[].

Thanks,
-Toshi


> User should use arch_pfn_mapped or just 1UL<<(32-PAGE_SHIFT) instead.
> 
> The only user is ACPI_INITRD_TABLE_OVERRIDE, and it should not use that,
> as later accesses go through early_ioremap(). We could change it to use
> 1U<<(32-PAGE_SHIFT) instead, i.e. under 4G.
> 
> -v2: Leave alone max_low_pfn_mapped in i915 code according to tj.
> 
> Suggested-by: H. Peter Anvin <hpa@zytor.com>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
> Cc: Jacob Shin <jacob.shin@amd.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: linux-acpi@vger.kernel.org
> Tested-by: Thomas Renninger <trenn@suse.de>
> Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
> Tested-by: Tang Chen <tangchen@cn.fujitsu.com>
> ---
>  arch/x86/include/asm/page_types.h | 1 -
>  arch/x86/kernel/setup.c           | 4 +---
>  arch/x86/mm/init.c                | 4 ----
>  drivers/acpi/osl.c                | 6 +++---
>  4 files changed, 4 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
> index 54c9787..b012b82 100644
> --- a/arch/x86/include/asm/page_types.h
> +++ b/arch/x86/include/asm/page_types.h
> @@ -43,7 +43,6 @@
>  
>  extern int devmem_is_allowed(unsigned long pagenr);
>  
> -extern unsigned long max_low_pfn_mapped;
>  extern unsigned long max_pfn_mapped;
>  
>  static inline phys_addr_t get_max_mapped(void)
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 66ab495..6ca5f2c 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -112,13 +112,11 @@
>  #include <asm/prom.h>
>  
>  /*
> - * max_low_pfn_mapped: highest direct mapped pfn under 4GB
> - * max_pfn_mapped:     highest direct mapped pfn over 4GB
> + * max_pfn_mapped:     highest direct mapped pfn
>   *
>   * The direct mapping only covers E820_RAM regions, so the ranges and gaps are
>   * represented by pfn_mapped
>   */
> -unsigned long max_low_pfn_mapped;
>  unsigned long max_pfn_mapped;
>  
>  #ifdef CONFIG_DMI
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index eaac174..8554656 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -313,10 +313,6 @@ static void add_pfn_range_mapped(unsigned long start_pfn, unsigned long end_pfn)
>  	nr_pfn_mapped = clean_sort_range(pfn_mapped, E820_X_MAX);
>  
>  	max_pfn_mapped = max(max_pfn_mapped, end_pfn);
> -
> -	if (start_pfn < (1UL<<(32-PAGE_SHIFT)))
> -		max_low_pfn_mapped = max(max_low_pfn_mapped,
> -					 min(end_pfn, 1UL<<(32-PAGE_SHIFT)));
>  }
>  
>  bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn)
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index e721863..93e3194 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -624,9 +624,9 @@ void __init acpi_initrd_override(void *data, size_t size)
>  	if (table_nr == 0)
>  		return;
>  
> -	acpi_tables_addr =
> -		memblock_find_in_range(0, max_low_pfn_mapped << PAGE_SHIFT,
> -				       all_tables_size, PAGE_SIZE);
> +	/* under 4G at first, then above 4G */
> +	acpi_tables_addr = memblock_find_in_range(0, (1ULL<<32) - 1,
> +					all_tables_size, PAGE_SIZE);
>  	if (!acpi_tables_addr) {
>  		WARN_ON(1);
>  		return;



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 04/22] x86, ACPI: Search buffer above 4G in second try for acpi override tables
  2013-06-15  0:56 ` [PATCH v5 04/22] x86, ACPI: Search buffer above 4G in second try for acpi override tables Yinghai Lu
@ 2013-06-17 23:22   ` Toshi Kani
  2013-06-17 23:38     ` Yinghai Lu
  0 siblings, 1 reply; 31+ messages in thread
From: Toshi Kani @ 2013-06-17 23:22 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Andrew Morton,
	Tejun Heo, Thomas Renninger, Tang Chen, linux-kernel,
	Rafael J. Wysocki, linux-acpi

On Fri, 2013-06-14 at 17:56 -0700, Yinghai Lu wrote:
> Now we only search buffer for override acpi table under 4G.
> In some case, like user use memmap to exclude all low ram,
> we may not find range for it under 4G.
> 
> Do second try to search above 4G.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
> Cc: linux-acpi@vger.kernel.org
> Tested-by: Thomas Renninger <trenn@suse.de>
> Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
> Tested-by: Tang Chen <tangchen@cn.fujitsu.com>
> ---
>  drivers/acpi/osl.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index 93e3194..42c48fc 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -627,6 +627,10 @@ void __init acpi_initrd_override(void *data, size_t size)
>  	/* under 4G at first, then above 4G */
>  	acpi_tables_addr = memblock_find_in_range(0, (1ULL<<32) - 1,
>  					all_tables_size, PAGE_SIZE);
> +	if (!acpi_tables_addr)
> +		acpi_tables_addr = memblock_find_in_range(0,
> +					~(phys_addr_t)0,
> +					all_tables_size, PAGE_SIZE);

Should this search start from 4G, instead of 0?

Thanks,
-Toshi


>  	if (!acpi_tables_addr) {
>  		WARN_ON(1);
>  		return;



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 05/22] x86, ACPI: Increase override tables number limit
  2013-06-15  0:56 ` [PATCH v5 05/22] x86, ACPI: Increase override tables number limit Yinghai Lu
@ 2013-06-17 23:35   ` Toshi Kani
  0 siblings, 0 replies; 31+ messages in thread
From: Toshi Kani @ 2013-06-17 23:35 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Andrew Morton,
	Tejun Heo, Thomas Renninger, Tang Chen, linux-kernel,
	Rafael J. Wysocki, linux-acpi

On Fri, 2013-06-14 at 17:56 -0700, Yinghai Lu wrote:
> Current acpi tables in initrd is limited to 10, that is too small.
> 64 should be good enough as we have 35 sigs and could have several
> SSDT.
> 
> Two problems in current code prevent us from increasing limit:
> 1. that cpio file info array is put in stack, as every element is 32
>    bytes, could run out of stack if we have that array size to 64.
>    We can move it out from stack, and make it as global and put it in
>    __initdata section.
> 2. early_ioremap only can remap 256k one time. Current code is mapping
>    10 tables one time. If we increase that limit, whole size could be
>    more than 256k, early_ioremap will fail with that.
>    We can map table one by one during copying, instead of mapping
>    all them one time.
> 
> -v2: According to tj, split it out to separated patch, also
>      rename array name to acpi_initrd_files.
> -v3: Add some comments about mapping table one by one during copying
>      per tj.
> 
> Signed-off-by: Yinghai <yinghai@kernel.org>
> Cc: Rafael J. Wysocki <rjw@sisk.pl>
> Cc: linux-acpi@vger.kernel.org
> Acked-by: Tejun Heo <tj@kernel.org>
> Tested-by: Thomas Renninger <trenn@suse.de>
> Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
> Tested-by: Tang Chen <tangchen@cn.fujitsu.com>

Acked-by: Toshi Kani <toshi.kani@hp.com>

Thanks,
-Toshi


> ---
>  drivers/acpi/osl.c | 26 +++++++++++++++-----------
>  1 file changed, 15 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index 42c48fc..c4ea2b7 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -569,8 +569,8 @@ static const char * const table_sigs[] = {
>  
>  #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
>  
> -/* Must not increase 10 or needs code modification below */
> -#define ACPI_OVERRIDE_TABLES 10
> +#define ACPI_OVERRIDE_TABLES 64
> +static struct cpio_data __initdata acpi_initrd_files[ACPI_OVERRIDE_TABLES];
>  
>  void __init acpi_initrd_override(void *data, size_t size)
>  {
> @@ -579,7 +579,6 @@ void __init acpi_initrd_override(void *data, size_t size)
>  	struct acpi_table_header *table;
>  	char cpio_path[32] = "kernel/firmware/acpi/";
>  	struct cpio_data file;
> -	struct cpio_data early_initrd_files[ACPI_OVERRIDE_TABLES];
>  	char *p;
>  
>  	if (data == NULL || size == 0)
> @@ -617,8 +616,8 @@ void __init acpi_initrd_override(void *data, size_t size)
>  			table->signature, cpio_path, file.name, table->length);
>  
>  		all_tables_size += table->length;
> -		early_initrd_files[table_nr].data = file.data;
> -		early_initrd_files[table_nr].size = file.size;
> +		acpi_initrd_files[table_nr].data = file.data;
> +		acpi_initrd_files[table_nr].size = file.size;
>  		table_nr++;
>  	}
>  	if (table_nr == 0)
> @@ -648,14 +647,19 @@ void __init acpi_initrd_override(void *data, size_t size)
>  	memblock_reserve(acpi_tables_addr, all_tables_size);
>  	arch_reserve_mem_area(acpi_tables_addr, all_tables_size);
>  
> -	p = early_ioremap(acpi_tables_addr, all_tables_size);
> -
> +	/*
> +	 * early_ioremap only can remap 256k one time. If we map all
> +	 * tables one time, we will hit the limit. Need to map table
> +	 * one by one during copying.
> +	 */
>  	for (no = 0; no < table_nr; no++) {
> -		memcpy(p + total_offset, early_initrd_files[no].data,
> -		       early_initrd_files[no].size);
> -		total_offset += early_initrd_files[no].size;
> +		phys_addr_t size = acpi_initrd_files[no].size;
> +
> +		p = early_ioremap(acpi_tables_addr + total_offset, size);
> +		memcpy(p, acpi_initrd_files[no].data, size);
> +		early_iounmap(p, size);
> +		total_offset += size;
>  	}
> -	early_iounmap(p, all_tables_size);
>  }
>  #endif /* CONFIG_ACPI_INITRD_TABLE_OVERRIDE */
>  
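
The reason for mapping tables one by one can be shown with a small
user-space sketch. It rests on assumptions: map_window() and unmap_window()
are hypothetical stand-ins for early_ioremap() and early_iounmap(), and the
only property modelled is the 256k-per-mapping limit quoted in the
changelog. Page-offset rounding, which the real early_ioremap() also has to
account for, is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Per-mapping limit quoted in the changelog (256k). */
#define MAP_LIMIT (256u * 1024u)

struct table {
	const unsigned char *data;
	size_t size;
};

/* Hypothetical stand-in for early_ioremap(): fails above MAP_LIMIT. */
static unsigned char *map_window(unsigned char *base, size_t offset,
				 size_t size)
{
	if (size == 0 || size > MAP_LIMIT)
		return NULL;
	return base + offset;
}

/* Hypothetical stand-in for early_iounmap(): nothing to undo here. */
static void unmap_window(unsigned char *p, size_t size)
{
	(void)p;
	(void)size;
}

/* Copy every table into dest, mapping only one table's worth at a time. */
static int copy_tables(unsigned char *dest, const struct table *t, size_t nr)
{
	size_t off = 0;
	size_t i;

	assert(dest != NULL);

	for (i = 0; i < nr; i++) {
		unsigned char *p = map_window(dest, off, t[i].size);

		if (!p)
			return -1;
		memcpy(p, t[i].data, t[i].size);
		unmap_window(p, t[i].size);
		off += t[i].size;
	}
	return 0;
}
```

With three 100k tables the combined 300k window cannot be mapped in one
call, while the per-table copy succeeds because each window stays under the
limit.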



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 03/22] x86, ACPI, mm: Kill max_low_pfn_mapped
  2013-06-17 23:19   ` Toshi Kani
@ 2013-06-17 23:36     ` Yinghai Lu
  2013-06-17 23:55       ` Toshi Kani
  0 siblings, 1 reply; 31+ messages in thread
From: Yinghai Lu @ 2013-06-17 23:36 UTC (permalink / raw)
  To: Toshi Kani
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Andrew Morton,
	Tejun Heo, Thomas Renninger, Tang Chen,
	Linux Kernel Mailing List, Rafael J. Wysocki, Jacob Shin,
	Pekka Enberg, ACPI Devel Maling List

On Mon, Jun 17, 2013 at 4:19 PM, Toshi Kani <toshi.kani@hp.com> wrote:
> On Fri, 2013-06-14 at 17:56 -0700, Yinghai Lu wrote:
>> Now we have arch_pfn_mapped array, and max_low_pfn_mapped should not
>> be used anymore.
>
> Can you describe why max_low_pfn_mapped should not be used?  Is this to
> allow moving the code of acpi_initrd_override() up before
> init_mem_mapping() in a succeeding patch, or is there also another
> reason behind it?  Also, I think arch_pfn_mapped should be pfn_mapped[].

OK. The old code assumed that everything in [0, max_low_pfn_mapped) was
mapped.

Now we only map RAM and kernel-reserved ranges, so removing
max_low_pfn_mapped prevents users from taking it for granted that an
address is mapped just because it is below max_low_pfn_mapped.

Yes, it should be pfn_mapped[].
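
The difference between the two checks can be sketched in user space. This
is an illustration under assumptions, not the kernel source: struct
pfn_range, range_is_mapped() and below_max_low() are hypothetical names,
and the range test is modelled on what pfn_range_is_mapped() is expected to
do with pfn_mapped[].

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one pfn_mapped[] entry: [start, end). */
struct pfn_range {
	unsigned long start;
	unsigned long end;
};

/* True only if [start_pfn, end_pfn) lies inside a single mapped range. */
static bool range_is_mapped(const struct pfn_range *mapped, size_t nr,
			    unsigned long start_pfn, unsigned long end_pfn)
{
	size_t i;

	assert(start_pfn <= end_pfn);

	for (i = 0; i < nr; i++)
		if (start_pfn >= mapped[i].start && end_pfn <= mapped[i].end)
			return true;
	return false;
}

/* The retired assumption: anything below the maximum counts as mapped. */
static bool below_max_low(unsigned long max_low_pfn_mapped,
			  unsigned long end_pfn)
{
	return end_pfn <= max_low_pfn_mapped;
}
```

For a hole such as pfns 0xa0-0xb0 between two mapped ranges, the
below-maximum test says "mapped" while the range lookup correctly says it
is not.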


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 04/22] x86, ACPI: Search buffer above 4G in second try for acpi override tables
  2013-06-17 23:22   ` Toshi Kani
@ 2013-06-17 23:38     ` Yinghai Lu
  2013-06-17 23:56       ` Toshi Kani
  0 siblings, 1 reply; 31+ messages in thread
From: Yinghai Lu @ 2013-06-17 23:38 UTC (permalink / raw)
  To: Toshi Kani
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Andrew Morton,
	Tejun Heo, Thomas Renninger, Tang Chen,
	Linux Kernel Mailing List, Rafael J. Wysocki,
	ACPI Devel Maling List

On Mon, Jun 17, 2013 at 4:22 PM, Toshi Kani <toshi.kani@hp.com> wrote:
> On Fri, 2013-06-14 at 17:56 -0700, Yinghai Lu wrote:
>> Now we only search buffer for override acpi table under 4G.
>> In some case, like user use memmap to exclude all low ram,
>> we may not find range for it under 4G.
>>
>> Do second try to search above 4G.
>>
>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>> Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
>> Cc: linux-acpi@vger.kernel.org
>> Tested-by: Thomas Renninger <trenn@suse.de>
>> Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
>> Tested-by: Tang Chen <tangchen@cn.fujitsu.com>
>> ---
>>  drivers/acpi/osl.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
>> index 93e3194..42c48fc 100644
>> --- a/drivers/acpi/osl.c
>> +++ b/drivers/acpi/osl.c
>> @@ -627,6 +627,10 @@ void __init acpi_initrd_override(void *data, size_t size)
>>       /* under 4G at first, then above 4G */
>>       acpi_tables_addr = memblock_find_in_range(0, (1ULL<<32) - 1,
>>                                       all_tables_size, PAGE_SIZE);
>> +     if (!acpi_tables_addr)
>> +             acpi_tables_addr = memblock_find_in_range(0,
>> +                                     ~(phys_addr_t)0,
>> +                                     all_tables_size, PAGE_SIZE);
>
> Should this search start from 4G, instead of 0?

It should be OK, as memblock searches top-down.

Thanks

Yinghai
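
A small user-space sketch of a top-down search shows why the start address
matters little here. It is not memblock's implementation: find_top_down(),
struct free_region and phys_t are hypothetical names, the free list is
assumed sorted by ascending base, and 0 is used as the "not found" value,
as the callers in the patch expect.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long long phys_t;

/* Hypothetical free region [base, base + size). */
struct free_region {
	phys_t base;
	phys_t size;
};

/*
 * Search the free regions from the highest to the lowest and return the
 * highest aligned address in [start, end) that fits size bytes, or 0.
 * align must be a power of two.
 */
static phys_t find_top_down(const struct free_region *regions, size_t nr,
			    phys_t start, phys_t end,
			    phys_t size, phys_t align)
{
	size_t i;

	assert(align != 0 && (align & (align - 1)) == 0);

	for (i = nr; i-- > 0; ) {
		phys_t lo = regions[i].base > start ? regions[i].base : start;
		phys_t top = regions[i].base + regions[i].size;
		phys_t hi = top < end ? top : end;
		phys_t cand;

		/* Region does not overlap the window, or is too small. */
		if (hi < lo || hi - lo < size)
			continue;

		/* Place the buffer as high as possible, rounded down. */
		cand = (hi - size) & ~(align - 1);
		if (cand >= lo)
			return cand;
	}
	return 0;
}
```

With low RAM excluded, the first call (limited to below 4G) finds nothing,
and the second call over the whole range returns an address near the top of
the highest free region, even though its start is 0.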

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 03/22] x86, ACPI, mm: Kill max_low_pfn_mapped
  2013-06-17 23:36     ` Yinghai Lu
@ 2013-06-17 23:55       ` Toshi Kani
  0 siblings, 0 replies; 31+ messages in thread
From: Toshi Kani @ 2013-06-17 23:55 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Andrew Morton,
	Tejun Heo, Thomas Renninger, Tang Chen,
	Linux Kernel Mailing List, Rafael J. Wysocki, Jacob Shin,
	Pekka Enberg, ACPI Devel Maling List

On Mon, 2013-06-17 at 16:36 -0700, Yinghai Lu wrote:
> On Mon, Jun 17, 2013 at 4:19 PM, Toshi Kani <toshi.kani@hp.com> wrote:
> > On Fri, 2013-06-14 at 17:56 -0700, Yinghai Lu wrote:
> >> Now we have arch_pfn_mapped array, and max_low_pfn_mapped should not
> >> be used anymore.
> >
> > Can you describe why max_low_pfn_mapped should not be used?  Is this to
> > allow moving the code of acpi_initrd_override() up before
> > init_mem_mapping() in a succeeding patch, or is there also another
> > reason behind it?  Also, I think arch_pfn_mapped should be pfn_mapped[].
> 
> ok, assumption that from [0, max_low_pfn_mapped) all mapped.
> 
> Now we only map the  RAM or KERNL_RESERVED, to prevent user from
> taking it granted that only check it is below max_low_pfn_mapped to assume
> it mapped.

Oh, I see.  So, it is not necessary any more.

> Yes, is should be pfn_mapped[]

OK.

Please update the change log per the info you described here.  With
that:

Acked-by: Toshi Kani <toshi.kani@hp.com>

Thanks,
-Toshi





^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 04/22] x86, ACPI: Search buffer above 4G in second try for acpi override tables
  2013-06-17 23:38     ` Yinghai Lu
@ 2013-06-17 23:56       ` Toshi Kani
  0 siblings, 0 replies; 31+ messages in thread
From: Toshi Kani @ 2013-06-17 23:56 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Andrew Morton,
	Tejun Heo, Thomas Renninger, Tang Chen,
	Linux Kernel Mailing List, Rafael J. Wysocki,
	ACPI Devel Maling List

On Mon, 2013-06-17 at 16:38 -0700, Yinghai Lu wrote:
> On Mon, Jun 17, 2013 at 4:22 PM, Toshi Kani <toshi.kani@hp.com> wrote:
> > On Fri, 2013-06-14 at 17:56 -0700, Yinghai Lu wrote:
> >> Now we only search buffer for override acpi table under 4G.
> >> In some case, like user use memmap to exclude all low ram,
> >> we may not find range for it under 4G.
> >>
> >> Do second try to search above 4G.
> >>
> >> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> >> Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
> >> Cc: linux-acpi@vger.kernel.org
> >> Tested-by: Thomas Renninger <trenn@suse.de>
> >> Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
> >> Tested-by: Tang Chen <tangchen@cn.fujitsu.com>
> >> ---
> >>  drivers/acpi/osl.c | 4 ++++
> >>  1 file changed, 4 insertions(+)
> >>
> >> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> >> index 93e3194..42c48fc 100644
> >> --- a/drivers/acpi/osl.c
> >> +++ b/drivers/acpi/osl.c
> >> @@ -627,6 +627,10 @@ void __init acpi_initrd_override(void *data, size_t size)
> >>       /* under 4G at first, then above 4G */
> >>       acpi_tables_addr = memblock_find_in_range(0, (1ULL<<32) - 1,
> >>                                       all_tables_size, PAGE_SIZE);
> >> +     if (!acpi_tables_addr)
> >> +             acpi_tables_addr = memblock_find_in_range(0,
> >> +                                     ~(phys_addr_t)0,
> >> +                                     all_tables_size, PAGE_SIZE);
> >
> > Should this search start from 4G, instead of 0?
> 
> should be ok, as memblock searching is top-down.

I see.  Thanks for the clarification.

Acked-by: Toshi Kani <toshi.kani@hp.com>

-Toshi


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 06/22] x86, ACPI: Split acpi_initrd_override to find/copy two functions
  2013-06-15  0:56 ` [PATCH v5 06/22] x86, ACPI: Split acpi_initrd_override to find/copy two functions Yinghai Lu
@ 2013-06-18  0:24   ` Toshi Kani
  0 siblings, 0 replies; 31+ messages in thread
From: Toshi Kani @ 2013-06-18  0:24 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Andrew Morton,
	Tejun Heo, Thomas Renninger, Tang Chen, linux-kernel,
	Pekka Enberg, Jacob Shin, Rafael J. Wysocki, linux-acpi

On Fri, 2013-06-14 at 17:56 -0700, Yinghai Lu wrote:
> To parse srat early, we need to move acpi table probing early.
> acpi_initrd_table_override is before acpi table probing. So we need to
> move it early too.
> 
> Current code acpi_initrd_table_override is after init_mem_mapping and
> relocate_initrd(), so it can scan initrd and copy acpi tables with kernel
> virtual address of initrd.
> Copying need to be after memblock is ready, because it need to allocate
> buffer for new acpi tables.
> 
> So we have to split that function to find and copy two functions.
> Find should be as early as possible. Copy should be after memblock is ready.
> 
> Finding could be done in head_32.S and head64.c, just like microcode
> early scanning. In head_32.S, it is 32bit flat mode, we don't
> need to set page table to access it. In head64.c, #PF set page table
> could help us access initrd with kernel low mapping address.
> 
> Copying could be done just after memblock is ready and before probing
> acpi tables, and we need to early_ioremap to access source and target
> range, as init_mem_mapping is not called yet.
> 
> While a dummy version of acpi_initrd_override() was defined when
> !CONFIG_ACPI_INITRD_TABLE_OVERRIDE, the prototype and dummy version
> were conditionalized inside CONFIG_ACPI.  This forced setup_arch() to
> have its own #ifdefs around acpi_initrd_override() as otherwise build
> would fail when !CONFIG_ACPI.  Move the prototypes and dummy
> implementations of the newly split functions below CONFIG_ACPI block
> in acpi.h so that we can do away with #ifdefs in its user.
> 
> -v2: Split one patch out according to tj.
>      also don't pass table_nr around.
> -v3: Add Tj's changelog about moving down to #ifdef in acpi.h to
>      avoid #ifdef in setup.c
> 
> Signed-off-by: Yinghai <yinghai@kernel.org>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: Jacob Shin <jacob.shin@amd.com>
> Cc: Rafael J. Wysocki <rjw@sisk.pl>
> Cc: linux-acpi@vger.kernel.org
> Acked-by: Tejun Heo <tj@kernel.org>
> Tested-by: Thomas Renninger <trenn@suse.de>
> Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
> Tested-by: Tang Chen <tangchen@cn.fujitsu.com>

Looks good to me.

Acked-by: Toshi Kani <toshi.kani@hp.com>

Thanks,
-Toshi



^ permalink raw reply	[flat|nested] 31+ messages in thread

Thread overview: 31+ messages
2013-06-15  0:56 [PATCH v5 00/22] x86, ACPI, numa: Parse numa info early Yinghai Lu
2013-06-15  0:56 ` [PATCH v5 01/22] x86: Change get_ramdisk_image() to global Yinghai Lu
2013-06-15  0:56 ` [PATCH v5 02/22] x86, microcode: Use common get_ramdisk_image() Yinghai Lu
2013-06-15  0:56 ` [PATCH v5 03/22] x86, ACPI, mm: Kill max_low_pfn_mapped Yinghai Lu
2013-06-17 23:19   ` Toshi Kani
2013-06-17 23:36     ` Yinghai Lu
2013-06-17 23:55       ` Toshi Kani
2013-06-15  0:56 ` [PATCH v5 04/22] x86, ACPI: Search buffer above 4G in second try for acpi override tables Yinghai Lu
2013-06-17 23:22   ` Toshi Kani
2013-06-17 23:38     ` Yinghai Lu
2013-06-17 23:56       ` Toshi Kani
2013-06-15  0:56 ` [PATCH v5 05/22] x86, ACPI: Increase override tables number limit Yinghai Lu
2013-06-17 23:35   ` Toshi Kani
2013-06-15  0:56 ` [PATCH v5 06/22] x86, ACPI: Split acpi_initrd_override to find/copy two functions Yinghai Lu
2013-06-18  0:24   ` Toshi Kani
2013-06-15  0:56 ` [PATCH v5 07/22] x86, ACPI: Store override acpi tables phys addr in cpio files info array Yinghai Lu
2013-06-15  0:56 ` [PATCH v5 08/22] x86, ACPI: Make acpi_initrd_override_find work with 32bit flat mode Yinghai Lu
2013-06-15  0:56 ` [PATCH v5 09/22] x86, ACPI: Find acpi tables in initrd early from head_32.S/head64.c Yinghai Lu
2013-06-15  0:56 ` [PATCH v5 10/22] x86, mm, numa: Move two functions calling on successful path later Yinghai Lu
2013-06-15  0:56 ` [PATCH v5 11/22] x86, mm, numa: Call numa_meminfo_cover_memory() checking early Yinghai Lu
2013-06-15  0:56 ` [PATCH v5 12/22] x86, mm, numa: Move node_map_pfn alignment() to x86 Yinghai Lu
2013-06-15  0:56 ` [PATCH v5 13/22] x86, mm, numa: Use numa_meminfo to check node_map_pfn alignment Yinghai Lu
2013-06-15  0:56 ` [PATCH v5 14/22] x86, mm, numa: Set memblock nid later Yinghai Lu
2013-06-15  0:56 ` [PATCH v5 15/22] x86, mm, numa: Move node_possible_map setting later Yinghai Lu
2013-06-15  0:56 ` [PATCH v5 16/22] x86, mm, numa: Move emulation handling down Yinghai Lu
2013-06-15  0:56 ` [PATCH v5 17/22] x86, ACPI, numa, ia64: split SLIT handling out Yinghai Lu
2013-06-15  0:56 ` [PATCH v5 18/22] x86, mm, numa: Add early_initmem_init() stub Yinghai Lu
2013-06-15  0:56 ` [PATCH v5 19/22] x86, mm: Parse numa info early Yinghai Lu
2013-06-15  0:56 ` [PATCH v5 20/22] x86, mm: Add comments for step_size shift Yinghai Lu
2013-06-15  0:56 ` [PATCH v5 21/22] x86, mm: Make init_mem_mapping be able to be called several times Yinghai Lu
2013-06-15  0:56 ` [PATCH v5 22/22] x86, mm, numa: Put pagetable on local node ram for 64bit Yinghai Lu
