linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/14] x86, ACPI, numa: Parse numa info early
@ 2013-03-08  4:58 Yinghai Lu
  2013-03-08  4:58 ` [PATCH 01/14] x86, ACPI, mm: Kill max_low_pfn_mapped Yinghai Lu
                   ` (13 more replies)
  0 siblings, 14 replies; 55+ messages in thread
From: Yinghai Lu @ 2013-03-08  4:58 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen
  Cc: linux-kernel, Yinghai Lu

One commit that tried to parse SRAT early got reverted before v3.9-rc1.

| commit e8d1955258091e4c92d5a975ebd7fd8a98f5d30f
| Author: Tang Chen <tangchen@cn.fujitsu.com>
| Date:   Fri Feb 22 16:33:44 2013 -0800
|
|    acpi, memory-hotplug: parse SRAT before memblock is ready

It broke several things, such as the acpi override and the fallback path.

This patchset is a clean implementation that parses numa info early.
1. Keep the acpi table initrd override working by splitting finding from
   copying. Finding is done at the head_32.S and head64.c stage, which
   mimics microcode updating. Copying is done just after memblock is set up.
2. Keep the fallback path working: numaq, ACPI, amd_numa and dummy.
   Separate initmem_init into two stages. early_initmem_init will only
   extract numa info early into numa_meminfo.
3. Keep the other old code flow untouched, like relocate_initrd and
   initmem_init.
4. The last patch will try to put the page table on the local node, so that
   memory hotplug will be happy.

In short, early_initmem_init will parse numa info early and call
init_mem_mapping to set up the page table for every node's memory.

The series can be found at:
        git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-mm

It is based on today's Linus tree, which has x86_urgent merged.

Tested on x86 32bit non-numa/numa, 64bit non-numa/numa configurations.

Thanks

Yinghai

Yinghai Lu (14):
  x86, ACPI, mm: Kill max_low_pfn_mapped
  x86, ACPI: Split find/copy from acpi_initrd_override
  x86, ACPI: store override acpi tables phys addr
  x86, ACPI: make acpi override finding work with 32bit flat mode
  x86, ACPI: Find acpi tables in initrd early at head_32.S/head64.c
  x86, mm, numa: Move successful path handling code later
  x86, mm, numa: call numa_meminfo_cover_memory() early
  x86, mm, numa: use numa_meminfo to check node_map_pfn alignment
  x86, mm, numa: set memblock nid later
  x86, mm, numa: Move emulation handling down.
  x86, acpi, numa: split SLIT handling out
  x86, mm, numa: Add early_initmem_init() stub
  x86, mm: Parse numa info early
  x86, mm: Put pagetable on local node ram

 arch/x86/include/asm/acpi.h            |    3 +-
 arch/x86/include/asm/page_types.h      |    2 +-
 arch/x86/include/asm/pgtable.h         |    2 +-
 arch/x86/include/asm/setup.h           |    2 +
 arch/x86/kernel/head64.c               |    2 +
 arch/x86/kernel/head_32.S              |    4 +
 arch/x86/kernel/setup.c                |   56 +++++---
 arch/x86/mm/init.c                     |   87 ++++++------
 arch/x86/mm/init_32.c                  |   11 ++
 arch/x86/mm/init_64.c                  |   12 ++
 arch/x86/mm/numa.c                     |  236 +++++++++++++++++++++++++-------
 arch/x86/mm/numa_emulation.c           |    2 +-
 arch/x86/mm/numa_internal.h            |    2 +
 arch/x86/mm/srat.c                     |    8 +-
 drivers/acpi/numa.c                    |   22 ++-
 drivers/acpi/osl.c                     |  122 +++++++++++------
 drivers/gpu/drm/i915/i915_gem_stolen.c |    2 +-
 include/linux/acpi.h                   |   18 +--
 include/linux/mm.h                     |    3 -
 mm/page_alloc.c                        |   52 +------
 20 files changed, 425 insertions(+), 223 deletions(-)

-- 
1.7.10.4



* [PATCH 01/14] x86, ACPI, mm: Kill max_low_pfn_mapped
  2013-03-08  4:58 [PATCH 00/14] x86, ACPI, numa: Parse numa info early Yinghai Lu
@ 2013-03-08  4:58 ` Yinghai Lu
  2013-03-08  5:10   ` Tejun Heo
  2013-03-08  4:58 ` [PATCH 02/14] x86, ACPI: Split find/copy from acpi_initrd_override Yinghai Lu
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 55+ messages in thread
From: Yinghai Lu @ 2013-03-08  4:58 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen
  Cc: linux-kernel, Yinghai Lu, Rafael J. Wysocki, Daniel Vetter,
	David Airlie, Jacob Shin, linux-acpi, dri-devel

Now that we have the arch_pfn_mapped array, max_low_pfn_mapped should not
be used anymore.

The only user is ACPI_OVERRIDE, and it should not use it, as later
accesses go through early_ioremap. Change it to try below 4G first and
then above 4G.

The other user is in drm/i915, but it is commented out.

Code should use arch_pfn_mapped or just 1<<(32-PAGE_SHIFT) instead.
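
For readers following along, the fallback order described above can be
sketched in plain user-space C. This is only an illustration, not kernel
code: struct free_range, find_in_range() and find_low_then_high() are
hypothetical names made up for the sketch, and the toy finder simply returns
the first fit in array order rather than modelling memblock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a free physical range; not a kernel type. */
struct free_range {
	uint64_t start;
	uint64_t end;	/* exclusive */
};

#define LIMIT_4G (1ULL << 32)

/*
 * Toy range finder: return the first aligned address in [lo, hi) where
 * 'size' bytes fit inside one of the free ranges, or 0 if nothing fits.
 */
static uint64_t find_in_range(const struct free_range *r, size_t n,
			      uint64_t lo, uint64_t hi,
			      uint64_t size, uint64_t align)
{
	size_t i;

	assert(align && (align & (align - 1)) == 0);	/* power of two */

	for (i = 0; i < n; i++) {
		uint64_t s = r[i].start > lo ? r[i].start : lo;
		uint64_t e = r[i].end < hi ? r[i].end : hi;

		s = (s + align - 1) & ~(align - 1);
		if (s < e && e - s >= size)
			return s;
	}
	return 0;
}

/* Prefer memory under 4G, then fall back to anything above it. */
static uint64_t find_low_then_high(const struct free_range *r, size_t n,
				   uint64_t size, uint64_t align)
{
	uint64_t addr = find_in_range(r, n, 0, LIMIT_4G, size, align);

	if (!addr)
		addr = find_in_range(r, n, LIMIT_4G, UINT64_MAX, size, align);
	return addr;
}
```

As in the patch, a return value of 0 means "not found", so the caller only
needs one failure check after both attempts.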

Suggested-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Thomas Renninger <trenn@suse.de>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Airlie <airlied@linux.ie>
Cc: Jacob Shin <jacob.shin@amd.com>
Cc: linux-acpi@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
---
 arch/x86/include/asm/page_types.h      |    1 -
 arch/x86/kernel/setup.c                |    4 +---
 arch/x86/mm/init.c                     |    4 ----
 drivers/acpi/osl.c                     |    9 ++++++---
 drivers/gpu/drm/i915/i915_gem_stolen.c |    2 +-
 5 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index 54c9787..b012b82 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -43,7 +43,6 @@
 
 extern int devmem_is_allowed(unsigned long pagenr);
 
-extern unsigned long max_low_pfn_mapped;
 extern unsigned long max_pfn_mapped;
 
 static inline phys_addr_t get_max_mapped(void)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 90d8cc9..4dcaae7 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -113,13 +113,11 @@
 #include <asm/prom.h>
 
 /*
- * max_low_pfn_mapped: highest direct mapped pfn under 4GB
- * max_pfn_mapped:     highest direct mapped pfn over 4GB
+ * max_pfn_mapped:     highest direct mapped pfn
  *
  * The direct mapping only covers E820_RAM regions, so the ranges and gaps are
  * represented by pfn_mapped
  */
-unsigned long max_low_pfn_mapped;
 unsigned long max_pfn_mapped;
 
 #ifdef CONFIG_DMI
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 59b7fc4..abcc241 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -313,10 +313,6 @@ static void add_pfn_range_mapped(unsigned long start_pfn, unsigned long end_pfn)
 	nr_pfn_mapped = clean_sort_range(pfn_mapped, E820_X_MAX);
 
 	max_pfn_mapped = max(max_pfn_mapped, end_pfn);
-
-	if (start_pfn < (1UL<<(32-PAGE_SHIFT)))
-		max_low_pfn_mapped = max(max_low_pfn_mapped,
-					 min(end_pfn, 1UL<<(32-PAGE_SHIFT)));
 }
 
 bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn)
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 586e7e9..c9e36d7 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -624,9 +624,12 @@ void __init acpi_initrd_override(void *data, size_t size)
 	if (table_nr == 0)
 		return;
 
-	acpi_tables_addr =
-		memblock_find_in_range(0, max_low_pfn_mapped << PAGE_SHIFT,
-				       all_tables_size, PAGE_SIZE);
+	/* under 4G at first, then above 4G */
+	acpi_tables_addr = memblock_find_in_range(0, 1ULL<<32,
+					all_tables_size, PAGE_SIZE);
+	if (!acpi_tables_addr)
+		acpi_tables_addr = memblock_find_in_range(1ULL<<32, -1ULL,
+					all_tables_size, PAGE_SIZE);
 	if (!acpi_tables_addr) {
 		WARN_ON(1);
 		return;
diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
index 69d97cb..7f9380b 100644
--- a/drivers/gpu/drm/i915/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
@@ -81,7 +81,7 @@ static unsigned long i915_stolen_to_physical(struct drm_device *dev)
 		base -= dev_priv->mm.gtt->stolen_size;
 	} else {
 		/* Stolen is immediately above Top of Memory */
-		base = max_low_pfn_mapped << PAGE_SHIFT;
+		base = __REMOVED_CRAZY__ << PAGE_SHIFT;
 #endif
 	}
 
-- 
1.7.10.4



* [PATCH 02/14] x86, ACPI: Split find/copy from acpi_initrd_override
  2013-03-08  4:58 [PATCH 00/14] x86, ACPI, numa: Parse numa info early Yinghai Lu
  2013-03-08  4:58 ` [PATCH 01/14] x86, ACPI, mm: Kill max_low_pfn_mapped Yinghai Lu
@ 2013-03-08  4:58 ` Yinghai Lu
  2013-03-08  5:33   ` Tejun Heo
  2013-03-08  4:58 ` [PATCH 03/14] x86, ACPI: store override acpi tables phys addr Yinghai Lu
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 55+ messages in thread
From: Yinghai Lu @ 2013-03-08  4:58 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen
  Cc: linux-kernel, Yinghai Lu, Pekka Enberg, Jacob Shin,
	Rafael J. Wysocki, linux-acpi

To parse SRAT early, we need to move acpi table probing earlier, and to
keep acpi_initrd_table_override working, we need to move it earlier too.

But currently it is called after init_mem_mapping and relocate_initrd().

Copying needs to happen after memblock is ready, because it needs to
allocate a buffer for the acpi tables.

Finding will be moved into head_32.S and head64.c, just like the early
microcode scanning.

So split them first.

Also move the function declarations down to avoid an #ifdef in setup.c.
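
The find/copy split can be pictured with a small user-space sketch. It is
an illustration under assumptions, not the code in this patch: found_table,
override_find() and override_copy() are hypothetical names, and malloc()
stands in for whatever allocator is available once memory setup is done.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TABLES 64

/* Hypothetical record of one located table (same idea as cpio_data). */
struct found_table {
	const void *data;
	size_t size;
};

static struct found_table found[MAX_TABLES];
static int found_nr;
static size_t found_total;

/* Phase 1: only remember where each blob lives. Nothing is allocated. */
static void override_find(const void *blob, size_t size)
{
	if (!blob || !size || found_nr >= MAX_TABLES)
		return;
	found[found_nr].data = blob;
	found[found_nr].size = size;
	found_nr++;
	found_total += size;
}

/* Phase 2: once an allocator exists, pack every blob into one buffer. */
static unsigned char *override_copy(void)
{
	unsigned char *buf;
	size_t off = 0;
	int i;

	if (!found_nr)
		return NULL;

	buf = malloc(found_total);
	if (!buf)
		return NULL;

	for (i = 0; i < found_nr; i++) {
		assert(off + found[i].size <= found_total);
		memcpy(buf + off, found[i].data, found[i].size);
		off += found[i].size;
	}
	return buf;
}
```

Because phase 1 touches only static storage, it can run before any allocator
is up, which is the property the split is after.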

Signed-off-by: Yinghai <yinghai@kernel.org>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Jacob Shin <jacob.shin@amd.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: linux-acpi@vger.kernel.org
---
 arch/x86/kernel/setup.c |    6 +++---
 drivers/acpi/osl.c      |   32 +++++++++++++++++++-------------
 include/linux/acpi.h    |   16 ++++++++--------
 3 files changed, 30 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 4dcaae7..e2913e9 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1092,9 +1092,9 @@ void __init setup_arch(char **cmdline_p)
 
 	reserve_initrd();
 
-#if defined(CONFIG_ACPI) && defined(CONFIG_BLK_DEV_INITRD)
-	acpi_initrd_override((void *)initrd_start, initrd_end - initrd_start);
-#endif
+	acpi_initrd_override_find((void *)initrd_start,
+					initrd_end - initrd_start);
+	acpi_initrd_override_copy();
 
 	reserve_crashkernel();
 
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index c9e36d7..b9d2ff0 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -539,6 +539,7 @@ acpi_os_predefined_override(const struct acpi_predefined_names *init_val,
 
 static u64 acpi_tables_addr;
 static int all_tables_size;
+static int table_nr;
 
 /* Copied from acpica/tbutils.c:acpi_tb_checksum() */
 u8 __init acpi_table_checksum(u8 *buffer, u32 length)
@@ -569,18 +570,16 @@ static const char * const table_sigs[] = {
 
 #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
 
-/* Must not increase 10 or needs code modification below */
-#define ACPI_OVERRIDE_TABLES 10
+#define ACPI_OVERRIDE_TABLES 64
+static struct cpio_data __initdata early_initrd_files[ACPI_OVERRIDE_TABLES];
 
-void __init acpi_initrd_override(void *data, size_t size)
+void __init acpi_initrd_override_find(void *data, size_t size)
 {
-	int sig, no, table_nr = 0, total_offset = 0;
+	int sig, no;
 	long offset = 0;
 	struct acpi_table_header *table;
 	char cpio_path[32] = "kernel/firmware/acpi/";
 	struct cpio_data file;
-	struct cpio_data early_initrd_files[ACPI_OVERRIDE_TABLES];
-	char *p;
 
 	if (data == NULL || size == 0)
 		return;
@@ -621,7 +620,14 @@ void __init acpi_initrd_override(void *data, size_t size)
 		early_initrd_files[table_nr].size = file.size;
 		table_nr++;
 	}
-	if (table_nr == 0)
+}
+
+void __init acpi_initrd_override_copy(void)
+{
+	int no, total_offset = 0;
+	char *p;
+
+	if (!table_nr)
 		return;
 
 	/* under 4G at first, then above 4G */
@@ -647,14 +653,14 @@ void __init acpi_initrd_override(void *data, size_t size)
 	memblock_reserve(acpi_tables_addr, acpi_tables_addr + all_tables_size);
 	arch_reserve_mem_area(acpi_tables_addr, all_tables_size);
 
-	p = early_ioremap(acpi_tables_addr, all_tables_size);
-
 	for (no = 0; no < table_nr; no++) {
-		memcpy(p + total_offset, early_initrd_files[no].data,
-		       early_initrd_files[no].size);
-		total_offset += early_initrd_files[no].size;
+		size_t size = early_initrd_files[no].size;
+
+		p = early_ioremap(acpi_tables_addr + total_offset, size);
+		memcpy(p, early_initrd_files[no].data, size);
+		early_iounmap(p, size);
+		total_offset += size;
 	}
-	early_iounmap(p, all_tables_size);
 }
 #endif /* CONFIG_ACPI_INITRD_TABLE_OVERRIDE */
 
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index bcbdd74..1654a241 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -79,14 +79,6 @@ typedef int (*acpi_tbl_table_handler)(struct acpi_table_header *table);
 typedef int (*acpi_tbl_entry_handler)(struct acpi_subtable_header *header,
 				      const unsigned long end);
 
-#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
-void acpi_initrd_override(void *data, size_t size);
-#else
-static inline void acpi_initrd_override(void *data, size_t size)
-{
-}
-#endif
-
 char * __acpi_map_table (unsigned long phys_addr, unsigned long size);
 void __acpi_unmap_table(char *map, unsigned long size);
 int early_acpi_boot_init(void);
@@ -485,6 +477,14 @@ static inline bool acpi_driver_match_device(struct device *dev,
 
 #endif	/* !CONFIG_ACPI */
 
+#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
+void acpi_initrd_override_find(void *data, size_t size);
+void acpi_initrd_override_copy(void);
+#else
+static inline void acpi_initrd_override_find(void *data, size_t size) { }
+static inline void acpi_initrd_override_copy(void) { }
+#endif
+
 #ifdef CONFIG_ACPI
 void acpi_os_set_prepare_sleep(int (*func)(u8 sleep_state,
 			       u32 pm1a_ctrl,  u32 pm1b_ctrl));
-- 
1.7.10.4



* [PATCH 03/14] x86, ACPI: store override acpi tables phys addr
  2013-03-08  4:58 [PATCH 00/14] x86, ACPI, numa: Parse numa info early Yinghai Lu
  2013-03-08  4:58 ` [PATCH 01/14] x86, ACPI, mm: Kill max_low_pfn_mapped Yinghai Lu
  2013-03-08  4:58 ` [PATCH 02/14] x86, ACPI: Split find/copy from acpi_initrd_override Yinghai Lu
@ 2013-03-08  4:58 ` Yinghai Lu
  2013-03-08  5:36   ` Tejun Heo
  2013-03-08  4:58 ` [PATCH 04/14] x86, ACPI: make acpi override finding work with 32bit flat mode Yinghai Lu
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 55+ messages in thread
From: Yinghai Lu @ 2013-03-08  4:58 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen
  Cc: linux-kernel, Yinghai Lu, Rafael J. Wysocki, linux-acpi

Later, 32bit will only find tables by phys address, during 32bit flat mode
in head_32.S.

To keep 32bit and 64bit consistent, use phys_addr for all.

Use early_ioremap to access the tables during copying.
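
A toy model of "store physical addresses, map only while copying" might look
like the following. Everything here is hypothetical and for illustration
only: phys_mem is a byte array standing in for physical memory, and
map_phys()/unmap_phys() are made-up helpers that play the role of a
temporary mapping such as early_ioremap()/early_iounmap().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static unsigned char phys_mem[256];	/* toy "physical memory" */
static int live_mappings;		/* mappings currently held */

/* Hypothetical temporary mapping of [phys, phys + size). */
static void *map_phys(size_t phys, size_t size)
{
	assert(phys <= sizeof(phys_mem) && size <= sizeof(phys_mem) - phys);
	live_mappings++;
	return phys_mem + phys;
}

static void unmap_phys(void *p)
{
	(void)p;
	live_mappings--;
}

/* Copy between two physical ranges, mapping both only for the copy. */
static void copy_phys(size_t dst, size_t src, size_t size)
{
	unsigned char *p = map_phys(dst, size);
	unsigned char *q = map_phys(src, size);

	memcpy(p, q, size);	/* ranges are assumed not to overlap */
	unmap_phys(q);
	unmap_phys(p);
}
```

The recorded address stays valid no matter which mappings exist at the time,
since it is only turned into a pointer for the duration of the copy.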

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: linux-acpi@vger.kernel.org
---
 drivers/acpi/osl.c |   11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index b9d2ff0..60317ea 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -616,7 +616,7 @@ void __init acpi_initrd_override_find(void *data, size_t size)
 			table->signature, cpio_path, file.name, table->length);
 
 		all_tables_size += table->length;
-		early_initrd_files[table_nr].data = file.data;
+		early_initrd_files[table_nr].data = (void *)__pa(file.data);
 		early_initrd_files[table_nr].size = file.size;
 		table_nr++;
 	}
@@ -625,7 +625,7 @@ void __init acpi_initrd_override_find(void *data, size_t size)
 void __init acpi_initrd_override_copy(void)
 {
 	int no, total_offset = 0;
-	char *p;
+	char *p, *q;
 
 	if (!table_nr)
 		return;
@@ -654,10 +654,13 @@ void __init acpi_initrd_override_copy(void)
 	arch_reserve_mem_area(acpi_tables_addr, all_tables_size);
 
 	for (no = 0; no < table_nr; no++) {
-		size_t size = early_initrd_files[no].size;
+		unsigned long size = early_initrd_files[no].size;
 
 		p = early_ioremap(acpi_tables_addr + total_offset, size);
-		memcpy(p, early_initrd_files[no].data, size);
+		q = early_ioremap((unsigned long)early_initrd_files[no].data,
+					 size);
+		memcpy(p, q, size);
+		early_iounmap(q, size);
 		early_iounmap(p, size);
 		total_offset += size;
 	}
-- 
1.7.10.4



* [PATCH 04/14] x86, ACPI: make acpi override finding work with 32bit flat mode
  2013-03-08  4:58 [PATCH 00/14] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (2 preceding siblings ...)
  2013-03-08  4:58 ` [PATCH 03/14] x86, ACPI: store override acpi tables phys addr Yinghai Lu
@ 2013-03-08  4:58 ` Yinghai Lu
  2013-03-08  5:50   ` Tejun Heo
  2013-03-08  4:58 ` [PATCH 05/14] x86, ACPI: Find acpi tables in initrd early at head_32.S/head64.c Yinghai Lu
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 55+ messages in thread
From: Yinghai Lu @ 2013-03-08  4:58 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen
  Cc: linux-kernel, Yinghai Lu, Pekka Enberg, Jacob Shin,
	Rafael J. Wysocki, linux-acpi

We will find acpi tables in initrd during head_32.S in 32bit flat mode.

So acpi_initrd_override_find needs to be able to take phys addresses directly.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Jacob Shin <jacob.shin@amd.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: linux-acpi@vger.kernel.org
---
 arch/x86/kernel/setup.c |    2 +-
 drivers/acpi/osl.c      |   84 +++++++++++++++++++++++++++++++----------------
 include/linux/acpi.h    |    4 +--
 3 files changed, 58 insertions(+), 32 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index e2913e9..668e658 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1093,7 +1093,7 @@ void __init setup_arch(char **cmdline_p)
 	reserve_initrd();
 
 	acpi_initrd_override_find((void *)initrd_start,
-					initrd_end - initrd_start);
+					initrd_end - initrd_start, false);
 	acpi_initrd_override_copy();
 
 	reserve_crashkernel();
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 60317ea..b375159 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -552,38 +552,47 @@ u8 __init acpi_table_checksum(u8 *buffer, u32 length)
 	return sum;
 }
 
-/* All but ACPI_SIG_RSDP and ACPI_SIG_FACS: */
-static const char * const table_sigs[] = {
-	ACPI_SIG_BERT, ACPI_SIG_CPEP, ACPI_SIG_ECDT, ACPI_SIG_EINJ,
-	ACPI_SIG_ERST, ACPI_SIG_HEST, ACPI_SIG_MADT, ACPI_SIG_MSCT,
-	ACPI_SIG_SBST, ACPI_SIG_SLIT, ACPI_SIG_SRAT, ACPI_SIG_ASF,
-	ACPI_SIG_BOOT, ACPI_SIG_DBGP, ACPI_SIG_DMAR, ACPI_SIG_HPET,
-	ACPI_SIG_IBFT, ACPI_SIG_IVRS, ACPI_SIG_MCFG, ACPI_SIG_MCHI,
-	ACPI_SIG_SLIC, ACPI_SIG_SPCR, ACPI_SIG_SPMI, ACPI_SIG_TCPA,
-	ACPI_SIG_UEFI, ACPI_SIG_WAET, ACPI_SIG_WDAT, ACPI_SIG_WDDT,
-	ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT,
-	ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, NULL };
-
 /* Non-fatal errors: Affected tables/files are ignored */
 #define INVALID_TABLE(x, path, name)					\
-	{ pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); continue; }
+	do { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); } while (0)
 
 #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
 
 #define ACPI_OVERRIDE_TABLES 64
 static struct cpio_data __initdata early_initrd_files[ACPI_OVERRIDE_TABLES];
 
-void __init acpi_initrd_override_find(void *data, size_t size)
+void __init acpi_initrd_override_find(void *data, size_t size, bool is_phys)
 {
 	int sig, no;
 	long offset = 0;
 	struct acpi_table_header *table;
 	char cpio_path[32] = "kernel/firmware/acpi/";
 	struct cpio_data file;
+	struct cpio_data *files = early_initrd_files;
+	int *all_tables_size_p = &all_tables_size;
+	int *table_nr_p = &table_nr;
+
+	/* All but ACPI_SIG_RSDP and ACPI_SIG_FACS: */
+	char *table_sigs[] = {
+		ACPI_SIG_BERT, ACPI_SIG_CPEP, ACPI_SIG_ECDT, ACPI_SIG_EINJ,
+		ACPI_SIG_ERST, ACPI_SIG_HEST, ACPI_SIG_MADT, ACPI_SIG_MSCT,
+		ACPI_SIG_SBST, ACPI_SIG_SLIT, ACPI_SIG_SRAT, ACPI_SIG_ASF,
+		ACPI_SIG_BOOT, ACPI_SIG_DBGP, ACPI_SIG_DMAR, ACPI_SIG_HPET,
+		ACPI_SIG_IBFT, ACPI_SIG_IVRS, ACPI_SIG_MCFG, ACPI_SIG_MCHI,
+		ACPI_SIG_SLIC, ACPI_SIG_SPCR, ACPI_SIG_SPMI, ACPI_SIG_TCPA,
+		ACPI_SIG_UEFI, ACPI_SIG_WAET, ACPI_SIG_WDAT, ACPI_SIG_WDDT,
+		ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT,
+		ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, NULL };
 
 	if (data == NULL || size == 0)
 		return;
 
+	if (is_phys) {
+		files = (struct cpio_data *)__pa_symbol(early_initrd_files);
+		all_tables_size_p = (int *)__pa_symbol(&all_tables_size);
+		table_nr_p = (int *)__pa_symbol(&table_nr);
+	}
+
 	for (no = 0; no < ACPI_OVERRIDE_TABLES; no++) {
 		file = find_cpio_data(cpio_path, data, size, &offset);
 		if (!file.data)
@@ -592,9 +601,12 @@ void __init acpi_initrd_override_find(void *data, size_t size)
 		data += offset;
 		size -= offset;
 
-		if (file.size < sizeof(struct acpi_table_header))
-			INVALID_TABLE("Table smaller than ACPI header",
+		if (file.size < sizeof(struct acpi_table_header)) {
+			if (!is_phys)
+				INVALID_TABLE("Table smaller than ACPI header",
 				      cpio_path, file.name);
+			continue;
+		}
 
 		table = file.data;
 
@@ -602,23 +614,34 @@ void __init acpi_initrd_override_find(void *data, size_t size)
 			if (!memcmp(table->signature, table_sigs[sig], 4))
 				break;
 
-		if (!table_sigs[sig])
-			INVALID_TABLE("Unknown signature",
+		if (!table_sigs[sig]) {
+			if (!is_phys)
+				 INVALID_TABLE("Unknown signature",
 				      cpio_path, file.name);
-		if (file.size != table->length)
-			INVALID_TABLE("File length does not match table length",
+			continue;
+		}
+		if (file.size != table->length) {
+			if (!is_phys)
+				INVALID_TABLE("File length does not match table length",
 				      cpio_path, file.name);
-		if (acpi_table_checksum(file.data, table->length))
-			INVALID_TABLE("Bad table checksum",
+			continue;
+		}
+		if (acpi_table_checksum(file.data, table->length)) {
+			if (!is_phys)
+				INVALID_TABLE("Bad table checksum",
 				      cpio_path, file.name);
+			continue;
+		}
 
-		pr_info("%4.4s ACPI table found in initrd [%s%s][0x%x]\n",
+		if (!is_phys)
+			pr_info("%4.4s ACPI table found in initrd [%s%s][0x%x]\n",
 			table->signature, cpio_path, file.name, table->length);
 
-		all_tables_size += table->length;
-		early_initrd_files[table_nr].data = (void *)__pa(file.data);
-		early_initrd_files[table_nr].size = file.size;
-		table_nr++;
+		(*all_tables_size_p) += table->length;
+		files[*table_nr_p].data = is_phys ?
+					    file.data : (void *)__pa(file.data);
+		files[*table_nr_p].size = file.size;
+		(*table_nr_p)++;
 	}
 }
 
@@ -654,11 +677,14 @@ void __init acpi_initrd_override_copy(void)
 	arch_reserve_mem_area(acpi_tables_addr, all_tables_size);
 
 	for (no = 0; no < table_nr; no++) {
+		unsigned long phys_addr = (unsigned long)early_initrd_files[no].data;
 		unsigned long size = early_initrd_files[no].size;
 
+		q = early_ioremap(phys_addr, size);
+		pr_info("%4.4s ACPI table found in initrd [%#010lx-%#010lx]\n",
+				((struct acpi_table_header *)q)->signature,
+				phys_addr, phys_addr + size - 1);
 		p = early_ioremap(acpi_tables_addr + total_offset, size);
-		q = early_ioremap((unsigned long)early_initrd_files[no].data,
-					 size);
 		memcpy(p, q, size);
 		early_iounmap(q, size);
 		early_iounmap(p, size);
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 1654a241..46a8a89 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -478,10 +478,10 @@ static inline bool acpi_driver_match_device(struct device *dev,
 #endif	/* !CONFIG_ACPI */
 
 #ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
-void acpi_initrd_override_find(void *data, size_t size);
+void acpi_initrd_override_find(void *data, size_t size, bool is_phys);
 void acpi_initrd_override_copy(void);
 #else
-static inline void acpi_initrd_override_find(void *data, size_t size) { }
+static inline void acpi_initrd_override_find(void *data, size_t size, bool is_phys) { }
 static inline void acpi_initrd_override_copy(void) { }
 #endif
 
-- 
1.7.10.4



* [PATCH 05/14] x86, ACPI: Find acpi tables in initrd early at head_32.S/head64.c
  2013-03-08  4:58 [PATCH 00/14] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (3 preceding siblings ...)
  2013-03-08  4:58 ` [PATCH 04/14] x86, ACPI: make acpi override finding work with 32bit flat mode Yinghai Lu
@ 2013-03-08  4:58 ` Yinghai Lu
  2013-03-08  5:57   ` Tejun Heo
  2013-03-08  4:58 ` [PATCH 06/14] x86, mm, numa: Move successful path handling code later Yinghai Lu
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 55+ messages in thread
From: Yinghai Lu @ 2013-03-08  4:58 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen
  Cc: linux-kernel, Yinghai Lu, Pekka Enberg, Jacob Shin,
	Rafael J. Wysocki, linux-acpi

head64.c can use the #PF handler to set up the page table and access the
initrd before mapping and relocating.

head_32.S can use 32bit flat mode to access the initrd before mapping
and relocating.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Jacob Shin <jacob.shin@amd.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: linux-acpi@vger.kernel.org
---
 arch/x86/include/asm/setup.h |    2 ++
 arch/x86/kernel/head64.c     |    2 ++
 arch/x86/kernel/head_32.S    |    4 ++++
 arch/x86/kernel/setup.c      |   28 ++++++++++++++++++++++++++--
 4 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index b7bf350..b09db26 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -42,6 +42,8 @@ extern void visws_early_detect(void);
 static inline void visws_early_detect(void) { }
 #endif
 
+void x86_acpi_override_find(void);
+
 extern unsigned long saved_video_mode;
 
 extern void reserve_standard_io_resources(void);
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index c5e403f..a31bc63 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -174,6 +174,8 @@ void __init x86_64_start_kernel(char * real_mode_data)
 	if (console_loglevel == 10)
 		early_printk("Kernel alive\n");
 
+	x86_acpi_override_find();
+
 	clear_page(init_level4_pgt);
 	/* set init_level4_pgt kernel high mapping*/
 	init_level4_pgt[511] = early_level4_pgt[511];
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 73afd11..ca08f0e 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -149,6 +149,10 @@ ENTRY(startup_32)
 	call load_ucode_bsp
 #endif
 
+#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
+	call x86_acpi_override_find
+#endif
+
 /*
  * Initialize page tables.  This creates a PDE and a set of page
  * tables, which are located immediately beyond __brk_base.  The variable
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 668e658..d43545a 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -424,6 +424,32 @@ static void __init reserve_initrd(void)
 }
 #endif /* CONFIG_BLK_DEV_INITRD */
 
+#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
+void __init x86_acpi_override_find(void)
+{
+	unsigned long ramdisk_image, ramdisk_size;
+	unsigned char *p = NULL;
+
+#ifdef CONFIG_X86_32
+	struct boot_params *boot_params_p;
+
+	boot_params_p = (struct boot_params *)__pa_symbol(&boot_params);
+	ramdisk_image = boot_params_p->hdr.ramdisk_image;
+	ramdisk_size  = boot_params_p->hdr.ramdisk_size;
+	p = (unsigned char *)ramdisk_image;
+	acpi_initrd_override_find(p, ramdisk_size, true);
+#else
+	ramdisk_image = get_ramdisk_image();
+	ramdisk_size  = get_ramdisk_size();
+	if (ramdisk_image)
+		p = __va(ramdisk_image);
+	acpi_initrd_override_find(p, ramdisk_size, false);
+#endif
+}
+#else
+void __init x86_acpi_override_find(void) { }
+#endif
+
 static void __init parse_setup_data(void)
 {
 	struct setup_data *data;
@@ -1092,8 +1118,6 @@ void __init setup_arch(char **cmdline_p)
 
 	reserve_initrd();
 
-	acpi_initrd_override_find((void *)initrd_start,
-					initrd_end - initrd_start, false);
 	acpi_initrd_override_copy();
 
 	reserve_crashkernel();
-- 
1.7.10.4



* [PATCH 06/14] x86, mm, numa: Move successful path handling code later
  2013-03-08  4:58 [PATCH 00/14] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (4 preceding siblings ...)
  2013-03-08  4:58 ` [PATCH 05/14] x86, ACPI: Find acpi tables in initrd early at head_32.S/head64.c Yinghai Lu
@ 2013-03-08  4:58 ` Yinghai Lu
  2013-03-08  6:04   ` Tejun Heo
  2013-03-08  4:58 ` [PATCH 07/14] x86, mm, numa: call numa_meminfo_cover_memory() early Yinghai Lu
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 55+ messages in thread
From: Yinghai Lu @ 2013-03-08  4:58 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen
  Cc: linux-kernel, Yinghai Lu, Tejun Heo

We can move the setup_node_data() and numa_init_array() calls out of
numa_init() to make numa_init() smaller.

Those functions only need to be called on the success path, and are now
called only once, in x86_numa_init().

This lets us later split numa info parsing into two stages; the early
one will run before init_mem_mapping.
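
The resulting control flow (try each method, then run the shared tail once)
can be sketched outside the kernel as below. The names are hypothetical and
exist only for this illustration: probe_fail()/probe_ok() stand in for the
numaq/ACPI/AMD init functions, and finish_setup() for the node registration
and numa_init_array() work that now runs once at the end.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*init_fn)(void);

/* Hypothetical probes: return 0 on success, negative on failure. */
static int probe_fail(void) { return -1; }
static int probe_ok(void)   { return 0; }

static int finish_calls;	/* counts how often the shared tail ran */

/* Work that is needed only once, after some method has been chosen. */
static void finish_setup(void)
{
	finish_calls++;
}

/*
 * Try each method in order. The last entry plays the role of the dummy
 * fallback. Returns the index of the method that was used.
 */
static int init_with_fallback(const init_fn *methods, size_t n)
{
	size_t i;

	assert(n > 0);
	for (i = 0; i < n - 1; i++)
		if (!methods[i]())
			goto out;

	methods[n - 1]();	/* dummy fallback; i == n - 1 here */
out:
	finish_setup();		/* shared tail, run exactly once per call */
	return (int)i;
}
```

Keeping the tail behind a single label is what allows it to be moved or
split later without touching each individual probe.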

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
---
 arch/x86/mm/numa.c |   68 ++++++++++++++++++++++++++++------------------------
 1 file changed, 37 insertions(+), 31 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 72fe01e..24c20f0 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -480,7 +480,7 @@ static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
 static int __init numa_register_memblks(struct numa_meminfo *mi)
 {
 	unsigned long uninitialized_var(pfn_align);
-	int i, nid;
+	int i;
 
 	/* Account for nodes with cpus and no memory */
 	node_possible_map = numa_nodes_parsed;
@@ -509,24 +509,6 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
 	if (!numa_meminfo_cover_memory(mi))
 		return -EINVAL;
 
-	/* Finally register nodes. */
-	for_each_node_mask(nid, node_possible_map) {
-		u64 start = PFN_PHYS(max_pfn);
-		u64 end = 0;
-
-		for (i = 0; i < mi->nr_blks; i++) {
-			if (nid != mi->blk[i].nid)
-				continue;
-			start = min(mi->blk[i].start, start);
-			end = max(mi->blk[i].end, end);
-		}
-
-		if (start < end)
-			setup_node_data(nid, start, end);
-	}
-
-	/* Dump memblock with node info and return. */
-	memblock_dump_all();
 	return 0;
 }
 
@@ -580,15 +562,6 @@ static int __init numa_init(int (*init_func)(void))
 	if (ret < 0)
 		return ret;
 
-	for (i = 0; i < nr_cpu_ids; i++) {
-		int nid = early_cpu_to_node(i);
-
-		if (nid == NUMA_NO_NODE)
-			continue;
-		if (!node_online(nid))
-			numa_clear_node(i);
-	}
-	numa_init_array();
 	return 0;
 }
 
@@ -623,22 +596,55 @@ static int __init dummy_numa_init(void)
  */
 void __init x86_numa_init(void)
 {
+	int i, nid;
+	struct numa_meminfo *mi = &numa_meminfo;
+
 	if (!numa_off) {
 #ifdef CONFIG_X86_NUMAQ
 		if (!numa_init(numaq_numa_init))
-			return;
+			goto out;
 #endif
 #ifdef CONFIG_ACPI_NUMA
 		if (!numa_init(x86_acpi_numa_init))
-			return;
+			goto out;
 #endif
 #ifdef CONFIG_AMD_NUMA
 		if (!numa_init(amd_numa_init))
-			return;
+			goto out;
 #endif
 	}
 
 	numa_init(dummy_numa_init);
+
+out:
+	/* Finally register nodes. */
+	for_each_node_mask(nid, node_possible_map) {
+		u64 start = PFN_PHYS(max_pfn);
+		u64 end = 0;
+
+		for (i = 0; i < mi->nr_blks; i++) {
+			if (nid != mi->blk[i].nid)
+				continue;
+			start = min(mi->blk[i].start, start);
+			end = max(mi->blk[i].end, end);
+		}
+
+		if (start < end)
+			setup_node_data(nid, start, end);
+	}
+
+	/* Dump memblock with node info */
+	memblock_dump_all();
+
+	for (i = 0; i < nr_cpu_ids; i++) {
+		int nid = early_cpu_to_node(i);
+
+		if (nid == NUMA_NO_NODE)
+			continue;
+		if (!node_online(nid))
+			numa_clear_node(i);
+	}
+	numa_init_array();
 }
 
 static __init int find_near_online_node(int node)
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 07/14] x86, mm, numa: call numa_meminfo_cover_memory() early
  2013-03-08  4:58 [PATCH 00/14] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (5 preceding siblings ...)
  2013-03-08  4:58 ` [PATCH 06/14] x86, mm, numa: Move successful path handling code later Yinghai Lu
@ 2013-03-08  4:58 ` Yinghai Lu
  2013-03-08  4:58 ` [PATCH 08/14] x86, mm, numa: use numa_meminfo to check node_map_pfn alignment Yinghai Lu
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 55+ messages in thread
From: Yinghai Lu @ 2013-03-08  4:58 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen
  Cc: linux-kernel, Yinghai Lu, Tejun Heo

We do not need the nid in memblock to find out absent pages.

So we can call numa_meminfo_cover_memory() earlier, before setting
the memblock nid.

Also make __absent_pages_in_range() static and use
absent_pages_in_range() directly.

A later patch will set the memblock nid only once, on the successful path.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
---
 arch/x86/mm/numa.c |    7 ++++---
 include/linux/mm.h |    2 --
 mm/page_alloc.c    |    2 +-
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 24c20f0..6df5028 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -460,7 +460,7 @@ static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
 		u64 s = mi->blk[i].start >> PAGE_SHIFT;
 		u64 e = mi->blk[i].end >> PAGE_SHIFT;
 		numaram += e - s;
-		numaram -= __absent_pages_in_range(mi->blk[i].nid, s, e);
+		numaram -= absent_pages_in_range(s, e);
 		if ((s64)numaram < 0)
 			numaram = 0;
 	}
@@ -488,6 +488,9 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
 	if (WARN_ON(nodes_empty(node_possible_map)))
 		return -EINVAL;
 
+	if (!numa_meminfo_cover_memory(mi))
+		return -EINVAL;
+
 	for (i = 0; i < mi->nr_blks; i++) {
 		struct numa_memblk *mb = &mi->blk[i];
 		memblock_set_node(mb->start, mb->end - mb->start, mb->nid);
@@ -506,8 +509,6 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
 		return -EINVAL;
 	}
 #endif
-	if (!numa_meminfo_cover_memory(mi))
-		return -EINVAL;
 
 	return 0;
 }
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7acc9dc..2ae2050 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1324,8 +1324,6 @@ extern void free_initmem(void);
  */
 extern void free_area_init_nodes(unsigned long *max_zone_pfn);
 unsigned long node_map_pfn_alignment(void);
-unsigned long __absent_pages_in_range(int nid, unsigned long start_pfn,
-						unsigned long end_pfn);
 extern unsigned long absent_pages_in_range(unsigned long start_pfn,
 						unsigned long end_pfn);
 extern void get_pfn_range_for_nid(unsigned int nid,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8fcced7..580d919 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4356,7 +4356,7 @@ static unsigned long __meminit zone_spanned_pages_in_node(int nid,
  * Return the number of holes in a range on a node. If nid is MAX_NUMNODES,
  * then all holes in the requested range will be accounted for.
  */
-unsigned long __meminit __absent_pages_in_range(int nid,
+static unsigned long __meminit __absent_pages_in_range(int nid,
 				unsigned long range_start_pfn,
 				unsigned long range_end_pfn)
 {
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 08/14] x86, mm, numa: use numa_meminfo to check node_map_pfn alignment
  2013-03-08  4:58 [PATCH 00/14] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (6 preceding siblings ...)
  2013-03-08  4:58 ` [PATCH 07/14] x86, mm, numa: call numa_meminfo_cover_memory() early Yinghai Lu
@ 2013-03-08  4:58 ` Yinghai Lu
  2013-03-08  6:26   ` Tejun Heo
  2013-03-08  4:58 ` [PATCH 09/14] x86, mm, numa: set memblock nid later Yinghai Lu
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 55+ messages in thread
From: Yinghai Lu @ 2013-03-08  4:58 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen
  Cc: linux-kernel, Yinghai Lu, Tejun Heo

We can use numa_meminfo directly instead of the memblock nid to check
the node_map_pfn alignment.

So we can move setting the memblock nid down and do it only once, on
the successful path.

Move node_map_pfn_alignment() to arch/x86/mm, as it has no other users.
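
For illustration only (not part of this patch): a minimal user-space
sketch of the alignment calculation, walking an array of blocks the way
the moved node_map_pfn_alignment() walks numa_meminfo. All sketch_ names,
the fixed 4KiB page shift and the use of 1UL for the shift are
assumptions made for the sketch; it is not the kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4KiB pages */

/* Hypothetical stand-in for one numa_meminfo block (byte addresses). */
struct sketch_memblk {
	unsigned long long start;
	unsigned long long end;
	int nid;
};

/* Index of the lowest set bit; x must be non-zero. */
static unsigned int sketch_ffs(unsigned long x)
{
	unsigned int bit = 0;

	assert(x != 0);
	while (!(x & 1UL)) {
		x >>= 1;
		bit++;
	}
	return bit;
}

/*
 * Return the internode alignment in pages, or 0 if there is no
 * alignment requirement (single node). Blocks must be sorted by address.
 */
static unsigned long sketch_pfn_alignment(const struct sketch_memblk *blk,
					  size_t nr)
{
	unsigned long accl_mask = 0, last_end = 0;
	int last_nid = -1;
	size_t i;

	assert(blk != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		unsigned long start = blk[i].start >> SKETCH_PAGE_SHIFT;
		unsigned long end = blk[i].end >> SKETCH_PAGE_SHIFT;
		int nid = blk[i].nid;
		unsigned long mask;

		/* Same node as before (or first block): just remember it. */
		if (!start || last_nid < 0 || last_nid == nid) {
			last_nid = nid;
			last_end = end;
			continue;
		}

		/*
		 * Start with the finest mask that still pin-points start,
		 * then coarsen it while it separates this node from the
		 * end of the previous one.
		 */
		mask = ~((1UL << sketch_ffs(start)) - 1);
		while (mask && last_end <= (start & (mask << 1)))
			mask <<= 1;

		/* accumulate all internode masks */
		accl_mask |= mask;
	}

	/* convert mask to number of pages */
	return ~accl_mask + 1;
}
```

With two 1GiB nodes aligned to 1GiB the sketch reports 1GiB worth of
pages; if the node boundary is shifted by 256MiB it reports 256MiB.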

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
---
 arch/x86/mm/numa.c |   76 +++++++++++++++++++++++++++++++++++++++++++++-------
 include/linux/mm.h |    1 -
 mm/page_alloc.c    |   50 ----------------------------------
 3 files changed, 67 insertions(+), 60 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 6df5028..b8cc248 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -477,9 +477,69 @@ static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
 	return true;
 }
 
+/**
+ * node_map_pfn_alignment - determine the maximum internode alignment
+ *
+ * This function should be called after node map is populated and sorted.
+ * It calculates the maximum power of two alignment which can distinguish
+ * all the nodes.
+ *
+ * For example, if all nodes are 1GiB and aligned to 1GiB, the return value
+ * would indicate 1GiB alignment with (1 << (30 - PAGE_SHIFT)).  If the
+ * nodes are shifted by 256MiB, 256MiB.  Note that if only the last node is
+ * shifted, 1GiB is enough and this function will indicate so.
+ *
+ * This is used to test whether pfn -> nid mapping of the chosen memory
+ * model has fine enough granularity to avoid incorrect mapping for the
+ * populated node map.
+ *
+ * Returns the determined alignment in pfn's.  0 if there is no alignment
+ * requirement (single node).
+ */
+#ifdef NODE_NOT_IN_PAGE_FLAGS
+static unsigned long __init node_map_pfn_alignment(struct numa_meminfo *mi)
+{
+	unsigned long accl_mask = 0, last_end = 0;
+	unsigned long start, end, mask;
+	int last_nid = -1;
+	int i, nid;
+
+	for (i = 0; i < mi->nr_blks; i++) {
+		start = mi->blk[i].start >> PAGE_SHIFT;
+		end = mi->blk[i].end >> PAGE_SHIFT;
+		nid = mi->blk[i].nid;
+		if (!start || last_nid < 0 || last_nid == nid) {
+			last_nid = nid;
+			last_end = end;
+			continue;
+		}
+
+		/*
+		 * Start with a mask granular enough to pin-point to the
+		 * start pfn and tick off bits one-by-one until it becomes
+		 * too coarse to separate the current node from the last.
+		 */
+		mask = ~((1 << __ffs(start)) - 1);
+		while (mask && last_end <= (start & (mask << 1)))
+			mask <<= 1;
+
+		/* accumulate all internode masks */
+		accl_mask |= mask;
+	}
+
+	/* convert mask to number of pages */
+	return ~accl_mask + 1;
+}
+#else
+static unsigned long __init node_map_pfn_alignment(struct numa_meminfo *mi)
+{
+	return 0;
+}
+#endif
+
 static int __init numa_register_memblks(struct numa_meminfo *mi)
 {
-	unsigned long uninitialized_var(pfn_align);
+	unsigned long pfn_align;
 	int i;
 
 	/* Account for nodes with cpus and no memory */
@@ -491,24 +551,22 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
 	if (!numa_meminfo_cover_memory(mi))
 		return -EINVAL;
 
-	for (i = 0; i < mi->nr_blks; i++) {
-		struct numa_memblk *mb = &mi->blk[i];
-		memblock_set_node(mb->start, mb->end - mb->start, mb->nid);
-	}
-
 	/*
 	 * If sections array is gonna be used for pfn -> nid mapping, check
 	 * whether its granularity is fine enough.
 	 */
-#ifdef NODE_NOT_IN_PAGE_FLAGS
-	pfn_align = node_map_pfn_alignment();
+	pfn_align = node_map_pfn_alignment(mi);
 	if (pfn_align && pfn_align < PAGES_PER_SECTION) {
 		printk(KERN_WARNING "Node alignment %LuMB < min %LuMB, rejecting NUMA config\n",
 		       PFN_PHYS(pfn_align) >> 20,
 		       PFN_PHYS(PAGES_PER_SECTION) >> 20);
 		return -EINVAL;
 	}
-#endif
+
+	for (i = 0; i < mi->nr_blks; i++) {
+		struct numa_memblk *mb = &mi->blk[i];
+		memblock_set_node(mb->start, mb->end - mb->start, mb->nid);
+	}
 
 	return 0;
 }
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2ae2050..1c79b10 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1323,7 +1323,6 @@ extern void free_initmem(void);
  * CONFIG_HAVE_MEMBLOCK_NODE_MAP.
  */
 extern void free_area_init_nodes(unsigned long *max_zone_pfn);
-unsigned long node_map_pfn_alignment(void);
 extern unsigned long absent_pages_in_range(unsigned long start_pfn,
 						unsigned long end_pfn);
 extern void get_pfn_range_for_nid(unsigned int nid,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 580d919..f368db4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4725,56 +4725,6 @@ static inline void setup_nr_node_ids(void)
 }
 #endif
 
-/**
- * node_map_pfn_alignment - determine the maximum internode alignment
- *
- * This function should be called after node map is populated and sorted.
- * It calculates the maximum power of two alignment which can distinguish
- * all the nodes.
- *
- * For example, if all nodes are 1GiB and aligned to 1GiB, the return value
- * would indicate 1GiB alignment with (1 << (30 - PAGE_SHIFT)).  If the
- * nodes are shifted by 256MiB, 256MiB.  Note that if only the last node is
- * shifted, 1GiB is enough and this function will indicate so.
- *
- * This is used to test whether pfn -> nid mapping of the chosen memory
- * model has fine enough granularity to avoid incorrect mapping for the
- * populated node map.
- *
- * Returns the determined alignment in pfn's.  0 if there is no alignment
- * requirement (single node).
- */
-unsigned long __init node_map_pfn_alignment(void)
-{
-	unsigned long accl_mask = 0, last_end = 0;
-	unsigned long start, end, mask;
-	int last_nid = -1;
-	int i, nid;
-
-	for_each_mem_pfn_range(i, MAX_NUMNODES, &start, &end, &nid) {
-		if (!start || last_nid < 0 || last_nid == nid) {
-			last_nid = nid;
-			last_end = end;
-			continue;
-		}
-
-		/*
-		 * Start with a mask granular enough to pin-point to the
-		 * start pfn and tick off bits one-by-one until it becomes
-		 * too coarse to separate the current node from the last.
-		 */
-		mask = ~((1 << __ffs(start)) - 1);
-		while (mask && last_end <= (start & (mask << 1)))
-			mask <<= 1;
-
-		/* accumulate all internode masks */
-		accl_mask |= mask;
-	}
-
-	/* convert mask to number of pages */
-	return ~accl_mask + 1;
-}
-
 /* Find the lowest pfn for a node */
 static unsigned long __init find_min_pfn_for_node(int nid)
 {
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 09/14] x86, mm, numa: set memblock nid later
  2013-03-08  4:58 [PATCH 00/14] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (7 preceding siblings ...)
  2013-03-08  4:58 ` [PATCH 08/14] x86, mm, numa: use numa_meminfo to check node_map_pfn alignment Yinghai Lu
@ 2013-03-08  4:58 ` Yinghai Lu
  2013-03-08  6:28   ` Tejun Heo
  2013-03-08  4:58 ` [PATCH 10/14] x86, mm, numa: Move emulation handling down Yinghai Lu
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 55+ messages in thread
From: Yinghai Lu @ 2013-03-08  4:58 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen
  Cc: linux-kernel, Yinghai Lu, Tejun Heo

Set the memblock nid only once, on the successful path in
x86_numa_init().

Also rename numa_register_memblks() to numa_check_memblks(), now that
setting the memblock nid has moved out of it.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
---
 arch/x86/mm/numa.c |   16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index b8cc248..e875c2b 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -537,10 +537,9 @@ static unsigned long __init node_map_pfn_alignment(struct numa_meminfo *mi)
 }
 #endif
 
-static int __init numa_register_memblks(struct numa_meminfo *mi)
+static int __init numa_check_memblks(struct numa_meminfo *mi)
 {
 	unsigned long pfn_align;
-	int i;
 
 	/* Account for nodes with cpus and no memory */
 	node_possible_map = numa_nodes_parsed;
@@ -563,11 +562,6 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
 		return -EINVAL;
 	}
 
-	for (i = 0; i < mi->nr_blks; i++) {
-		struct numa_memblk *mb = &mi->blk[i];
-		memblock_set_node(mb->start, mb->end - mb->start, mb->nid);
-	}
-
 	return 0;
 }
 
@@ -605,7 +599,6 @@ static int __init numa_init(int (*init_func)(void))
 	nodes_clear(node_possible_map);
 	nodes_clear(node_online_map);
 	memset(&numa_meminfo, 0, sizeof(numa_meminfo));
-	WARN_ON(memblock_set_node(0, ULLONG_MAX, MAX_NUMNODES));
 	numa_reset_distance();
 
 	ret = init_func();
@@ -617,7 +610,7 @@ static int __init numa_init(int (*init_func)(void))
 
 	numa_emulation(&numa_meminfo, numa_distance_cnt);
 
-	ret = numa_register_memblks(&numa_meminfo);
+	ret = numa_check_memblks(&numa_meminfo);
 	if (ret < 0)
 		return ret;
 
@@ -676,6 +669,11 @@ void __init x86_numa_init(void)
 	numa_init(dummy_numa_init);
 
 out:
+	for (i = 0; i < mi->nr_blks; i++) {
+		struct numa_memblk *mb = &mi->blk[i];
+		memblock_set_node(mb->start, mb->end - mb->start, mb->nid);
+	}
+
 	/* Finally register nodes. */
 	for_each_node_mask(nid, node_possible_map) {
 		u64 start = PFN_PHYS(max_pfn);
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 10/14] x86, mm, numa: Move emulation handling down.
  2013-03-08  4:58 [PATCH 00/14] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (8 preceding siblings ...)
  2013-03-08  4:58 ` [PATCH 09/14] x86, mm, numa: set memblock nid later Yinghai Lu
@ 2013-03-08  4:58 ` Yinghai Lu
  2013-03-08  6:42   ` Tejun Heo
  2013-03-08  4:58 ` [PATCH 11/14] x86, acpi, numa: split SLIT handling out Yinghai Lu
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 55+ messages in thread
From: Yinghai Lu @ 2013-03-08  4:58 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen
  Cc: linux-kernel, Yinghai Lu, Tejun Heo, David Rientjes

NUMA emulation needs to allocate buffers for the new numa_meminfo and
the distance matrix, so move it down.

This also changes the behavior:
Before this patch, if the user passes wrong data on the command line, we
fall back to the next numa probing method or disable numa.
After this patch, if the user passes wrong data on the command line, we
stay with the numa info from the earlier probing, like acpi srat or amd_numa.
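
For illustration only (not part of this patch): a small user-space
sketch of the "after this patch" behavior described above. The
sketch_ names and the trivial validity check are hypothetical; in the
patch the check is numa_cleanup_meminfo() plus numa_check_memblks() on
the emulated meminfo.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct numa_meminfo. */
struct sketch_meminfo {
	int nr_blks;
};

/* Hypothetical validity check: 0 when usable, negative otherwise. */
static int sketch_check(const struct sketch_meminfo *mi)
{
	return (mi != NULL && mi->nr_blks > 0) ? 0 : -1;
}

/*
 * Pick the meminfo that stays in effect after emulation has been tried:
 * the emulated one if it validates, otherwise the already probed one.
 * An invalid emulation request no longer discards the probed info.
 */
static const struct sketch_meminfo *
sketch_after_emulation(const struct sketch_meminfo *probed,
		       const struct sketch_meminfo *emulated)
{
	assert(probed != NULL);

	if (emulated != NULL && sketch_check(emulated) == 0)
		return emulated;

	return probed;
}
```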

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
---
 arch/x86/mm/numa.c           |   15 +++++++++------
 arch/x86/mm/numa_emulation.c |    2 +-
 arch/x86/mm/numa_internal.h  |    2 ++
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index e875c2b..ace0370 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -537,14 +537,16 @@ static unsigned long __init node_map_pfn_alignment(struct numa_meminfo *mi)
 }
 #endif
 
-static int __init numa_check_memblks(struct numa_meminfo *mi)
+
+int __init numa_check_memblks(struct numa_meminfo *mi)
 {
+	nodemask_t tmp_node_map;
 	unsigned long pfn_align;
 
 	/* Account for nodes with cpus and no memory */
-	node_possible_map = numa_nodes_parsed;
-	numa_nodemask_from_meminfo(&node_possible_map, mi);
-	if (WARN_ON(nodes_empty(node_possible_map)))
+	tmp_node_map = numa_nodes_parsed;
+	numa_nodemask_from_meminfo(&tmp_node_map, mi);
+	if (WARN_ON(nodes_empty(tmp_node_map)))
 		return -EINVAL;
 
 	if (!numa_meminfo_cover_memory(mi))
@@ -562,6 +564,7 @@ static int __init numa_check_memblks(struct numa_meminfo *mi)
 		return -EINVAL;
 	}
 
+	node_possible_map = tmp_node_map;
 	return 0;
 }
 
@@ -608,8 +611,6 @@ static int __init numa_init(int (*init_func)(void))
 	if (ret < 0)
 		return ret;
 
-	numa_emulation(&numa_meminfo, numa_distance_cnt);
-
 	ret = numa_check_memblks(&numa_meminfo);
 	if (ret < 0)
 		return ret;
@@ -669,6 +670,8 @@ void __init x86_numa_init(void)
 	numa_init(dummy_numa_init);
 
 out:
+	numa_emulation(&numa_meminfo, numa_distance_cnt);
+
 	for (i = 0; i < mi->nr_blks; i++) {
 		struct numa_memblk *mb = &mi->blk[i];
 		memblock_set_node(mb->start, mb->end - mb->start, mb->nid);
diff --git a/arch/x86/mm/numa_emulation.c b/arch/x86/mm/numa_emulation.c
index dbbbb47..5a0433d 100644
--- a/arch/x86/mm/numa_emulation.c
+++ b/arch/x86/mm/numa_emulation.c
@@ -348,7 +348,7 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
 	if (ret < 0)
 		goto no_emu;
 
-	if (numa_cleanup_meminfo(&ei) < 0) {
+	if (numa_cleanup_meminfo(&ei) < 0 || numa_check_memblks(&ei) < 0) {
 		pr_warning("NUMA: Warning: constructed meminfo invalid, disabling emulation\n");
 		goto no_emu;
 	}
diff --git a/arch/x86/mm/numa_internal.h b/arch/x86/mm/numa_internal.h
index ad86ec9..bb2fbcc 100644
--- a/arch/x86/mm/numa_internal.h
+++ b/arch/x86/mm/numa_internal.h
@@ -21,6 +21,8 @@ void __init numa_reset_distance(void);
 
 void __init x86_numa_init(void);
 
+int __init numa_check_memblks(struct numa_meminfo *mi);
+
 #ifdef CONFIG_NUMA_EMU
 void __init numa_emulation(struct numa_meminfo *numa_meminfo,
 			   int numa_dist_cnt);
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 11/14] x86, acpi, numa: split SLIT handling out
  2013-03-08  4:58 [PATCH 00/14] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (9 preceding siblings ...)
  2013-03-08  4:58 ` [PATCH 10/14] x86, mm, numa: Move emulation handling down Yinghai Lu
@ 2013-03-08  4:58 ` Yinghai Lu
  2013-03-08  6:46   ` Tejun Heo
  2013-03-08  4:58 ` [PATCH 12/14] x86, mm, numa: Add early_initmem_init() stub Yinghai Lu
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 55+ messages in thread
From: Yinghai Lu @ 2013-03-08  4:58 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen
  Cc: linux-kernel, Yinghai Lu, Tejun Heo, Rafael J. Wysocki, linux-acpi

We need to handle SLIT later, as it needs to allocate a buffer.

Also, we only need the SRAT info before init_mem_mapping.

x86_acpi_numa_init() is split into x86_acpi_numa_init_no_slit() and
x86_acpi_numa_init_only_slit().

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: linux-acpi@vger.kernel.org
---
 arch/x86/include/asm/acpi.h |    3 ++-
 arch/x86/mm/numa.c          |   13 ++++++++++++-
 arch/x86/mm/srat.c          |    8 ++++++--
 drivers/acpi/numa.c         |   22 +++++++++++++++++++---
 include/linux/acpi.h        |    2 ++
 5 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
index b31bf97..9f171a7 100644
--- a/arch/x86/include/asm/acpi.h
+++ b/arch/x86/include/asm/acpi.h
@@ -178,7 +178,8 @@ static inline void disable_acpi(void) { }
 
 #ifdef CONFIG_ACPI_NUMA
 extern int acpi_numa;
-extern int x86_acpi_numa_init(void);
+int x86_acpi_numa_init_no_slit(void);
+void x86_acpi_numa_init_only_slit(void);
 #endif /* CONFIG_ACPI_NUMA */
 
 #define acpi_unlazy_tlb(x)	leave_mm(x)
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index ace0370..23ec6ba 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -640,6 +640,10 @@ static int __init dummy_numa_init(void)
 	return 0;
 }
 
+#ifdef CONFIG_ACPI_NUMA
+static bool srat_used __initdata;
+#endif
+
 /**
  * x86_numa_init - Initialize NUMA
  *
@@ -658,8 +662,10 @@ void __init x86_numa_init(void)
 			goto out;
 #endif
 #ifdef CONFIG_ACPI_NUMA
-		if (!numa_init(x86_acpi_numa_init))
+		if (!numa_init(x86_acpi_numa_init_no_slit)) {
+			srat_used = true;
 			goto out;
+		}
 #endif
 #ifdef CONFIG_AMD_NUMA
 		if (!numa_init(amd_numa_init))
@@ -670,6 +676,11 @@ void __init x86_numa_init(void)
 	numa_init(dummy_numa_init);
 
 out:
+#ifdef CONFIG_ACPI_NUMA
+	if (srat_used)
+		x86_acpi_numa_init_only_slit();
+#endif
+
 	numa_emulation(&numa_meminfo, numa_distance_cnt);
 
 	for (i = 0; i < mi->nr_blks; i++) {
diff --git a/arch/x86/mm/srat.c b/arch/x86/mm/srat.c
index cdd0da9..47a62b2 100644
--- a/arch/x86/mm/srat.c
+++ b/arch/x86/mm/srat.c
@@ -187,11 +187,15 @@ out_err:
 
 void __init acpi_numa_arch_fixup(void) {}
 
-int __init x86_acpi_numa_init(void)
+void __init x86_acpi_numa_init_only_slit(void)
+{
+	acpi_numa_init_only_slit();
+}
+int __init x86_acpi_numa_init_no_slit(void)
 {
 	int ret;
 
-	ret = acpi_numa_init();
+	ret = acpi_numa_init_no_slit();
 	if (ret < 0)
 		return ret;
 	return srat_disabled() ? -EINVAL : 0;
diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
index 33e609f..2215718 100644
--- a/drivers/acpi/numa.c
+++ b/drivers/acpi/numa.c
@@ -282,7 +282,13 @@ acpi_table_parse_srat(enum acpi_srat_type id,
 					    handler, max_entries);
 }
 
-int __init acpi_numa_init(void)
+void __init acpi_numa_init_only_slit(void)
+{
+	/* SLIT: System Locality Information Table */
+	acpi_table_parse(ACPI_SIG_SLIT, acpi_parse_slit);
+}
+
+static int __init __acpi_numa_init(bool with_slit)
 {
 	int cnt = 0;
 
@@ -303,8 +309,8 @@ int __init acpi_numa_init(void)
 					    NR_NODE_MEMBLKS);
 	}
 
-	/* SLIT: System Locality Information Table */
-	acpi_table_parse(ACPI_SIG_SLIT, acpi_parse_slit);
+	if (with_slit)
+		acpi_numa_init_only_slit();
 
 	acpi_numa_arch_fixup();
 
@@ -315,6 +321,16 @@ int __init acpi_numa_init(void)
 	return 0;
 }
 
+int __init acpi_numa_init(void)
+{
+	return __acpi_numa_init(true);
+}
+
+int __init acpi_numa_init_no_slit(void)
+{
+	return __acpi_numa_init(false);
+}
+
 int acpi_get_pxm(acpi_handle h)
 {
 	unsigned long long pxm;
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 46a8a89..bfd2852 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -86,6 +86,8 @@ int acpi_boot_init (void);
 void acpi_boot_table_init (void);
 int acpi_mps_check (void);
 int acpi_numa_init (void);
+int acpi_numa_init_no_slit(void);
+void acpi_numa_init_only_slit(void);
 
 int acpi_table_init (void);
 int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 12/14] x86, mm, numa: Add early_initmem_init() stub
  2013-03-08  4:58 [PATCH 00/14] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (10 preceding siblings ...)
  2013-03-08  4:58 ` [PATCH 11/14] x86, acpi, numa: split SLIT handling out Yinghai Lu
@ 2013-03-08  4:58 ` Yinghai Lu
  2013-03-08  4:58 ` [PATCH 13/14] x86, mm: Parse numa info early Yinghai Lu
  2013-03-08  4:58 ` [PATCH 14/14] x86, mm: Put pagetable on local node ram Yinghai Lu
  13 siblings, 0 replies; 55+ messages in thread
From: Yinghai Lu @ 2013-03-08  4:58 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen
  Cc: linux-kernel, Yinghai Lu, Tejun Heo, Pekka Enberg, Jacob Shin

early_initmem_init() calls early_x86_numa_init().

A later patch will call init_mem_mapping for every node from it.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Jacob Shin <jacob.shin@amd.com>
---
 arch/x86/include/asm/page_types.h |    1 +
 arch/x86/kernel/setup.c           |    1 +
 arch/x86/mm/init_32.c             |    3 +++
 arch/x86/mm/init_64.c             |    3 +++
 arch/x86/mm/numa.c                |   23 +++++++++++++++--------
 5 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index b012b82..d04dd8c 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -55,6 +55,7 @@ bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn);
 extern unsigned long init_memory_mapping(unsigned long start,
 					 unsigned long end);
 
+void early_initmem_init(void);
 extern void initmem_init(void);
 
 #endif	/* !__ASSEMBLY__ */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d43545a..c4f1c63 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1133,6 +1133,7 @@ void __init setup_arch(char **cmdline_p)
 
 	early_acpi_boot_init();
 
+	early_initmem_init();
 	initmem_init();
 	memblock_find_dma_reserve();
 
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 2d19001..3801962 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -660,6 +660,9 @@ void __init find_low_pfn_range(void)
 }
 
 #ifndef CONFIG_NEED_MULTIPLE_NODES
+void __init early_initmem_init(void)
+{
+}
 void __init initmem_init(void)
 {
 #ifdef CONFIG_HIGHMEM
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 474e28f..218a4e5 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -640,6 +640,9 @@ kernel_physical_mapping_init(unsigned long start,
 }
 
 #ifndef CONFIG_NUMA
+void __init early_initmem_init(void)
+{
+}
 void __init initmem_init(void)
 {
 	memblock_set_node(0, (phys_addr_t)ULLONG_MAX, 0);
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 23ec6ba..643b39a 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -651,31 +651,38 @@ static bool srat_used __initdata;
  * last fallback is dummy single node config encomapssing whole memory and
  * never fails.
  */
-void __init x86_numa_init(void)
+static void __init early_x86_numa_init(void)
 {
-	int i, nid;
-	struct numa_meminfo *mi = &numa_meminfo;
-
 	if (!numa_off) {
 #ifdef CONFIG_X86_NUMAQ
 		if (!numa_init(numaq_numa_init))
-			goto out;
+			return;
 #endif
 #ifdef CONFIG_ACPI_NUMA
 		if (!numa_init(x86_acpi_numa_init_no_slit)) {
 			srat_used = true;
-			goto out;
+			return;
 		}
 #endif
 #ifdef CONFIG_AMD_NUMA
 		if (!numa_init(amd_numa_init))
-			goto out;
+			return;
 #endif
 	}
 
 	numa_init(dummy_numa_init);
+}
+
+void __init early_initmem_init(void)
+{
+	early_x86_numa_init();
+}
+
+void __init x86_numa_init(void)
+{
+	int i, nid;
+	struct numa_meminfo *mi = &numa_meminfo;
 
-out:
 #ifdef CONFIG_ACPI_NUMA
 	if (srat_used)
 		x86_acpi_numa_init_only_slit();
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 13/14] x86, mm: Parse numa info early
  2013-03-08  4:58 [PATCH 00/14] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (11 preceding siblings ...)
  2013-03-08  4:58 ` [PATCH 12/14] x86, mm, numa: Add early_initmem_init() stub Yinghai Lu
@ 2013-03-08  4:58 ` Yinghai Lu
  2013-03-08  4:58 ` [PATCH 14/14] x86, mm: Put pagetable on local node ram Yinghai Lu
  13 siblings, 0 replies; 55+ messages in thread
From: Yinghai Lu @ 2013-03-08  4:58 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen
  Cc: linux-kernel, Yinghai Lu, Pekka Enberg, Jacob Shin

Parse numa info first and store it in numa_meminfo.

Call early_initmem_init() before init_mem_mapping(), so that numa info
is ready early, while still keeping the numaq, acpi_numa, amd_numa,
dummy fallback sequence.

SLIT and numa emulation handling are still left in initmem_init().

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Jacob Shin <jacob.shin@amd.com>
---
 arch/x86/kernel/setup.c |   24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index c4f1c63..29a6b94 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1096,13 +1096,21 @@ void __init setup_arch(char **cmdline_p)
 	trim_platform_memory_ranges();
 	trim_low_memory_range();
 
+	/*
+	 * Parse the ACPI tables for possible boot-time SMP configuration.
+	 */
+	acpi_initrd_override_copy();
+	acpi_boot_table_init();
+	early_acpi_boot_init();
+	early_initmem_init();
 	init_mem_mapping();
-
+	memblock.current_limit = get_max_mapped();
 	early_trap_pf_init();
 
+	reserve_initrd();
+
 	setup_real_mode();
 
-	memblock.current_limit = get_max_mapped();
 	dma_contiguous_reserve(0);
 
 	/*
@@ -1116,24 +1124,12 @@ void __init setup_arch(char **cmdline_p)
 	/* Allocate bigger log buffer */
 	setup_log_buf(1);
 
-	reserve_initrd();
-
-	acpi_initrd_override_copy();
-
 	reserve_crashkernel();
 
 	vsmp_init();
 
 	io_delay_init();
 
-	/*
-	 * Parse the ACPI tables for possible boot-time SMP configuration.
-	 */
-	acpi_boot_table_init();
-
-	early_acpi_boot_init();
-
-	early_initmem_init();
 	initmem_init();
 	memblock_find_dma_reserve();
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 14/14] x86, mm: Put pagetable on local node ram
  2013-03-08  4:58 [PATCH 00/14] x86, ACPI, numa: Parse numa info early Yinghai Lu
                   ` (12 preceding siblings ...)
  2013-03-08  4:58 ` [PATCH 13/14] x86, mm: Parse numa info early Yinghai Lu
@ 2013-03-08  4:58 ` Yinghai Lu
  2013-03-08  7:01   ` Tejun Heo
  2013-03-08  8:20   ` Tang Chen
  13 siblings, 2 replies; 55+ messages in thread
From: Yinghai Lu @ 2013-03-08  4:58 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen
  Cc: linux-kernel, Yinghai Lu, Tejun Heo, Pekka Enberg, Jacob Shin,
	Konrad Rzeszutek Wilk

If a node with ram is hotpluggable, the page table and vmemmap for that
node's memory should be placed on that node's ram.

This patch is a refresh of
| commit 1411e0ec3123ae4c4ead6bfc9fe3ee5a3ae5c327
| Date:   Mon Dec 27 16:48:17 2010 -0800
|
|    x86-64, numa: Put pgtable to local node memory
That commit was reverted before.

We now have a reason to reintroduce it: to make memory hotplug work.

Move the calls to init_mem_mapping() into early_initmem_init(), so that
each node is mapped after the numa info is available there.

The first node will be the low range.
alloc_low_pages() needs to be reworked to allocate page tables in the
following order:
	BRK, local node, low range
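
For illustration only (not part of this patch): a user-space sketch of
that allocation order. The field names mirror the variables used in
arch/x86/mm/init.c in this patch, but the struct, the enum and the
sketch_ helper are hypothetical, and the real code panics where the
sketch returns SKETCH_PGT_NONE.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_pgt_source {
	SKETCH_PGT_BRK,		/* early BRK page table buffer */
	SKETCH_PGT_LOCAL,	/* mapped range of the node being set up */
	SKETCH_PGT_LOW,		/* mapped low range */
	SKETCH_PGT_NONE,	/* nothing left; the kernel would panic */
};

/* Hypothetical snapshot of the allocator state. */
struct sketch_pgt_state {
	unsigned long pgt_buf_end;
	unsigned long pgt_buf_top;
	int can_use_brk_pgt;
	unsigned long local_min_pfn_mapped;
	unsigned long local_max_pfn_mapped;
	unsigned long low_min_pfn_mapped;
	unsigned long low_max_pfn_mapped;
};

/* Decide where num page-table pages would come from. */
static enum sketch_pgt_source
sketch_pick_source(const struct sketch_pgt_state *s, unsigned int num)
{
	assert(s != NULL);

	/* 1. BRK buffer, while it is allowed and has room. */
	if (s->can_use_brk_pgt && s->pgt_buf_end + num <= s->pgt_buf_top)
		return SKETCH_PGT_BRK;

	/* 2. Already mapped memory on the local node. */
	if (s->local_min_pfn_mapped < s->local_max_pfn_mapped)
		return SKETCH_PGT_LOCAL;

	/* 3. Already mapped low range. */
	if (s->low_min_pfn_mapped < s->low_max_pfn_mapped)
		return SKETCH_PGT_LOW;

	return SKETCH_PGT_NONE;
}
```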

Still call load_cr3 only once, otherwise we would break xen 64bit again.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Jacob Shin <jacob.shin@amd.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/include/asm/pgtable.h |    2 +-
 arch/x86/kernel/setup.c        |    1 -
 arch/x86/mm/init.c             |   83 ++++++++++++++++++++++------------------
 arch/x86/mm/init_32.c          |    8 ++++
 arch/x86/mm/init_64.c          |    9 +++++
 arch/x86/mm/numa.c             |   56 +++++++++++++++++++++++++++
 6 files changed, 119 insertions(+), 40 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 1e67223..868687c 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -621,7 +621,7 @@ static inline int pgd_none(pgd_t pgd)
 #ifndef __ASSEMBLY__
 
 extern int direct_gbpages;
-void init_mem_mapping(void);
+void init_mem_mapping(unsigned long begin, unsigned long end);
 void early_alloc_pgt_buf(void);
 
 /* local pte updates need not use xchg for locking */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 29a6b94..37d993f 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1103,7 +1103,6 @@ void __init setup_arch(char **cmdline_p)
 	acpi_boot_table_init();
 	early_acpi_boot_init();
 	early_initmem_init();
-	init_mem_mapping();
 	memblock.current_limit = get_max_mapped();
 	early_trap_pf_init();
 
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index abcc241..2838bb5 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -24,7 +24,10 @@ static unsigned long __initdata pgt_buf_start;
 static unsigned long __initdata pgt_buf_end;
 static unsigned long __initdata pgt_buf_top;
 
-static unsigned long min_pfn_mapped;
+static unsigned long low_min_pfn_mapped;
+static unsigned long low_max_pfn_mapped;
+static unsigned long local_min_pfn_mapped;
+static unsigned long local_max_pfn_mapped;
 
 static bool __initdata can_use_brk_pgt = true;
 
@@ -52,10 +55,17 @@ __ref void *alloc_low_pages(unsigned int num)
 
 	if ((pgt_buf_end + num) > pgt_buf_top || !can_use_brk_pgt) {
 		unsigned long ret;
-		if (min_pfn_mapped >= max_pfn_mapped)
-			panic("alloc_low_page: ran out of memory");
-		ret = memblock_find_in_range(min_pfn_mapped << PAGE_SHIFT,
-					max_pfn_mapped << PAGE_SHIFT,
+		if (local_min_pfn_mapped >= local_max_pfn_mapped) {
+			if (low_min_pfn_mapped >= low_max_pfn_mapped)
+				panic("alloc_low_page: ran out of memory");
+			ret = memblock_find_in_range(
+					low_min_pfn_mapped << PAGE_SHIFT,
+					low_max_pfn_mapped << PAGE_SHIFT,
+					PAGE_SIZE * num , PAGE_SIZE);
+		} else
+			ret = memblock_find_in_range(
+					local_min_pfn_mapped << PAGE_SHIFT,
+					local_max_pfn_mapped << PAGE_SHIFT,
 					PAGE_SIZE * num , PAGE_SIZE);
 		if (!ret)
 			panic("alloc_low_page: can not alloc memory");
@@ -387,67 +397,64 @@ static unsigned long __init init_range_memory_mapping(
 
 /* (PUD_SHIFT-PMD_SHIFT)/2 */
 #define STEP_SIZE_SHIFT 5
-void __init init_mem_mapping(void)
+void __init init_mem_mapping(unsigned long begin, unsigned long end)
 {
-	unsigned long end, real_end, start, last_start;
+	unsigned long real_end, start, last_start;
 	unsigned long step_size;
 	unsigned long addr;
 	unsigned long mapped_ram_size = 0;
 	unsigned long new_mapped_ram_size;
+	bool is_low = false;
+
+	if (!begin) {
+		probe_page_size_mask();
+		/* the ISA range is always mapped regardless of memory holes */
+		init_memory_mapping(0, ISA_END_ADDRESS);
+		begin = ISA_END_ADDRESS;
+		is_low = true;
+	}
 
-	probe_page_size_mask();
-
-#ifdef CONFIG_X86_64
-	end = max_pfn << PAGE_SHIFT;
-#else
-	end = max_low_pfn << PAGE_SHIFT;
-#endif
-
-	/* the ISA range is always mapped regardless of memory holes */
-	init_memory_mapping(0, ISA_END_ADDRESS);
+	if (begin >= end)
+		return;
 
 	/* xen has big range in reserved near end of ram, skip it at first.*/
-	addr = memblock_find_in_range(ISA_END_ADDRESS, end, PMD_SIZE, PMD_SIZE);
+	addr = memblock_find_in_range(begin, end, PMD_SIZE, PMD_SIZE);
 	real_end = addr + PMD_SIZE;
 
 	/* step_size need to be small so pgt_buf from BRK could cover it */
 	step_size = PMD_SIZE;
-	max_pfn_mapped = 0; /* will get exact value next */
-	min_pfn_mapped = real_end >> PAGE_SHIFT;
+	local_max_pfn_mapped = begin >> PAGE_SHIFT;
+	local_min_pfn_mapped = real_end >> PAGE_SHIFT;
 	last_start = start = real_end;
-	while (last_start > ISA_END_ADDRESS) {
+	while (last_start > begin) {
 		if (last_start > step_size) {
 			start = round_down(last_start - 1, step_size);
-			if (start < ISA_END_ADDRESS)
-				start = ISA_END_ADDRESS;
+			if (start < begin)
+				start = begin;
 		} else
-			start = ISA_END_ADDRESS;
+			start = begin;
 		new_mapped_ram_size = init_range_memory_mapping(start,
 							last_start);
+		if ((last_start >> PAGE_SHIFT) > local_max_pfn_mapped)
+			local_max_pfn_mapped = last_start >> PAGE_SHIFT;
+		local_min_pfn_mapped = start >> PAGE_SHIFT;
 		last_start = start;
-		min_pfn_mapped = last_start >> PAGE_SHIFT;
 		/* only increase step_size after big range get mapped */
 		if (new_mapped_ram_size > mapped_ram_size)
 			step_size <<= STEP_SIZE_SHIFT;
 		mapped_ram_size += new_mapped_ram_size;
 	}
 
-	if (real_end < end)
+	if (real_end < end) {
 		init_range_memory_mapping(real_end, end);
-
-#ifdef CONFIG_X86_64
-	if (max_pfn > max_low_pfn) {
-		/* can we preseve max_low_pfn ?*/
-		max_low_pfn = max_pfn;
+		if ((end >> PAGE_SHIFT) > local_max_pfn_mapped)
+			local_max_pfn_mapped = end >> PAGE_SHIFT;
 	}
-#else
-	early_ioremap_page_table_range_init();
-#endif
 
-	load_cr3(swapper_pg_dir);
-	__flush_tlb_all();
-
-	early_memtest(0, max_pfn_mapped << PAGE_SHIFT);
+	if (is_low) {
+		low_min_pfn_mapped = local_min_pfn_mapped;
+		low_max_pfn_mapped = local_max_pfn_mapped;
+	}
 }
 
 /*
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 3801962..37e5768 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -662,6 +662,14 @@ void __init find_low_pfn_range(void)
 #ifndef CONFIG_NEED_MULTIPLE_NODES
 void __init early_initmem_init(void)
 {
+	init_mem_mapping(0, max_low_pfn<<PAGE_SHIFT);
+
+	early_ioremap_page_table_range_init();
+
+	load_cr3(swapper_pg_dir);
+	__flush_tlb_all();
+
+	early_memtest(0, max_pfn_mapped<<PAGE_SHIFT);
 }
 void __init initmem_init(void)
 {
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 218a4e5..a15db8a 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -642,6 +642,15 @@ kernel_physical_mapping_init(unsigned long start,
 #ifndef CONFIG_NUMA
 void __init early_initmem_init(void)
 {
+	init_mem_mapping(0, max_pfn<<PAGE_SHIFT);
+
+	if (max_pfn > max_low_pfn)
+		max_low_pfn = max_pfn;
+
+	load_cr3(swapper_pg_dir);
+	__flush_tlb_all();
+
+	early_memtest(0, max_pfn_mapped<<PAGE_SHIFT);
 }
 void __init initmem_init(void)
 {
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 643b39a..0aeb980 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -17,8 +17,10 @@
 #include <asm/dma.h>
 #include <asm/acpi.h>
 #include <asm/amd_nb.h>
+#include <asm/tlbflush.h>
 
 #include "numa_internal.h"
+#include "mm_internal.h"
 
 int __initdata numa_off;
 nodemask_t numa_nodes_parsed __initdata;
@@ -673,9 +675,63 @@ static void __init early_x86_numa_init(void)
 	numa_init(dummy_numa_init);
 }
 
+#ifdef CONFIG_X86_64
+static void __init early_x86_numa_init_mapping(void)
+{
+	unsigned long last_start = 0, last_end = 0;
+	struct numa_meminfo *mi = &numa_meminfo;
+	unsigned long start, end;
+	int last_nid = -1;
+	int i, nid;
+
+	for (i = 0; i < mi->nr_blks; i++) {
+		nid   = mi->blk[i].nid;
+		start = mi->blk[i].start;
+		end   = mi->blk[i].end;
+
+		if (last_nid == nid) {
+			last_end = end;
+			continue;
+		}
+
+		/* other nid now */
+		if (last_nid >= 0) {
+			printk(KERN_DEBUG "Node %d: [mem %#016lx-%#016lx]\n",
+					last_nid, last_start, last_end - 1);
+			init_mem_mapping(last_start, last_end);
+		}
+
+		/* for next nid */
+		last_nid   = nid;
+		last_start = start;
+		last_end   = end;
+	}
+	/* last one */
+	printk(KERN_DEBUG "Node %d: [mem %#016lx-%#016lx]\n",
+			last_nid, last_start, last_end - 1);
+	init_mem_mapping(last_start, last_end);
+
+	if (max_pfn > max_low_pfn)
+		max_low_pfn = max_pfn;
+}
+#else
+static void __init early_x86_numa_init_mapping(void)
+{
+	init_mem_mapping(0, max_low_pfn<<PAGE_SHIFT);
+	early_ioremap_page_table_range_init();
+}
+#endif
+
 void __init early_initmem_init(void)
 {
 	early_x86_numa_init();
+
+	early_x86_numa_init_mapping();
+
+	load_cr3(swapper_pg_dir);
+	__flush_tlb_all();
+
+	early_memtest(0, max_pfn_mapped<<PAGE_SHIFT);
 }
 
 void __init x86_numa_init(void)
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH 01/14] x86, ACPI, mm: Kill max_low_pfn_mapped
  2013-03-08  4:58 ` [PATCH 01/14] x86, ACPI, mm: Kill max_low_pfn_mapped Yinghai Lu
@ 2013-03-08  5:10   ` Tejun Heo
  2013-03-08  5:22     ` Yinghai Lu
  0 siblings, 1 reply; 55+ messages in thread
From: Tejun Heo @ 2013-03-08  5:10 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel, Rafael J. Wysocki,
	Daniel Vetter, David Airlie, Jacob Shin, linux-acpi, dri-devel

On Thu, Mar 07, 2013 at 08:58:27PM -0800, Yinghai Lu wrote:
> diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
> index 69d97cb..7f9380b 100644
> --- a/drivers/gpu/drm/i915/i915_gem_stolen.c
> +++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
> @@ -81,7 +81,7 @@ static unsigned long i915_stolen_to_physical(struct drm_device *dev)
>  		base -= dev_priv->mm.gtt->stolen_size;
>  	} else {
>  		/* Stolen is immediately above Top of Memory */
> -		base = max_low_pfn_mapped << PAGE_SHIFT;
> +		base = __REMOVED_CRAZY__ << PAGE_SHIFT;

Huh?

-- 
tejun

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 01/14] x86, ACPI, mm: Kill max_low_pfn_mapped
  2013-03-08  5:10   ` Tejun Heo
@ 2013-03-08  5:22     ` Yinghai Lu
  2013-03-08  5:25       ` Tejun Heo
  0 siblings, 1 reply; 55+ messages in thread
From: Yinghai Lu @ 2013-03-08  5:22 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel, Rafael J. Wysocki,
	Daniel Vetter, David Airlie, Jacob Shin, linux-acpi, dri-devel

On Thu, Mar 7, 2013 at 9:10 PM, Tejun Heo <tj@kernel.org> wrote:
> On Thu, Mar 07, 2013 at 08:58:27PM -0800, Yinghai Lu wrote:
>> diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
>> index 69d97cb..7f9380b 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_stolen.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
>> @@ -81,7 +81,7 @@ static unsigned long i915_stolen_to_physical(struct drm_device *dev)
>>               base -= dev_priv->mm.gtt->stolen_size;
>>       } else {
>>               /* Stolen is immediately above Top of Memory */
>> -             base = max_low_pfn_mapped << PAGE_SHIFT;
>> +             base = __REMOVED_CRAZY__ << PAGE_SHIFT;
>
> Huh?

Whole function:

static unsigned long i915_stolen_to_physical(struct drm_device *dev)
{
	struct drm_i915_private *dev_priv = dev->dev_private;
	struct pci_dev *pdev = dev_priv->bridge_dev;
	u32 base;

	/* On the machines I have tested the Graphics Base of Stolen Memory
	 * is unreliable, so on those compute the base by subtracting the
	 * stolen memory from the Top of Low Usable DRAM which is where the
	 * BIOS places the graphics stolen memory.
	 *
	 * On gen2, the layout is slightly different with the Graphics Segment
	 * immediately following Top of Memory (or Top of Usable DRAM). Note
	 * it appears that TOUD is only reported by 865g, so we just use the
	 * top of memory as determined by the e820 probe.
	 *
	 * XXX gen2 requires an unavailable symbol and 945gm fails with
	 * its value of TOLUD.
	 */
	base = 0;
	if (INTEL_INFO(dev)->gen >= 6) {
		/* Read Base Data of Stolen Memory Register (BDSM) directly.
		 * Note that there is also a MCHBAR mirror at 0x1080c0 or
		 * we could use device 2:0x5c instead.
		*/
		pci_read_config_dword(pdev, 0xB0, &base);
		base &= ~4095; /* lower bits used for locking register */
	} else if (INTEL_INFO(dev)->gen > 3 || IS_G33(dev)) {
		/* Read Graphics Base of Stolen Memory directly */
		pci_read_config_dword(pdev, 0xA4, &base);
#if 0
	} else if (IS_GEN3(dev)) {
		u8 val;
		/* Stolen is immediately below Top of Low Usable DRAM */
		pci_read_config_byte(pdev, 0x9c, &val);
		base = val >> 3 << 27;
		base -= dev_priv->mm.gtt->stolen_size;
	} else {
		/* Stolen is immediately above Top of Memory */
		base = __REMOVED_CRAZY__ << PAGE_SHIFT;
#endif
	}

	return base;
}

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 01/14] x86, ACPI, mm: Kill max_low_pfn_mapped
  2013-03-08  5:22     ` Yinghai Lu
@ 2013-03-08  5:25       ` Tejun Heo
  2013-03-08  5:27         ` Yinghai Lu
  0 siblings, 1 reply; 55+ messages in thread
From: Tejun Heo @ 2013-03-08  5:25 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel, Rafael J. Wysocki,
	Daniel Vetter, David Airlie, Jacob Shin, linux-acpi, dri-devel

On Thu, Mar 7, 2013 at 9:22 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Thu, Mar 7, 2013 at 9:10 PM, Tejun Heo <tj@kernel.org> wrote:
>> On Thu, Mar 07, 2013 at 08:58:27PM -0800, Yinghai Lu wrote:
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
>>> index 69d97cb..7f9380b 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_stolen.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
>>> @@ -81,7 +81,7 @@ static unsigned long i915_stolen_to_physical(struct drm_device *dev)
>>>               base -= dev_priv->mm.gtt->stolen_size;
>>>       } else {
>>>               /* Stolen is immediately above Top of Memory */
>>> -             base = max_low_pfn_mapped << PAGE_SHIFT;
>>> +             base = __REMOVED_CRAZY__ << PAGE_SHIFT;
>>
>> Huh?
>
> Whole function:

Yeah, but can't we still just do 1LLU << 32 like other places? Or at
least explain what was there before? It's gonna confuse the hell out
of future readers of the code.

-- 
tejun

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 01/14] x86, ACPI, mm: Kill max_low_pfn_mapped
  2013-03-08  5:25       ` Tejun Heo
@ 2013-03-08  5:27         ` Yinghai Lu
  2013-03-08  5:28           ` Tejun Heo
  0 siblings, 1 reply; 55+ messages in thread
From: Yinghai Lu @ 2013-03-08  5:27 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel, Rafael J. Wysocki,
	Daniel Vetter, David Airlie, Jacob Shin, linux-acpi, dri-devel

On Thu, Mar 7, 2013 at 9:25 PM, Tejun Heo <tj@kernel.org> wrote:
> On Thu, Mar 7, 2013 at 9:22 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Thu, Mar 7, 2013 at 9:10 PM, Tejun Heo <tj@kernel.org> wrote:
>>> On Thu, Mar 07, 2013 at 08:58:27PM -0800, Yinghai Lu wrote:
>>>> diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
>>>> index 69d97cb..7f9380b 100644
>>>> --- a/drivers/gpu/drm/i915/i915_gem_stolen.c
>>>> +++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
>>>> @@ -81,7 +81,7 @@ static unsigned long i915_stolen_to_physical(struct drm_device *dev)
>>>>               base -= dev_priv->mm.gtt->stolen_size;
>>>>       } else {
>>>>               /* Stolen is immediately above Top of Memory */
>>>> -             base = max_low_pfn_mapped << PAGE_SHIFT;
>>>> +             base = __REMOVED_CRAZY__ << PAGE_SHIFT;
>>>
>>> Huh?
>>
>> Whole function:
>
> Yeah, but can't we still just do 1LLU << 32 like other places? Or at
> least explain what was there before? It's gonna confuse the hell out
> of future readers of the code.

They are not using memblock_find_in_range(), so 1ULL << 32 will not help.

I really hope the i915 drm guys can clean up those hacks.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 01/14] x86, ACPI, mm: Kill max_low_pfn_mapped
  2013-03-08  5:27         ` Yinghai Lu
@ 2013-03-08  5:28           ` Tejun Heo
  2013-03-08  6:09             ` H. Peter Anvin
  0 siblings, 1 reply; 55+ messages in thread
From: Tejun Heo @ 2013-03-08  5:28 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel, Rafael J. Wysocki,
	Daniel Vetter, David Airlie, Jacob Shin, linux-acpi, dri-devel

On Thu, Mar 7, 2013 at 9:27 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> They are not using memblock_find_in_range(), so 1ULL << 32 will not help.
>
> I really hope the i915 drm guys can clean up those hacks.

The code isn't being used.  Just leave it alone.  Maybe add a comment.
 The change is just making things more confusing.

-- 
tejun

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/14] x86, ACPI: Split find/copy from acpi_initrd_override
  2013-03-08  4:58 ` [PATCH 02/14] x86, ACPI: Split find/copy from acpi_initrd_override Yinghai Lu
@ 2013-03-08  5:33   ` Tejun Heo
  2013-03-08  6:47     ` Yinghai Lu
  0 siblings, 1 reply; 55+ messages in thread
From: Tejun Heo @ 2013-03-08  5:33 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel, Pekka Enberg,
	Jacob Shin, Rafael J. Wysocki, linux-acpi

On Thu, Mar 07, 2013 at 08:58:28PM -0800, Yinghai Lu wrote:
> To parse srat early, we will need to move acpi table probing early.
> and to keep acpi_initrd_table_override working, we need to move it
> ahead.
> 
> But current that is called after init_mem_mapping and relocate_initrd().
> 
> Copying need to be after memblock is ready, because it need to allocate
> some buffer for acpi tables.
> 
> Finding will be moved into head_32.S and head64.c, just like microcode
> early scanning.
> 
> So split them at first.
> 
> Also move down functions declaration to avoid #ifdef in setup.c
> 
> Signed-off-by: Yinghai <yinghai@kernel.org>
> Cc: Thomas Renninger <trenn@suse.de>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: Jacob Shin <jacob.shin@amd.com>
> Cc: Rafael J. Wysocki <rjw@sisk.pl>
> Cc: linux-acpi@vger.kernel.org
...
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index c9e36d7..b9d2ff0 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -539,6 +539,7 @@ acpi_os_predefined_override(const struct acpi_predefined_names *init_val,
>  
>  static u64 acpi_tables_addr;
>  static int all_tables_size;
> +static int table_nr;

Not a particularly good choice of name for a static variable visible to
multiple functions.  all_tables_size isn't a stellar choice either but
there's no need to continue the tradition.  Maybe acpi_nr_initrd_files?
Also, why is this one defined here, away from the actual table?

> -/* Must not increase 10 or needs code modification below */
> -#define ACPI_OVERRIDE_TABLES 10
> +#define ACPI_OVERRIDE_TABLES 64

What's up with the silent bumping of table size?

> +static struct cpio_data __initdata early_initrd_files[ACPI_OVERRIDE_TABLES];

acpi_initrd_files[]?  Do we really need the "early" designation
together with initrd?

> @@ -647,14 +653,14 @@ void __init acpi_initrd_override(void *data, size_t size)
>  	memblock_reserve(acpi_tables_addr, acpi_tables_addr + all_tables_size);
>  	arch_reserve_mem_area(acpi_tables_addr, all_tables_size);
>  
> -	p = early_ioremap(acpi_tables_addr, all_tables_size);
> -
>  	for (no = 0; no < table_nr; no++) {
> -		memcpy(p + total_offset, early_initrd_files[no].data,
> -		       early_initrd_files[no].size);
> -		total_offset += early_initrd_files[no].size;
> +		size_t size = early_initrd_files[no].size;
> +
> +		p = early_ioremap(acpi_tables_addr + total_offset, size);
> +		memcpy(p, early_initrd_files[no].data, size);
> +		early_iounmap(p, size);
> +		total_offset += size;
>  	}
> -	early_iounmap(p, all_tables_size);

Why is this necessary?  Why no explanation in the description?

> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -79,14 +79,6 @@ typedef int (*acpi_tbl_table_handler)(struct acpi_table_header *table);
>  typedef int (*acpi_tbl_entry_handler)(struct acpi_subtable_header *header,
>  				      const unsigned long end);
>  
> -#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
> -void acpi_initrd_override(void *data, size_t size);
> -#else
> -static inline void acpi_initrd_override(void *data, size_t size)
> -{
> -}
> -#endif
> -
>  char * __acpi_map_table (unsigned long phys_addr, unsigned long size);
>  void __acpi_unmap_table(char *map, unsigned long size);
>  int early_acpi_boot_init(void);
> @@ -485,6 +477,14 @@ static inline bool acpi_driver_match_device(struct device *dev,
>  
>  #endif	/* !CONFIG_ACPI */
>  
> +#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
> +void acpi_initrd_override_find(void *data, size_t size);
> +void acpi_initrd_override_copy(void);
> +#else
> +static inline void acpi_initrd_override_find(void *data, size_t size) { }
> +static inline void acpi_initrd_override_copy(void) { }
> +#endif

I don't get this part either.  Why is it necessary to move the
prototypes to avoid #ifdefs in setup.c?  Ah, okay, you're bringing them
outside CONFIG_ACPI so that they're defined regardless of that config
option.  Can you please add why you're moving the prototypes to the
description?  Having "what" is nice but "why" is much nicer. :)

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 03/14] x86, ACPI: store override acpi tables phys addr
  2013-03-08  4:58 ` [PATCH 03/14] x86, ACPI: store override acpi tables phys addr Yinghai Lu
@ 2013-03-08  5:36   ` Tejun Heo
  2013-03-08  6:49     ` Yinghai Lu
  0 siblings, 1 reply; 55+ messages in thread
From: Tejun Heo @ 2013-03-08  5:36 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel, Rafael J. Wysocki,
	linux-acpi

On Thu, Mar 07, 2013 at 08:58:29PM -0800, Yinghai Lu wrote:
> As later 32bit only find table with phys address during 32bit flat mode
> in head_32.S.
> 
> To keep 32bit and 64 bit consistent, use phys_addr for all.
> 
> Use early_ioremap to access during copying.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Thomas Renninger <trenn@suse.de>
> Cc: Rafael J. Wysocki <rjw@sisk.pl>
> Cc: linux-acpi@vger.kernel.org
> ---
> @@ -654,10 +654,13 @@ void __init acpi_initrd_override_copy(void)
>  	arch_reserve_mem_area(acpi_tables_addr, all_tables_size);
>  
>  	for (no = 0; no < table_nr; no++) {
> -		size_t size = early_initrd_files[no].size;
> +		unsigned long size = early_initrd_files[no].size;
>  
>  		p = early_ioremap(acpi_tables_addr + total_offset, size);
> -		memcpy(p, early_initrd_files[no].data, size);
> +		q = early_ioremap((unsigned long)early_initrd_files[no].data,
> +					 size);
> +		memcpy(p, q, size);
> +		early_iounmap(q, size);

Ah, okay, so the loop change in the previous patch was for this, I
suppose?  That chunk probably should either be a separate patch or
rolled into this one.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 04/14] x86, ACPI: make acpi override finding work with 32bit flat mode
  2013-03-08  4:58 ` [PATCH 04/14] x86, ACPI: make acpi override finding work with 32bit flat mode Yinghai Lu
@ 2013-03-08  5:50   ` Tejun Heo
  2013-03-08  6:57     ` Yinghai Lu
  0 siblings, 1 reply; 55+ messages in thread
From: Tejun Heo @ 2013-03-08  5:50 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel, Pekka Enberg,
	Jacob Shin, Rafael J. Wysocki, linux-acpi

On Thu, Mar 07, 2013 at 08:58:30PM -0800, Yinghai Lu wrote:
> We will find acpi tables in initrd during head_32.S in 32bit flat mode.
> 
> So need acpi_initrd_override_find could take phys directly.

The patch description doesn't explain even half of what's going on.

> @@ -552,38 +552,47 @@ u8 __init acpi_table_checksum(u8 *buffer, u32 length)
>  	return sum;
>  }
>  
> -/* All but ACPI_SIG_RSDP and ACPI_SIG_FACS: */
> -static const char * const table_sigs[] = {
> -	ACPI_SIG_BERT, ACPI_SIG_CPEP, ACPI_SIG_ECDT, ACPI_SIG_EINJ,
> -	ACPI_SIG_ERST, ACPI_SIG_HEST, ACPI_SIG_MADT, ACPI_SIG_MSCT,
> -	ACPI_SIG_SBST, ACPI_SIG_SLIT, ACPI_SIG_SRAT, ACPI_SIG_ASF,
> -	ACPI_SIG_BOOT, ACPI_SIG_DBGP, ACPI_SIG_DMAR, ACPI_SIG_HPET,
> -	ACPI_SIG_IBFT, ACPI_SIG_IVRS, ACPI_SIG_MCFG, ACPI_SIG_MCHI,
> -	ACPI_SIG_SLIC, ACPI_SIG_SPCR, ACPI_SIG_SPMI, ACPI_SIG_TCPA,
> -	ACPI_SIG_UEFI, ACPI_SIG_WAET, ACPI_SIG_WDAT, ACPI_SIG_WDDT,
> -	ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT,
> -	ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, NULL };

Why is this table made a stack variable?  What's the benefit of doing
that?

>  /* Non-fatal errors: Affected tables/files are ignored */
>  #define INVALID_TABLE(x, path, name)					\
> -	{ pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); continue; }
> +	do { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); } while (0)

Might as well rename the macro to something which indicates it's just
printing an error message.  Urgh... who thought embedding a control flow
directive like continue inside a macro was a good idea? :(
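
To illustrate the point, here is a small userspace sketch, not the osl.c
code: the macro only reports, and the continue stays visible at the call
site.  report_invalid_table() and count_valid_tables() are hypothetical
names, and fprintf() stands in for pr_err().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical name: only reports, never alters control flow. */
#define report_invalid_table(msg, path, name)				\
	do {								\
		fprintf(stderr, "ACPI OVERRIDE: " msg " [%s%s]\n",	\
			(path), (name));				\
	} while (0)

/* Count entries that pass a toy size check, skipping bad ones explicitly. */
static int count_valid_tables(const size_t *sizes, const char *const *names,
			      size_t n, size_t min_size)
{
	int valid = 0;
	size_t i;

	assert(sizes != NULL && names != NULL);

	for (i = 0; i < n; i++) {
		if (sizes[i] < min_size) {
			report_invalid_table("Table too small",
					     "kernel/firmware/acpi/", names[i]);
			continue;	/* visible at the call site */
		}
		valid++;
	}
	return valid;
}
```

The do { } while (0) wrapper keeps the macro safe to use as a single
statement, for example in an if without braces.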

> -void __init acpi_initrd_override_find(void *data, size_t size)
> +void __init acpi_initrd_override_find(void *data, size_t size, bool is_phys)

Is it really necessary to make the function take both virtual and
physical addresses?  Can't we just make the function take phys_addr_t
and update everyone to call with physaddr?  Also @is_phys isn't a simple
address switch.  It also changes error reporting.  If you're gonna
keep @is_phys, let's at least write up a function comment explaining
what's going on and why we need it.  But, really, if at all possible,
let's change the function to take a single type of argument and
predicate error message printing on something else (e.g. early printk
initialized or whatever).

> @@ -654,11 +677,14 @@ void __init acpi_initrd_override_copy(void)
>  	arch_reserve_mem_area(acpi_tables_addr, all_tables_size);
>  
>  	for (no = 0; no < table_nr; no++) {
> +		unsigned long phys_addr = (unsigned long)early_initrd_files[no].data;

Can we please use phys_addr_t for physical addresses?

>  		unsigned long size = early_initrd_files[no].size;
>  
> +		q = early_ioremap(phys_addr, size);
> +		pr_info("%4.4s ACPI table found in initrd [%#010lx-%#010lx]\n",
> +				((struct acpi_table_header *)q)->signature,
> +				phys_addr, phys_addr + size - 1);

Maybe putting pr_info after ioremapping both p and q would be easier
on the eyes?

>  		p = early_ioremap(acpi_tables_addr + total_offset, size);
> -		q = early_ioremap((unsigned long)early_initrd_files[no].data,
> -					 size);
>  		memcpy(p, q, size);
>  		early_iounmap(q, size);
>  		early_iounmap(p, size);

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 05/14] x86, ACPI: Find acpi tables in initrd early at head_32.S/head64.c
  2013-03-08  4:58 ` [PATCH 05/14] x86, ACPI: Find acpi tables in initrd early at head_32.S/head64.c Yinghai Lu
@ 2013-03-08  5:57   ` Tejun Heo
  2013-03-08  7:02     ` Yinghai Lu
  0 siblings, 1 reply; 55+ messages in thread
From: Tejun Heo @ 2013-03-08  5:57 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel, Pekka Enberg,
	Jacob Shin, Rafael J. Wysocki, linux-acpi

On Thu, Mar 07, 2013 at 08:58:31PM -0800, Yinghai Lu wrote:
> diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
> index 73afd11..ca08f0e 100644
> --- a/arch/x86/kernel/head_32.S
> +++ b/arch/x86/kernel/head_32.S
> @@ -149,6 +149,10 @@ ENTRY(startup_32)
>  	call load_ucode_bsp
>  #endif
>  
> +#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
> +	call x86_acpi_override_find
> +#endif

The function is always defined.  We can probably lose the ifdef here?

Also, does it really have to be called from head_32.S?  No way this
can be done after entering C code?  It would be great if you can
explain overall design choices in the head message (and important
patches).

> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 668e658..d43545a 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -424,6 +424,32 @@ static void __init reserve_initrd(void)
>  }
>  #endif /* CONFIG_BLK_DEV_INITRD */
>  
> +#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
> +void __init x86_acpi_override_find(void)
> +{
> +	unsigned long ramdisk_image, ramdisk_size;
> +	unsigned char *p = NULL;
> +
> +#ifdef CONFIG_X86_32
> +	struct boot_params *boot_params_p;
> +
> +	boot_params_p = (struct boot_params *)__pa_symbol(&boot_params);
> +	ramdisk_image = boot_params_p->hdr.ramdisk_image;
> +	ramdisk_size  = boot_params_p->hdr.ramdisk_size;
> +	p = (unsigned char *)ramdisk_image;
> +	acpi_initrd_override_find(p, ramdisk_size, true);
> +#else
> +	ramdisk_image = get_ramdisk_image();
> +	ramdisk_size  = get_ramdisk_size();
> +	if (ramdisk_image)
> +		p = __va(ramdisk_image);
> +	acpi_initrd_override_find(p, ramdisk_size, false);
> +#endif
> +}
> +#else
> +void __init x86_acpi_override_find(void) { }

And add a comment here explaining why we're not doing static inline for
the dummy function?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 06/14] x86, mm, numa: Move successful path handling code later
  2013-03-08  4:58 ` [PATCH 06/14] x86, mm, numa: Move successful path handling code later Yinghai Lu
@ 2013-03-08  6:04   ` Tejun Heo
  2013-03-08  7:03     ` Yinghai Lu
  0 siblings, 1 reply; 55+ messages in thread
From: Tejun Heo @ 2013-03-08  6:04 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel

>  static int __init numa_register_memblks(struct numa_meminfo *mi)

After this patch, the above name is a bit misleading, I think.

> +out:

Maybe register: would fit better?

> +	/* Finally register nodes. */
> +	for_each_node_mask(nid, node_possible_map) {
> +		u64 start = PFN_PHYS(max_pfn);

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 01/14] x86, ACPI, mm: Kill max_low_pfn_mapped
  2013-03-08  5:28           ` Tejun Heo
@ 2013-03-08  6:09             ` H. Peter Anvin
  2013-03-11 22:50               ` Daniel Vetter
  0 siblings, 1 reply; 55+ messages in thread
From: H. Peter Anvin @ 2013-03-08  6:09 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Yinghai Lu, Thomas Gleixner, Ingo Molnar, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel, Rafael J. Wysocki,
	Daniel Vetter, David Airlie, Jacob Shin, linux-acpi, dri-devel

On 03/07/2013 09:28 PM, Tejun Heo wrote:
> On Thu, Mar 7, 2013 at 9:27 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> They are not using memblock_find_in_range(), so 1ULL << 32 will not help.
>>
>> I really hope the i915 drm guys can clean up those hacks.
> 
> The code isn't being used.  Just leave it alone.  Maybe add a comment.
>  The change is just making things more confusing.
> 

Indeed, but...

Daniel: can you guys clean this up or can we just remove the #if 0 clause?

	-hpa


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 08/14] x86, mm, numa: use numa_meminfo to check node_map_pfn alignment
  2013-03-08  4:58 ` [PATCH 08/14] x86, mm, numa: use numa_meminfo to check node_map_pfn alignment Yinghai Lu
@ 2013-03-08  6:26   ` Tejun Heo
  2013-03-08  7:05     ` Yinghai Lu
  0 siblings, 1 reply; 55+ messages in thread
From: Tejun Heo @ 2013-03-08  6:26 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel

On Thu, Mar 07, 2013 at 08:58:34PM -0800, Yinghai Lu wrote:
> We could use numa_meminfo directly instead of memblock nid.
> 
> So we could move down set memblock nid down and only do it one time
> for successful path
> 
> Move node_map_pfn_alignment() to arch/x86/mm as no other user for it.

Please don't move and update in the same patch.  It makes it difficult
to review what's really changing.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 09/14] x86, mm, numa: set memblock nid later
  2013-03-08  4:58 ` [PATCH 09/14] x86, mm, numa: set memblock nid later Yinghai Lu
@ 2013-03-08  6:28   ` Tejun Heo
  2013-03-08  7:11     ` Yinghai Lu
  0 siblings, 1 reply; 55+ messages in thread
From: Tejun Heo @ 2013-03-08  6:28 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel

On Thu, Mar 07, 2013 at 08:58:35PM -0800, Yinghai Lu wrote:
> Only set memblock nid one time.

Would be awesome if the description explains why we're doing this and
why we're allowed to do this now.

> Also rename numa_register_memblks to numa_check_memblks()
> after move set memblock nid out.

Ah... so, it's getting renamed here.

-- 
tejun

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 10/14] x86, mm, numa: Move emulation handling down.
  2013-03-08  4:58 ` [PATCH 10/14] x86, mm, numa: Move emulation handling down Yinghai Lu
@ 2013-03-08  6:42   ` Tejun Heo
  2013-03-08  7:13     ` Yinghai Lu
  0 siblings, 1 reply; 55+ messages in thread
From: Tejun Heo @ 2013-03-08  6:42 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel, David Rientjes

On Thu, Mar 07, 2013 at 08:58:36PM -0800, Yinghai Lu wrote:
> -static int __init numa_check_memblks(struct numa_meminfo *mi)
> +
> +int __init numa_check_memblks(struct numa_meminfo *mi)
>  {
> +	nodemask_t tmp_node_map;
>  	unsigned long pfn_align;
>  
>  	/* Account for nodes with cpus and no memory */
> -	node_possible_map = numa_nodes_parsed;
> -	numa_nodemask_from_meminfo(&node_possible_map, mi);
> -	if (WARN_ON(nodes_empty(node_possible_map)))
> +	tmp_node_map = numa_nodes_parsed;
> +	numa_nodemask_from_meminfo(&tmp_node_map, mi);
> +	if (WARN_ON(nodes_empty(tmp_node_map)))
>  		return -EINVAL;
>  
>  	if (!numa_meminfo_cover_memory(mi))
> @@ -562,6 +564,7 @@ static int __init numa_check_memblks(struct numa_meminfo *mi)
>  		return -EINVAL;
>  	}
>  
> +	node_possible_map = tmp_node_map;

Hmmm.... it's kinda nasty to have a side effect like the above for a
function named numa_check_memblks().  Maybe we can move this to the
caller or name the function to make it clear that some global state is
being updated?
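
A minimal userspace sketch of the "move this to the caller" option, not
the kernel code: the check only computes a candidate mask, and the
caller commits it to the global on success. All names here
(check_memblks, init_nodes, possible_map) are hypothetical stand-ins,
and plain unsigned long bitmasks stand in for nodemask_t.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the global node_possible_map. */
static unsigned long possible_map;

/* Pure check: writes the candidate mask to *out, touches no global state. */
int check_memblks(unsigned long parsed, unsigned long from_meminfo,
		  unsigned long *out)
{
	unsigned long tmp = parsed | from_meminfo;

	assert(out != NULL);
	if (!tmp)
		return -EINVAL;	/* no nodes at all: reject */
	*out = tmp;
	return 0;
}

/* The caller owns the side effect and commits only after a successful check. */
int init_nodes(unsigned long parsed, unsigned long from_meminfo)
{
	unsigned long tmp;
	int ret = check_memblks(parsed, from_meminfo, &tmp);

	if (ret < 0)
		return ret;
	possible_map = tmp;
	return 0;
}

unsigned long get_possible_map(void)
{
	return possible_map;
}
```

With this split, a failed check leaves the global untouched, which is
what a reader would expect from a function whose name starts with
"check".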

> @@ -608,8 +611,6 @@ static int __init numa_init(int (*init_func)(void))
>  	if (ret < 0)
>  		return ret;
>  
> -	numa_emulation(&numa_meminfo, numa_distance_cnt);
> -
>  	ret = numa_check_memblks(&numa_meminfo);
>  	if (ret < 0)
>  		return ret;
> @@ -669,6 +670,8 @@ void __init x86_numa_init(void)
>  	numa_init(dummy_numa_init);
>  
>  out:
> +	numa_emulation(&numa_meminfo, numa_distance_cnt);
> +
>  	for (i = 0; i < mi->nr_blks; i++) {
>  		struct numa_memblk *mb = &mi->blk[i];
>  		memblock_set_node(mb->start, mb->end - mb->start, mb->nid);
> diff --git a/arch/x86/mm/numa_emulation.c b/arch/x86/mm/numa_emulation.c
> index dbbbb47..5a0433d 100644
> --- a/arch/x86/mm/numa_emulation.c
> +++ b/arch/x86/mm/numa_emulation.c
> @@ -348,7 +348,7 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
>  	if (ret < 0)
>  		goto no_emu;
>  
> -	if (numa_cleanup_meminfo(&ei) < 0) {
> +	if (numa_cleanup_meminfo(&ei) < 0 || numa_check_memblks(&ei) < 0) {
>  		pr_warning("NUMA: Warning: constructed meminfo invalid, disabling emulation\n");
>  		goto no_emu;
>  	}

Given that acpi is the only mechanism which matters in any modern NUMA
machines, I think the re-ordering should be fine.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 11/14] x86, acpi, numa: split SLIT handling out
  2013-03-08  4:58 ` [PATCH 11/14] x86, acpi, numa: split SLIT handling out Yinghai Lu
@ 2013-03-08  6:46   ` Tejun Heo
  2013-03-08  7:18     ` Yinghai Lu
  0 siblings, 1 reply; 55+ messages in thread
From: Tejun Heo @ 2013-03-08  6:46 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel, Rafael J. Wysocki,
	linux-acpi

On Thu, Mar 07, 2013 at 08:58:37PM -0800, Yinghai Lu wrote:
> +void __init acpi_numa_init_only_slit(void)
> +{
> +	/* SLIT: System Locality Information Table */
> +	acpi_table_parse(ACPI_SIG_SLIT, acpi_parse_slit);
> +}
> +
> +static int __init __acpi_numa_init(bool with_slit)
>  {
>  	int cnt = 0;

Hmmm.... how about just having the following two functions

	acpi_numa_init_srat();
	acpi_numa_init_slit();

and update both x86 and ia64 to use the two functions?

-- 
tejun

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/14] x86, ACPI: Split find/copy from acpi_initrd_override
  2013-03-08  5:33   ` Tejun Heo
@ 2013-03-08  6:47     ` Yinghai Lu
  0 siblings, 0 replies; 55+ messages in thread
From: Yinghai Lu @ 2013-03-08  6:47 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel, Pekka Enberg,
	Jacob Shin, Rafael J. Wysocki, linux-acpi

On Thu, Mar 7, 2013 at 9:33 PM, Tejun Heo <tj@kernel.org> wrote:
> On Thu, Mar 07, 2013 at 08:58:28PM -0800, Yinghai Lu wrote:
>> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
>> index c9e36d7..b9d2ff0 100644
>> --- a/drivers/acpi/osl.c
>> +++ b/drivers/acpi/osl.c
>> @@ -539,6 +539,7 @@ acpi_os_predefined_override(const struct acpi_predefined_names *init_val,
>>
>>  static u64 acpi_tables_addr;
>>  static int all_tables_size;
>> +static int table_nr;
>
> Not particularly good choice of name for static variable visible to
> multiple functions.  all_tables_size isn't a stellar choice either but
> no need to continue the tradition.  Maybe acpi_nr_initrd_files?  Also,
> why is this one defined here away from the actual table?

OK, acpi_nr_initrd_files.

I will check whether it can be removed entirely.

>> -/* Must not increase 10 or needs code modification below */
>> -#define ACPI_OVERRIDE_TABLES 10
>> +#define ACPI_OVERRIDE_TABLES 64
>
> What's up with the silent bumping of table size?

will mention that in change log.

>
>> +static struct cpio_data __initdata early_initrd_files[ACPI_OVERRIDE_TABLES];
>
> acpi_initrd_files[]?  Do we really need the "early" designation
> together with initrd?

just move it out from acpi_initrd_override.

>
>> @@ -647,14 +653,14 @@ void __init acpi_initrd_override(void *data, size_t size)
>>       memblock_reserve(acpi_tables_addr, acpi_tables_addr + all_tables_size);
>>       arch_reserve_mem_area(acpi_tables_addr, all_tables_size);
>>
>> -     p = early_ioremap(acpi_tables_addr, all_tables_size);
>> -
>>       for (no = 0; no < table_nr; no++) {
>> -             memcpy(p + total_offset, early_initrd_files[no].data,
>> -                    early_initrd_files[no].size);
>> -             total_offset += early_initrd_files[no].size;
>> +             size_t size = early_initrd_files[no].size;
>> +
>> +             p = early_ioremap(acpi_tables_addr + total_offset, size);
>> +             memcpy(p, early_initrd_files[no].data, size);
>> +             early_iounmap(p, size);
>> +             total_offset += size;
>>       }
>> -     early_iounmap(p, all_tables_size);
>
> Why is this necessary?  Why no explanation in the description?

Actually, this is the reason for bumping the table limit to 64.

early_ioremap can only map 256k at a time, so mapping all tables at
once puts a limit on their overall size.

If we map them one by one, we can raise the limit on the number of tables.
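
A rough userspace sketch of the one-by-one copy described above, not the
actual osl.c code. map_window() is a hypothetical stand-in for
early_ioremap() that refuses anything larger than an assumed 256k
window; struct blob and copy_tables() are likewise made up for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed per-mapping limit, taken from the 256k figure in this thread. */
#define MAP_WINDOW_MAX (256 * 1024)

struct blob {
	const unsigned char *data;
	size_t size;
};

/* Hypothetical mapper: fails if a single mapping exceeds the window. */
static unsigned char *map_window(unsigned char *base, size_t offset,
				 size_t size)
{
	if (size > MAP_WINDOW_MAX)
		return NULL;
	return base + offset;
}

/*
 * Copy each table through its own mapping, so only the size of a single
 * table is bounded by the window, not the sum of all tables.
 * Returns the total number of bytes copied, or (size_t)-1 on failure.
 */
size_t copy_tables(unsigned char *dst, const struct blob *tables, int nr)
{
	size_t total_offset = 0;

	assert(dst != NULL && tables != NULL);
	for (int no = 0; no < nr; no++) {
		size_t size = tables[no].size;
		unsigned char *p = map_window(dst, total_offset, size);

		if (!p)
			return (size_t)-1;
		memcpy(p, tables[no].data, size);
		/* the matching unmap would go here */
		total_offset += size;
	}
	return total_offset;
}
```

The trade-off is one map/unmap pair per table instead of one pair for
the whole area.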

>
>> --- a/include/linux/acpi.h
>> +++ b/include/linux/acpi.h
>> @@ -79,14 +79,6 @@ typedef int (*acpi_tbl_table_handler)(struct acpi_table_header *table);
>>  typedef int (*acpi_tbl_entry_handler)(struct acpi_subtable_header *header,
>>                                     const unsigned long end);
>>
>> -#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
>> -void acpi_initrd_override(void *data, size_t size);
>> -#else
>> -static inline void acpi_initrd_override(void *data, size_t size)
>> -{
>> -}
>> -#endif
>> -
>>  char * __acpi_map_table (unsigned long phys_addr, unsigned long size);
>>  void __acpi_unmap_table(char *map, unsigned long size);
>>  int early_acpi_boot_init(void);
>> @@ -485,6 +477,14 @@ static inline bool acpi_driver_match_device(struct device *dev,
>>
>>  #endif       /* !CONFIG_ACPI */
>>
>> +#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
>> +void acpi_initrd_override_find(void *data, size_t size);
>> +void acpi_initrd_override_copy(void);
>> +#else
>> +static inline void acpi_initrd_override_find(void *data, size_t size) { }
>> +static inline void acpi_initrd_override_copy(void) { }
>> +#endif
>
> I don't get this part either.  Why is it necessary to move the
> prototypes to avoid #ifdefs in setup.c?  Ah, okay, you're bringing it
> outside CONFIG_ACPI so that they're defined regardless of that config
> option.  Can you please add why you're moving the prototype in the
> description?  Having "what" is nice but "why" is much nicer. :)

I think I have that in the log.

In more detail: ACPI_INITRD_TABLE_OVERRIDE depends
on ACPI and BLK_DEV_INITRD.

So the prototypes can be moved out safely.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 03/14] x86, ACPI: store override acpi tables phys addr
  2013-03-08  5:36   ` Tejun Heo
@ 2013-03-08  6:49     ` Yinghai Lu
  2013-03-08  7:08       ` Tejun Heo
  0 siblings, 1 reply; 55+ messages in thread
From: Yinghai Lu @ 2013-03-08  6:49 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel, Rafael J. Wysocki,
	linux-acpi

On Thu, Mar 7, 2013 at 9:36 PM, Tejun Heo <tj@kernel.org> wrote:
> On Thu, Mar 07, 2013 at 08:58:29PM -0800, Yinghai Lu wrote:
>> As later 32bit only find table with phys address during 32bit flat mode
>> in head_32.S.
>>
>> To keep 32bit and 64 bit consistent, use phys_addr for all.
>>
>> Use early_ioremap to access during copying.
>>
>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>> Cc: Thomas Renninger <trenn@suse.de>
>> Cc: Rafael J. Wysocki <rjw@sisk.pl>
>> Cc: linux-acpi@vger.kernel.org
>> ---
>> @@ -654,10 +654,13 @@ void __init acpi_initrd_override_copy(void)
>>       arch_reserve_mem_area(acpi_tables_addr, all_tables_size);
>>
>>       for (no = 0; no < table_nr; no++) {
>> -             size_t size = early_initrd_files[no].size;
>> +             unsigned long size = early_initrd_files[no].size;
>>
>>               p = early_ioremap(acpi_tables_addr + total_offset, size);
>> -             memcpy(p, early_initrd_files[no].data, size);
>> +             q = early_ioremap((unsigned long)early_initrd_files[no].data,
>> +                                      size);
>> +             memcpy(p, q, size);
>> +             early_iounmap(q, size);
>
> Ah, okay, so the loop change in the previous patch was for this, I
> suppose?  That chunk probably should either be a separate patch or
> rolled into this one.

merge two patches?

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 04/14] x86, ACPI: make acpi override finding work with 32bit flat mode
  2013-03-08  5:50   ` Tejun Heo
@ 2013-03-08  6:57     ` Yinghai Lu
  2013-03-08  7:06       ` Tejun Heo
                         ` (2 more replies)
  0 siblings, 3 replies; 55+ messages in thread
From: Yinghai Lu @ 2013-03-08  6:57 UTC (permalink / raw)
  To: Tejun Heo, Yu, Fenghua
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel, Pekka Enberg,
	Jacob Shin, Rafael J. Wysocki, linux-acpi

On Thu, Mar 7, 2013 at 9:50 PM, Tejun Heo <tj@kernel.org> wrote:
> On Thu, Mar 07, 2013 at 08:58:30PM -0800, Yinghai Lu wrote:
>> We will find acpi tables in initrd during head_32.S in 32bit flat mode.
>>
>> So need acpi_initrd_override_find could take phys directly.
>
> The patch description doesn't explain even half of what's going on.

hope HPA could understand.

Access initrd before relocate_initrd and init_memory mapping.

>
>> @@ -552,38 +552,47 @@ u8 __init acpi_table_checksum(u8 *buffer, u32 length)
>>       return sum;
>>  }
>>
>> -/* All but ACPI_SIG_RSDP and ACPI_SIG_FACS: */
>> -static const char * const table_sigs[] = {
>> -     ACPI_SIG_BERT, ACPI_SIG_CPEP, ACPI_SIG_ECDT, ACPI_SIG_EINJ,
>> -     ACPI_SIG_ERST, ACPI_SIG_HEST, ACPI_SIG_MADT, ACPI_SIG_MSCT,
>> -     ACPI_SIG_SBST, ACPI_SIG_SLIT, ACPI_SIG_SRAT, ACPI_SIG_ASF,
>> -     ACPI_SIG_BOOT, ACPI_SIG_DBGP, ACPI_SIG_DMAR, ACPI_SIG_HPET,
>> -     ACPI_SIG_IBFT, ACPI_SIG_IVRS, ACPI_SIG_MCFG, ACPI_SIG_MCHI,
>> -     ACPI_SIG_SLIC, ACPI_SIG_SPCR, ACPI_SIG_SPMI, ACPI_SIG_TCPA,
>> -     ACPI_SIG_UEFI, ACPI_SIG_WAET, ACPI_SIG_WDAT, ACPI_SIG_WDDT,
>> -     ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT,
>> -     ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, NULL };
>
> Why is this table made a stack variable?  What's the benefit of doing
> that?

so I do need to switch global variables to phys and access it.

>
>>  /* Non-fatal errors: Affected tables/files are ignored */
>>  #define INVALID_TABLE(x, path, name)                                 \
>> -     { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); continue; }
>> +     do { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); } while (0)
>
> Might as well rename the macro to something which indicates it's just
> printing error message.  Urgh... who thought embedding control flow
> directive like continue inside a macro was a good idea? :(

so I removed it.
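
For illustration only, a userspace sketch of the pattern after the
change: the macro just reports, and the "continue" is written out at
the call site. REPORT_INVALID_TABLE and count_valid() are hypothetical
names, and fprintf() stands in for pr_err().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Only prints; the do/while (0) wrapper makes it behave as one statement. */
#define REPORT_INVALID_TABLE(msg, name) \
	do { fprintf(stderr, "ACPI OVERRIDE: " msg " [%s]\n", name); } while (0)

/* Count the entries that are at least min_size bytes; skip the others. */
int count_valid(const size_t *sizes, const char *const *names, int nr,
		size_t min_size)
{
	int valid = 0;

	assert(nr >= 0);
	for (int i = 0; i < nr; i++) {
		if (sizes[i] < min_size) {
			REPORT_INVALID_TABLE("Table smaller than ACPI header",
					     names[i]);
			continue;	/* control flow is visible here */
		}
		valid++;
	}
	return valid;
}
```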

>
>> -void __init acpi_initrd_override_find(void *data, size_t size)
>> +void __init acpi_initrd_override_find(void *data, size_t size, bool is_phys)
>
> Is it really necessary to make the function take both virtual and
> physical addresses?  Can't we just make the function take phys_addr_t
> and update everyone to call with physaddr?  Also @is_phys isn't simple
> address switch.  It also changes error reporting.  If you're gonna
> keep @is_phys, let's at least write up a function comment explaining
> what's going on and why we need it.  But, really, if at all possible,
> let's change the function to take single type of argument and
> predicate error message printing on something else (e.g. early printk
> initialized or whatever).

yes, one for 32bit from head_32.S, phys.
one for 64bit from head64.c. with _va.

Not sure if I could use early_printk from head_32.S, as Fenghua does
not print anything from the early microcode updating in the same place.

Will check that.

>
>> @@ -654,11 +677,14 @@ void __init acpi_initrd_override_copy(void)
>>       arch_reserve_mem_area(acpi_tables_addr, all_tables_size);
>>
>>       for (no = 0; no < table_nr; no++) {
>> +             unsigned long phys_addr = (unsigned long)early_initrd_files[no].data;
>
> Can we please use phys_addr_t for physical addresses?

ok.

>
>>               unsigned long size = early_initrd_files[no].size;
>>
>> +             q = early_ioremap(phys_addr, size);
>> +             pr_info("%4.4s ACPI table found in initrd [%#010lx-%#010lx]\n",
>> +                             ((struct acpi_table_header *)q)->signature,
>> +                             phys_addr, phys_addr + size - 1);
>
> Maybe putting pr_info after ioremapping both p and q would be easier
> on the eyes?

ok.

>
>>               p = early_ioremap(acpi_tables_addr + total_offset, size);
>> -             q = early_ioremap((unsigned long)early_initrd_files[no].data,
>> -                                      size);
>>               memcpy(p, q, size);
>>               early_iounmap(q, size);
>>               early_iounmap(p, size);

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 14/14] x86, mm: Put pagetable on local node ram
  2013-03-08  4:58 ` [PATCH 14/14] x86, mm: Put pagetable on local node ram Yinghai Lu
@ 2013-03-08  7:01   ` Tejun Heo
  2013-03-08  7:44     ` Yinghai Lu
  2013-03-08  8:20   ` Tang Chen
  1 sibling, 1 reply; 55+ messages in thread
From: Tejun Heo @ 2013-03-08  7:01 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel, Pekka Enberg,
	Jacob Shin, Konrad Rzeszutek Wilk

On Thu, Mar 07, 2013 at 08:58:40PM -0800, Yinghai Lu wrote:
> If node with ram is hotplugable, local node mem for page table and vmemmap
> should be on that node ram.
> 
> This patch is some kind of refreshment of
> | commit 1411e0ec3123ae4c4ead6bfc9fe3ee5a3ae5c327
> | Date:   Mon Dec 27 16:48:17 2010 -0800
> |
> |    x86-64, numa: Put pgtable to local node memory
> That was reverted before.
> 
> We have reason to reintroduce it to make memory hotplug work.
> 
> Split calling of init_mem_mapping into early_initmem_info
> for nodes after we get numa info there.
> 
> First node will be low range.
> Need to rework alloc_low_pages to alloc page table in following order:
> 	BRK, local node, low range
> 
> Still only load_cr3 one time, otherwise we would break xen 64bit again.

Hmmm... can you please split this patch further?  init_mem_mapping()
change can be separated, no?  Also, comments are disturbingly missing.
How are other people reading the code supposed to know what it's
trying to achieve why and how?  Hmmm... we're also likely to end up
with smaller mapping for misaligned NUMA configurations (I think my
test machine is like that).  Is it guaranteed that the top level ends
up in the first node?  It really needs documentation.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 05/14] x86, ACPI: Find acpi tables in initrd early at head_32.S/head64.c
  2013-03-08  5:57   ` Tejun Heo
@ 2013-03-08  7:02     ` Yinghai Lu
  2013-03-08  7:07       ` Tejun Heo
  0 siblings, 1 reply; 55+ messages in thread
From: Yinghai Lu @ 2013-03-08  7:02 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel, Pekka Enberg,
	Jacob Shin, Rafael J. Wysocki, linux-acpi

On Thu, Mar 7, 2013 at 9:57 PM, Tejun Heo <tj@kernel.org> wrote:
> On Thu, Mar 07, 2013 at 08:58:31PM -0800, Yinghai Lu wrote:
>> diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
>> index 73afd11..ca08f0e 100644
>> --- a/arch/x86/kernel/head_32.S
>> +++ b/arch/x86/kernel/head_32.S
>> @@ -149,6 +149,10 @@ ENTRY(startup_32)
>>       call load_ucode_bsp
>>  #endif
>>
>> +#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
>> +     call x86_acpi_override_find
>> +#endif
>
> The function is always defined.  We can probably lose ifdef here?

just mimic microcode updating again.

>
> Also, does it really have to be called from head_32.S?  No way this
> can be done after entering C code?  It would be great if you can
> explain overall design choices in the head message (and important
> patches).

It has to be in head_32.S, where we are still in 32bit flat mode and so
can access memory below 4G without setting up a page table.

Will try to add more to the change log.

>
>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>> index 668e658..d43545a 100644
>> --- a/arch/x86/kernel/setup.c
>> +++ b/arch/x86/kernel/setup.c
>> @@ -424,6 +424,32 @@ static void __init reserve_initrd(void)
>>  }
>>  #endif /* CONFIG_BLK_DEV_INITRD */
>>
>> +#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
>> +void __init x86_acpi_override_find(void)
>> +{
>> +     unsigned long ramdisk_image, ramdisk_size;
>> +     unsigned char *p = NULL;
>> +
>> +#ifdef CONFIG_X86_32
>> +     struct boot_params *boot_params_p;
>> +
>> +     boot_params_p = (struct boot_params *)__pa_symbol(&boot_params);
>> +     ramdisk_image = boot_params_p->hdr.ramdisk_image;
>> +     ramdisk_size  = boot_params_p->hdr.ramdisk_size;
>> +     p = (unsigned char *)ramdisk_image;
>> +     acpi_initrd_override_find(p, ramdisk_size, true);
>> +#else
>> +     ramdisk_image = get_ramdisk_image();
>> +     ramdisk_size  = get_ramdisk_size();
>> +     if (ramdisk_image)
>> +             p = __va(ramdisk_image);
>> +     acpi_initrd_override_find(p, ramdisk_size, false);
>> +#endif
>> +}
>> +#else
>> +void __init x86_acpi_override_find(void) { }
>
> And add a comment here why we're not doing static inline for the dummy
> function?

...

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 06/14] x86, mm, numa: Move successful path handling code later
  2013-03-08  6:04   ` Tejun Heo
@ 2013-03-08  7:03     ` Yinghai Lu
  0 siblings, 0 replies; 55+ messages in thread
From: Yinghai Lu @ 2013-03-08  7:03 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel

On Thu, Mar 7, 2013 at 10:04 PM, Tejun Heo <tj@kernel.org> wrote:
>>  static int __init numa_register_memblks(struct numa_meminfo *mi)
>
> After this patch, the above name is a bit misleading, I think.

I rename it to numa_check_memblks() in a later patch.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 08/14] x86, mm, numa: use numa_meminfo to check node_map_pfn alignment
  2013-03-08  6:26   ` Tejun Heo
@ 2013-03-08  7:05     ` Yinghai Lu
  0 siblings, 0 replies; 55+ messages in thread
From: Yinghai Lu @ 2013-03-08  7:05 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel

On Thu, Mar 7, 2013 at 10:26 PM, Tejun Heo <tj@kernel.org> wrote:
> On Thu, Mar 07, 2013 at 08:58:34PM -0800, Yinghai Lu wrote:
>> We could use numa_meminfo directly instead of memblock nid.
>>
>> So we could move down set memblock nid down and only do it one time
>> for successful path
>>
>> Move node_map_pfn_alignment() to arch/x86/mm as no other user for it.
>
> Please don't move and update in the same patch.  It makes it difficult
> to review what's really changing.

ok, will make it two patches, one for moving and one for changing to
numa_meminfo.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 04/14] x86, ACPI: make acpi override finding work with 32bit flat mode
  2013-03-08  6:57     ` Yinghai Lu
@ 2013-03-08  7:06       ` Tejun Heo
  2013-03-08  7:25         ` Yinghai Lu
  2013-03-08  7:16       ` Andrew Morton
  2013-03-08 21:25       ` Thomas Gleixner
  2 siblings, 1 reply; 55+ messages in thread
From: Tejun Heo @ 2013-03-08  7:06 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Yu, Fenghua, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Morton, Thomas Renninger, Tang Chen, linux-kernel,
	Pekka Enberg, Jacob Shin, Rafael J. Wysocki, linux-acpi

Hello, Yinghai.

On Thu, Mar 07, 2013 at 10:57:21PM -0800, Yinghai Lu wrote:
> On Thu, Mar 7, 2013 at 9:50 PM, Tejun Heo <tj@kernel.org> wrote:
> > On Thu, Mar 07, 2013 at 08:58:30PM -0800, Yinghai Lu wrote:
> >> We will find acpi tables in initrd during head_32.S in 32bit flat mode.
> >>
> >> So need acpi_initrd_override_find could take phys directly.
> >
> > The patch description doesn't explain even half of what's going on.
> 
> hope HPA could understand.
> 
> Access initrd before relocate_initrd and init_memory mapping.

I really hope the changelogs were better.  Eh well...

> >> -/* All but ACPI_SIG_RSDP and ACPI_SIG_FACS: */
> >> -static const char * const table_sigs[] = {
> >> -     ACPI_SIG_BERT, ACPI_SIG_CPEP, ACPI_SIG_ECDT, ACPI_SIG_EINJ,
> >> -     ACPI_SIG_ERST, ACPI_SIG_HEST, ACPI_SIG_MADT, ACPI_SIG_MSCT,
> >> -     ACPI_SIG_SBST, ACPI_SIG_SLIT, ACPI_SIG_SRAT, ACPI_SIG_ASF,
> >> -     ACPI_SIG_BOOT, ACPI_SIG_DBGP, ACPI_SIG_DMAR, ACPI_SIG_HPET,
> >> -     ACPI_SIG_IBFT, ACPI_SIG_IVRS, ACPI_SIG_MCFG, ACPI_SIG_MCHI,
> >> -     ACPI_SIG_SLIC, ACPI_SIG_SPCR, ACPI_SIG_SPMI, ACPI_SIG_TCPA,
> >> -     ACPI_SIG_UEFI, ACPI_SIG_WAET, ACPI_SIG_WDAT, ACPI_SIG_WDDT,
> >> -     ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT,
> >> -     ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, NULL };
> >
> > Why is this table made a stack variable?  What's the benefit of doing
> > that?
> 
> so I do need to switch global variables to phys and access it.

I can't really understand what your response means.  Can you please
elaborate?

> > Is it really necessary to make the function take both virtual and
> > physical addresses?  Can't we just make the function take phys_addr_t
> > and update everyone to call with physaddr?  Also @is_phys isn't simple
> > address switch.  It also changes error reporting.  If you're gonna
> > keep @is_phys, let's at least write up a function comment explaining
> > what's going on and why we need it.  But, really, if at all possible,
> > let's change the function to take single type of argument and
> > predicate error message printing on something else (e.g. early printk
> > initialized or whatever).
> 
> yes, one for 32bit from head_32.S, phys.
> one for 64bit from head64.c. with _va.

head64.c can't call with phys?  Why not?

> Not sure if I could use early_printk from head_32.S, as Fenghua does
> not print out
> from microcode updating early in the same parts.

ISTR it works but it doesn't have to (although it would be much nicer
if it did).  You can test whether printk is online and skip if it
isn't online yet.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 05/14] x86, ACPI: Find acpi tables in initrd early at head_32.S/head64.c
  2013-03-08  7:02     ` Yinghai Lu
@ 2013-03-08  7:07       ` Tejun Heo
  0 siblings, 0 replies; 55+ messages in thread
From: Tejun Heo @ 2013-03-08  7:07 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel, Pekka Enberg,
	Jacob Shin, Rafael J. Wysocki, linux-acpi

On Thu, Mar 07, 2013 at 11:02:15PM -0800, Yinghai Lu wrote:
> > Also, does it really have to be called from head_32.S?  No way this
> > can be done after entering C code?  It would be great if you can
> > explain overall design choices in the head message (and important
> > patches).
> 
> have to be with head_32.S and it is with 32bit flat mode, so could access
> below 4G without setting page table.
> 
> Will try add to more in the change log.

Yes, please.  In the comment too.

-- 
tejun

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 03/14] x86, ACPI: store override acpi tables phys addr
  2013-03-08  6:49     ` Yinghai Lu
@ 2013-03-08  7:08       ` Tejun Heo
  0 siblings, 0 replies; 55+ messages in thread
From: Tejun Heo @ 2013-03-08  7:08 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel, Rafael J. Wysocki,
	linux-acpi

On Thu, Mar 07, 2013 at 10:49:04PM -0800, Yinghai Lu wrote:
> >> @@ -654,10 +654,13 @@ void __init acpi_initrd_override_copy(void)
> >>       arch_reserve_mem_area(acpi_tables_addr, all_tables_size);
> >>
> >>       for (no = 0; no < table_nr; no++) {
> >> -             size_t size = early_initrd_files[no].size;
> >> +             unsigned long size = early_initrd_files[no].size;
> >>
> >>               p = early_ioremap(acpi_tables_addr + total_offset, size);
> >> -             memcpy(p, early_initrd_files[no].data, size);
> >> +             q = early_ioremap((unsigned long)early_initrd_files[no].data,
> >> +                                      size);
> >> +             memcpy(p, q, size);
> >> +             early_iounmap(q, size);
> >
> > Ah, okay, so the loop change in the previous patch was for this, I
> > suppose?  That chunk probably should either be a separate patch or
> > rolled into this one.
> 
> merge two patches?

Hmm... probably better to just move the related chunks from the
previous patch to this one with better explanation on what's going on.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 09/14] x86, mm, numa: set memblock nid later
  2013-03-08  6:28   ` Tejun Heo
@ 2013-03-08  7:11     ` Yinghai Lu
  0 siblings, 0 replies; 55+ messages in thread
From: Yinghai Lu @ 2013-03-08  7:11 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel

On Thu, Mar 7, 2013 at 10:28 PM, Tejun Heo <tj@kernel.org> wrote:
> On Thu, Mar 07, 2013 at 08:58:35PM -0800, Yinghai Lu wrote:
>> Only set memblock nid one time.
>
> Would be awesome if the description explains why we're doing this and
> why we're allowed to do this now.

Will add more:

Setting the memblock nid can change the memblock layout, e.g. the region
array could be doubled. We do not have memblock's current limit set at
that point, so the new array would be placed under 1M and could use too
much memory there.
For the fallback path, we also avoid the restore-memblock and re-merge
actions.

Now that we use the nid from numa_meminfo for checking, we don't need to
set it more than once.


Thanks

Yinghai

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 10/14] x86, mm, numa: Move emulation handling down.
  2013-03-08  6:42   ` Tejun Heo
@ 2013-03-08  7:13     ` Yinghai Lu
  0 siblings, 0 replies; 55+ messages in thread
From: Yinghai Lu @ 2013-03-08  7:13 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel, David Rientjes

On Thu, Mar 7, 2013 at 10:42 PM, Tejun Heo <tj@kernel.org> wrote:
> On Thu, Mar 07, 2013 at 08:58:36PM -0800, Yinghai Lu wrote:
>> -static int __init numa_check_memblks(struct numa_meminfo *mi)
>> +
>> +int __init numa_check_memblks(struct numa_meminfo *mi)
>>  {
>> +     nodemask_t tmp_node_map;
>>       unsigned long pfn_align;
>>
>>       /* Account for nodes with cpus and no memory */
>> -     node_possible_map = numa_nodes_parsed;
>> -     numa_nodemask_from_meminfo(&node_possible_map, mi);
>> -     if (WARN_ON(nodes_empty(node_possible_map)))
>> +     tmp_node_map = numa_nodes_parsed;
>> +     numa_nodemask_from_meminfo(&tmp_node_map, mi);
>> +     if (WARN_ON(nodes_empty(tmp_node_map)))
>>               return -EINVAL;
>>
>>       if (!numa_meminfo_cover_memory(mi))
>> @@ -562,6 +564,7 @@ static int __init numa_check_memblks(struct numa_meminfo *mi)
>>               return -EINVAL;
>>       }
>>
>> +     node_possible_map = tmp_node_map;
>
> Hmmm.... it's kinda nasty to have a side effect like the above for a
> function named numa_check_memblks().  Maybe we can move this to the
> caller or name the function to make it clear that some global state is
> being updated?

OK, will split out the node_possible_map update.

>
>> @@ -608,8 +611,6 @@ static int __init numa_init(int (*init_func)(void))
>>       if (ret < 0)
>>               return ret;
>>
>> -     numa_emulation(&numa_meminfo, numa_distance_cnt);
>> -
>>       ret = numa_check_memblks(&numa_meminfo);
>>       if (ret < 0)
>>               return ret;
>> @@ -669,6 +670,8 @@ void __init x86_numa_init(void)
>>       numa_init(dummy_numa_init);
>>
>>  out:
>> +     numa_emulation(&numa_meminfo, numa_distance_cnt);
>> +
>>       for (i = 0; i < mi->nr_blks; i++) {
>>               struct numa_memblk *mb = &mi->blk[i];
>>               memblock_set_node(mb->start, mb->end - mb->start, mb->nid);
>> diff --git a/arch/x86/mm/numa_emulation.c b/arch/x86/mm/numa_emulation.c
>> index dbbbb47..5a0433d 100644
>> --- a/arch/x86/mm/numa_emulation.c
>> +++ b/arch/x86/mm/numa_emulation.c
>> @@ -348,7 +348,7 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
>>       if (ret < 0)
>>               goto no_emu;
>>
>> -     if (numa_cleanup_meminfo(&ei) < 0) {
>> +     if (numa_cleanup_meminfo(&ei) < 0 || numa_check_memblks(&ei) < 0) {
>>               pr_warning("NUMA: Warning: constructed meminfo invalid, disabling emulation\n");
>>               goto no_emu;
>>       }
>
> Given that acpi is the only mechanism which matters in any modern NUMA
> machines, I think the re-ordering should be fine.

Good.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 04/14] x86, ACPI: make acpi override finding work with 32bit flat mode
  2013-03-08  6:57     ` Yinghai Lu
  2013-03-08  7:06       ` Tejun Heo
@ 2013-03-08  7:16       ` Andrew Morton
  2013-03-08 21:25       ` Thomas Gleixner
  2 siblings, 0 replies; 55+ messages in thread
From: Andrew Morton @ 2013-03-08  7:16 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Tejun Heo, Yu, Fenghua, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Thomas Renninger, Tang Chen, linux-kernel,
	Pekka Enberg, Jacob Shin, Rafael J. Wysocki, linux-acpi

On Thu, 7 Mar 2013 22:57:21 -0800 Yinghai Lu <yinghai@kernel.org> wrote:

> >
> >> @@ -552,38 +552,47 @@ u8 __init acpi_table_checksum(u8 *buffer, u32 length)
> >>       return sum;
> >>  }
> >>
> >> -/* All but ACPI_SIG_RSDP and ACPI_SIG_FACS: */
> >> -static const char * const table_sigs[] = {
> >> -     ACPI_SIG_BERT, ACPI_SIG_CPEP, ACPI_SIG_ECDT, ACPI_SIG_EINJ,
> >> -     ACPI_SIG_ERST, ACPI_SIG_HEST, ACPI_SIG_MADT, ACPI_SIG_MSCT,
> >> -     ACPI_SIG_SBST, ACPI_SIG_SLIT, ACPI_SIG_SRAT, ACPI_SIG_ASF,
> >> -     ACPI_SIG_BOOT, ACPI_SIG_DBGP, ACPI_SIG_DMAR, ACPI_SIG_HPET,
> >> -     ACPI_SIG_IBFT, ACPI_SIG_IVRS, ACPI_SIG_MCFG, ACPI_SIG_MCHI,
> >> -     ACPI_SIG_SLIC, ACPI_SIG_SPCR, ACPI_SIG_SPMI, ACPI_SIG_TCPA,
> >> -     ACPI_SIG_UEFI, ACPI_SIG_WAET, ACPI_SIG_WDAT, ACPI_SIG_WDDT,
> >> -     ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT,
> >> -     ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, NULL };
> >
> > Why is this table made a stack variable?  What's the benefit of doing
> > that?
> 
> so I do need to switch global variables to phys and access it.

What Tejun means is that it should be marked "static" within
acpi_initrd_override(), so we don't have to build a copy on the stack
at runtime each time acpi_initrd_override() is called.

While we're there, it should be __initdata also.
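
To illustrate the suggestion above, here is a minimal user-space sketch, not the actual patch. The helper name sig_is_known() and the shortened signature list are made up for illustration, and __initdata is defined away so the snippet builds outside the kernel. For a const table, the kernel's __initconst annotation is probably the better fit.

```c
#include <stddef.h>
#include <string.h>

/*
 * In the kernel this annotation comes from <linux/init.h>. It is defined
 * empty here (assumption) so the sketch compiles in user space.
 */
#define __initdata

/* Hypothetical helper: returns 1 if sig matches a known 4-byte signature. */
int sig_is_known(const char *sig)
{
	/*
	 * "static" keeps a single copy of the table in the image, so it is
	 * not rebuilt on the stack each time the function is called.
	 */
	static const char * const table_sigs[] __initdata = {
		"SRAT", "SLIT", "DSDT", "FADT", NULL
	};
	size_t i;

	for (i = 0; table_sigs[i]; i++)
		if (!memcmp(sig, table_sigs[i], 4))
			return 1;
	return 0;
}
```

Note that a static table lives at its link-time (virtual) address, which matters for the 32bit flat mode caller discussed elsewhere in this thread.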

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 11/14] x86, acpi, numa: split SLIT handling out
  2013-03-08  6:46   ` Tejun Heo
@ 2013-03-08  7:18     ` Yinghai Lu
  2013-03-08  7:19       ` Tejun Heo
  0 siblings, 1 reply; 55+ messages in thread
From: Yinghai Lu @ 2013-03-08  7:18 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel, Rafael J. Wysocki,
	linux-acpi

On Thu, Mar 7, 2013 at 10:46 PM, Tejun Heo <tj@kernel.org> wrote:
> On Thu, Mar 07, 2013 at 08:58:37PM -0800, Yinghai Lu wrote:
>> +void __init acpi_numa_init_only_slit(void)
>> +{
>> +     /* SLIT: System Locality Information Table */
>> +     acpi_table_parse(ACPI_SIG_SLIT, acpi_parse_slit);
>> +}
>> +
>> +static int __init __acpi_numa_init(bool with_slit)
>>  {
>>       int cnt = 0;
>
> Hmmm.... how about just having the following two functions
>
>         acpi_numa_init_srat();
>         acpi_numa_init_slit();

ok.


> and update both x86 and ia64 to use the two functions?

ia64 likes to call things in this sequence:
acpi_numa_init()
parse srat
parse slit
then
acpi_numa_arch_fixup()

In this arch_fixup, it will try to fill in a dummy distance_matrix.

So I would like to keep acpi_numa_init ...

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 11/14] x86, acpi, numa: split SLIT handling out
  2013-03-08  7:18     ` Yinghai Lu
@ 2013-03-08  7:19       ` Tejun Heo
  2013-03-08  7:33         ` Yinghai Lu
  0 siblings, 1 reply; 55+ messages in thread
From: Tejun Heo @ 2013-03-08  7:19 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel, Rafael J. Wysocki,
	linux-acpi

On Thu, Mar 7, 2013 at 11:18 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> ia64 likes to call things in this sequence:
> acpi_numa_init()
> parse srat
> parse slit
> then
> acpi_numa_arch_fixup()
>
> In this arch_fixup, it will try to fill in a dummy distance_matrix.
>
> So I would like to keep acpi_numa_init ...

Can't it just call acpi_numa_init_srat() and then init_slit()?  What
am I missing?

-- 
tejun

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 04/14] x86, ACPI: make acpi override finding work with 32bit flat mode
  2013-03-08  7:06       ` Tejun Heo
@ 2013-03-08  7:25         ` Yinghai Lu
  2013-03-08  7:28           ` Tejun Heo
  0 siblings, 1 reply; 55+ messages in thread
From: Yinghai Lu @ 2013-03-08  7:25 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Yu, Fenghua, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Morton, Thomas Renninger, Tang Chen, linux-kernel,
	Pekka Enberg, Jacob Shin, Rafael J. Wysocki, linux-acpi

On Thu, Mar 7, 2013 at 11:06 PM, Tejun Heo <tj@kernel.org> wrote:
> Hello, Yinghai.
>
> On Thu, Mar 07, 2013 at 10:57:21PM -0800, Yinghai Lu wrote:
>> On Thu, Mar 7, 2013 at 9:50 PM, Tejun Heo <tj@kernel.org> wrote:
>> > On Thu, Mar 07, 2013 at 08:58:30PM -0800, Yinghai Lu wrote:
>> >> We will find acpi tables in initrd during head_32.S in 32bit flat mode.
>> >>
>> >> So need acpi_initrd_override_find could take phys directly.
>> >
>> > The patch description doesn't explain even half of what's going on.
>>
>> hope HPA could understand.
>>
>> Access initrd before relocate_initrd and init_memory mapping.
>
> I really hope the changelogs were better.  Eh well...
>
>> >> -/* All but ACPI_SIG_RSDP and ACPI_SIG_FACS: */
>> >> -static const char * const table_sigs[] = {
>> >> -     ACPI_SIG_BERT, ACPI_SIG_CPEP, ACPI_SIG_ECDT, ACPI_SIG_EINJ,
>> >> -     ACPI_SIG_ERST, ACPI_SIG_HEST, ACPI_SIG_MADT, ACPI_SIG_MSCT,
>> >> -     ACPI_SIG_SBST, ACPI_SIG_SLIT, ACPI_SIG_SRAT, ACPI_SIG_ASF,
>> >> -     ACPI_SIG_BOOT, ACPI_SIG_DBGP, ACPI_SIG_DMAR, ACPI_SIG_HPET,
>> >> -     ACPI_SIG_IBFT, ACPI_SIG_IVRS, ACPI_SIG_MCFG, ACPI_SIG_MCHI,
>> >> -     ACPI_SIG_SLIC, ACPI_SIG_SPCR, ACPI_SIG_SPMI, ACPI_SIG_TCPA,
>> >> -     ACPI_SIG_UEFI, ACPI_SIG_WAET, ACPI_SIG_WDAT, ACPI_SIG_WDDT,
>> >> -     ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT,
>> >> -     ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, NULL };
>> >
>> > Why is this table made a stack variable?  What's the benefit of doing
>> > that?
>>
>> so I do need to switch global variables to phys and access it.
>
> I can't really understand what your response means.  Can you please
> elaborate?

Sorry, I missed a NOT.

So I do NOT need to switch global variables from kernel virtual address
to physical address in order to access them in 32bit flat mode.

>
>> > Is it really necessary to make the function take both virtual and
>> > physical addresses?  Can't we just make the function take phys_addr_t
>> > and update everyone to call with physaddr?  Also @is_phys isn't simple
>> > address switch.  It also changes error reporting.  If you're gonna
>> > keep @is_phys, let's at least write up a function comment explaining
>> > what's going on and why we need it.  But, really, if at all possible,
>> > let's change the function to take single type of argument and
>> > predicate error message printing on something else (e.g. early printk
>> > initialized or whatever).
>>
>> yes, one for 32bit from head_32.S, phys.
>> one for 64bit from head64.c. with _va.
>
> head64.c can't call with phys?  Why not?

HPA's #PF-based early page table setup only handles kernel low mapping
addresses.

After reset_early_page_tables, only the kernel high mapping is there,
and the other low mappings are supplied via the #PF handler.

>
>> Not sure if I could use early_printk from head_32.S, as Fenghua does
>> not print out
>> from microcode updating early in the same parts.
>
> ISTR it works but it doesn't have to (although it would be much nicer
> if it did).  You can test whether printk is online and skip if it
> isn't online yet.

ok, will give it a try.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 04/14] x86, ACPI: make acpi override finding work with 32bit flat mode
  2013-03-08  7:25         ` Yinghai Lu
@ 2013-03-08  7:28           ` Tejun Heo
  0 siblings, 0 replies; 55+ messages in thread
From: Tejun Heo @ 2013-03-08  7:28 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Yu, Fenghua, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Morton, Thomas Renninger, Tang Chen, linux-kernel,
	Pekka Enberg, Jacob Shin, Rafael J. Wysocki, linux-acpi

On Thu, Mar 07, 2013 at 11:25:14PM -0800, Yinghai Lu wrote:
> >> > Why is this table made a stack variable?  What's the benefit of doing
> >> > that?
> >>
> >> so I do need to switch global variables to phys and access it.
> >
> > I can't really understand what your response means.  Can you please
> > elaborate?
> 
> sorry, I missed NOT.
> 
> so I do NOT need to switch global variables from kernel virtual addr
> to phys address and access it
> in 32bit flat mode.

Ah, okay, so the function is called with a completely different
address mode and so you actually want to build the table on stack so
that you don't have to flip the address mode for the global address.
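
The address-mode point can be sketched in user space as follows. This is only an illustration under stated assumptions: the 0xC0000000 base corresponds to a common 3G/1G split and is configuration dependent, and sketch_virt_to_phys() is a hypothetical name, not a kernel function. Before paging is enabled, a global's link-time address is virtual, so code running in 32bit flat mode would have to subtract the base before dereferencing it.

```c
#include <stdint.h>

/* Assumed 3G/1G split; the real value depends on the kernel config. */
#define SKETCH_PAGE_OFFSET 0xC0000000u

/* Hypothetical helper: link-time (virtual) address -> physical address. */
uint32_t sketch_virt_to_phys(uint32_t vaddr)
{
	return vaddr - SKETCH_PAGE_OFFSET;
}
```

As described above, building the table on the stack is meant to sidestep this translation for the table itself.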

> >> yes, one for 32bit from head_32.S, phys.
> >> one for 64bit from head64.c. with _va.
> >
> > head64.c can't call with phys?  Why not?
> 
> HPA's #PF set up page table only handle kernel low mapping address.
> 
> and after reset_early_page_tables, only kernel high mapping address is
> there. and other low mapping will be supported via #PF handler.

Okay, it now makes sense.  Ah.... You'll definitely need a lot of
documentation explaining what's going on.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 11/14] x86, acpi, numa: split SLIT handling out
  2013-03-08  7:19       ` Tejun Heo
@ 2013-03-08  7:33         ` Yinghai Lu
  0 siblings, 0 replies; 55+ messages in thread
From: Yinghai Lu @ 2013-03-08  7:33 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel, Rafael J. Wysocki,
	linux-acpi

On Thu, Mar 7, 2013 at 11:19 PM, Tejun Heo <tj@kernel.org> wrote:
> On Thu, Mar 7, 2013 at 11:18 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> ia64 likes to call things in this sequence:
>> acpi_numa_init()
>> parse srat
>> parse slit
>> then
>> acpi_numa_arch_fixup()
>>
>> In this arch_fixup, it will try to fill in a dummy distance_matrix.
>>
>> So I would like to keep acpi_numa_init ...
>
> Can't it just call acpi_numa_init_srat() and then init_slit()?  What
> am I missing?

Yes, but that means replacing the acpi_numa_init call
in arch/ia64/kernel/setup.c

with
acpi_numa_init_srat()
acpi_numa_init_slit()
acpi_numa_arch_fixup()

The current code has
acpi_numa_init calling acpi_numa_arch_fixup()
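
The proposed ia64 ordering can be sketched with stand-in functions. This is a user-space illustration only: the three functions below are stubs that just record the call order, and ia64_numa_setup_sketch() is a hypothetical name, not code from arch/ia64.

```c
#include <string.h>

/* Stubs (assumption): each one only appends its name to a log. */
static char call_log[64];

static void acpi_numa_init_srat(void)  { strcat(call_log, "srat,"); }
static void acpi_numa_init_slit(void)  { strcat(call_log, "slit,"); }
static void acpi_numa_arch_fixup(void) { strcat(call_log, "fixup"); }

/* What the ia64 setup path could call instead of acpi_numa_init(). */
const char *ia64_numa_setup_sketch(void)
{
	call_log[0] = '\0';
	acpi_numa_init_srat();
	acpi_numa_init_slit();
	/* The fixup runs last so it can see both SRAT and SLIT results. */
	acpi_numa_arch_fixup();
	return call_log;
}
```

The design point is that the fixup moves out of the common parsing function and into the arch caller, so x86 can call the SRAT and SLIT steps at different times.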

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 14/14] x86, mm: Put pagetable on local node ram
  2013-03-08  7:01   ` Tejun Heo
@ 2013-03-08  7:44     ` Yinghai Lu
  0 siblings, 0 replies; 55+ messages in thread
From: Yinghai Lu @ 2013-03-08  7:44 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Tang Chen, linux-kernel, Pekka Enberg,
	Jacob Shin, Konrad Rzeszutek Wilk

On Thu, Mar 7, 2013 at 11:01 PM, Tejun Heo <tj@kernel.org> wrote:
> On Thu, Mar 07, 2013 at 08:58:40PM -0800, Yinghai Lu wrote:
>> If node with ram is hotplugable, local node mem for page table and vmemmap
>> should be on that node ram.
>>
>> This patch is some kind of refreshment of
>> | commit 1411e0ec3123ae4c4ead6bfc9fe3ee5a3ae5c327
>> | Date:   Mon Dec 27 16:48:17 2010 -0800
>> |
>> |    x86-64, numa: Put pgtable to local node memory
>> That was reverted before.
>>
>> We have reason to reintroduce it to make memory hotplug work.
>>
>> Split calling of init_mem_mapping into early_initmem_info
>> for nodes after we get numa info there.
>>
>> First node will be low range.
>> Need to rework alloc_low_pages to alloc page table in following order:
>>       BRK, local node, low range
>>
>> Still only load_cr3 one time, otherwise we would break xen 64bit again.
>
> Hmmm... can you please split this patch further?  init_mem_mapping()
> change can be separated, no?

will try to split it out.

> Also, comments are disturbingly missing.
> How are other people reading the code supposed to know what it's
> trying to achieve why and how?  Hmmm... we're also likely to end up
> with smaller mapping for misaligned NUMA configurations (I think my
> test machine is like that).  Is it guaranteed that the top level ends
> up in the first node?  It really needs documentation.

Yes. To really get memory hotplug working, we will need to trim the node
alignment to 1G in memblock and numa_meminfo.

We also need to put the pgd page in the low range (first node) if a 512G
block crosses a node boundary.
For example: if node2 is [256g, 1024g), the pgd for 256g-512g must stay on node0
and the one for 512g-1024g could stay on node2.
Or just put all PGD pages in the low range (first node).
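
The 512G example can be checked with a small user-space sketch. It assumes 4-level paging, where one top-level entry spans 512G, and the names GiB, PGDIR_SPAN and pgd_block_within_node() are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define GiB        (1ULL << 30)
#define PGDIR_SPAN (512 * GiB)	/* range covered by one top-level entry */

/*
 * Returns true if the 512G-aligned block containing addr lies entirely
 * inside the node range [node_start, node_end).
 */
bool pgd_block_within_node(uint64_t addr, uint64_t node_start,
			   uint64_t node_end)
{
	uint64_t block_start = addr & ~(PGDIR_SPAN - 1);
	uint64_t block_end = block_start + PGDIR_SPAN;

	return block_start >= node_start && block_end <= node_end;
}
```

With node2 at [256g, 1024g), the block holding 256g starts at 0 and so is shared with the first node, while the block holding 512g sits fully inside node2.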

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 14/14] x86, mm: Put pagetable on local node ram
  2013-03-08  4:58 ` [PATCH 14/14] x86, mm: Put pagetable on local node ram Yinghai Lu
  2013-03-08  7:01   ` Tejun Heo
@ 2013-03-08  8:20   ` Tang Chen
  2013-03-08 17:25     ` Yinghai Lu
  1 sibling, 1 reply; 55+ messages in thread
From: Tang Chen @ 2013-03-08  8:20 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, linux-kernel, Tejun Heo, Pekka Enberg,
	Jacob Shin, Konrad Rzeszutek Wilk

Hi Yinghai,

On 03/08/2013 12:58 PM, Yinghai Lu wrote:
......
>   	/* xen has big range in reserved near end of ram, skip it at first.*/
> -	addr = memblock_find_in_range(ISA_END_ADDRESS, end, PMD_SIZE, PMD_SIZE);
> +	addr = memblock_find_in_range(begin, end, PMD_SIZE, PMD_SIZE);

Found that the latest code here is:

        addr = memblock_find_in_range(ISA_END_ADDRESS, end, PMD_SIZE,
                         PAGE_SIZE);
                         ^^^^^^^^^

The "align" is PAGE_SIZE, not PMD_SIZE. Not sure if it is a problem. :)
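
To show why the align argument matters, here is a sketch of the alignment arithmetic only. The real memblock search is more involved, so this is not its implementation; sketch_align_up() and the SKETCH_* constants are hypothetical names, with values matching 4K pages and 2M PMDs on x86.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 0x1000ULL	/* 4K page */
#define SKETCH_PMD_SIZE  0x200000ULL	/* 2M, one PMD entry */

/* Round addr up to the next multiple of align (align is a power of two). */
uint64_t sketch_align_up(uint64_t addr, uint64_t align)
{
	return (addr + align - 1) & ~(align - 1);
}
```

Starting from 0x100000 (1M), a PAGE_SIZE alignment leaves the address at 1M, which is not 2M aligned, while a PMD_SIZE alignment moves it to 0x200000.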

Thanks. :)



^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 14/14] x86, mm: Put pagetable on local node ram
  2013-03-08  8:20   ` Tang Chen
@ 2013-03-08 17:25     ` Yinghai Lu
  0 siblings, 0 replies; 55+ messages in thread
From: Yinghai Lu @ 2013-03-08 17:25 UTC (permalink / raw)
  To: Tang Chen
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, linux-kernel, Tejun Heo, Pekka Enberg,
	Jacob Shin, Konrad Rzeszutek Wilk

On Fri, Mar 8, 2013 at 12:20 AM, Tang Chen <tangchen@cn.fujitsu.com> wrote:
> Hi Yinghai,
>
> On 03/08/2013 12:58 PM, Yinghai Lu wrote:
> ......
>
>>         /* xen has big range in reserved near end of ram, skip it at first.*/
>> -       addr = memblock_find_in_range(ISA_END_ADDRESS, end, PMD_SIZE, PMD_SIZE);
>> +       addr = memblock_find_in_range(begin, end, PMD_SIZE, PMD_SIZE);
>
>
> Found that the latest code here is:
>
>         addr = memblock_find_in_range(ISA_END_ADDRESS, end, PMD_SIZE,
>                          PAGE_SIZE);
>                          ^^^^^^^^^
>
> The "align" is PAGE_SIZE, not PMD_SIZE. Not sure if it is a problem. :)
>

Yes, it is PMD_SIZE.

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=98e7a989979b185f49e86ddaed2ad6890299d9f0

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 04/14] x86, ACPI: make acpi override finding work with 32bit flat mode
  2013-03-08  6:57     ` Yinghai Lu
  2013-03-08  7:06       ` Tejun Heo
  2013-03-08  7:16       ` Andrew Morton
@ 2013-03-08 21:25       ` Thomas Gleixner
  2 siblings, 0 replies; 55+ messages in thread
From: Thomas Gleixner @ 2013-03-08 21:25 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Tejun Heo, Yu, Fenghua, Ingo Molnar, H. Peter Anvin,
	Andrew Morton, Thomas Renninger, Tang Chen, linux-kernel,
	Pekka Enberg, Jacob Shin, Rafael J. Wysocki, linux-acpi

Yinghai,

On Thu, 7 Mar 2013, Yinghai Lu wrote:

> On Thu, Mar 7, 2013 at 9:50 PM, Tejun Heo <tj@kernel.org> wrote:
> > On Thu, Mar 07, 2013 at 08:58:30PM -0800, Yinghai Lu wrote:
> >> We will find acpi tables in initrd during head_32.S in 32bit flat mode.
> >>
> >> So need acpi_initrd_override_find could take phys directly.
> >
> > The patch description doesn't explain even half of what's going on.
> 
> hope HPA could understand.

What the heck? Is HPA your personal decryptor?

I'm really tired of this nonsense. There is a track record of people
complaining about your completely useless and sloppy changelogs and
your unwillingness to properly explain and discuss your patches.

Just to make it clear. We all are able and willing to cope with
developers who are not native English speakers and have limited
language skills. I'm rewriting changelogs on a regular basis without
complaining about that.

But that's not the problem at hand. I met you personally at KS and
found out that your English skills are not those of the random Chinese
person. You just prefer to hide your excellent language skills behind
your Chinese name.

That's an utter waste of time and resources!

Please grow up and use your language and technical skills in a way
which does not offend the people you need to interact with.

Seriously, if you can't convince yourself that a proper communication
with maintainers and other developers is a primary task, please don't
be surprised if you end up on the /dev/null filters and auto-NAK bots
of quite a bunch of affected people.

That would be a sad outcome, really. But that's solely your decision.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 01/14] x86, ACPI, mm: Kill max_low_pfn_mapped
  2013-03-08  6:09             ` H. Peter Anvin
@ 2013-03-11 22:50               ` Daniel Vetter
  2013-03-11 23:09                 ` Chris Wilson
  2013-03-12  1:51                 ` H. Peter Anvin
  0 siblings, 2 replies; 55+ messages in thread
From: Daniel Vetter @ 2013-03-11 22:50 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Tejun Heo, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	Andrew Morton, Thomas Renninger, Tang Chen, linux-kernel,
	Rafael J. Wysocki, Daniel Vetter, David Airlie, Jacob Shin,
	linux-acpi, dri-devel

On Thu, Mar 07, 2013 at 10:09:26PM -0800, H. Peter Anvin wrote:
> On 03/07/2013 09:28 PM, Tejun Heo wrote:
> > On Thu, Mar 7, 2013 at 9:27 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> >> They are not using memblock_find_in_range(), so 1ULL<< will not help.
> >>
> >> Really hope i915 drm guys could clean that hacks.
> > 
> > The code isn't being used.  Just leave it alone.  Maybe add a comment.
> >  The change is just making things more confusing.
> > 
> 
> Indeed, but...
> 
> Daniel: can you guys clean this up or can we just remove the #if 0 clause?

I guess we could just put this into a comment explaining where stolen
memory for the gfx devices is at on gen2. But tbh I don't mind if we just
keep the #if 0 code around. For all newer platforms we can get at that
offset through mch bar registers, so I don't really care.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 01/14] x86, ACPI, mm: Kill max_low_pfn_mapped
  2013-03-11 22:50               ` Daniel Vetter
@ 2013-03-11 23:09                 ` Chris Wilson
  2013-03-12  1:51                 ` H. Peter Anvin
  1 sibling, 0 replies; 55+ messages in thread
From: Chris Wilson @ 2013-03-11 23:09 UTC (permalink / raw)
  To: H. Peter Anvin, Tejun Heo, Yinghai Lu, Thomas Gleixner,
	Ingo Molnar, Andrew Morton, Thomas Renninger, Tang Chen,
	linux-kernel, Rafael J. Wysocki, David Airlie, Jacob Shin,
	linux-acpi, dri-devel

On Mon, Mar 11, 2013 at 11:50:48PM +0100, Daniel Vetter wrote:
> On Thu, Mar 07, 2013 at 10:09:26PM -0800, H. Peter Anvin wrote:
> > On 03/07/2013 09:28 PM, Tejun Heo wrote:
> > > On Thu, Mar 7, 2013 at 9:27 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> > >> They are not using memblock_find_in_range(), so 1ULL<< will not help.
> > >>
> > >> Really hope i915 drm guys could clean that hacks.
> > > 
> > > The code isn't being used.  Just leave it alone.  Maybe add a comment.
> > >  The change is just making things more confusing.
> > > 
> > 
> > Indeed, but...
> > 
> > Daniel: can you guys clean this up or can we just remove the #if 0 clause?
> 
> I guess we could just put this into a comment explaining where stolen
> memory for the gfx devices is at on gen2. But tbh I don't mind if we just
> keep the #if 0 code around. For all newer platforms we can get at that
> offset through mch bar registers, so I don't really care.

If you want to keep the comment accurate
s/max_low_pfn_mapped/max_pfn_mapped/ as the machines in question don't
support more than 4GiB anyway. Or you can help address the underlying
issue of figuring out how we can derive the location of the stolen
memory which is reserved by the BIOS but not communicated to the OS.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 01/14] x86, ACPI, mm: Kill max_low_pfn_mapped
  2013-03-11 22:50               ` Daniel Vetter
  2013-03-11 23:09                 ` Chris Wilson
@ 2013-03-12  1:51                 ` H. Peter Anvin
  1 sibling, 0 replies; 55+ messages in thread
From: H. Peter Anvin @ 2013-03-12  1:51 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Tejun Heo, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	Andrew Morton, Thomas Renninger, Tang Chen, linux-kernel,
	Rafael J. Wysocki, Daniel Vetter, David Airlie, Jacob Shin,
	linux-acpi, dri-devel

The problem is that the code will be broken, and so it makes no sense.  The #if 0 is really confusing.

Daniel Vetter <daniel@ffwll.ch> wrote:

>On Thu, Mar 07, 2013 at 10:09:26PM -0800, H. Peter Anvin wrote:
>> On 03/07/2013 09:28 PM, Tejun Heo wrote:
>> > On Thu, Mar 7, 2013 at 9:27 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> >> They are not using memblock_find_in_range(), so 1ULL<< will not help.
>> >>
>> >> Really hope i915 drm guys could clean that hacks.
>> > 
>> > The code isn't being used.  Just leave it alone.  Maybe add a comment.
>> >  The change is just making things more confusing.
>> > 
>> 
>> Indeed, but...
>> 
>> Daniel: can you guys clean this up or can we just remove the #if 0 clause?
>
>I guess we could just put this into a comment explaining where stolen
>memory for the gfx devices is at on gen2. But tbh I don't mind if we just
>keep the #if 0 code around. For all newer platforms we can get at that
>offset through mch bar registers, so I don't really care.
>-Daniel

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2013-03-12  1:56 UTC | newest]

Thread overview: 55+ messages
2013-03-08  4:58 [PATCH 00/14] x86, ACPI, numa: Parse numa info early Yinghai Lu
2013-03-08  4:58 ` [PATCH 01/14] x86, ACPI, mm: Kill max_low_pfn_mapped Yinghai Lu
2013-03-08  5:10   ` Tejun Heo
2013-03-08  5:22     ` Yinghai Lu
2013-03-08  5:25       ` Tejun Heo
2013-03-08  5:27         ` Yinghai Lu
2013-03-08  5:28           ` Tejun Heo
2013-03-08  6:09             ` H. Peter Anvin
2013-03-11 22:50               ` Daniel Vetter
2013-03-11 23:09                 ` Chris Wilson
2013-03-12  1:51                 ` H. Peter Anvin
2013-03-08  4:58 ` [PATCH 02/14] x86, ACPI: Split find/copy from acpi_initrd_override Yinghai Lu
2013-03-08  5:33   ` Tejun Heo
2013-03-08  6:47     ` Yinghai Lu
2013-03-08  4:58 ` [PATCH 03/14] x86, ACPI: store override acpi tables phys addr Yinghai Lu
2013-03-08  5:36   ` Tejun Heo
2013-03-08  6:49     ` Yinghai Lu
2013-03-08  7:08       ` Tejun Heo
2013-03-08  4:58 ` [PATCH 04/14] x86, ACPI: make acpi override finding work with 32bit flat mode Yinghai Lu
2013-03-08  5:50   ` Tejun Heo
2013-03-08  6:57     ` Yinghai Lu
2013-03-08  7:06       ` Tejun Heo
2013-03-08  7:25         ` Yinghai Lu
2013-03-08  7:28           ` Tejun Heo
2013-03-08  7:16       ` Andrew Morton
2013-03-08 21:25       ` Thomas Gleixner
2013-03-08  4:58 ` [PATCH 05/14] x86, ACPI: Find acpi tables in initrd early at head_32.S/head64.c Yinghai Lu
2013-03-08  5:57   ` Tejun Heo
2013-03-08  7:02     ` Yinghai Lu
2013-03-08  7:07       ` Tejun Heo
2013-03-08  4:58 ` [PATCH 06/14] x86, mm, numa: Move successful path handling code later Yinghai Lu
2013-03-08  6:04   ` Tejun Heo
2013-03-08  7:03     ` Yinghai Lu
2013-03-08  4:58 ` [PATCH 07/14] x86, mm, numa: call numa_meminfo_cover_memory() early Yinghai Lu
2013-03-08  4:58 ` [PATCH 08/14] x86, mm, numa: use numa_meminfo to check node_map_pfn alignment Yinghai Lu
2013-03-08  6:26   ` Tejun Heo
2013-03-08  7:05     ` Yinghai Lu
2013-03-08  4:58 ` [PATCH 09/14] x86, mm, numa: set memblock nid later Yinghai Lu
2013-03-08  6:28   ` Tejun Heo
2013-03-08  7:11     ` Yinghai Lu
2013-03-08  4:58 ` [PATCH 10/14] x86, mm, numa: Move emulation handling down Yinghai Lu
2013-03-08  6:42   ` Tejun Heo
2013-03-08  7:13     ` Yinghai Lu
2013-03-08  4:58 ` [PATCH 11/14] x86, acpi, numa: split SLIT handling out Yinghai Lu
2013-03-08  6:46   ` Tejun Heo
2013-03-08  7:18     ` Yinghai Lu
2013-03-08  7:19       ` Tejun Heo
2013-03-08  7:33         ` Yinghai Lu
2013-03-08  4:58 ` [PATCH 12/14] x86, mm, numa: Add early_initmem_init() stub Yinghai Lu
2013-03-08  4:58 ` [PATCH 13/14] x86, mm: Parse numa info early Yinghai Lu
2013-03-08  4:58 ` [PATCH 14/14] x86, mm: Put pagetable on local node ram Yinghai Lu
2013-03-08  7:01   ` Tejun Heo
2013-03-08  7:44     ` Yinghai Lu
2013-03-08  8:20   ` Tang Chen
2013-03-08 17:25     ` Yinghai Lu
