* [PATCH part4 0/4] Parse SRAT memory affinities earlier.
From: Tang Chen @ 2013-08-08  9:41 UTC
  To: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, tj,
	trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

Linux currently cannot migrate pages used by the kernel, so memory that is
in use by the kernel cannot be hot-removed.

To prevent the boot memory allocator (memblock) from allocating hotpluggable
memory for the kernel, we have to parse SRAT earlier.

At boot time, the current kernel parses ACPI tables as follows:

setup_arch()
 |->memblock_x86_fill()            /* memblock is ready. */
 |......
 |->acpi_initrd_override()         /* Find all tables specified by users in initrd,
 |                                    and store them in acpi_tables_addr array. */
 |......
 |->acpi_boot_table_init()         /* Find all tables in firmware and install them
                                      into acpi_gbl_root_table_list. Check acpi_tables_addr,
                                      if any table needs to be overridden, override it. */

The previous part3 patches changed it as follows:

setup_arch()
 |->memblock_x86_fill()            /* memblock is ready. */
 |......
 |->early_acpi_boot_table_init()   /* Find all tables in firmware and install them 
 |                                    into acpi_gbl_root_table_list. No override. */
 |......
 |->acpi_initrd_override()         /* Find all tables specified by users in initrd,
 |                                    and store them in acpi_tables_addr array. */
 |......
 |->acpi_boot_table_init()         /* Check acpi_tables_addr, if any table needs to 
                                      be overridden, override it. */

Now that SRAT can be obtained earlier, this patch-set does the following:

1. Try to find a user-specified SRAT in the initrd file and, if there is
   one, use it.
2. If there is no user-specified SRAT in the initrd file, find the SRAT
   provided by firmware.
3. Parse all memory affinities in SRAT, and find all hotpluggable memory.
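
Step 3 can be pictured with a small user-space sketch. It assumes the SRAT
has already been copied into a byte buffer and follows the ACPI SRAT layout:
a 36-byte table header plus 12 reserved bytes, then subtables, where a memory
affinity subtable has type 1, is 40 bytes long and keeps its flags at offset
28 (bit 0 enabled, bit 1 hot-pluggable). The names srat_find_hotplug(),
rd32()/rd64() and struct hp_range are hypothetical; the kernel itself works
with struct acpi_srat_mem_affinity and ACPI_SRAT_MEM_HOT_PLUGGABLE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed SRAT layout (ACPI spec): 36-byte header + 12 reserved bytes. */
#define SRAT_HDR_LEN		48
#define SRAT_TYPE_MEMORY	1
#define SRAT_MEM_ENABLED	(1u << 0)
#define SRAT_MEM_HOTPLUG	(1u << 1)

/* Hypothetical output record for one hotpluggable range. */
struct hp_range {
	uint64_t base;
	uint64_t size;
	uint32_t pxm;		/* proximity domain */
};

/* Read little-endian integers byte by byte, independent of host endianness. */
static uint32_t rd32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint64_t rd64(const uint8_t *p)
{
	return (uint64_t)rd32(p) | (uint64_t)rd32(p + 4) << 32;
}

/*
 * Walk the subtables of an SRAT of total length 'len' and collect up to
 * 'max' enabled, hotpluggable memory affinity entries. Returns the count.
 */
static size_t srat_find_hotplug(const uint8_t *srat, size_t len,
				struct hp_range *out, size_t max)
{
	size_t off = SRAT_HDR_LEN, n = 0;

	while (off + 2 <= len) {
		uint8_t type = srat[off];
		uint8_t sublen = srat[off + 1];

		if (sublen < 2 || off + sublen > len)
			break;			/* malformed entry: stop */

		if (type == SRAT_TYPE_MEMORY && sublen >= 40) {
			const uint8_t *e = srat + off;
			uint32_t flags = rd32(e + 28);

			if ((flags & SRAT_MEM_ENABLED) &&
			    (flags & SRAT_MEM_HOTPLUG) && n < max) {
				out[n].pxm  = rd32(e + 2);
				out[n].base = rd64(e + 8);
				out[n].size = rd64(e + 16);
				n++;
			}
		}
		off += sublen;
	}
	return n;
}
```

Entries that are disabled or not marked hot-pluggable are simply skipped, so
the result lists only the ranges memblock should later avoid.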

Later patches will improve memblock so that it marks hotpluggable memory
and skips it when allocating.
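
To illustrate what "mark and skip" could mean, here is a toy top-down
allocator over a region list in which each region carries a hotplug mark.
This is not memblock's API; struct region, toy_alloc() and REGION_HOTPLUG are
hypothetical names used only for this sketch, and, as with memblock, 0 is
used to signal failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REGION_HOTPLUG	0x1	/* hypothetical "do not give to kernel" mark */

struct region {
	uint64_t base;
	uint64_t size;
	uint64_t used;		/* bytes already handed out from the top */
	unsigned int flags;
};

/*
 * Allocate 'size' bytes aligned to 'align' (a power of two), searching
 * regions from the highest index down and skipping any region marked
 * REGION_HOTPLUG. Returns the physical address, or 0 on failure.
 */
static uint64_t toy_alloc(struct region *r, size_t nr, uint64_t size,
			  uint64_t align)
{
	for (size_t i = nr; i-- > 0; ) {
		uint64_t top, start;

		if (r[i].flags & REGION_HOTPLUG)
			continue;	/* never place kernel data here */

		top = r[i].base + r[i].size - r[i].used;
		if (top - r[i].base < size)
			continue;	/* not enough room left */
		start = (top - size) & ~(align - 1);
		if (start < r[i].base)
			continue;	/* alignment pushed us below the region */

		r[i].used = r[i].base + r[i].size - start;
		return start;
	}
	return 0;
}
```

With a fixed node below 4 GiB and a hotpluggable node above it, a top-down
request lands at the top of the fixed node instead of on removable memory.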


Tang Chen (4):
  x86: Make get_ramdisk_{image|size}() global.
  x86, acpica, acpi: Try to find if SRAT is overrided earlier.
  x86, acpica, acpi: Try to find SRAT in firmware earlier.
  x86, acpi, numa, mem_hotplug: Find hotpluggable memory in SRAT memory
    affinities.

 arch/x86/include/asm/setup.h   |   21 +++++
 arch/x86/kernel/setup.c        |   28 +++----
 drivers/acpi/acpica/tbxface.c  |   32 ++++++++
 drivers/acpi/osl.c             |  168 ++++++++++++++++++++++++++++++++++++++++
 include/acpi/acpixf.h          |    4 +
 include/linux/acpi.h           |   20 ++++-
 include/linux/memory_hotplug.h |    2 +
 mm/memory_hotplug.c            |   47 +++++++++++-
 8 files changed, 301 insertions(+), 21 deletions(-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* [PATCH part4 1/4] x86: Make get_ramdisk_{image|size}() global.
From: Tang Chen @ 2013-08-08  9:41 UTC
  To: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, tj,
	trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

The following patches need to call get_ramdisk_{image|size}() to get
the initrd file's address and size, so make these two functions
global.

v1 -> v2:
As tj suggested, make these two functions static inline in
arch/x86/include/asm/setup.h.
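
Both helpers assemble a 64-bit value from a 32-bit field in the setup header
and its 32-bit ext_* counterpart. A user-space sketch of the same arithmetic
follows; combine_lo_hi() is a hypothetical stand-in, since the real helpers
read boot_params directly. The (u64) cast matters: shifting a 32-bit value
by 32 is undefined behavior in C.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Stand-in for get_ramdisk_image()/get_ramdisk_size(): 'lo' plays the role
 * of boot_params.hdr.ramdisk_{image,size} and 'hi' the role of
 * boot_params.ext_ramdisk_{image,size}.
 */
static uint64_t combine_lo_hi(uint32_t lo, uint32_t hi)
{
	uint64_t v = lo;

	/* Widen before shifting: a 32-bit shift by 32 would be undefined. */
	v |= (uint64_t)hi << 32;
	return v;
}
```

For example, a low half of 0x37a00000 with an ext value of 1 describes an
initrd at 0x137a00000, above 4 GiB.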

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 arch/x86/include/asm/setup.h |   21 +++++++++++++++++++++
 arch/x86/kernel/setup.c      |   18 ------------------
 2 files changed, 21 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index b7bf350..cfdb55d 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -106,6 +106,27 @@ void *extend_brk(size_t size, size_t align);
 	RESERVE_BRK(name, sizeof(type) * entries)
 
 extern void probe_roms(void);
+
+#ifdef CONFIG_BLK_DEV_INITRD
+static inline u64 __init get_ramdisk_image(void)
+{
+	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
+
+	ramdisk_image |= (u64)boot_params.ext_ramdisk_image << 32;
+
+	return ramdisk_image;
+}
+
+static inline u64 __init get_ramdisk_size(void)
+{
+	u64 ramdisk_size = boot_params.hdr.ramdisk_size;
+
+	ramdisk_size |= (u64)boot_params.ext_ramdisk_size << 32;
+
+	return ramdisk_size;
+}
+#endif /* CONFIG_BLK_DEV_INITRD */
+
 #ifdef __i386__
 
 void __init i386_start_kernel(void);
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index fdb5a26..da44353 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -296,24 +296,6 @@ static void __init reserve_brk(void)
 }
 
 #ifdef CONFIG_BLK_DEV_INITRD
-
-static u64 __init get_ramdisk_image(void)
-{
-	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
-
-	ramdisk_image |= (u64)boot_params.ext_ramdisk_image << 32;
-
-	return ramdisk_image;
-}
-static u64 __init get_ramdisk_size(void)
-{
-	u64 ramdisk_size = boot_params.hdr.ramdisk_size;
-
-	ramdisk_size |= (u64)boot_params.ext_ramdisk_size << 32;
-
-	return ramdisk_size;
-}
-
 #define MAX_MAP_CHUNK	(NR_FIX_BTMAPS << PAGE_SHIFT)
 static void __init relocate_initrd(void)
 {
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH part4 1/4] x86: Make get_ramdisk_{image|size}() global.
@ 2013-08-08  9:41   ` Tang Chen
  0 siblings, 0 replies; 30+ messages in thread
From: Tang Chen @ 2013-08-08  9:41 UTC (permalink / raw)
  To: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, tj,
	trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

In the following patches, we need to call get_ramdisk_{image|size}()
to get initrd file's address and size. So make these two functions
global.

v1 -> v2:
As tj suggested, make these two function static inline in
arch/x86/include/asm/setup.h.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 arch/x86/include/asm/setup.h |   21 +++++++++++++++++++++
 arch/x86/kernel/setup.c      |   18 ------------------
 2 files changed, 21 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index b7bf350..cfdb55d 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -106,6 +106,27 @@ void *extend_brk(size_t size, size_t align);
 	RESERVE_BRK(name, sizeof(type) * entries)
 
 extern void probe_roms(void);
+
+#ifdef CONFIG_BLK_DEV_INITRD
+static inline u64 __init get_ramdisk_image(void)
+{
+	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
+
+	ramdisk_image |= (u64)boot_params.ext_ramdisk_image << 32;
+
+	return ramdisk_image;
+}
+
+static inline u64 __init get_ramdisk_size(void)
+{
+	u64 ramdisk_size = boot_params.hdr.ramdisk_size;
+
+	ramdisk_size |= (u64)boot_params.ext_ramdisk_size << 32;
+
+	return ramdisk_size;
+}
+#endif /* CONFIG_BLK_DEV_INITRD */
+
 #ifdef __i386__
 
 void __init i386_start_kernel(void);
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index fdb5a26..da44353 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -296,24 +296,6 @@ static void __init reserve_brk(void)
 }
 
 #ifdef CONFIG_BLK_DEV_INITRD
-
-static u64 __init get_ramdisk_image(void)
-{
-	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
-
-	ramdisk_image |= (u64)boot_params.ext_ramdisk_image << 32;
-
-	return ramdisk_image;
-}
-static u64 __init get_ramdisk_size(void)
-{
-	u64 ramdisk_size = boot_params.hdr.ramdisk_size;
-
-	ramdisk_size |= (u64)boot_params.ext_ramdisk_size << 32;
-
-	return ramdisk_size;
-}
-
 #define MAX_MAP_CHUNK	(NR_FIX_BTMAPS << PAGE_SHIFT)
 static void __init relocate_initrd(void)
 {
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH part4 2/4] x86, acpica, acpi: Try to find if SRAT is overrided earlier.
From: Tang Chen @ 2013-08-08  9:41 UTC
  To: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, tj,
	trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

Because of the direct mapping (va = pa + PAGE_OFFSET), Linux cannot migrate
pages used by the kernel, so any memory used by the kernel cannot be
hot-removed. Therefore, when using memory hotplug, we have to prevent the
kernel from using hotpluggable memory.
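
A worked example of why the direct mapping pins kernel pages: the kernel's
virtual address for a page is computed from its physical address, so moving
the page would change the address that every existing pointer holds. The
PAGE_OFFSET value below is the x86_64 direct-mapping base of this era and is
used only as an assumed constant; pa_to_va() is a hypothetical stand-in for
__va().

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86_64 direct-mapping base (pre-KASLR memory layout). */
#define PAGE_OFFSET_ASSUMED	0xffff880000000000ULL

/* Hypothetical stand-in for __va(): va = pa + PAGE_OFFSET. */
static uint64_t pa_to_va(uint64_t pa)
{
	return pa + PAGE_OFFSET_ASSUMED;
}
```

If a kernel object at pa 0x100000000 were copied to pa 0x2000000, pointers
to it would still hold pa_to_va(0x100000000), i.e. they would keep pointing
at the old page that is about to be removed.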

The ACPI table SRAT (System Resource Affinity Table) specifies which memory
is hotpluggable. Once SRAT is parsed, we know which memory is hotpluggable.
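
Once SRAT has been parsed, knowing which memory is hotpluggable amounts to
keeping a list of ranges and answering overlap queries against it. A minimal
sketch, with struct hotplug_range and range_is_hotpluggable() as hypothetical
names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one hotpluggable range taken from SRAT. */
struct hotplug_range {
	uint64_t base;
	uint64_t size;
};

/*
 * Return true if [start, start + len) overlaps any hotpluggable range,
 * i.e. if an allocation there could end up on removable memory.
 */
static bool range_is_hotpluggable(const struct hotplug_range *r, size_t nr,
				  uint64_t start, uint64_t len)
{
	uint64_t end = start + len;

	for (size_t i = 0; i < nr; i++) {
		uint64_t rend = r[i].base + r[i].size;

		/* Half-open intervals overlap iff each starts before the other ends. */
		if (start < rend && r[i].base < end)
			return true;
	}
	return false;
}
```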

Early in boot, SRAT has not been parsed yet, so the boot memory allocator,
memblock, may allocate any memory to the kernel. We therefore need SRAT
parsed before memblock starts allocating memory.

In this patch, we start parsing SRAT earlier, right after memblock is ready.

Tables such as SRAT are normally provided by firmware, but the
ACPI_INITRD_TABLE_OVERRIDE feature allows users to put their own tables in
initrd to override the ones from firmware. So to parse SRAT earlier, we also
need to do the SRAT override earlier.
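
An override table taken from initrd has to be sanity-checked before it can
replace the firmware copy. The sketch below shows the kind of checks
involved, using the ACPI rules that the signature occupies the first 4 bytes,
the total length the next 4 (little-endian), and that all bytes of a table,
including the checksum field at offset 9, sum to zero modulo 256.
check_override_table() is hypothetical and is not claimed to match
acpi_verify_table() exactly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ACPI_HDR_LEN	36	/* size of the common ACPI table header */

/*
 * Return 0 if 'buf' (of 'size' bytes, e.g. one file from the initrd cpio
 * archive) looks like a valid ACPI table with signature 'sig', -1 otherwise.
 */
static int check_override_table(const uint8_t *buf, size_t size,
				const char *sig)
{
	uint32_t length;
	uint8_t sum = 0;

	if (size < ACPI_HDR_LEN)
		return -1;
	if (memcmp(buf, sig, 4) != 0)
		return -1;		/* not the table we are looking for */

	length = (uint32_t)buf[4] | (uint32_t)buf[5] << 8 |
		 (uint32_t)buf[6] << 16 | (uint32_t)buf[7] << 24;
	if (length != size)
		return -1;		/* header length must match file size */

	for (size_t i = 0; i < size; i++)
		sum += buf[i];		/* wraps modulo 256 */
	return sum == 0 ? 0 : -1;
}
```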

First, we introduce early_acpi_override_srat() to check whether SRAT will be
overridden from initrd.

Second, we introduce find_hotpluggable_memory() to find and mark hotpluggable
memory. It first calls early_acpi_override_srat() to locate the override
SRAT, which describes the hotpluggable memory.
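
The search done by early_acpi_override_srat() can be modelled in user space
as a bounded scan over the override tables packed in initrd. In this sketch
the cpio archive is replaced by a hypothetical array of (signature, physical
address) entries, and MAX_OVERRIDE_TABLES stands in for ACPI_OVERRIDE_TABLES
(its value here is an assumption). The function returns 0 both when the list
ends and when the bound is reached without an SRAT, so a non-zero result
always means an SRAT was found.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_OVERRIDE_TABLES	64	/* assumed stand-in for ACPI_OVERRIDE_TABLES */

/* Hypothetical view of one table file found in the initrd. */
struct initrd_table {
	char sig[4];		/* not NUL-terminated */
	uint64_t paddr;
};

/*
 * Return the physical address of the first SRAT among the first
 * MAX_OVERRIDE_TABLES entries, or 0 if there is none.
 */
static uint64_t find_override_srat(const struct initrd_table *t, size_t nr)
{
	size_t limit = nr < MAX_OVERRIDE_TABLES ? nr : MAX_OVERRIDE_TABLES;

	for (size_t i = 0; i < limit; i++) {
		if (memcmp(t[i].sig, "SRAT", 4) == 0)
			return t[i].paddr;	/* found: stop scanning */
	}
	/* Ran out of entries or hit the bound: report "no override". */
	return 0;
}
```

A caller can then fall back to the firmware SRAT whenever this returns 0.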

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 arch/x86/kernel/setup.c        |   10 ++++++
 drivers/acpi/osl.c             |   61 ++++++++++++++++++++++++++++++++++++++++
 include/linux/acpi.h           |   14 ++++++++-
 include/linux/memory_hotplug.h |    2 +
 mm/memory_hotplug.c            |   25 +++++++++++++++-
 5 files changed, 109 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index da44353..36d7fe8 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1061,6 +1061,16 @@ void __init setup_arch(char **cmdline_p)
 	 */
 	early_acpi_boot_table_init();
 
+#ifdef CONFIG_ACPI_NUMA
+	/*
+	 * The Linux kernel cannot migrate kernel pages; as a result, memory used
+	 * by the kernel cannot be hot-removed. Find and mark hotpluggable
+	 * memory in memblock to prevent memblock from allocating hotpluggable
+	 * memory for the kernel.
+	 */
+	find_hotpluggable_memory();
+#endif
+
 	/*
 	 * The EFI specification says that boot service code won't be called
 	 * after ExitBootServices(). This is, in fact, a lie.
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 0043e9f..dcbca3e 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -48,6 +48,7 @@
 
 #include <asm/io.h>
 #include <asm/uaccess.h>
+#include <asm/setup.h>
 
 #include <acpi/acpi.h>
 #include <acpi/acpi_bus.h>
@@ -631,6 +632,66 @@ int __init acpi_verify_table(struct cpio_data *file,
 	return 0;
 }
 
+#ifdef CONFIG_ACPI_NUMA
+/*******************************************************************************
+ *
+ * FUNCTION:    early_acpi_override_srat
+ *
+ * RETURN:      Phys addr of SRAT on success, 0 on error.
+ *
+ * DESCRIPTION: Try to get the phys addr of SRAT in initrd.
+ *              The ACPI_INITRD_TABLE_OVERRIDE procedure is able to use tables
+ *              in initrd file to override the ones provided by firmware. This
+ *              function checks if there is a SRAT in initrd at early time. If
+ *              so, return the phys addr of the SRAT.
+ *
+ ******************************************************************************/
+phys_addr_t __init early_acpi_override_srat(void)
+{
+	int i;
+	u32 length;
+	long offset;
+	void *ramdisk_vaddr;
+	struct acpi_table_header *table;
+	struct cpio_data file;
+	unsigned long map_step = NR_FIX_BTMAPS << PAGE_SHIFT;
+	phys_addr_t ramdisk_image = get_ramdisk_image();
+	char cpio_path[32] = "kernel/firmware/acpi/";
+
+	if (!ramdisk_image || !get_ramdisk_size())
+		return 0;
+
+	/* Try to find if SRAT is overridden */
+	for (i = 0; i < ACPI_OVERRIDE_TABLES; i++) {
+		ramdisk_vaddr = early_ioremap(ramdisk_image, map_step);
+
+		file = find_cpio_data(cpio_path, ramdisk_vaddr,
+				      map_step, &offset);
+		if (!file.data) {
+			early_iounmap(ramdisk_vaddr, map_step);
+			return 0;
+		}
+
+		table = file.data;
+		length = table->length;
+
+		if (acpi_verify_table(&file, cpio_path, ACPI_SIG_SRAT)) {
+			ramdisk_image += offset;
+			early_iounmap(ramdisk_vaddr, map_step);
+			continue;
+		}
+
+		/* Found SRAT */
+		early_iounmap(ramdisk_vaddr, map_step);
+		ramdisk_image = ramdisk_image + offset - length;
+
+		break;
+	}
+
+	return ramdisk_image;
+}
+#endif	/* CONFIG_ACPI_NUMA */
+
 void __init acpi_initrd_override(void *data, size_t size)
 {
 	int no, table_nr = 0, total_offset = 0;
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index c5e7b2a..bdcb9dd 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -81,11 +81,21 @@ typedef int (*acpi_tbl_entry_handler)(struct acpi_subtable_header *header,
 
 #ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
 void acpi_initrd_override(void *data, size_t size);
-#else
+
+#ifdef CONFIG_ACPI_NUMA
+phys_addr_t early_acpi_override_srat(void);
+#endif	/* CONFIG_ACPI_NUMA */
+
+#else	/* CONFIG_ACPI_INITRD_TABLE_OVERRIDE */
 static inline void acpi_initrd_override(void *data, size_t size)
 {
 }
-#endif
+
+static inline phys_addr_t early_acpi_override_srat(void)
+{
+	return 0;
+}
+#endif	/* CONFIG_ACPI_INITRD_TABLE_OVERRIDE */
 
 char * __acpi_map_table (unsigned long phys_addr, unsigned long size);
 void __acpi_unmap_table(char *map, unsigned long size);
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index dd38e62..463efa9 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -104,6 +104,7 @@ extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
 /* reasonably generic interface to expand the physical pages in a zone  */
 extern int __add_pages(int nid, struct zone *zone, unsigned long start_pfn,
 	unsigned long nr_pages);
+extern void find_hotpluggable_memory(void);
 
 #ifdef CONFIG_NUMA
 extern int memory_add_physaddr_to_nid(u64 start);
@@ -181,6 +182,7 @@ static inline void register_page_bootmem_info_node(struct pglist_data *pgdat)
 {
 }
 #endif
+
 extern void put_page_bootmem(struct page *page);
 extern void get_page_bootmem(unsigned long ingo, struct page *page,
 			     unsigned long type);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index ca1dd3a..2a57888 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -30,6 +30,7 @@
 #include <linux/mm_inline.h>
 #include <linux/firmware-map.h>
 #include <linux/stop_machine.h>
+#include <linux/acpi.h>
 
 #include <asm/tlbflush.h>
 
@@ -62,7 +63,6 @@ void unlock_memory_hotplug(void)
 	mutex_unlock(&mem_hotplug_mutex);
 }
 
-
 /* add this memory to iomem resource */
 static struct resource *register_memory_resource(u64 start, u64 size)
 {
@@ -91,6 +91,29 @@ static void release_memory_resource(struct resource *res)
 	return;
 }
 
+#ifdef CONFIG_ACPI_NUMA
+/**
+ * find_hotpluggable_memory - Find out hotpluggable memory from ACPI SRAT.
+ *
+ * This function does the following:
+ * 1. Try to find if there is a SRAT in initrd file used to override the one
+ *    provided by firmware. If so, get its phys addr.
+ * 2. If there is no override SRAT, get the phys addr of the SRAT in firmware.
+ * 3. Parse SRAT, find out which memory is hotpluggable.
+ */
+void __init find_hotpluggable_memory(void)
+{
+	phys_addr_t srat_paddr;
+
+	/* Try to find if SRAT is overridden */
+	srat_paddr = early_acpi_override_srat();
+	if (!srat_paddr)
+		return;
+
+	/* Will parse SRAT and find out hotpluggable memory here */
+}
+#endif	/* CONFIG_ACPI_NUMA */
+
 #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
 void get_page_bootmem(unsigned long info,  struct page *page,
 		      unsigned long type)
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH part4 2/4] x86, acpica, acpi: Try to find if SRAT is overrided earlier.
@ 2013-08-08  9:41   ` Tang Chen
  0 siblings, 0 replies; 30+ messages in thread
From: Tang Chen @ 2013-08-08  9:41 UTC (permalink / raw)
  To: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, tj,
	trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

Linux cannot migrate pages used by the kernel due to the direct mapping
(va = pa + PAGE_OFFSET), any memory used by the kernel cannot be hot-removed.
So when using memory hotplug, we have to prevent the kernel from using
hotpluggable memory.

The ACPI table SRAT (System Resource Affinity Table) contains info to specify
which memory is hotpluggble. After SRAT is parsed, we are aware of which
memory is hotpluggable.

At the early time when system is booting, SRAT has not been parsed. The boot
memory allocator memblock will allocate any memory to the kernel. So we need
SRAT parsed before memblock starts to work.

In this patch, we are going to parse SRAT earlier, right after memblock is ready.

Generally speaking, tables such as SRAT are provided by firmware. But
ACPI_INITRD_TABLE_OVERRIDE functionality allows users to customize their own
tables in initrd, and override the ones from firmware. So if we want to parse
SRAT earlier, we also need to do SRAT override earlier.

First, we introduce early_acpi_override_srat() to check if SRAT will be overridden
from initrd.

Second, we introduce find_hotpluggable_memory() to reserve hotpluggable memory,
which will firstly call early_acpi_override_srat() to find out which memory is
hotpluggable in the override SRAT.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 arch/x86/kernel/setup.c        |   10 ++++++
 drivers/acpi/osl.c             |   61 ++++++++++++++++++++++++++++++++++++++++
 include/linux/acpi.h           |   14 ++++++++-
 include/linux/memory_hotplug.h |    2 +
 mm/memory_hotplug.c            |   25 +++++++++++++++-
 5 files changed, 109 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index da44353..36d7fe8 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1061,6 +1061,16 @@ void __init setup_arch(char **cmdline_p)
 	 */
 	early_acpi_boot_table_init();
 
+#ifdef CONFIG_ACPI_NUMA
+	/*
+	 * Linux kernel cannot migrate kernel pages, as a result, memory used
+	 * by the kernel cannot be hot-removed. Find and mark hotpluggable
+	 * memory in memblock to prevent memblock from allocating hotpluggable
+	 * memory for the kernel.
+	 */
+	find_hotpluggable_memory();
+#endif
+
 	/*
 	 * The EFI specification says that boot service code won't be called
 	 * after ExitBootServices(). This is, in fact, a lie.
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 0043e9f..dcbca3e 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -48,6 +48,7 @@
 
 #include <asm/io.h>
 #include <asm/uaccess.h>
+#include <asm/setup.h>
 
 #include <acpi/acpi.h>
 #include <acpi/acpi_bus.h>
@@ -631,6 +632,66 @@ int __init acpi_verify_table(struct cpio_data *file,
 	return 0;
 }
 
+#ifdef CONFIG_ACPI_NUMA
+/*******************************************************************************
+ *
+ * FUNCTION:    early_acpi_override_srat
+ *
+ * RETURN:      Phys addr of SRAT on success, 0 on error.
+ *
+ * DESCRIPTION: Try to get the phys addr of SRAT in initrd.
+ *              The ACPI_INITRD_TABLE_OVERRIDE procedure is able to use tables
+ *              in initrd file to override the ones provided by firmware. This
+ *              function checks if there is a SRAT in initrd at early time. If
+ *              so, return the phys addr of the SRAT.
+ *
+ ******************************************************************************/
+phys_addr_t __init early_acpi_override_srat(void)
+{
+	int i;
+	u32 length;
+	long offset;
+	void *ramdisk_vaddr;
+	struct acpi_table_header *table;
+	struct cpio_data file;
+	unsigned long map_step = NR_FIX_BTMAPS << PAGE_SHIFT;
+	phys_addr_t ramdisk_image = get_ramdisk_image();
+	char cpio_path[32] = "kernel/firmware/acpi/";
+
+	if (!ramdisk_image || !get_ramdisk_size())
+		return 0;
+
+	/* Try to find if SRAT is overridden */
+	for (i = 0; i < ACPI_OVERRIDE_TABLES; i++) {
+		ramdisk_vaddr = early_ioremap(ramdisk_image, map_step);
+
+		file = find_cpio_data(cpio_path, ramdisk_vaddr,
+				      map_step, &offset);
+		if (!file.data) {
+			early_iounmap(ramdisk_vaddr, map_step);
+			return 0;
+		}
+
+		table = file.data;
+		length = table->length;
+
+		if (acpi_verify_table(&file, cpio_path, ACPI_SIG_SRAT)) {
+			ramdisk_image += offset;
+			early_iounmap(ramdisk_vaddr, map_step);
+			continue;
+		}
+
+		/* Found SRAT */
+		early_iounmap(ramdisk_vaddr, map_step);
+		ramdisk_image = ramdisk_image + offset - length;
+
+		break;
+	}
+
+	return ramdisk_image;
+}
+#endif	/* CONFIG_ACPI_NUMA */
+
 void __init acpi_initrd_override(void *data, size_t size)
 {
 	int no, table_nr = 0, total_offset = 0;
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index c5e7b2a..bdcb9dd 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -81,11 +81,21 @@ typedef int (*acpi_tbl_entry_handler)(struct acpi_subtable_header *header,
 
 #ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
 void acpi_initrd_override(void *data, size_t size);
-#else
+
+#ifdef CONFIG_ACPI_NUMA
+phys_addr_t early_acpi_override_srat(void);
+#endif	/* CONFIG_ACPI_NUMA */
+
+#else	/* CONFIG_ACPI_INITRD_TABLE_OVERRIDE */
 static inline void acpi_initrd_override(void *data, size_t size)
 {
 }
-#endif
+
+static inline phys_addr_t early_acpi_override_srat(void)
+{
+	return 0;
+}
+#endif	/* CONFIG_ACPI_INITRD_TABLE_OVERRIDE */
 
 char * __acpi_map_table (unsigned long phys_addr, unsigned long size);
 void __acpi_unmap_table(char *map, unsigned long size);
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index dd38e62..463efa9 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -104,6 +104,7 @@ extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
 /* reasonably generic interface to expand the physical pages in a zone  */
 extern int __add_pages(int nid, struct zone *zone, unsigned long start_pfn,
 	unsigned long nr_pages);
+extern void find_hotpluggable_memory(void);
 
 #ifdef CONFIG_NUMA
 extern int memory_add_physaddr_to_nid(u64 start);
@@ -181,6 +182,7 @@ static inline void register_page_bootmem_info_node(struct pglist_data *pgdat)
 {
 }
 #endif
+
 extern void put_page_bootmem(struct page *page);
 extern void get_page_bootmem(unsigned long ingo, struct page *page,
 			     unsigned long type);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index ca1dd3a..2a57888 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -30,6 +30,7 @@
 #include <linux/mm_inline.h>
 #include <linux/firmware-map.h>
 #include <linux/stop_machine.h>
+#include <linux/acpi.h>
 
 #include <asm/tlbflush.h>
 
@@ -62,7 +63,6 @@ void unlock_memory_hotplug(void)
 	mutex_unlock(&mem_hotplug_mutex);
 }
 
-
 /* add this memory to iomem resource */
 static struct resource *register_memory_resource(u64 start, u64 size)
 {
@@ -91,6 +91,29 @@ static void release_memory_resource(struct resource *res)
 	return;
 }
 
+#ifdef CONFIG_ACPI_NUMA
+/**
+ * find_hotpluggable_memory - Find out hotpluggable memory from ACPI SRAT.
+ *
+ * This function did the following:
+ * 1. Try to find if there is a SRAT in initrd file used to override the one
+ *    provided by firmware. If so, get its phys addr.
+ * 2. If there is no override SRAT, get the phys addr of the SRAT in firmware.
+ * 3. Parse SRAT, find out which memory is hotpluggable.
+ */
+void __init find_hotpluggable_memory(void)
+{
+	phys_addr_t srat_paddr;
+
+	/* Try to find if SRAT is overridden */
+	srat_paddr = early_acpi_override_srat();
+	if (!srat_paddr)
+		return;
+
+	/* Will parse SRAT and find out hotpluggable memory here */
+}
+#endif	/* CONFIG_ACPI_NUMA */
+
 #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
 void get_page_bootmem(unsigned long info,  struct page *page,
 		      unsigned long type)
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH part4 3/4] x86, acpica, acpi: Try to find SRAT in firmware earlier.
From: Tang Chen @ 2013-08-08  9:41 UTC
  To: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, tj,
	trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

This patch introduces early_acpi_firmware_srat() to find the physical
address of the SRAT provided by firmware, and calls it from
find_hotpluggable_memory().

Since acpi_gbl_root_table_list has already been initialized earlier, with
the physical addresses and signatures of all the tables stored in it,
finding the SRAT is easy.
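
acpi_get_table_desc() is a linear scan of the root table list comparing
4-byte signatures. A self-contained model follows; struct table_desc,
get_table_desc() and firmware_srat() are hypothetical simplifications of
struct acpi_table_desc, acpi_get_table_desc() and early_acpi_firmware_srat().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct acpi_table_desc. */
struct table_desc {
	uint64_t address;	/* physical address of the table */
	char signature[4];	/* e.g. "SRAT", not NUL-terminated */
};

/*
 * Copy the descriptor of the first table whose signature matches 'sig'
 * into *out. Returns 0 on success, -1 if no such table is installed.
 */
static int get_table_desc(const struct table_desc *list, size_t count,
			  const char *sig, struct table_desc *out)
{
	for (size_t pos = 0; pos < count; pos++) {
		if (memcmp(list[pos].signature, sig, 4) != 0)
			continue;
		*out = list[pos];	/* caller owns the storage for *out */
		return 0;
	}
	return -1;
}

/* Models early_acpi_firmware_srat(): 0 means "no firmware SRAT". */
static uint64_t firmware_srat(const struct table_desc *list, size_t count)
{
	struct table_desc d;

	if (get_table_desc(list, count, "SRAT", &d))
		return 0;
	return d.address;
}
```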

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 drivers/acpi/acpica/tbxface.c |   32 ++++++++++++++++++++++++++++++++
 drivers/acpi/osl.c            |   22 ++++++++++++++++++++++
 include/acpi/acpixf.h         |    4 ++++
 include/linux/acpi.h          |    4 ++++
 mm/memory_hotplug.c           |    8 ++++++--
 5 files changed, 68 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/acpica/tbxface.c b/drivers/acpi/acpica/tbxface.c
index ecaa5e1..a025dcc 100644
--- a/drivers/acpi/acpica/tbxface.c
+++ b/drivers/acpi/acpica/tbxface.c
@@ -236,6 +236,38 @@ acpi_status acpi_reallocate_root_table(void)
 	return_ACPI_STATUS(status);
 }
 
+/**
+ * acpi_get_table_desc - Get the acpi table descriptor of a specific table.
+ * @signature: The signature of the table to be found.
+ * @out_desc: Where the descriptor of the found table is copied.
+ *
+ * Iterate over acpi_gbl_root_table_list to find a specific table and then
+ * copy its descriptor to @out_desc.
+ *
+ * NOTE: The caller has the responsibility to allocate memory for @out_desc.
+ *
+ * Return AE_OK on success, AE_NOT_FOUND if the table is not found.
+ */
+acpi_status acpi_get_table_desc(char *signature,
+				struct acpi_table_desc *out_desc)
+{
+	struct acpi_table_desc *desc;
+	int pos, count = acpi_gbl_root_table_list.current_table_count;
+
+	for (pos = 0; pos < count; pos++) {
+		desc = &acpi_gbl_root_table_list.tables[pos];
+
+		if (!ACPI_COMPARE_NAME(&desc->signature, signature))
+			continue;
+
+		memcpy(out_desc, desc, sizeof(struct acpi_table_desc));
+
+		return_ACPI_STATUS(AE_OK);
+	}
+
+	return_ACPI_STATUS(AE_NOT_FOUND);
+}
+
 /*******************************************************************************
  *
  * FUNCTION:    acpi_get_table_header
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index dcbca3e..ec490fe 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -53,6 +53,7 @@
 #include <acpi/acpi.h>
 #include <acpi/acpi_bus.h>
 #include <acpi/processor.h>
+#include <acpi/acpixf.h>
 
 #define _COMPONENT		ACPI_OS_SERVICES
 ACPI_MODULE_NAME("osl");
@@ -760,6 +761,27 @@ void __init acpi_initrd_override(void *data, size_t size)
 }
 #endif /* CONFIG_ACPI_INITRD_TABLE_OVERRIDE */
 
+#ifdef CONFIG_ACPI_NUMA
+/*******************************************************************************
+ *
+ * FUNCTION:    early_acpi_firmware_srat
+ *
+ * RETURN:      Phys addr of SRAT on success, 0 on error.
+ *
+ * DESCRIPTION: Get the phys addr of SRAT provided by firmware.
+ *
+ ******************************************************************************/
+phys_addr_t __init early_acpi_firmware_srat(void)
+{
+	struct acpi_table_desc table_desc;
+
+	if (acpi_get_table_desc(ACPI_SIG_SRAT, &table_desc))
+		return 0;
+
+	return table_desc.address;
+}
+#endif	/* CONFIG_ACPI_NUMA */
+
 static void acpi_table_taint(struct acpi_table_header *table)
 {
 	pr_warn(PREFIX
diff --git a/include/acpi/acpixf.h b/include/acpi/acpixf.h
index 99c9d7b..daa7c10 100644
--- a/include/acpi/acpixf.h
+++ b/include/acpi/acpixf.h
@@ -188,6 +188,10 @@ acpi_status acpi_find_root_pointer(acpi_size *rsdp_address);
 acpi_status acpi_unload_table_id(acpi_owner_id id);
 
 acpi_status
+acpi_get_table_desc(char *signature,
+		    struct acpi_table_desc *out_desc);
+
+acpi_status
 acpi_get_table_header(acpi_string signature,
 		      u32 instance, struct acpi_table_header *out_table_header);
 
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index bdcb9dd..280078c 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -97,6 +97,10 @@ static inline phys_addr_t early_acpi_override_srat(void)
 }
 #endif	/* CONFIG_ACPI_INITRD_TABLE_OVERRIDE */
 
+#ifdef CONFIG_ACPI_NUMA
+phys_addr_t early_acpi_firmware_srat(void);
+#endif  /* CONFIG_ACPI_NUMA */
+
 char * __acpi_map_table (unsigned long phys_addr, unsigned long size);
 void __acpi_unmap_table(char *map, unsigned long size);
 int early_acpi_boot_init(void);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 2a57888..2dfb06f 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -107,8 +107,12 @@ void __init find_hotpluggable_memory(void)
 
 	/* Try to find if SRAT is overridden */
 	srat_paddr = early_acpi_override_srat();
-	if (!srat_paddr)
-		return;
+	if (!srat_paddr) {
+		/* Try to find SRAT from firmware if it wasn't overridden */
+		srat_paddr = early_acpi_firmware_srat();
+		if (!srat_paddr)
+			return;
+	}
 
 	/* Will parse SRAT and find out hotpluggable memory here */
 }
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 30+ messages in thread
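Patch 3/4 reduces the firmware lookup to a linear scan of acpi_gbl_root_table_list by 4-byte signature, tried only when no initrd override was found. A minimal user-space sketch of that lookup order follows. It assumes a trimmed-down, hypothetical `struct table_desc` in place of ACPICA's `struct acpi_table_desc`, and the names `find_table_desc()` and `pick_srat_paddr()` are illustrative only. As in `early_acpi_firmware_srat()`, 0 means "no SRAT".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for ACPICA's struct acpi_table_desc:
 * only the fields this lookup needs. */
struct table_desc {
	uint64_t address;   /* physical address of the table */
	char signature[4];  /* 4-byte ACPI signature, not NUL-terminated */
};

/* Linear scan by signature in the spirit of acpi_get_table_desc(): copy
 * the first matching descriptor into *out and return 0, else return -1. */
int find_table_desc(const struct table_desc *tables, size_t count,
		    const char *sig, struct table_desc *out)
{
	assert(out != NULL && (tables != NULL || count == 0));
	for (size_t i = 0; i < count; i++) {
		if (memcmp(tables[i].signature, sig, 4) != 0)
			continue;
		*out = tables[i];
		return 0;
	}
	return -1;
}

/* Lookup order of find_hotpluggable_memory(): an initrd override
 * (nonzero override_paddr) wins, otherwise fall back to the firmware
 * table list. Returns 0 when no SRAT exists. */
uint64_t pick_srat_paddr(uint64_t override_paddr,
			 const struct table_desc *fw, size_t count)
{
	struct table_desc d;

	if (override_paddr)
		return override_paddr;
	if (find_table_desc(fw, count, "SRAT", &d) != 0)
		return 0;
	return d.address;
}
```

Copying the descriptor out, rather than returning a pointer into the list, mirrors the interface in the patch, where the caller provides the storage for @out_desc.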

* [PATCH part4 4/4] x86, acpi, numa, mem_hotplug: Find hotpluggable memory in SRAT memory affinities.
  2013-08-08  9:41 ` Tang Chen
@ 2013-08-08  9:41   ` Tang Chen
  -1 siblings, 0 replies; 30+ messages in thread
From: Tang Chen @ 2013-08-08  9:41 UTC (permalink / raw)
  To: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, tj,
	trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

In ACPI SRAT (System Resource Affinity Table), there is a memory affinity for each
memory range in the system. Each memory affinity has a flag indicating whether
the memory range is hotpluggable.

This patch only parses the memory affinities in SRAT and finds all the
hotpluggable memory ranges in the system.

This patch doesn't mark hotpluggable memory in memblock yet. Memory marked as
hotpluggable won't be allocated to the kernel, so if all the memory in the system
were hotpluggable, the system would not have enough memory to boot. The basic idea
to solve this problem is to make the nodes the kernel resides in unhotpluggable.
Until that is done, we don't mark any hotpluggable memory in memblock, so that
memblock keeps working as before.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 drivers/acpi/osl.c   |   85 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/acpi.h |    2 +
 mm/memory_hotplug.c  |   22 ++++++++++++-
 3 files changed, 107 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index ec490fe..d01202d 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -780,6 +780,91 @@ phys_addr_t __init early_acpi_firmware_srat(void)
 
 	return table_desc.address;
 }
+
+/*******************************************************************************
+ *
+ * FUNCTION:    acpi_hotplug_mem_affinity
+ *
+ * PARAMETERS:  Srat_vaddr         - Virt addr of SRAT
+ *              Base               - The base address of the found hotpluggable
+ *                                   memory region
+ *              Size               - The size of the found hotpluggable memory
+ *                                   region
+ *              Offset             - Offset of the found memory affinity
+ *
+ * RETURN:      Status
+ *
+ * DESCRIPTION: This function iterates over the SRAT affinity list to find
+ *              hotpluggable memory affinities one at a time. The offset of
+ *              the found memory affinity is returned through @offset, which
+ *              can be passed back in to continue the iteration and find all
+ *              the hotpluggable memory affinities. Pass @offset as 0 to
+ *              start the iteration from the first affinity.
+ *
+ ******************************************************************************/
+acpi_status __init
+acpi_hotplug_mem_affinity(void *srat_vaddr, u64 *base, u64 *size,
+			  unsigned long *offset)
+{
+	struct acpi_table_header *table_header;
+	struct acpi_subtable_header *entry;
+	struct acpi_srat_mem_affinity *ma;
+	unsigned long table_end;
+
+	if (!offset)
+		return_ACPI_STATUS(AE_BAD_PARAMETER);
+
+	table_header = (struct acpi_table_header *)srat_vaddr;
+	table_end = (unsigned long)table_header + table_header->length;
+
+	entry = (struct acpi_subtable_header *)
+		((unsigned long)table_header + *offset);
+
+	if (*offset) {
+		/*
+		 * @offset is the offset of the affinity found in the last
+		 * call, so move on to the next affinity.
+		 */
+		entry = (struct acpi_subtable_header *)
+			((unsigned long)entry + entry->length);
+	} else {
+		/*
+		 * Offset of the first affinity is the size of SRAT
+		 * table header.
+		 */
+		entry = (struct acpi_subtable_header *)
+			((unsigned long)entry + sizeof(struct acpi_table_srat));
+	}
+
+	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) <
+	       table_end) {
+		if (entry->length == 0)
+			break;
+
+		if (entry->type != ACPI_SRAT_TYPE_MEMORY_AFFINITY)
+			goto next;
+
+		ma = (struct acpi_srat_mem_affinity *)entry;
+
+		if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE))
+			goto next;
+
+		if (base)
+			*base = ma->base_address;
+
+		if (size)
+			*size = ma->length;
+
+		*offset = (unsigned long)entry - (unsigned long)srat_vaddr;
+		return_ACPI_STATUS(AE_OK);
+
+next:
+		entry = (struct acpi_subtable_header *)
+			((unsigned long)entry + entry->length);
+	}
+
+	return_ACPI_STATUS(AE_NOT_FOUND);
+}
 #endif	/* CONFIG_ACPI_NUMA */
 
 static void acpi_table_taint(struct acpi_table_header *table)
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 280078c..f103e91 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -99,6 +99,8 @@ static inline phys_addr_t early_acpi_override_srat(void)
 
 #ifdef CONFIG_ACPI_NUMA
 phys_addr_t early_acpi_firmware_srat(void);
+acpi_status acpi_hotplug_mem_affinity(void *srat_vaddr, u64 *base,
+				      u64 *size, unsigned long *offset);
 #endif  /* CONFIG_ACPI_NUMA */
 
 char * __acpi_map_table (unsigned long phys_addr, unsigned long size);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 2dfb06f..ef9ccf8 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -103,7 +103,11 @@ static void release_memory_resource(struct resource *res)
  */
 void __init find_hotpluggable_memory(void)
 {
-	phys_addr_t srat_paddr;
+	void *srat_vaddr;
+	phys_addr_t srat_paddr, base, size;
+	u32 length;
+	struct acpi_table_header *srat_header;
+	unsigned long offset = 0;
 
 	/* Try to find if SRAT is overridden */
 	srat_paddr = early_acpi_override_srat();
@@ -114,7 +118,21 @@ void __init find_hotpluggable_memory(void)
 			return;
 	}
 
-	/* Will parse SRAT and find out hotpluggable memory here */
+	/* Get the length of SRAT */
+	srat_header = early_ioremap(srat_paddr,
+				    sizeof(struct acpi_table_header));
+	length = srat_header->length;
+	early_iounmap(srat_header, sizeof(struct acpi_table_header));
+
+	/* Find all the hotpluggable memory regions */
+	srat_vaddr = early_ioremap(srat_paddr, length);
+
+	while (ACPI_SUCCESS(acpi_hotplug_mem_affinity(srat_vaddr, &base,
+						      &size, &offset))) {
+		/* Will mark hotpluggable memory regions here */
+	}
+
+	early_iounmap(srat_vaddr, length);
 }
 #endif	/* CONFIG_ACPI_NUMA */
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread
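acpi_hotplug_mem_affinity() in patch 4/4 is a resumable iterator: the caller keeps an offset, and each call skips the entry at that offset and returns the next memory affinity with the hot-pluggable flag set. The user-space sketch below models the same walk over a raw SRAT byte buffer. The layout constants (48-byte SRAT header, 40-byte memory affinity entries of type 1 with base at offset 8, length at 16 and flags at 28, Enabled = bit 0, Hot Pluggable = bit 1) follow the ACPI SRAT definition as I understand it. All function names are hypothetical. Unlike the patch, it also skips entries whose Enabled bit is clear (the ACPI spec says such entries are ignored) and refuses to read past the end of a truncated entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SRAT_HEADER_LEN   48u        /* sizeof(struct acpi_table_srat) */
#define SRAT_TYPE_MEMORY  1u         /* ACPI_SRAT_TYPE_MEMORY_AFFINITY */
#define MEM_AFFINITY_LEN  40u        /* sizeof(struct acpi_srat_mem_affinity) */
#define MEM_ENABLED       (1u << 0)  /* ACPI_SRAT_MEM_ENABLED */
#define MEM_HOTPLUGGABLE  (1u << 1)  /* ACPI_SRAT_MEM_HOT_PLUGGABLE */

/* ACPI tables are little-endian; decode byte by byte so the sketch does
 * not depend on host byte order. */
static uint64_t le_read(const uint8_t *p, int nbytes)
{
	uint64_t v = 0;

	for (int i = nbytes - 1; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

static void le_write(uint8_t *p, uint64_t v, int nbytes)
{
	for (int i = 0; i < nbytes; i++, v >>= 8)
		p[i] = (uint8_t)v;
}

/* Find the next enabled, hot-pluggable memory affinity. *offset == 0
 * starts at the first subtable; otherwise the entry at *offset (returned
 * by the previous call) is skipped. Returns 0 and updates *base, *size
 * and *offset on success, -1 when no further entry matches. */
int next_hotplug_range(const uint8_t *srat, uint64_t *base, uint64_t *size,
		       size_t *offset)
{
	size_t table_len, pos;

	assert(srat != NULL && offset != NULL);
	table_len = (size_t)le_read(srat + 4, 4);   /* header.length */
	pos = *offset ? *offset + srat[*offset + 1] : SRAT_HEADER_LEN;

	while (pos + 2 <= table_len) {
		uint8_t type = srat[pos];
		uint8_t len = srat[pos + 1];

		if (len < 2 || pos + len > table_len)
			break;  /* malformed entry: stop instead of overrunning */

		if (type == SRAT_TYPE_MEMORY && len >= MEM_AFFINITY_LEN) {
			uint32_t flags = (uint32_t)le_read(srat + pos + 28, 4);

			if ((flags & MEM_ENABLED) && (flags & MEM_HOTPLUGGABLE)) {
				if (base)
					*base = le_read(srat + pos + 8, 8);
				if (size)
					*size = le_read(srat + pos + 16, 8);
				*offset = pos;
				return 0;
			}
		}
		pos += len;
	}
	return -1;
}

/* Hypothetical helpers to build a synthetic table for experiments. */
void srat_init(uint8_t *buf, uint32_t total_len)
{
	memset(buf, 0, total_len);
	memcpy(buf, "SRAT", 4);
	le_write(buf + 4, total_len, 4);
}

void srat_add_mem_affinity(uint8_t *buf, size_t pos, uint64_t base,
			   uint64_t len, uint32_t flags)
{
	buf[pos] = SRAT_TYPE_MEMORY;
	buf[pos + 1] = MEM_AFFINITY_LEN;
	le_write(buf + pos + 8, base, 8);
	le_write(buf + pos + 16, len, 8);
	le_write(buf + pos + 28, flags, 4);
}
```

A caller starts with `off = 0` and loops `while (next_hotplug_range(srat, &base, &size, &off) == 0)`, which mirrors the loop in find_hotpluggable_memory().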

* Re: [PATCH part4 2/4] x86, acpica, acpi: Try to find if SRAT is overrided earlier.
  2013-08-08  9:41   ` Tang Chen
  (?)
@ 2013-08-08 16:29   ` Yinghai Lu
  2013-08-09  9:41     ` Tang Chen
  -1 siblings, 1 reply; 30+ messages in thread
From: Yinghai Lu @ 2013-08-08 16:29 UTC (permalink / raw)
  To: Tang Chen, Konrad Rzeszutek Wilk
  Cc: Bob Moore, Lv Zheng, Rafael J. Wysocki, Len Brown,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Tejun Heo, Thomas Renninger, Jiang Liu, Zhang Yanfei,
	Linux Kernel Mailing List, ACPI Devel Maling List

Trimmed CC list a bit.

On Thu, Aug 8, 2013 at 2:41 AM, Tang Chen <tangchen@cn.fujitsu.com> wrote:

> Linux cannot migrate pages used by the kernel due to the direct mapping
> (va = pa + PAGE_OFFSET), any memory used by the kernel cannot be hot-removed.
> So when using memory hotplug, we have to prevent the kernel from using
> hotpluggable memory.
>
> The ACPI table SRAT (System Resource Affinity Table) contains info to specify
> which memory is hotpluggble. After SRAT is parsed, we are aware of which
> memory is hotpluggable.
>
> At the early time when system is booting, SRAT has not been parsed. The boot
> memory allocator memblock will allocate any memory to the kernel. So we need
> SRAT parsed before memblock starts to work.
>
> In this patch, we are going to parse SRAT earlier, right after memblock is ready.
>
> Generally speaking, tables such as SRAT are provided by firmware. But
> ACPI_INITRD_TABLE_OVERRIDE functionality allows users to customize their own
> tables in initrd, and override the ones from firmware. So if we want to parse
> SRAT earlier, we also need to do SRAT override earlier.
>
> First, we introduce early_acpi_override_srat() to check if SRAT will be overridden
> from initrd.
>
> Second, we introduce find_hotpluggable_memory() to reserve hotpluggable memory,
> which will firstly call early_acpi_override_srat() to find out which memory is
> hotpluggable in the override SRAT.

acpi_override_srat handling is pretty ugly.

Please check whether you can reuse the first half of my patchset, which finds
and copies the override tables earlier. The copied ACPI tables could be placed
near the kernel code range.

Moving the finding to the head64.c stage could help xen/dom0 a bit, as Konrad
is working on a patchset that does ACPI override in the xen hypervisor. We could
then avoid overriding the ACPI tables twice, especially since xen likes to
change DMAR to XMAR.

Yinghai

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH part4 4/4] x86, acpi, numa, mem_hotplug: Find hotpluggable memory in SRAT memory affinities.
  2013-08-08  9:41   ` Tang Chen
@ 2013-08-08 16:41     ` Yinghai Lu
  -1 siblings, 0 replies; 30+ messages in thread
From: Yinghai Lu @ 2013-08-08 16:41 UTC (permalink / raw)
  To: Tang Chen
  Cc: Bob Moore, Lv Zheng, Rafael J. Wysocki, Len Brown,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Tejun Heo, Thomas Renninger, Jiang Liu, Wen Congyang,
	Lai Jiangshan, Yasuaki Ishimatsu, Taku Izumi, Mel Gorman,
	Minchan Kim, mina86, gong.chen, Vasilis Liaskovitis, lwoodman,
	Rik van Riel, jweiner, Prarit Bhargava, Zhang Yanfei, yanghy

On Thu, Aug 8, 2013 at 2:41 AM, Tang Chen <tangchen@cn.fujitsu.com> wrote:
> In ACPI SRAT(System Resource Affinity Table), there is a memory affinity for each
> memory range in the system. In each memory affinity, there is a field indicating
> that if the memory range is hotpluggable.
>
> This patch parses all the memory affinities in SRAT only, and find out all the
> hotpluggable memory ranges in the system.

oh, no.

How do you make sure the SRAT's entries are right? Later, numa_init could
reject the SRAT if its ranges do not cover the e820 memmap.

Also, parsing the SRAT table twice looks silly.

Yinghai

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH part4 4/4] x86, acpi, numa, mem_hotplug: Find hotpluggable memory in SRAT memory affinities.
  2013-08-08 16:41     ` Yinghai Lu
@ 2013-08-09  9:32       ` Tang Chen
  -1 siblings, 0 replies; 30+ messages in thread
From: Tang Chen @ 2013-08-09  9:32 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bob Moore, Lv Zheng, Rafael J. Wysocki, Len Brown,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Tejun Heo, Thomas Renninger, Jiang Liu, Wen Congyang,
	Lai Jiangshan, Yasuaki Ishimatsu, Taku Izumi, Mel Gorman,
	Minchan Kim, mina86, gong.chen, Vasilis Liaskovitis, lwoodman,
	Rik van Riel, jweiner, Prar

On 08/09/2013 12:41 AM, Yinghai Lu wrote:
> On Thu, Aug 8, 2013 at 2:41 AM, Tang Chen<tangchen@cn.fujitsu.com>  wrote:
>> In ACPI SRAT(System Resource Affinity Table), there is a memory affinity for each
>> memory range in the system. In each memory affinity, there is a field indicating
>> that if the memory range is hotpluggable.
>>
>> This patch parses all the memory affinities in SRAT only, and find out all the
>> hotpluggable memory ranges in the system.
>
> oh, no.
>
> How do you make sure the SRAT's entries are right ?
> later numa_init could reject srat table if srat ranges does not cover
> e820 memmap.

numa_meminfo_cover_memory() checks whether the SRAT covers the e820 ranges.
It uses
     e820ram = max_pfn - absent_pages_in_range(0, max_pfn)
to calculate the e820 RAM size.

Since max_pfn is initialized before memblock.memory is filled in, I think
we can also do this check earlier.

>
> Also parse srat table two times looks silly.

By parsing SRAT twice, I can avoid allocating memory for acpi_tables_addr
in acpi_initrd_override_copy() at such an early time, since that memory
could also be in a hotpluggable area.

I think parsing the SRAT memory affinities one more time is clean: no memory
allocation and no global variable initialization. All the current NUMA init
paths will work as before.

Thanks.

^ permalink raw reply	[flat|nested] 30+ messages in thread
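Tang's answer rests on numa_meminfo_cover_memory() comparing the RAM described by the SRAT with the e820 RAM size given by `e820ram = max_pfn - absent_pages_in_range(0, max_pfn)`. A simplified user-space model of that comparison is sketched below. `struct pfn_range` and all function names are hypothetical, RAM and SRAT ranges are half-open page-frame ranges assumed not to overlap within each array, and the tolerated slack is a caller-supplied parameter rather than the kernel's exact constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open page-frame range [start, end). */
struct pfn_range {
	uint64_t start, end;
};

static uint64_t overlap_pages(struct pfn_range a, struct pfn_range b)
{
	uint64_t lo = a.start > b.start ? a.start : b.start;
	uint64_t hi = a.end < b.end ? a.end : b.end;

	return hi > lo ? hi - lo : 0;
}

/* RAM pages below max_pfn. With ram[] listing the present ranges, this
 * equals max_pfn - absent_pages_in_range(0, max_pfn). */
uint64_t ram_pages(const struct pfn_range *ram, size_t nram, uint64_t max_pfn)
{
	struct pfn_range limit = { 0, max_pfn };
	uint64_t total = 0;

	for (size_t i = 0; i < nram; i++)
		total += overlap_pages(ram[i], limit);
	return total;
}

/* RAM pages below max_pfn that fall inside some SRAT memory range. */
uint64_t srat_ram_pages(const struct pfn_range *ram, size_t nram,
			const struct pfn_range *srat, size_t nsrat,
			uint64_t max_pfn)
{
	uint64_t total = 0;

	for (size_t i = 0; i < nsrat; i++) {
		struct pfn_range s = srat[i];

		if (s.end > max_pfn)
			s.end = max_pfn;
		for (size_t j = 0; j < nram; j++)
			total += overlap_pages(s, ram[j]);
	}
	return total;
}

/* Accept the SRAT only if it leaves fewer than slack_pages of RAM
 * uncovered. */
bool srat_covers_ram(const struct pfn_range *ram, size_t nram,
		     const struct pfn_range *srat, size_t nsrat,
		     uint64_t max_pfn, uint64_t slack_pages)
{
	uint64_t e820ram = ram_pages(ram, nram, max_pfn);
	uint64_t numaram = srat_ram_pages(ram, nram, srat, nsrat, max_pfn);

	assert(numaram <= e820ram);  /* holds given non-overlapping inputs */
	return e820ram - numaram < slack_pages;
}
```

With 4 KiB pages, a slack of 256 pages corresponds to 1 MiB.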

* Re: [PATCH part4 2/4] x86, acpica, acpi: Try to find if SRAT is overrided earlier.
  2013-08-08 16:29   ` Yinghai Lu
@ 2013-08-09  9:41     ` Tang Chen
  2013-08-09 23:34       ` Yinghai Lu
  0 siblings, 1 reply; 30+ messages in thread
From: Tang Chen @ 2013-08-09  9:41 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Konrad Rzeszutek Wilk, Bob Moore, Lv Zheng, Rafael J. Wysocki,
	Len Brown, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Morton, Tejun Heo, Thomas Renninger, Jiang Liu,
	Zhang Yanfei, Linux Kernel Mailing List, ACPI Devel Maling List

On 08/09/2013 12:29 AM, Yinghai Lu wrote:
......
>
> Please check if you can reuse first half of my patchset, so find and copy
> override table earlier. the copied acpi tables could be near kernel code range.
>

I don't think we need to do the finding step that early, in the
head64.c stage.

Before page tables are set up, we can use early_ioremap() to map the
memory we want to access, so we don't need to use physical addresses. We
can do it in setup_arch(), which works the same for 32-bit and 64-bit.

> Move finding in head64.c stage could help xen/dom0 a bit.
> as Konrad is working on patchset with acpi override in xen hypervisor.
> We can avoid override acpi table two times. Esp xen like to change
> DMAR to XMAR.

Would you please give some more info about this, and explain why finding
override tables in the head64.c stage is helpful for Xen?

Thanks.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH part4 2/4] x86, acpica, acpi: Try to find if SRAT is overrided earlier.
  2013-08-09  9:41     ` Tang Chen
@ 2013-08-09 23:34       ` Yinghai Lu
  2013-08-12 12:28         ` Tang Chen
  0 siblings, 1 reply; 30+ messages in thread
From: Yinghai Lu @ 2013-08-09 23:34 UTC (permalink / raw)
  To: Tang Chen
  Cc: Konrad Rzeszutek Wilk, Bob Moore, Lv Zheng, Rafael J. Wysocki,
	Len Brown, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Morton, Tejun Heo, Thomas Renninger, Jiang Liu,
	Zhang Yanfei, Linux Kernel Mailing List, ACPI Devel Maling List

On Fri, Aug 9, 2013 at 2:41 AM, Tang Chen <tangchen@cn.fujitsu.com> wrote:
> On 08/09/2013 12:29 AM, Yinghai Lu wrote:
> ......
>
>>
>> Please check if you can reuse first half of my patchset, so find and copy
>> override table earlier. the copied acpi tables could be near kernel code
>> range.
>>
>
> I don't think we need to do the finding step at that early time, in
> head64.c stage.
>
> Before pagetables are setup, we can use early_ioremap() to map the
> memory we want to access. We don't need to use phys addr. We can do
> it in setup_arch(), which has nothing to do with 32bit or 64bit.

If you override the ACPI tables early, you don't need to check the firmware
SRAT and then override it; just check the final one that the kernel will use.

So you don't need to dig through the initrd to find the SRAT anymore.
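
The "just check the last one" idea amounts to a lookup where an override, if present, shadows the firmware table. The sketch below is hypothetical: `struct table`, `find_sig()` and `effective_table()` are illustrative names, and matching on signature alone is a simplification. It is not how acpi_initrd_override() is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified table descriptor: only what the lookup needs. */
struct table {
	char signature[4];	/* e.g. 'S','R','A','T'; not NUL-terminated */
	const char *origin;	/* "firmware" or "initrd", for illustration only */
};

/* Linear search for the first table with a matching 4-byte signature. */
static const struct table *find_sig(const struct table *t, size_t n,
				    const char *sig)
{
	assert(sig != NULL && strlen(sig) >= 4);
	for (size_t i = 0; i < n; i++)
		if (memcmp(t[i].signature, sig, 4) == 0)
			return &t[i];
	return NULL;
}

/*
 * The table the kernel would end up using: an initrd override with the
 * same signature wins, otherwise the firmware copy (or NULL if neither).
 */
static const struct table *effective_table(const struct table *fw, size_t nr_fw,
					   const struct table *ovr, size_t nr_ovr,
					   const char *sig)
{
	const struct table *t = find_sig(ovr, nr_ovr, sig);

	return t ? t : find_sig(fw, nr_fw, sig);
}
```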

>
>
>> Move finding in head64.c stage could help xen/dom0 a bit.
>> as Konrad is working on patchset with acpi override in xen hypervisor.
>> We can avoid override acpi table two times. Esp xen like to change
>> DMAR to XMAR.
>
>
> Would you please give some more info about this, and explain why finding
> override tables in head64.c stage is helpful for xen ?

Xen can usually change ACPI tables before passing them to the dom0 kernel,
for example changing DMAR to hide it from the dom0 kernel.

Also, a distribution could use the same kernel for both bare metal and dom0.

So if we find the override tables early in head64.c, the dom0 path will not
actually copy anything, since no one tries to find the tables for it.

Yinghai

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH part4 4/4] x86, acpi, numa, mem_hotplug: Find hotpluggable memory in SRAT memory affinities.
  2013-08-09  9:32       ` Tang Chen
@ 2013-08-09 23:39         ` Yinghai Lu
  -1 siblings, 0 replies; 30+ messages in thread
From: Yinghai Lu @ 2013-08-09 23:39 UTC (permalink / raw)
  To: Tang Chen
  Cc: Bob Moore, Lv Zheng, Rafael J. Wysocki, Len Brown,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Tejun Heo, Thomas Renninger, Jiang Liu, Wen Congyang,
	Lai Jiangshan, Yasuaki Ishimatsu, Taku Izumi, Mel Gorman,
	Minchan Kim, mina86, gong.chen, Vasilis Liaskovitis, lwoodman,
	Rik van Riel, jweiner, Prarit Bhargava, Zhang Yanfei, yanghy

On Fri, Aug 9, 2013 at 2:32 AM, Tang Chen <tangchen@cn.fujitsu.com> wrote:
> On 08/09/2013 12:41 AM, Yinghai Lu wrote:
>>
>> On Thu, Aug 8, 2013 at 2:41 AM, Tang Chen<tangchen@cn.fujitsu.com>  wrote:
>>>
>>> In ACPI SRAT(System Resource Affinity Table), there is a memory affinity
>>> for each
>>> memory range in the system. In each memory affinity, there is a field
>>> indicating
>>> that if the memory range is hotpluggable.
>>>
>>> This patch parses all the memory affinities in SRAT only, and find out
>>> all the
>>> hotpluggable memory ranges in the system.
>>
>>
>> oh, no.
>>
>> How do you make sure the SRAT's entries are right ?
>> later numa_init could reject srat table if srat ranges does not cover
>> e820 memmap.
>
>
> In numa_meminfo_cover_memory(), it checks if SRAT covers the e820 ranges.
> And it uses
>     e820ram = max_pfn - absent_pages_in_range(0, max_pfn)
> to calculate the e820 ram size.
>
> Since max_pfn is initialized before memblock.memory is fulfilled, I think
> we can also do this check at earlier time.
>
>
>>
>> Also parse srat table two times looks silly.
>
>
> By parsing SRAT twice, I can avoid memory allocation for acpi_tables_addr
> in acpi_initrd_override_copy() procedure at such an early time. This memory
> could also be in hotpluggable area.

You already mark the kernel's location as not hot-pluggable, so the range
near the kernel should be a safe place to put the override ACPI tables.

Also, what I mean by parsing SRAT two times: parse it once to get the hotplug
ranges, and later parse it again for the other NUMA info.

Yinghai

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH part4 4/4] x86, acpi, numa, mem_hotplug: Find hotpluggable memory in SRAT memory affinities.
  2013-08-09 23:39         ` Yinghai Lu
@ 2013-08-09 23:43           ` H. Peter Anvin
  -1 siblings, 0 replies; 30+ messages in thread
From: H. Peter Anvin @ 2013-08-09 23:43 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Tang Chen, Bob Moore, Lv Zheng, Rafael J. Wysocki, Len Brown,
	Thomas Gleixner, Ingo Molnar, Andrew Morton, Tejun Heo,
	Thomas Renninger, Jiang Liu, Wen Congyang, Lai Jiangshan,
	Yasuaki Ishimatsu, Taku Izumi, Mel Gorman, Minchan Kim, mina86,
	gong.chen, Vasilis Liaskovitis, lwoodman, Rik van Riel, jweiner,
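	Prarit Bhargava, Zhang Yanfei, yanghy, the arch/x86 maintainers,
	linux-doc, Linux Kernel Mailing List, Linux MM,
	ACPI Devel Maling List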

On 08/09/2013 04:39 PM, Yinghai Lu wrote:
>>>
>>> Also parse srat table two times looks silly.
>>
>> By parsing SRAT twice, I can avoid memory allocation for acpi_tables_addr
>> in acpi_initrd_override_copy() procedure at such an early time. This memory
>> could also be in hotpluggable area.
> 
> You already mark kernel position to be not hot-plugged,  so near the
> kernel range should be safe to be put override acpi tables.
> 
> also what I mean parse srat two times:
> parse to get hotplug range, and late parse other numa info again.
> 

Doing two passes over a small data structure (SRAT) would seem more
sensible than allocating memory just to avoid that...

	-hpa


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH part4 4/4] x86, acpi, numa, mem_hotplug: Find hotpluggable memory in SRAT memory affinities.
  2013-08-09 23:43           ` H. Peter Anvin
@ 2013-08-09 23:53             ` Yinghai Lu
  -1 siblings, 0 replies; 30+ messages in thread
From: Yinghai Lu @ 2013-08-09 23:53 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Tang Chen, Bob Moore, Lv Zheng, Rafael J. Wysocki, Len Brown,
	Thomas Gleixner, Ingo Molnar, Andrew Morton, Tejun Heo,
	Thomas Renninger, Jiang Liu, Wen Congyang, Lai Jiangshan,
	Yasuaki Ishimatsu, Taku Izumi, Mel Gorman, Minchan Kim, mina86,
	gong.chen, Vasilis Liaskovitis, lwoodman, Rik van Riel, jweiner,
	Prarit Bhargava, Zhang Yanfei, yanghy

On Fri, Aug 9, 2013 at 4:43 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 08/09/2013 04:39 PM, Yinghai Lu wrote:
>>>>
>>>> Also parse srat table two times looks silly.
>>>
>>> By parsing SRAT twice, I can avoid memory allocation for acpi_tables_addr
>>> in acpi_initrd_override_copy() procedure at such an early time. This memory
>>> could also be in hotpluggable area.
>>
>> You already mark kernel position to be not hot-plugged,  so near the
>> kernel range should be safe to be put override acpi tables.
>>
>> also what I mean parse srat two times:
>> parse to get hotplug range, and late parse other numa info again.
>>
>
> Doing two passes over a small data structure (SRAT) would seem more
> sensible than allocating memory just to avoid that...

On x86 there are several NUMA info discovery paths, and there is a chance
that SRAT is wrong but still contains hotplug ranges, or that NUMA in the end
uses another method or is not used at all. That inconsistency looks weird.

numa_meminfo is a static struct, so we have a way to get the final NUMA info
early enough, before we need to use memblock to allocate buffers with it.

Yinghai

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH part4 4/4] x86, acpi, numa, mem_hotplug: Find hotpluggable memory in SRAT memory affinities.
  2013-08-09 23:53             ` Yinghai Lu
@ 2013-08-09 23:56               ` H. Peter Anvin
  -1 siblings, 0 replies; 30+ messages in thread
From: H. Peter Anvin @ 2013-08-09 23:56 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Tang Chen, Bob Moore, Lv Zheng, Rafael J. Wysocki, Len Brown,
	Thomas Gleixner, Ingo Molnar, Andrew Morton, Tejun Heo,
	Thomas Renninger, Jiang Liu, Wen Congyang, Lai Jiangshan,
	Yasuaki Ishimatsu, Taku Izumi, Mel Gorman, Minchan Kim, mina86,
	gong.chen, Vasilis Liaskovitis, lwoodman, Rik van Riel, jweiner,
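	Prarit Bhargava, Zhang Yanfei, yanghy, the arch/x86 maintainers,
	linux-doc, Linux Kernel Mailing List, Linux MM,
	ACPI Devel Maling List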

On 08/09/2013 04:53 PM, Yinghai Lu wrote:
> On Fri, Aug 9, 2013 at 4:43 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 08/09/2013 04:39 PM, Yinghai Lu wrote:
>>>>>
>>>>> Also parse srat table two times looks silly.
>>>>
>>>> By parsing SRAT twice, I can avoid memory allocation for acpi_tables_addr
>>>> in acpi_initrd_override_copy() procedure at such an early time. This memory
>>>> could also be in hotpluggable area.
>>>
>>> You already mark kernel position to be not hot-plugged,  so near the
>>> kernel range should be safe to be put override acpi tables.
>>>
>>> also what I mean parse srat two times:
>>> parse to get hotplug range, and late parse other numa info again.
>>>
>>
>> Doing two passes over a small data structure (SRAT) would seem more
>> sensible than allocating memory just to avoid that...
> 
> for x86 there is some numa info discovery path, and there are chance
> srat is wrong but still have hotplug range there, or numa finally is using other
> way or not used. Inconsistency looks weird.
> 
> numa_meminfo is static struct, we have way to get final numa info early enough
> before we need use memblock to alloc buffer with it.
> 

Now, for kernel-generated data if you can define a sensible maximum you
can put it in brk, if not, you have a serious problem.
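
A brk-style area, as suggested here, is a reservation whose size is fixed at build time and handed out by bumping a pointer. The sketch below is a user-space illustration with hypothetical names (`early_brk`, `EARLY_BRK_SIZE`, `early_brk_alloc()`). On x86 the real mechanism is RESERVE_BRK() together with extend_brk(), which, as far as I know, treats exhaustion as a fatal bug rather than returning NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical brk-style area: a fixed-size reservation decided at
 * build time, handed out by bumping an offset. Nothing is ever freed.
 */
#define EARLY_BRK_SIZE 4096

static unsigned char early_brk[EARLY_BRK_SIZE];
static size_t early_brk_used;

/* align must be a power of two. Returns NULL if the reservation is exhausted. */
static void *early_brk_alloc(size_t size, size_t align)
{
	uintptr_t base = (uintptr_t)early_brk;
	uintptr_t cur, aligned;

	assert(align != 0 && (align & (align - 1)) == 0);
	cur = base + early_brk_used;
	aligned = (cur + align - 1) & ~(uintptr_t)(align - 1);
	if (aligned - base > EARLY_BRK_SIZE ||
	    size > EARLY_BRK_SIZE - (aligned - base))
		return NULL;
	early_brk_used = (aligned - base) + size;
	return (void *)aligned;
}
```

Returning NULL here stands in for the "serious problem" case: if no sensible maximum can be defined, a fixed reservation cannot be sized safely.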

	-hpa


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH part4 4/4] x86, acpi, numa, mem_hotplug: Find hotpluggable memory in SRAT memory affinities.
  2013-08-09 23:56               ` H. Peter Anvin
@ 2013-08-10  0:12                 ` Yinghai Lu
  -1 siblings, 0 replies; 30+ messages in thread
From: Yinghai Lu @ 2013-08-10  0:12 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Tang Chen, Bob Moore, Lv Zheng, Rafael J. Wysocki, Len Brown,
	Thomas Gleixner, Ingo Molnar, Andrew Morton, Tejun Heo,
	Thomas Renninger, Jiang Liu, Wen Congyang, Lai Jiangshan,
	Yasuaki Ishimatsu, Taku Izumi, Mel Gorman, Minchan Kim, mina86,
	gong.chen, Vasilis Liaskovitis, lwoodman, Rik van Riel, jweiner,
	Prarit Bhargava, Zhang Yanfei, yanghy

On Fri, Aug 9, 2013 at 4:56 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 08/09/2013 04:53 PM, Yinghai Lu wrote:
>> On Fri, Aug 9, 2013 at 4:43 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>> On 08/09/2013 04:39 PM, Yinghai Lu wrote:
>>>>>>
>>>>>> Also parse srat table two times looks silly.
>>>>>
>>>>> By parsing SRAT twice, I can avoid memory allocation for acpi_tables_addr
>>>>> in acpi_initrd_override_copy() procedure at such an early time. This memory
>>>>> could also be in hotpluggable area.
>>>>
>>>> You already mark kernel position to be not hot-plugged,  so near the
>>>> kernel range should be safe to be put override acpi tables.
>>>>
>>>> also what I mean parse srat two times:
>>>> parse to get hotplug range, and late parse other numa info again.
>>>>
>>>
>>> Doing two passes over a small data structure (SRAT) would seem more
>>> sensible than allocating memory just to avoid that...
>>
>> for x86 there is some numa info discovery path, and there are chance
>> srat is wrong but still have hotplug range there, or numa finally is using other
>> way or not used. Inconsistency looks weird.
>>
>> numa_meminfo is static struct, we have way to get final numa info early enough
>> before we need use memblock to alloc buffer with it.
>>
>
> Now, for kernel-generated data if you can define a sensible maximum you
> can put it in brk, if not, you have a serious problem.

Yes, that could be even better: we can just do the numa_info parsing early,
and make it use brk if it needs an extra buffer. That way we may not need to
split things into two steps or two passes, etc.

Yinghai

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH part4 2/4] x86, acpica, acpi: Try to find if SRAT is overrided earlier.
  2013-08-09 23:34       ` Yinghai Lu
@ 2013-08-12 12:28         ` Tang Chen
  0 siblings, 0 replies; 30+ messages in thread
From: Tang Chen @ 2013-08-12 12:28 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Tang Chen, Konrad Rzeszutek Wilk, Bob Moore, Lv Zheng,
	Rafael J. Wysocki, Len Brown, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrew Morton, Tejun Heo, Thomas Renninger,
	Jiang Liu, Zhang Yanfei, Linux Kernel Mailing List,
	ACPI Devel Maling List

On 08/10/2013 07:34 AM, Yinghai Lu wrote:
> On Fri, Aug 9, 2013 at 2:41 AM, Tang Chen<tangchen@cn.fujitsu.com>  wrote:
>> On 08/09/2013 12:29 AM, Yinghai Lu wrote:
>> ......
>>
>>>
>>> Please check if you can reuse first half of my patchset, so find and copy
>>> override table earlier. the copied acpi tables could be near kernel code
>>> range.
>>>
>>
>> I don't think we need to do the finding step at that early time, in
>> head64.c stage.
>>
>> Before pagetables are setup, we can use early_ioremap() to map the
>> memory we want to access. We don't need to use phys addr. We can do
>> it in setup_arch(), which has nothing to do with 32bit or 64bit.
>
> if override the acpi tables early, you don't need to check firmware srat and
> then override srat.
> just check last one will be used by kernel.
>
> So you don't need to dig initrd to find srat anymore.

The current logic is to find the tables in firmware and then override them.
I don't think that is a big deal.

Even if we want to override first and then skip the table in firmware
(installing each table only once), we still don't need to do it in
head64.c, right?

>
>>
>>
>>> Move finding in head64.c stage could help xen/dom0 a bit.
>>> as Konrad is working on patchset with acpi override in xen hypervisor.
>>> We can avoid override acpi table two times. Esp xen like to change
>>> DMAR to XMAR.
>>
>>
>> Would you please give some more info about this, and explain why finding
>> override tables in head64.c stage is helpful for xen ?
>
> xen usually can change acpi tables and pass to dom0 kernel. like change DMAR
> to hide it to dom0 kernel.
>
> also distribution could have same kernel to support bare metal and dom0.
>
> so if we find the override kernel early in head64.c, dom0 path will not copy
> actually as no one try to find that for them.
>

I don't know the details, but from your description, doing the override in
head64.c may avoid a copy. If doing such a copy in Xen is not that difficult,
I think we can do it for now. Changing the acpi_initrd_override() logic
would require modifying a lot of things.

Just like the local node page table, I think it is better to do the Xen
work after the memory hotplug work is done.

Thanks.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH part4 1/4] x86: Make get_ramdisk_{image|size}() global.
  2013-08-08  9:41   ` Tang Chen
@ 2013-08-12 14:25     ` Tejun Heo
  -1 siblings, 0 replies; 30+ messages in thread
From: Tejun Heo @ 2013-08-12 14:25 UTC (permalink / raw)
  To: Tang Chen
  Cc: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, trenn,
	yinghai, jiang.liu, wency, laijs, isimatu.yasuaki, izumi.taku,
	mgorman, minchan, mina86, gong.chen, vasilis.liaskovitis,
	lwoodman, riel, jweiner, prarit, zhangyanfei, yanghy, x86,
	linux-doc, linux-kernel, linux-mm, linux-acpi

On Thu, Aug 08, 2013 at 05:41:20PM +0800, Tang Chen wrote:
> In the following patches, we need to call get_ramdisk_{image|size}()
> to get initrd file's address and size. So make these two functions
> global.
> 
> v1 -> v2:
> As tj suggested, make these two function static inline in
> arch/x86/include/asm/setup.h.
> 
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>

Reviewed-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2013-08-12 14:26 UTC | newest]

Thread overview: 30+ messages
2013-08-08  9:41 [PATCH part4 0/4] Parse SRAT memory affinities earlier Tang Chen
2013-08-08  9:41 ` Tang Chen
2013-08-08  9:41 ` [PATCH part4 1/4] x86: Make get_ramdisk_{image|size}() global Tang Chen
2013-08-08  9:41   ` Tang Chen
2013-08-12 14:25   ` Tejun Heo
2013-08-12 14:25     ` Tejun Heo
2013-08-08  9:41 ` [PATCH part4 2/4] x86, acpica, acpi: Try to find if SRAT is overrided earlier Tang Chen
2013-08-08  9:41   ` Tang Chen
2013-08-08 16:29   ` Yinghai Lu
2013-08-09  9:41     ` Tang Chen
2013-08-09 23:34       ` Yinghai Lu
2013-08-12 12:28         ` Tang Chen
2013-08-08  9:41 ` [PATCH part4 3/4] x86, acpica, acpi: Try to find SRAT in firmware earlier Tang Chen
2013-08-08  9:41   ` Tang Chen
2013-08-08  9:41 ` [PATCH part4 4/4] x86, acpi, numa, mem_hotplug: Find hotpluggable memory in SRAT memory affinities Tang Chen
2013-08-08  9:41   ` Tang Chen
2013-08-08 16:41   ` Yinghai Lu
2013-08-08 16:41     ` Yinghai Lu
2013-08-09  9:32     ` Tang Chen
2013-08-09  9:32       ` Tang Chen
2013-08-09 23:39       ` Yinghai Lu
2013-08-09 23:39         ` Yinghai Lu
2013-08-09 23:43         ` H. Peter Anvin
2013-08-09 23:43           ` H. Peter Anvin
2013-08-09 23:53           ` Yinghai Lu
2013-08-09 23:53             ` Yinghai Lu
2013-08-09 23:56             ` H. Peter Anvin
2013-08-09 23:56               ` H. Peter Anvin
2013-08-10  0:12               ` Yinghai Lu
2013-08-10  0:12                 ` Yinghai Lu
