All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/18] Arrange hotpluggable memory as ZONE_MOVABLE.
@ 2013-08-01  7:06 ` Tang Chen
  0 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

This patch-set aims to solve some problems at system boot time
to enhance memory hotplug functionality.

[Background]

The Linux kernel cannot migrate pages used by the kernel because
of the kernel direct mapping. Since va = pa + PAGE_OFFSET, if the
physical address is changed, we cannot simply update the kernel
pagetable. On the contrary, we have to update all the pointers
pointing to the virtual address, which is very difficult to do.

In order to do memory hotplug, we should prevent the kernel to use
hotpluggable memory.

In ACPI, there is a table named SRAT(System Resource Affinity Table).
It contains system NUMA info (CPUs, memory ranges, PXM), and also a
flag field indicating which memory ranges are hotpluggable.


[Problem to be solved]

At the very early time when the system is booting, we use a bootmem
allocator, named memblock, to allocate memory for the kernel.
memblock will start to work before the kernel parse SRAT, which
means memblock won't know which memory is hotpluggable before SRAT
is parsed.

So at this time, memblock could allocate hotpluggable memory for
the kernel to use permanently. For example, the kernel may allocate
pagetables in hotpluggable memory, which cannot be freed when the
system is up.

So we have to prevent memblock allocating hotpluggable memory for
the kernel at the early boot time.


[Earlier solutions]

We have tried to parse SRAT earlier, before memblock is ready. To
do this, we also have to do ACPI_INITRD_TABLE_OVERRIDE earlier.
Otherwise the override tables won't be able to effect.

This is not that easy to do because memblock is ready before direct
mapping is setup. So Yinghai split the ACPI_INITRD_TABLE_OVERRIDE
procedure into two steps: find and copy. Please refer to the
following patch-set:
        https://lkml.org/lkml/2013/6/13/587

To this solution, tj gave a lot of comments and the following
suggestions.


[Suggestion from tj]

tj mainly gave the following suggestions:

1. Necessary reordering is OK, but we should not rely on
   reordering to achieve the goal because it makes the kernel
   too fragile.

2. Memory allocated to kernel for temporary usage is OK because
   it will be freed when the system is up. Doing relocation
   for permanent allocated hotpluggable memory will make the
   the kernel more robust.

3. Need to enhance memblock to discover and complain if any
   hotpluggable memory is allocated to kernel.

After a long thinking, we choose not to do the relocation for
the following reasons:

1. It's easy to find out the allocated hotpluggable memory. But
   memblock will merge the adjoined ranges owned by different users
   and used for different purposes. It's hard to find the owners.

2. Different memory has different way to be relocated. I think one
   function for each kind of memory will make the code too messy.

3. Pagetable could be in hotpluggable memory. Relocating pagetable
   is too difficult and risky. We have to update all PUD, PMD pages.
   And also, ACPI_INITRD_TABLE_OVERRIDE and parsing SRAT procedures
   are not long after pagetable is initialized. If we relocate the
   pagetable not long after it was initialized, the code will be
   very ugly.


[Solution in this patch-set]

In this patch-set, we still do the reordering, but in a new way.

1. Improve memblock with flags, so that it is able to differentiate
   memory regions for different usage. And also a MEMBLOCK_HOTPLUG
   flag to mark hotpluggable memory.

2. When memblock is ready (memblock_x86_fill() is called), initialize
   acpi_gbl_root_table_list, fulfill all the ACPI tables' phys addrs.
   Now, we have all the ACPI tables' phys addrs provided by firmware.

3. Check if there is a SRAT in initrd file used to override the one
   provided by firmware. If so, get its phys addr.

4. If no override SRAT in initrd, get the phys addr of the SRAT
   provided by firmware.

   Now, we have the phys addr of the to be used SRAT, the one in
   initrd or the one in firmware.

5. Parse only the memory affinities in SRAT, find out all the
   hotpluggable memory regions and mark them in memblock.memory with
   MEMBLOCK_HOTPLUG flag.

6. The kernel goes through the current path. Any other related parts,
   such as ACPI_INITRD_TABLE_OVERRIDE path, the current parsing ACPI
   tables pathes, global variable numa_meminfo, and so on, are not
   modified. They work as before.

7. Make memblock default allocator skip hotpluggable memory.

8. Introduce movablenode boot option to allow users to enable
   and disable this functionality.


In summary, in order to get hotpluggable memory info as early as possible,
this patch-set only parse memory affinities in SRAT one more time right
after memblock is ready, and leave all the other pathes untouched. With
the hotpluggable memory info, we can arrange hotpluggable memory in
ZONE_MOVABLE to prevent the kernel to use it.


change log v1 -> v2:
1. According to Tejun's advice, make ACPI side report which memory regions
   are hotpluggable, and memblock side handle the memory allocation.
2. Change "movablecore=acpi" boot option to "movablenode" boot option.

Thanks. 


Tang Chen (17):
  acpi: Print Hot-Pluggable Field in SRAT.
  earlycpio.c: Fix the confusing comment of find_cpio_data().
  acpi: Remove "continue" in macro INVALID_TABLE().
  acpi: Introduce acpi_invalid_table() to check if a table is invalid.
  x86, acpi: Split acpi_boot_table_init() into two parts.
  x86, acpi: Initialize ACPI root table list earlier.
  x86, acpi: Also initialize signature and length when parsing root
    table.
  x86: Make get_ramdisk_{image|size}() global.
  x86, acpi: Try to find if SRAT is overrided earlier.
  x86, acpi: Try to find SRAT in firmware earlier.
  x86, acpi, numa, mem_hotplug: Find hotpluggable memory in SRAT memory
    affinities.
  x86, numa, mem_hotplug: Skip all the regions the kernel resides in.
  memblock, numa: Introduce flag into memblock.
  memblock, mem_hotplug: Introduce MEMBLOCK_HOTPLUG flag to mark
    hotpluggable regions.
  memblock, mem_hotplug: Make memblock skip hotpluggable regions by
    default.
  mem-hotplug: Introduce movablenode boot option to {en|dis}able using
    SRAT.
  x86, numa, acpi, memory-hotplug: Make movablenode have higher
    priority.

Yasuaki Ishimatsu (1):
  x86: get pg_data_t's memory from other node

 Documentation/kernel-parameters.txt |   15 ++
 arch/x86/include/asm/setup.h        |   21 +++
 arch/x86/kernel/acpi/boot.c         |   38 ++++--
 arch/x86/kernel/setup.c             |   37 +++---
 arch/x86/mm/numa.c                  |    5 +-
 arch/x86/mm/srat.c                  |   11 +-
 drivers/acpi/acpica/tbutils.c       |   47 ++++++-
 drivers/acpi/acpica/tbxface.c       |   32 +++++
 drivers/acpi/osl.c                  |  247 ++++++++++++++++++++++++++++++++---
 drivers/acpi/tables.c               |    7 +-
 include/acpi/acpixf.h               |    6 +
 include/linux/acpi.h                |   22 +++-
 include/linux/memblock.h            |   14 ++
 include/linux/memory_hotplug.h      |    5 +
 lib/earlycpio.c                     |   27 ++--
 mm/memblock.c                       |   92 +++++++++++--
 mm/memory_hotplug.c                 |  104 +++++++++++++++-
 mm/page_alloc.c                     |   31 ++++-
 18 files changed, 664 insertions(+), 97 deletions(-)


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [PATCH v2 00/18] Arrange hotpluggable memory as ZONE_MOVABLE.
@ 2013-08-01  7:06 ` Tang Chen
  0 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

This patch-set aims to solve some problems at system boot time
to enhance memory hotplug functionality.

[Background]

The Linux kernel cannot migrate pages used by the kernel because
of the kernel direct mapping. Since va = pa + PAGE_OFFSET, if the
physical address is changed, we cannot simply update the kernel
pagetable. On the contrary, we have to update all the pointers
pointing to the virtual address, which is very difficult to do.

In order to do memory hotplug, we should prevent the kernel to use
hotpluggable memory.

In ACPI, there is a table named SRAT(System Resource Affinity Table).
It contains system NUMA info (CPUs, memory ranges, PXM), and also a
flag field indicating which memory ranges are hotpluggable.


[Problem to be solved]

At the very early time when the system is booting, we use a bootmem
allocator, named memblock, to allocate memory for the kernel.
memblock will start to work before the kernel parse SRAT, which
means memblock won't know which memory is hotpluggable before SRAT
is parsed.

So at this time, memblock could allocate hotpluggable memory for
the kernel to use permanently. For example, the kernel may allocate
pagetables in hotpluggable memory, which cannot be freed when the
system is up.

So we have to prevent memblock allocating hotpluggable memory for
the kernel at the early boot time.


[Earlier solutions]

We have tried to parse SRAT earlier, before memblock is ready. To
do this, we also have to do ACPI_INITRD_TABLE_OVERRIDE earlier.
Otherwise the override tables won't be able to effect.

This is not that easy to do because memblock is ready before direct
mapping is setup. So Yinghai split the ACPI_INITRD_TABLE_OVERRIDE
procedure into two steps: find and copy. Please refer to the
following patch-set:
        https://lkml.org/lkml/2013/6/13/587

To this solution, tj gave a lot of comments and the following
suggestions.


[Suggestion from tj]

tj mainly gave the following suggestions:

1. Necessary reordering is OK, but we should not rely on
   reordering to achieve the goal because it makes the kernel
   too fragile.

2. Memory allocated to kernel for temporary usage is OK because
   it will be freed when the system is up. Doing relocation
   for permanent allocated hotpluggable memory will make the
   the kernel more robust.

3. Need to enhance memblock to discover and complain if any
   hotpluggable memory is allocated to kernel.

After a long thinking, we choose not to do the relocation for
the following reasons:

1. It's easy to find out the allocated hotpluggable memory. But
   memblock will merge the adjoined ranges owned by different users
   and used for different purposes. It's hard to find the owners.

2. Different memory has different way to be relocated. I think one
   function for each kind of memory will make the code too messy.

3. Pagetable could be in hotpluggable memory. Relocating pagetable
   is too difficult and risky. We have to update all PUD, PMD pages.
   And also, ACPI_INITRD_TABLE_OVERRIDE and parsing SRAT procedures
   are not long after pagetable is initialized. If we relocate the
   pagetable not long after it was initialized, the code will be
   very ugly.


[Solution in this patch-set]

In this patch-set, we still do the reordering, but in a new way.

1. Improve memblock with flags, so that it is able to differentiate
   memory regions for different usage. And also a MEMBLOCK_HOTPLUG
   flag to mark hotpluggable memory.

2. When memblock is ready (memblock_x86_fill() is called), initialize
   acpi_gbl_root_table_list, fulfill all the ACPI tables' phys addrs.
   Now, we have all the ACPI tables' phys addrs provided by firmware.

3. Check if there is a SRAT in initrd file used to override the one
   provided by firmware. If so, get its phys addr.

4. If no override SRAT in initrd, get the phys addr of the SRAT
   provided by firmware.

   Now, we have the phys addr of the to be used SRAT, the one in
   initrd or the one in firmware.

5. Parse only the memory affinities in SRAT, find out all the
   hotpluggable memory regions and mark them in memblock.memory with
   MEMBLOCK_HOTPLUG flag.

6. The kernel goes through the current path. Any other related parts,
   such as ACPI_INITRD_TABLE_OVERRIDE path, the current parsing ACPI
   tables pathes, global variable numa_meminfo, and so on, are not
   modified. They work as before.

7. Make memblock default allocator skip hotpluggable memory.

8. Introduce movablenode boot option to allow users to enable
   and disable this functionality.


In summary, in order to get hotpluggable memory info as early as possible,
this patch-set only parse memory affinities in SRAT one more time right
after memblock is ready, and leave all the other pathes untouched. With
the hotpluggable memory info, we can arrange hotpluggable memory in
ZONE_MOVABLE to prevent the kernel to use it.


change log v1 -> v2:
1. According to Tejun's advice, make ACPI side report which memory regions
   are hotpluggable, and memblock side handle the memory allocation.
2. Change "movablecore=acpi" boot option to "movablenode" boot option.

Thanks. 


Tang Chen (17):
  acpi: Print Hot-Pluggable Field in SRAT.
  earlycpio.c: Fix the confusing comment of find_cpio_data().
  acpi: Remove "continue" in macro INVALID_TABLE().
  acpi: Introduce acpi_invalid_table() to check if a table is invalid.
  x86, acpi: Split acpi_boot_table_init() into two parts.
  x86, acpi: Initialize ACPI root table list earlier.
  x86, acpi: Also initialize signature and length when parsing root
    table.
  x86: Make get_ramdisk_{image|size}() global.
  x86, acpi: Try to find if SRAT is overrided earlier.
  x86, acpi: Try to find SRAT in firmware earlier.
  x86, acpi, numa, mem_hotplug: Find hotpluggable memory in SRAT memory
    affinities.
  x86, numa, mem_hotplug: Skip all the regions the kernel resides in.
  memblock, numa: Introduce flag into memblock.
  memblock, mem_hotplug: Introduce MEMBLOCK_HOTPLUG flag to mark
    hotpluggable regions.
  memblock, mem_hotplug: Make memblock skip hotpluggable regions by
    default.
  mem-hotplug: Introduce movablenode boot option to {en|dis}able using
    SRAT.
  x86, numa, acpi, memory-hotplug: Make movablenode have higher
    priority.

Yasuaki Ishimatsu (1):
  x86: get pg_data_t's memory from other node

 Documentation/kernel-parameters.txt |   15 ++
 arch/x86/include/asm/setup.h        |   21 +++
 arch/x86/kernel/acpi/boot.c         |   38 ++++--
 arch/x86/kernel/setup.c             |   37 +++---
 arch/x86/mm/numa.c                  |    5 +-
 arch/x86/mm/srat.c                  |   11 +-
 drivers/acpi/acpica/tbutils.c       |   47 ++++++-
 drivers/acpi/acpica/tbxface.c       |   32 +++++
 drivers/acpi/osl.c                  |  247 ++++++++++++++++++++++++++++++++---
 drivers/acpi/tables.c               |    7 +-
 include/acpi/acpixf.h               |    6 +
 include/linux/acpi.h                |   22 +++-
 include/linux/memblock.h            |   14 ++
 include/linux/memory_hotplug.h      |    5 +
 lib/earlycpio.c                     |   27 ++--
 mm/memblock.c                       |   92 +++++++++++--
 mm/memory_hotplug.c                 |  104 +++++++++++++++-
 mm/page_alloc.c                     |   31 ++++-
 18 files changed, 664 insertions(+), 97 deletions(-)


^ permalink raw reply	[flat|nested] 98+ messages in thread

* [PATCH v2 01/18] acpi: Print Hot-Pluggable Field in SRAT.
  2013-08-01  7:06 ` Tang Chen
@ 2013-08-01  7:06   ` Tang Chen
  -1 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

The Hot-Pluggable field in SRAT suggests if the memory could be
hotplugged while the system is running. Print it as well when
parsing SRAT will help users to know which memory is hotpluggable.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: Tejun Heo <tj@kernel.org>
---
 arch/x86/mm/srat.c |   11 +++++++----
 1 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/srat.c b/arch/x86/mm/srat.c
index cdd0da9..d44c8a4 100644
--- a/arch/x86/mm/srat.c
+++ b/arch/x86/mm/srat.c
@@ -146,6 +146,7 @@ int __init
 acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 {
 	u64 start, end;
+	u32 hotpluggable;
 	int node, pxm;
 
 	if (srat_disabled())
@@ -154,7 +155,8 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 		goto out_err_bad_srat;
 	if ((ma->flags & ACPI_SRAT_MEM_ENABLED) == 0)
 		goto out_err;
-	if ((ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) && !save_add_info())
+	hotpluggable = ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE;
+	if (hotpluggable && !save_add_info())
 		goto out_err;
 
 	start = ma->base_address;
@@ -174,9 +176,10 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 
 	node_set(node, numa_nodes_parsed);
 
-	printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx]\n",
-	       node, pxm,
-	       (unsigned long long) start, (unsigned long long) end - 1);
+	pr_info("SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx]%s\n",
+		node, pxm,
+		(unsigned long long) start, (unsigned long long) end - 1,
+		hotpluggable ? " Hot Pluggable" : "");
 
 	return 0;
 out_err_bad_srat:
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v2 01/18] acpi: Print Hot-Pluggable Field in SRAT.
@ 2013-08-01  7:06   ` Tang Chen
  0 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

The Hot-Pluggable field in SRAT suggests if the memory could be
hotplugged while the system is running. Print it as well when
parsing SRAT will help users to know which memory is hotpluggable.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: Tejun Heo <tj@kernel.org>
---
 arch/x86/mm/srat.c |   11 +++++++----
 1 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/srat.c b/arch/x86/mm/srat.c
index cdd0da9..d44c8a4 100644
--- a/arch/x86/mm/srat.c
+++ b/arch/x86/mm/srat.c
@@ -146,6 +146,7 @@ int __init
 acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 {
 	u64 start, end;
+	u32 hotpluggable;
 	int node, pxm;
 
 	if (srat_disabled())
@@ -154,7 +155,8 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 		goto out_err_bad_srat;
 	if ((ma->flags & ACPI_SRAT_MEM_ENABLED) == 0)
 		goto out_err;
-	if ((ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) && !save_add_info())
+	hotpluggable = ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE;
+	if (hotpluggable && !save_add_info())
 		goto out_err;
 
 	start = ma->base_address;
@@ -174,9 +176,10 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 
 	node_set(node, numa_nodes_parsed);
 
-	printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx]\n",
-	       node, pxm,
-	       (unsigned long long) start, (unsigned long long) end - 1);
+	pr_info("SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx]%s\n",
+		node, pxm,
+		(unsigned long long) start, (unsigned long long) end - 1,
+		hotpluggable ? " Hot Pluggable" : "");
 
 	return 0;
 out_err_bad_srat:
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v2 02/18] earlycpio.c: Fix the confusing comment of find_cpio_data().
  2013-08-01  7:06 ` Tang Chen
@ 2013-08-01  7:06   ` Tang Chen
  -1 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

The comments of find_cpio_data() says:

  * @offset: When a matching file is found, this is the offset to the
  *          beginning of the cpio. ......

But according to the code,

  dptr = PTR_ALIGN(p + ch[C_NAMESIZE], 4);
  nptr = PTR_ALIGN(dptr + ch[C_FILESIZE], 4);
  ....
  *offset = (long)nptr - (long)data;	/* data is the cpio file */

@offset is the offset of the next file, not the matching file itself.
This is confused and may cause unnecessary waste of time to debug.
So fix it.

v1 -> v2:
As tj suggested, rename @offset to @nextoff which is more clear to
users. And also adjust the new comments.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 lib/earlycpio.c |   27 ++++++++++++++-------------
 1 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/lib/earlycpio.c b/lib/earlycpio.c
index 8078ef4..7affac0 100644
--- a/lib/earlycpio.c
+++ b/lib/earlycpio.c
@@ -49,22 +49,23 @@ enum cpio_fields {
 
 /**
  * cpio_data find_cpio_data - Search for files in an uncompressed cpio
- * @path:   The directory to search for, including a slash at the end
- * @data:   Pointer to the the cpio archive or a header inside
- * @len:    Remaining length of the cpio based on data pointer
- * @offset: When a matching file is found, this is the offset to the
- *          beginning of the cpio. It can be used to iterate through
- *          the cpio to find all files inside of a directory path
+ * @path:       The directory to search for, including a slash at the end
+ * @data:       Pointer to the the cpio archive or a header inside
+ * @len:        Remaining length of the cpio based on data pointer
+ * @nextoff:    When a matching file is found, this is the offset from the
+ *              beginning of the cpio to the beginning of the next file, not the
+ *              matching file itself. It can be used to iterate through the cpio
+ *              to find all files inside of a directory path
  *
- * @return: struct cpio_data containing the address, length and
- *          filename (with the directory path cut off) of the found file.
- *          If you search for a filename and not for files in a directory,
- *          pass the absolute path of the filename in the cpio and make sure
- *          the match returned an empty filename string.
+ * @return:     struct cpio_data containing the address, length and
+ *              filename (with the directory path cut off) of the found file.
+ *              If you search for a filename and not for files in a directory,
+ *              pass the absolute path of the filename in the cpio and make sure
+ *              the match returned an empty filename string.
  */
 
 struct cpio_data __cpuinit find_cpio_data(const char *path, void *data,
-					  size_t len,  long *offset)
+					  size_t len,  long *nextoff)
 {
 	const size_t cpio_header_len = 8*C_NFIELDS - 2;
 	struct cpio_data cd = { NULL, 0, "" };
@@ -124,7 +125,7 @@ struct cpio_data __cpuinit find_cpio_data(const char *path, void *data,
 		if ((ch[C_MODE] & 0170000) == 0100000 &&
 		    ch[C_NAMESIZE] >= mypathsize &&
 		    !memcmp(p, path, mypathsize)) {
-			*offset = (long)nptr - (long)data;
+			*nextoff = (long)nptr - (long)data;
 			if (ch[C_NAMESIZE] - mypathsize >= MAX_CPIO_FILE_NAME) {
 				pr_warn(
 				"File %s exceeding MAX_CPIO_FILE_NAME [%d]\n",
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v2 02/18] earlycpio.c: Fix the confusing comment of find_cpio_data().
@ 2013-08-01  7:06   ` Tang Chen
  0 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

The comments of find_cpio_data() says:

  * @offset: When a matching file is found, this is the offset to the
  *          beginning of the cpio. ......

But according to the code,

  dptr = PTR_ALIGN(p + ch[C_NAMESIZE], 4);
  nptr = PTR_ALIGN(dptr + ch[C_FILESIZE], 4);
  ....
  *offset = (long)nptr - (long)data;	/* data is the cpio file */

@offset is the offset of the next file, not the matching file itself.
This is confused and may cause unnecessary waste of time to debug.
So fix it.

v1 -> v2:
As tj suggested, rename @offset to @nextoff which is more clear to
users. And also adjust the new comments.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 lib/earlycpio.c |   27 ++++++++++++++-------------
 1 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/lib/earlycpio.c b/lib/earlycpio.c
index 8078ef4..7affac0 100644
--- a/lib/earlycpio.c
+++ b/lib/earlycpio.c
@@ -49,22 +49,23 @@ enum cpio_fields {
 
 /**
  * cpio_data find_cpio_data - Search for files in an uncompressed cpio
- * @path:   The directory to search for, including a slash at the end
- * @data:   Pointer to the the cpio archive or a header inside
- * @len:    Remaining length of the cpio based on data pointer
- * @offset: When a matching file is found, this is the offset to the
- *          beginning of the cpio. It can be used to iterate through
- *          the cpio to find all files inside of a directory path
+ * @path:       The directory to search for, including a slash at the end
+ * @data:       Pointer to the the cpio archive or a header inside
+ * @len:        Remaining length of the cpio based on data pointer
+ * @nextoff:    When a matching file is found, this is the offset from the
+ *              beginning of the cpio to the beginning of the next file, not the
+ *              matching file itself. It can be used to iterate through the cpio
+ *              to find all files inside of a directory path
  *
- * @return: struct cpio_data containing the address, length and
- *          filename (with the directory path cut off) of the found file.
- *          If you search for a filename and not for files in a directory,
- *          pass the absolute path of the filename in the cpio and make sure
- *          the match returned an empty filename string.
+ * @return:     struct cpio_data containing the address, length and
+ *              filename (with the directory path cut off) of the found file.
+ *              If you search for a filename and not for files in a directory,
+ *              pass the absolute path of the filename in the cpio and make sure
+ *              the match returned an empty filename string.
  */
 
 struct cpio_data __cpuinit find_cpio_data(const char *path, void *data,
-					  size_t len,  long *offset)
+					  size_t len,  long *nextoff)
 {
 	const size_t cpio_header_len = 8*C_NFIELDS - 2;
 	struct cpio_data cd = { NULL, 0, "" };
@@ -124,7 +125,7 @@ struct cpio_data __cpuinit find_cpio_data(const char *path, void *data,
 		if ((ch[C_MODE] & 0170000) == 0100000 &&
 		    ch[C_NAMESIZE] >= mypathsize &&
 		    !memcmp(p, path, mypathsize)) {
-			*offset = (long)nptr - (long)data;
+			*nextoff = (long)nptr - (long)data;
 			if (ch[C_NAMESIZE] - mypathsize >= MAX_CPIO_FILE_NAME) {
 				pr_warn(
 				"File %s exceeding MAX_CPIO_FILE_NAME [%d]\n",
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v2 03/18] acpi: Remove "continue" in macro INVALID_TABLE().
  2013-08-01  7:06 ` Tang Chen
@ 2013-08-01  7:06   ` Tang Chen
  -1 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

The macro INVALID_TABLE() is defined like this:

 #define INVALID_TABLE(x, path, name)                                    \
         { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); continue; }

And it is used like this:

	for (...) {
		...
		if (...)
			INVALID_TABLE()
		...
	}

The "continue" in the macro makes the code hard to understand.
Change it to the style like other macros:

 #define INVALID_TABLE(x, path, name)                                    \
         do { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); } while (0)

So after this patch, this macro should be used like this:

	for (...) {
		...
		if (...) {
			INVALID_TABLE()
			continue;
		}
		...
	}

Add the "continue" wherever the macro is called.
(For now, it is only called in acpi_initrd_override().)

The idea is from Yinghai Lu <yinghai@kernel.org>.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 drivers/acpi/osl.c |   18 +++++++++++++-----
 1 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index e721863..91d9f54 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -565,7 +565,7 @@ static const char * const table_sigs[] = {
 
 /* Non-fatal errors: Affected tables/files are ignored */
 #define INVALID_TABLE(x, path, name)					\
-	{ pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); continue; }
+	do { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); } while (0)
 
 #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
 
@@ -593,9 +593,11 @@ void __init acpi_initrd_override(void *data, size_t size)
 		data += offset;
 		size -= offset;
 
-		if (file.size < sizeof(struct acpi_table_header))
+		if (file.size < sizeof(struct acpi_table_header)) {
 			INVALID_TABLE("Table smaller than ACPI header",
 				      cpio_path, file.name);
+			continue;
+		}
 
 		table = file.data;
 
@@ -603,15 +605,21 @@ void __init acpi_initrd_override(void *data, size_t size)
 			if (!memcmp(table->signature, table_sigs[sig], 4))
 				break;
 
-		if (!table_sigs[sig])
+		if (!table_sigs[sig]) {
 			INVALID_TABLE("Unknown signature",
 				      cpio_path, file.name);
-		if (file.size != table->length)
+			continue;
+		}
+		if (file.size != table->length) {
 			INVALID_TABLE("File length does not match table length",
 				      cpio_path, file.name);
-		if (acpi_table_checksum(file.data, table->length))
+			continue;
+		}
+		if (acpi_table_checksum(file.data, table->length)) {
 			INVALID_TABLE("Bad table checksum",
 				      cpio_path, file.name);
+			continue;
+		}
 
 		pr_info("%4.4s ACPI table found in initrd [%s%s][0x%x]\n",
 			table->signature, cpio_path, file.name, table->length);
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v2 03/18] acpi: Remove "continue" in macro INVALID_TABLE().
@ 2013-08-01  7:06   ` Tang Chen
  0 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

The macro INVALID_TABLE() is defined like this:

 #define INVALID_TABLE(x, path, name)                                    \
         { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); continue; }

And it is used like this:

	for (...) {
		...
		if (...)
			INVALID_TABLE()
		...
	}

The "continue" in the macro makes the code hard to understand.
Change it to the style like other macros:

 #define INVALID_TABLE(x, path, name)                                    \
         do { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); } while (0)

So after this patch, this macro should be used like this:

	for (...) {
		...
		if (...) {
			INVALID_TABLE()
			continue;
		}
		...
	}

Add the "continue" wherever the macro is called.
(For now, it is only called in acpi_initrd_override().)

The idea is from Yinghai Lu <yinghai@kernel.org>.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 drivers/acpi/osl.c |   18 +++++++++++++-----
 1 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index e721863..91d9f54 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -565,7 +565,7 @@ static const char * const table_sigs[] = {
 
 /* Non-fatal errors: Affected tables/files are ignored */
 #define INVALID_TABLE(x, path, name)					\
-	{ pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); continue; }
+	do { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); } while (0)
 
 #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
 
@@ -593,9 +593,11 @@ void __init acpi_initrd_override(void *data, size_t size)
 		data += offset;
 		size -= offset;
 
-		if (file.size < sizeof(struct acpi_table_header))
+		if (file.size < sizeof(struct acpi_table_header)) {
 			INVALID_TABLE("Table smaller than ACPI header",
 				      cpio_path, file.name);
+			continue;
+		}
 
 		table = file.data;
 
@@ -603,15 +605,21 @@ void __init acpi_initrd_override(void *data, size_t size)
 			if (!memcmp(table->signature, table_sigs[sig], 4))
 				break;
 
-		if (!table_sigs[sig])
+		if (!table_sigs[sig]) {
 			INVALID_TABLE("Unknown signature",
 				      cpio_path, file.name);
-		if (file.size != table->length)
+			continue;
+		}
+		if (file.size != table->length) {
 			INVALID_TABLE("File length does not match table length",
 				      cpio_path, file.name);
-		if (acpi_table_checksum(file.data, table->length))
+			continue;
+		}
+		if (acpi_table_checksum(file.data, table->length)) {
 			INVALID_TABLE("Bad table checksum",
 				      cpio_path, file.name);
+			continue;
+		}
 
 		pr_info("%4.4s ACPI table found in initrd [%s%s][0x%x]\n",
 			table->signature, cpio_path, file.name, table->length);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v2 04/18] acpi: Introduce acpi_invalid_table() to check if a table is invalid.
  2013-08-01  7:06 ` Tang Chen
@ 2013-08-01  7:06   ` Tang Chen
  -1 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

In acpi_initrd_override(), it checks several things to ensure the
table it found is valid. In later patches, we need to do these check
somewhere else. So this patch introduces a common function
acpi_invalid_table() to do all these checks, and reuse it in different
places. The function will be used in the subsequent patches.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 drivers/acpi/osl.c |   86 +++++++++++++++++++++++++++++++++++++---------------
 1 files changed, 61 insertions(+), 25 deletions(-)

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 91d9f54..8df8a93 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -572,9 +572,68 @@ static const char * const table_sigs[] = {
 /* Must not increase 10 or needs code modification below */
 #define ACPI_OVERRIDE_TABLES 10
 
+/*******************************************************************************
+ *
+ * FUNCTION:    acpi_invalid_table
+ *
+ * PARAMETERS:  File               - The initrd file
+ *              Path               - Path to acpi overriding tables in cpio file
+ *              Signature          - Signature of the table
+ *
+ * RETURN:      0 if it passes all the checks, -EINVAL if any check fails.
+ *
+ * DESCRIPTION: Check if an acpi table found in initrd is invalid.
+ *              @signature can be NULL. If it is NULL, the function will check
+ *              if the table signature matches any signature in table_sigs[].
+ *
+ ******************************************************************************/
+int __init acpi_invalid_table(struct cpio_data *file,
+			      const char *path, const char *signature)
+{
+	int idx;
+	struct acpi_table_header *table = file->data;
+
+	if (file->size < sizeof(struct acpi_table_header)) {
+		INVALID_TABLE("Table smaller than ACPI header",
+			      path, file->name);
+		return -EINVAL;
+	}
+
+	if (signature) {
+		if (memcmp(table->signature, signature, 4)) {
+			INVALID_TABLE("Table signature does not match",
+				      path, file->name);
+			return -EINVAL;
+		}
+	} else {
+		for (idx = 0; table_sigs[idx]; idx++)
+			if (!memcmp(table->signature, table_sigs[idx], 4))
+				break;
+
+		if (!table_sigs[idx]) {
+			INVALID_TABLE("Unknown signature", path, file->name);
+			return -EINVAL;
+		}
+	}
+
+	if (file->size != table->length) {
+		INVALID_TABLE("File length does not match table length",
+			      path, file->name);
+		return -EINVAL;
+	}
+
+	if (acpi_table_checksum(file->data, table->length)) {
+		INVALID_TABLE("Bad table checksum",
+			      path, file->name);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 void __init acpi_initrd_override(void *data, size_t size)
 {
-	int sig, no, table_nr = 0, total_offset = 0;
+	int no, table_nr = 0, total_offset = 0;
 	long offset = 0;
 	struct acpi_table_header *table;
 	char cpio_path[32] = "kernel/firmware/acpi/";
@@ -593,33 +652,10 @@ void __init acpi_initrd_override(void *data, size_t size)
 		data += offset;
 		size -= offset;
 
-		if (file.size < sizeof(struct acpi_table_header)) {
-			INVALID_TABLE("Table smaller than ACPI header",
-				      cpio_path, file.name);
-			continue;
-		}
-
 		table = file.data;
 
-		for (sig = 0; table_sigs[sig]; sig++)
-			if (!memcmp(table->signature, table_sigs[sig], 4))
-				break;
-
-		if (!table_sigs[sig]) {
-			INVALID_TABLE("Unknown signature",
-				      cpio_path, file.name);
+		if (acpi_invalid_table(&file, cpio_path, NULL))
 			continue;
-		}
-		if (file.size != table->length) {
-			INVALID_TABLE("File length does not match table length",
-				      cpio_path, file.name);
-			continue;
-		}
-		if (acpi_table_checksum(file.data, table->length)) {
-			INVALID_TABLE("Bad table checksum",
-				      cpio_path, file.name);
-			continue;
-		}
 
 		pr_info("%4.4s ACPI table found in initrd [%s%s][0x%x]\n",
 			table->signature, cpio_path, file.name, table->length);
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v2 04/18] acpi: Introduce acpi_invalid_table() to check if a table is invalid.
@ 2013-08-01  7:06   ` Tang Chen
  0 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

In acpi_initrd_override(), it checks several things to ensure the
table it found is valid. In later patches, we need to do these check
somewhere else. So this patch introduces a common function
acpi_invalid_table() to do all these checks, and reuse it in different
places. The function will be used in the subsequent patches.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 drivers/acpi/osl.c |   86 +++++++++++++++++++++++++++++++++++++---------------
 1 files changed, 61 insertions(+), 25 deletions(-)

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 91d9f54..8df8a93 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -572,9 +572,68 @@ static const char * const table_sigs[] = {
 /* Must not increase 10 or needs code modification below */
 #define ACPI_OVERRIDE_TABLES 10
 
+/*******************************************************************************
+ *
+ * FUNCTION:    acpi_invalid_table
+ *
+ * PARAMETERS:  File               - The initrd file
+ *              Path               - Path to acpi overriding tables in cpio file
+ *              Signature          - Signature of the table
+ *
+ * RETURN:      0 if it passes all the checks, -EINVAL if any check fails.
+ *
+ * DESCRIPTION: Check if an acpi table found in initrd is invalid.
+ *              @signature can be NULL. If it is NULL, the function will check
+ *              if the table signature matches any signature in table_sigs[].
+ *
+ ******************************************************************************/
+int __init acpi_invalid_table(struct cpio_data *file,
+			      const char *path, const char *signature)
+{
+	int idx;
+	struct acpi_table_header *table = file->data;
+
+	if (file->size < sizeof(struct acpi_table_header)) {
+		INVALID_TABLE("Table smaller than ACPI header",
+			      path, file->name);
+		return -EINVAL;
+	}
+
+	if (signature) {
+		if (memcmp(table->signature, signature, 4)) {
+			INVALID_TABLE("Table signature does not match",
+				      path, file->name);
+			return -EINVAL;
+		}
+	} else {
+		for (idx = 0; table_sigs[idx]; idx++)
+			if (!memcmp(table->signature, table_sigs[idx], 4))
+				break;
+
+		if (!table_sigs[idx]) {
+			INVALID_TABLE("Unknown signature", path, file->name);
+			return -EINVAL;
+		}
+	}
+
+	if (file->size != table->length) {
+		INVALID_TABLE("File length does not match table length",
+			      path, file->name);
+		return -EINVAL;
+	}
+
+	if (acpi_table_checksum(file->data, table->length)) {
+		INVALID_TABLE("Bad table checksum",
+			      path, file->name);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 void __init acpi_initrd_override(void *data, size_t size)
 {
-	int sig, no, table_nr = 0, total_offset = 0;
+	int no, table_nr = 0, total_offset = 0;
 	long offset = 0;
 	struct acpi_table_header *table;
 	char cpio_path[32] = "kernel/firmware/acpi/";
@@ -593,33 +652,10 @@ void __init acpi_initrd_override(void *data, size_t size)
 		data += offset;
 		size -= offset;
 
-		if (file.size < sizeof(struct acpi_table_header)) {
-			INVALID_TABLE("Table smaller than ACPI header",
-				      cpio_path, file.name);
-			continue;
-		}
-
 		table = file.data;
 
-		for (sig = 0; table_sigs[sig]; sig++)
-			if (!memcmp(table->signature, table_sigs[sig], 4))
-				break;
-
-		if (!table_sigs[sig]) {
-			INVALID_TABLE("Unknown signature",
-				      cpio_path, file.name);
+		if (acpi_invalid_table(&file, cpio_path, NULL))
 			continue;
-		}
-		if (file.size != table->length) {
-			INVALID_TABLE("File length does not match table length",
-				      cpio_path, file.name);
-			continue;
-		}
-		if (acpi_table_checksum(file.data, table->length)) {
-			INVALID_TABLE("Bad table checksum",
-				      cpio_path, file.name);
-			continue;
-		}
 
 		pr_info("%4.4s ACPI table found in initrd [%s%s][0x%x]\n",
 			table->signature, cpio_path, file.name, table->length);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v2 05/18] x86, acpi: Split acpi_boot_table_init() into two parts.
  2013-08-01  7:06 ` Tang Chen
@ 2013-08-01  7:06   ` Tang Chen
  -1 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

In ACPI, SRAT(System Resource Affinity Table) contains NUMA info.
The memory affinities in SRAT record every memory range in the
system, and also, flags specifying if the memory range is
hotpluggable.
(Please refer to ACPI spec 5.0 5.2.16)

memblock starts to work at very early time, and SRAT has not been
parsed. So we don't know which memory is hotpluggable. In order
to use memblock to reserve hotpluggable memory, we need to obtain
SRAT memory affinity info earlier.

In the current acpi_boot_table_init(), it does the following:
1. Parse RSDT, so that we can find all the tables.
2. Initialize acpi_gbl_root_table_list, an array of acpi table
   descriptorsused to store each table's address, length, signature,
   and so on.
3. Check if there is any table in initrd intending to override
   tables from firmware. If so, override the firmware tables.
4. Initialize all the data in acpi_gbl_root_table_list.

In order to parse SRAT at early time, we need to do similar job as
step 1 and 2 above earlier to obtain SRAT. It will be very convenient
if we have acpi_gbl_root_table_list initialized. We can use address
and signature to find SRAT.

Since step 1 and 2 allocates no memory, it is OK to do these two
steps earlier.

But step 3 will check acpi initrd table override, not just SRAT,
but also all the other tables. So it is better to keep it untouched.

This patch splits acpi_boot_table_init() into two steps:
1. Parse RSDT, which cannot be overrided, and initialize
   acpi_gbl_root_table_list. (step 1 + 2 above)
2. Install all ACPI tables into acpi_gbl_root_table_list.
   (step 3 + 4 above)

In later patches, we will do step 1 + 2 earlier.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 drivers/acpi/acpica/tbutils.c |   25 ++++++++++++++++++++++---
 drivers/acpi/tables.c         |    2 ++
 include/acpi/acpixf.h         |    2 ++
 3 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/acpica/tbutils.c b/drivers/acpi/acpica/tbutils.c
index ce3d5db..9d68ffc 100644
--- a/drivers/acpi/acpica/tbutils.c
+++ b/drivers/acpi/acpica/tbutils.c
@@ -766,9 +766,30 @@ acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
 	 */
 	acpi_os_unmap_memory(table, length);
 
+	return_ACPI_STATUS(AE_OK);
+}
+
+/*******************************************************************************
+ *
+ * FUNCTION:    acpi_tb_install_root_table
+ *
+ * DESCRIPTION: This function installs all the ACPI tables in RSDT into
+ *              acpi_gbl_root_table_list.
+ *
+ ******************************************************************************/
+
+void __init
+acpi_tb_install_root_table()
+{
+	int i;
+
 	/*
 	 * Complete the initialization of the root table array by examining
-	 * the header of each table
+	 * the header of each table.
+	 *
+	 * First two entries in the table array are reserved for the DSDT
+	 * and FACS, which are not actually present in the RSDT/XSDT - they
+	 * come from the FADT.
 	 */
 	for (i = 2; i < acpi_gbl_root_table_list.current_table_count; i++) {
 		acpi_tb_install_table(acpi_gbl_root_table_list.tables[i].
@@ -782,6 +803,4 @@ acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
 			acpi_tb_parse_fadt(i);
 		}
 	}
-
-	return_ACPI_STATUS(AE_OK);
 }
diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
index d67a1fe..8860e79 100644
--- a/drivers/acpi/tables.c
+++ b/drivers/acpi/tables.c
@@ -353,6 +353,8 @@ int __init acpi_table_init(void)
 	if (ACPI_FAILURE(status))
 		return 1;
 
+	acpi_tb_install_root_table();
+
 	check_multiple_madt();
 	return 0;
 }
diff --git a/include/acpi/acpixf.h b/include/acpi/acpixf.h
index 454881e..f5549b5 100644
--- a/include/acpi/acpixf.h
+++ b/include/acpi/acpixf.h
@@ -116,6 +116,8 @@ acpi_status
 acpi_initialize_tables(struct acpi_table_desc *initial_storage,
 		       u32 initial_table_count, u8 allow_resize);
 
+void acpi_tb_install_root_table(void);
+
 acpi_status __init acpi_initialize_subsystem(void);
 
 acpi_status acpi_enable_subsystem(u32 flags);
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v2 05/18] x86, acpi: Split acpi_boot_table_init() into two parts.
@ 2013-08-01  7:06   ` Tang Chen
  0 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

In ACPI, SRAT(System Resource Affinity Table) contains NUMA info.
The memory affinities in SRAT record every memory range in the
system, and also, flags specifying if the memory range is
hotpluggable.
(Please refer to ACPI spec 5.0 5.2.16)

memblock starts to work at very early time, and SRAT has not been
parsed. So we don't know which memory is hotpluggable. In order
to use memblock to reserve hotpluggable memory, we need to obtain
SRAT memory affinity info earlier.

In the current acpi_boot_table_init(), it does the following:
1. Parse RSDT, so that we can find all the tables.
2. Initialize acpi_gbl_root_table_list, an array of acpi table
   descriptorsused to store each table's address, length, signature,
   and so on.
3. Check if there is any table in initrd intending to override
   tables from firmware. If so, override the firmware tables.
4. Initialize all the data in acpi_gbl_root_table_list.

In order to parse SRAT at early time, we need to do similar job as
step 1 and 2 above earlier to obtain SRAT. It will be very convenient
if we have acpi_gbl_root_table_list initialized. We can use address
and signature to find SRAT.

Since step 1 and 2 allocates no memory, it is OK to do these two
steps earlier.

But step 3 will check acpi initrd table override, not just SRAT,
but also all the other tables. So it is better to keep it untouched.

This patch splits acpi_boot_table_init() into two steps:
1. Parse RSDT, which cannot be overrided, and initialize
   acpi_gbl_root_table_list. (step 1 + 2 above)
2. Install all ACPI tables into acpi_gbl_root_table_list.
   (step 3 + 4 above)

In later patches, we will do step 1 + 2 earlier.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 drivers/acpi/acpica/tbutils.c |   25 ++++++++++++++++++++++---
 drivers/acpi/tables.c         |    2 ++
 include/acpi/acpixf.h         |    2 ++
 3 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/acpica/tbutils.c b/drivers/acpi/acpica/tbutils.c
index ce3d5db..9d68ffc 100644
--- a/drivers/acpi/acpica/tbutils.c
+++ b/drivers/acpi/acpica/tbutils.c
@@ -766,9 +766,30 @@ acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
 	 */
 	acpi_os_unmap_memory(table, length);
 
+	return_ACPI_STATUS(AE_OK);
+}
+
+/*******************************************************************************
+ *
+ * FUNCTION:    acpi_tb_install_root_table
+ *
+ * DESCRIPTION: This function installs all the ACPI tables in RSDT into
+ *              acpi_gbl_root_table_list.
+ *
+ ******************************************************************************/
+
+void __init
+acpi_tb_install_root_table()
+{
+	int i;
+
 	/*
 	 * Complete the initialization of the root table array by examining
-	 * the header of each table
+	 * the header of each table.
+	 *
+	 * First two entries in the table array are reserved for the DSDT
+	 * and FACS, which are not actually present in the RSDT/XSDT - they
+	 * come from the FADT.
 	 */
 	for (i = 2; i < acpi_gbl_root_table_list.current_table_count; i++) {
 		acpi_tb_install_table(acpi_gbl_root_table_list.tables[i].
@@ -782,6 +803,4 @@ acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
 			acpi_tb_parse_fadt(i);
 		}
 	}
-
-	return_ACPI_STATUS(AE_OK);
 }
diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
index d67a1fe..8860e79 100644
--- a/drivers/acpi/tables.c
+++ b/drivers/acpi/tables.c
@@ -353,6 +353,8 @@ int __init acpi_table_init(void)
 	if (ACPI_FAILURE(status))
 		return 1;
 
+	acpi_tb_install_root_table();
+
 	check_multiple_madt();
 	return 0;
 }
diff --git a/include/acpi/acpixf.h b/include/acpi/acpixf.h
index 454881e..f5549b5 100644
--- a/include/acpi/acpixf.h
+++ b/include/acpi/acpixf.h
@@ -116,6 +116,8 @@ acpi_status
 acpi_initialize_tables(struct acpi_table_desc *initial_storage,
 		       u32 initial_table_count, u8 allow_resize);
 
+void acpi_tb_install_root_table(void);
+
 acpi_status __init acpi_initialize_subsystem(void);
 
 acpi_status acpi_enable_subsystem(u32 flags);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v2 06/18] x86, acpi: Initialize ACPI root table list earlier.
  2013-08-01  7:06 ` Tang Chen
@ 2013-08-01  7:06   ` Tang Chen
  -1 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

We have split acpi_table_init() into two steps:
1. Pares RSDT or XSDT, and initialize acpi_gbl_root_table_list.
   This step will record all tables' physical address in memory.
2. Check acpi initrd table override and install all tables into
   acpi_gbl_root_table_list.

This patch does step 1 earlier, right after memblock is ready.

When memblock_x86_fill() is called to fulfill memblock.memory[],
memblock is able to allocate memory.

This patch introduces a new function acpi_root_table_init() to
do step 1, and call this function right after memblock_x86_fill()
is called.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 arch/x86/kernel/acpi/boot.c |   38 +++++++++++++++++++++++---------------
 arch/x86/kernel/setup.c     |    3 +++
 drivers/acpi/tables.c       |    7 +++++--
 include/linux/acpi.h        |    2 ++
 4 files changed, 33 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 230c8ea..3da5b3c 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -1491,6 +1491,28 @@ static struct dmi_system_id __initdata acpi_dmi_table_late[] = {
 };
 
 /*
+ * acpi_root_table_init - Initialize acpi_gbl_root_table_list.
+ *
+ * This function will parse RSDT or XSDT, find all tables' phys addr,
+ * initialize acpi_gbl_root_table_list, and record all tables' phys addr
+ * in acpi_gbl_root_table_list.
+ */
+void __init acpi_root_table_init(void)
+{
+	dmi_check_system(acpi_dmi_table);
+
+	/* If acpi_disabled, bail out */
+	if (acpi_disabled)
+		return;
+
+	/* Initialize the ACPI boot-time table parser */
+	if (acpi_table_init()) {
+		disable_acpi();
+		return;
+	}
+}
+
+/*
  * acpi_boot_table_init() and acpi_boot_init()
  *  called from setup_arch(), always.
  *	1. checksums all tables
@@ -1511,21 +1533,7 @@ static struct dmi_system_id __initdata acpi_dmi_table_late[] = {
 
 void __init acpi_boot_table_init(void)
 {
-	dmi_check_system(acpi_dmi_table);
-
-	/*
-	 * If acpi_disabled, bail out
-	 */
-	if (acpi_disabled)
-		return; 
-
-	/*
-	 * Initialize the ACPI boot-time table parser.
-	 */
-	if (acpi_table_init()) {
-		disable_acpi();
-		return;
-	}
+	acpi_install_root_table();
 
 	acpi_table_parse(ACPI_SIG_BOOT, acpi_parse_sbf);
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 56f7fcf..38a5952 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1075,6 +1075,9 @@ void __init setup_arch(char **cmdline_p)
 	memblock.current_limit = ISA_END_ADDRESS;
 	memblock_x86_fill();
 
+	/* Initialize ACPI root table */
+	acpi_root_table_init();
+
 	/*
 	 * The EFI specification says that boot service code won't be called
 	 * after ExitBootServices(). This is, in fact, a lie.
diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
index 8860e79..60ecbb8 100644
--- a/drivers/acpi/tables.c
+++ b/drivers/acpi/tables.c
@@ -353,10 +353,13 @@ int __init acpi_table_init(void)
 	if (ACPI_FAILURE(status))
 		return 1;
 
-	acpi_tb_install_root_table();
+	return 0;
+}
 
+void __init acpi_install_root_table(void)
+{
+	acpi_tb_install_root_table();
 	check_multiple_madt();
-	return 0;
 }
 
 static int __init acpi_parse_apic_instance(char *str)
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 17b5b59..95f600c 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -92,10 +92,12 @@ void __acpi_unmap_table(char *map, unsigned long size);
 int early_acpi_boot_init(void);
 int acpi_boot_init (void);
 void acpi_boot_table_init (void);
+void acpi_root_table_init(void);
 int acpi_mps_check (void);
 int acpi_numa_init (void);
 
 int acpi_table_init (void);
+void acpi_install_root_table(void);
 int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
 int __init acpi_table_parse_entries(char *id, unsigned long table_size,
 				    int entry_id,
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v2 06/18] x86, acpi: Initialize ACPI root table list earlier.
@ 2013-08-01  7:06   ` Tang Chen
  0 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

We have split acpi_table_init() into two steps:
1. Pares RSDT or XSDT, and initialize acpi_gbl_root_table_list.
   This step will record all tables' physical address in memory.
2. Check acpi initrd table override and install all tables into
   acpi_gbl_root_table_list.

This patch does step 1 earlier, right after memblock is ready.

When memblock_x86_fill() is called to fulfill memblock.memory[],
memblock is able to allocate memory.

This patch introduces a new function acpi_root_table_init() to
do step 1, and call this function right after memblock_x86_fill()
is called.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 arch/x86/kernel/acpi/boot.c |   38 +++++++++++++++++++++++---------------
 arch/x86/kernel/setup.c     |    3 +++
 drivers/acpi/tables.c       |    7 +++++--
 include/linux/acpi.h        |    2 ++
 4 files changed, 33 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 230c8ea..3da5b3c 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -1491,6 +1491,28 @@ static struct dmi_system_id __initdata acpi_dmi_table_late[] = {
 };
 
 /*
+ * acpi_root_table_init - Initialize acpi_gbl_root_table_list.
+ *
+ * This function will parse RSDT or XSDT, find all tables' phys addr,
+ * initialize acpi_gbl_root_table_list, and record all tables' phys addr
+ * in acpi_gbl_root_table_list.
+ */
+void __init acpi_root_table_init(void)
+{
+	dmi_check_system(acpi_dmi_table);
+
+	/* If acpi_disabled, bail out */
+	if (acpi_disabled)
+		return;
+
+	/* Initialize the ACPI boot-time table parser */
+	if (acpi_table_init()) {
+		disable_acpi();
+		return;
+	}
+}
+
+/*
  * acpi_boot_table_init() and acpi_boot_init()
  *  called from setup_arch(), always.
  *	1. checksums all tables
@@ -1511,21 +1533,7 @@ static struct dmi_system_id __initdata acpi_dmi_table_late[] = {
 
 void __init acpi_boot_table_init(void)
 {
-	dmi_check_system(acpi_dmi_table);
-
-	/*
-	 * If acpi_disabled, bail out
-	 */
-	if (acpi_disabled)
-		return; 
-
-	/*
-	 * Initialize the ACPI boot-time table parser.
-	 */
-	if (acpi_table_init()) {
-		disable_acpi();
-		return;
-	}
+	acpi_install_root_table();
 
 	acpi_table_parse(ACPI_SIG_BOOT, acpi_parse_sbf);
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 56f7fcf..38a5952 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1075,6 +1075,9 @@ void __init setup_arch(char **cmdline_p)
 	memblock.current_limit = ISA_END_ADDRESS;
 	memblock_x86_fill();
 
+	/* Initialize ACPI root table */
+	acpi_root_table_init();
+
 	/*
 	 * The EFI specification says that boot service code won't be called
 	 * after ExitBootServices(). This is, in fact, a lie.
diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
index 8860e79..60ecbb8 100644
--- a/drivers/acpi/tables.c
+++ b/drivers/acpi/tables.c
@@ -353,10 +353,13 @@ int __init acpi_table_init(void)
 	if (ACPI_FAILURE(status))
 		return 1;
 
-	acpi_tb_install_root_table();
+	return 0;
+}
 
+void __init acpi_install_root_table(void)
+{
+	acpi_tb_install_root_table();
 	check_multiple_madt();
-	return 0;
 }
 
 static int __init acpi_parse_apic_instance(char *str)
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 17b5b59..95f600c 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -92,10 +92,12 @@ void __acpi_unmap_table(char *map, unsigned long size);
 int early_acpi_boot_init(void);
 int acpi_boot_init (void);
 void acpi_boot_table_init (void);
+void acpi_root_table_init(void);
 int acpi_mps_check (void);
 int acpi_numa_init (void);
 
 int acpi_table_init (void);
+void acpi_install_root_table(void);
 int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
 int __init acpi_table_parse_entries(char *id, unsigned long table_size,
 				    int entry_id,
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v2 07/18] x86, acpi: Also initialize signature and length when parsing root table.
  2013-08-01  7:06 ` Tang Chen
@ 2013-08-01  7:06   ` Tang Chen
  -1 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

Besides the phys addr of the acpi tables, it will be very convenient if
we also have the signature of each table in acpi_gbl_root_table_list at
early time. We can find SRAT easily by comparing the signature.

This patch alse record signature and some other info in
acpi_gbl_root_table_list at early time.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 drivers/acpi/acpica/tbutils.c |   22 ++++++++++++++++++++++
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/drivers/acpi/acpica/tbutils.c b/drivers/acpi/acpica/tbutils.c
index 9d68ffc..5d31887 100644
--- a/drivers/acpi/acpica/tbutils.c
+++ b/drivers/acpi/acpica/tbutils.c
@@ -627,6 +627,7 @@ acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
 	u32 i;
 	u32 table_count;
 	struct acpi_table_header *table;
+	struct acpi_table_desc *table_desc;
 	acpi_physical_address address;
 	acpi_physical_address uninitialized_var(rsdt_address);
 	u32 length;
@@ -766,6 +767,27 @@ acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
 	 */
 	acpi_os_unmap_memory(table, length);
 
+	/*
+	 * Also initialize the table entries here, so that later we can use them
+	 * to find SRAT at very eraly time to reserve hotpluggable memory.
+	 */
+	for (i = 2; i < acpi_gbl_root_table_list.current_table_count; i++) {
+		table = acpi_os_map_memory(
+				acpi_gbl_root_table_list.tables[i].address,
+				sizeof(struct acpi_table_header));
+		if (!table)
+			return_ACPI_STATUS(AE_NO_MEMORY);
+
+		table_desc = &acpi_gbl_root_table_list.tables[i];
+
+		table_desc->pointer = NULL;
+		table_desc->length = table->length;
+		table_desc->flags = ACPI_TABLE_ORIGIN_MAPPED;
+		ACPI_MOVE_32_TO_32(table_desc->signature.ascii, table->signature);
+
+		acpi_os_unmap_memory(table, sizeof(struct acpi_table_header));
+	}
+
 	return_ACPI_STATUS(AE_OK);
 }
 
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v2 07/18] x86, acpi: Also initialize signature and length when parsing root table.
@ 2013-08-01  7:06   ` Tang Chen
  0 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

Besides the phys addr of the acpi tables, it will be very convenient if
we also have the signature of each table in acpi_gbl_root_table_list at
early time. We can find SRAT easily by comparing the signature.

This patch alse record signature and some other info in
acpi_gbl_root_table_list at early time.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 drivers/acpi/acpica/tbutils.c |   22 ++++++++++++++++++++++
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/drivers/acpi/acpica/tbutils.c b/drivers/acpi/acpica/tbutils.c
index 9d68ffc..5d31887 100644
--- a/drivers/acpi/acpica/tbutils.c
+++ b/drivers/acpi/acpica/tbutils.c
@@ -627,6 +627,7 @@ acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
 	u32 i;
 	u32 table_count;
 	struct acpi_table_header *table;
+	struct acpi_table_desc *table_desc;
 	acpi_physical_address address;
 	acpi_physical_address uninitialized_var(rsdt_address);
 	u32 length;
@@ -766,6 +767,27 @@ acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
 	 */
 	acpi_os_unmap_memory(table, length);
 
+	/*
+	 * Also initialize the table entries here, so that later we can use them
+	 * to find SRAT at very eraly time to reserve hotpluggable memory.
+	 */
+	for (i = 2; i < acpi_gbl_root_table_list.current_table_count; i++) {
+		table = acpi_os_map_memory(
+				acpi_gbl_root_table_list.tables[i].address,
+				sizeof(struct acpi_table_header));
+		if (!table)
+			return_ACPI_STATUS(AE_NO_MEMORY);
+
+		table_desc = &acpi_gbl_root_table_list.tables[i];
+
+		table_desc->pointer = NULL;
+		table_desc->length = table->length;
+		table_desc->flags = ACPI_TABLE_ORIGIN_MAPPED;
+		ACPI_MOVE_32_TO_32(table_desc->signature.ascii, table->signature);
+
+		acpi_os_unmap_memory(table, sizeof(struct acpi_table_header));
+	}
+
 	return_ACPI_STATUS(AE_OK);
 }
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v2 08/18] x86: get pg_data_t's memory from other node
  2013-08-01  7:06 ` Tang Chen
@ 2013-08-01  7:06   ` Tang Chen
  -1 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

If system can create movable node which all memory of the node is allocated
as ZONE_MOVABLE, setup_node_data() cannot allocate memory for the node's
pg_data_t. So, use memblock_alloc_try_nid() instead of memblock_alloc_nid()
to retry when the first allocation fails. Otherwise, the system could failed
to boot.

The node_data could be on hotpluggable node. And so could pagetable and
vmemmap. But for now, doing so will break memory hot-remove path.

A node could have several memory devices. And the device who holds node
data should be hot-removed in the last place. But in NUAM level, we don't
know which memory_block (/sys/devices/system/node/nodeX/memoryXXX) belongs
to which memory device. We only have node. So we can only do node hotplug.

But in virtualization, developers are now developing memory hotplug in qemu,
which support a single memory device hotplug. So a whole node hotplug will
not satisfy virtualization users.

So at last, we concluded that we'd better do memory hotplug and local node
things (local node node data, pagetable, vmemmap, ...) in two steps.
Please refer to https://lkml.org/lkml/2013/6/19/73

For now, we put node_data of movable node to another node, and then improve
it in the future.

In the later patches, a boot option will be introduced to enable/disable this
functionality. If users disable it, the node_data will still be put on the
local node.

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 arch/x86/mm/numa.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index a71c4e2..5013583 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -209,10 +209,9 @@ static void __init setup_node_data(int nid, u64 start, u64 end)
 	 * Allocate node data.  Try node-local memory and then any node.
 	 * Never allocate in DMA zone.
 	 */
-	nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
+	nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
 	if (!nd_pa) {
-		pr_err("Cannot find %zu bytes in node %d\n",
-		       nd_size, nid);
+		pr_err("Cannot find %zu bytes in any node\n", nd_size);
 		return;
 	}
 	nd = __va(nd_pa);
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v2 08/18] x86: get pg_data_t's memory from other node
@ 2013-08-01  7:06   ` Tang Chen
  0 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

If system can create movable node which all memory of the node is allocated
as ZONE_MOVABLE, setup_node_data() cannot allocate memory for the node's
pg_data_t. So, use memblock_alloc_try_nid() instead of memblock_alloc_nid()
to retry when the first allocation fails. Otherwise, the system could failed
to boot.

The node_data could be on hotpluggable node. And so could pagetable and
vmemmap. But for now, doing so will break memory hot-remove path.

A node could have several memory devices. And the device who holds node
data should be hot-removed in the last place. But in NUAM level, we don't
know which memory_block (/sys/devices/system/node/nodeX/memoryXXX) belongs
to which memory device. We only have node. So we can only do node hotplug.

But in virtualization, developers are now developing memory hotplug in qemu,
which support a single memory device hotplug. So a whole node hotplug will
not satisfy virtualization users.

So at last, we concluded that we'd better do memory hotplug and local node
things (local node node data, pagetable, vmemmap, ...) in two steps.
Please refer to https://lkml.org/lkml/2013/6/19/73

For now, we put node_data of movable node to another node, and then improve
it in the future.

In the later patches, a boot option will be introduced to enable/disable this
functionality. If users disable it, the node_data will still be put on the
local node.

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 arch/x86/mm/numa.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index a71c4e2..5013583 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -209,10 +209,9 @@ static void __init setup_node_data(int nid, u64 start, u64 end)
 	 * Allocate node data.  Try node-local memory and then any node.
 	 * Never allocate in DMA zone.
 	 */
-	nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
+	nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
 	if (!nd_pa) {
-		pr_err("Cannot find %zu bytes in node %d\n",
-		       nd_size, nid);
+		pr_err("Cannot find %zu bytes in any node\n", nd_size);
 		return;
 	}
 	nd = __va(nd_pa);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v2 09/18] x86: Make get_ramdisk_{image|size}() global.
  2013-08-01  7:06 ` Tang Chen
@ 2013-08-01  7:06   ` Tang Chen
  -1 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

In the following patches, we need to call get_ramdisk_{image|size}()
to get initrd file's address and size. So make these two functions
global.

v1 -> v2:
As tj suggested, make these two function static inline in
arch/x86/include/asm/setup.h.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 arch/x86/include/asm/setup.h |   21 +++++++++++++++++++++
 arch/x86/kernel/setup.c      |   18 ------------------
 2 files changed, 21 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index b7bf350..cfdb55d 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -106,6 +106,27 @@ void *extend_brk(size_t size, size_t align);
 	RESERVE_BRK(name, sizeof(type) * entries)
 
 extern void probe_roms(void);
+
+#ifdef CONFIG_BLK_DEV_INITRD
+static inline u64 __init get_ramdisk_image(void)
+{
+	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
+
+	ramdisk_image |= (u64)boot_params.ext_ramdisk_image << 32;
+
+	return ramdisk_image;
+}
+
+static inline u64 __init get_ramdisk_size(void)
+{
+	u64 ramdisk_size = boot_params.hdr.ramdisk_size;
+
+	ramdisk_size |= (u64)boot_params.ext_ramdisk_size << 32;
+
+	return ramdisk_size;
+}
+#endif /* CONFIG_BLK_DEV_INITRD */
+
 #ifdef __i386__
 
 void __init i386_start_kernel(void);
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 38a5952..c8f5d1a 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -296,24 +296,6 @@ static void __init reserve_brk(void)
 }
 
 #ifdef CONFIG_BLK_DEV_INITRD
-
-static u64 __init get_ramdisk_image(void)
-{
-	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
-
-	ramdisk_image |= (u64)boot_params.ext_ramdisk_image << 32;
-
-	return ramdisk_image;
-}
-static u64 __init get_ramdisk_size(void)
-{
-	u64 ramdisk_size = boot_params.hdr.ramdisk_size;
-
-	ramdisk_size |= (u64)boot_params.ext_ramdisk_size << 32;
-
-	return ramdisk_size;
-}
-
 #define MAX_MAP_CHUNK	(NR_FIX_BTMAPS << PAGE_SHIFT)
 static void __init relocate_initrd(void)
 {
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v2 09/18] x86: Make get_ramdisk_{image|size}() global.
@ 2013-08-01  7:06   ` Tang Chen
  0 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

In the following patches, we need to call get_ramdisk_{image|size}()
to get initrd file's address and size. So make these two functions
global.

v1 -> v2:
As tj suggested, make these two function static inline in
arch/x86/include/asm/setup.h.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 arch/x86/include/asm/setup.h |   21 +++++++++++++++++++++
 arch/x86/kernel/setup.c      |   18 ------------------
 2 files changed, 21 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index b7bf350..cfdb55d 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -106,6 +106,27 @@ void *extend_brk(size_t size, size_t align);
 	RESERVE_BRK(name, sizeof(type) * entries)
 
 extern void probe_roms(void);
+
+#ifdef CONFIG_BLK_DEV_INITRD
+static inline u64 __init get_ramdisk_image(void)
+{
+	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
+
+	ramdisk_image |= (u64)boot_params.ext_ramdisk_image << 32;
+
+	return ramdisk_image;
+}
+
+static inline u64 __init get_ramdisk_size(void)
+{
+	u64 ramdisk_size = boot_params.hdr.ramdisk_size;
+
+	ramdisk_size |= (u64)boot_params.ext_ramdisk_size << 32;
+
+	return ramdisk_size;
+}
+#endif /* CONFIG_BLK_DEV_INITRD */
+
 #ifdef __i386__
 
 void __init i386_start_kernel(void);
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 38a5952..c8f5d1a 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -296,24 +296,6 @@ static void __init reserve_brk(void)
 }
 
 #ifdef CONFIG_BLK_DEV_INITRD
-
-static u64 __init get_ramdisk_image(void)
-{
-	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
-
-	ramdisk_image |= (u64)boot_params.ext_ramdisk_image << 32;
-
-	return ramdisk_image;
-}
-static u64 __init get_ramdisk_size(void)
-{
-	u64 ramdisk_size = boot_params.hdr.ramdisk_size;
-
-	ramdisk_size |= (u64)boot_params.ext_ramdisk_size << 32;
-
-	return ramdisk_size;
-}
-
 #define MAX_MAP_CHUNK	(NR_FIX_BTMAPS << PAGE_SHIFT)
 static void __init relocate_initrd(void)
 {
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v2 10/18] x86, acpi: Try to find if SRAT is overrided earlier.
  2013-08-01  7:06 ` Tang Chen
@ 2013-08-01  7:06   ` Tang Chen
  -1 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

Linux cannot migrate pages used by the kernel due to the direct mapping
(va = pa + PAGE_OFFSET), any memory used by the kernel cannot be hot-removed.
So when using memory hotplug, we have to prevent the kernel from using
hotpluggable memory.

The ACPI table SRAT (System Resource Affinity Table) contains info to specify
which memory is hotpluggble. After SRAT is parsed, we are aware of which
memory is hotpluggable.

At the early time when system is booting, SRAT has not been parsed. The boot
memory allocator memblock will allocate any memory to the kernel. So we need
SRAT parsed before memblock starts to work.

In this patch, we are going to parse SRAT earlier, right after memblock is ready.

Generally speaking, tables such as SRAT are provided by firmware. But
ACPI_INITRD_TABLE_OVERRIDE functionality allows users to customize their own
tables in initrd, and override the ones from firmware. So if we want to parse
SRAT earlier, we also need to do SRAT override earlier.

First, we introduce early_acpi_override_srat() to check if SRAT will be overridden
from initrd.

Second, we introduce find_hotpluggable_memory() to reserve hotpluggable memory,
which will firstly call early_acpi_override_srat() to find out which memory is
hotpluggable in the override SRAT.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 arch/x86/kernel/setup.c        |   10 +++++++
 drivers/acpi/osl.c             |   58 ++++++++++++++++++++++++++++++++++++++++
 include/linux/acpi.h           |   14 ++++++++-
 include/linux/memory_hotplug.h |    2 +
 mm/memory_hotplug.c            |   25 ++++++++++++++++-
 5 files changed, 106 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index c8f5d1a..8b1bddd 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1060,6 +1060,16 @@ void __init setup_arch(char **cmdline_p)
 	/* Initialize ACPI root table */
 	acpi_root_table_init();
 
+#ifdef CONFIG_ACPI_NUMA
+	/*
+	 * Linux kernel cannot migrate kernel pages, as a result, memory used
+	 * by the kernel cannot be hot-removed. Find and mark hotpluggable
+	 * memory in memblock to prevent memblock from allocating hotpluggable
+	 * memory for the kernel.
+	 */
+	find_hotpluggable_memory();
+#endif
+
 	/*
 	 * The EFI specification says that boot service code won't be called
 	 * after ExitBootServices(). This is, in fact, a lie.
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 8df8a93..d0b687c 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -48,6 +48,7 @@
 
 #include <asm/io.h>
 #include <asm/uaccess.h>
+#include <asm/setup.h>
 
 #include <acpi/acpi.h>
 #include <acpi/acpi_bus.h>
@@ -631,6 +632,63 @@ int __init acpi_invalid_table(struct cpio_data *file,
 	return 0;
 }
 
+#ifdef CONFIG_ACPI_NUMA
+/*******************************************************************************
+ *
+ * FUNCTION:    early_acpi_override_srat
+ *
+ * RETURN:      Phys addr of SRAT on success, 0 on error.
+ *
+ * DESCRIPTION: Try to get the phys addr of SRAT in initrd.
+ *              The ACPI_INITRD_TABLE_OVERRIDE procedure is able to use tables
+ *              in initrd file to override the ones provided by firmware. This
+ *              function checks if there is a SRAT in initrd at early time. If
+ *              so, return the phys addr of the SRAT.
+ *
+ ******************************************************************************/
+phys_addr_t __init early_acpi_override_srat(void)
+{
+	int i;
+	u32 length;
+	long offset;
+	void *ramdisk_vaddr;
+	struct acpi_table_header *table;
+	struct cpio_data file;
+	unsigned long map_step = NR_FIX_BTMAPS << PAGE_SHIFT;
+	phys_addr_t ramdisk_image = get_ramdisk_image();
+	char cpio_path[32] = "kernel/firmware/acpi/";
+
+	/* Try to find if SRAT is overrided */
+	for (i = 0; i < ACPI_OVERRIDE_TABLES; i++) {
+		ramdisk_vaddr = early_ioremap(ramdisk_image, map_step);
+
+		file = find_cpio_data(cpio_path, ramdisk_vaddr,
+				      map_step, &offset);
+		if (!file.data) {
+			early_iounmap(ramdisk_vaddr, map_step);
+			return 0;
+		}
+
+		table = file.data;
+		length = table->length;
+
+		if (acpi_invalid_table(&file, cpio_path, ACPI_SIG_SRAT)) {
+			ramdisk_image += offset;
+			early_iounmap(ramdisk_vaddr, map_step);
+			continue;
+		}
+
+		/* Found SRAT */
+		early_iounmap(ramdisk_vaddr, map_step);
+		ramdisk_image = ramdisk_image + offset - length;
+
+		break;
+	}
+
+	return ramdisk_image;
+}
+#endif	/* CONFIG_ACPI_NUMA */
+
 void __init acpi_initrd_override(void *data, size_t size)
 {
 	int no, table_nr = 0, total_offset = 0;
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 95f600c..17155bc 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -81,11 +81,21 @@ typedef int (*acpi_tbl_entry_handler)(struct acpi_subtable_header *header,
 
 #ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
 void acpi_initrd_override(void *data, size_t size);
-#else
+
+#ifdef CONFIG_ACPI_NUMA
+phys_addr_t early_acpi_override_srat(void);
+#endif	/* CONFIG_ACPI_NUMA */
+
+#else	/* CONFIG_ACPI_INITRD_TABLE_OVERRIDE */
 static inline void acpi_initrd_override(void *data, size_t size)
 {
 }
-#endif
+
+static inline phys_addr_t early_acpi_override_srat(void)
+{
+	return 0;
+}
+#endif	/* CONFIG_ACPI_INITRD_TABLE_OVERRIDE */
 
 char * __acpi_map_table (unsigned long phys_addr, unsigned long size);
 void __acpi_unmap_table(char *map, unsigned long size);
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 3e622c6..c32af49 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -104,6 +104,7 @@ extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
 /* reasonably generic interface to expand the physical pages in a zone  */
 extern int __add_pages(int nid, struct zone *zone, unsigned long start_pfn,
 	unsigned long nr_pages);
+extern void find_hotpluggable_memory(void);
 
 #ifdef CONFIG_NUMA
 extern int memory_add_physaddr_to_nid(u64 start);
@@ -181,6 +182,7 @@ static inline void register_page_bootmem_info_node(struct pglist_data *pgdat)
 {
 }
 #endif
+
 extern void put_page_bootmem(struct page *page);
 extern void get_page_bootmem(unsigned long ingo, struct page *page,
 			     unsigned long type);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 1ad92b4..48964bf 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -30,6 +30,7 @@
 #include <linux/mm_inline.h>
 #include <linux/firmware-map.h>
 #include <linux/stop_machine.h>
+#include <linux/acpi.h>
 
 #include <asm/tlbflush.h>
 
@@ -62,7 +63,6 @@ void unlock_memory_hotplug(void)
 	mutex_unlock(&mem_hotplug_mutex);
 }
 
-
 /* add this memory to iomem resource */
 static struct resource *register_memory_resource(u64 start, u64 size)
 {
@@ -91,6 +91,29 @@ static void release_memory_resource(struct resource *res)
 	return;
 }
 
+#ifdef CONFIG_ACPI_NUMA
+/**
+ * find_hotpluggable_memory - Find out hotpluggable memory from ACPI SRAT.
+ *
+ * This function did the following:
+ * 1. Try to find if there is a SRAT in initrd file used to override the one
+ *    provided by firmware. If so, get its phys addr.
+ * 2. If there is no override SRAT, get the phys addr of the SRAT in firmware.
+ * 3. Parse SRAT, find out which memory is hotpluggable.
+ */
+void __init find_hotpluggable_memory(void)
+{
+	phys_addr_t srat_paddr;
+
+	/* Try to find if SRAT is overridden */
+	srat_paddr = early_acpi_override_srat();
+	if (!srat_paddr)
+		return;
+
+	/* Will parse SRAT and find out hotpluggable memory here */
+}
+#endif	/* CONFIG_ACPI_NUMA */
+
 #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
 void get_page_bootmem(unsigned long info,  struct page *page,
 		      unsigned long type)
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v2 10/18] x86, acpi: Try to find if SRAT is overrided earlier.
@ 2013-08-01  7:06   ` Tang Chen
  0 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

Linux cannot migrate pages used by the kernel due to the direct mapping
(va = pa + PAGE_OFFSET), any memory used by the kernel cannot be hot-removed.
So when using memory hotplug, we have to prevent the kernel from using
hotpluggable memory.

The ACPI table SRAT (System Resource Affinity Table) contains info to specify
which memory is hotpluggble. After SRAT is parsed, we are aware of which
memory is hotpluggable.

At the early time when system is booting, SRAT has not been parsed. The boot
memory allocator memblock will allocate any memory to the kernel. So we need
SRAT parsed before memblock starts to work.

In this patch, we are going to parse SRAT earlier, right after memblock is ready.

Generally speaking, tables such as SRAT are provided by firmware. But
ACPI_INITRD_TABLE_OVERRIDE functionality allows users to customize their own
tables in initrd, and override the ones from firmware. So if we want to parse
SRAT earlier, we also need to do SRAT override earlier.

First, we introduce early_acpi_override_srat() to check if SRAT will be overridden
from initrd.

Second, we introduce find_hotpluggable_memory() to reserve hotpluggable memory,
which will firstly call early_acpi_override_srat() to find out which memory is
hotpluggable in the override SRAT.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 arch/x86/kernel/setup.c        |   10 +++++++
 drivers/acpi/osl.c             |   58 ++++++++++++++++++++++++++++++++++++++++
 include/linux/acpi.h           |   14 ++++++++-
 include/linux/memory_hotplug.h |    2 +
 mm/memory_hotplug.c            |   25 ++++++++++++++++-
 5 files changed, 106 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index c8f5d1a..8b1bddd 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1060,6 +1060,16 @@ void __init setup_arch(char **cmdline_p)
 	/* Initialize ACPI root table */
 	acpi_root_table_init();
 
+#ifdef CONFIG_ACPI_NUMA
+	/*
+	 * Linux kernel cannot migrate kernel pages, as a result, memory used
+	 * by the kernel cannot be hot-removed. Find and mark hotpluggable
+	 * memory in memblock to prevent memblock from allocating hotpluggable
+	 * memory for the kernel.
+	 */
+	find_hotpluggable_memory();
+#endif
+
 	/*
 	 * The EFI specification says that boot service code won't be called
 	 * after ExitBootServices(). This is, in fact, a lie.
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 8df8a93..d0b687c 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -48,6 +48,7 @@
 
 #include <asm/io.h>
 #include <asm/uaccess.h>
+#include <asm/setup.h>
 
 #include <acpi/acpi.h>
 #include <acpi/acpi_bus.h>
@@ -631,6 +632,63 @@ int __init acpi_invalid_table(struct cpio_data *file,
 	return 0;
 }
 
+#ifdef CONFIG_ACPI_NUMA
+/*******************************************************************************
+ *
+ * FUNCTION:    early_acpi_override_srat
+ *
+ * RETURN:      Phys addr of SRAT on success, 0 on error.
+ *
+ * DESCRIPTION: Try to get the phys addr of SRAT in initrd.
+ *              The ACPI_INITRD_TABLE_OVERRIDE procedure is able to use tables
+ *              in initrd file to override the ones provided by firmware. This
+ *              function checks if there is a SRAT in initrd at early time. If
+ *              so, return the phys addr of the SRAT.
+ *
+ ******************************************************************************/
+phys_addr_t __init early_acpi_override_srat(void)
+{
+	int i;
+	u32 length;
+	long offset;
+	void *ramdisk_vaddr;
+	struct acpi_table_header *table;
+	struct cpio_data file;
+	unsigned long map_step = NR_FIX_BTMAPS << PAGE_SHIFT;
+	phys_addr_t ramdisk_image = get_ramdisk_image();
+	char cpio_path[32] = "kernel/firmware/acpi/";
+
+	/* Try to find if SRAT is overrided */
+	for (i = 0; i < ACPI_OVERRIDE_TABLES; i++) {
+		ramdisk_vaddr = early_ioremap(ramdisk_image, map_step);
+
+		file = find_cpio_data(cpio_path, ramdisk_vaddr,
+				      map_step, &offset);
+		if (!file.data) {
+			early_iounmap(ramdisk_vaddr, map_step);
+			return 0;
+		}
+
+		table = file.data;
+		length = table->length;
+
+		if (acpi_invalid_table(&file, cpio_path, ACPI_SIG_SRAT)) {
+			ramdisk_image += offset;
+			early_iounmap(ramdisk_vaddr, map_step);
+			continue;
+		}
+
+		/* Found SRAT */
+		early_iounmap(ramdisk_vaddr, map_step);
+		ramdisk_image = ramdisk_image + offset - length;
+
+		break;
+	}
+
+	return ramdisk_image;
+}
+#endif	/* CONFIG_ACPI_NUMA */
+
 void __init acpi_initrd_override(void *data, size_t size)
 {
 	int no, table_nr = 0, total_offset = 0;
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 95f600c..17155bc 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -81,11 +81,21 @@ typedef int (*acpi_tbl_entry_handler)(struct acpi_subtable_header *header,
 
 #ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
 void acpi_initrd_override(void *data, size_t size);
-#else
+
+#ifdef CONFIG_ACPI_NUMA
+phys_addr_t early_acpi_override_srat(void);
+#endif	/* CONFIG_ACPI_NUMA */
+
+#else	/* CONFIG_ACPI_INITRD_TABLE_OVERRIDE */
 static inline void acpi_initrd_override(void *data, size_t size)
 {
 }
-#endif
+
+static inline phys_addr_t early_acpi_override_srat(void)
+{
+	return 0;
+}
+#endif	/* CONFIG_ACPI_INITRD_TABLE_OVERRIDE */
 
 char * __acpi_map_table (unsigned long phys_addr, unsigned long size);
 void __acpi_unmap_table(char *map, unsigned long size);
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 3e622c6..c32af49 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -104,6 +104,7 @@ extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
 /* reasonably generic interface to expand the physical pages in a zone  */
 extern int __add_pages(int nid, struct zone *zone, unsigned long start_pfn,
 	unsigned long nr_pages);
+extern void find_hotpluggable_memory(void);
 
 #ifdef CONFIG_NUMA
 extern int memory_add_physaddr_to_nid(u64 start);
@@ -181,6 +182,7 @@ static inline void register_page_bootmem_info_node(struct pglist_data *pgdat)
 {
 }
 #endif
+
 extern void put_page_bootmem(struct page *page);
 extern void get_page_bootmem(unsigned long ingo, struct page *page,
 			     unsigned long type);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 1ad92b4..48964bf 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -30,6 +30,7 @@
 #include <linux/mm_inline.h>
 #include <linux/firmware-map.h>
 #include <linux/stop_machine.h>
+#include <linux/acpi.h>
 
 #include <asm/tlbflush.h>
 
@@ -62,7 +63,6 @@ void unlock_memory_hotplug(void)
 	mutex_unlock(&mem_hotplug_mutex);
 }
 
-
 /* add this memory to iomem resource */
 static struct resource *register_memory_resource(u64 start, u64 size)
 {
@@ -91,6 +91,29 @@ static void release_memory_resource(struct resource *res)
 	return;
 }
 
+#ifdef CONFIG_ACPI_NUMA
+/**
+ * find_hotpluggable_memory - Find out hotpluggable memory from ACPI SRAT.
+ *
+ * This function did the following:
+ * 1. Try to find if there is a SRAT in initrd file used to override the one
+ *    provided by firmware. If so, get its phys addr.
+ * 2. If there is no override SRAT, get the phys addr of the SRAT in firmware.
+ * 3. Parse SRAT, find out which memory is hotpluggable.
+ */
+void __init find_hotpluggable_memory(void)
+{
+	phys_addr_t srat_paddr;
+
+	/* Try to find if SRAT is overridden */
+	srat_paddr = early_acpi_override_srat();
+	if (!srat_paddr)
+		return;
+
+	/* Will parse SRAT and find out hotpluggable memory here */
+}
+#endif	/* CONFIG_ACPI_NUMA */
+
 #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
 void get_page_bootmem(unsigned long info,  struct page *page,
 		      unsigned long type)
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v2 11/18] x86, acpi: Try to find SRAT in firmware earlier.
  2013-08-01  7:06 ` Tang Chen
@ 2013-08-01  7:06   ` Tang Chen
  -1 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

This patch introduce early_acpi_firmware_srat() to find the
phys addr of SRAT provided by firmware. And call it in
find_hotpluggable_memory().

Since we have initialized acpi_gbl_root_table_list earlier,
and store all the tables' phys addrs and signatures in it,
it is easy to find the SRAT.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 drivers/acpi/acpica/tbxface.c |   32 ++++++++++++++++++++++++++++++++
 drivers/acpi/osl.c            |   22 ++++++++++++++++++++++
 include/acpi/acpixf.h         |    4 ++++
 include/linux/acpi.h          |    4 ++++
 mm/memory_hotplug.c           |    8 ++++++--
 5 files changed, 68 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/acpica/tbxface.c b/drivers/acpi/acpica/tbxface.c
index ad11162..6a92f12 100644
--- a/drivers/acpi/acpica/tbxface.c
+++ b/drivers/acpi/acpica/tbxface.c
@@ -181,6 +181,38 @@ acpi_status acpi_reallocate_root_table(void)
 	return_ACPI_STATUS(status);
 }
 
+/**
+ * acpi_get_table_desc - Get the acpi table descriptor of a specific table.
+ * @signature: The signature of the table to be found.
+ * @out_desc: The out returned descriptor.
+ *
+ * Iterate over acpi_gbl_root_table_list to find a specific table and then
+ * return its phys addr.
+ *
+ * NOTE: The caller has the responsibility to allocate memory for @out_desc.
+ *
+ * Return AE_OK on success, AE_NOT_FOUND if the table is not found.
+ */
+acpi_status acpi_get_table_desc(char *signature,
+				struct acpi_table_desc *out_desc)
+{
+	struct acpi_table_desc *desc;
+	int pos, count = acpi_gbl_root_table_list.current_table_count;
+
+	for (pos = 0; pos < count; pos++) {
+		desc = &acpi_gbl_root_table_list.tables[pos];
+
+		if (!ACPI_COMPARE_NAME(&desc->signature, signature))
+			continue;
+
+		memcpy(out_desc, desc, sizeof(struct acpi_table_desc));
+
+		return_ACPI_STATUS(AE_OK);
+	}
+
+	return_ACPI_STATUS(AE_NOT_FOUND);
+}
+
 /*******************************************************************************
  *
  * FUNCTION:    acpi_get_table_header
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index d0b687c..319a274 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -53,6 +53,7 @@
 #include <acpi/acpi.h>
 #include <acpi/acpi_bus.h>
 #include <acpi/processor.h>
+#include <acpi/acpixf.h>
 
 #define _COMPONENT		ACPI_OS_SERVICES
 ACPI_MODULE_NAME("osl");
@@ -757,6 +758,27 @@ void __init acpi_initrd_override(void *data, size_t size)
 }
 #endif /* CONFIG_ACPI_INITRD_TABLE_OVERRIDE */
 
+#ifdef CONFIG_ACPI_NUMA
+/*******************************************************************************
+ *
+ * FUNCTION:    early_acpi_firmware_srat
+ *
+ * RETURN:      Phys addr of SRAT on success, 0 on error.
+ *
+ * DESCRIPTION: Get the phys addr of SRAT provided by firmware.
+ *
+ ******************************************************************************/
+phys_addr_t __init early_acpi_firmware_srat(void)
+{
+	struct acpi_table_desc table_desc;
+
+	if (acpi_get_table_desc(ACPI_SIG_SRAT, &table_desc))
+		return 0;
+
+	return table_desc.address;
+}
+#endif	/* CONFIG_ACPI_NUMA */
+
 static void acpi_table_taint(struct acpi_table_header *table)
 {
 	pr_warn(PREFIX
diff --git a/include/acpi/acpixf.h b/include/acpi/acpixf.h
index f5549b5..1d94f89 100644
--- a/include/acpi/acpixf.h
+++ b/include/acpi/acpixf.h
@@ -184,6 +184,10 @@ acpi_status acpi_find_root_pointer(acpi_size *rsdp_address);
 acpi_status acpi_unload_table_id(acpi_owner_id id);
 
 acpi_status
+acpi_get_table_desc(char *signature,
+		    struct acpi_table_desc *out_desc);
+
+acpi_status
 acpi_get_table_header(acpi_string signature,
 		      u32 instance, struct acpi_table_header *out_table_header);
 
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 17155bc..6fa7543 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -97,6 +97,10 @@ static inline phys_addr_t early_acpi_override_srat(void)
 }
 #endif	/* CONFIG_ACPI_INITRD_TABLE_OVERRIDE */
 
+#ifdef CONFIG_ACPI_NUMA
+phys_addr_t early_acpi_firmware_srat(void);
+#endif  /* CONFIG_ACPI_NUMA */
+
 char * __acpi_map_table (unsigned long phys_addr, unsigned long size);
 void __acpi_unmap_table(char *map, unsigned long size);
 int early_acpi_boot_init(void);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 48964bf..4ccffe6 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -107,8 +107,12 @@ void __init find_hotpluggable_memory(void)
 
 	/* Try to find if SRAT is overridden */
 	srat_paddr = early_acpi_override_srat();
-	if (!srat_paddr)
-		return;
+	if (!srat_paddr) {
+		/* Try to find SRAT from firmware if it wasn't overridden */
+		srat_paddr = early_acpi_firmware_srat();
+		if (!srat_paddr)
+			return;
+	}
 
 	/* Will parse SRAT and find out hotpluggable memory here */
 }
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v2 11/18] x86, acpi: Try to find SRAT in firmware earlier.
@ 2013-08-01  7:06   ` Tang Chen
  0 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

This patch introduce early_acpi_firmware_srat() to find the
phys addr of SRAT provided by firmware. And call it in
find_hotpluggable_memory().

Since we have initialized acpi_gbl_root_table_list earlier,
and store all the tables' phys addrs and signatures in it,
it is easy to find the SRAT.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 drivers/acpi/acpica/tbxface.c |   32 ++++++++++++++++++++++++++++++++
 drivers/acpi/osl.c            |   22 ++++++++++++++++++++++
 include/acpi/acpixf.h         |    4 ++++
 include/linux/acpi.h          |    4 ++++
 mm/memory_hotplug.c           |    8 ++++++--
 5 files changed, 68 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/acpica/tbxface.c b/drivers/acpi/acpica/tbxface.c
index ad11162..6a92f12 100644
--- a/drivers/acpi/acpica/tbxface.c
+++ b/drivers/acpi/acpica/tbxface.c
@@ -181,6 +181,38 @@ acpi_status acpi_reallocate_root_table(void)
 	return_ACPI_STATUS(status);
 }
 
+/**
+ * acpi_get_table_desc - Get the acpi table descriptor of a specific table.
+ * @signature: The signature of the table to be found.
+ * @out_desc: The out returned descriptor.
+ *
+ * Iterate over acpi_gbl_root_table_list to find a specific table and then
+ * return its phys addr.
+ *
+ * NOTE: The caller has the responsibility to allocate memory for @out_desc.
+ *
+ * Return AE_OK on success, AE_NOT_FOUND if the table is not found.
+ */
+acpi_status acpi_get_table_desc(char *signature,
+				struct acpi_table_desc *out_desc)
+{
+	struct acpi_table_desc *desc;
+	int pos, count = acpi_gbl_root_table_list.current_table_count;
+
+	for (pos = 0; pos < count; pos++) {
+		desc = &acpi_gbl_root_table_list.tables[pos];
+
+		if (!ACPI_COMPARE_NAME(&desc->signature, signature))
+			continue;
+
+		memcpy(out_desc, desc, sizeof(struct acpi_table_desc));
+
+		return_ACPI_STATUS(AE_OK);
+	}
+
+	return_ACPI_STATUS(AE_NOT_FOUND);
+}
+
 /*******************************************************************************
  *
  * FUNCTION:    acpi_get_table_header
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index d0b687c..319a274 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -53,6 +53,7 @@
 #include <acpi/acpi.h>
 #include <acpi/acpi_bus.h>
 #include <acpi/processor.h>
+#include <acpi/acpixf.h>
 
 #define _COMPONENT		ACPI_OS_SERVICES
 ACPI_MODULE_NAME("osl");
@@ -757,6 +758,27 @@ void __init acpi_initrd_override(void *data, size_t size)
 }
 #endif /* CONFIG_ACPI_INITRD_TABLE_OVERRIDE */
 
+#ifdef CONFIG_ACPI_NUMA
+/*******************************************************************************
+ *
+ * FUNCTION:    early_acpi_firmware_srat
+ *
+ * RETURN:      Phys addr of SRAT on success, 0 on error.
+ *
+ * DESCRIPTION: Get the phys addr of SRAT provided by firmware.
+ *
+ ******************************************************************************/
+phys_addr_t __init early_acpi_firmware_srat(void)
+{
+	struct acpi_table_desc table_desc;
+
+	if (acpi_get_table_desc(ACPI_SIG_SRAT, &table_desc))
+		return 0;
+
+	return table_desc.address;
+}
+#endif	/* CONFIG_ACPI_NUMA */
+
 static void acpi_table_taint(struct acpi_table_header *table)
 {
 	pr_warn(PREFIX
diff --git a/include/acpi/acpixf.h b/include/acpi/acpixf.h
index f5549b5..1d94f89 100644
--- a/include/acpi/acpixf.h
+++ b/include/acpi/acpixf.h
@@ -184,6 +184,10 @@ acpi_status acpi_find_root_pointer(acpi_size *rsdp_address);
 acpi_status acpi_unload_table_id(acpi_owner_id id);
 
 acpi_status
+acpi_get_table_desc(char *signature,
+		    struct acpi_table_desc *out_desc);
+
+acpi_status
 acpi_get_table_header(acpi_string signature,
 		      u32 instance, struct acpi_table_header *out_table_header);
 
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 17155bc..6fa7543 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -97,6 +97,10 @@ static inline phys_addr_t early_acpi_override_srat(void)
 }
 #endif	/* CONFIG_ACPI_INITRD_TABLE_OVERRIDE */
 
+#ifdef CONFIG_ACPI_NUMA
+phys_addr_t early_acpi_firmware_srat(void);
+#endif  /* CONFIG_ACPI_NUMA */
+
 char * __acpi_map_table (unsigned long phys_addr, unsigned long size);
 void __acpi_unmap_table(char *map, unsigned long size);
 int early_acpi_boot_init(void);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 48964bf..4ccffe6 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -107,8 +107,12 @@ void __init find_hotpluggable_memory(void)
 
 	/* Try to find if SRAT is overridden */
 	srat_paddr = early_acpi_override_srat();
-	if (!srat_paddr)
-		return;
+	if (!srat_paddr) {
+		/* Try to find SRAT from firmware if it wasn't overridden */
+		srat_paddr = early_acpi_firmware_srat();
+		if (!srat_paddr)
+			return;
+	}
 
 	/* Will parse SRAT and find out hotpluggable memory here */
 }
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v2 12/18] x86, acpi, numa, mem_hotplug: Find hotpluggable memory in SRAT memory affinities.
  2013-08-01  7:06 ` Tang Chen
@ 2013-08-01  7:06   ` Tang Chen
  -1 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

In ACPI SRAT(System Resource Affinity Table), there is a memory affinity for each
memory range in the system. In each memory affinity, there is a field indicating
that if the memory range is hotpluggable.

This patch parses all the memory affinities in SRAT only, and find out all the
hotpluggable memory ranges in the system.

This patch doesn't mark hotpluggable memory in memblock. Memory marked as hotplug
won't be allocated to the kernel. If all the memory in the system is hotpluggable,
then the system won't have enough memory to boot. The basic idea to solve this
problem is making the nodes the kerenl resides in unhotpluggable. So, before we do
this, we don't mark any hotpluggable memory in memory so that to keep memblock
working as before.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 drivers/acpi/osl.c   |   85 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/acpi.h |    2 +
 mm/memory_hotplug.c  |   22 ++++++++++++-
 3 files changed, 107 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 319a274..5063574 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -777,6 +777,91 @@ phys_addr_t __init early_acpi_firmware_srat(void)
 
 	return table_desc.address;
 }
+
+/*******************************************************************************
+ *
+ * FUNCTION:    acpi_hotplug_mem_affinity
+ *
+ * PARAMETERS:  Srat_vaddr         - Virt addr of SRAT
+ *              Base               - The base address of the found hotpluggable
+ *                                   memory region
+ *              Size               - The size of the found hotpluggable memory
+ *                                   region
+ *              Offset             - Offset of the found memory affinity
+ *
+ * RETURN:      Status
+ *
+ * DESCRIPTION: This function iterates SRAT affinities list to find memory
+ *              affinities with hotpluggable memory one by one. Return the
+ *              offset of the found memory affinity through @offset. @offset
+ *              can be used to iterate the SRAT affinities list to find all the
+ *              hotpluggable memory affinities. If @offset is 0, it is the first
+ *              time of the iteration.
+ *
+ ******************************************************************************/
+acpi_status __init
+acpi_hotplug_mem_affinity(void *srat_vaddr, u64 *base, u64 *size,
+			  unsigned long *offset)
+{
+	struct acpi_table_header *table_header;
+	struct acpi_subtable_header *entry;
+	struct acpi_srat_mem_affinity *ma;
+	unsigned long table_end, curr;
+
+	if (!offset)
+		return_ACPI_STATUS(AE_BAD_PARAMETER);
+
+	table_header = (struct acpi_table_header *)srat_vaddr;
+	table_end = (unsigned long)table_header + table_header->length;
+
+	entry = (struct acpi_subtable_header *)
+		((unsigned long)table_header + *offset);
+
+	if (*offset) {
+		/*
+		 * @offset is the offset of the last affinity found in the
+		 * last call. So need to move to the next affinity.
+		 */
+		entry = (struct acpi_subtable_header *)
+			((unsigned long)entry + entry->length);
+	} else {
+		/*
+		 * Offset of the first affinity is the size of SRAT
+		 * table header.
+		 */
+		entry = (struct acpi_subtable_header *)
+			((unsigned long)entry + sizeof(struct acpi_table_srat));
+	}
+
+	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) <
+	       table_end) {
+		if (entry->length == 0)
+			break;
+
+		if (entry->type != ACPI_SRAT_TYPE_MEMORY_AFFINITY)
+			goto next;
+
+		ma = (struct acpi_srat_mem_affinity *)entry;
+
+		if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE))
+			goto next;
+
+		if (base)
+			*base = ma->base_address;
+
+		if (size)
+			*size = ma->length;
+
+		*offset = (unsigned long)entry - (unsigned long)srat_vaddr;
+		return_ACPI_STATUS(AE_OK);
+
+next:
+		entry = (struct acpi_subtable_header *)
+			((unsigned long)entry + entry->length);
+	}
+
+	return_ACPI_STATUS(AE_NOT_FOUND);
+}
 #endif	/* CONFIG_ACPI_NUMA */
 
 static void acpi_table_taint(struct acpi_table_header *table)
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 6fa7543..06f6e15 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -99,6 +99,8 @@ static inline phys_addr_t early_acpi_override_srat(void)
 
 #ifdef CONFIG_ACPI_NUMA
 phys_addr_t early_acpi_firmware_srat(void);
+acpi_status acpi_hotplug_mem_affinity(void *srat_vaddr, u64 *base,
+				      u64 *size, unsigned long *offset);
 #endif  /* CONFIG_ACPI_NUMA */
 
 char * __acpi_map_table (unsigned long phys_addr, unsigned long size);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 4ccffe6..326e2f2 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -103,7 +103,11 @@ static void release_memory_resource(struct resource *res)
  */
 void __init find_hotpluggable_memory(void)
 {
-	phys_addr_t srat_paddr;
+	void *srat_vaddr;
+	phys_addr_t srat_paddr, base, size;
+	u32 length;
+	struct acpi_table_header *srat_header;
+	unsigned long offset = 0;
 
 	/* Try to find if SRAT is overridden */
 	srat_paddr = early_acpi_override_srat();
@@ -114,7 +118,21 @@ void __init find_hotpluggable_memory(void)
 			return;
 	}
 
-	/* Will parse SRAT and find out hotpluggable memory here */
+	/* Get the length of SRAT */
+	srat_header = early_ioremap(srat_paddr,
+				    sizeof(struct acpi_table_header));
+	length = srat_header->length;
+	early_iounmap(srat_header, sizeof(struct acpi_table_header));
+
+	/* Find all the hotpluggable memory regions */
+	srat_vaddr = early_ioremap(srat_paddr, length);
+
+	while (ACPI_SUCCESS(acpi_hotplug_mem_affinity(srat_vaddr, &base,
+						      &size, &offset))) {
+		/* Will mark hotpluggable memory regions here */
+	}
+
+	early_iounmap(srat_vaddr, length);
 }
 #endif	/* CONFIG_ACPI_NUMA */
 
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v2 12/18] x86, acpi, numa, mem_hotplug: Find hotpluggable memory in SRAT memory affinities.
@ 2013-08-01  7:06   ` Tang Chen
  0 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

In ACPI SRAT(System Resource Affinity Table), there is a memory affinity for each
memory range in the system. In each memory affinity, there is a field indicating
that if the memory range is hotpluggable.

This patch parses all the memory affinities in SRAT only, and find out all the
hotpluggable memory ranges in the system.

This patch doesn't mark hotpluggable memory in memblock. Memory marked as hotplug
won't be allocated to the kernel. If all the memory in the system is hotpluggable,
then the system won't have enough memory to boot. The basic idea to solve this
problem is making the nodes the kerenl resides in unhotpluggable. So, before we do
this, we don't mark any hotpluggable memory in memory so that to keep memblock
working as before.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 drivers/acpi/osl.c   |   85 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/acpi.h |    2 +
 mm/memory_hotplug.c  |   22 ++++++++++++-
 3 files changed, 107 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 319a274..5063574 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -777,6 +777,91 @@ phys_addr_t __init early_acpi_firmware_srat(void)
 
 	return table_desc.address;
 }
+
+/*******************************************************************************
+ *
+ * FUNCTION:    acpi_hotplug_mem_affinity
+ *
+ * PARAMETERS:  Srat_vaddr         - Virt addr of SRAT
+ *              Base               - The base address of the found hotpluggable
+ *                                   memory region
+ *              Size               - The size of the found hotpluggable memory
+ *                                   region
+ *              Offset             - Offset of the found memory affinity
+ *
+ * RETURN:      Status
+ *
+ * DESCRIPTION: This function iterates SRAT affinities list to find memory
+ *              affinities with hotpluggable memory one by one. Return the
+ *              offset of the found memory affinity through @offset. @offset
+ *              can be used to iterate the SRAT affinities list to find all the
+ *              hotpluggable memory affinities. If @offset is 0, it is the first
+ *              time of the iteration.
+ *
+ ******************************************************************************/
+acpi_status __init
+acpi_hotplug_mem_affinity(void *srat_vaddr, u64 *base, u64 *size,
+			  unsigned long *offset)
+{
+	struct acpi_table_header *table_header;
+	struct acpi_subtable_header *entry;
+	struct acpi_srat_mem_affinity *ma;
+	unsigned long table_end, curr;
+
+	if (!offset)
+		return_ACPI_STATUS(AE_BAD_PARAMETER);
+
+	table_header = (struct acpi_table_header *)srat_vaddr;
+	table_end = (unsigned long)table_header + table_header->length;
+
+	entry = (struct acpi_subtable_header *)
+		((unsigned long)table_header + *offset);
+
+	if (*offset) {
+		/*
+		 * @offset is the offset of the last affinity found in the
+		 * last call. So need to move to the next affinity.
+		 */
+		entry = (struct acpi_subtable_header *)
+			((unsigned long)entry + entry->length);
+	} else {
+		/*
+		 * Offset of the first affinity is the size of SRAT
+		 * table header.
+		 */
+		entry = (struct acpi_subtable_header *)
+			((unsigned long)entry + sizeof(struct acpi_table_srat));
+	}
+
+	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) <
+	       table_end) {
+		if (entry->length == 0)
+			break;
+
+		if (entry->type != ACPI_SRAT_TYPE_MEMORY_AFFINITY)
+			goto next;
+
+		ma = (struct acpi_srat_mem_affinity *)entry;
+
+		if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE))
+			goto next;
+
+		if (base)
+			*base = ma->base_address;
+
+		if (size)
+			*size = ma->length;
+
+		*offset = (unsigned long)entry - (unsigned long)srat_vaddr;
+		return_ACPI_STATUS(AE_OK);
+
+next:
+		entry = (struct acpi_subtable_header *)
+			((unsigned long)entry + entry->length);
+	}
+
+	return_ACPI_STATUS(AE_NOT_FOUND);
+}
 #endif	/* CONFIG_ACPI_NUMA */
 
 static void acpi_table_taint(struct acpi_table_header *table)
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 6fa7543..06f6e15 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -99,6 +99,8 @@ static inline phys_addr_t early_acpi_override_srat(void)
 
 #ifdef CONFIG_ACPI_NUMA
 phys_addr_t early_acpi_firmware_srat(void);
+acpi_status acpi_hotplug_mem_affinity(void *srat_vaddr, u64 *base,
+				      u64 *size, unsigned long *offset);
 #endif  /* CONFIG_ACPI_NUMA */
 
 char * __acpi_map_table (unsigned long phys_addr, unsigned long size);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 4ccffe6..326e2f2 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -103,7 +103,11 @@ static void release_memory_resource(struct resource *res)
  */
 void __init find_hotpluggable_memory(void)
 {
-	phys_addr_t srat_paddr;
+	void *srat_vaddr;
+	phys_addr_t srat_paddr, base, size;
+	u32 length;
+	struct acpi_table_header *srat_header;
+	unsigned long offset = 0;
 
 	/* Try to find if SRAT is overridden */
 	srat_paddr = early_acpi_override_srat();
@@ -114,7 +118,21 @@ void __init find_hotpluggable_memory(void)
 			return;
 	}
 
-	/* Will parse SRAT and find out hotpluggable memory here */
+	/* Get the length of SRAT */
+	srat_header = early_ioremap(srat_paddr,
+				    sizeof(struct acpi_table_header));
+	length = srat_header->length;
+	early_iounmap(srat_header, sizeof(struct acpi_table_header));
+
+	/* Find all the hotpluggable memory regions */
+	srat_vaddr = early_ioremap(srat_paddr, length);
+
+	while (ACPI_SUCCESS(acpi_hotplug_mem_affinity(srat_vaddr, &base,
+						      &size, &offset))) {
+		/* Will mark hotpluggable memory regions here */
+	}
+
+	early_iounmap(srat_vaddr, length);
 }
 #endif	/* CONFIG_ACPI_NUMA */
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v2 13/18] x86, numa, mem_hotplug: Skip all the regions the kernel resides in.
  2013-08-01  7:06 ` Tang Chen
@ 2013-08-01  7:06   ` Tang Chen
  -1 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

At early time, memblock will reserve some memory for the kernel,
such as the kernel code and data segments, initrd file, and so on,
which means the kernel resides in these memory regions.

Even if these memory regions are hotpluggable, we should not
mark them as hotpluggable. Otherwise the kernel won't have enough
memory to boot.

This patch finds out which memory regions the kernel resides in,
and skip them when finding all hotpluggable memory regions.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 mm/memory_hotplug.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 45 insertions(+), 0 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 326e2f2..b800c9c 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -31,6 +31,7 @@
 #include <linux/firmware-map.h>
 #include <linux/stop_machine.h>
 #include <linux/acpi.h>
+#include <linux/memblock.h>
 
 #include <asm/tlbflush.h>
 
@@ -93,6 +94,40 @@ static void release_memory_resource(struct resource *res)
 
 #ifdef CONFIG_ACPI_NUMA
 /**
+ * kernel_resides_in_range - Check if kernel resides in a memory region.
+ * @base: The base address of the memory region.
+ * @length: The length of the memory region.
+ *
+ * This function is used at early time. It iterates memblock.reserved and check
+ * if the kernel has used any memory in [@base, @base + @length).
+ *
+ * Return true if the kernel resides in the memory region, false otherwise.
+ */
+static bool __init kernel_resides_in_region(phys_addr_t base, u64 length)
+{
+	int i;
+	phys_addr_t start, end;
+	struct memblock_region *region;
+	struct memblock_type *reserved = &memblock.reserved;
+
+	for (i = 0; i < reserved->cnt; i++) {
+		region = &reserved->regions[i];
+
+		if (region->flags != MEMBLOCK_HOTPLUG)
+			continue;
+
+		start = region->base;
+		end = region->base + region->size;
+		if (end <= base || start >= base + length)
+			continue;
+
+		return true;
+	}
+
+	return false;
+}
+
+/**
  * find_hotpluggable_memory - Find out hotpluggable memory from ACPI SRAT.
  *
  * This function did the following:
@@ -129,6 +164,16 @@ void __init find_hotpluggable_memory(void)
 
 	while (ACPI_SUCCESS(acpi_hotplug_mem_affinity(srat_vaddr, &base,
 						      &size, &offset))) {
+		/*
+		 * At early time, memblock will reserve some memory for the
+		 * kernel, such as the kernel code and data segments, initrd
+		 * file, and so on,which means the kernel resides in these
+		 * memory regions. These regions should not be hotpluggable.
+		 * So do not mark them as hotpluggable.
+		 */
+		if (kernel_resides_in_region(base, size))
+			continue;
+
 		/* Will mark hotpluggable memory regions here */
 	}
 
-- 
1.7.1


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v2 13/18] x86, numa, mem_hotplug: Skip all the regions the kernel resides in.
@ 2013-08-01  7:06   ` Tang Chen
  0 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

At early time, memblock will reserve some memory for the kernel,
such as the kernel code and data segments, initrd file, and so on,
which means the kernel resides in these memory regions.

Even if these memory regions are hotpluggable, we should not
mark them as hotpluggable. Otherwise the kernel won't have enough
memory to boot.

This patch finds out which memory regions the kernel resides in,
and skip them when finding all hotpluggable memory regions.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 mm/memory_hotplug.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 45 insertions(+), 0 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 326e2f2..b800c9c 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -31,6 +31,7 @@
 #include <linux/firmware-map.h>
 #include <linux/stop_machine.h>
 #include <linux/acpi.h>
+#include <linux/memblock.h>
 
 #include <asm/tlbflush.h>
 
@@ -93,6 +94,40 @@ static void release_memory_resource(struct resource *res)
 
 #ifdef CONFIG_ACPI_NUMA
 /**
+ * kernel_resides_in_range - Check if kernel resides in a memory region.
+ * @base: The base address of the memory region.
+ * @length: The length of the memory region.
+ *
+ * This function is used at early time. It iterates memblock.reserved and check
+ * if the kernel has used any memory in [@base, @base + @length).
+ *
+ * Return true if the kernel resides in the memory region, false otherwise.
+ */
+static bool __init kernel_resides_in_region(phys_addr_t base, u64 length)
+{
+	int i;
+	phys_addr_t start, end;
+	struct memblock_region *region;
+	struct memblock_type *reserved = &memblock.reserved;
+
+	for (i = 0; i < reserved->cnt; i++) {
+		region = &reserved->regions[i];
+
+		if (region->flags != MEMBLOCK_HOTPLUG)
+			continue;
+
+		start = region->base;
+		end = region->base + region->size;
+		if (end <= base || start >= base + length)
+			continue;
+
+		return true;
+	}
+
+	return false;
+}
+
+/**
  * find_hotpluggable_memory - Find out hotpluggable memory from ACPI SRAT.
  *
  * This function did the following:
@@ -129,6 +164,16 @@ void __init find_hotpluggable_memory(void)
 
 	while (ACPI_SUCCESS(acpi_hotplug_mem_affinity(srat_vaddr, &base,
 						      &size, &offset))) {
+		/*
+		 * At early time, memblock will reserve some memory for the
+		 * kernel, such as the kernel code and data segments, initrd
+		 * file, and so on,which means the kernel resides in these
+		 * memory regions. These regions should not be hotpluggable.
+		 * So do not mark them as hotpluggable.
+		 */
+		if (kernel_resides_in_region(base, size))
+			continue;
+
 		/* Will mark hotpluggable memory regions here */
 	}
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v2 14/18] memblock, numa: Introduce flag into memblock.
  2013-08-01  7:06 ` Tang Chen
@ 2013-08-01  7:06   ` Tang Chen
  -1 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

There is no flag in memblock to describe what type the memory is.
Sometimes, we may use memblock to reserve some memory for special usage.
And we want to know what kind of memory it is. So we need a way to
differentiate memory for different usage.

In hotplug environment, we want to reserve hotpluggable memory so the
kernel won't be able to use it. And when the system is up, we have to
free these hotpluggable memory to buddy. So we need to mark these memory
first.

In order to do so, we need to mark out these special memory in memblock.
In this patch, we introduce a new "flags" member into memblock_region:
   struct memblock_region {
           phys_addr_t base;
           phys_addr_t size;
           unsigned long flags;		/* This is new. */
   #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
           int nid;
   #endif
   };

This patch does the following things:
1) Add "flags" member to memblock_region.
2) Modify the following APIs' prototype:
	memblock_add_region()
	memblock_insert_region()
3) Add memblock_reserve_region() to support reserve memory with flags, and keep
   memblock_reserve()'s prototype unmodified.
4) Modify other APIs to support flags, but keep their prototype unmodified.

The idea is from Wen Congyang <wency@cn.fujitsu.com> and Liu Jiang <jiang.liu@huawei.com>.

v1 -> v2:
As tj suggested, a zero flag MEMBLK_DEFAULT will make users confused. If
we want to specify any other flag, such MEMBLK_HOTPLUG, users don't know
to use MEMBLK_DEFAULT | MEMBLK_HOTPLUG or just MEMBLK_HOTPLUG. So remove
MEMBLK_DEFAULT (which is 0), and just use 0 by default to avoid confusions
to users.

Suggested-by: Wen Congyang <wency@cn.fujitsu.com>
Suggested-by: Liu Jiang <jiang.liu@huawei.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 include/linux/memblock.h |    1 +
 mm/memblock.c            |   53 +++++++++++++++++++++++++++++++++-------------
 2 files changed, 39 insertions(+), 15 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index f388203..e89e0cd 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -22,6 +22,7 @@
 struct memblock_region {
 	phys_addr_t base;
 	phys_addr_t size;
+	unsigned long flags;
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 	int nid;
 #endif
diff --git a/mm/memblock.c b/mm/memblock.c
index c5fad93..f494a89 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -157,6 +157,7 @@ static void __init_memblock memblock_remove_region(struct memblock_type *type, u
 		type->cnt = 1;
 		type->regions[0].base = 0;
 		type->regions[0].size = 0;
+		type->regions[0].flags = 0;
 		memblock_set_region_node(&type->regions[0], MAX_NUMNODES);
 	}
 }
@@ -307,7 +308,8 @@ static void __init_memblock memblock_merge_regions(struct memblock_type *type)
 
 		if (this->base + this->size != next->base ||
 		    memblock_get_region_node(this) !=
-		    memblock_get_region_node(next)) {
+		    memblock_get_region_node(next) ||
+		    this->flags != next->flags) {
 			BUG_ON(this->base + this->size > next->base);
 			i++;
 			continue;
@@ -327,13 +329,15 @@ static void __init_memblock memblock_merge_regions(struct memblock_type *type)
  * @base:	base address of the new region
  * @size:	size of the new region
  * @nid:	node id of the new region
+ * @flags:	flags of the new region
  *
  * Insert new memblock region [@base,@base+@size) into @type at @idx.
  * @type must already have extra room to accomodate the new region.
  */
 static void __init_memblock memblock_insert_region(struct memblock_type *type,
 						   int idx, phys_addr_t base,
-						   phys_addr_t size, int nid)
+						   phys_addr_t size,
+						   int nid, unsigned long flags)
 {
 	struct memblock_region *rgn = &type->regions[idx];
 
@@ -341,6 +345,7 @@ static void __init_memblock memblock_insert_region(struct memblock_type *type,
 	memmove(rgn + 1, rgn, (type->cnt - idx) * sizeof(*rgn));
 	rgn->base = base;
 	rgn->size = size;
+	rgn->flags = flags;
 	memblock_set_region_node(rgn, nid);
 	type->cnt++;
 	type->total_size += size;
@@ -352,6 +357,7 @@ static void __init_memblock memblock_insert_region(struct memblock_type *type,
  * @base: base address of the new region
  * @size: size of the new region
  * @nid: nid of the new region
+ * @flags: flags of the new region
  *
  * Add new memblock region [@base,@base+@size) into @type.  The new region
  * is allowed to overlap with existing ones - overlaps don't affect already
@@ -362,7 +368,8 @@ static void __init_memblock memblock_insert_region(struct memblock_type *type,
  * 0 on success, -errno on failure.
  */
 static int __init_memblock memblock_add_region(struct memblock_type *type,
-				phys_addr_t base, phys_addr_t size, int nid)
+				phys_addr_t base, phys_addr_t size,
+				int nid, unsigned long flags)
 {
 	bool insert = false;
 	phys_addr_t obase = base;
@@ -377,6 +384,7 @@ static int __init_memblock memblock_add_region(struct memblock_type *type,
 		WARN_ON(type->cnt != 1 || type->total_size);
 		type->regions[0].base = base;
 		type->regions[0].size = size;
+		type->regions[0].flags = flags;
 		memblock_set_region_node(&type->regions[0], nid);
 		type->total_size = size;
 		return 0;
@@ -407,7 +415,8 @@ repeat:
 			nr_new++;
 			if (insert)
 				memblock_insert_region(type, i++, base,
-						       rbase - base, nid);
+						       rbase - base, nid,
+						       flags);
 		}
 		/* area below @rend is dealt with, forget about it */
 		base = min(rend, end);
@@ -417,7 +426,8 @@ repeat:
 	if (base < end) {
 		nr_new++;
 		if (insert)
-			memblock_insert_region(type, i, base, end - base, nid);
+			memblock_insert_region(type, i, base, end - base,
+					       nid, flags);
 	}
 
 	/*
@@ -439,12 +449,13 @@ repeat:
 int __init_memblock memblock_add_node(phys_addr_t base, phys_addr_t size,
 				       int nid)
 {
-	return memblock_add_region(&memblock.memory, base, size, nid);
+	return memblock_add_region(&memblock.memory, base, size, nid, 0);
 }
 
 int __init_memblock memblock_add(phys_addr_t base, phys_addr_t size)
 {
-	return memblock_add_region(&memblock.memory, base, size, MAX_NUMNODES);
+	return memblock_add_region(&memblock.memory, base, size,
+				   MAX_NUMNODES, 0);
 }
 
 /**
@@ -499,7 +510,8 @@ static int __init_memblock memblock_isolate_range(struct memblock_type *type,
 			rgn->size -= base - rbase;
 			type->total_size -= base - rbase;
 			memblock_insert_region(type, i, rbase, base - rbase,
-					       memblock_get_region_node(rgn));
+					       memblock_get_region_node(rgn),
+					       rgn->flags);
 		} else if (rend > end) {
 			/*
 			 * @rgn intersects from above.  Split and redo the
@@ -509,7 +521,8 @@ static int __init_memblock memblock_isolate_range(struct memblock_type *type,
 			rgn->size -= end - rbase;
 			type->total_size -= end - rbase;
 			memblock_insert_region(type, i--, rbase, end - rbase,
-					       memblock_get_region_node(rgn));
+					       memblock_get_region_node(rgn),
+					       rgn->flags);
 		} else {
 			/* @rgn is fully contained, record it */
 			if (!*end_rgn)
@@ -551,16 +564,24 @@ int __init_memblock memblock_free(phys_addr_t base, phys_addr_t size)
 	return __memblock_remove(&memblock.reserved, base, size);
 }
 
-int __init_memblock memblock_reserve(phys_addr_t base, phys_addr_t size)
+static int __init_memblock memblock_reserve_region(phys_addr_t base,
+						   phys_addr_t size,
+						   int nid,
+						   unsigned long flags)
 {
 	struct memblock_type *_rgn = &memblock.reserved;
 
-	memblock_dbg("memblock_reserve: [%#016llx-%#016llx] %pF\n",
+	memblock_dbg("memblock_reserve: [%#016llx-%#016llx] flags %#02lx %pF\n",
 		     (unsigned long long)base,
 		     (unsigned long long)base + size,
-		     (void *)_RET_IP_);
+		     flags, (void *)_RET_IP_);
+
+	return memblock_add_region(_rgn, base, size, nid, flags);
+}
 
-	return memblock_add_region(_rgn, base, size, MAX_NUMNODES);
+int __init_memblock memblock_reserve(phys_addr_t base, phys_addr_t size)
+{
+	return memblock_reserve_region(base, size, MAX_NUMNODES, 0);
 }
 
 /**
@@ -985,6 +1006,7 @@ void __init_memblock memblock_set_current_limit(phys_addr_t limit)
 static void __init_memblock memblock_dump(struct memblock_type *type, char *name)
 {
 	unsigned long long base, size;
+	unsigned long flags;
 	int i;
 
 	pr_info(" %s.cnt  = 0x%lx\n", name, type->cnt);
@@ -995,13 +1017,14 @@ static void __init_memblock memblock_dump(struct memblock_type *type, char *name
 
 		base = rgn->base;
 		size = rgn->size;
+		flags = rgn->flags;
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 		if (memblock_get_region_node(rgn) != MAX_NUMNODES)
 			snprintf(nid_buf, sizeof(nid_buf), " on node %d",
 				 memblock_get_region_node(rgn));
 #endif
-		pr_info(" %s[%#x]\t[%#016llx-%#016llx], %#llx bytes%s\n",
-			name, i, base, base + size - 1, size, nid_buf);
+		pr_info(" %s[%#x]\t[%#016llx-%#016llx], %#llx bytes%s flags: %#lx\n",
+			name, i, base, base + size - 1, size, nid_buf, flags);
 	}
 }
 
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v2 14/18] memblock, numa: Introduce flag into memblock.
@ 2013-08-01  7:06   ` Tang Chen
  0 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

There is no flag in memblock to describe what type the memory is.
Sometimes, we may use memblock to reserve some memory for special usage.
And we want to know what kind of memory it is. So we need a way to
differentiate memory for different usage.

In hotplug environment, we want to reserve hotpluggable memory so the
kernel won't be able to use it. And when the system is up, we have to
free these hotpluggable memory to buddy. So we need to mark these memory
first.

In order to do so, we need to mark out these special memory in memblock.
In this patch, we introduce a new "flags" member into memblock_region:
   struct memblock_region {
           phys_addr_t base;
           phys_addr_t size;
           unsigned long flags;		/* This is new. */
   #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
           int nid;
   #endif
   };

This patch does the following things:
1) Add "flags" member to memblock_region.
2) Modify the following APIs' prototype:
	memblock_add_region()
	memblock_insert_region()
3) Add memblock_reserve_region() to support reserve memory with flags, and keep
   memblock_reserve()'s prototype unmodified.
4) Modify other APIs to support flags, but keep their prototype unmodified.

The idea is from Wen Congyang <wency@cn.fujitsu.com> and Liu Jiang <jiang.liu@huawei.com>.

v1 -> v2:
As tj suggested, a zero flag MEMBLK_DEFAULT will make users confused. If
we want to specify any other flag, such MEMBLK_HOTPLUG, users don't know
to use MEMBLK_DEFAULT | MEMBLK_HOTPLUG or just MEMBLK_HOTPLUG. So remove
MEMBLK_DEFAULT (which is 0), and just use 0 by default to avoid confusions
to users.

Suggested-by: Wen Congyang <wency@cn.fujitsu.com>
Suggested-by: Liu Jiang <jiang.liu@huawei.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 include/linux/memblock.h |    1 +
 mm/memblock.c            |   53 +++++++++++++++++++++++++++++++++-------------
 2 files changed, 39 insertions(+), 15 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index f388203..e89e0cd 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -22,6 +22,7 @@
 struct memblock_region {
 	phys_addr_t base;
 	phys_addr_t size;
+	unsigned long flags;
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 	int nid;
 #endif
diff --git a/mm/memblock.c b/mm/memblock.c
index c5fad93..f494a89 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -157,6 +157,7 @@ static void __init_memblock memblock_remove_region(struct memblock_type *type, u
 		type->cnt = 1;
 		type->regions[0].base = 0;
 		type->regions[0].size = 0;
+		type->regions[0].flags = 0;
 		memblock_set_region_node(&type->regions[0], MAX_NUMNODES);
 	}
 }
@@ -307,7 +308,8 @@ static void __init_memblock memblock_merge_regions(struct memblock_type *type)
 
 		if (this->base + this->size != next->base ||
 		    memblock_get_region_node(this) !=
-		    memblock_get_region_node(next)) {
+		    memblock_get_region_node(next) ||
+		    this->flags != next->flags) {
 			BUG_ON(this->base + this->size > next->base);
 			i++;
 			continue;
@@ -327,13 +329,15 @@ static void __init_memblock memblock_merge_regions(struct memblock_type *type)
  * @base:	base address of the new region
  * @size:	size of the new region
  * @nid:	node id of the new region
+ * @flags:	flags of the new region
  *
  * Insert new memblock region [@base,@base+@size) into @type at @idx.
  * @type must already have extra room to accomodate the new region.
  */
 static void __init_memblock memblock_insert_region(struct memblock_type *type,
 						   int idx, phys_addr_t base,
-						   phys_addr_t size, int nid)
+						   phys_addr_t size,
+						   int nid, unsigned long flags)
 {
 	struct memblock_region *rgn = &type->regions[idx];
 
@@ -341,6 +345,7 @@ static void __init_memblock memblock_insert_region(struct memblock_type *type,
 	memmove(rgn + 1, rgn, (type->cnt - idx) * sizeof(*rgn));
 	rgn->base = base;
 	rgn->size = size;
+	rgn->flags = flags;
 	memblock_set_region_node(rgn, nid);
 	type->cnt++;
 	type->total_size += size;
@@ -352,6 +357,7 @@ static void __init_memblock memblock_insert_region(struct memblock_type *type,
  * @base: base address of the new region
  * @size: size of the new region
  * @nid: nid of the new region
+ * @flags: flags of the new region
  *
  * Add new memblock region [@base,@base+@size) into @type.  The new region
  * is allowed to overlap with existing ones - overlaps don't affect already
@@ -362,7 +368,8 @@ static void __init_memblock memblock_insert_region(struct memblock_type *type,
  * 0 on success, -errno on failure.
  */
 static int __init_memblock memblock_add_region(struct memblock_type *type,
-				phys_addr_t base, phys_addr_t size, int nid)
+				phys_addr_t base, phys_addr_t size,
+				int nid, unsigned long flags)
 {
 	bool insert = false;
 	phys_addr_t obase = base;
@@ -377,6 +384,7 @@ static int __init_memblock memblock_add_region(struct memblock_type *type,
 		WARN_ON(type->cnt != 1 || type->total_size);
 		type->regions[0].base = base;
 		type->regions[0].size = size;
+		type->regions[0].flags = flags;
 		memblock_set_region_node(&type->regions[0], nid);
 		type->total_size = size;
 		return 0;
@@ -407,7 +415,8 @@ repeat:
 			nr_new++;
 			if (insert)
 				memblock_insert_region(type, i++, base,
-						       rbase - base, nid);
+						       rbase - base, nid,
+						       flags);
 		}
 		/* area below @rend is dealt with, forget about it */
 		base = min(rend, end);
@@ -417,7 +426,8 @@ repeat:
 	if (base < end) {
 		nr_new++;
 		if (insert)
-			memblock_insert_region(type, i, base, end - base, nid);
+			memblock_insert_region(type, i, base, end - base,
+					       nid, flags);
 	}
 
 	/*
@@ -439,12 +449,13 @@ repeat:
 int __init_memblock memblock_add_node(phys_addr_t base, phys_addr_t size,
 				       int nid)
 {
-	return memblock_add_region(&memblock.memory, base, size, nid);
+	return memblock_add_region(&memblock.memory, base, size, nid, 0);
 }
 
 int __init_memblock memblock_add(phys_addr_t base, phys_addr_t size)
 {
-	return memblock_add_region(&memblock.memory, base, size, MAX_NUMNODES);
+	return memblock_add_region(&memblock.memory, base, size,
+				   MAX_NUMNODES, 0);
 }
 
 /**
@@ -499,7 +510,8 @@ static int __init_memblock memblock_isolate_range(struct memblock_type *type,
 			rgn->size -= base - rbase;
 			type->total_size -= base - rbase;
 			memblock_insert_region(type, i, rbase, base - rbase,
-					       memblock_get_region_node(rgn));
+					       memblock_get_region_node(rgn),
+					       rgn->flags);
 		} else if (rend > end) {
 			/*
 			 * @rgn intersects from above.  Split and redo the
@@ -509,7 +521,8 @@ static int __init_memblock memblock_isolate_range(struct memblock_type *type,
 			rgn->size -= end - rbase;
 			type->total_size -= end - rbase;
 			memblock_insert_region(type, i--, rbase, end - rbase,
-					       memblock_get_region_node(rgn));
+					       memblock_get_region_node(rgn),
+					       rgn->flags);
 		} else {
 			/* @rgn is fully contained, record it */
 			if (!*end_rgn)
@@ -551,16 +564,24 @@ int __init_memblock memblock_free(phys_addr_t base, phys_addr_t size)
 	return __memblock_remove(&memblock.reserved, base, size);
 }
 
-int __init_memblock memblock_reserve(phys_addr_t base, phys_addr_t size)
+static int __init_memblock memblock_reserve_region(phys_addr_t base,
+						   phys_addr_t size,
+						   int nid,
+						   unsigned long flags)
 {
 	struct memblock_type *_rgn = &memblock.reserved;
 
-	memblock_dbg("memblock_reserve: [%#016llx-%#016llx] %pF\n",
+	memblock_dbg("memblock_reserve: [%#016llx-%#016llx] flags %#02lx %pF\n",
 		     (unsigned long long)base,
 		     (unsigned long long)base + size,
-		     (void *)_RET_IP_);
+		     flags, (void *)_RET_IP_);
+
+	return memblock_add_region(_rgn, base, size, nid, flags);
+}
 
-	return memblock_add_region(_rgn, base, size, MAX_NUMNODES);
+int __init_memblock memblock_reserve(phys_addr_t base, phys_addr_t size)
+{
+	return memblock_reserve_region(base, size, MAX_NUMNODES, 0);
 }
 
 /**
@@ -985,6 +1006,7 @@ void __init_memblock memblock_set_current_limit(phys_addr_t limit)
 static void __init_memblock memblock_dump(struct memblock_type *type, char *name)
 {
 	unsigned long long base, size;
+	unsigned long flags;
 	int i;
 
 	pr_info(" %s.cnt  = 0x%lx\n", name, type->cnt);
@@ -995,13 +1017,14 @@ static void __init_memblock memblock_dump(struct memblock_type *type, char *name
 
 		base = rgn->base;
 		size = rgn->size;
+		flags = rgn->flags;
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 		if (memblock_get_region_node(rgn) != MAX_NUMNODES)
 			snprintf(nid_buf, sizeof(nid_buf), " on node %d",
 				 memblock_get_region_node(rgn));
 #endif
-		pr_info(" %s[%#x]\t[%#016llx-%#016llx], %#llx bytes%s\n",
-			name, i, base, base + size - 1, size, nid_buf);
+		pr_info(" %s[%#x]\t[%#016llx-%#016llx], %#llx bytes%s flags: %#lx\n",
+			name, i, base, base + size - 1, size, nid_buf, flags);
 	}
 }
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v2 15/18] memblock, mem_hotplug: Introduce MEMBLOCK_HOTPLUG flag to mark hotpluggable regions.
  2013-08-01  7:06 ` Tang Chen
@ 2013-08-01  7:06   ` Tang Chen
  -1 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

In find_hotpluggable_memory, once we find out a memory region which is
hotpluggable, we want to mark them in memblock.memory. So that we could
control memblock allocator not to allocte hotpluggable memory for the kernel
later.

To achieve this goal, we introduce MEMBLOCK_HOTPLUG flag to indicate the
hotpluggable memory regions in memblock and a function memblock_mark_hotplug()
to mark hotpluggable memory if we find one.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 include/linux/memblock.h |   12 ++++++++++++
 mm/memblock.c            |   26 ++++++++++++++++++++++++++
 mm/memory_hotplug.c      |    3 ++-
 3 files changed, 40 insertions(+), 1 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index e89e0cd..637ec3d 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -19,6 +19,9 @@
 
 #define INIT_MEMBLOCK_REGIONS	128
 
+/* Definition of memblock flags. */
+#define MEMBLOCK_HOTPLUG	0x1	/* hotpluggable region */
+
 struct memblock_region {
 	phys_addr_t base;
 	phys_addr_t size;
@@ -60,6 +63,8 @@ int memblock_free(phys_addr_t base, phys_addr_t size);
 int memblock_reserve(phys_addr_t base, phys_addr_t size);
 void memblock_trim_memory(phys_addr_t align);
 
+int memblock_mark_hotplug(phys_addr_t base, phys_addr_t size);
+
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
 			  unsigned long *out_end_pfn, int *out_nid);
@@ -119,6 +124,13 @@ void __next_free_mem_range_rev(u64 *idx, int nid, phys_addr_t *out_start,
 	     i != (u64)ULLONG_MAX;					\
 	     __next_free_mem_range_rev(&i, nid, p_start, p_end, p_nid))
 
+static inline void memblock_set_region_flags(struct memblock_region *r,
+					     unsigned long flags)
+{
+	r->flags = flags;
+}
+
+
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 int memblock_set_node(phys_addr_t base, phys_addr_t size, int nid);
 
diff --git a/mm/memblock.c b/mm/memblock.c
index f494a89..05e142b 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -585,6 +585,32 @@ int __init_memblock memblock_reserve(phys_addr_t base, phys_addr_t size)
 }
 
 /**
+ * memblock_mark_hotplug - Mark hotpluggable memory with flag MEMBLOCK_HOTPLUG.
+ * @base: the base phys addr of the region
+ * @size: the size of the region
+ *
+ * This function isolates region [@base, @base + @size), and mark it with flag
+ * MEMBLOCK_HOTPLUG.
+ *
+ * Return 0 on succees, -errno on failure.
+ */
+int __init_memblock memblock_mark_hotplug(phys_addr_t base, phys_addr_t size)
+{
+	struct memblock_type *type = &memblock.memory;
+	int i, ret, start_rgn, end_rgn;
+
+	ret = memblock_isolate_range(type, base, size, &start_rgn, &end_rgn);
+	if (ret)
+		return ret;
+
+	for (i = start_rgn; i < end_rgn; i++)
+		memblock_set_region_flags(&type->regions[i], MEMBLOCK_HOTPLUG);
+
+	memblock_merge_regions(type);
+	return 0;
+}
+
+/**
  * __next_free_mem_range - next function for for_each_free_mem_range()
  * @idx: pointer to u64 loop variable
  * @nid: nid: node selector, %MAX_NUMNODES for all nodes
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index b800c9c..3e95fe5 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -174,7 +174,8 @@ void __init find_hotpluggable_memory(void)
 		if (kernel_resides_in_region(base, size))
 			continue;
 
-		/* Will mark hotpluggable memory regions here */
+		/* Mark hotpluggable memory regions in memblock.memory */
+		memblock_mark_hotplug(base, size);
 	}
 
 	early_iounmap(srat_vaddr, length);
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v2 15/18] memblock, mem_hotplug: Introduce MEMBLOCK_HOTPLUG flag to mark hotpluggable regions.
@ 2013-08-01  7:06   ` Tang Chen
  0 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

In find_hotpluggable_memory, once we find out a memory region which is
hotpluggable, we want to mark them in memblock.memory. So that we could
control memblock allocator not to allocte hotpluggable memory for the kernel
later.

To achieve this goal, we introduce MEMBLOCK_HOTPLUG flag to indicate the
hotpluggable memory regions in memblock and a function memblock_mark_hotplug()
to mark hotpluggable memory if we find one.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 include/linux/memblock.h |   12 ++++++++++++
 mm/memblock.c            |   26 ++++++++++++++++++++++++++
 mm/memory_hotplug.c      |    3 ++-
 3 files changed, 40 insertions(+), 1 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index e89e0cd..637ec3d 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -19,6 +19,9 @@
 
 #define INIT_MEMBLOCK_REGIONS	128
 
+/* Definition of memblock flags. */
+#define MEMBLOCK_HOTPLUG	0x1	/* hotpluggable region */
+
 struct memblock_region {
 	phys_addr_t base;
 	phys_addr_t size;
@@ -60,6 +63,8 @@ int memblock_free(phys_addr_t base, phys_addr_t size);
 int memblock_reserve(phys_addr_t base, phys_addr_t size);
 void memblock_trim_memory(phys_addr_t align);
 
+int memblock_mark_hotplug(phys_addr_t base, phys_addr_t size);
+
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
 			  unsigned long *out_end_pfn, int *out_nid);
@@ -119,6 +124,13 @@ void __next_free_mem_range_rev(u64 *idx, int nid, phys_addr_t *out_start,
 	     i != (u64)ULLONG_MAX;					\
 	     __next_free_mem_range_rev(&i, nid, p_start, p_end, p_nid))
 
+static inline void memblock_set_region_flags(struct memblock_region *r,
+					     unsigned long flags)
+{
+	r->flags = flags;
+}
+
+
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 int memblock_set_node(phys_addr_t base, phys_addr_t size, int nid);
 
diff --git a/mm/memblock.c b/mm/memblock.c
index f494a89..05e142b 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -585,6 +585,32 @@ int __init_memblock memblock_reserve(phys_addr_t base, phys_addr_t size)
 }
 
 /**
+ * memblock_mark_hotplug - Mark hotpluggable memory with flag MEMBLOCK_HOTPLUG.
+ * @base: the base phys addr of the region
+ * @size: the size of the region
+ *
+ * This function isolates region [@base, @base + @size), and mark it with flag
+ * MEMBLOCK_HOTPLUG.
+ *
+ * Return 0 on succees, -errno on failure.
+ */
+int __init_memblock memblock_mark_hotplug(phys_addr_t base, phys_addr_t size)
+{
+	struct memblock_type *type = &memblock.memory;
+	int i, ret, start_rgn, end_rgn;
+
+	ret = memblock_isolate_range(type, base, size, &start_rgn, &end_rgn);
+	if (ret)
+		return ret;
+
+	for (i = start_rgn; i < end_rgn; i++)
+		memblock_set_region_flags(&type->regions[i], MEMBLOCK_HOTPLUG);
+
+	memblock_merge_regions(type);
+	return 0;
+}
+
+/**
  * __next_free_mem_range - next function for for_each_free_mem_range()
  * @idx: pointer to u64 loop variable
  * @nid: nid: node selector, %MAX_NUMNODES for all nodes
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index b800c9c..3e95fe5 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -174,7 +174,8 @@ void __init find_hotpluggable_memory(void)
 		if (kernel_resides_in_region(base, size))
 			continue;
 
-		/* Will mark hotpluggable memory regions here */
+		/* Mark hotpluggable memory regions in memblock.memory */
+		memblock_mark_hotplug(base, size);
 	}
 
 	early_iounmap(srat_vaddr, length);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v2 16/18] memblock, mem_hotplug: Make memblock skip hotpluggable regions by default.
  2013-08-01  7:06 ` Tang Chen
@ 2013-08-01  7:06   ` Tang Chen
  -1 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

Linux kernel cannot migrate pages used by the kernel. As a result, hotpluggable
memory used by the kernel won't be able to be hot-removed. To solve this
problem, the basic idea is to prevent memblock from allocating hotpluggable
memory for the kernel at early time, and arrange all hotpluggable memory in
ACPI SRAT(System Resource Affinity Table) as ZONE_MOVABLE when initializing
zones.

In the previous patches, we have marked hotpluggable memory regions with
MEMBLOCK_HOTPLUG flag in memblock.memory.

In this patch, we make memblock skip these hotpluggable memory regions in
the default allocate function.

memblock_find_in_range_node()
  |-->for_each_free_mem_range_reverse()
        |-->__next_free_mem_range_rev()

The above is the only place where __next_free_mem_range_rev() is used. So
skip hotpluggable memory regions when iterating memblock.memory to find
free memory.

In the later patches, a boot option named "movablenode" will be introduced
to enable/disable using SRAT to arrange ZONE_MOVABLE.


NOTE: This check will always be done. It is OK because if users didn't specify
      movablenode option, the hotpluggable memory won't be marked. So this
      check won't skip any memory, which means the kernel will act as before.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 mm/memblock.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index 05e142b..84bd568 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -695,6 +695,10 @@ void __init_memblock __next_free_mem_range(u64 *idx, int nid,
  * @out_nid: ptr to int for nid of the range, can be %NULL
  *
  * Reverse of __next_free_mem_range().
+ *
+ * Linux kernel cannot migrate pages used by itself. Memory hotplug users won't
+ * be able to hot-remove hotpluggable memory used by the kernel. So this
+ * function skip hotpluggable regions when allocating memory for the kernel.
  */
 void __init_memblock __next_free_mem_range_rev(u64 *idx, int nid,
 					   phys_addr_t *out_start,
@@ -719,6 +723,10 @@ void __init_memblock __next_free_mem_range_rev(u64 *idx, int nid,
 		if (nid != MAX_NUMNODES && nid != memblock_get_region_node(m))
 			continue;
 
+		/* skip hotpluggable memory regions */
+		if (m->flags & MEMBLOCK_HOTPLUG)
+			continue;
+
 		/* scan areas before each reservation for intersection */
 		for ( ; ri >= 0; ri--) {
 			struct memblock_region *r = &rsv->regions[ri];
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v2 16/18] memblock, mem_hotplug: Make memblock skip hotpluggable regions by default.
@ 2013-08-01  7:06   ` Tang Chen
  0 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

Linux kernel cannot migrate pages used by the kernel. As a result, hotpluggable
memory used by the kernel won't be able to be hot-removed. To solve this
problem, the basic idea is to prevent memblock from allocating hotpluggable
memory for the kernel at early time, and arrange all hotpluggable memory in
ACPI SRAT(System Resource Affinity Table) as ZONE_MOVABLE when initializing
zones.

In the previous patches, we have marked hotpluggable memory regions with
MEMBLOCK_HOTPLUG flag in memblock.memory.

In this patch, we make memblock skip these hotpluggable memory regions in
the default allocate function.

memblock_find_in_range_node()
  |-->for_each_free_mem_range_reverse()
        |-->__next_free_mem_range_rev()

The above is the only place where __next_free_mem_range_rev() is used. So
skip hotpluggable memory regions when iterating memblock.memory to find
free memory.

In the later patches, a boot option named "movablenode" will be introduced
to enable/disable using SRAT to arrange ZONE_MOVABLE.


NOTE: This check will always be done. It is OK because if users didn't specify
      movablenode option, the hotpluggable memory won't be marked. So this
      check won't skip any memory, which means the kernel will act as before.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 mm/memblock.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index 05e142b..84bd568 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -695,6 +695,10 @@ void __init_memblock __next_free_mem_range(u64 *idx, int nid,
  * @out_nid: ptr to int for nid of the range, can be %NULL
  *
  * Reverse of __next_free_mem_range().
+ *
+ * Linux kernel cannot migrate pages used by itself. Memory hotplug users won't
+ * be able to hot-remove hotpluggable memory used by the kernel. So this
+ * function skip hotpluggable regions when allocating memory for the kernel.
  */
 void __init_memblock __next_free_mem_range_rev(u64 *idx, int nid,
 					   phys_addr_t *out_start,
@@ -719,6 +723,10 @@ void __init_memblock __next_free_mem_range_rev(u64 *idx, int nid,
 		if (nid != MAX_NUMNODES && nid != memblock_get_region_node(m))
 			continue;
 
+		/* skip hotpluggable memory regions */
+		if (m->flags & MEMBLOCK_HOTPLUG)
+			continue;
+
 		/* scan areas before each reservation for intersection */
 		for ( ; ri >= 0; ri--) {
 			struct memblock_region *r = &rsv->regions[ri];
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v2 17/18] mem-hotplug: Introduce movablenode boot option to {en|dis}able using SRAT.
  2013-08-01  7:06 ` Tang Chen
@ 2013-08-01  7:06   ` Tang Chen
  -1 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

The Hot-Pluggable fired in SRAT specifies which memory is hotpluggable.
As we mentioned before, if hotpluggable memory is used by the kernel,
it cannot be hot-removed. So memory hotplug users may want to set all
hotpluggable memory in ZONE_MOVABLE so that the kernel won't use it.

Memory hotplug users may also set a node as movable node, which has
ZONE_MOVABLE only, so that the whole node can be hot-removed.

But the kernel cannot use memory in ZONE_MOVABLE. By doing this, the
kernel cannot use memory in movable nodes. This will cause NUMA
performance down. And other users may be unhappy.

So we need a way to allow users to enable and disable this functionality.
In this patch, we introduce movablenode boot option to allow users to
choose to reserve hotpluggable memory and set it as ZONE_MOVABLE or not.

Users can specify "movablenode" in kernel commandline to enable this
functionality. For those who don't use memory hotplug or who don't want
to lose their NUMA performance, just don't specify anything. The kernel
will work as before.

Suggested-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 Documentation/kernel-parameters.txt |   15 +++++++++++++++
 arch/x86/kernel/setup.c             |   10 ++++++++--
 include/linux/memory_hotplug.h      |    3 +++
 mm/memory_hotplug.c                 |   11 +++++++++++
 4 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 2fe6e76..3f77ba3 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1714,6 +1714,21 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			that the amount of memory usable for all allocations
 			is not too small.
 
+	movablenode		[KNL,X86] This parameter enables/disables the
+			kernel to arrange hotpluggable memory ranges recorded
+			in ACPI SRAT(System Resource Affinity Table) as
+			ZONE_MOVABLE. And these memory can be hot-removed when
+			the system is up.
+			By specifying this option, all the hotpluggable memory
+			will be in ZONE_MOVABLE, which the kernel cannot use.
+			This will cause NUMA performance down. For users who
+			care about NUMA performance, just don't use it.
+			If all the memory ranges in the system are hotpluggable,
+			then the ones used by the kernel at early time, such as
+			kernel code and data segments, initrd file and so on,
+			won't be set as ZONE_MOVABLE, and won't be hotpluggable.
+			Otherwise the kernel won't have enough memory to boot.
+
 	MTD_Partition=	[MTD]
 			Format: <name>,<region-number>,<size>,<offset>
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 8b1bddd..7c94e92 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1060,14 +1060,20 @@ void __init setup_arch(char **cmdline_p)
 	/* Initialize ACPI root table */
 	acpi_root_table_init();
 
-#ifdef CONFIG_ACPI_NUMA
+#if defined(CONFIG_ACPI_NUMA) && defined(CONFIG_MOVABLE_NODE)
 	/*
 	 * Linux kernel cannot migrate kernel pages, as a result, memory used
 	 * by the kernel cannot be hot-removed. Find and mark hotpluggable
 	 * memory in memblock to prevent memblock from allocating hotpluggable
 	 * memory for the kernel.
+	 *
+	 * If all the memory in a node is hotpluggable, then the kernel won't
+	 * be able to use memory on that node. This will cause NUMA performance
+	 * down. So by default, we don't reserve any hotpluggable memory. Users
+	 * may use "movablenode" boot option to enable this functionality.
 	 */
-	find_hotpluggable_memory();
+	if (movablenode_enable_srat)
+		find_hotpluggable_memory();
 #endif
 
 	/*
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index c32af49..65b2a48 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -33,6 +33,9 @@ enum {
 	ONLINE_MOVABLE,
 };
 
+/* Enable/disable SRAT in movablenode boot option */
+extern bool movablenode_enable_srat;
+
 /*
  * pgdat resizing functions
  */
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 3e95fe5..97eb26b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -93,6 +93,17 @@ static void release_memory_resource(struct resource *res)
 }
 
 #ifdef CONFIG_ACPI_NUMA
+#ifdef CONFIG_MOVABLE_NODE
+bool __initdata movablenode_enable_srat;
+
+static int __init cmdline_parse_movablenode(char *p)
+{
+	movablenode_enable_srat = true;
+	return 0;
+}
+early_param("movablenode", cmdline_parse_movablenode);
+#endif	/* CONFIG_MOVABLE_NODE */
+
 /**
  * kernel_resides_in_range - Check if kernel resides in a memory region.
  * @base: The base address of the memory region.
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v2 17/18] mem-hotplug: Introduce movablenode boot option to {en|dis}able using SRAT.
@ 2013-08-01  7:06   ` Tang Chen
  0 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

The Hot-Pluggable fired in SRAT specifies which memory is hotpluggable.
As we mentioned before, if hotpluggable memory is used by the kernel,
it cannot be hot-removed. So memory hotplug users may want to set all
hotpluggable memory in ZONE_MOVABLE so that the kernel won't use it.

Memory hotplug users may also set a node as movable node, which has
ZONE_MOVABLE only, so that the whole node can be hot-removed.

But the kernel cannot use memory in ZONE_MOVABLE. By doing this, the
kernel cannot use memory in movable nodes. This will cause NUMA
performance down. And other users may be unhappy.

So we need a way to allow users to enable and disable this functionality.
In this patch, we introduce movablenode boot option to allow users to
choose to reserve hotpluggable memory and set it as ZONE_MOVABLE or not.

Users can specify "movablenode" in kernel commandline to enable this
functionality. For those who don't use memory hotplug or who don't want
to lose their NUMA performance, just don't specify anything. The kernel
will work as before.

Suggested-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 Documentation/kernel-parameters.txt |   15 +++++++++++++++
 arch/x86/kernel/setup.c             |   10 ++++++++--
 include/linux/memory_hotplug.h      |    3 +++
 mm/memory_hotplug.c                 |   11 +++++++++++
 4 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 2fe6e76..3f77ba3 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1714,6 +1714,21 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			that the amount of memory usable for all allocations
 			is not too small.
 
+	movablenode		[KNL,X86] This parameter enables/disables the
+			kernel to arrange hotpluggable memory ranges recorded
+			in ACPI SRAT(System Resource Affinity Table) as
+			ZONE_MOVABLE. And these memory can be hot-removed when
+			the system is up.
+			By specifying this option, all the hotpluggable memory
+			will be in ZONE_MOVABLE, which the kernel cannot use.
+			This will cause NUMA performance down. For users who
+			care about NUMA performance, just don't use it.
+			If all the memory ranges in the system are hotpluggable,
+			then the ones used by the kernel at early time, such as
+			kernel code and data segments, initrd file and so on,
+			won't be set as ZONE_MOVABLE, and won't be hotpluggable.
+			Otherwise the kernel won't have enough memory to boot.
+
 	MTD_Partition=	[MTD]
 			Format: <name>,<region-number>,<size>,<offset>
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 8b1bddd..7c94e92 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1060,14 +1060,20 @@ void __init setup_arch(char **cmdline_p)
 	/* Initialize ACPI root table */
 	acpi_root_table_init();
 
-#ifdef CONFIG_ACPI_NUMA
+#if defined(CONFIG_ACPI_NUMA) && defined(CONFIG_MOVABLE_NODE)
 	/*
 	 * Linux kernel cannot migrate kernel pages, as a result, memory used
 	 * by the kernel cannot be hot-removed. Find and mark hotpluggable
 	 * memory in memblock to prevent memblock from allocating hotpluggable
 	 * memory for the kernel.
+	 *
+	 * If all the memory in a node is hotpluggable, then the kernel won't
+	 * be able to use memory on that node. This will cause NUMA performance
+	 * down. So by default, we don't reserve any hotpluggable memory. Users
+	 * may use "movablenode" boot option to enable this functionality.
 	 */
-	find_hotpluggable_memory();
+	if (movablenode_enable_srat)
+		find_hotpluggable_memory();
 #endif
 
 	/*
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index c32af49..65b2a48 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -33,6 +33,9 @@ enum {
 	ONLINE_MOVABLE,
 };
 
+/* Enable/disable SRAT in movablenode boot option */
+extern bool movablenode_enable_srat;
+
 /*
  * pgdat resizing functions
  */
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 3e95fe5..97eb26b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -93,6 +93,17 @@ static void release_memory_resource(struct resource *res)
 }
 
 #ifdef CONFIG_ACPI_NUMA
+#ifdef CONFIG_MOVABLE_NODE
+bool __initdata movablenode_enable_srat;
+
+static int __init cmdline_parse_movablenode(char *p)
+{
+	movablenode_enable_srat = true;
+	return 0;
+}
+early_param("movablenode", cmdline_parse_movablenode);
+#endif	/* CONFIG_MOVABLE_NODE */
+
 /**
  * kernel_resides_in_range - Check if kernel resides in a memory region.
  * @base: The base address of the memory region.
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v2 18/18] x86, numa, acpi, memory-hotplug: Make movablenode have higher priority.
  2013-08-01  7:06 ` Tang Chen
@ 2013-08-01  7:06   ` Tang Chen
  -1 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

Arrange hotpluggable memory as ZONE_MOVABLE will cause NUMA performance down
because the kernel cannot use movable memory. For users who don't use memory
hotplug and who don't want to lose their NUMA performance, they need a way to
disable this functionality. So we improved movablecore boot option.

If users specify the original movablecore=nn@ss boot option, the kernel will
arrange [ss, ss+nn) as ZONE_MOVABLE. The kernelcore=nn@ss boot option is similar
except it specifies ZONE_NORMAL ranges.

Now, if users specify "movablenode" in kernel commandline, the kernel will
arrange hotpluggable memory in SRAT as ZONE_MOVABLE. And if users do this, all
the other movablecore=nn@ss and kernelcore=nn@ss options should be ignored.

For those who don't want this, just specify nothing. The kernel will act as
before.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 include/linux/memblock.h |    1 +
 mm/memblock.c            |    5 +++++
 mm/page_alloc.c          |   31 ++++++++++++++++++++++++++++---
 3 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 637ec3d..545d143 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -64,6 +64,7 @@ int memblock_reserve(phys_addr_t base, phys_addr_t size);
 void memblock_trim_memory(phys_addr_t align);
 
 int memblock_mark_hotplug(phys_addr_t base, phys_addr_t size);
+bool memblock_is_hotpluggable(struct memblock_region *region);
 
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
diff --git a/mm/memblock.c b/mm/memblock.c
index 84bd568..97d7f41 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -610,6 +610,11 @@ int __init_memblock memblock_mark_hotplug(phys_addr_t base, phys_addr_t size)
 	return 0;
 }
 
+bool __init_memblock memblock_is_hotpluggable(struct memblock_region *region)
+{
+	return region->flags & MEMBLOCK_HOTPLUG;
+}
+
 /**
  * __next_free_mem_range - next function for for_each_free_mem_range()
  * @idx: pointer to u64 loop variable
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c3edb62..21017d3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4878,9 +4878,35 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 	nodemask_t saved_node_state = node_states[N_MEMORY];
 	unsigned long totalpages = early_calculate_totalpages();
 	int usable_nodes = nodes_weight(node_states[N_MEMORY]);
+	struct memblock_type *type = &memblock.memory;
 
+	/* Need to find movable_zone earlier when movablenode is specified. */
+	find_usable_zone_for_movable();
+
+#ifdef CONFIG_MOVABLE_NODE
 	/*
-	 * If movablecore was specified, calculate what size of
+	 * If movablenode is specified, ignore kernelcore and movablecore
+	 * options.
+	 */
+	if (movablenode_enable_srat) {
+		for (i = 0; i < type->cnt; i++) {
+			if (!memblock_is_hotpluggable(&type->regions[i]))
+				continue;
+
+			nid = type->regions[i].nid;
+
+			usable_startpfn = PFN_DOWN(type->regions[i].base);
+			zone_movable_pfn[nid] = zone_movable_pfn[nid] ?
+				min(usable_startpfn, zone_movable_pfn[nid]) :
+				usable_startpfn;
+		}
+
+		goto out;
+	}
+#endif
+
+	/*
+	 * If movablecore=nn[KMG] was specified, calculate what size of
 	 * kernelcore that corresponds so that memory usable for
 	 * any allocation type is evenly spread. If both kernelcore
 	 * and movablecore are specified, then the value of kernelcore
@@ -4906,7 +4932,6 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 		goto out;
 
 	/* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */
-	find_usable_zone_for_movable();
 	usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
 
 restart:
@@ -4997,12 +5022,12 @@ restart:
 	if (usable_nodes && required_kernelcore > usable_nodes)
 		goto restart;
 
+out:
 	/* Align start of ZONE_MOVABLE on all nids to MAX_ORDER_NR_PAGES */
 	for (nid = 0; nid < MAX_NUMNODES; nid++)
 		zone_movable_pfn[nid] =
 			roundup(zone_movable_pfn[nid], MAX_ORDER_NR_PAGES);
 
-out:
 	/* restore the node_state */
 	node_states[N_MEMORY] = saved_node_state;
 }
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v2 18/18] x86, numa, acpi, memory-hotplug: Make movablenode have higher priority.
@ 2013-08-01  7:06   ` Tang Chen
  0 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-01  7:06 UTC (permalink / raw)
  To: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

Arrange hotpluggable memory as ZONE_MOVABLE will cause NUMA performance down
because the kernel cannot use movable memory. For users who don't use memory
hotplug and who don't want to lose their NUMA performance, they need a way to
disable this functionality. So we improved movablecore boot option.

If users specify the original movablecore=nn@ss boot option, the kernel will
arrange [ss, ss+nn) as ZONE_MOVABLE. The kernelcore=nn@ss boot option is similar
except it specifies ZONE_NORMAL ranges.

Now, if users specify "movablenode" in kernel commandline, the kernel will
arrange hotpluggable memory in SRAT as ZONE_MOVABLE. And if users do this, all
the other movablecore=nn@ss and kernelcore=nn@ss options should be ignored.

For those who don't want this, just specify nothing. The kernel will act as
before.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 include/linux/memblock.h |    1 +
 mm/memblock.c            |    5 +++++
 mm/page_alloc.c          |   31 ++++++++++++++++++++++++++++---
 3 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 637ec3d..545d143 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -64,6 +64,7 @@ int memblock_reserve(phys_addr_t base, phys_addr_t size);
 void memblock_trim_memory(phys_addr_t align);
 
 int memblock_mark_hotplug(phys_addr_t base, phys_addr_t size);
+bool memblock_is_hotpluggable(struct memblock_region *region);
 
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
diff --git a/mm/memblock.c b/mm/memblock.c
index 84bd568..97d7f41 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -610,6 +610,11 @@ int __init_memblock memblock_mark_hotplug(phys_addr_t base, phys_addr_t size)
 	return 0;
 }
 
+bool __init_memblock memblock_is_hotpluggable(struct memblock_region *region)
+{
+	return region->flags & MEMBLOCK_HOTPLUG;
+}
+
 /**
  * __next_free_mem_range - next function for for_each_free_mem_range()
  * @idx: pointer to u64 loop variable
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c3edb62..21017d3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4878,9 +4878,35 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 	nodemask_t saved_node_state = node_states[N_MEMORY];
 	unsigned long totalpages = early_calculate_totalpages();
 	int usable_nodes = nodes_weight(node_states[N_MEMORY]);
+	struct memblock_type *type = &memblock.memory;
 
+	/* Need to find movable_zone earlier when movablenode is specified. */
+	find_usable_zone_for_movable();
+
+#ifdef CONFIG_MOVABLE_NODE
 	/*
-	 * If movablecore was specified, calculate what size of
+	 * If movablenode is specified, ignore kernelcore and movablecore
+	 * options.
+	 */
+	if (movablenode_enable_srat) {
+		for (i = 0; i < type->cnt; i++) {
+			if (!memblock_is_hotpluggable(&type->regions[i]))
+				continue;
+
+			nid = type->regions[i].nid;
+
+			usable_startpfn = PFN_DOWN(type->regions[i].base);
+			zone_movable_pfn[nid] = zone_movable_pfn[nid] ?
+				min(usable_startpfn, zone_movable_pfn[nid]) :
+				usable_startpfn;
+		}
+
+		goto out;
+	}
+#endif
+
+	/*
+	 * If movablecore=nn[KMG] was specified, calculate what size of
 	 * kernelcore that corresponds so that memory usable for
 	 * any allocation type is evenly spread. If both kernelcore
 	 * and movablecore are specified, then the value of kernelcore
@@ -4906,7 +4932,6 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 		goto out;
 
 	/* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */
-	find_usable_zone_for_movable();
 	usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
 
 restart:
@@ -4997,12 +5022,12 @@ restart:
 	if (usable_nodes && required_kernelcore > usable_nodes)
 		goto restart;
 
+out:
 	/* Align start of ZONE_MOVABLE on all nids to MAX_ORDER_NR_PAGES */
 	for (nid = 0; nid < MAX_NUMNODES; nid++)
 		zone_movable_pfn[nid] =
 			roundup(zone_movable_pfn[nid], MAX_ORDER_NR_PAGES);
 
-out:
 	/* restore the node_state */
 	node_states[N_MEMORY] = saved_node_state;
 }
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 13/18] x86, numa, mem_hotplug: Skip all the regions the kernel resides in.
  2013-08-01  7:06   ` Tang Chen
@ 2013-08-01 13:42     ` Tejun Heo
  -1 siblings, 0 replies; 98+ messages in thread
From: Tejun Heo @ 2013-08-01 13:42 UTC (permalink / raw)
  To: Tang Chen
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

> On Thu, Aug 01, 2013 at 03:06:35PM +0800, Tang Chen wrote:
> 
> At early time, memblock will reserve some memory for the kernel,
> such as the kernel code and data segments, initrd file, and so on=EF=BC=8C
> which means the kernel resides in these memory regions.
> 
> Even if these memory regions are hotpluggable, we should not
> mark them as hotpluggable. Otherwise the kernel won't have enough
> memory to boot.
> 
> This patch finds out which memory regions the kernel resides in,
> and skip them when finding all hotpluggable memory regions.
> 
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> ---
>  mm/memory=5Fhotplug.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
>   1 files changed, 45 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/memory=5Fhotplug.c b/mm/memory=5Fhotplug.c
> index 326e2f2..b800c9c 100644
> --- a/mm/memory=5Fhotplug.c
> +++ b/mm/memory=5Fhotplug.c
> @@ -31,6 +31,7 @@
>  #include <linux/firmware-map.h>
>  #include <linux/stop=5Fmachine.h>
>  #include <linux/acpi.h>
> +#include <linux/memblock.h>
> =20
>  #include <asm/tlbflush.h>
> =20

This patch is contaminated.  Can you please resend?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 13/18] x86, numa, mem_hotplug: Skip all the regions the kernel resides in.
@ 2013-08-01 13:42     ` Tejun Heo
  0 siblings, 0 replies; 98+ messages in thread
From: Tejun Heo @ 2013-08-01 13:42 UTC (permalink / raw)
  To: Tang Chen
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

> On Thu, Aug 01, 2013 at 03:06:35PM +0800, Tang Chen wrote:
> 
> At early time, memblock will reserve some memory for the kernel,
> such as the kernel code and data segments, initrd file, and so on=EF=BC=8C
> which means the kernel resides in these memory regions.
> 
> Even if these memory regions are hotpluggable, we should not
> mark them as hotpluggable. Otherwise the kernel won't have enough
> memory to boot.
> 
> This patch finds out which memory regions the kernel resides in,
> and skip them when finding all hotpluggable memory regions.
> 
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> ---
>  mm/memory=5Fhotplug.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
>   1 files changed, 45 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/memory=5Fhotplug.c b/mm/memory=5Fhotplug.c
> index 326e2f2..b800c9c 100644
> --- a/mm/memory=5Fhotplug.c
> +++ b/mm/memory=5Fhotplug.c
> @@ -31,6 +31,7 @@
>  #include <linux/firmware-map.h>
>  #include <linux/stop=5Fmachine.h>
>  #include <linux/acpi.h>
> +#include <linux/memblock.h>
> =20
>  #include <asm/tlbflush.h>
> =20

This patch is contaminated.  Can you please resend?

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 03/18] acpi: Remove "continue" in macro INVALID_TABLE().
  2013-08-01  7:06   ` Tang Chen
@ 2013-08-01 20:26     ` Rafael J. Wysocki
  -1 siblings, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2013-08-01 20:26 UTC (permalink / raw)
  To: Tang Chen
  Cc: lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On Thursday, August 01, 2013 03:06:25 PM Tang Chen wrote:
> The macro INVALID_TABLE() is defined like this:
> 
>  #define INVALID_TABLE(x, path, name)                                    \
>          { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); continue; }
> 
> And it is used like this:
> 
> 	for (...) {
> 		...
> 		if (...)
> 			INVALID_TABLE()
> 		...
> 	}
> 
> The "continue" in the macro makes the code hard to understand.
> Change it to the style like other macros:
> 
>  #define INVALID_TABLE(x, path, name)                                    \
>          do { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); } while (0)
> 
> So after this patch, this macro should be used like this:
> 
> 	for (...) {
> 		...
> 		if (...) {
> 			INVALID_TABLE()
> 			continue;
> 		}
> 		...
> 	}
> 
> Add the "continue" wherever the macro is called.
> (For now, it is only called in acpi_initrd_override().)
> 
> The idea is from Yinghai Lu <yinghai@kernel.org>.
> 
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Acked-by: Tejun Heo <tj@kernel.org>
> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> ---
>  drivers/acpi/osl.c |   18 +++++++++++++-----
>  1 files changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index e721863..91d9f54 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -565,7 +565,7 @@ static const char * const table_sigs[] = {
>  
>  /* Non-fatal errors: Affected tables/files are ignored */
>  #define INVALID_TABLE(x, path, name)					\
> -	{ pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); continue; }
> +	do { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); } while (0)
>  
>  #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
>  
> @@ -593,9 +593,11 @@ void __init acpi_initrd_override(void *data, size_t size)
>  		data += offset;
>  		size -= offset;
>  
> -		if (file.size < sizeof(struct acpi_table_header))
> +		if (file.size < sizeof(struct acpi_table_header)) {
>  			INVALID_TABLE("Table smaller than ACPI header",
>  				      cpio_path, file.name);
> +			continue;
> +		}
>  
>  		table = file.data;
>  
> @@ -603,15 +605,21 @@ void __init acpi_initrd_override(void *data, size_t size)
>  			if (!memcmp(table->signature, table_sigs[sig], 4))
>  				break;
>  
> -		if (!table_sigs[sig])
> +		if (!table_sigs[sig]) {
>  			INVALID_TABLE("Unknown signature",
>  				      cpio_path, file.name);
> -		if (file.size != table->length)
> +			continue;
> +		}
> +		if (file.size != table->length) {
>  			INVALID_TABLE("File length does not match table length",
>  				      cpio_path, file.name);
> -		if (acpi_table_checksum(file.data, table->length))
> +			continue;
> +		}
> +		if (acpi_table_checksum(file.data, table->length)) {
>  			INVALID_TABLE("Bad table checksum",
>  				      cpio_path, file.name);
> +			continue;
> +		}
>  
>  		pr_info("%4.4s ACPI table found in initrd [%s%s][0x%x]\n",
>  			table->signature, cpio_path, file.name, table->length);
> 
-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 03/18] acpi: Remove "continue" in macro INVALID_TABLE().
@ 2013-08-01 20:26     ` Rafael J. Wysocki
  0 siblings, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2013-08-01 20:26 UTC (permalink / raw)
  To: Tang Chen
  Cc: lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On Thursday, August 01, 2013 03:06:25 PM Tang Chen wrote:
> The macro INVALID_TABLE() is defined like this:
> 
>  #define INVALID_TABLE(x, path, name)                                    \
>          { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); continue; }
> 
> And it is used like this:
> 
> 	for (...) {
> 		...
> 		if (...)
> 			INVALID_TABLE()
> 		...
> 	}
> 
> The "continue" in the macro makes the code hard to understand.
> Change it to the style like other macros:
> 
>  #define INVALID_TABLE(x, path, name)                                    \
>          do { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); } while (0)
> 
> So after this patch, this macro should be used like this:
> 
> 	for (...) {
> 		...
> 		if (...) {
> 			INVALID_TABLE()
> 			continue;
> 		}
> 		...
> 	}
> 
> Add the "continue" wherever the macro is called.
> (For now, it is only called in acpi_initrd_override().)
> 
> The idea is from Yinghai Lu <yinghai@kernel.org>.
> 
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Acked-by: Tejun Heo <tj@kernel.org>
> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> ---
>  drivers/acpi/osl.c |   18 +++++++++++++-----
>  1 files changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index e721863..91d9f54 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -565,7 +565,7 @@ static const char * const table_sigs[] = {
>  
>  /* Non-fatal errors: Affected tables/files are ignored */
>  #define INVALID_TABLE(x, path, name)					\
> -	{ pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); continue; }
> +	do { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); } while (0)
>  
>  #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
>  
> @@ -593,9 +593,11 @@ void __init acpi_initrd_override(void *data, size_t size)
>  		data += offset;
>  		size -= offset;
>  
> -		if (file.size < sizeof(struct acpi_table_header))
> +		if (file.size < sizeof(struct acpi_table_header)) {
>  			INVALID_TABLE("Table smaller than ACPI header",
>  				      cpio_path, file.name);
> +			continue;
> +		}
>  
>  		table = file.data;
>  
> @@ -603,15 +605,21 @@ void __init acpi_initrd_override(void *data, size_t size)
>  			if (!memcmp(table->signature, table_sigs[sig], 4))
>  				break;
>  
> -		if (!table_sigs[sig])
> +		if (!table_sigs[sig]) {
>  			INVALID_TABLE("Unknown signature",
>  				      cpio_path, file.name);
> -		if (file.size != table->length)
> +			continue;
> +		}
> +		if (file.size != table->length) {
>  			INVALID_TABLE("File length does not match table length",
>  				      cpio_path, file.name);
> -		if (acpi_table_checksum(file.data, table->length))
> +			continue;
> +		}
> +		if (acpi_table_checksum(file.data, table->length)) {
>  			INVALID_TABLE("Bad table checksum",
>  				      cpio_path, file.name);
> +			continue;
> +		}
>  
>  		pr_info("%4.4s ACPI table found in initrd [%s%s][0x%x]\n",
>  			table->signature, cpio_path, file.name, table->length);
> 
-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 04/18] acpi: Introduce acpi_invalid_table() to check if a table is invalid.
  2013-08-01  7:06   ` Tang Chen
@ 2013-08-01 20:27     ` Rafael J. Wysocki
  -1 siblings, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2013-08-01 20:27 UTC (permalink / raw)
  To: Tang Chen
  Cc: lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On Thursday, August 01, 2013 03:06:26 PM Tang Chen wrote:
> In acpi_initrd_override(), it checks several things to ensure the
> table it found is valid. In later patches, we need to do these check
> somewhere else. So this patch introduces a common function
> acpi_invalid_table() to do all these checks, and reuse it in different
> places. The function will be used in the subsequent patches.
> 
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> ---
>  drivers/acpi/osl.c |   86 +++++++++++++++++++++++++++++++++++++---------------
>  1 files changed, 61 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index 91d9f54..8df8a93 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -572,9 +572,68 @@ static const char * const table_sigs[] = {
>  /* Must not increase 10 or needs code modification below */
>  #define ACPI_OVERRIDE_TABLES 10
>  
> +/*******************************************************************************
> + *
> + * FUNCTION:    acpi_invalid_table
> + *
> + * PARAMETERS:  File               - The initrd file
> + *              Path               - Path to acpi overriding tables in cpio file
> + *              Signature          - Signature of the table
> + *
> + * RETURN:      0 if it passes all the checks, -EINVAL if any check fails.
> + *
> + * DESCRIPTION: Check if an acpi table found in initrd is invalid.
> + *              @signature can be NULL. If it is NULL, the function will check
> + *              if the table signature matches any signature in table_sigs[].
> + *
> + ******************************************************************************/
> +int __init acpi_invalid_table(struct cpio_data *file,
> +			      const char *path, const char *signature)
> +{
> +	int idx;
> +	struct acpi_table_header *table = file->data;
> +
> +	if (file->size < sizeof(struct acpi_table_header)) {
> +		INVALID_TABLE("Table smaller than ACPI header",
> +			      path, file->name);
> +		return -EINVAL;
> +	}
> +
> +	if (signature) {
> +		if (memcmp(table->signature, signature, 4)) {
> +			INVALID_TABLE("Table signature does not match",
> +				      path, file->name);
> +			return -EINVAL;
> +		}
> +	} else {
> +		for (idx = 0; table_sigs[idx]; idx++)
> +			if (!memcmp(table->signature, table_sigs[idx], 4))
> +				break;
> +
> +		if (!table_sigs[idx]) {
> +			INVALID_TABLE("Unknown signature", path, file->name);
> +			return -EINVAL;
> +		}
> +	}
> +
> +	if (file->size != table->length) {
> +		INVALID_TABLE("File length does not match table length",
> +			      path, file->name);
> +		return -EINVAL;
> +	}
> +
> +	if (acpi_table_checksum(file->data, table->length)) {
> +		INVALID_TABLE("Bad table checksum",
> +			      path, file->name);
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
>  void __init acpi_initrd_override(void *data, size_t size)
>  {
> -	int sig, no, table_nr = 0, total_offset = 0;
> +	int no, table_nr = 0, total_offset = 0;
>  	long offset = 0;
>  	struct acpi_table_header *table;
>  	char cpio_path[32] = "kernel/firmware/acpi/";
> @@ -593,33 +652,10 @@ void __init acpi_initrd_override(void *data, size_t size)
>  		data += offset;
>  		size -= offset;
>  
> -		if (file.size < sizeof(struct acpi_table_header)) {
> -			INVALID_TABLE("Table smaller than ACPI header",
> -				      cpio_path, file.name);
> -			continue;
> -		}
> -
>  		table = file.data;
>  
> -		for (sig = 0; table_sigs[sig]; sig++)
> -			if (!memcmp(table->signature, table_sigs[sig], 4))
> -				break;
> -
> -		if (!table_sigs[sig]) {
> -			INVALID_TABLE("Unknown signature",
> -				      cpio_path, file.name);
> +		if (acpi_invalid_table(&file, cpio_path, NULL))
>  			continue;
> -		}
> -		if (file.size != table->length) {
> -			INVALID_TABLE("File length does not match table length",
> -				      cpio_path, file.name);
> -			continue;
> -		}
> -		if (acpi_table_checksum(file.data, table->length)) {
> -			INVALID_TABLE("Bad table checksum",
> -				      cpio_path, file.name);
> -			continue;
> -		}
>  
>  		pr_info("%4.4s ACPI table found in initrd [%s%s][0x%x]\n",
>  			table->signature, cpio_path, file.name, table->length);
> 
-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 04/18] acpi: Introduce acpi_invalid_table() to check if a table is invalid.
@ 2013-08-01 20:27     ` Rafael J. Wysocki
  0 siblings, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2013-08-01 20:27 UTC (permalink / raw)
  To: Tang Chen
  Cc: lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On Thursday, August 01, 2013 03:06:26 PM Tang Chen wrote:
> In acpi_initrd_override(), it checks several things to ensure the
> table it found is valid. In later patches, we need to do these check
> somewhere else. So this patch introduces a common function
> acpi_invalid_table() to do all these checks, and reuse it in different
> places. The function will be used in the subsequent patches.
> 
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> ---
>  drivers/acpi/osl.c |   86 +++++++++++++++++++++++++++++++++++++---------------
>  1 files changed, 61 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index 91d9f54..8df8a93 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -572,9 +572,68 @@ static const char * const table_sigs[] = {
>  /* Must not increase 10 or needs code modification below */
>  #define ACPI_OVERRIDE_TABLES 10
>  
> +/*******************************************************************************
> + *
> + * FUNCTION:    acpi_invalid_table
> + *
> + * PARAMETERS:  File               - The initrd file
> + *              Path               - Path to acpi overriding tables in cpio file
> + *              Signature          - Signature of the table
> + *
> + * RETURN:      0 if it passes all the checks, -EINVAL if any check fails.
> + *
> + * DESCRIPTION: Check if an acpi table found in initrd is invalid.
> + *              @signature can be NULL. If it is NULL, the function will check
> + *              if the table signature matches any signature in table_sigs[].
> + *
> + ******************************************************************************/
> +int __init acpi_invalid_table(struct cpio_data *file,
> +			      const char *path, const char *signature)
> +{
> +	int idx;
> +	struct acpi_table_header *table = file->data;
> +
> +	if (file->size < sizeof(struct acpi_table_header)) {
> +		INVALID_TABLE("Table smaller than ACPI header",
> +			      path, file->name);
> +		return -EINVAL;
> +	}
> +
> +	if (signature) {
> +		if (memcmp(table->signature, signature, 4)) {
> +			INVALID_TABLE("Table signature does not match",
> +				      path, file->name);
> +			return -EINVAL;
> +		}
> +	} else {
> +		for (idx = 0; table_sigs[idx]; idx++)
> +			if (!memcmp(table->signature, table_sigs[idx], 4))
> +				break;
> +
> +		if (!table_sigs[idx]) {
> +			INVALID_TABLE("Unknown signature", path, file->name);
> +			return -EINVAL;
> +		}
> +	}
> +
> +	if (file->size != table->length) {
> +		INVALID_TABLE("File length does not match table length",
> +			      path, file->name);
> +		return -EINVAL;
> +	}
> +
> +	if (acpi_table_checksum(file->data, table->length)) {
> +		INVALID_TABLE("Bad table checksum",
> +			      path, file->name);
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
>  void __init acpi_initrd_override(void *data, size_t size)
>  {
> -	int sig, no, table_nr = 0, total_offset = 0;
> +	int no, table_nr = 0, total_offset = 0;
>  	long offset = 0;
>  	struct acpi_table_header *table;
>  	char cpio_path[32] = "kernel/firmware/acpi/";
> @@ -593,33 +652,10 @@ void __init acpi_initrd_override(void *data, size_t size)
>  		data += offset;
>  		size -= offset;
>  
> -		if (file.size < sizeof(struct acpi_table_header)) {
> -			INVALID_TABLE("Table smaller than ACPI header",
> -				      cpio_path, file.name);
> -			continue;
> -		}
> -
>  		table = file.data;
>  
> -		for (sig = 0; table_sigs[sig]; sig++)
> -			if (!memcmp(table->signature, table_sigs[sig], 4))
> -				break;
> -
> -		if (!table_sigs[sig]) {
> -			INVALID_TABLE("Unknown signature",
> -				      cpio_path, file.name);
> +		if (acpi_invalid_table(&file, cpio_path, NULL))
>  			continue;
> -		}
> -		if (file.size != table->length) {
> -			INVALID_TABLE("File length does not match table length",
> -				      cpio_path, file.name);
> -			continue;
> -		}
> -		if (acpi_table_checksum(file.data, table->length)) {
> -			INVALID_TABLE("Bad table checksum",
> -				      cpio_path, file.name);
> -			continue;
> -		}
>  
>  		pr_info("%4.4s ACPI table found in initrd [%s%s][0x%x]\n",
>  			table->signature, cpio_path, file.name, table->length);
> 
-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 01/18] acpi: Print Hot-Pluggable Field in SRAT.
  2013-08-01  7:06   ` Tang Chen
@ 2013-08-01 21:55     ` Toshi Kani
  -1 siblings, 0 replies; 98+ messages in thread
From: Toshi Kani @ 2013-08-01 21:55 UTC (permalink / raw)
  To: Tang Chen
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On Thu, 2013-08-01 at 15:06 +0800, Tang Chen wrote:
> The Hot-Pluggable field in SRAT suggests if the memory could be
> hotplugged while the system is running. Print it as well when
> parsing SRAT will help users to know which memory is hotpluggable.
> 
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> Acked-by: Tejun Heo <tj@kernel.org>

Acked-by: Toshi Kani <toshi.kani@hp.com>

Thanks,
-Toshi


> ---
>  arch/x86/mm/srat.c |   11 +++++++----
>  1 files changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/mm/srat.c b/arch/x86/mm/srat.c
> index cdd0da9..d44c8a4 100644
> --- a/arch/x86/mm/srat.c
> +++ b/arch/x86/mm/srat.c
> @@ -146,6 +146,7 @@ int __init
>  acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
>  {
>  	u64 start, end;
> +	u32 hotpluggable;
>  	int node, pxm;
>  
>  	if (srat_disabled())
> @@ -154,7 +155,8 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
>  		goto out_err_bad_srat;
>  	if ((ma->flags & ACPI_SRAT_MEM_ENABLED) == 0)
>  		goto out_err;
> -	if ((ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) && !save_add_info())
> +	hotpluggable = ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE;
> +	if (hotpluggable && !save_add_info())
>  		goto out_err;
>  
>  	start = ma->base_address;
> @@ -174,9 +176,10 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
>  
>  	node_set(node, numa_nodes_parsed);
>  
> -	printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx]\n",
> -	       node, pxm,
> -	       (unsigned long long) start, (unsigned long long) end - 1);
> +	pr_info("SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx]%s\n",
> +		node, pxm,
> +		(unsigned long long) start, (unsigned long long) end - 1,
> +		hotpluggable ? " Hot Pluggable" : "");
>  
>  	return 0;
>  out_err_bad_srat:


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 01/18] acpi: Print Hot-Pluggable Field in SRAT.
@ 2013-08-01 21:55     ` Toshi Kani
  0 siblings, 0 replies; 98+ messages in thread
From: Toshi Kani @ 2013-08-01 21:55 UTC (permalink / raw)
  To: Tang Chen
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On Thu, 2013-08-01 at 15:06 +0800, Tang Chen wrote:
> The Hot-Pluggable field in SRAT suggests if the memory could be
> hotplugged while the system is running. Print it as well when
> parsing SRAT will help users to know which memory is hotpluggable.
> 
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> Acked-by: Tejun Heo <tj@kernel.org>

Acked-by: Toshi Kani <toshi.kani@hp.com>

Thanks,
-Toshi


> ---
>  arch/x86/mm/srat.c |   11 +++++++----
>  1 files changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/mm/srat.c b/arch/x86/mm/srat.c
> index cdd0da9..d44c8a4 100644
> --- a/arch/x86/mm/srat.c
> +++ b/arch/x86/mm/srat.c
> @@ -146,6 +146,7 @@ int __init
>  acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
>  {
>  	u64 start, end;
> +	u32 hotpluggable;
>  	int node, pxm;
>  
>  	if (srat_disabled())
> @@ -154,7 +155,8 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
>  		goto out_err_bad_srat;
>  	if ((ma->flags & ACPI_SRAT_MEM_ENABLED) == 0)
>  		goto out_err;
> -	if ((ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) && !save_add_info())
> +	hotpluggable = ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE;
> +	if (hotpluggable && !save_add_info())
>  		goto out_err;
>  
>  	start = ma->base_address;
> @@ -174,9 +176,10 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
>  
>  	node_set(node, numa_nodes_parsed);
>  
> -	printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx]\n",
> -	       node, pxm,
> -	       (unsigned long long) start, (unsigned long long) end - 1);
> +	pr_info("SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx]%s\n",
> +		node, pxm,
> +		(unsigned long long) start, (unsigned long long) end - 1,
> +		hotpluggable ? " Hot Pluggable" : "");
>  
>  	return 0;
>  out_err_bad_srat:



^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 02/18] earlycpio.c: Fix the confusing comment of find_cpio_data().
  2013-08-01  7:06   ` Tang Chen
@ 2013-08-01 21:57     ` Toshi Kani
  -1 siblings, 0 replies; 98+ messages in thread
From: Toshi Kani @ 2013-08-01 21:57 UTC (permalink / raw)
  To: Tang Chen
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On Thu, 2013-08-01 at 15:06 +0800, Tang Chen wrote:
> The comments of find_cpio_data() says:
> 
>   * @offset: When a matching file is found, this is the offset to the
>   *          beginning of the cpio. ......
> 
> But according to the code,
> 
>   dptr = PTR_ALIGN(p + ch[C_NAMESIZE], 4);
>   nptr = PTR_ALIGN(dptr + ch[C_FILESIZE], 4);
>   ....
>   *offset = (long)nptr - (long)data;	/* data is the cpio file */
> 
> @offset is the offset of the next file, not the matching file itself.
> This is confused and may cause unnecessary waste of time to debug.
> So fix it.
> 
> v1 -> v2:
> As tj suggested, rename @offset to @nextoff which is more clear to
> users. And also adjust the new comments.
> 
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> ---
>  lib/earlycpio.c |   27 ++++++++++++++-------------
>  1 files changed, 14 insertions(+), 13 deletions(-)
> 
> diff --git a/lib/earlycpio.c b/lib/earlycpio.c
> index 8078ef4..7affac0 100644
> --- a/lib/earlycpio.c
> +++ b/lib/earlycpio.c
> @@ -49,22 +49,23 @@ enum cpio_fields {
>  
>  /**
>   * cpio_data find_cpio_data - Search for files in an uncompressed cpio
> - * @path:   The directory to search for, including a slash at the end
> - * @data:   Pointer to the the cpio archive or a header inside
> - * @len:    Remaining length of the cpio based on data pointer
> - * @offset: When a matching file is found, this is the offset to the
> - *          beginning of the cpio. It can be used to iterate through
> - *          the cpio to find all files inside of a directory path
> + * @path:       The directory to search for, including a slash at the end
> + * @data:       Pointer to the the cpio archive or a header inside
> + * @len:        Remaining length of the cpio based on data pointer
> + * @nextoff:    When a matching file is found, this is the offset from the
> + *              beginning of the cpio to the beginning of the next file, not the
> + *              matching file itself. It can be used to iterate through the cpio
> + *              to find all files inside of a directory path
>   *
> - * @return: struct cpio_data containing the address, length and
> - *          filename (with the directory path cut off) of the found file.
> - *          If you search for a filename and not for files in a directory,
> - *          pass the absolute path of the filename in the cpio and make sure
> - *          the match returned an empty filename string.
> + * @return:     struct cpio_data containing the address, length and
> + *              filename (with the directory path cut off) of the found file.
> + *              If you search for a filename and not for files in a directory,
> + *              pass the absolute path of the filename in the cpio and make sure
> + *              the match returned an empty filename string.
>   */
>  
>  struct cpio_data __cpuinit find_cpio_data(const char *path, void *data,

This patch does not apply cleanly.  It seems that your branch does not
have 0db0628d90125193280eabb501c94feaf48fa9ab.

Thanks,
-Toshi


> -					  size_t len,  long *offset)
> +					  size_t len,  long *nextoff)
>  {
>  	const size_t cpio_header_len = 8*C_NFIELDS - 2;
>  	struct cpio_data cd = { NULL, 0, "" };
> @@ -124,7 +125,7 @@ struct cpio_data __cpuinit find_cpio_data(const char *path, void *data,
>  		if ((ch[C_MODE] & 0170000) == 0100000 &&
>  		    ch[C_NAMESIZE] >= mypathsize &&
>  		    !memcmp(p, path, mypathsize)) {
> -			*offset = (long)nptr - (long)data;
> +			*nextoff = (long)nptr - (long)data;
>  			if (ch[C_NAMESIZE] - mypathsize >= MAX_CPIO_FILE_NAME) {
>  				pr_warn(
>  				"File %s exceeding MAX_CPIO_FILE_NAME [%d]\n",


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 02/18] earlycpio.c: Fix the confusing comment of find_cpio_data().
@ 2013-08-01 21:57     ` Toshi Kani
  0 siblings, 0 replies; 98+ messages in thread
From: Toshi Kani @ 2013-08-01 21:57 UTC (permalink / raw)
  To: Tang Chen
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On Thu, 2013-08-01 at 15:06 +0800, Tang Chen wrote:
> The comments of find_cpio_data() says:
> 
>   * @offset: When a matching file is found, this is the offset to the
>   *          beginning of the cpio. ......
> 
> But according to the code,
> 
>   dptr = PTR_ALIGN(p + ch[C_NAMESIZE], 4);
>   nptr = PTR_ALIGN(dptr + ch[C_FILESIZE], 4);
>   ....
>   *offset = (long)nptr - (long)data;	/* data is the cpio file */
> 
> @offset is the offset of the next file, not the matching file itself.
> This is confused and may cause unnecessary waste of time to debug.
> So fix it.
> 
> v1 -> v2:
> As tj suggested, rename @offset to @nextoff which is more clear to
> users. And also adjust the new comments.
> 
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> ---
>  lib/earlycpio.c |   27 ++++++++++++++-------------
>  1 files changed, 14 insertions(+), 13 deletions(-)
> 
> diff --git a/lib/earlycpio.c b/lib/earlycpio.c
> index 8078ef4..7affac0 100644
> --- a/lib/earlycpio.c
> +++ b/lib/earlycpio.c
> @@ -49,22 +49,23 @@ enum cpio_fields {
>  
>  /**
>   * cpio_data find_cpio_data - Search for files in an uncompressed cpio
> - * @path:   The directory to search for, including a slash at the end
> - * @data:   Pointer to the the cpio archive or a header inside
> - * @len:    Remaining length of the cpio based on data pointer
> - * @offset: When a matching file is found, this is the offset to the
> - *          beginning of the cpio. It can be used to iterate through
> - *          the cpio to find all files inside of a directory path
> + * @path:       The directory to search for, including a slash at the end
> + * @data:       Pointer to the the cpio archive or a header inside
> + * @len:        Remaining length of the cpio based on data pointer
> + * @nextoff:    When a matching file is found, this is the offset from the
> + *              beginning of the cpio to the beginning of the next file, not the
> + *              matching file itself. It can be used to iterate through the cpio
> + *              to find all files inside of a directory path
>   *
> - * @return: struct cpio_data containing the address, length and
> - *          filename (with the directory path cut off) of the found file.
> - *          If you search for a filename and not for files in a directory,
> - *          pass the absolute path of the filename in the cpio and make sure
> - *          the match returned an empty filename string.
> + * @return:     struct cpio_data containing the address, length and
> + *              filename (with the directory path cut off) of the found file.
> + *              If you search for a filename and not for files in a directory,
> + *              pass the absolute path of the filename in the cpio and make sure
> + *              the match returned an empty filename string.
>   */
>  
>  struct cpio_data __cpuinit find_cpio_data(const char *path, void *data,

This patch does not apply cleanly.  It seems that your branch does not
have 0db0628d90125193280eabb501c94feaf48fa9ab.

Thanks,
-Toshi


> -					  size_t len,  long *offset)
> +					  size_t len,  long *nextoff)
>  {
>  	const size_t cpio_header_len = 8*C_NFIELDS - 2;
>  	struct cpio_data cd = { NULL, 0, "" };
> @@ -124,7 +125,7 @@ struct cpio_data __cpuinit find_cpio_data(const char *path, void *data,
>  		if ((ch[C_MODE] & 0170000) == 0100000 &&
>  		    ch[C_NAMESIZE] >= mypathsize &&
>  		    !memcmp(p, path, mypathsize)) {
> -			*offset = (long)nptr - (long)data;
> +			*nextoff = (long)nptr - (long)data;
>  			if (ch[C_NAMESIZE] - mypathsize >= MAX_CPIO_FILE_NAME) {
>  				pr_warn(
>  				"File %s exceeding MAX_CPIO_FILE_NAME [%d]\n",



^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 03/18] acpi: Remove "continue" in macro INVALID_TABLE().
  2013-08-01  7:06   ` Tang Chen
@ 2013-08-01 22:06     ` Toshi Kani
  -1 siblings, 0 replies; 98+ messages in thread
From: Toshi Kani @ 2013-08-01 22:06 UTC (permalink / raw)
  To: Tang Chen
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On Thu, 2013-08-01 at 15:06 +0800, Tang Chen wrote:
> The macro INVALID_TABLE() is defined like this:
> 
>  #define INVALID_TABLE(x, path, name)                                    \
>          { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); continue; }
> 
> And it is used like this:
> 
> 	for (...) {
> 		...
> 		if (...)
> 			INVALID_TABLE()
> 		...
> 	}
> 
> The "continue" in the macro makes the code hard to understand.
> Change it to the style like other macros:
> 
>  #define INVALID_TABLE(x, path, name)                                    \
>          do { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); } while (0)
> 
> So after this patch, this macro should be used like this:
> 
> 	for (...) {
> 		...
> 		if (...) {
> 			INVALID_TABLE()
> 			continue;
> 		}
> 		...
> 	}
> 
> Add the "continue" wherever the macro is called.
> (For now, it is only called in acpi_initrd_override().)
> 
> The idea is from Yinghai Lu <yinghai@kernel.org>.
> 
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Acked-by: Tejun Heo <tj@kernel.org>
> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> ---
>  drivers/acpi/osl.c |   18 +++++++++++++-----
>  1 files changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index e721863..91d9f54 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -565,7 +565,7 @@ static const char * const table_sigs[] = {
>  
>  /* Non-fatal errors: Affected tables/files are ignored */
>  #define INVALID_TABLE(x, path, name)					\

Since you are touching this macro, I'd suggest to rename it something
like ACPI_INVALID_TABLE().  INVALID_TABLE() sounds too generic to me.
Otherwise, it looks good.

Acked-by: Toshi Kani <toshi.kani@hp.com>

Thanks,
-Toshi


> -	{ pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); continue; }
> +	do { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); } while (0)
>  
>  #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
>  
> @@ -593,9 +593,11 @@ void __init acpi_initrd_override(void *data, size_t size)
>  		data += offset;
>  		size -= offset;
>  
> -		if (file.size < sizeof(struct acpi_table_header))
> +		if (file.size < sizeof(struct acpi_table_header)) {
>  			INVALID_TABLE("Table smaller than ACPI header",
>  				      cpio_path, file.name);
> +			continue;
> +		}
>  
>  		table = file.data;
>  
> @@ -603,15 +605,21 @@ void __init acpi_initrd_override(void *data, size_t size)
>  			if (!memcmp(table->signature, table_sigs[sig], 4))
>  				break;
>  
> -		if (!table_sigs[sig])
> +		if (!table_sigs[sig]) {
>  			INVALID_TABLE("Unknown signature",
>  				      cpio_path, file.name);
> -		if (file.size != table->length)
> +			continue;
> +		}
> +		if (file.size != table->length) {
>  			INVALID_TABLE("File length does not match table length",
>  				      cpio_path, file.name);
> -		if (acpi_table_checksum(file.data, table->length))
> +			continue;
> +		}
> +		if (acpi_table_checksum(file.data, table->length)) {
>  			INVALID_TABLE("Bad table checksum",
>  				      cpio_path, file.name);
> +			continue;
> +		}
>  
>  		pr_info("%4.4s ACPI table found in initrd [%s%s][0x%x]\n",
>  			table->signature, cpio_path, file.name, table->length);


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 03/18] acpi: Remove "continue" in macro INVALID_TABLE().
@ 2013-08-01 22:06     ` Toshi Kani
  0 siblings, 0 replies; 98+ messages in thread
From: Toshi Kani @ 2013-08-01 22:06 UTC (permalink / raw)
  To: Tang Chen
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On Thu, 2013-08-01 at 15:06 +0800, Tang Chen wrote:
> The macro INVALID_TABLE() is defined like this:
> 
>  #define INVALID_TABLE(x, path, name)                                    \
>          { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); continue; }
> 
> And it is used like this:
> 
> 	for (...) {
> 		...
> 		if (...)
> 			INVALID_TABLE()
> 		...
> 	}
> 
> The "continue" in the macro makes the code hard to understand.
> Change it to the style like other macros:
> 
>  #define INVALID_TABLE(x, path, name)                                    \
>          do { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); } while (0)
> 
> So after this patch, this macro should be used like this:
> 
> 	for (...) {
> 		...
> 		if (...) {
> 			INVALID_TABLE()
> 			continue;
> 		}
> 		...
> 	}
> 
> Add the "continue" wherever the macro is called.
> (For now, it is only called in acpi_initrd_override().)
> 
> The idea is from Yinghai Lu <yinghai@kernel.org>.
> 
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Acked-by: Tejun Heo <tj@kernel.org>
> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> ---
>  drivers/acpi/osl.c |   18 +++++++++++++-----
>  1 files changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index e721863..91d9f54 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -565,7 +565,7 @@ static const char * const table_sigs[] = {
>  
>  /* Non-fatal errors: Affected tables/files are ignored */
>  #define INVALID_TABLE(x, path, name)					\

Since you are touching this macro, I'd suggest to rename it something
like ACPI_INVALID_TABLE().  INVALID_TABLE() sounds too generic to me.
Otherwise, it looks good.

Acked-by: Toshi Kani <toshi.kani@hp.com>

Thanks,
-Toshi


> -	{ pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); continue; }
> +	do { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); } while (0)
>  
>  #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
>  
> @@ -593,9 +593,11 @@ void __init acpi_initrd_override(void *data, size_t size)
>  		data += offset;
>  		size -= offset;
>  
> -		if (file.size < sizeof(struct acpi_table_header))
> +		if (file.size < sizeof(struct acpi_table_header)) {
>  			INVALID_TABLE("Table smaller than ACPI header",
>  				      cpio_path, file.name);
> +			continue;
> +		}
>  
>  		table = file.data;
>  
> @@ -603,15 +605,21 @@ void __init acpi_initrd_override(void *data, size_t size)
>  			if (!memcmp(table->signature, table_sigs[sig], 4))
>  				break;
>  
> -		if (!table_sigs[sig])
> +		if (!table_sigs[sig]) {
>  			INVALID_TABLE("Unknown signature",
>  				      cpio_path, file.name);
> -		if (file.size != table->length)
> +			continue;
> +		}
> +		if (file.size != table->length) {
>  			INVALID_TABLE("File length does not match table length",
>  				      cpio_path, file.name);
> -		if (acpi_table_checksum(file.data, table->length))
> +			continue;
> +		}
> +		if (acpi_table_checksum(file.data, table->length)) {
>  			INVALID_TABLE("Bad table checksum",
>  				      cpio_path, file.name);
> +			continue;
> +		}
>  
>  		pr_info("%4.4s ACPI table found in initrd [%s%s][0x%x]\n",
>  			table->signature, cpio_path, file.name, table->length);



^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 04/18] acpi: Introduce acpi_invalid_table() to check if a table is invalid.
  2013-08-01  7:06   ` Tang Chen
@ 2013-08-01 22:26     ` Toshi Kani
  -1 siblings, 0 replies; 98+ messages in thread
From: Toshi Kani @ 2013-08-01 22:26 UTC (permalink / raw)
  To: Tang Chen
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On Thu, 2013-08-01 at 15:06 +0800, Tang Chen wrote:
> In acpi_initrd_override(), it checks several things to ensure the
> table it found is valid. In later patches, we need to do these check
> somewhere else. So this patch introduces a common function
> acpi_invalid_table() to do all these checks, and reuse it in different
> places. The function will be used in the subsequent patches.
> 
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> ---
>  drivers/acpi/osl.c |   86 +++++++++++++++++++++++++++++++++++++---------------
>  1 files changed, 61 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index 91d9f54..8df8a93 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -572,9 +572,68 @@ static const char * const table_sigs[] = {
>  /* Must not increase 10 or needs code modification below */
>  #define ACPI_OVERRIDE_TABLES 10
>  
> +/*******************************************************************************
> + *
> + * FUNCTION:    acpi_invalid_table
> + *
> + * PARAMETERS:  File               - The initrd file
> + *              Path               - Path to acpi overriding tables in cpio file
> + *              Signature          - Signature of the table
> + *
> + * RETURN:      0 if it passes all the checks, -EINVAL if any check fails.
> + *
> + * DESCRIPTION: Check if an acpi table found in initrd is invalid.
> + *              @signature can be NULL. If it is NULL, the function will check
> + *              if the table signature matches any signature in table_sigs[].
> + *
> + ******************************************************************************/
> +int __init acpi_invalid_table(struct cpio_data *file,
> +			      const char *path, const char *signature)

Since this function verifies a given acpi table in initrd (not that the
table is invalid), I'd suggest to rename it something like
acpi_verify_initrd().  Otherwise, it looks good to me.

Acked-by: Toshi Kani <toshi.kani@hp.com>

Thanks,
-Toshi


> +{
> +	int idx;
> +	struct acpi_table_header *table = file->data;
> +
> +	if (file->size < sizeof(struct acpi_table_header)) {
> +		INVALID_TABLE("Table smaller than ACPI header",
> +			      path, file->name);
> +		return -EINVAL;
> +	}
> +
> +	if (signature) {
> +		if (memcmp(table->signature, signature, 4)) {
> +			INVALID_TABLE("Table signature does not match",
> +				      path, file->name);
> +			return -EINVAL;
> +		}
> +	} else {
> +		for (idx = 0; table_sigs[idx]; idx++)
> +			if (!memcmp(table->signature, table_sigs[idx], 4))
> +				break;
> +
> +		if (!table_sigs[idx]) {
> +			INVALID_TABLE("Unknown signature", path, file->name);
> +			return -EINVAL;
> +		}
> +	}
> +
> +	if (file->size != table->length) {
> +		INVALID_TABLE("File length does not match table length",
> +			      path, file->name);
> +		return -EINVAL;
> +	}
> +
> +	if (acpi_table_checksum(file->data, table->length)) {
> +		INVALID_TABLE("Bad table checksum",
> +			      path, file->name);
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
>  void __init acpi_initrd_override(void *data, size_t size)
>  {
> -	int sig, no, table_nr = 0, total_offset = 0;
> +	int no, table_nr = 0, total_offset = 0;
>  	long offset = 0;
>  	struct acpi_table_header *table;
>  	char cpio_path[32] = "kernel/firmware/acpi/";
> @@ -593,33 +652,10 @@ void __init acpi_initrd_override(void *data, size_t size)
>  		data += offset;
>  		size -= offset;
>  
> -		if (file.size < sizeof(struct acpi_table_header)) {
> -			INVALID_TABLE("Table smaller than ACPI header",
> -				      cpio_path, file.name);
> -			continue;
> -		}
> -
>  		table = file.data;
>  
> -		for (sig = 0; table_sigs[sig]; sig++)
> -			if (!memcmp(table->signature, table_sigs[sig], 4))
> -				break;
> -
> -		if (!table_sigs[sig]) {
> -			INVALID_TABLE("Unknown signature",
> -				      cpio_path, file.name);
> +		if (acpi_invalid_table(&file, cpio_path, NULL))
>  			continue;
> -		}
> -		if (file.size != table->length) {
> -			INVALID_TABLE("File length does not match table length",
> -				      cpio_path, file.name);
> -			continue;
> -		}
> -		if (acpi_table_checksum(file.data, table->length)) {
> -			INVALID_TABLE("Bad table checksum",
> -				      cpio_path, file.name);
> -			continue;
> -		}
>  
>  		pr_info("%4.4s ACPI table found in initrd [%s%s][0x%x]\n",
>  			table->signature, cpio_path, file.name, table->length);



^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 04/18] acpi: Introduce acpi_invalid_table() to check if a table is invalid.
@ 2013-08-01 22:26     ` Toshi Kani
  0 siblings, 0 replies; 98+ messages in thread
From: Toshi Kani @ 2013-08-01 22:26 UTC (permalink / raw)
  To: Tang Chen
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On Thu, 2013-08-01 at 15:06 +0800, Tang Chen wrote:
> In acpi_initrd_override(), it checks several things to ensure the
> table it found is valid. In later patches, we need to do these check
> somewhere else. So this patch introduces a common function
> acpi_invalid_table() to do all these checks, and reuse it in different
> places. The function will be used in the subsequent patches.
> 
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> ---
>  drivers/acpi/osl.c |   86 +++++++++++++++++++++++++++++++++++++---------------
>  1 files changed, 61 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index 91d9f54..8df8a93 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -572,9 +572,68 @@ static const char * const table_sigs[] = {
>  /* Must not increase 10 or needs code modification below */
>  #define ACPI_OVERRIDE_TABLES 10
>  
> +/*******************************************************************************
> + *
> + * FUNCTION:    acpi_invalid_table
> + *
> + * PARAMETERS:  File               - The initrd file
> + *              Path               - Path to acpi overriding tables in cpio file
> + *              Signature          - Signature of the table
> + *
> + * RETURN:      0 if it passes all the checks, -EINVAL if any check fails.
> + *
> + * DESCRIPTION: Check if an acpi table found in initrd is invalid.
> + *              @signature can be NULL. If it is NULL, the function will check
> + *              if the table signature matches any signature in table_sigs[].
> + *
> + ******************************************************************************/
> +int __init acpi_invalid_table(struct cpio_data *file,
> +			      const char *path, const char *signature)

Since this function verifies a given acpi table in initrd (not that the
table is invalid), I'd suggest to rename it something like
acpi_verify_initrd().  Otherwise, it looks good to me.

Acked-by: Toshi Kani <toshi.kani@hp.com>

Thanks,
-Toshi


> +{
> +	int idx;
> +	struct acpi_table_header *table = file->data;
> +
> +	if (file->size < sizeof(struct acpi_table_header)) {
> +		INVALID_TABLE("Table smaller than ACPI header",
> +			      path, file->name);
> +		return -EINVAL;
> +	}
> +
> +	if (signature) {
> +		if (memcmp(table->signature, signature, 4)) {
> +			INVALID_TABLE("Table signature does not match",
> +				      path, file->name);
> +			return -EINVAL;
> +		}
> +	} else {
> +		for (idx = 0; table_sigs[idx]; idx++)
> +			if (!memcmp(table->signature, table_sigs[idx], 4))
> +				break;
> +
> +		if (!table_sigs[idx]) {
> +			INVALID_TABLE("Unknown signature", path, file->name);
> +			return -EINVAL;
> +		}
> +	}
> +
> +	if (file->size != table->length) {
> +		INVALID_TABLE("File length does not match table length",
> +			      path, file->name);
> +		return -EINVAL;
> +	}
> +
> +	if (acpi_table_checksum(file->data, table->length)) {
> +		INVALID_TABLE("Bad table checksum",
> +			      path, file->name);
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
>  void __init acpi_initrd_override(void *data, size_t size)
>  {
> -	int sig, no, table_nr = 0, total_offset = 0;
> +	int no, table_nr = 0, total_offset = 0;
>  	long offset = 0;
>  	struct acpi_table_header *table;
>  	char cpio_path[32] = "kernel/firmware/acpi/";
> @@ -593,33 +652,10 @@ void __init acpi_initrd_override(void *data, size_t size)
>  		data += offset;
>  		size -= offset;
>  
> -		if (file.size < sizeof(struct acpi_table_header)) {
> -			INVALID_TABLE("Table smaller than ACPI header",
> -				      cpio_path, file.name);
> -			continue;
> -		}
> -
>  		table = file.data;
>  
> -		for (sig = 0; table_sigs[sig]; sig++)
> -			if (!memcmp(table->signature, table_sigs[sig], 4))
> -				break;
> -
> -		if (!table_sigs[sig]) {
> -			INVALID_TABLE("Unknown signature",
> -				      cpio_path, file.name);
> +		if (acpi_invalid_table(&file, cpio_path, NULL))
>  			continue;
> -		}
> -		if (file.size != table->length) {
> -			INVALID_TABLE("File length does not match table length",
> -				      cpio_path, file.name);
> -			continue;
> -		}
> -		if (acpi_table_checksum(file.data, table->length)) {
> -			INVALID_TABLE("Bad table checksum",
> -				      cpio_path, file.name);
> -			continue;
> -		}
>  
>  		pr_info("%4.4s ACPI table found in initrd [%s%s][0x%x]\n",
>  			table->signature, cpio_path, file.name, table->length);


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 05/18] x86, acpi: Split acpi_boot_table_init() into two parts.
  2013-08-01  7:06   ` Tang Chen
@ 2013-08-01 23:32     ` Toshi Kani
  -1 siblings, 0 replies; 98+ messages in thread
From: Toshi Kani @ 2013-08-01 23:32 UTC (permalink / raw)
  To: Tang Chen
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi, robert.moore

On Thu, 2013-08-01 at 15:06 +0800, Tang Chen wrote:
> In ACPI, SRAT(System Resource Affinity Table) contains NUMA info.
> The memory affinities in SRAT record every memory range in the
> system, and also, flags specifying if the memory range is
> hotpluggable.
> (Please refer to ACPI spec 5.0 5.2.16)
> 
> memblock starts to work at very early time, and SRAT has not been
> parsed. So we don't know which memory is hotpluggable. In order
> to use memblock to reserve hotpluggable memory, we need to obtain
> SRAT memory affinity info earlier.
> 
> In the current acpi_boot_table_init(), it does the following:
> 1. Parse RSDT, so that we can find all the tables.
> 2. Initialize acpi_gbl_root_table_list, an array of acpi table
>    descriptorsused to store each table's address, length, signature,
>    and so on.
> 3. Check if there is any table in initrd intending to override
>    tables from firmware. If so, override the firmware tables.
> 4. Initialize all the data in acpi_gbl_root_table_list.
> 
> In order to parse SRAT at early time, we need to do similar job as
> step 1 and 2 above earlier to obtain SRAT. It will be very convenient
> if we have acpi_gbl_root_table_list initialized. We can use address
> and signature to find SRAT.
> 
> Since step 1 and 2 allocates no memory, it is OK to do these two
> steps earlier.
> 
> But step 3 will check acpi initrd table override, not just SRAT,
> but also all the other tables. So it is better to keep it untouched.
> 
> This patch splits acpi_boot_table_init() into two steps:
> 1. Parse RSDT, which cannot be overrided, and initialize
>    acpi_gbl_root_table_list. (step 1 + 2 above)
> 2. Install all ACPI tables into acpi_gbl_root_table_list.
>    (step 3 + 4 above)
> 
> In later patches, we will do step 1 + 2 earlier.
> 
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> ---
>  drivers/acpi/acpica/tbutils.c |   25 ++++++++++++++++++++++---
>  drivers/acpi/tables.c         |    2 ++
>  include/acpi/acpixf.h         |    2 ++
>  3 files changed, 26 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/acpi/acpica/tbutils.c b/drivers/acpi/acpica/tbutils.c

This patch needs to be reviewed by ACPICA folks.  I'd suggest to change
the patch title to "x86, ACPICA:".  I added Bob to the list.

Thanks,
-Toshi



> index ce3d5db..9d68ffc 100644
> --- a/drivers/acpi/acpica/tbutils.c
> +++ b/drivers/acpi/acpica/tbutils.c
> @@ -766,9 +766,30 @@ acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
>  	 */
>  	acpi_os_unmap_memory(table, length);
>  
> +	return_ACPI_STATUS(AE_OK);
> +}
> +
> +/*******************************************************************************
> + *
> + * FUNCTION:    acpi_tb_install_root_table
> + *
> + * DESCRIPTION: This function installs all the ACPI tables in RSDT into
> + *              acpi_gbl_root_table_list.
> + *
> + ******************************************************************************/
> +
> +void __init
> +acpi_tb_install_root_table()
> +{
> +	int i;
> +
>  	/*
>  	 * Complete the initialization of the root table array by examining
> -	 * the header of each table
> +	 * the header of each table.
> +	 *
> +	 * First two entries in the table array are reserved for the DSDT
> +	 * and FACS, which are not actually present in the RSDT/XSDT - they
> +	 * come from the FADT.
>  	 */
>  	for (i = 2; i < acpi_gbl_root_table_list.current_table_count; i++) {
>  		acpi_tb_install_table(acpi_gbl_root_table_list.tables[i].
> @@ -782,6 +803,4 @@ acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
>  			acpi_tb_parse_fadt(i);
>  		}
>  	}
> -
> -	return_ACPI_STATUS(AE_OK);
>  }
> diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
> index d67a1fe..8860e79 100644
> --- a/drivers/acpi/tables.c
> +++ b/drivers/acpi/tables.c
> @@ -353,6 +353,8 @@ int __init acpi_table_init(void)
>  	if (ACPI_FAILURE(status))
>  		return 1;
>  
> +	acpi_tb_install_root_table();
> +
>  	check_multiple_madt();
>  	return 0;
>  }
> diff --git a/include/acpi/acpixf.h b/include/acpi/acpixf.h
> index 454881e..f5549b5 100644
> --- a/include/acpi/acpixf.h
> +++ b/include/acpi/acpixf.h
> @@ -116,6 +116,8 @@ acpi_status
>  acpi_initialize_tables(struct acpi_table_desc *initial_storage,
>  		       u32 initial_table_count, u8 allow_resize);
>  
> +void acpi_tb_install_root_table(void);
> +
>  acpi_status __init acpi_initialize_subsystem(void);
>  
>  acpi_status acpi_enable_subsystem(u32 flags);


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 05/18] x86, acpi: Split acpi_boot_table_init() into two parts.
@ 2013-08-01 23:32     ` Toshi Kani
  0 siblings, 0 replies; 98+ messages in thread
From: Toshi Kani @ 2013-08-01 23:32 UTC (permalink / raw)
  To: Tang Chen
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi, robert.moore

On Thu, 2013-08-01 at 15:06 +0800, Tang Chen wrote:
> In ACPI, SRAT(System Resource Affinity Table) contains NUMA info.
> The memory affinities in SRAT record every memory range in the
> system, and also, flags specifying if the memory range is
> hotpluggable.
> (Please refer to ACPI spec 5.0 5.2.16)
> 
> memblock starts to work at very early time, and SRAT has not been
> parsed. So we don't know which memory is hotpluggable. In order
> to use memblock to reserve hotpluggable memory, we need to obtain
> SRAT memory affinity info earlier.
> 
> In the current acpi_boot_table_init(), it does the following:
> 1. Parse RSDT, so that we can find all the tables.
> 2. Initialize acpi_gbl_root_table_list, an array of acpi table
>    descriptorsused to store each table's address, length, signature,
>    and so on.
> 3. Check if there is any table in initrd intending to override
>    tables from firmware. If so, override the firmware tables.
> 4. Initialize all the data in acpi_gbl_root_table_list.
> 
> In order to parse SRAT at early time, we need to do similar job as
> step 1 and 2 above earlier to obtain SRAT. It will be very convenient
> if we have acpi_gbl_root_table_list initialized. We can use address
> and signature to find SRAT.
> 
> Since step 1 and 2 allocates no memory, it is OK to do these two
> steps earlier.
> 
> But step 3 will check acpi initrd table override, not just SRAT,
> but also all the other tables. So it is better to keep it untouched.
> 
> This patch splits acpi_boot_table_init() into two steps:
> 1. Parse RSDT, which cannot be overrided, and initialize
>    acpi_gbl_root_table_list. (step 1 + 2 above)
> 2. Install all ACPI tables into acpi_gbl_root_table_list.
>    (step 3 + 4 above)
> 
> In later patches, we will do step 1 + 2 earlier.
> 
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> ---
>  drivers/acpi/acpica/tbutils.c |   25 ++++++++++++++++++++++---
>  drivers/acpi/tables.c         |    2 ++
>  include/acpi/acpixf.h         |    2 ++
>  3 files changed, 26 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/acpi/acpica/tbutils.c b/drivers/acpi/acpica/tbutils.c

This patch needs to be reviewed by ACPICA folks.  I'd suggest to change
the patch title to "x86, ACPICA:".  I added Bob to the list.

Thanks,
-Toshi



> index ce3d5db..9d68ffc 100644
> --- a/drivers/acpi/acpica/tbutils.c
> +++ b/drivers/acpi/acpica/tbutils.c
> @@ -766,9 +766,30 @@ acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
>  	 */
>  	acpi_os_unmap_memory(table, length);
>  
> +	return_ACPI_STATUS(AE_OK);
> +}
> +
> +/*******************************************************************************
> + *
> + * FUNCTION:    acpi_tb_install_root_table
> + *
> + * DESCRIPTION: This function installs all the ACPI tables in RSDT into
> + *              acpi_gbl_root_table_list.
> + *
> + ******************************************************************************/
> +
> +void __init
> +acpi_tb_install_root_table()
> +{
> +	int i;
> +
>  	/*
>  	 * Complete the initialization of the root table array by examining
> -	 * the header of each table
> +	 * the header of each table.
> +	 *
> +	 * First two entries in the table array are reserved for the DSDT
> +	 * and FACS, which are not actually present in the RSDT/XSDT - they
> +	 * come from the FADT.
>  	 */
>  	for (i = 2; i < acpi_gbl_root_table_list.current_table_count; i++) {
>  		acpi_tb_install_table(acpi_gbl_root_table_list.tables[i].
> @@ -782,6 +803,4 @@ acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
>  			acpi_tb_parse_fadt(i);
>  		}
>  	}
> -
> -	return_ACPI_STATUS(AE_OK);
>  }
> diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
> index d67a1fe..8860e79 100644
> --- a/drivers/acpi/tables.c
> +++ b/drivers/acpi/tables.c
> @@ -353,6 +353,8 @@ int __init acpi_table_init(void)
>  	if (ACPI_FAILURE(status))
>  		return 1;
>  
> +	acpi_tb_install_root_table();
> +
>  	check_multiple_madt();
>  	return 0;
>  }
> diff --git a/include/acpi/acpixf.h b/include/acpi/acpixf.h
> index 454881e..f5549b5 100644
> --- a/include/acpi/acpixf.h
> +++ b/include/acpi/acpixf.h
> @@ -116,6 +116,8 @@ acpi_status
>  acpi_initialize_tables(struct acpi_table_desc *initial_storage,
>  		       u32 initial_table_count, u8 allow_resize);
>  
> +void acpi_tb_install_root_table(void);
> +
>  acpi_status __init acpi_initialize_subsystem(void);
>  
>  acpi_status acpi_enable_subsystem(u32 flags);



^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 06/18] x86, acpi: Initialize ACPI root table list earlier.
  2013-08-01  7:06   ` Tang Chen
@ 2013-08-01 23:54     ` Toshi Kani
  -1 siblings, 0 replies; 98+ messages in thread
From: Toshi Kani @ 2013-08-01 23:54 UTC (permalink / raw)
  To: Tang Chen
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On Thu, 2013-08-01 at 15:06 +0800, Tang Chen wrote:
> We have split acpi_table_init() into two steps:
> 1. Pares RSDT or XSDT, and initialize acpi_gbl_root_table_list.
>    This step will record all tables' physical address in memory.
> 2. Check acpi initrd table override and install all tables into
>    acpi_gbl_root_table_list.
> 
> This patch does step 1 earlier, right after memblock is ready.
> 
> When memblock_x86_fill() is called to fulfill memblock.memory[],
> memblock is able to allocate memory.
> 
> This patch introduces a new function acpi_root_table_init() to
> do step 1, and call this function right after memblock_x86_fill()
> is called.
> 
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> ---
>  arch/x86/kernel/acpi/boot.c |   38 +++++++++++++++++++++++---------------
>  arch/x86/kernel/setup.c     |    3 +++
>  drivers/acpi/tables.c       |    7 +++++--
>  include/linux/acpi.h        |    2 ++
>  4 files changed, 33 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
> index 230c8ea..3da5b3c 100644
> --- a/arch/x86/kernel/acpi/boot.c
> +++ b/arch/x86/kernel/acpi/boot.c
> @@ -1491,6 +1491,28 @@ static struct dmi_system_id __initdata acpi_dmi_table_late[] = {
>  };
>  
>  /*
> + * acpi_root_table_init - Initialize acpi_gbl_root_table_list.
> + *
> + * This function will parse RSDT or XSDT, find all tables' phys addr,
> + * initialize acpi_gbl_root_table_list, and record all tables' phys addr
> + * in acpi_gbl_root_table_list.
> + */
> +void __init acpi_root_table_init(void)

I think acpi_root_table_init() is a bit confusing with
acpi_boot_table_init().  Perhaps, something like
acpi_boot_table_pre_init() or early_acpi_boot_table_init() is better to
indicate that this new function is called before acpi_boot_table_init().

> +{
> +	dmi_check_system(acpi_dmi_table);
> +
> +	/* If acpi_disabled, bail out */
> +	if (acpi_disabled)
> +		return;
> +
> +	/* Initialize the ACPI boot-time table parser */
> +	if (acpi_table_init()) {
> +		disable_acpi();
> +		return;
> +	}
> +}
> +
> +/*
>   * acpi_boot_table_init() and acpi_boot_init()
>   *  called from setup_arch(), always.
>   *	1. checksums all tables
> @@ -1511,21 +1533,7 @@ static struct dmi_system_id __initdata acpi_dmi_table_late[] = {
>  
>  void __init acpi_boot_table_init(void)

The comment of this function needs to be updated.  For instance, it
describes acpi_table_init(), which you just relocated.

 * acpi_table_init() is separate to allow reading SRAT without
 * other side effects.
 *

>  {
> -	dmi_check_system(acpi_dmi_table);
> -
> -	/*
> -	 * If acpi_disabled, bail out
> -	 */
> -	if (acpi_disabled)
> -		return; 

I think this check is still necessary.

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 06/18] x86, acpi: Initialize ACPI root table list earlier.
@ 2013-08-01 23:54     ` Toshi Kani
  0 siblings, 0 replies; 98+ messages in thread
From: Toshi Kani @ 2013-08-01 23:54 UTC (permalink / raw)
  To: Tang Chen
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On Thu, 2013-08-01 at 15:06 +0800, Tang Chen wrote:
> We have split acpi_table_init() into two steps:
> 1. Pares RSDT or XSDT, and initialize acpi_gbl_root_table_list.
>    This step will record all tables' physical address in memory.
> 2. Check acpi initrd table override and install all tables into
>    acpi_gbl_root_table_list.
> 
> This patch does step 1 earlier, right after memblock is ready.
> 
> When memblock_x86_fill() is called to fulfill memblock.memory[],
> memblock is able to allocate memory.
> 
> This patch introduces a new function acpi_root_table_init() to
> do step 1, and call this function right after memblock_x86_fill()
> is called.
> 
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> ---
>  arch/x86/kernel/acpi/boot.c |   38 +++++++++++++++++++++++---------------
>  arch/x86/kernel/setup.c     |    3 +++
>  drivers/acpi/tables.c       |    7 +++++--
>  include/linux/acpi.h        |    2 ++
>  4 files changed, 33 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
> index 230c8ea..3da5b3c 100644
> --- a/arch/x86/kernel/acpi/boot.c
> +++ b/arch/x86/kernel/acpi/boot.c
> @@ -1491,6 +1491,28 @@ static struct dmi_system_id __initdata acpi_dmi_table_late[] = {
>  };
>  
>  /*
> + * acpi_root_table_init - Initialize acpi_gbl_root_table_list.
> + *
> + * This function will parse RSDT or XSDT, find all tables' phys addr,
> + * initialize acpi_gbl_root_table_list, and record all tables' phys addr
> + * in acpi_gbl_root_table_list.
> + */
> +void __init acpi_root_table_init(void)

I think acpi_root_table_init() is a bit confusing with
acpi_boot_table_init().  Perhaps, something like
acpi_boot_table_pre_init() or early_acpi_boot_table_init() is better to
indicate that this new function is called before acpi_boot_table_init().

> +{
> +	dmi_check_system(acpi_dmi_table);
> +
> +	/* If acpi_disabled, bail out */
> +	if (acpi_disabled)
> +		return;
> +
> +	/* Initialize the ACPI boot-time table parser */
> +	if (acpi_table_init()) {
> +		disable_acpi();
> +		return;
> +	}
> +}
> +
> +/*
>   * acpi_boot_table_init() and acpi_boot_init()
>   *  called from setup_arch(), always.
>   *	1. checksums all tables
> @@ -1511,21 +1533,7 @@ static struct dmi_system_id __initdata acpi_dmi_table_late[] = {
>  
>  void __init acpi_boot_table_init(void)

The comment of this function needs to be updated.  For instance, it
describes acpi_table_init(), which you just relocated.

 * acpi_table_init() is separate to allow reading SRAT without
 * other side effects.
 *

>  {
> -	dmi_check_system(acpi_dmi_table);
> -
> -	/*
> -	 * If acpi_disabled, bail out
> -	 */
> -	if (acpi_disabled)
> -		return; 

I think this check is still necessary.

Thanks,
-Toshi

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 07/18] x86, acpi: Also initialize signature and length when parsing root table.
  2013-08-01  7:06   ` Tang Chen
@ 2013-08-02  0:10     ` Toshi Kani
  -1 siblings, 0 replies; 98+ messages in thread
From: Toshi Kani @ 2013-08-02  0:10 UTC (permalink / raw)
  To: Tang Chen
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi, robert.moore

On Thu, 2013-08-01 at 15:06 +0800, Tang Chen wrote:
> Besides the phys addr of the acpi tables, it will be very convenient if
> we also have the signature of each table in acpi_gbl_root_table_list at
> early time. We can find SRAT easily by comparing the signature.
> 
> This patch alse record signature and some other info in
> acpi_gbl_root_table_list at early time.
> 
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> ---
>  drivers/acpi/acpica/tbutils.c |   22 ++++++++++++++++++++++
>  1 files changed, 22 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/acpi/acpica/tbutils.c b/drivers/acpi/acpica/tbutils.c

Same as patch 5/18.  Please change the title to "x86, ACPICA:".  Added
Bob.

Thanks,
-Toshi


> index 9d68ffc..5d31887 100644
> --- a/drivers/acpi/acpica/tbutils.c
> +++ b/drivers/acpi/acpica/tbutils.c
> @@ -627,6 +627,7 @@ acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
>  	u32 i;
>  	u32 table_count;
>  	struct acpi_table_header *table;
> +	struct acpi_table_desc *table_desc;
>  	acpi_physical_address address;
>  	acpi_physical_address uninitialized_var(rsdt_address);
>  	u32 length;
> @@ -766,6 +767,27 @@ acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
>  	 */
>  	acpi_os_unmap_memory(table, length);
>  
> +	/*
> +	 * Also initialize the table entries here, so that later we can use them
> +	 * to find SRAT at very eraly time to reserve hotpluggable memory.
> +	 */
> +	for (i = 2; i < acpi_gbl_root_table_list.current_table_count; i++) {
> +		table = acpi_os_map_memory(
> +				acpi_gbl_root_table_list.tables[i].address,
> +				sizeof(struct acpi_table_header));
> +		if (!table)
> +			return_ACPI_STATUS(AE_NO_MEMORY);
> +
> +		table_desc = &acpi_gbl_root_table_list.tables[i];
> +
> +		table_desc->pointer = NULL;
> +		table_desc->length = table->length;
> +		table_desc->flags = ACPI_TABLE_ORIGIN_MAPPED;
> +		ACPI_MOVE_32_TO_32(table_desc->signature.ascii, table->signature);
> +
> +		acpi_os_unmap_memory(table, sizeof(struct acpi_table_header));
> +	}
> +
>  	return_ACPI_STATUS(AE_OK);
>  }
>  



^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 07/18] x86, acpi: Also initialize signature and length when parsing root table.
@ 2013-08-02  0:10     ` Toshi Kani
  0 siblings, 0 replies; 98+ messages in thread
From: Toshi Kani @ 2013-08-02  0:10 UTC (permalink / raw)
  To: Tang Chen
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi, robert.moore

On Thu, 2013-08-01 at 15:06 +0800, Tang Chen wrote:
> Besides the phys addr of the acpi tables, it will be very convenient if
> we also have the signature of each table in acpi_gbl_root_table_list at
> early time. We can find SRAT easily by comparing the signature.
> 
> This patch alse record signature and some other info in
> acpi_gbl_root_table_list at early time.
> 
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> ---
>  drivers/acpi/acpica/tbutils.c |   22 ++++++++++++++++++++++
>  1 files changed, 22 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/acpi/acpica/tbutils.c b/drivers/acpi/acpica/tbutils.c

Same as patch 5/18.  Please change the title to "x86, ACPICA:".  Added
Bob.

Thanks,
-Toshi


> index 9d68ffc..5d31887 100644
> --- a/drivers/acpi/acpica/tbutils.c
> +++ b/drivers/acpi/acpica/tbutils.c
> @@ -627,6 +627,7 @@ acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
>  	u32 i;
>  	u32 table_count;
>  	struct acpi_table_header *table;
> +	struct acpi_table_desc *table_desc;
>  	acpi_physical_address address;
>  	acpi_physical_address uninitialized_var(rsdt_address);
>  	u32 length;
> @@ -766,6 +767,27 @@ acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
>  	 */
>  	acpi_os_unmap_memory(table, length);
>  
> +	/*
> +	 * Also initialize the table entries here, so that later we can use them
> +	 * to find SRAT at very eraly time to reserve hotpluggable memory.
> +	 */
> +	for (i = 2; i < acpi_gbl_root_table_list.current_table_count; i++) {
> +		table = acpi_os_map_memory(
> +				acpi_gbl_root_table_list.tables[i].address,
> +				sizeof(struct acpi_table_header));
> +		if (!table)
> +			return_ACPI_STATUS(AE_NO_MEMORY);
> +
> +		table_desc = &acpi_gbl_root_table_list.tables[i];
> +
> +		table_desc->pointer = NULL;
> +		table_desc->length = table->length;
> +		table_desc->flags = ACPI_TABLE_ORIGIN_MAPPED;
> +		ACPI_MOVE_32_TO_32(table_desc->signature.ascii, table->signature);
> +
> +		acpi_os_unmap_memory(table, sizeof(struct acpi_table_header));
> +	}
> +
>  	return_ACPI_STATUS(AE_OK);
>  }
>  


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 08/18] x86: get pg_data_t's memory from other node
  2013-08-01  7:06   ` Tang Chen
@ 2013-08-02  0:23     ` Toshi Kani
  -1 siblings, 0 replies; 98+ messages in thread
From: Toshi Kani @ 2013-08-02  0:23 UTC (permalink / raw)
  To: Tang Chen
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On Thu, 2013-08-01 at 15:06 +0800, Tang Chen wrote:
> From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> 
> If system can create movable node which all memory of the node is allocated
> as ZONE_MOVABLE, setup_node_data() cannot allocate memory for the node's
> pg_data_t. So, use memblock_alloc_try_nid() instead of memblock_alloc_nid()
> to retry when the first allocation fails. Otherwise, the system could failed
> to boot.
> 
> The node_data could be on hotpluggable node. And so could pagetable and
> vmemmap. But for now, doing so will break memory hot-remove path.
> 
> A node could have several memory devices. And the device who holds node
> data should be hot-removed in the last place. But in NUAM level, we don't

NUAM -> NUMA

> know which memory_block (/sys/devices/system/node/nodeX/memoryXXX) belongs
> to which memory device. We only have node. So we can only do node hotplug.
> 
> But in virtualization, developers are now developing memory hotplug in qemu,
> which support a single memory device hotplug. So a whole node hotplug will
> not satisfy virtualization users.
> 
> So at last, we concluded that we'd better do memory hotplug and local node
> things (local node node data, pagetable, vmemmap, ...) in two steps.
> Please refer to https://lkml.org/lkml/2013/6/19/73
> 
> For now, we put node_data of movable node to another node, and then improve
> it in the future.
> 
> In the later patches, a boot option will be introduced to enable/disable this
> functionality. If users disable it, the node_data will still be put on the
> local node.
> 
> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>

Acked-by: Toshi Kani <toshi.kani@hp.com>

Thanks,
-Toshi


> ---
>  arch/x86/mm/numa.c |    5 ++---
>  1 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index a71c4e2..5013583 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -209,10 +209,9 @@ static void __init setup_node_data(int nid, u64 start, u64 end)
>  	 * Allocate node data.  Try node-local memory and then any node.
>  	 * Never allocate in DMA zone.
>  	 */
> -	nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
> +	nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
>  	if (!nd_pa) {
> -		pr_err("Cannot find %zu bytes in node %d\n",
> -		       nd_size, nid);
> +		pr_err("Cannot find %zu bytes in any node\n", nd_size);
>  		return;
>  	}
>  	nd = __va(nd_pa);


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 08/18] x86: get pg_data_t's memory from other node
@ 2013-08-02  0:23     ` Toshi Kani
  0 siblings, 0 replies; 98+ messages in thread
From: Toshi Kani @ 2013-08-02  0:23 UTC (permalink / raw)
  To: Tang Chen
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On Thu, 2013-08-01 at 15:06 +0800, Tang Chen wrote:
> From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> 
> If system can create movable node which all memory of the node is allocated
> as ZONE_MOVABLE, setup_node_data() cannot allocate memory for the node's
> pg_data_t. So, use memblock_alloc_try_nid() instead of memblock_alloc_nid()
> to retry when the first allocation fails. Otherwise, the system could failed
> to boot.
> 
> The node_data could be on hotpluggable node. And so could pagetable and
> vmemmap. But for now, doing so will break memory hot-remove path.
> 
> A node could have several memory devices. And the device who holds node
> data should be hot-removed in the last place. But in NUAM level, we don't

NUAM -> NUMA

> know which memory_block (/sys/devices/system/node/nodeX/memoryXXX) belongs
> to which memory device. We only have node. So we can only do node hotplug.
> 
> But in virtualization, developers are now developing memory hotplug in qemu,
> which support a single memory device hotplug. So a whole node hotplug will
> not satisfy virtualization users.
> 
> So at last, we concluded that we'd better do memory hotplug and local node
> things (local node node data, pagetable, vmemmap, ...) in two steps.
> Please refer to https://lkml.org/lkml/2013/6/19/73
> 
> For now, we put node_data of movable node to another node, and then improve
> it in the future.
> 
> In the later patches, a boot option will be introduced to enable/disable this
> functionality. If users disable it, the node_data will still be put on the
> local node.
> 
> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>

Acked-by: Toshi Kani <toshi.kani@hp.com>

Thanks,
-Toshi


> ---
>  arch/x86/mm/numa.c |    5 ++---
>  1 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index a71c4e2..5013583 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -209,10 +209,9 @@ static void __init setup_node_data(int nid, u64 start, u64 end)
>  	 * Allocate node data.  Try node-local memory and then any node.
>  	 * Never allocate in DMA zone.
>  	 */
> -	nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
> +	nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
>  	if (!nd_pa) {
> -		pr_err("Cannot find %zu bytes in node %d\n",
> -		       nd_size, nid);
> +		pr_err("Cannot find %zu bytes in any node\n", nd_size);
>  		return;
>  	}
>  	nd = __va(nd_pa);



^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 10/18] x86, acpi: Try to find if SRAT is overrided earlier.
  2013-08-01  7:06   ` Tang Chen
@ 2013-08-02  1:19     ` Toshi Kani
  -1 siblings, 0 replies; 98+ messages in thread
From: Toshi Kani @ 2013-08-02  1:19 UTC (permalink / raw)
  To: Tang Chen
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On Thu, 2013-08-01 at 15:06 +0800, Tang Chen wrote:
> Linux cannot migrate pages used by the kernel due to the direct mapping
> (va = pa + PAGE_OFFSET), any memory used by the kernel cannot be hot-removed.
> So when using memory hotplug, we have to prevent the kernel from using
> hotpluggable memory.
> 
> The ACPI table SRAT (System Resource Affinity Table) contains info to specify
> which memory is hotpluggble. After SRAT is parsed, we are aware of which
> memory is hotpluggable.
> 
> At the early time when system is booting, SRAT has not been parsed. The boot
> memory allocator memblock will allocate any memory to the kernel. So we need
> SRAT parsed before memblock starts to work.
> 
> In this patch, we are going to parse SRAT earlier, right after memblock is ready.
> 
> Generally speaking, tables such as SRAT are provided by firmware. But
> ACPI_INITRD_TABLE_OVERRIDE functionality allows users to customize their own
> tables in initrd, and override the ones from firmware. So if we want to parse
> SRAT earlier, we also need to do SRAT override earlier.
> 
> First, we introduce early_acpi_override_srat() to check if SRAT will be overridden
> from initrd.
> 
> Second, we introduce find_hotpluggable_memory() to reserve hotpluggable memory,
> which will firstly call early_acpi_override_srat() to find out which memory is
> hotpluggable in the override SRAT.
> 
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> ---
>  arch/x86/kernel/setup.c        |   10 +++++++
>  drivers/acpi/osl.c             |   58 ++++++++++++++++++++++++++++++++++++++++
>  include/linux/acpi.h           |   14 ++++++++-
>  include/linux/memory_hotplug.h |    2 +
>  mm/memory_hotplug.c            |   25 ++++++++++++++++-
>  5 files changed, 106 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index c8f5d1a..8b1bddd 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -1060,6 +1060,16 @@ void __init setup_arch(char **cmdline_p)
>  	/* Initialize ACPI root table */
>  	acpi_root_table_init();
>  
> +#ifdef CONFIG_ACPI_NUMA
> +	/*
> +	 * Linux kernel cannot migrate kernel pages, as a result, memory used
> +	 * by the kernel cannot be hot-removed. Find and mark hotpluggable
> +	 * memory in memblock to prevent memblock from allocating hotpluggable
> +	 * memory for the kernel.
> +	 */
> +	find_hotpluggable_memory();
> +#endif
> +
>  	/*
>  	 * The EFI specification says that boot service code won't be called
>  	 * after ExitBootServices(). This is, in fact, a lie.
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index 8df8a93..d0b687c 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -48,6 +48,7 @@
>  
>  #include <asm/io.h>
>  #include <asm/uaccess.h>
> +#include <asm/setup.h>
>  
>  #include <acpi/acpi.h>
>  #include <acpi/acpi_bus.h>
> @@ -631,6 +632,63 @@ int __init acpi_invalid_table(struct cpio_data *file,
>  	return 0;
>  }
>  
> +#ifdef CONFIG_ACPI_NUMA
> +/*******************************************************************************
> + *
> + * FUNCTION:    early_acpi_override_srat
> + *
> + * RETURN:      Phys addr of SRAT on success, 0 on error.
> + *
> + * DESCRIPTION: Try to get the phys addr of SRAT in initrd.
> + *              The ACPI_INITRD_TABLE_OVERRIDE procedure is able to use tables
> + *              in initrd file to override the ones provided by firmware. This
> + *              function checks if there is a SRAT in initrd at early time. If
> + *              so, return the phys addr of the SRAT.
> + *
> + ******************************************************************************/
> +phys_addr_t __init early_acpi_override_srat(void)
> +{
> +	int i;
> +	u32 length;
> +	long offset;
> +	void *ramdisk_vaddr;
> +	struct acpi_table_header *table;
> +	struct cpio_data file;
> +	unsigned long map_step = NR_FIX_BTMAPS << PAGE_SHIFT;
> +	phys_addr_t ramdisk_image = get_ramdisk_image();
> +	char cpio_path[32] = "kernel/firmware/acpi/";

Don't you need to check if ramdisk is present before parsing the table?
You may need something like:

  if (!ramdisk_image || !get_ramdisk_size())
        return 0;

> +
> +	/* Try to find if SRAT is overrided */

overrided -> overridden

> +	for (i = 0; i < ACPI_OVERRIDE_TABLES; i++) {
> +		ramdisk_vaddr = early_ioremap(ramdisk_image, map_step);
> +
> +		file = find_cpio_data(cpio_path, ramdisk_vaddr,
> +				      map_step, &offset);
> +		if (!file.data) {
> +			early_iounmap(ramdisk_vaddr, map_step);
> +			return 0;
> +		}
> +
> +		table = file.data;
> +		length = table->length;
> +
> +		if (acpi_invalid_table(&file, cpio_path, ACPI_SIG_SRAT)) {
> +			ramdisk_image += offset;
> +			early_iounmap(ramdisk_vaddr, map_step);
> +			continue;
> +		}
> +
> +		/* Found SRAT */
> +		early_iounmap(ramdisk_vaddr, map_step);
> +		ramdisk_image = ramdisk_image + offset - length;
> +
> +		break;
> +	}
> +
> +	return ramdisk_image;

Doesn't this function return a physical address regardless of SRAT if a
ramdisk is present?

Thanks,
-Toshi

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 10/18] x86, acpi: Try to find if SRAT is overrided earlier.
@ 2013-08-02  1:19     ` Toshi Kani
  0 siblings, 0 replies; 98+ messages in thread
From: Toshi Kani @ 2013-08-02  1:19 UTC (permalink / raw)
  To: Tang Chen
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On Thu, 2013-08-01 at 15:06 +0800, Tang Chen wrote:
> Linux cannot migrate pages used by the kernel due to the direct mapping
> (va = pa + PAGE_OFFSET), any memory used by the kernel cannot be hot-removed.
> So when using memory hotplug, we have to prevent the kernel from using
> hotpluggable memory.
> 
> The ACPI table SRAT (System Resource Affinity Table) contains info to specify
> which memory is hotpluggble. After SRAT is parsed, we are aware of which
> memory is hotpluggable.
> 
> At the early time when system is booting, SRAT has not been parsed. The boot
> memory allocator memblock will allocate any memory to the kernel. So we need
> SRAT parsed before memblock starts to work.
> 
> In this patch, we are going to parse SRAT earlier, right after memblock is ready.
> 
> Generally speaking, tables such as SRAT are provided by firmware. But
> ACPI_INITRD_TABLE_OVERRIDE functionality allows users to customize their own
> tables in initrd, and override the ones from firmware. So if we want to parse
> SRAT earlier, we also need to do SRAT override earlier.
> 
> First, we introduce early_acpi_override_srat() to check if SRAT will be overridden
> from initrd.
> 
> Second, we introduce find_hotpluggable_memory() to reserve hotpluggable memory,
> which will firstly call early_acpi_override_srat() to find out which memory is
> hotpluggable in the override SRAT.
> 
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> ---
>  arch/x86/kernel/setup.c        |   10 +++++++
>  drivers/acpi/osl.c             |   58 ++++++++++++++++++++++++++++++++++++++++
>  include/linux/acpi.h           |   14 ++++++++-
>  include/linux/memory_hotplug.h |    2 +
>  mm/memory_hotplug.c            |   25 ++++++++++++++++-
>  5 files changed, 106 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index c8f5d1a..8b1bddd 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -1060,6 +1060,16 @@ void __init setup_arch(char **cmdline_p)
>  	/* Initialize ACPI root table */
>  	acpi_root_table_init();
>  
> +#ifdef CONFIG_ACPI_NUMA
> +	/*
> +	 * Linux kernel cannot migrate kernel pages, as a result, memory used
> +	 * by the kernel cannot be hot-removed. Find and mark hotpluggable
> +	 * memory in memblock to prevent memblock from allocating hotpluggable
> +	 * memory for the kernel.
> +	 */
> +	find_hotpluggable_memory();
> +#endif
> +
>  	/*
>  	 * The EFI specification says that boot service code won't be called
>  	 * after ExitBootServices(). This is, in fact, a lie.
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index 8df8a93..d0b687c 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -48,6 +48,7 @@
>  
>  #include <asm/io.h>
>  #include <asm/uaccess.h>
> +#include <asm/setup.h>
>  
>  #include <acpi/acpi.h>
>  #include <acpi/acpi_bus.h>
> @@ -631,6 +632,63 @@ int __init acpi_invalid_table(struct cpio_data *file,
>  	return 0;
>  }
>  
> +#ifdef CONFIG_ACPI_NUMA
> +/*******************************************************************************
> + *
> + * FUNCTION:    early_acpi_override_srat
> + *
> + * RETURN:      Phys addr of SRAT on success, 0 on error.
> + *
> + * DESCRIPTION: Try to get the phys addr of SRAT in initrd.
> + *              The ACPI_INITRD_TABLE_OVERRIDE procedure is able to use tables
> + *              in initrd file to override the ones provided by firmware. This
> + *              function checks if there is a SRAT in initrd at early time. If
> + *              so, return the phys addr of the SRAT.
> + *
> + ******************************************************************************/
> +phys_addr_t __init early_acpi_override_srat(void)
> +{
> +	int i;
> +	u32 length;
> +	long offset;
> +	void *ramdisk_vaddr;
> +	struct acpi_table_header *table;
> +	struct cpio_data file;
> +	unsigned long map_step = NR_FIX_BTMAPS << PAGE_SHIFT;
> +	phys_addr_t ramdisk_image = get_ramdisk_image();
> +	char cpio_path[32] = "kernel/firmware/acpi/";

Don't you need to check if ramdisk is present before parsing the table?
You may need something like:

  if (!ramdisk_image || !get_ramdisk_size())
        return 0;

> +
> +	/* Try to find if SRAT is overrided */

overrided -> overridden

> +	for (i = 0; i < ACPI_OVERRIDE_TABLES; i++) {
> +		ramdisk_vaddr = early_ioremap(ramdisk_image, map_step);
> +
> +		file = find_cpio_data(cpio_path, ramdisk_vaddr,
> +				      map_step, &offset);
> +		if (!file.data) {
> +			early_iounmap(ramdisk_vaddr, map_step);
> +			return 0;
> +		}
> +
> +		table = file.data;
> +		length = table->length;
> +
> +		if (acpi_invalid_table(&file, cpio_path, ACPI_SIG_SRAT)) {
> +			ramdisk_image += offset;
> +			early_iounmap(ramdisk_vaddr, map_step);
> +			continue;
> +		}
> +
> +		/* Found SRAT */
> +		early_iounmap(ramdisk_vaddr, map_step);
> +		ramdisk_image = ramdisk_image + offset - length;
> +
> +		break;
> +	}
> +
> +	return ramdisk_image;

Doesn't this function return a physical address regardless of SRAT if a
ramdisk is present?

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 03/18] acpi: Remove "continue" in macro INVALID_TABLE().
  2013-08-01 22:06     ` Toshi Kani
@ 2013-08-02  1:32       ` Tang Chen
  -1 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-02  1:32 UTC (permalink / raw)
  To: Toshi Kani
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On 08/02/2013 06:06 AM, Toshi Kani wrote:
......
>>   /* Non-fatal errors: Affected tables/files are ignored */
>>   #define INVALID_TABLE(x, path, name)					\
>
> Since you are touching this macro, I'd suggest to rename it something
> like ACPI_INVALID_TABLE().  INVALID_TABLE() sounds too generic to me.
> Otherwise, it looks good.

Hi Toshi-san,

Thanks for your advice and ack, will change the name.

Thanks.

>
> Acked-by: Toshi Kani<toshi.kani@hp.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 03/18] acpi: Remove "continue" in macro INVALID_TABLE().
@ 2013-08-02  1:32       ` Tang Chen
  0 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-02  1:32 UTC (permalink / raw)
  To: Toshi Kani
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On 08/02/2013 06:06 AM, Toshi Kani wrote:
......
>>   /* Non-fatal errors: Affected tables/files are ignored */
>>   #define INVALID_TABLE(x, path, name)					\
>
> Since you are touching this macro, I'd suggest to rename it something
> like ACPI_INVALID_TABLE().  INVALID_TABLE() sounds too generic to me.
> Otherwise, it looks good.

Hi Toshi-san,

Thanks for your advice and ack, will change the name.

Thanks.

>
> Acked-by: Toshi Kani<toshi.kani@hp.com>


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 04/18] acpi: Introduce acpi_invalid_table() to check if a table is invalid.
  2013-08-01 22:26     ` Toshi Kani
@ 2013-08-02  1:34       ` Tang Chen
  -1 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-02  1:34 UTC (permalink / raw)
  To: Toshi Kani
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On 08/02/2013 06:26 AM, Toshi Kani wrote:
......
>> +int __init acpi_invalid_table(struct cpio_data *file,
>> +			      const char *path, const char *signature)
>
> Since this function verifies a given acpi table in initrd (not that the
> table is invalid), I'd suggest to rename it something like
> acpi_verify_initrd().  Otherwise, it looks good to me.
>

Hi Toshi-san,

Thanks, will change the name.

Thanks.

> Acked-by: Toshi Kani<toshi.kani@hp.com>
>


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 04/18] acpi: Introduce acpi_invalid_table() to check if a table is invalid.
@ 2013-08-02  1:34       ` Tang Chen
  0 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-02  1:34 UTC (permalink / raw)
  To: Toshi Kani
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On 08/02/2013 06:26 AM, Toshi Kani wrote:
......
>> +int __init acpi_invalid_table(struct cpio_data *file,
>> +			      const char *path, const char *signature)
>
> Since this function verifies a given acpi table in initrd (not that the
> table is invalid), I'd suggest to rename it something like
> acpi_verify_initrd().  Otherwise, it looks good to me.
>

Hi Toshi-san,

Thanks, will change the name.

Thanks.

> Acked-by: Toshi Kani<toshi.kani@hp.com>
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 02/18] earlycpio.c: Fix the confusing comment of find_cpio_data().
  2013-08-01 21:57     ` Toshi Kani
@ 2013-08-02  4:48       ` Tang Chen
  -1 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-02  4:48 UTC (permalink / raw)
  To: Toshi Kani
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On 08/02/2013 05:57 AM, Toshi Kani wrote:
......
>>   struct cpio_data __cpuinit find_cpio_data(const char *path, void *data,
>
> This patch does not apply cleanly.  It seems that your branch does not
> have 0db0628d90125193280eabb501c94feaf48fa9ab.
>

I have rebased the patch-set to linux 3.11-rc3. And will resend it later.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 02/18] earlycpio.c: Fix the confusing comment of find_cpio_data().
@ 2013-08-02  4:48       ` Tang Chen
  0 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-02  4:48 UTC (permalink / raw)
  To: Toshi Kani
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On 08/02/2013 05:57 AM, Toshi Kani wrote:
......
>>   struct cpio_data __cpuinit find_cpio_data(const char *path, void *data,
>
> This patch does not apply cleanly.  It seems that your branch does not
> have 0db0628d90125193280eabb501c94feaf48fa9ab.
>

I have rebased the patch-set to linux 3.11-rc3. And will resend it later.

Thanks.


^ permalink raw reply	[flat|nested] 98+ messages in thread

* RE: [PATCH v2 05/18] x86, acpi: Split acpi_boot_table_init() into two parts.
  2013-08-01 23:32     ` Toshi Kani
@ 2013-08-02  5:25       ` Zheng, Lv
  -1 siblings, 0 replies; 98+ messages in thread
From: Zheng, Lv @ 2013-08-02  5:25 UTC (permalink / raw)
  To: Toshi Kani, Tang Chen
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 6342 bytes --]

> From: linux-acpi-owner@vger.kernel.org
> Sent: Friday, August 02, 2013 7:32 AM
> 
> On Thu, 2013-08-01 at 15:06 +0800, Tang Chen wrote:
> > In ACPI, SRAT(System Resource Affinity Table) contains NUMA info.
> > The memory affinities in SRAT record every memory range in the
> > system, and also, flags specifying if the memory range is
> > hotpluggable.
> > (Please refer to ACPI spec 5.0 5.2.16)
> >
> > memblock starts to work at very early time, and SRAT has not been
> > parsed. So we don't know which memory is hotpluggable. In order
> > to use memblock to reserve hotpluggable memory, we need to obtain
> > SRAT memory affinity info earlier.
> >
> > In the current acpi_boot_table_init(), it does the following:
> > 1. Parse RSDT, so that we can find all the tables.
> > 2. Initialize acpi_gbl_root_table_list, an array of acpi table
> >    descriptorsused to store each table's address, length, signature,

A missing space here.

> >    and so on.
> > 3. Check if there is any table in initrd intending to override
> >    tables from firmware. If so, override the firmware tables.
> > 4. Initialize all the data in acpi_gbl_root_table_list.
> >
> > In order to parse SRAT at early time, we need to do similar job as
> > step 1 and 2 above earlier to obtain SRAT. It will be very convenient
> > if we have acpi_gbl_root_table_list initialized. We can use address
> > and signature to find SRAT.
> >
> > Since step 1 and 2 allocates no memory, it is OK to do these two
> > steps earlier.
> >
> > But step 3 will check acpi initrd table override, not just SRAT,
> > but also all the other tables. So it is better to keep it untouched.
> >
> > This patch splits acpi_boot_table_init() into two steps:
> > 1. Parse RSDT, which cannot be overrided, and initialize
> >    acpi_gbl_root_table_list. (step 1 + 2 above)
> > 2. Install all ACPI tables into acpi_gbl_root_table_list.
> >    (step 3 + 4 above)
> >
> > In later patches, we will do step 1 + 2 earlier.
> >
> > Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> > Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> > ---
> >  drivers/acpi/acpica/tbutils.c |   25 ++++++++++++++++++++++---
> >  drivers/acpi/tables.c         |    2 ++
> >  include/acpi/acpixf.h         |    2 ++
> >  3 files changed, 26 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/acpi/acpica/tbutils.c b/drivers/acpi/acpica/tbutils.c
> 
> This patch needs to be reviewed by ACPICA folks.  I'd suggest to change
> the patch title to "x86, ACPICA:".  I added Bob to the list.
> 
> Thanks,
> -Toshi
> 
> 
> 
> > index ce3d5db..9d68ffc 100644
> > --- a/drivers/acpi/acpica/tbutils.c
> > +++ b/drivers/acpi/acpica/tbutils.c
> > @@ -766,9 +766,30 @@
> acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
> >  	 */
> >  	acpi_os_unmap_memory(table, length);
> >
> > +	return_ACPI_STATUS(AE_OK);
> > +}
> > +
> >

I don't think you can split the function here.
ACPICA still need to continue to parse the table using the logic implemented in the acpi_tb_install_table() and acpi_tb_parse_fadt(). (for example, endianess of the signature).
You'd better to keep them as is and split some codes from 'acpi_tb_install_table' to form another function: acpi_tb_override_table().

> +/*********************************************************
> **********************
> > + *
> > + * FUNCTION:    acpi_tb_install_root_table

I think this function should be acpi_tb_override_tables, and call acpi_tb_override_table() inside this function for each table.

> > + *
> > + * DESCRIPTION: This function installs all the ACPI tables in RSDT into
> > + *              acpi_gbl_root_table_list.
> > + *
> > +
> **********************************************************
> ********************/
> > +
> > +void __init

Basically, ACPICA will use acpi_status as return value.

> > +acpi_tb_install_root_table()

(void)?

> > +{
> > +	int i;

Please use u32 instead of int.

> > +
> >  	/*
> >  	 * Complete the initialization of the root table array by examining
> > -	 * the header of each table
> > +	 * the header of each table.
> > +	 *
> > +	 * First two entries in the table array are reserved for the DSDT
> > +	 * and FACS, which are not actually present in the RSDT/XSDT - they
> > +	 * come from the FADT.
> >  	 */
> >  	for (i = 2; i < acpi_gbl_root_table_list.current_table_count; i++) {
> >  		acpi_tb_install_table(acpi_gbl_root_table_list.tables[i].
> > @@ -782,6 +803,4 @@
> acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
> >  			acpi_tb_parse_fadt(i);
> >  		}
> >  	}
> > -
> > -	return_ACPI_STATUS(AE_OK);
> >  }
> > diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
> > index d67a1fe..8860e79 100644
> > --- a/drivers/acpi/tables.c
> > +++ b/drivers/acpi/tables.c
> > @@ -353,6 +353,8 @@ int __init acpi_table_init(void)
> >  	if (ACPI_FAILURE(status))
> >  		return 1;
> >
> > +	acpi_tb_install_root_table();
> > +

I think you shouldn't call a function named as acpi_tb_xxx directly in a file belongs to drivers/acpi not drivers/acpi/acpica.
This kind of a function name can only be used inside ACPICA.
You need an interface wrapper in drivers/acpi/acpica/tbxface.c to call such internal functions.

> >  	check_multiple_madt();
> >  	return 0;
> >  }
> > diff --git a/include/acpi/acpixf.h b/include/acpi/acpixf.h
> > index 454881e..f5549b5 100644
> > --- a/include/acpi/acpixf.h
> > +++ b/include/acpi/acpixf.h
> > @@ -116,6 +116,8 @@ acpi_status
> >  acpi_initialize_tables(struct acpi_table_desc *initial_storage,
> >  		       u32 initial_table_count, u8 allow_resize);
> >
> > +void acpi_tb_install_root_table(void);
> > +

The reason is same as the above.
It doesn't make sense that a function with such name appears in the acpixf.h.

Thanks and best regards
-Lv

> >  acpi_status __init acpi_initialize_subsystem(void);
> >
> >  acpi_status acpi_enable_subsystem(u32 flags);
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
N‹§²æìr¸›zǧu©ž²Æ {\b­†éì¹»\x1c®&Þ–)îÆi¢žØ^n‡r¶‰šŽŠÝ¢j$½§$¢¸\x05¢¹¨­è§~Š'.)îÄÃ,yèm¶ŸÿÃ\f%Š{±šj+ƒðèž×¦j)Z†·Ÿ

^ permalink raw reply	[flat|nested] 98+ messages in thread

* RE: [PATCH v2 05/18] x86, acpi: Split acpi_boot_table_init() into two parts.
@ 2013-08-02  5:25       ` Zheng, Lv
  0 siblings, 0 replies; 98+ messages in thread
From: Zheng, Lv @ 2013-08-02  5:25 UTC (permalink / raw)
  To: Toshi Kani, Tang Chen
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi, Moore, Robert

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 6342 bytes --]

> From: linux-acpi-owner@vger.kernel.org
> Sent: Friday, August 02, 2013 7:32 AM
> 
> On Thu, 2013-08-01 at 15:06 +0800, Tang Chen wrote:
> > In ACPI, SRAT(System Resource Affinity Table) contains NUMA info.
> > The memory affinities in SRAT record every memory range in the
> > system, and also, flags specifying if the memory range is
> > hotpluggable.
> > (Please refer to ACPI spec 5.0 5.2.16)
> >
> > memblock starts to work at very early time, and SRAT has not been
> > parsed. So we don't know which memory is hotpluggable. In order
> > to use memblock to reserve hotpluggable memory, we need to obtain
> > SRAT memory affinity info earlier.
> >
> > In the current acpi_boot_table_init(), it does the following:
> > 1. Parse RSDT, so that we can find all the tables.
> > 2. Initialize acpi_gbl_root_table_list, an array of acpi table
> >    descriptorsused to store each table's address, length, signature,

A missing space here.

> >    and so on.
> > 3. Check if there is any table in initrd intending to override
> >    tables from firmware. If so, override the firmware tables.
> > 4. Initialize all the data in acpi_gbl_root_table_list.
> >
> > In order to parse SRAT at early time, we need to do similar job as
> > step 1 and 2 above earlier to obtain SRAT. It will be very convenient
> > if we have acpi_gbl_root_table_list initialized. We can use address
> > and signature to find SRAT.
> >
> > Since step 1 and 2 allocates no memory, it is OK to do these two
> > steps earlier.
> >
> > But step 3 will check acpi initrd table override, not just SRAT,
> > but also all the other tables. So it is better to keep it untouched.
> >
> > This patch splits acpi_boot_table_init() into two steps:
> > 1. Parse RSDT, which cannot be overrided, and initialize
> >    acpi_gbl_root_table_list. (step 1 + 2 above)
> > 2. Install all ACPI tables into acpi_gbl_root_table_list.
> >    (step 3 + 4 above)
> >
> > In later patches, we will do step 1 + 2 earlier.
> >
> > Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> > Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> > ---
> >  drivers/acpi/acpica/tbutils.c |   25 ++++++++++++++++++++++---
> >  drivers/acpi/tables.c         |    2 ++
> >  include/acpi/acpixf.h         |    2 ++
> >  3 files changed, 26 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/acpi/acpica/tbutils.c b/drivers/acpi/acpica/tbutils.c
> 
> This patch needs to be reviewed by ACPICA folks.  I'd suggest to change
> the patch title to "x86, ACPICA:".  I added Bob to the list.
> 
> Thanks,
> -Toshi
> 
> 
> 
> > index ce3d5db..9d68ffc 100644
> > --- a/drivers/acpi/acpica/tbutils.c
> > +++ b/drivers/acpi/acpica/tbutils.c
> > @@ -766,9 +766,30 @@
> acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
> >  	 */
> >  	acpi_os_unmap_memory(table, length);
> >
> > +	return_ACPI_STATUS(AE_OK);
> > +}
> > +
> >

I don't think you can split the function here.
ACPICA still need to continue to parse the table using the logic implemented in the acpi_tb_install_table() and acpi_tb_parse_fadt(). (for example, endianess of the signature).
You'd better to keep them as is and split some codes from 'acpi_tb_install_table' to form another function: acpi_tb_override_table().

> +/*********************************************************
> **********************
> > + *
> > + * FUNCTION:    acpi_tb_install_root_table

I think this function should be acpi_tb_override_tables, and call acpi_tb_override_table() inside this function for each table.

> > + *
> > + * DESCRIPTION: This function installs all the ACPI tables in RSDT into
> > + *              acpi_gbl_root_table_list.
> > + *
> > +
> **********************************************************
> ********************/
> > +
> > +void __init

Basically, ACPICA will use acpi_status as return value.

> > +acpi_tb_install_root_table()

(void)?

> > +{
> > +	int i;

Please use u32 instead of int.

> > +
> >  	/*
> >  	 * Complete the initialization of the root table array by examining
> > -	 * the header of each table
> > +	 * the header of each table.
> > +	 *
> > +	 * First two entries in the table array are reserved for the DSDT
> > +	 * and FACS, which are not actually present in the RSDT/XSDT - they
> > +	 * come from the FADT.
> >  	 */
> >  	for (i = 2; i < acpi_gbl_root_table_list.current_table_count; i++) {
> >  		acpi_tb_install_table(acpi_gbl_root_table_list.tables[i].
> > @@ -782,6 +803,4 @@
> acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
> >  			acpi_tb_parse_fadt(i);
> >  		}
> >  	}
> > -
> > -	return_ACPI_STATUS(AE_OK);
> >  }
> > diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
> > index d67a1fe..8860e79 100644
> > --- a/drivers/acpi/tables.c
> > +++ b/drivers/acpi/tables.c
> > @@ -353,6 +353,8 @@ int __init acpi_table_init(void)
> >  	if (ACPI_FAILURE(status))
> >  		return 1;
> >
> > +	acpi_tb_install_root_table();
> > +

I think you shouldn't call a function named as acpi_tb_xxx directly in a file belongs to drivers/acpi not drivers/acpi/acpica.
This kind of a function name can only be used inside ACPICA.
You need an interface wrapper in drivers/acpi/acpica/tbxface.c to call such internal functions.

> >  	check_multiple_madt();
> >  	return 0;
> >  }
> > diff --git a/include/acpi/acpixf.h b/include/acpi/acpixf.h
> > index 454881e..f5549b5 100644
> > --- a/include/acpi/acpixf.h
> > +++ b/include/acpi/acpixf.h
> > @@ -116,6 +116,8 @@ acpi_status
> >  acpi_initialize_tables(struct acpi_table_desc *initial_storage,
> >  		       u32 initial_table_count, u8 allow_resize);
> >
> > +void acpi_tb_install_root_table(void);
> > +

The reason is same as the above.
It doesn't make sense that a function with such name appears in the acpixf.h.

Thanks and best regards
-Lv

> >  acpi_status __init acpi_initialize_subsystem(void);
> >
> >  acpi_status acpi_enable_subsystem(u32 flags);
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
N‹§²æìr¸›zǧu©ž²Æ {\b­†éì¹»\x1c®&Þ–)îÆi¢žØ^n‡r¶‰šŽŠÝ¢j$½§$¢¸\x05¢¹¨­è§~Š'.)îÄÃ,yèm¶ŸÿÃ\f%Š{±šj+ƒðèž×¦j)Z†·Ÿ

^ permalink raw reply	[flat|nested] 98+ messages in thread

* RE: [PATCH v2 07/18] x86, acpi: Also initialize signature and length when parsing root table.
  2013-08-02  0:10     ` Toshi Kani
@ 2013-08-02  5:28       ` Zheng, Lv
  -1 siblings, 0 replies; 98+ messages in thread
From: Zheng, Lv @ 2013-08-02  5:28 UTC (permalink / raw)
  To: Toshi Kani, Tang Chen
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman

> From: linux-acpi-owner@vger.kernel.org
> Sent: Friday, August 02, 2013 8:11 AM
> 
> On Thu, 2013-08-01 at 15:06 +0800, Tang Chen wrote:
> > Besides the phys addr of the acpi tables, it will be very convenient if
> > we also have the signature of each table in acpi_gbl_root_table_list at
> > early time. We can find SRAT easily by comparing the signature.
> >
> > This patch alse record signature and some other info in
> > acpi_gbl_root_table_list at early time.

If you have addressed my comments against PATCH 05, you needn't this patch at all.

Thanks and best regards
-Lv

> >
> > Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> > Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> > ---
> >  drivers/acpi/acpica/tbutils.c |   22 ++++++++++++++++++++++
> >  1 files changed, 22 insertions(+), 0 deletions(-)
> >
> > diff --git a/drivers/acpi/acpica/tbutils.c b/drivers/acpi/acpica/tbutils.c
> 
> Same as patch 5/18.  Please change the title to "x86, ACPICA:".
> Added
> Bob.
> 
> Thanks,
> -Toshi
> 
> 
> > index 9d68ffc..5d31887 100644
> > --- a/drivers/acpi/acpica/tbutils.c
> > +++ b/drivers/acpi/acpica/tbutils.c
> > @@ -627,6 +627,7 @@
> acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
> >  	u32 i;
> >  	u32 table_count;
> >  	struct acpi_table_header *table;
> > +	struct acpi_table_desc *table_desc;
> >  	acpi_physical_address address;
> >  	acpi_physical_address uninitialized_var(rsdt_address);
> >  	u32 length;
> > @@ -766,6 +767,27 @@
> acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
> >  	 */
> >  	acpi_os_unmap_memory(table, length);
> >
> > +	/*
> > +	 * Also initialize the table entries here, so that later we can use
> them
> > +	 * to find SRAT at very eraly time to reserve hotpluggable memory.
> > +	 */
> > +	for (i = 2; i < acpi_gbl_root_table_list.current_table_count; i++) {
> > +		table = acpi_os_map_memory(
> > +				acpi_gbl_root_table_list.tables[i].address,
> > +				sizeof(struct acpi_table_header));
> > +		if (!table)
> > +			return_ACPI_STATUS(AE_NO_MEMORY);
> > +
> > +		table_desc = &acpi_gbl_root_table_list.tables[i];
> > +
> > +		table_desc->pointer = NULL;
> > +		table_desc->length = table->length;
> > +		table_desc->flags = ACPI_TABLE_ORIGIN_MAPPED;
> > +		ACPI_MOVE_32_TO_32(table_desc->signature.ascii,
> table->signature);
> > +
> > +		acpi_os_unmap_memory(table, sizeof(struct
> acpi_table_header));
> > +	}
> > +
> >  	return_ACPI_STATUS(AE_OK);
> >  }
> >
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 98+ messages in thread

* RE: [PATCH v2 07/18] x86, acpi: Also initialize signature and length when parsing root table.
@ 2013-08-02  5:28       ` Zheng, Lv
  0 siblings, 0 replies; 98+ messages in thread
From: Zheng, Lv @ 2013-08-02  5:28 UTC (permalink / raw)
  To: Toshi Kani, Tang Chen
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi, Moore, Robert

> From: linux-acpi-owner@vger.kernel.org
> Sent: Friday, August 02, 2013 8:11 AM
> 
> On Thu, 2013-08-01 at 15:06 +0800, Tang Chen wrote:
> > Besides the phys addr of the acpi tables, it will be very convenient if
> > we also have the signature of each table in acpi_gbl_root_table_list at
> > early time. We can find SRAT easily by comparing the signature.
> >
> > This patch alse record signature and some other info in
> > acpi_gbl_root_table_list at early time.

If you have addressed my comments against PATCH 05, you needn't this patch at all.

Thanks and best regards
-Lv

> >
> > Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> > Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> > ---
> >  drivers/acpi/acpica/tbutils.c |   22 ++++++++++++++++++++++
> >  1 files changed, 22 insertions(+), 0 deletions(-)
> >
> > diff --git a/drivers/acpi/acpica/tbutils.c b/drivers/acpi/acpica/tbutils.c
> 
> Same as patch 5/18.  Please change the title to "x86, ACPICA:".
> Added
> Bob.
> 
> Thanks,
> -Toshi
> 
> 
> > index 9d68ffc..5d31887 100644
> > --- a/drivers/acpi/acpica/tbutils.c
> > +++ b/drivers/acpi/acpica/tbutils.c
> > @@ -627,6 +627,7 @@
> acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
> >  	u32 i;
> >  	u32 table_count;
> >  	struct acpi_table_header *table;
> > +	struct acpi_table_desc *table_desc;
> >  	acpi_physical_address address;
> >  	acpi_physical_address uninitialized_var(rsdt_address);
> >  	u32 length;
> > @@ -766,6 +767,27 @@
> acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
> >  	 */
> >  	acpi_os_unmap_memory(table, length);
> >
> > +	/*
> > +	 * Also initialize the table entries here, so that later we can use
> them
> > +	 * to find SRAT at very eraly time to reserve hotpluggable memory.
> > +	 */
> > +	for (i = 2; i < acpi_gbl_root_table_list.current_table_count; i++) {
> > +		table = acpi_os_map_memory(
> > +				acpi_gbl_root_table_list.tables[i].address,
> > +				sizeof(struct acpi_table_header));
> > +		if (!table)
> > +			return_ACPI_STATUS(AE_NO_MEMORY);
> > +
> > +		table_desc = &acpi_gbl_root_table_list.tables[i];
> > +
> > +		table_desc->pointer = NULL;
> > +		table_desc->length = table->length;
> > +		table_desc->flags = ACPI_TABLE_ORIGIN_MAPPED;
> > +		ACPI_MOVE_32_TO_32(table_desc->signature.ascii,
> table->signature);
> > +
> > +		acpi_os_unmap_memory(table, sizeof(struct
> acpi_table_header));
> > +	}
> > +
> >  	return_ACPI_STATUS(AE_OK);
> >  }
> >
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 10/18] x86, acpi: Try to find if SRAT is overrided earlier.
  2013-08-02  1:19     ` Toshi Kani
@ 2013-08-02  5:49       ` Tang Chen
  -1 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-02  5:49 UTC (permalink / raw)
  To: Toshi Kani
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On 08/02/2013 09:19 AM, Toshi Kani wrote:
......
>> +phys_addr_t __init early_acpi_override_srat(void)
>> +{
>> +	int i;
>> +	u32 length;
>> +	long offset;
>> +	void *ramdisk_vaddr;
>> +	struct acpi_table_header *table;
>> +	struct cpio_data file;
>> +	unsigned long map_step = NR_FIX_BTMAPS<<  PAGE_SHIFT;
>> +	phys_addr_t ramdisk_image = get_ramdisk_image();
>> +	char cpio_path[32] = "kernel/firmware/acpi/";
>
> Don't you need to check if ramdisk is present before parsing the table?
> You may need something like:
>
>    if (!ramdisk_image || !get_ramdisk_size())
>          return 0;

Yes, it is better to do such a check here. But is there a possibility that
no ramdisk is present and we come to setup_arch() ?

......
>> +
>> +	return ramdisk_image;
>
> Doesn't this function return a physical address regardless of SRAT if a
> ramdisk is present?

Yes, and it is not good. I'll add the check above so that this won't happen.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 10/18] x86, acpi: Try to find if SRAT is overrided earlier.
@ 2013-08-02  5:49       ` Tang Chen
  0 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-02  5:49 UTC (permalink / raw)
  To: Toshi Kani
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On 08/02/2013 09:19 AM, Toshi Kani wrote:
......
>> +phys_addr_t __init early_acpi_override_srat(void)
>> +{
>> +	int i;
>> +	u32 length;
>> +	long offset;
>> +	void *ramdisk_vaddr;
>> +	struct acpi_table_header *table;
>> +	struct cpio_data file;
>> +	unsigned long map_step = NR_FIX_BTMAPS<<  PAGE_SHIFT;
>> +	phys_addr_t ramdisk_image = get_ramdisk_image();
>> +	char cpio_path[32] = "kernel/firmware/acpi/";
>
> Don't you need to check if ramdisk is present before parsing the table?
> You may need something like:
>
>    if (!ramdisk_image || !get_ramdisk_size())
>          return 0;

Yes, it is better to do such a check here. But is there a possibility that
no ramdisk is present and we come to setup_arch() ?

......
>> +
>> +	return ramdisk_image;
>
> Doesn't this function return a physical address regardless of SRAT if a
> ramdisk is present?

Yes, and it is not good. I'll add the check above so that this won't happen.

Thanks.


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 13/18] x86, numa, mem_hotplug: Skip all the regions the kernel resides in.
  2013-08-01 13:42     ` Tejun Heo
@ 2013-08-02  5:51       ` Tang Chen
  -1 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-02  5:51 UTC (permalink / raw)
  To: Tejun Heo
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On 08/01/2013 09:42 PM, Tejun Heo wrote:
>> On Thu, Aug 01, 2013 at 03:06:35PM +0800, Tang Chen wrote:
>>
>> At early time, memblock will reserve some memory for the kernel,
>> such as the kernel code and data segments, initrd file, and so on=EF=BC=8C
>> which means the kernel resides in these memory regions.
>>
>> Even if these memory regions are hotpluggable, we should not
>> mark them as hotpluggable. Otherwise the kernel won't have enough
>> memory to boot.
>>
>> This patch finds out which memory regions the kernel resides in,
>> and skip them when finding all hotpluggable memory regions.
>>
>> Signed-off-by: Tang Chen<tangchen@cn.fujitsu.com>
>> Reviewed-by: Zhang Yanfei<zhangyanfei@cn.fujitsu.com>
>> ---
>>   mm/memory=5Fhotplug.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
>>    1 files changed, 45 insertions(+), 0 deletions(-)
>>
>> diff --git a/mm/memory=5Fhotplug.c b/mm/memory=5Fhotplug.c
>> index 326e2f2..b800c9c 100644
>> --- a/mm/memory=5Fhotplug.c
>> +++ b/mm/memory=5Fhotplug.c
>> @@ -31,6 +31,7 @@
>>   #include<linux/firmware-map.h>
>>   #include<linux/stop=5Fmachine.h>
>>   #include<linux/acpi.h>
>> +#include<linux/memblock.h>
>> =20
>>   #include<asm/tlbflush.h>
>> =20
>
> This patch is contaminated.  Can you please resend?It

It's wired. I'll rebase these patches to linux 3.11-rc3 and resend them all.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 13/18] x86, numa, mem_hotplug: Skip all the regions the kernel resides in.
@ 2013-08-02  5:51       ` Tang Chen
  0 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-02  5:51 UTC (permalink / raw)
  To: Tejun Heo
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On 08/01/2013 09:42 PM, Tejun Heo wrote:
>> On Thu, Aug 01, 2013 at 03:06:35PM +0800, Tang Chen wrote:
>>
>> At early time, memblock will reserve some memory for the kernel,
>> such as the kernel code and data segments, initrd file, and so on=EF=BC=8C
>> which means the kernel resides in these memory regions.
>>
>> Even if these memory regions are hotpluggable, we should not
>> mark them as hotpluggable. Otherwise the kernel won't have enough
>> memory to boot.
>>
>> This patch finds out which memory regions the kernel resides in,
>> and skip them when finding all hotpluggable memory regions.
>>
>> Signed-off-by: Tang Chen<tangchen@cn.fujitsu.com>
>> Reviewed-by: Zhang Yanfei<zhangyanfei@cn.fujitsu.com>
>> ---
>>   mm/memory=5Fhotplug.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
>>    1 files changed, 45 insertions(+), 0 deletions(-)
>>
>> diff --git a/mm/memory=5Fhotplug.c b/mm/memory=5Fhotplug.c
>> index 326e2f2..b800c9c 100644
>> --- a/mm/memory=5Fhotplug.c
>> +++ b/mm/memory=5Fhotplug.c
>> @@ -31,6 +31,7 @@
>>   #include<linux/firmware-map.h>
>>   #include<linux/stop=5Fmachine.h>
>>   #include<linux/acpi.h>
>> +#include<linux/memblock.h>
>> =20
>>   #include<asm/tlbflush.h>
>> =20
>
> This patch is contaminated.  Can you please resend?It

It's wired. I'll rebase these patches to linux 3.11-rc3 and resend them all.

Thanks.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 05/18] x86, acpi: Split acpi_boot_table_init() into two parts.
  2013-08-02  5:25       ` Zheng, Lv
@ 2013-08-02  7:01         ` Tang Chen
  -1 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-02  7:01 UTC (permalink / raw)
  To: Zheng, Lv
  Cc: Toshi Kani, rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn,
	yinghai, jiang.liu, wency, laijs, isimatu.yasuaki, izumi.taku,
	mgorman, minchan, mina86, gong.chen

On 08/02/2013 01:25 PM, Zheng, Lv wrote:
......
>>> index ce3d5db..9d68ffc 100644
>>> --- a/drivers/acpi/acpica/tbutils.c
>>> +++ b/drivers/acpi/acpica/tbutils.c
>>> @@ -766,9 +766,30 @@
>> acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
>>>   	*/
>>>   	acpi_os_unmap_memory(table, length);
>>>
>>> +	return_ACPI_STATUS(AE_OK);
>>> +}
>>> +
>>>
>
> I don't think you can split the function here.
> ACPICA still need to continue to parse the table using the logic implemented in the acpi_tb_install_table() and acpi_tb_parse_fadt(). (for example, endianess of the signature).
> You'd better to keep them as is and split some codes from 'acpi_tb_install_table' to form another function: acpi_tb_override_table().

I'm sorry, I don't quite follow this.

I split acpi_tb_parse_root_table(), not acpi_tb_install_table() and 
acpi_tb_parse_fadt().
If ACPICA wants to use these two functions somewhere else, I think it is 
OK, isn't it?

And the reason I did this, please see below.

......
>>> + *
>>> + * FUNCTION:    acpi_tb_install_root_table
>
> I think this function should be acpi_tb_override_tables, and call acpi_tb_override_table() inside this function for each table.

It is not just about acpi initrd table override.

acpi_tb_parse_root_table() was split into two steps:
1. initialize acpi_gbl_root_table_list
2. install tables into acpi_gbl_root_table_list

I need step1 earlier because I want to find SRAT at early time.
But I don't want step2 earlier because before install the tables in 
firmware,
acpi initrd table override could happen. I want only SRAT, I don't want to
touch much existing code.

Would you please explain more about your comment ? I think maybe I 
missed something
important to you guys. :)

And all the other ACPICA rules will be followed in the next version.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 05/18] x86, acpi: Split acpi_boot_table_init() into two parts.
@ 2013-08-02  7:01         ` Tang Chen
  0 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-02  7:01 UTC (permalink / raw)
  To: Zheng, Lv
  Cc: Toshi Kani, rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn,
	yinghai, jiang.liu, wency, laijs, isimatu.yasuaki, izumi.taku,
	mgorman, minchan, mina86, gong.chen, vasilis.liaskovitis,
	lwoodman, riel, jweiner, prarit, zhangyanfei, yanghy, x86,
	linux-doc, linux-kernel, linux-mm, linux-acpi, Moore, Robert

On 08/02/2013 01:25 PM, Zheng, Lv wrote:
......
>>> index ce3d5db..9d68ffc 100644
>>> --- a/drivers/acpi/acpica/tbutils.c
>>> +++ b/drivers/acpi/acpica/tbutils.c
>>> @@ -766,9 +766,30 @@
>> acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
>>>   	*/
>>>   	acpi_os_unmap_memory(table, length);
>>>
>>> +	return_ACPI_STATUS(AE_OK);
>>> +}
>>> +
>>>
>
> I don't think you can split the function here.
> ACPICA still need to continue to parse the table using the logic implemented in the acpi_tb_install_table() and acpi_tb_parse_fadt(). (for example, endianess of the signature).
> You'd better to keep them as is and split some codes from 'acpi_tb_install_table' to form another function: acpi_tb_override_table().

I'm sorry, I don't quite follow this.

I split acpi_tb_parse_root_table(), not acpi_tb_install_table() and 
acpi_tb_parse_fadt().
If ACPICA wants to use these two functions somewhere else, I think it is 
OK, isn't it?

And the reason I did this, please see below.

......
>>> + *
>>> + * FUNCTION:    acpi_tb_install_root_table
>
> I think this function should be acpi_tb_override_tables, and call acpi_tb_override_table() inside this function for each table.

It is not just about acpi initrd table override.

acpi_tb_parse_root_table() was split into two steps:
1. initialize acpi_gbl_root_table_list
2. install tables into acpi_gbl_root_table_list

I need step1 earlier because I want to find SRAT at early time.
But I don't want step2 earlier because before install the tables in 
firmware,
acpi initrd table override could happen. I want only SRAT, I don't want to
touch much existing code.

Would you please explain more about your comment ? I think maybe I 
missed something
important to you guys. :)

And all the other ACPICA rules will be followed in the next version.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 06/18] x86, acpi: Initialize ACPI root table list earlier.
  2013-08-01 23:54     ` Toshi Kani
@ 2013-08-02  7:49       ` Tang Chen
  -1 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-02  7:49 UTC (permalink / raw)
  To: Toshi Kani
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On 08/02/2013 07:54 AM, Toshi Kani wrote:
......
>>   /*
>> + * acpi_root_table_init - Initialize acpi_gbl_root_table_list.
>> + *
>> + * This function will parse RSDT or XSDT, find all tables' phys addr,
>> + * initialize acpi_gbl_root_table_list, and record all tables' phys addr
>> + * in acpi_gbl_root_table_list.
>> + */
>> +void __init acpi_root_table_init(void)
>
> I think acpi_root_table_init() is a bit confusing with
> acpi_boot_table_init().  Perhaps, something like
> acpi_boot_table_pre_init() or early_acpi_boot_table_init() is better to
> indicate that this new function is called before acpi_boot_table_init().
>

OK, will change it to early_acpi_boot_table_init().

>> +{
>> +	dmi_check_system(acpi_dmi_table);
>> +
>> +	/* If acpi_disabled, bail out */
>> +	if (acpi_disabled)
>> +		return;
>> +
>> +	/* Initialize the ACPI boot-time table parser */
>> +	if (acpi_table_init()) {
>> +		disable_acpi();
>> +		return;
>> +	}
>> +}
>> +
>> +/*
>>    * acpi_boot_table_init() and acpi_boot_init()
>>    *  called from setup_arch(), always.
>>    *	1. checksums all tables
>> @@ -1511,21 +1533,7 @@ static struct dmi_system_id __initdata acpi_dmi_table_late[] = {
>>
>>   void __init acpi_boot_table_init(void)
>
> The comment of this function needs to be updated.  For instance, it
> describes acpi_table_init(), which you just relocated.
>
>   * acpi_table_init() is separate to allow reading SRAT without
>   * other side effects.
>   *

Sure. But I don't quite understand this comment. It seems that
acpi_table_init() has nothing to do with SRAT.

Do you know anything about this ?

>
>>   {
>> -	dmi_check_system(acpi_dmi_table);
>> -
>> -	/*
>> -	 * If acpi_disabled, bail out
>> -	 */
>> -	if (acpi_disabled)
>> -		return;
>
> I think this check is still necessary.
>

Yes. Will add it.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 06/18] x86, acpi: Initialize ACPI root table list earlier.
@ 2013-08-02  7:49       ` Tang Chen
  0 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-02  7:49 UTC (permalink / raw)
  To: Toshi Kani
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On 08/02/2013 07:54 AM, Toshi Kani wrote:
......
>>   /*
>> + * acpi_root_table_init - Initialize acpi_gbl_root_table_list.
>> + *
>> + * This function will parse RSDT or XSDT, find all tables' phys addr,
>> + * initialize acpi_gbl_root_table_list, and record all tables' phys addr
>> + * in acpi_gbl_root_table_list.
>> + */
>> +void __init acpi_root_table_init(void)
>
> I think acpi_root_table_init() is a bit confusing with
> acpi_boot_table_init().  Perhaps, something like
> acpi_boot_table_pre_init() or early_acpi_boot_table_init() is better to
> indicate that this new function is called before acpi_boot_table_init().
>

OK, will change it to early_acpi_boot_table_init().

>> +{
>> +	dmi_check_system(acpi_dmi_table);
>> +
>> +	/* If acpi_disabled, bail out */
>> +	if (acpi_disabled)
>> +		return;
>> +
>> +	/* Initialize the ACPI boot-time table parser */
>> +	if (acpi_table_init()) {
>> +		disable_acpi();
>> +		return;
>> +	}
>> +}
>> +
>> +/*
>>    * acpi_boot_table_init() and acpi_boot_init()
>>    *  called from setup_arch(), always.
>>    *	1. checksums all tables
>> @@ -1511,21 +1533,7 @@ static struct dmi_system_id __initdata acpi_dmi_table_late[] = {
>>
>>   void __init acpi_boot_table_init(void)
>
> The comment of this function needs to be updated.  For instance, it
> describes acpi_table_init(), which you just relocated.
>
>   * acpi_table_init() is separate to allow reading SRAT without
>   * other side effects.
>   *

Sure. But I don't quite understand this comment. It seems that
acpi_table_init() has nothing to do with SRAT.

Do you know anything about this ?

>
>>   {
>> -	dmi_check_system(acpi_dmi_table);
>> -
>> -	/*
>> -	 * If acpi_disabled, bail out
>> -	 */
>> -	if (acpi_disabled)
>> -		return;
>
> I think this check is still necessary.
>

Yes. Will add it.

Thanks.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* RE: [PATCH v2 05/18] x86, acpi: Split acpi_boot_table_init() into two parts.
  2013-08-02  7:01         ` Tang Chen
@ 2013-08-02  8:11           ` Zheng, Lv
  -1 siblings, 0 replies; 98+ messages in thread
From: Zheng, Lv @ 2013-08-02  8:11 UTC (permalink / raw)
  To: Tang Chen
  Cc: Toshi Kani, rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn,
	yinghai, jiang.liu, wency, laijs, isimatu.yasuaki, izumi.taku,
	mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis@profitbricks.com

> From: Tang Chen [mailto:tangchen@cn.fujitsu.com]
> Sent: Friday, August 02, 2013 3:01 PM
> 
> On 08/02/2013 01:25 PM, Zheng, Lv wrote:
> ......
> >>> index ce3d5db..9d68ffc 100644
> >>> --- a/drivers/acpi/acpica/tbutils.c
> >>> +++ b/drivers/acpi/acpica/tbutils.c
> >>> @@ -766,9 +766,30 @@
> >> acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
> >>>   	*/
> >>>   	acpi_os_unmap_memory(table, length);
> >>>
> >>> +	return_ACPI_STATUS(AE_OK);
> >>> +}
> >>> +
> >>>
> >
> > I don't think you can split the function here.
> > ACPICA still need to continue to parse the table using the logic
> implemented in the acpi_tb_install_table() and acpi_tb_parse_fadt().
> (for example, endianess of the signature).
> > You'd better to keep them as is and split some codes from
> 'acpi_tb_install_table' to form another function:
> acpi_tb_override_table().
> 
> I'm sorry, I don't quite follow this.
> 
> I split acpi_tb_parse_root_table(), not acpi_tb_install_table() and
> acpi_tb_parse_fadt().
> If ACPICA wants to use these two functions somewhere else, I think it is
> OK, isn't it?
> 
> And the reason I did this, please see below.
> 
> ......
> >>> + *
> >>> + * FUNCTION:    acpi_tb_install_root_table
> >
> > I think this function should be acpi_tb_override_tables, and call
> acpi_tb_override_table() inside this function for each table.
> 
> It is not just about acpi initrd table override.
> 
> acpi_tb_parse_root_table() was split into two steps:
> 1. initialize acpi_gbl_root_table_list
> 2. install tables into acpi_gbl_root_table_list
> 
> I need step1 earlier because I want to find SRAT at early time.
> But I don't want step2 earlier because before install the tables in
> firmware,
> acpi initrd table override could happen. I want only SRAT, I don't want to
> touch much existing code.

According to what you've explained, what you didn’t want to be called earlier is exactly "acpi initrd table override", please split only this logic to the step 2 and leave the others remained.
I think you should write a function named as acpi_override_tables() or likewise in tbxface.c to be executed as the OSPM entry of the step 2.
Inside this function, acpi_tb_table_override() should be called.

268 void
269 acpi_tb_install_table(acpi_physical_address address,
270                       char *signature, u32 table_index)
271 {

I think you still need the following codes to be called at the early stage.

272         struct acpi_table_header *table;
273         struct acpi_table_header *final_table;
274         struct acpi_table_desc *table_desc;
275 
276         if (!address) {
277                 ACPI_ERROR((AE_INFO,
278                             "Null physical address for ACPI table [%s]",
279                             signature));
280                 return;
281         }
282 
283         /* Map just the table header */
284 
285         table = acpi_os_map_memory(address, sizeof(struct acpi_table_header));
286         if (!table) {
287                 ACPI_ERROR((AE_INFO,
288                             "Could not map memory for table [%s] at %p",
289                             signature, ACPI_CAST_PTR(void, address)));
290                 return;
291         }
292 
293         /* If a particular signature is expected (DSDT/FACS), it must match */
294 
295         if (signature && !ACPI_COMPARE_NAME(table->signature, signature)) {
296                 ACPI_BIOS_ERROR((AE_INFO,
297                                  "Invalid signature 0x%X for ACPI table, expected [%s]",
298                                  *ACPI_CAST_PTR(u32, table->signature),
299                                  signature));
300                 goto unmap_and_exit;
301         }
302 
303         /*
304          * Initialize the table entry. Set the pointer to NULL, since the
305          * table is not fully mapped at this time.
306          */
307         table_desc = &acpi_gbl_root_table_list.tables[table_index];
308 
309         table_desc->address = address;
310         table_desc->pointer = NULL;
311         table_desc->length = table->length;
312         table_desc->flags = ACPI_TABLE_ORIGIN_MAPPED;
313         ACPI_MOVE_32_TO_32(table_desc->signature.ascii, table->signature);
314 

You should delete the following codes:

315         /*
316          * ACPI Table Override:
317          *
318          * Before we install the table, let the host OS override it with a new
319          * one if desired. Any table within the RSDT/XSDT can be replaced,
320          * including the DSDT which is pointed to by the FADT.
321          *
322          * NOTE: If the table is overridden, then final_table will contain a
323          * mapped pointer to the full new table. If the table is not overridden,
324          * or if there has been a physical override, then the table will be
325          * fully mapped later (in verify table). In any case, we must
326          * unmap the header that was mapped above.
327          */
328         final_table = acpi_tb_table_override(table, table_desc);
329         if (!final_table) {
330                 final_table = table;    /* There was no override */
331         }
332 

You still need to keep the following logic.

333         acpi_tb_print_table_header(table_desc->address, final_table);
334 
335         /* Set the global integer width (based upon revision of the DSDT) */
336 
337         if (table_index == ACPI_TABLE_INDEX_DSDT) {
338                 acpi_ut_set_integer_width(final_table->revision);
339         }
340 

You should delete the following codes:

341         /*
342          * If we have a physical override during this early loading of the ACPI
343          * tables, unmap the table for now. It will be mapped again later when
344          * it is actually used. This supports very early loading of ACPI tables,
345          * before virtual memory is fully initialized and running within the
346          * host OS. Note: A logical override has the ACPI_TABLE_ORIGIN_OVERRIDE
347          * flag set and will not be deleted below.
348          */
349         if (final_table != table) {
350                 acpi_tb_delete_table(table_desc);
351         }

Keep the following.

352 
353       unmap_and_exit:
354 
355         /* Always unmap the table header that we mapped above */
356 
357         acpi_os_unmap_memory(table, sizeof(struct acpi_table_header));
358 }

I'm not sure if this can make my concerns clearer for you now.

Thanks and best regards
-Lv

> 
> Would you please explain more about your comment ? I think maybe I
> missed something
> important to you guys. :)
> 
> And all the other ACPICA rules will be followed in the next version.
> 
> Thanks.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* RE: [PATCH v2 05/18] x86, acpi: Split acpi_boot_table_init() into two parts.
@ 2013-08-02  8:11           ` Zheng, Lv
  0 siblings, 0 replies; 98+ messages in thread
From: Zheng, Lv @ 2013-08-02  8:11 UTC (permalink / raw)
  To: Tang Chen
  Cc: Toshi Kani, rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn,
	yinghai, jiang.liu, wency, laijs, isimatu.yasuaki, izumi.taku,
	mgorman, minchan, mina86, gong.chen, vasilis.liaskovitis,
	lwoodman, riel, jweiner, prarit, zhangyanfei, yanghy, x86,
	linux-doc, linux-kernel, linux-mm, linux-acpi, Moore, Robert

> From: Tang Chen [mailto:tangchen@cn.fujitsu.com]
> Sent: Friday, August 02, 2013 3:01 PM
> 
> On 08/02/2013 01:25 PM, Zheng, Lv wrote:
> ......
> >>> index ce3d5db..9d68ffc 100644
> >>> --- a/drivers/acpi/acpica/tbutils.c
> >>> +++ b/drivers/acpi/acpica/tbutils.c
> >>> @@ -766,9 +766,30 @@
> >> acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
> >>>   	*/
> >>>   	acpi_os_unmap_memory(table, length);
> >>>
> >>> +	return_ACPI_STATUS(AE_OK);
> >>> +}
> >>> +
> >>>
> >
> > I don't think you can split the function here.
> > ACPICA still need to continue to parse the table using the logic
> implemented in the acpi_tb_install_table() and acpi_tb_parse_fadt().
> (for example, endianess of the signature).
> > You'd better to keep them as is and split some codes from
> 'acpi_tb_install_table' to form another function:
> acpi_tb_override_table().
> 
> I'm sorry, I don't quite follow this.
> 
> I split acpi_tb_parse_root_table(), not acpi_tb_install_table() and
> acpi_tb_parse_fadt().
> If ACPICA wants to use these two functions somewhere else, I think it is
> OK, isn't it?
> 
> And the reason I did this, please see below.
> 
> ......
> >>> + *
> >>> + * FUNCTION:    acpi_tb_install_root_table
> >
> > I think this function should be acpi_tb_override_tables, and call
> acpi_tb_override_table() inside this function for each table.
> 
> It is not just about acpi initrd table override.
> 
> acpi_tb_parse_root_table() was split into two steps:
> 1. initialize acpi_gbl_root_table_list
> 2. install tables into acpi_gbl_root_table_list
> 
> I need step1 earlier because I want to find SRAT at early time.
> But I don't want step2 earlier because before install the tables in
> firmware,
> acpi initrd table override could happen. I want only SRAT, I don't want to
> touch much existing code.

According to what you've explained, what you didn’t want to be called earlier is exactly "acpi initrd table override", please split only this logic to the step 2 and leave the others remained.
I think you should write a function named as acpi_override_tables() or likewise in tbxface.c to be executed as the OSPM entry of the step 2.
Inside this function, acpi_tb_table_override() should be called.

268 void
269 acpi_tb_install_table(acpi_physical_address address,
270                       char *signature, u32 table_index)
271 {

I think you still need the following codes to be called at the early stage.

272         struct acpi_table_header *table;
273         struct acpi_table_header *final_table;
274         struct acpi_table_desc *table_desc;
275 
276         if (!address) {
277                 ACPI_ERROR((AE_INFO,
278                             "Null physical address for ACPI table [%s]",
279                             signature));
280                 return;
281         }
282 
283         /* Map just the table header */
284 
285         table = acpi_os_map_memory(address, sizeof(struct acpi_table_header));
286         if (!table) {
287                 ACPI_ERROR((AE_INFO,
288                             "Could not map memory for table [%s] at %p",
289                             signature, ACPI_CAST_PTR(void, address)));
290                 return;
291         }
292 
293         /* If a particular signature is expected (DSDT/FACS), it must match */
294 
295         if (signature && !ACPI_COMPARE_NAME(table->signature, signature)) {
296                 ACPI_BIOS_ERROR((AE_INFO,
297                                  "Invalid signature 0x%X for ACPI table, expected [%s]",
298                                  *ACPI_CAST_PTR(u32, table->signature),
299                                  signature));
300                 goto unmap_and_exit;
301         }
302 
303         /*
304          * Initialize the table entry. Set the pointer to NULL, since the
305          * table is not fully mapped at this time.
306          */
307         table_desc = &acpi_gbl_root_table_list.tables[table_index];
308 
309         table_desc->address = address;
310         table_desc->pointer = NULL;
311         table_desc->length = table->length;
312         table_desc->flags = ACPI_TABLE_ORIGIN_MAPPED;
313         ACPI_MOVE_32_TO_32(table_desc->signature.ascii, table->signature);
314 

You should delete the following codes:

315         /*
316          * ACPI Table Override:
317          *
318          * Before we install the table, let the host OS override it with a new
319          * one if desired. Any table within the RSDT/XSDT can be replaced,
320          * including the DSDT which is pointed to by the FADT.
321          *
322          * NOTE: If the table is overridden, then final_table will contain a
323          * mapped pointer to the full new table. If the table is not overridden,
324          * or if there has been a physical override, then the table will be
325          * fully mapped later (in verify table). In any case, we must
326          * unmap the header that was mapped above.
327          */
328         final_table = acpi_tb_table_override(table, table_desc);
329         if (!final_table) {
330                 final_table = table;    /* There was no override */
331         }
332 

You still need to keep the following logic.

333         acpi_tb_print_table_header(table_desc->address, final_table);
334 
335         /* Set the global integer width (based upon revision of the DSDT) */
336 
337         if (table_index == ACPI_TABLE_INDEX_DSDT) {
338                 acpi_ut_set_integer_width(final_table->revision);
339         }
340 

You should delete the following codes:

341         /*
342          * If we have a physical override during this early loading of the ACPI
343          * tables, unmap the table for now. It will be mapped again later when
344          * it is actually used. This supports very early loading of ACPI tables,
345          * before virtual memory is fully initialized and running within the
346          * host OS. Note: A logical override has the ACPI_TABLE_ORIGIN_OVERRIDE
347          * flag set and will not be deleted below.
348          */
349         if (final_table != table) {
350                 acpi_tb_delete_table(table_desc);
351         }

Keep the following.

352 
353       unmap_and_exit:
354 
355         /* Always unmap the table header that we mapped above */
356 
357         acpi_os_unmap_memory(table, sizeof(struct acpi_table_header));
358 }

I'm not sure if this can make my concerns clearer for you now.

Thanks and best regards
-Lv

> 
> Would you please explain more about your comment ? I think maybe I
> missed something
> important to you guys. :)
> 
> And all the other ACPICA rules will be followed in the next version.
> 
> Thanks.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* RE: [PATCH v2 05/18] x86, acpi: Split acpi_boot_table_init() into two parts.
  2013-08-02  7:01         ` Tang Chen
@ 2013-08-02  8:23           ` Zheng, Lv
  -1 siblings, 0 replies; 98+ messages in thread
From: Zheng, Lv @ 2013-08-02  8:23 UTC (permalink / raw)
  To: Tang Chen
  Cc: Toshi Kani, rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn,
	yinghai, jiang.liu, wency, laijs, isimatu.yasuaki, izumi.taku,
	mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis@profitbricks.com

> From: Zheng, Lv
> Sent: Friday, August 02, 2013 4:11 PM
> 
> > From: Tang Chen [mailto:tangchen@cn.fujitsu.com]
> > Sent: Friday, August 02, 2013 3:01 PM
> >
> > On 08/02/2013 01:25 PM, Zheng, Lv wrote:
> > ......
> > >>> index ce3d5db..9d68ffc 100644
> > >>> --- a/drivers/acpi/acpica/tbutils.c
> > >>> +++ b/drivers/acpi/acpica/tbutils.c
> > >>> @@ -766,9 +766,30 @@
> > >> acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
> > >>>   	*/
> > >>>   	acpi_os_unmap_memory(table, length);
> > >>>
> > >>> +	return_ACPI_STATUS(AE_OK);
> > >>> +}
> > >>> +
> > >>>
> > >
> > > I don't think you can split the function here.
> > > ACPICA still need to continue to parse the table using the logic
> > implemented in the acpi_tb_install_table() and acpi_tb_parse_fadt().
> > (for example, endianess of the signature).
> > > You'd better to keep them as is and split some codes from
> > 'acpi_tb_install_table' to form another function:
> > acpi_tb_override_table().
> >
> > I'm sorry, I don't quite follow this.
> >
> > I split acpi_tb_parse_root_table(), not acpi_tb_install_table() and
> > acpi_tb_parse_fadt().
> > If ACPICA wants to use these two functions somewhere else, I think it
> is
> > OK, isn't it?
> >
> > And the reason I did this, please see below.
> >
> > ......
> > >>> + *
> > >>> + * FUNCTION:    acpi_tb_install_root_table
> > >
> > > I think this function should be acpi_tb_override_tables, and call
> > acpi_tb_override_table() inside this function for each table.
> >
> > It is not just about acpi initrd table override.
> >
> > acpi_tb_parse_root_table() was split into two steps:
> > 1. initialize acpi_gbl_root_table_list
> > 2. install tables into acpi_gbl_root_table_list
> >
> > I need step1 earlier because I want to find SRAT at early time.
> > But I don't want step2 earlier because before install the tables in
> > firmware,
> > acpi initrd table override could happen. I want only SRAT, I don't want
> to
> > touch much existing code.
> 
> According to what you've explained, what you didn’t want to be called
> earlier is exactly "acpi initrd table override", please split only this logic to
> the step 2 and leave the others remained.
> I think you should write a function named as acpi_override_tables() or
> likewise in tbxface.c to be executed as the OSPM entry of the step 2.
> Inside this function, acpi_tb_table_override() should be called.
> 
> 268 void
> 269 acpi_tb_install_table(acpi_physical_address address,
> 270                       char *signature, u32 table_index)
> 271 {
> 
> I think you still need the following codes to be called at the early stage.
> 
> 272         struct acpi_table_header *table;
> 273         struct acpi_table_header *final_table;
> 274         struct acpi_table_desc *table_desc;
> 275
> 276         if (!address) {
> 277                 ACPI_ERROR((AE_INFO,
> 278                             "Null physical address for ACPI
> table [%s]",
> 279                             signature));
> 280                 return;
> 281         }
> 282
> 283         /* Map just the table header */
> 284
> 285         table = acpi_os_map_memory(address, sizeof(struct
> acpi_table_header));
> 286         if (!table) {
> 287                 ACPI_ERROR((AE_INFO,
> 288                             "Could not map memory for
> table [%s] at %p",
> 289                             signature, ACPI_CAST_PTR(void,
> address)));
> 290                 return;
> 291         }
> 292
> 293         /* If a particular signature is expected (DSDT/FACS), it
> must match */
> 294
> 295         if (signature
> && !ACPI_COMPARE_NAME(table->signature, signature)) {
> 296                 ACPI_BIOS_ERROR((AE_INFO,
> 297                                  "Invalid signature 0x%X for
> ACPI table, expected [%s]",
> 298                                  *ACPI_CAST_PTR(u32,
> table->signature),
> 299                                  signature));
> 300                 goto unmap_and_exit;
> 301         }
> 302
> 303         /*
> 304          * Initialize the table entry. Set the pointer to NULL, since
> the
> 305          * table is not fully mapped at this time.
> 306          */
> 307         table_desc =
> &acpi_gbl_root_table_list.tables[table_index];
> 308
> 309         table_desc->address = address;
> 310         table_desc->pointer = NULL;
> 311         table_desc->length = table->length;
> 312         table_desc->flags = ACPI_TABLE_ORIGIN_MAPPED;
> 313         ACPI_MOVE_32_TO_32(table_desc->signature.ascii,
> table->signature);
> 314
> 
> You should delete the following codes:
> 
> 315         /*
> 316          * ACPI Table Override:
> 317          *
> 318          * Before we install the table, let the host OS override it
> with a new
> 319          * one if desired. Any table within the RSDT/XSDT can be
> replaced,
> 320          * including the DSDT which is pointed to by the FADT.
> 321          *
> 322          * NOTE: If the table is overridden, then final_table will
> contain a
> 323          * mapped pointer to the full new table. If the table is not
> overridden,
> 324          * or if there has been a physical override, then the table
> will be
> 325          * fully mapped later (in verify table). In any case, we
> must
> 326          * unmap the header that was mapped above.
> 327          */
> 328         final_table = acpi_tb_table_override(table, table_desc);
> 329         if (!final_table) {
> 330                 final_table = table;    /* There was no
> override */
> 331         }
> 332
> 
> You still need to keep the following logic.
> 
> 333         acpi_tb_print_table_header(table_desc->address,
> final_table);
> 334
> 335         /* Set the global integer width (based upon revision of the
> DSDT) */
> 336
> 337         if (table_index == ACPI_TABLE_INDEX_DSDT) {
> 338
> acpi_ut_set_integer_width(final_table->revision);
> 339         }
> 340
> 
> You should delete the following codes:
> 
> 341         /*
> 342          * If we have a physical override during this early loading
> of the ACPI
> 343          * tables, unmap the table for now. It will be mapped
> again later when
> 344          * it is actually used. This supports very early loading of
> ACPI tables,
> 345          * before virtual memory is fully initialized and running
> within the
> 346          * host OS. Note: A logical override has the
> ACPI_TABLE_ORIGIN_OVERRIDE
> 347          * flag set and will not be deleted below.
> 348          */
> 349         if (final_table != table) {
> 350                 acpi_tb_delete_table(table_desc);
> 351         }
> 
> Keep the following.
> 
> 352
> 353       unmap_and_exit:
> 354
> 355         /* Always unmap the table header that we mapped above
> */
> 356
> 357         acpi_os_unmap_memory(table, sizeof(struct
> acpi_table_header));
> 358 }
> 
> I'm not sure if this can make my concerns clearer for you now.

You might have concerns about how to handle FADT.

In acpi_override_tables, your codes might be looking like:

584         for (i = 0; i < acpi_gbl_root_table_list.current_table_count; i++) {

Just change the I from 2 to 0.

585                 acpi_tb_table_override (...);
595         }

You don't need to call acpi_tb_parse_table, the tables pointed by the FADT are DSDT and FACS, their index is 0 and 1.

Thanks and best regards
-Lv

> 
> Thanks and best regards
> -Lv
> 
> >
> > Would you please explain more about your comment ? I think maybe I
> > missed something
> > important to you guys. :)
> >
> > And all the other ACPICA rules will be followed in the next version.
> >
> > Thanks.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* RE: [PATCH v2 05/18] x86, acpi: Split acpi_boot_table_init() into two parts.
@ 2013-08-02  8:23           ` Zheng, Lv
  0 siblings, 0 replies; 98+ messages in thread
From: Zheng, Lv @ 2013-08-02  8:23 UTC (permalink / raw)
  To: Tang Chen
  Cc: Toshi Kani, rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn,
	yinghai, jiang.liu, wency, laijs, isimatu.yasuaki, izumi.taku,
	mgorman, minchan, mina86, gong.chen, vasilis.liaskovitis,
	lwoodman, riel, jweiner, prarit, zhangyanfei, yanghy, x86,
	linux-doc, linux-kernel, linux-mm, linux-acpi, Moore, Robert

> From: Zheng, Lv
> Sent: Friday, August 02, 2013 4:11 PM
> 
> > From: Tang Chen [mailto:tangchen@cn.fujitsu.com]
> > Sent: Friday, August 02, 2013 3:01 PM
> >
> > On 08/02/2013 01:25 PM, Zheng, Lv wrote:
> > ......
> > >>> index ce3d5db..9d68ffc 100644
> > >>> --- a/drivers/acpi/acpica/tbutils.c
> > >>> +++ b/drivers/acpi/acpica/tbutils.c
> > >>> @@ -766,9 +766,30 @@
> > >> acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
> > >>>   	*/
> > >>>   	acpi_os_unmap_memory(table, length);
> > >>>
> > >>> +	return_ACPI_STATUS(AE_OK);
> > >>> +}
> > >>> +
> > >>>
> > >
> > > I don't think you can split the function here.
> > > ACPICA still need to continue to parse the table using the logic
> > implemented in the acpi_tb_install_table() and acpi_tb_parse_fadt().
> > (for example, endianess of the signature).
> > > You'd better to keep them as is and split some codes from
> > 'acpi_tb_install_table' to form another function:
> > acpi_tb_override_table().
> >
> > I'm sorry, I don't quite follow this.
> >
> > I split acpi_tb_parse_root_table(), not acpi_tb_install_table() and
> > acpi_tb_parse_fadt().
> > If ACPICA wants to use these two functions somewhere else, I think it
> is
> > OK, isn't it?
> >
> > And the reason I did this, please see below.
> >
> > ......
> > >>> + *
> > >>> + * FUNCTION:    acpi_tb_install_root_table
> > >
> > > I think this function should be acpi_tb_override_tables, and call
> > acpi_tb_override_table() inside this function for each table.
> >
> > It is not just about acpi initrd table override.
> >
> > acpi_tb_parse_root_table() was split into two steps:
> > 1. initialize acpi_gbl_root_table_list
> > 2. install tables into acpi_gbl_root_table_list
> >
> > I need step1 earlier because I want to find SRAT at early time.
> > But I don't want step2 earlier because before install the tables in
> > firmware,
> > acpi initrd table override could happen. I want only SRAT, I don't want
> to
> > touch much existing code.
> 
> According to what you've explained, what you didn’t want to be called
> earlier is exactly "acpi initrd table override", please split only this logic to
> the step 2 and leave the others remained.
> I think you should write a function named as acpi_override_tables() or
> likewise in tbxface.c to be executed as the OSPM entry of the step 2.
> Inside this function, acpi_tb_table_override() should be called.
> 
> 268 void
> 269 acpi_tb_install_table(acpi_physical_address address,
> 270                       char *signature, u32 table_index)
> 271 {
> 
> I think you still need the following codes to be called at the early stage.
> 
> 272         struct acpi_table_header *table;
> 273         struct acpi_table_header *final_table;
> 274         struct acpi_table_desc *table_desc;
> 275
> 276         if (!address) {
> 277                 ACPI_ERROR((AE_INFO,
> 278                             "Null physical address for ACPI
> table [%s]",
> 279                             signature));
> 280                 return;
> 281         }
> 282
> 283         /* Map just the table header */
> 284
> 285         table = acpi_os_map_memory(address, sizeof(struct
> acpi_table_header));
> 286         if (!table) {
> 287                 ACPI_ERROR((AE_INFO,
> 288                             "Could not map memory for
> table [%s] at %p",
> 289                             signature, ACPI_CAST_PTR(void,
> address)));
> 290                 return;
> 291         }
> 292
> 293         /* If a particular signature is expected (DSDT/FACS), it
> must match */
> 294
> 295         if (signature
> && !ACPI_COMPARE_NAME(table->signature, signature)) {
> 296                 ACPI_BIOS_ERROR((AE_INFO,
> 297                                  "Invalid signature 0x%X for
> ACPI table, expected [%s]",
> 298                                  *ACPI_CAST_PTR(u32,
> table->signature),
> 299                                  signature));
> 300                 goto unmap_and_exit;
> 301         }
> 302
> 303         /*
> 304          * Initialize the table entry. Set the pointer to NULL, since
> the
> 305          * table is not fully mapped at this time.
> 306          */
> 307         table_desc =
> &acpi_gbl_root_table_list.tables[table_index];
> 308
> 309         table_desc->address = address;
> 310         table_desc->pointer = NULL;
> 311         table_desc->length = table->length;
> 312         table_desc->flags = ACPI_TABLE_ORIGIN_MAPPED;
> 313         ACPI_MOVE_32_TO_32(table_desc->signature.ascii,
> table->signature);
> 314
> 
> You should delete the following codes:
> 
> 315         /*
> 316          * ACPI Table Override:
> 317          *
> 318          * Before we install the table, let the host OS override it
> with a new
> 319          * one if desired. Any table within the RSDT/XSDT can be
> replaced,
> 320          * including the DSDT which is pointed to by the FADT.
> 321          *
> 322          * NOTE: If the table is overridden, then final_table will
> contain a
> 323          * mapped pointer to the full new table. If the table is not
> overridden,
> 324          * or if there has been a physical override, then the table
> will be
> 325          * fully mapped later (in verify table). In any case, we
> must
> 326          * unmap the header that was mapped above.
> 327          */
> 328         final_table = acpi_tb_table_override(table, table_desc);
> 329         if (!final_table) {
> 330                 final_table = table;    /* There was no
> override */
> 331         }
> 332
> 
> You still need to keep the following logic.
> 
> 333         acpi_tb_print_table_header(table_desc->address,
> final_table);
> 334
> 335         /* Set the global integer width (based upon revision of the
> DSDT) */
> 336
> 337         if (table_index == ACPI_TABLE_INDEX_DSDT) {
> 338
> acpi_ut_set_integer_width(final_table->revision);
> 339         }
> 340
> 
> You should delete the following codes:
> 
> 341         /*
> 342          * If we have a physical override during this early loading
> of the ACPI
> 343          * tables, unmap the table for now. It will be mapped
> again later when
> 344          * it is actually used. This supports very early loading of
> ACPI tables,
> 345          * before virtual memory is fully initialized and running
> within the
> 346          * host OS. Note: A logical override has the
> ACPI_TABLE_ORIGIN_OVERRIDE
> 347          * flag set and will not be deleted below.
> 348          */
> 349         if (final_table != table) {
> 350                 acpi_tb_delete_table(table_desc);
> 351         }
> 
> Keep the following.
> 
> 352
> 353       unmap_and_exit:
> 354
> 355         /* Always unmap the table header that we mapped above
> */
> 356
> 357         acpi_os_unmap_memory(table, sizeof(struct
> acpi_table_header));
> 358 }
> 
> I'm not sure if this can make my concerns clearer for you now.

You might have concerns about how to handle FADT.

In acpi_override_tables, your codes might be looking like:

584         for (i = 0; i < acpi_gbl_root_table_list.current_table_count; i++) {

Just change the I from 2 to 0.

585                 acpi_tb_table_override (...);
595         }

You don't need to call acpi_tb_parse_table, the tables pointed by the FADT are DSDT and FACS, their index is 0 and 1.

Thanks and best regards
-Lv

> 
> Thanks and best regards
> -Lv
> 
> >
> > Would you please explain more about your comment ? I think maybe I
> > missed something
> > important to you guys. :)
> >
> > And all the other ACPICA rules will be followed in the next version.
> >
> > Thanks.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 05/18] x86, acpi: Split acpi_boot_table_init() into two parts.
  2013-08-02  8:23           ` Zheng, Lv
@ 2013-08-02  8:29             ` Tang Chen
  -1 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-02  8:29 UTC (permalink / raw)
  To: Zheng, Lv
  Cc: Toshi Kani, rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn,
	yinghai, jiang.liu, wency, laijs, isimatu.yasuaki, izumi.taku,
	mgorman, minchan, mina86, gong.chen

On 08/02/2013 04:23 PM, Zheng, Lv wrote:
......
>> According to what you've explained, what you didn’t want to be called
>> earlier is exactly "acpi initrd table override", please split only this logic to
>> the step 2 and leave the others remained.
>> I think you should write a function named as acpi_override_tables() or
>> likewise in tbxface.c to be executed as the OSPM entry of the step 2.
>> Inside this function, acpi_tb_table_override() should be called.
......

OK, I understand what you are suggesting now. It is reasonable.
I'll update the patch-set in the next version.

But today, I just rebased it to the latest kernel. I'll resend this
rebased v2 patch-set so that Tj and other guys can review it.

I'll include all of your comments in the v3 patch-set. Thank you very 
much. :)

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 05/18] x86, acpi: Split acpi_boot_table_init() into two parts.
@ 2013-08-02  8:29             ` Tang Chen
  0 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-02  8:29 UTC (permalink / raw)
  To: Zheng, Lv
  Cc: Toshi Kani, rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn,
	yinghai, jiang.liu, wency, laijs, isimatu.yasuaki, izumi.taku,
	mgorman, minchan, mina86, gong.chen, vasilis.liaskovitis,
	lwoodman, riel, jweiner, prarit, zhangyanfei, yanghy, x86,
	linux-doc, linux-kernel, linux-mm, linux-acpi, Moore, Robert

On 08/02/2013 04:23 PM, Zheng, Lv wrote:
......
>> According to what you've explained, what you didn’t want to be called
>> earlier is exactly "acpi initrd table override", please split only this logic to
>> the step 2 and leave the others remained.
>> I think you should write a function named as acpi_override_tables() or
>> likewise in tbxface.c to be executed as the OSPM entry of the step 2.
>> Inside this function, acpi_tb_table_override() should be called.
......

OK, I understand what you are suggesting now. It is reasonable.
I'll update the patch-set in the next version.

But today, I just rebased it to the latest kernel. I'll resend this
rebased v2 patch-set so that Tj and other guys can review it.

I'll include all of your comments in the v3 patch-set. Thank you very 
much. :)

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* RE: [PATCH v2 05/18] x86, acpi: Split acpi_boot_table_init() into two parts.
  2013-08-02  8:29             ` Tang Chen
@ 2013-08-02  8:54               ` Zheng, Lv
  -1 siblings, 0 replies; 98+ messages in thread
From: Zheng, Lv @ 2013-08-02  8:54 UTC (permalink / raw)
  To: Tang Chen
  Cc: Toshi Kani, rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn,
	yinghai, jiang.liu, wency, laijs, isimatu.yasuaki, izumi.taku,
	mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis@profitbricks.com

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1320 bytes --]

> From: Tang Chen [mailto:tangchen@cn.fujitsu.com]
> Sent: Friday, August 02, 2013 4:29 PM
> 
> On 08/02/2013 04:23 PM, Zheng, Lv wrote:
> ......
> >> According to what you've explained, what you didn’t want to be
> called
> >> earlier is exactly "acpi initrd table override", please split only this logic
> to
> >> the step 2 and leave the others remained.
> >> I think you should write a function named as acpi_override_tables() or
> >> likewise in tbxface.c to be executed as the OSPM entry of the step 2.
> >> Inside this function, acpi_tb_table_override() should be called.
> ......
> 
> OK, I understand what you are suggesting now. It is reasonable.
> I'll update the patch-set in the next version.
> 
> But today, I just rebased it to the latest kernel. I'll resend this
> rebased v2 patch-set so that Tj and other guys can review it.
> 
> I'll include all of your comments in the v3 patch-set. Thank you very
> much. :)

If the review process takes longer time, you could also let ACPICA folks to do this first in ACPICA, you'll find the commit in the next release cycle.
In this way, there won't be source code divergences between Linux and ACPICA.

Thanks
-Lv

> 
> Thanks.

N‹§²æìr¸›zǧu©ž²Æ {\b­†éì¹»\x1c®&Þ–)îÆi¢žØ^n‡r¶‰šŽŠÝ¢j$½§$¢¸\x05¢¹¨­è§~Š'.)îÄÃ,yèm¶ŸÿÃ\f%Š{±šj+ƒðèž×¦j)Z†·Ÿ

^ permalink raw reply	[flat|nested] 98+ messages in thread

* RE: [PATCH v2 05/18] x86, acpi: Split acpi_boot_table_init() into two parts.
@ 2013-08-02  8:54               ` Zheng, Lv
  0 siblings, 0 replies; 98+ messages in thread
From: Zheng, Lv @ 2013-08-02  8:54 UTC (permalink / raw)
  To: Tang Chen
  Cc: Toshi Kani, rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn,
	yinghai, jiang.liu, wency, laijs, isimatu.yasuaki, izumi.taku,
	mgorman, minchan, mina86, gong.chen, vasilis.liaskovitis,
	lwoodman, riel, jweiner, prarit, zhangyanfei, yanghy, x86,
	linux-doc, linux-kernel, linux-mm, linux-acpi, Moore, Robert

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1320 bytes --]

> From: Tang Chen [mailto:tangchen@cn.fujitsu.com]
> Sent: Friday, August 02, 2013 4:29 PM
> 
> On 08/02/2013 04:23 PM, Zheng, Lv wrote:
> ......
> >> According to what you've explained, what you didn’t want to be
> called
> >> earlier is exactly "acpi initrd table override", please split only this logic
> to
> >> the step 2 and leave the others remained.
> >> I think you should write a function named as acpi_override_tables() or
> >> likewise in tbxface.c to be executed as the OSPM entry of the step 2.
> >> Inside this function, acpi_tb_table_override() should be called.
> ......
> 
> OK, I understand what you are suggesting now. It is reasonable.
> I'll update the patch-set in the next version.
> 
> But today, I just rebased it to the latest kernel. I'll resend this
> rebased v2 patch-set so that Tj and other guys can review it.
> 
> I'll include all of your comments in the v3 patch-set. Thank you very
> much. :)

If the review process takes longer time, you could also let ACPICA folks to do this first in ACPICA, you'll find the commit in the next release cycle.
In this way, there won't be source code divergences between Linux and ACPICA.

Thanks
-Lv

> 
> Thanks.

N‹§²æìr¸›zǧu©ž²Æ {\b­†éì¹»\x1c®&Þ–)îÆi¢žØ^n‡r¶‰šŽŠÝ¢j$½§$¢¸\x05¢¹¨­è§~Š'.)îÄÃ,yèm¶ŸÿÃ\f%Š{±šj+ƒðèž×¦j)Z†·Ÿ

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 05/18] x86, acpi: Split acpi_boot_table_init() into two parts.
  2013-08-02  8:54               ` Zheng, Lv
@ 2013-08-02  9:13                 ` Tang Chen
  -1 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-02  9:13 UTC (permalink / raw)
  To: Zheng, Lv
  Cc: Toshi Kani, rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn,
	yinghai, jiang.liu, wency, laijs, isimatu.yasuaki, izumi.taku,
	mgorman, minchan, mina86, gong.chen

On 08/02/2013 04:54 PM, Zheng, Lv wrote:
>> From: Tang Chen [mailto:tangchen@cn.fujitsu.com]
>> Sent: Friday, August 02, 2013 4:29 PM
>>
>> On 08/02/2013 04:23 PM, Zheng, Lv wrote:
>> ......
>>>> According to what you've explained, what you didn’t want to be
>> called
>>>> earlier is exactly "acpi initrd table override", please split only this logic
>> to
>>>> the step 2 and leave the others remained.
>>>> I think you should write a function named as acpi_override_tables() or
>>>> likewise in tbxface.c to be executed as the OSPM entry of the step 2.
>>>> Inside this function, acpi_tb_table_override() should be called.
>> ......
>>
>> OK, I understand what you are suggesting now. It is reasonable.
>> I'll update the patch-set in the next version.
>>
>> But today, I just rebased it to the latest kernel. I'll resend this
>> rebased v2 patch-set so that Tj and other guys can review it.
>>
>> I'll include all of your comments in the v3 patch-set. Thank you very
>> much. :)
>
> If the review process takes longer time, you could also let ACPICA folks to do this first in ACPICA, you'll find the commit in the next release cycle.
> In this way, there won't be source code divergences between Linux and ACPICA.

Thank you very much. Will add you and Bob to the cc list.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 05/18] x86, acpi: Split acpi_boot_table_init() into two parts.
@ 2013-08-02  9:13                 ` Tang Chen
  0 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-02  9:13 UTC (permalink / raw)
  To: Zheng, Lv
  Cc: Toshi Kani, rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn,
	yinghai, jiang.liu, wency, laijs, isimatu.yasuaki, izumi.taku,
	mgorman, minchan, mina86, gong.chen, vasilis.liaskovitis,
	lwoodman, riel, jweiner, prarit, zhangyanfei, yanghy, x86,
	linux-doc, linux-kernel, linux-mm, linux-acpi, Moore, Robert

On 08/02/2013 04:54 PM, Zheng, Lv wrote:
>> From: Tang Chen [mailto:tangchen@cn.fujitsu.com]
>> Sent: Friday, August 02, 2013 4:29 PM
>>
>> On 08/02/2013 04:23 PM, Zheng, Lv wrote:
>> ......
>>>> According to what you've explained, what you didn’t want to be
>> called
>>>> earlier is exactly "acpi initrd table override", please split only this logic
>> to
>>>> the step 2 and leave the others remained.
>>>> I think you should write a function named as acpi_override_tables() or
>>>> likewise in tbxface.c to be executed as the OSPM entry of the step 2.
>>>> Inside this function, acpi_tb_table_override() should be called.
>> ......
>>
>> OK, I understand what you are suggesting now. It is reasonable.
>> I'll update the patch-set in the next version.
>>
>> But today, I just rebased it to the latest kernel. I'll resend this
>> rebased v2 patch-set so that Tj and other guys can review it.
>>
>> I'll include all of your comments in the v3 patch-set. Thank you very
>> much. :)
>
> If the review process takes longer time, you could also let ACPICA folks to do this first in ACPICA, you'll find the commit in the next release cycle.
> In this way, there won't be source code divergences between Linux and ACPICA.

Thank you very much. Will add you and Bob to the cc list.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 10/18] x86, acpi: Try to find if SRAT is overrided earlier.
  2013-08-02  5:49       ` Tang Chen
@ 2013-08-02 16:05         ` Toshi Kani
  -1 siblings, 0 replies; 98+ messages in thread
From: Toshi Kani @ 2013-08-02 16:05 UTC (permalink / raw)
  To: Tang Chen
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On Fri, 2013-08-02 at 13:49 +0800, Tang Chen wrote:
> On 08/02/2013 09:19 AM, Toshi Kani wrote:
> ......
> >> +phys_addr_t __init early_acpi_override_srat(void)
> >> +{
> >> +	int i;
> >> +	u32 length;
> >> +	long offset;
> >> +	void *ramdisk_vaddr;
> >> +	struct acpi_table_header *table;
> >> +	struct cpio_data file;
> >> +	unsigned long map_step = NR_FIX_BTMAPS<<  PAGE_SHIFT;
> >> +	phys_addr_t ramdisk_image = get_ramdisk_image();
> >> +	char cpio_path[32] = "kernel/firmware/acpi/";
> >
> > Don't you need to check if ramdisk is present before parsing the table?
> > You may need something like:
> >
> >    if (!ramdisk_image || !get_ramdisk_size())
> >          return 0;
> 
> Yes, it is better to do such a check here. But is there a possibility that
> no ramdisk is present and we come to setup_arch() ?

Without a ramdisk, the boot procedure will likely fail in mounting the
root disk due to missing drivers.  But it should come to setup_arch()
without it.

Thanks,
-Toshi


> 
> ......
> >> +
> >> +	return ramdisk_image;
> >
> > Doesn't this function return a physical address regardless of SRAT if a
> > ramdisk is present?
> 
> Yes, and it is not good. I'll add the check above so that this won't happen.
> 
> Thanks.
> 



^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 10/18] x86, acpi: Try to find if SRAT is overrided earlier.
@ 2013-08-02 16:05         ` Toshi Kani
  0 siblings, 0 replies; 98+ messages in thread
From: Toshi Kani @ 2013-08-02 16:05 UTC (permalink / raw)
  To: Tang Chen
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On Fri, 2013-08-02 at 13:49 +0800, Tang Chen wrote:
> On 08/02/2013 09:19 AM, Toshi Kani wrote:
> ......
> >> +phys_addr_t __init early_acpi_override_srat(void)
> >> +{
> >> +	int i;
> >> +	u32 length;
> >> +	long offset;
> >> +	void *ramdisk_vaddr;
> >> +	struct acpi_table_header *table;
> >> +	struct cpio_data file;
> >> +	unsigned long map_step = NR_FIX_BTMAPS<<  PAGE_SHIFT;
> >> +	phys_addr_t ramdisk_image = get_ramdisk_image();
> >> +	char cpio_path[32] = "kernel/firmware/acpi/";
> >
> > Don't you need to check if ramdisk is present before parsing the table?
> > You may need something like:
> >
> >    if (!ramdisk_image || !get_ramdisk_size())
> >          return 0;
> 
> Yes, it is better to do such a check here. But is there a possibility that
> no ramdisk is present and we come to setup_arch() ?

Without a ramdisk, the boot procedure will likely fail in mounting the
root disk due to missing drivers.  But it should come to setup_arch()
without it.

Thanks,
-Toshi


> 
> ......
> >> +
> >> +	return ramdisk_image;
> >
> > Doesn't this function return a physical address regardless of SRAT if a
> > ramdisk is present?
> 
> Yes, and it is not good. I'll add the check above so that this won't happen.
> 
> Thanks.
> 


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 06/18] x86, acpi: Initialize ACPI root table list earlier.
  2013-08-02  7:49       ` Tang Chen
@ 2013-08-02 16:57         ` Toshi Kani
  -1 siblings, 0 replies; 98+ messages in thread
From: Toshi Kani @ 2013-08-02 16:57 UTC (permalink / raw)
  To: Tang Chen
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On Fri, 2013-08-02 at 15:49 +0800, Tang Chen wrote:
> On 08/02/2013 07:54 AM, Toshi Kani wrote:
> ......
> >>   /*
> >> + * acpi_root_table_init - Initialize acpi_gbl_root_table_list.
> >> + *
> >> + * This function will parse RSDT or XSDT, find all tables' phys addr,
> >> + * initialize acpi_gbl_root_table_list, and record all tables' phys addr
> >> + * in acpi_gbl_root_table_list.
> >> + */
> >> +void __init acpi_root_table_init(void)
> >
> > I think acpi_root_table_init() is a bit confusing with
> > acpi_boot_table_init().  Perhaps, something like
> > acpi_boot_table_pre_init() or early_acpi_boot_table_init() is better to
> > indicate that this new function is called before acpi_boot_table_init().
> >
> 
> OK, will change it to early_acpi_boot_table_init().
> 
> >> +{
> >> +	dmi_check_system(acpi_dmi_table);
> >> +
> >> +	/* If acpi_disabled, bail out */
> >> +	if (acpi_disabled)
> >> +		return;
> >> +
> >> +	/* Initialize the ACPI boot-time table parser */
> >> +	if (acpi_table_init()) {
> >> +		disable_acpi();
> >> +		return;
> >> +	}
> >> +}
> >> +
> >> +/*
> >>    * acpi_boot_table_init() and acpi_boot_init()
> >>    *  called from setup_arch(), always.
> >>    *	1. checksums all tables
> >> @@ -1511,21 +1533,7 @@ static struct dmi_system_id __initdata acpi_dmi_table_late[] = {
> >>
> >>   void __init acpi_boot_table_init(void)
> >
> > The comment of this function needs to be updated.  For instance, it
> > describes acpi_table_init(), which you just relocated.
> >
> >   * acpi_table_init() is separate to allow reading SRAT without
> >   * other side effects.
> >   *
> 
> Sure. But I don't quite understand this comment. It seems that
> acpi_table_init() has nothing to do with SRAT.
> 
> Do you know anything about this ?

Well, I do not know, either.  But if I have to guess, it might mean that
"acpi_table_init() is separated from acpi_boot_init() to allow reading
SRAT without the conditional flags, ex. acpi_lapic and acpi_ioapic, in
acpi_boot_init()."

I'd suggest you simply rephrase it to match with your change, instead of
trying to keep such old history.

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 06/18] x86, acpi: Initialize ACPI root table list earlier.
@ 2013-08-02 16:57         ` Toshi Kani
  0 siblings, 0 replies; 98+ messages in thread
From: Toshi Kani @ 2013-08-02 16:57 UTC (permalink / raw)
  To: Tang Chen
  Cc: rjw, lenb, tglx, mingo, hpa, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On Fri, 2013-08-02 at 15:49 +0800, Tang Chen wrote:
> On 08/02/2013 07:54 AM, Toshi Kani wrote:
> ......
> >>   /*
> >> + * acpi_root_table_init - Initialize acpi_gbl_root_table_list.
> >> + *
> >> + * This function will parse RSDT or XSDT, find all tables' phys addr,
> >> + * initialize acpi_gbl_root_table_list, and record all tables' phys addr
> >> + * in acpi_gbl_root_table_list.
> >> + */
> >> +void __init acpi_root_table_init(void)
> >
> > I think acpi_root_table_init() is a bit confusing with
> > acpi_boot_table_init().  Perhaps, something like
> > acpi_boot_table_pre_init() or early_acpi_boot_table_init() is better to
> > indicate that this new function is called before acpi_boot_table_init().
> >
> 
> OK, will change it to early_acpi_boot_table_init().
> 
> >> +{
> >> +	dmi_check_system(acpi_dmi_table);
> >> +
> >> +	/* If acpi_disabled, bail out */
> >> +	if (acpi_disabled)
> >> +		return;
> >> +
> >> +	/* Initialize the ACPI boot-time table parser */
> >> +	if (acpi_table_init()) {
> >> +		disable_acpi();
> >> +		return;
> >> +	}
> >> +}
> >> +
> >> +/*
> >>    * acpi_boot_table_init() and acpi_boot_init()
> >>    *  called from setup_arch(), always.
> >>    *	1. checksums all tables
> >> @@ -1511,21 +1533,7 @@ static struct dmi_system_id __initdata acpi_dmi_table_late[] = {
> >>
> >>   void __init acpi_boot_table_init(void)
> >
> > The comment of this function needs to be updated.  For instance, it
> > describes acpi_table_init(), which you just relocated.
> >
> >   * acpi_table_init() is separate to allow reading SRAT without
> >   * other side effects.
> >   *
> 
> Sure. But I don't quite understand this comment. It seems that
> acpi_table_init() has nothing to do with SRAT.
> 
> Do you know anything about this ?

Well, I do not know, either.  But if I have to guess, it might mean that
"acpi_table_init() is separated from acpi_boot_init() to allow reading
SRAT without the conditional flags, ex. acpi_lapic and acpi_ioapic, in
acpi_boot_init()."

I'd suggest you simply rephrase it to match with your change, instead of
trying to keep such old history.

Thanks,
-Toshi

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 00/18] Arrange hotpluggable memory as ZONE_MOVABLE.
  2013-08-01  7:06 ` Tang Chen
@ 2013-08-05 13:07   ` H. Peter Anvin
  -1 siblings, 0 replies; 98+ messages in thread
From: H. Peter Anvin @ 2013-08-05 13:07 UTC (permalink / raw)
  To: Tang Chen
  Cc: rjw, lenb, tglx, mingo, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On 08/01/2013 12:06 AM, Tang Chen wrote:
> This patch-set aims to solve some problems at system boot time
> to enhance memory hotplug functionality.
> 
> [Background]
> 
> The Linux kernel cannot migrate pages used by the kernel because
> of the kernel direct mapping. Since va = pa + PAGE_OFFSET, if the
> physical address is changed, we cannot simply update the kernel
> pagetable. On the contrary, we have to update all the pointers
> pointing to the virtual address, which is very difficult to do.
> 

It does beg the question if that "since" statement should be changed ...
we already have it handled differently on Xen PV, but that is kind of
"special".  There are a whole bunch of other issues with moving kernel
memory around: you have to worry what might have a physical address
cached somewhere and what might be in active use and so on... I am not
really suggesting it as anything but food for thought at this time.

	-hpa


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 00/18] Arrange hotpluggable memory as ZONE_MOVABLE.
@ 2013-08-05 13:07   ` H. Peter Anvin
  0 siblings, 0 replies; 98+ messages in thread
From: H. Peter Anvin @ 2013-08-05 13:07 UTC (permalink / raw)
  To: Tang Chen
  Cc: rjw, lenb, tglx, mingo, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On 08/01/2013 12:06 AM, Tang Chen wrote:
> This patch-set aims to solve some problems at system boot time
> to enhance memory hotplug functionality.
> 
> [Background]
> 
> The Linux kernel cannot migrate pages used by the kernel because
> of the kernel direct mapping. Since va = pa + PAGE_OFFSET, if the
> physical address is changed, we cannot simply update the kernel
> pagetable. On the contrary, we have to update all the pointers
> pointing to the virtual address, which is very difficult to do.
> 

It does beg the question if that "since" statement should be changed ...
we already have it handled differently on Xen PV, but that is kind of
"special".  There are a whole bunch of other issues with moving kernel
memory around: you have to worry what might have a physical address
cached somewhere and what might be in active use and so on... I am not
really suggesting it as anything but food for thought at this time.

	-hpa



^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 00/18] Arrange hotpluggable memory as ZONE_MOVABLE.
  2013-08-05 13:07   ` H. Peter Anvin
@ 2013-08-05 13:38     ` Tang Chen
  -1 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-05 13:38 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: rjw, lenb, tglx, mingo, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi


Hi hpa,

I'm sorry but I don't quite following it.

On 08/05/2013 09:07 PM, H. Peter Anvin wrote:
> On 08/01/2013 12:06 AM, Tang Chen wrote:
>> This patch-set aims to solve some problems at system boot time
>> to enhance memory hotplug functionality.
>>
>> [Background]
>>
>> The Linux kernel cannot migrate pages used by the kernel because
>> of the kernel direct mapping. Since va = pa + PAGE_OFFSET, if the
>> physical address is changed, we cannot simply update the kernel
>> pagetable. On the contrary, we have to update all the pointers
>> pointing to the virtual address, which is very difficult to do.
>>
>
> It does beg the question if that "since" statement should be changed ...

Do you mean we are going to do kernel page migration in the future ?

> we already have it handled differently on Xen PV, but that is kind of
> "special".  There are a whole bunch of other issues with moving kernel
> memory around: you have to worry what might have a physical address
> cached somewhere and what might be in active use and so on...

The current solution is to hotplug ZONE_MOVABLE, which the kernel won't
use. So do you mean if I want to do kernel page migration (which I'm not
doing), I need to worry about what you said above ?

>I am not
> really suggesting it as anything but food for thought at this time.

Sorry for my poor English, and I really cannot understand this one.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 00/18] Arrange hotpluggable memory as ZONE_MOVABLE.
@ 2013-08-05 13:38     ` Tang Chen
  0 siblings, 0 replies; 98+ messages in thread
From: Tang Chen @ 2013-08-05 13:38 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: rjw, lenb, tglx, mingo, akpm, tj, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi


Hi hpa,

I'm sorry but I don't quite following it.

On 08/05/2013 09:07 PM, H. Peter Anvin wrote:
> On 08/01/2013 12:06 AM, Tang Chen wrote:
>> This patch-set aims to solve some problems at system boot time
>> to enhance memory hotplug functionality.
>>
>> [Background]
>>
>> The Linux kernel cannot migrate pages used by the kernel because
>> of the kernel direct mapping. Since va = pa + PAGE_OFFSET, if the
>> physical address is changed, we cannot simply update the kernel
>> pagetable. On the contrary, we have to update all the pointers
>> pointing to the virtual address, which is very difficult to do.
>>
>
> It does beg the question if that "since" statement should be changed ...

Do you mean we are going to do kernel page migration in the future ?

> we already have it handled differently on Xen PV, but that is kind of
> "special".  There are a whole bunch of other issues with moving kernel
> memory around: you have to worry what might have a physical address
> cached somewhere and what might be in active use and so on...

The current solution is to hotplug ZONE_MOVABLE, which the kernel won't
use. So do you mean if I want to do kernel page migration (which I'm not
doing), I need to worry about what you said above ?

>I am not
> really suggesting it as anything but food for thought at this time.

Sorry for my poor English, and I really cannot understand this one.

Thanks.

^ permalink raw reply	[flat|nested] 98+ messages in thread

end of thread, other threads:[~2013-08-05 13:39 UTC | newest]

Thread overview: 98+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-01  7:06 [PATCH v2 00/18] Arrange hotpluggable memory as ZONE_MOVABLE Tang Chen
2013-08-01  7:06 ` Tang Chen
2013-08-01  7:06 ` [PATCH v2 01/18] acpi: Print Hot-Pluggable Field in SRAT Tang Chen
2013-08-01  7:06   ` Tang Chen
2013-08-01 21:55   ` Toshi Kani
2013-08-01 21:55     ` Toshi Kani
2013-08-01  7:06 ` [PATCH v2 02/18] earlycpio.c: Fix the confusing comment of find_cpio_data() Tang Chen
2013-08-01  7:06   ` Tang Chen
2013-08-01 21:57   ` Toshi Kani
2013-08-01 21:57     ` Toshi Kani
2013-08-02  4:48     ` Tang Chen
2013-08-02  4:48       ` Tang Chen
2013-08-01  7:06 ` [PATCH v2 03/18] acpi: Remove "continue" in macro INVALID_TABLE() Tang Chen
2013-08-01  7:06   ` Tang Chen
2013-08-01 20:26   ` Rafael J. Wysocki
2013-08-01 20:26     ` Rafael J. Wysocki
2013-08-01 22:06   ` Toshi Kani
2013-08-01 22:06     ` Toshi Kani
2013-08-02  1:32     ` Tang Chen
2013-08-02  1:32       ` Tang Chen
2013-08-01  7:06 ` [PATCH v2 04/18] acpi: Introduce acpi_invalid_table() to check if a table is invalid Tang Chen
2013-08-01  7:06   ` Tang Chen
2013-08-01 20:27   ` Rafael J. Wysocki
2013-08-01 20:27     ` Rafael J. Wysocki
2013-08-01 22:26   ` Toshi Kani
2013-08-01 22:26     ` Toshi Kani
2013-08-02  1:34     ` Tang Chen
2013-08-02  1:34       ` Tang Chen
2013-08-01  7:06 ` [PATCH v2 05/18] x86, acpi: Split acpi_boot_table_init() into two parts Tang Chen
2013-08-01  7:06   ` Tang Chen
2013-08-01 23:32   ` Toshi Kani
2013-08-01 23:32     ` Toshi Kani
2013-08-02  5:25     ` Zheng, Lv
2013-08-02  5:25       ` Zheng, Lv
2013-08-02  7:01       ` Tang Chen
2013-08-02  7:01         ` Tang Chen
2013-08-02  8:11         ` Zheng, Lv
2013-08-02  8:11           ` Zheng, Lv
2013-08-02  8:23         ` Zheng, Lv
2013-08-02  8:23           ` Zheng, Lv
2013-08-02  8:29           ` Tang Chen
2013-08-02  8:29             ` Tang Chen
2013-08-02  8:54             ` Zheng, Lv
2013-08-02  8:54               ` Zheng, Lv
2013-08-02  9:13               ` Tang Chen
2013-08-02  9:13                 ` Tang Chen
2013-08-01  7:06 ` [PATCH v2 06/18] x86, acpi: Initialize ACPI root table list earlier Tang Chen
2013-08-01  7:06   ` Tang Chen
2013-08-01 23:54   ` Toshi Kani
2013-08-01 23:54     ` Toshi Kani
2013-08-02  7:49     ` Tang Chen
2013-08-02  7:49       ` Tang Chen
2013-08-02 16:57       ` Toshi Kani
2013-08-02 16:57         ` Toshi Kani
2013-08-01  7:06 ` [PATCH v2 07/18] x86, acpi: Also initialize signature and length when parsing root table Tang Chen
2013-08-01  7:06   ` Tang Chen
2013-08-02  0:10   ` Toshi Kani
2013-08-02  0:10     ` Toshi Kani
2013-08-02  5:28     ` Zheng, Lv
2013-08-02  5:28       ` Zheng, Lv
2013-08-01  7:06 ` [PATCH v2 08/18] x86: get pg_data_t's memory from other node Tang Chen
2013-08-01  7:06   ` Tang Chen
2013-08-02  0:23   ` Toshi Kani
2013-08-02  0:23     ` Toshi Kani
2013-08-01  7:06 ` [PATCH v2 09/18] x86: Make get_ramdisk_{image|size}() global Tang Chen
2013-08-01  7:06   ` Tang Chen
2013-08-01  7:06 ` [PATCH v2 10/18] x86, acpi: Try to find if SRAT is overrided earlier Tang Chen
2013-08-01  7:06   ` Tang Chen
2013-08-02  1:19   ` Toshi Kani
2013-08-02  1:19     ` Toshi Kani
2013-08-02  5:49     ` Tang Chen
2013-08-02  5:49       ` Tang Chen
2013-08-02 16:05       ` Toshi Kani
2013-08-02 16:05         ` Toshi Kani
2013-08-01  7:06 ` [PATCH v2 11/18] x86, acpi: Try to find SRAT in firmware earlier Tang Chen
2013-08-01  7:06   ` Tang Chen
2013-08-01  7:06 ` [PATCH v2 12/18] x86, acpi, numa, mem_hotplug: Find hotpluggable memory in SRAT memory affinities Tang Chen
2013-08-01  7:06   ` Tang Chen
2013-08-01  7:06 ` [PATCH v2 13/18] x86, numa, mem_hotplug: Skip all the regions the kernel resides in Tang Chen
2013-08-01  7:06   ` Tang Chen
2013-08-01 13:42   ` Tejun Heo
2013-08-01 13:42     ` Tejun Heo
2013-08-02  5:51     ` Tang Chen
2013-08-02  5:51       ` Tang Chen
2013-08-01  7:06 ` [PATCH v2 14/18] memblock, numa: Introduce flag into memblock Tang Chen
2013-08-01  7:06   ` Tang Chen
2013-08-01  7:06 ` [PATCH v2 15/18] memblock, mem_hotplug: Introduce MEMBLOCK_HOTPLUG flag to mark hotpluggable regions Tang Chen
2013-08-01  7:06   ` Tang Chen
2013-08-01  7:06 ` [PATCH v2 16/18] memblock, mem_hotplug: Make memblock skip hotpluggable regions by default Tang Chen
2013-08-01  7:06   ` Tang Chen
2013-08-01  7:06 ` [PATCH v2 17/18] mem-hotplug: Introduce movablenode boot option to {en|dis}able using SRAT Tang Chen
2013-08-01  7:06   ` Tang Chen
2013-08-01  7:06 ` [PATCH v2 18/18] x86, numa, acpi, memory-hotplug: Make movablenode have higher priority Tang Chen
2013-08-01  7:06   ` Tang Chen
2013-08-05 13:07 ` [PATCH v2 00/18] Arrange hotpluggable memory as ZONE_MOVABLE H. Peter Anvin
2013-08-05 13:07   ` H. Peter Anvin
2013-08-05 13:38   ` Tang Chen
2013-08-05 13:38     ` Tang Chen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.