* [PATCH v2 RESEND 00/18] Arrange hotpluggable memory as ZONE_MOVABLE.
From: Tang Chen @ 2013-08-02  9:14 UTC
  To: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, tj,
	trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

Rebased to Linux 3.11-rc3, and followed some advice from Toshi Kani.
Please refer to the change log at the end of this cover letter.

This patch-set aims to solve some problems at system boot time
to enhance memory hotplug functionality.

[Background]

The Linux kernel cannot migrate pages used by the kernel itself because
of the kernel direct mapping. Since va = pa + PAGE_OFFSET, if the
physical address changes, we cannot simply update the kernel
pagetable. Instead, we would have to update every pointer that holds
the old virtual address, which is very difficult to do.
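
As a minimal user-space sketch of the point above (not kernel code): the
helpers below mimic the direct mapping with an assumed PAGE_OFFSET value,
and saved_pointer_still_valid() is a hypothetical name used only for
illustration. It shows that a pointer computed from the old physical
address cannot follow the page to a new one.

```c
#include <stdint.h>

/* Illustrative direct-map base; the real value is architecture-specific. */
#define PAGE_OFFSET 0xffff880000000000ULL

/* Direct mapping: the virtual address is a fixed offset from the physical one. */
static inline uint64_t phys_to_virt(uint64_t pa)
{
	return pa + PAGE_OFFSET;
}

static inline uint64_t virt_to_phys(uint64_t va)
{
	return va - PAGE_OFFSET;
}

/*
 * Returns 1 if a pointer saved before migration still refers to the page
 * after its contents moved from old_pa to new_pa, 0 otherwise.
 */
static inline int saved_pointer_still_valid(uint64_t old_pa, uint64_t new_pa)
{
	uint64_t saved_va = phys_to_virt(old_pa);

	return virt_to_phys(saved_va) == new_pa;
}
```

Because the mapping is a fixed offset, there is no pagetable entry that
could be changed to redirect saved_va to new_pa; every saved pointer
would have to be found and rewritten.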

In order to do memory hotplug, we should prevent the kernel from using
hotpluggable memory.

In ACPI, there is a table named SRAT (System Resource Affinity Table).
It contains system NUMA info (CPUs, memory ranges, PXM), and also a
flag field indicating which memory ranges are hotpluggable.


[Problem to be solved]

Very early in boot, we use a boot-time memory allocator, named memblock,
to allocate memory for the kernel. memblock starts to work before the
kernel parses SRAT, which means memblock does not know which memory is
hotpluggable until SRAT is parsed.

So at this time, memblock could allocate hotpluggable memory for
the kernel to use permanently. For example, the kernel may allocate
pagetables in hotpluggable memory, which cannot be freed once the
system is up.

So we have to prevent memblock from allocating hotpluggable memory for
the kernel during early boot.


[Earlier solutions]

We have tried to parse SRAT earlier, before memblock is ready. To
do this, we also have to do ACPI_INITRD_TABLE_OVERRIDE earlier.
Otherwise the override tables won't take effect.

This is not that easy to do because memblock is ready before direct
mapping is set up. So Yinghai split the ACPI_INITRD_TABLE_OVERRIDE
procedure into two steps: find and copy. Please refer to the
following patch-set:
        https://lkml.org/lkml/2013/6/13/587

tj gave a lot of comments on this solution, along with the
suggestions below.


[Suggestion from tj]

tj mainly gave the following suggestions:

1. Necessary reordering is OK, but we should not rely on
   reordering to achieve the goal because it makes the kernel
   too fragile.

2. Memory allocated to kernel for temporary usage is OK because
   it will be freed when the system is up. Doing relocation
   for permanently allocated hotpluggable memory will make the
   kernel more robust.

3. Need to enhance memblock to discover and complain if any
   hotpluggable memory is allocated to kernel.

After thinking it over for a long time, we chose not to do the
relocation for the following reasons:

1. It's easy to find out the allocated hotpluggable memory. But
   memblock will merge adjacent ranges owned by different users
   and used for different purposes, so it's hard to find the owners.

2. Different kinds of memory have to be relocated in different ways.
   I think one function for each kind of memory will make the code
   too messy.

3. Pagetables could be in hotpluggable memory. Relocating pagetables
   is too difficult and risky. We have to update all PUD, PMD pages.
   Also, the ACPI_INITRD_TABLE_OVERRIDE and SRAT parsing procedures
   run not long after the pagetable is initialized. If we relocate the
   pagetable not long after it was initialized, the code will be
   very ugly.
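
To illustrate reason 1, here is a small user-space sketch, assuming a
simplified region array rather than memblock's real data structures
(struct range and merge_adjacent() are hypothetical names). Once two
touching reservations are merged, nothing in the result says where one
owner's memory ended and the next one's began.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified reserved range: no owner information is kept. */
struct range {
	uint64_t base;
	uint64_t size;
};

/*
 * Merge neighbouring ranges that touch each other, in place.
 * The array must be sorted by base. Returns the new number of ranges.
 */
static inline size_t merge_adjacent(struct range *r, size_t n)
{
	size_t out = 0;

	for (size_t i = 0; i < n; i++) {
		if (out > 0 && r[out - 1].base + r[out - 1].size == r[i].base)
			r[out - 1].size += r[i].size;	/* boundary is lost here */
		else
			r[out++] = r[i];
	}
	return out;
}
```

For example, reservations at [0x1000, 0x2000) and [0x2000, 0x5000) made
by two different callers come out as the single range [0x1000, 0x5000).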


[Solution in this patch-set]

In this patch-set, we still do the reordering, but in a new way.

1. Improve memblock with flags, so that it is able to differentiate
   memory regions for different usage, and add a MEMBLOCK_HOTPLUG
   flag to mark hotpluggable memory.

2. When memblock is ready (memblock_x86_fill() is called), initialize
   acpi_gbl_root_table_list and fill in all the ACPI tables' phys addrs.
   Now, we have all the ACPI tables' phys addrs provided by firmware.

3. Check if there is a SRAT in the initrd file used to override the one
   provided by firmware. If so, get its phys addr.

4. If there is no override SRAT in initrd, get the phys addr of the
   SRAT provided by firmware.

   Now, we have the phys addr of the SRAT to be used, either the one in
   initrd or the one in firmware.

5. Parse only the memory affinities in SRAT, find all the
   hotpluggable memory regions and mark them in memblock.memory with
   the MEMBLOCK_HOTPLUG flag.

6. The kernel goes through the current path. Any other related parts,
   such as the ACPI_INITRD_TABLE_OVERRIDE path, the current ACPI
   table parsing paths, the global variable numa_meminfo, and so on,
   are not modified. They work as before.

7. Make the default memblock allocator skip hotpluggable memory.

8. Introduce the movablenode boot option to allow users to enable
   and disable this functionality.
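
Steps 5 and 7 can be sketched in user space roughly as follows. This is
only an illustration under simplifying assumptions: struct region,
mark_hotplug() and alloc_skip_hotplug() are hypothetical names, the flag
value is assumed, regions are never split, and the allocator only picks
a candidate range without recording the reservation.

```c
#include <stddef.h>
#include <stdint.h>

#define MEMBLOCK_HOTPLUG 0x1UL	/* flag name from this series; value assumed */

/* Hypothetical, simplified stand-in for a memblock memory region. */
struct region {
	uint64_t base;
	uint64_t size;
	unsigned long flags;
};

/* Step 5: flag every region that lies fully inside [base, base + size). */
static inline void mark_hotplug(struct region *r, size_t n,
				uint64_t base, uint64_t size)
{
	for (size_t i = 0; i < n; i++)
		if (r[i].base >= base && r[i].base + r[i].size <= base + size)
			r[i].flags |= MEMBLOCK_HOTPLUG;
}

/*
 * Step 7: walk the regions from the highest to the lowest, skip the ones
 * flagged hotpluggable, and return the top-most range of the given size.
 * Returns 0 if no region fits.
 */
static inline uint64_t alloc_skip_hotplug(const struct region *r, size_t n,
					  uint64_t size)
{
	for (size_t i = n; i-- > 0; ) {
		if (r[i].flags & MEMBLOCK_HOTPLUG)
			continue;
		if (r[i].size >= size)
			return r[i].base + r[i].size - size;
	}
	return 0;
}
```

With three regions where only the highest one is marked hotpluggable, a
page-sized request is served from the top of the second region instead
of the top of memory.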


In summary, in order to get hotpluggable memory info as early as possible,
this patch-set only parses the memory affinities in SRAT one more time right
after memblock is ready, and leaves all the other paths untouched. With
the hotpluggable memory info, we can arrange hotpluggable memory in
ZONE_MOVABLE to prevent the kernel from using it.

change log v2 -> v2 RESEND:
According to Toshi's advice:
1. Rename acpi_invalid_table() to acpi_verify_table().
2. Rename acpi_root_table_init() to early_acpi_boot_table_init().
3. Rename INVALID_TABLE() to ACPI_INVALID_TABLE().
4. Check if ramdisk is present in early_acpi_override_srat().
5. Check if ACPI is disabled in acpi_boot_table_init().
6. Rebased to Linux 3.11-rc3.

change log v1 -> v2:
1. According to Tejun's advice, make ACPI side report which memory regions
   are hotpluggable, and memblock side handle the memory allocation.
2. Change "movablecore=acpi" boot option to "movablenode" boot option.

Thanks. 


Tang Chen (17):
  acpi: Print Hot-Pluggable Field in SRAT.
  earlycpio.c: Fix the confusing comment of find_cpio_data().
  acpi: Remove "continue" in macro INVALID_TABLE().
  acpi: Introduce acpi_verify_initrd() to check if a table is invalid.
  x86, ACPICA: Split acpi_boot_table_init() into two parts.
  x86, acpi, ACPICA: Initialize ACPI root table list earlier.
  x86, ACPI: Also initialize signature and length when parsing root
    table.
  x86: Make get_ramdisk_{image|size}() global.
  x86, acpi: Try to find if SRAT is overrided earlier.
  x86, acpi: Try to find SRAT in firmware earlier.
  x86, acpi, numa, mem_hotplug: Find hotpluggable memory in SRAT memory
    affinities.
  x86, numa, mem_hotplug: Skip all the regions the kernel resides in.
  memblock, numa: Introduce flag into memblock.
  memblock, mem_hotplug: Introduce MEMBLOCK_HOTPLUG flag to mark
    hotpluggable regions.
  memblock, mem_hotplug: Make memblock skip hotpluggable regions by
    default.
  mem-hotplug: Introduce movablenode boot option to {en|dis}able using
    SRAT.
  x86, numa, acpi, memory-hotplug: Make movablenode have higher
    priority.

Yasuaki Ishimatsu (1):
  x86: get pg_data_t's memory from other node

 Documentation/kernel-parameters.txt |   15 ++
 arch/x86/include/asm/setup.h        |   21 +++
 arch/x86/kernel/acpi/boot.c         |   45 ++++---
 arch/x86/kernel/setup.c             |   37 +++---
 arch/x86/mm/numa.c                  |    5 +-
 arch/x86/mm/srat.c                  |   11 +-
 drivers/acpi/acpica/tbutils.c       |   47 ++++++-
 drivers/acpi/acpica/tbxface.c       |   32 +++++
 drivers/acpi/osl.c                  |  252 ++++++++++++++++++++++++++++++++---
 drivers/acpi/tables.c               |    7 +-
 include/acpi/acpixf.h               |    6 +
 include/linux/acpi.h                |   22 +++-
 include/linux/memblock.h            |   13 ++
 include/linux/memory_hotplug.h      |    5 +
 lib/earlycpio.c                     |   27 ++--
 mm/memblock.c                       |   92 +++++++++++--
 mm/memory_hotplug.c                 |  104 ++++++++++++++-
 mm/page_alloc.c                     |   31 ++++-
 18 files changed, 673 insertions(+), 99 deletions(-)


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* [PATCH v2 RESEND 01/18] acpi: Print Hot-Pluggable Field in SRAT.
From: Tang Chen @ 2013-08-02  9:14 UTC
  To: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, tj,
	trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

The Hot-Pluggable field in SRAT indicates whether the memory can be
hotplugged while the system is running. Printing it as well when
parsing SRAT helps users to know which memory is hotpluggable.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: Tejun Heo <tj@kernel.org>
---
 arch/x86/mm/srat.c |   11 +++++++----
 1 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/srat.c b/arch/x86/mm/srat.c
index cdd0da9..d44c8a4 100644
--- a/arch/x86/mm/srat.c
+++ b/arch/x86/mm/srat.c
@@ -146,6 +146,7 @@ int __init
 acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 {
 	u64 start, end;
+	u32 hotpluggable;
 	int node, pxm;
 
 	if (srat_disabled())
@@ -154,7 +155,8 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 		goto out_err_bad_srat;
 	if ((ma->flags & ACPI_SRAT_MEM_ENABLED) == 0)
 		goto out_err;
-	if ((ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) && !save_add_info())
+	hotpluggable = ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE;
+	if (hotpluggable && !save_add_info())
 		goto out_err;
 
 	start = ma->base_address;
@@ -174,9 +176,10 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 
 	node_set(node, numa_nodes_parsed);
 
-	printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx]\n",
-	       node, pxm,
-	       (unsigned long long) start, (unsigned long long) end - 1);
+	pr_info("SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx]%s\n",
+		node, pxm,
+		(unsigned long long) start, (unsigned long long) end - 1,
+		hotpluggable ? " Hot Pluggable" : "");
 
 	return 0;
 out_err_bad_srat:
-- 
1.7.1
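
For readers who want to see the resulting log line, the formatting in
the patch above can be tried in user space with a sketch like this one.
It is not the kernel code: format_srat_line() is a hypothetical helper,
snprintf() stands in for pr_info(), the trailing newline is dropped,
%llx replaces the kernel's %Lx, and the flag macros are local stand-ins
for the SRAT Memory Affinity flag bits.

```c
#include <stdint.h>
#include <stdio.h>

/* Local stand-ins for the SRAT Memory Affinity Structure flag bits. */
#define SRAT_MEM_ENABLED	(1u << 0)
#define SRAT_MEM_HOT_PLUGGABLE	(1u << 1)

/*
 * Format one SRAT memory range into buf, appending " Hot Pluggable"
 * when the hot-pluggable bit is set. Returns snprintf()'s result.
 */
static inline int format_srat_line(char *buf, size_t len,
				   unsigned int node, unsigned int pxm,
				   uint64_t start, uint64_t end,
				   uint32_t flags)
{
	uint32_t hotpluggable = flags & SRAT_MEM_HOT_PLUGGABLE;

	return snprintf(buf, len,
			"SRAT: Node %u PXM %u [mem %#010llx-%#010llx]%s",
			node, pxm,
			(unsigned long long)start,
			(unsigned long long)end - 1,
			hotpluggable ? " Hot Pluggable" : "");
}
```

A range from 0x40000000 up to (but not including) 0x80000000 on node 1
with the hot-pluggable bit set is formatted as
"SRAT: Node 1 PXM 1 [mem 0x40000000-0x7fffffff] Hot Pluggable".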

* [PATCH v2 RESEND 02/18] earlycpio.c: Fix the confusing comment of find_cpio_data().
From: Tang Chen @ 2013-08-02  9:14 UTC
  To: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, tj,
	trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

The comment of find_cpio_data() says:

  * @offset: When a matching file is found, this is the offset to the
  *          beginning of the cpio. ......

But according to the code,

  dptr = PTR_ALIGN(p + ch[C_NAMESIZE], 4);
  nptr = PTR_ALIGN(dptr + ch[C_FILESIZE], 4);
  ....
  *offset = (long)nptr - (long)data;	/* data is the cpio file */

@offset is the offset of the next file, not the matching file itself.
This is confusing and may waste time when debugging.
So fix it.

v1 -> v2:
As tj suggested, rename @offset to @nextoff, which is clearer to
users. Also adjust the new comments accordingly.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 lib/earlycpio.c |   27 ++++++++++++++-------------
 1 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/lib/earlycpio.c b/lib/earlycpio.c
index 7aa7ce2..c7ae5ed 100644
--- a/lib/earlycpio.c
+++ b/lib/earlycpio.c
@@ -49,22 +49,23 @@ enum cpio_fields {
 
 /**
  * cpio_data find_cpio_data - Search for files in an uncompressed cpio
- * @path:   The directory to search for, including a slash at the end
- * @data:   Pointer to the the cpio archive or a header inside
- * @len:    Remaining length of the cpio based on data pointer
- * @offset: When a matching file is found, this is the offset to the
- *          beginning of the cpio. It can be used to iterate through
- *          the cpio to find all files inside of a directory path
+ * @path:       The directory to search for, including a slash at the end
+ * @data:       Pointer to the the cpio archive or a header inside
+ * @len:        Remaining length of the cpio based on data pointer
+ * @nextoff:    When a matching file is found, this is the offset from the
+ *              beginning of the cpio to the beginning of the next file, not the
+ *              matching file itself. It can be used to iterate through the cpio
+ *              to find all files inside of a directory path 
  *
- * @return: struct cpio_data containing the address, length and
- *          filename (with the directory path cut off) of the found file.
- *          If you search for a filename and not for files in a directory,
- *          pass the absolute path of the filename in the cpio and make sure
- *          the match returned an empty filename string.
+ * @return:     struct cpio_data containing the address, length and
+ *              filename (with the directory path cut off) of the found file.
+ *              If you search for a filename and not for files in a directory,
+ *              pass the absolute path of the filename in the cpio and make sure
+ *              the match returned an empty filename string.
  */
 
 struct cpio_data find_cpio_data(const char *path, void *data,
-					  size_t len,  long *offset)
+				size_t len,  long *nextoff)
 {
 	const size_t cpio_header_len = 8*C_NFIELDS - 2;
 	struct cpio_data cd = { NULL, 0, "" };
@@ -124,7 +125,7 @@ struct cpio_data find_cpio_data(const char *path, void *data,
 		if ((ch[C_MODE] & 0170000) == 0100000 &&
 		    ch[C_NAMESIZE] >= mypathsize &&
 		    !memcmp(p, path, mypathsize)) {
-			*offset = (long)nptr - (long)data;
+			*nextoff = (long)nptr - (long)data;
 			if (ch[C_NAMESIZE] - mypathsize >= MAX_CPIO_FILE_NAME) {
 				pr_warn(
 				"File %s exceeding MAX_CPIO_FILE_NAME [%d]\n",
-- 
1.7.1
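
The arithmetic behind @nextoff can be checked with a small user-space
sketch. It assumes the "newc" cpio layout (110-byte header, name and
data each padded to 4 bytes) and an archive that starts at a 4-byte
aligned address, so that aligning offsets matches aligning pointers.
align4() and cpio_next_offset() are hypothetical helper names, not part
of lib/earlycpio.c.

```c
#include <stddef.h>

/* newc header: 6-character magic plus 13 fields of 8 hex characters. */
#define CPIO_NEWC_HEADER_LEN 110

/* Round x up to the next multiple of 4, like PTR_ALIGN(x, 4) on an offset. */
static inline size_t align4(size_t x)
{
	return (x + 3) & ~(size_t)3;
}

/*
 * Given the offset of an entry's header and the name and file sizes
 * stored in it, return the offset of the NEXT entry's header. This
 * corresponds to nptr - data in the code quoted above.
 */
static inline size_t cpio_next_offset(size_t hdr_off, size_t namesize,
				      size_t filesize)
{
	size_t name_off = hdr_off + CPIO_NEWC_HEADER_LEN;
	size_t data_off = align4(name_off + namesize);

	return align4(data_off + filesize);
}
```

For an entry at offset 0 with a 6-byte name (including the NUL) and
5 bytes of data, the data starts at 116 and the next header at 124. A
caller that advances by that value lands on the following entry, which
is why the parameter is useful for iterating over a directory.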

* [PATCH v2 RESEND 03/18] acpi: Remove "continue" in macro INVALID_TABLE().
From: Tang Chen @ 2013-08-02  9:14 UTC
  To: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, tj,
	trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

The macro INVALID_TABLE() is defined like this:

 #define INVALID_TABLE(x, path, name)                                    \
         { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); continue; }

And it is used like this:

	for (...) {
		...
		if (...)
			INVALID_TABLE()
		...
	}

The "continue" in the macro makes the code hard to understand.
Change it to the same style as other macros:

 #define INVALID_TABLE(x, path, name)                                    \
         do { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); } while (0)

Also, INVALID_TABLE() is used to check ACPI tables, so rename it to
ACPI_INVALID_TABLE(). This was suggested by Toshi Kani <toshi.kani@hp.com>.

So after this patch, this macro should be used like this:

	for (...) {
		...
		if (...) {
			ACPI_INVALID_TABLE()
			continue;
		}
		...
	}

Add the "continue" wherever the macro is called.
(For now, it is only called in acpi_initrd_override().)

The idea is from Yinghai Lu <yinghai@kernel.org>.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 drivers/acpi/osl.c |   28 ++++++++++++++++++----------
 1 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 6ab2c35..3b8bab2 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -564,8 +564,8 @@ static const char * const table_sigs[] = {
 	ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, NULL };
 
 /* Non-fatal errors: Affected tables/files are ignored */
-#define INVALID_TABLE(x, path, name)					\
-	{ pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); continue; }
+#define ACPI_INVALID_TABLE(x, path, name)					\
+	do { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); } while (0)
 
 #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
 
@@ -593,9 +593,11 @@ void __init acpi_initrd_override(void *data, size_t size)
 		data += offset;
 		size -= offset;
 
-		if (file.size < sizeof(struct acpi_table_header))
-			INVALID_TABLE("Table smaller than ACPI header",
+		if (file.size < sizeof(struct acpi_table_header)) {
+			ACPI_INVALID_TABLE("Table smaller than ACPI header",
 				      cpio_path, file.name);
+			continue;
+		}
 
 		table = file.data;
 
@@ -603,15 +605,21 @@ void __init acpi_initrd_override(void *data, size_t size)
 			if (!memcmp(table->signature, table_sigs[sig], 4))
 				break;
 
-		if (!table_sigs[sig])
-			INVALID_TABLE("Unknown signature",
+		if (!table_sigs[sig]) {
+			ACPI_INVALID_TABLE("Unknown signature",
 				      cpio_path, file.name);
-		if (file.size != table->length)
-			INVALID_TABLE("File length does not match table length",
+			continue;
+		}
+		if (file.size != table->length) {
+			ACPI_INVALID_TABLE("File length does not match table length",
 				      cpio_path, file.name);
-		if (acpi_table_checksum(file.data, table->length))
-			INVALID_TABLE("Bad table checksum",
+			continue;
+		}
+		if (acpi_table_checksum(file.data, table->length)) {
+			ACPI_INVALID_TABLE("Bad table checksum",
 				      cpio_path, file.name);
+			continue;
+		}
 
 		pr_info("%4.4s ACPI table found in initrd [%s%s][0x%x]\n",
 			table->signature, cpio_path, file.name, table->length);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v2 RESEND 04/18] acpi: Introduce acpi_verify_initrd() to check if a table is invalid.
  2013-08-02  9:14 ` Tang Chen
@ 2013-08-02  9:14   ` Tang Chen
  -1 siblings, 0 replies; 70+ messages in thread
From: Tang Chen @ 2013-08-02  9:14 UTC (permalink / raw)
  To: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, tj,
	trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

acpi_initrd_override() performs several checks to ensure that the
table it found is valid. Later patches need to perform the same checks
elsewhere. So this patch introduces a common function,
acpi_verify_table(), that does all of these checks so that they can be
reused. The function will be used in subsequent patches.
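
The order of the checks can be sketched in plain userspace C. This is only
an approximation under stated assumptions: struct table_header is a
simplified, hypothetical header (the real struct acpi_table_header has more
fields), known_sigs[] is an illustrative list rather than the kernel's
table_sigs[], and -1 stands in for -EINVAL:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical header: only the fields the checks need. */
struct table_header {
	char signature[4];
	uint32_t length;
};

/* Illustrative signature list (assumption, not the kernel's table_sigs[]). */
static const char *const known_sigs[] = { "SSDT", "SRAT", NULL };

/* A table is valid when all of its bytes sum to zero modulo 256. */
static uint8_t table_checksum(const uint8_t *buf, size_t len)
{
	uint8_t sum = 0;

	while (len--)
		sum += *buf++;
	return sum;
}

/* Returns 0 if every check passes, -1 (standing in for -EINVAL) otherwise. */
static int verify_table(const void *data, size_t size, const char *signature)
{
	struct table_header hdr;
	int idx;

	assert(data != NULL);

	if (size < sizeof(hdr))
		return -1;		/* smaller than the header */
	memcpy(&hdr, data, sizeof(hdr));

	if (signature) {
		if (memcmp(hdr.signature, signature, 4))
			return -1;	/* not the requested table */
	} else {
		for (idx = 0; known_sigs[idx]; idx++)
			if (!memcmp(hdr.signature, known_sigs[idx], 4))
				break;
		if (!known_sigs[idx])
			return -1;	/* unknown signature */
	}

	if (size != hdr.length)
		return -1;		/* file length != table length */
	if (table_checksum(data, size))
		return -1;		/* bad checksum */
	return 0;
}
```

A caller that wants one specific table passes its signature; a caller that
accepts any known table passes NULL, mirroring the @signature convention
described in the function comment of the patch.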

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/acpi/osl.c |   86 +++++++++++++++++++++++++++++++++++++---------------
 1 files changed, 61 insertions(+), 25 deletions(-)

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 3b8bab2..0043e9f 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -572,9 +572,68 @@ static const char * const table_sigs[] = {
 /* Must not increase 10 or needs code modification below */
 #define ACPI_OVERRIDE_TABLES 10
 
+/*******************************************************************************
+ *
+ * FUNCTION:    acpi_verify_table
+ *
+ * PARAMETERS:  File               - The initrd file
+ *              Path               - Path to acpi overriding tables in cpio file
+ *              Signature          - Signature of the table
+ *
+ * RETURN:      0 if it passes all the checks, -EINVAL if any check fails.
+ *
+ * DESCRIPTION: Check if an acpi table found in initrd is invalid.
+ *              @signature can be NULL. If it is NULL, the function will check
+ *              if the table signature matches any signature in table_sigs[].
+ *
+ ******************************************************************************/
+int __init acpi_verify_table(struct cpio_data *file,
+			      const char *path, const char *signature)
+{
+	int idx;
+	struct acpi_table_header *table = file->data;
+
+	if (file->size < sizeof(struct acpi_table_header)) {
+		ACPI_INVALID_TABLE("Table smaller than ACPI header",
+			      path, file->name);
+		return -EINVAL;
+	}
+
+	if (signature) {
+		if (memcmp(table->signature, signature, 4)) {
+			ACPI_INVALID_TABLE("Table signature does not match",
+				      path, file->name);
+			return -EINVAL;
+		}
+	} else {
+		for (idx = 0; table_sigs[idx]; idx++)
+			if (!memcmp(table->signature, table_sigs[idx], 4))
+				break;
+
+		if (!table_sigs[idx]) {
+			ACPI_INVALID_TABLE("Unknown signature", path, file->name);
+			return -EINVAL;
+		}
+	}
+
+	if (file->size != table->length) {
+		ACPI_INVALID_TABLE("File length does not match table length",
+			      path, file->name);
+		return -EINVAL;
+	}
+
+	if (acpi_table_checksum(file->data, table->length)) {
+		ACPI_INVALID_TABLE("Bad table checksum",
+			      path, file->name);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 void __init acpi_initrd_override(void *data, size_t size)
 {
-	int sig, no, table_nr = 0, total_offset = 0;
+	int no, table_nr = 0, total_offset = 0;
 	long offset = 0;
 	struct acpi_table_header *table;
 	char cpio_path[32] = "kernel/firmware/acpi/";
@@ -593,33 +652,10 @@ void __init acpi_initrd_override(void *data, size_t size)
 		data += offset;
 		size -= offset;
 
-		if (file.size < sizeof(struct acpi_table_header)) {
-			ACPI_INVALID_TABLE("Table smaller than ACPI header",
-				      cpio_path, file.name);
-			continue;
-		}
-
 		table = file.data;
 
-		for (sig = 0; table_sigs[sig]; sig++)
-			if (!memcmp(table->signature, table_sigs[sig], 4))
-				break;
-
-		if (!table_sigs[sig]) {
-			ACPI_INVALID_TABLE("Unknown signature",
-				      cpio_path, file.name);
+		if (acpi_verify_table(&file, cpio_path, NULL))
 			continue;
-		}
-		if (file.size != table->length) {
-			ACPI_INVALID_TABLE("File length does not match table length",
-				      cpio_path, file.name);
-			continue;
-		}
-		if (acpi_table_checksum(file.data, table->length)) {
-			ACPI_INVALID_TABLE("Bad table checksum",
-				      cpio_path, file.name);
-			continue;
-		}
 
 		pr_info("%4.4s ACPI table found in initrd [%s%s][0x%x]\n",
 			table->signature, cpio_path, file.name, table->length);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v2 RESEND 05/18] x86, ACPICA: Split acpi_boot_table_init() into two parts.
  2013-08-02  9:14 ` Tang Chen
@ 2013-08-02  9:14   ` Tang Chen
  -1 siblings, 0 replies; 70+ messages in thread
From: Tang Chen @ 2013-08-02  9:14 UTC (permalink / raw)
  To: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, tj,
	trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

In ACPI, the SRAT (System Resource Affinity Table) contains NUMA info.
The memory affinity structures in SRAT describe every memory range in
the system, along with flags specifying whether each range is
hotpluggable.
(Please refer to ACPI spec 5.0, section 5.2.16.)

memblock starts to work very early, before SRAT has been parsed, so
we don't know which memory is hotpluggable. In order to use memblock
to reserve hotpluggable memory, we need to obtain the SRAT memory
affinity info earlier.

Currently, acpi_boot_table_init() does the following:
1. Parse RSDT, so that we can find all the tables.
2. Initialize acpi_gbl_root_table_list, an array of acpi table
   descriptors used to store each table's address, length, signature,
   and so on.
3. Check if there is any table in initrd intending to override
   tables from firmware. If so, override the firmware tables.
4. Initialize all the data in acpi_gbl_root_table_list.

In order to parse SRAT early, we need to do work similar to steps 1
and 2 above at an earlier point. It is very convenient to have
acpi_gbl_root_table_list initialized: we can then use the address
and signature to find SRAT.

Since steps 1 and 2 allocate no memory, it is OK to do these two
steps earlier.

But step 3 checks the acpi initrd table override for all tables, not
just SRAT. So it is better to leave it untouched.

This patch splits acpi_boot_table_init() into two steps:
1. Parse RSDT, which cannot be overridden, and initialize
   acpi_gbl_root_table_list. (step 1 + 2 above)
2. Install all ACPI tables into acpi_gbl_root_table_list.
   (step 3 + 4 above)

In later patches, we will do step 1 + 2 earlier.
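
The split can be pictured with a small userspace model. Everything here is
a hypothetical sketch: struct table_list, parse_root_table() and
install_root_tables() are made-up names, not the ACPICA API. Phase 1 only
records addresses and allocates nothing; phase 2 walks the recorded entries
later, skipping the first two slots, which the patch comment says are
reserved for the DSDT and FACS:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TABLES 8	/* arbitrary size for the sketch */

struct table_desc {
	unsigned long address;	/* recorded in phase 1 */
	int installed;		/* set in phase 2 */
};

struct table_list {
	struct table_desc tables[MAX_TABLES];
	unsigned int count;
};

/* Phase 1: record addresses only; slots 0 and 1 stay reserved. */
static int parse_root_table(struct table_list *list,
			    const unsigned long *entries, unsigned int n)
{
	unsigned int i;

	assert(list != NULL && entries != NULL);

	if (n + 2 > MAX_TABLES)
		return -1;
	list->count = 2;
	for (i = 0; i < n; i++)
		list->tables[list->count++].address = entries[i];
	return 0;
}

/* Phase 2: install every recorded table; returns how many were installed. */
static unsigned int install_root_tables(struct table_list *list)
{
	unsigned int i, installed = 0;

	assert(list != NULL);

	for (i = 2; i < list->count; i++) {
		list->tables[i].installed = 1;
		installed++;
	}
	return installed;
}
```

Between the two calls the list already holds every table address, which is
what an early SRAT lookup would need.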

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 drivers/acpi/acpica/tbutils.c |   25 ++++++++++++++++++++++---
 drivers/acpi/tables.c         |    2 ++
 include/acpi/acpixf.h         |    2 ++
 3 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/acpica/tbutils.c b/drivers/acpi/acpica/tbutils.c
index bffdfc7..e3621cf 100644
--- a/drivers/acpi/acpica/tbutils.c
+++ b/drivers/acpi/acpica/tbutils.c
@@ -577,9 +577,30 @@ acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
 	 */
 	acpi_os_unmap_memory(table, length);
 
+	return_ACPI_STATUS(AE_OK);
+}
+
+/*******************************************************************************
+ *
+ * FUNCTION:    acpi_tb_install_root_table
+ *
+ * DESCRIPTION: This function installs all the ACPI tables in RSDT into
+ *              acpi_gbl_root_table_list.
+ *
+ ******************************************************************************/
+
+void __init
+acpi_tb_install_root_table(void)
+{
+	int i;
+
 	/*
 	 * Complete the initialization of the root table array by examining
-	 * the header of each table
+	 * the header of each table.
+	 *
+	 * First two entries in the table array are reserved for the DSDT
+	 * and FACS, which are not actually present in the RSDT/XSDT - they
+	 * come from the FADT.
 	 */
 	for (i = 2; i < acpi_gbl_root_table_list.current_table_count; i++) {
 		acpi_tb_install_table(acpi_gbl_root_table_list.tables[i].
@@ -593,6 +614,4 @@ acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
 			acpi_tb_parse_fadt(i);
 		}
 	}
-
-	return_ACPI_STATUS(AE_OK);
 }
diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
index d67a1fe..8860e79 100644
--- a/drivers/acpi/tables.c
+++ b/drivers/acpi/tables.c
@@ -353,6 +353,8 @@ int __init acpi_table_init(void)
 	if (ACPI_FAILURE(status))
 		return 1;
 
+	acpi_tb_install_root_table();
+
 	check_multiple_madt();
 	return 0;
 }
diff --git a/include/acpi/acpixf.h b/include/acpi/acpixf.h
index 22d497e..e9c9b88 100644
--- a/include/acpi/acpixf.h
+++ b/include/acpi/acpixf.h
@@ -118,6 +118,8 @@ acpi_status
 acpi_initialize_tables(struct acpi_table_desc *initial_storage,
 		       u32 initial_table_count, u8 allow_resize);
 
+void acpi_tb_install_root_table(void);
+
 acpi_status __init acpi_initialize_subsystem(void);
 
 acpi_status acpi_enable_subsystem(u32 flags);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v2 RESEND 06/18] x86, acpi, ACPICA: Initialize ACPI root table list earlier.
  2013-08-02  9:14 ` Tang Chen
@ 2013-08-02  9:14   ` Tang Chen
  -1 siblings, 0 replies; 70+ messages in thread
From: Tang Chen @ 2013-08-02  9:14 UTC (permalink / raw)
  To: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, tj,
	trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

We have split acpi_table_init() into two steps:
1. Parse RSDT or XSDT, and initialize acpi_gbl_root_table_list.
   This step records the physical addresses of all tables.
2. Check acpi initrd table override and install all tables into
   acpi_gbl_root_table_list.

This patch does step 1 earlier, right after memblock is ready.

Once memblock_x86_fill() has been called to fill memblock.memory[],
memblock is able to allocate memory.

This patch introduces a new function, early_acpi_boot_table_init(), to
do step 1, and calls it right after memblock_x86_fill().
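
The resulting ordering, including the failure path, can be modelled in a
few lines of userspace C. This is a hedged sketch: struct boot_state and
both functions are hypothetical, with acpi_off standing in for
acpi_disabled and the parse_ok argument modelling whether
acpi_table_init() succeeds:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical boot state; acpi_off stands in for acpi_disabled. */
struct boot_state {
	bool acpi_off;
	bool root_list_ready;
	bool tables_installed;
};

/* Early stage: parse the root table; on failure, turn ACPI off. */
static void early_table_init(struct boot_state *s, bool parse_ok)
{
	assert(s != NULL);

	if (s->acpi_off)
		return;
	if (!parse_ok) {	/* models acpi_table_init() returning nonzero */
		s->acpi_off = true;
		return;
	}
	s->root_list_ready = true;
}

/* Late stage: install tables only if the early stage left ACPI enabled. */
static void late_table_init(struct boot_state *s)
{
	assert(s != NULL);

	if (s->acpi_off)
		return;
	s->tables_installed = true;
}
```

If the early stage fails, the late stage sees the disabled flag and does
nothing, which matches the acpi_disabled check kept at the top of
acpi_boot_table_init() in the diff below.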

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 arch/x86/kernel/acpi/boot.c |   45 +++++++++++++++++++++++++++---------------
 arch/x86/kernel/setup.c     |    3 ++
 drivers/acpi/tables.c       |    7 ++++-
 include/linux/acpi.h        |    2 +
 4 files changed, 39 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 2627a81..dcdf3e3 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -1497,6 +1497,32 @@ static struct dmi_system_id __initdata acpi_dmi_table_late[] = {
 	{}
 };
 
+/**
+ * early_acpi_boot_table_init - Initialize acpi_gbl_root_table_list.
+ *
+ * This function will parse RSDT or XSDT, find all tables' phys addr,
+ * initialize acpi_gbl_root_table_list, and record all tables' phys addr
+ * in acpi_gbl_root_table_list.
+ *
+ * acpi_table_init() is separate to allow reading SRAT without
+ * other side effects.
+ *
+ */
+void __init early_acpi_boot_table_init(void)
+{
+	dmi_check_system(acpi_dmi_table);
+
+	/* If acpi_disabled, bail out */
+	if (acpi_disabled)
+		return;
+
+	/* Initialize the ACPI boot-time table parser */
+	if (acpi_table_init()) {
+		disable_acpi();
+		return;
+	}
+}
+
 /*
  * acpi_boot_table_init() and acpi_boot_init()
  *  called from setup_arch(), always.
@@ -1504,9 +1530,6 @@ static struct dmi_system_id __initdata acpi_dmi_table_late[] = {
  *	2. enumerates lapics
  *	3. enumerates io-apics
  *
- * acpi_table_init() is separate to allow reading SRAT without
- * other side effects.
- *
  * side effects of acpi_boot_init:
  *	acpi_lapic = 1 if LAPIC found
  *	acpi_ioapic = 1 if IOAPIC found
@@ -1518,21 +1541,11 @@ static struct dmi_system_id __initdata acpi_dmi_table_late[] = {
 
 void __init acpi_boot_table_init(void)
 {
-	dmi_check_system(acpi_dmi_table);
-
-	/*
-	 * If acpi_disabled, bail out
-	 */
+	/* If acpi_disabled, bail out */
 	if (acpi_disabled)
-		return; 
-
-	/*
-	 * Initialize the ACPI boot-time table parser.
-	 */
-	if (acpi_table_init()) {
-		disable_acpi();
 		return;
-	}
+
+	acpi_install_root_table();
 
 	acpi_table_parse(ACPI_SIG_BOOT, acpi_parse_sbf);
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index f8ec578..53d4ac7 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1073,6 +1073,9 @@ void __init setup_arch(char **cmdline_p)
 	memblock.current_limit = ISA_END_ADDRESS;
 	memblock_x86_fill();
 
+	/* Initialize ACPI global root table list. */
+	early_acpi_boot_table_init();
+
 	/*
 	 * The EFI specification says that boot service code won't be called
 	 * after ExitBootServices(). This is, in fact, a lie.
diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
index 8860e79..60ecbb8 100644
--- a/drivers/acpi/tables.c
+++ b/drivers/acpi/tables.c
@@ -353,10 +353,13 @@ int __init acpi_table_init(void)
 	if (ACPI_FAILURE(status))
 		return 1;
 
-	acpi_tb_install_root_table();
+	return 0;
+}
 
+void __init acpi_install_root_table(void)
+{
+	acpi_tb_install_root_table();
 	check_multiple_madt();
-	return 0;
 }
 
 static int __init acpi_parse_apic_instance(char *str)
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 353ba25..b722183 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -92,10 +92,12 @@ void __acpi_unmap_table(char *map, unsigned long size);
 int early_acpi_boot_init(void);
 int acpi_boot_init (void);
 void acpi_boot_table_init (void);
+void early_acpi_boot_table_init(void);
 int acpi_mps_check (void);
 int acpi_numa_init (void);
 
 int acpi_table_init (void);
+void acpi_install_root_table(void);
 int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
 int __init acpi_table_parse_entries(char *id, unsigned long table_size,
 				    int entry_id,
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v2 RESEND 06/18] x86, acpi, ACPICA: Initialize ACPI root table list earlier.
@ 2013-08-02  9:14   ` Tang Chen
  0 siblings, 0 replies; 70+ messages in thread
From: Tang Chen @ 2013-08-02  9:14 UTC (permalink / raw)
  To: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, tj,
	trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

We have split acpi_table_init() into two steps:
1. Parse RSDT or XSDT, and initialize acpi_gbl_root_table_list.
   This step records the physical addresses of all tables.
2. Check acpi initrd table override and install all tables into
   acpi_gbl_root_table_list.

This patch does step 1 earlier, right after memblock is ready.

Once memblock_x86_fill() has been called to fill memblock.memory[],
memblock is able to allocate memory.

This patch introduces a new function, early_acpi_boot_table_init(), to
do step 1, and calls it right after memblock_x86_fill().

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 arch/x86/kernel/acpi/boot.c |   45 +++++++++++++++++++++++++++---------------
 arch/x86/kernel/setup.c     |    3 ++
 drivers/acpi/tables.c       |    7 ++++-
 include/linux/acpi.h        |    2 +
 4 files changed, 39 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 2627a81..dcdf3e3 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -1497,6 +1497,32 @@ static struct dmi_system_id __initdata acpi_dmi_table_late[] = {
 	{}
 };
 
+/**
+ * early_acpi_boot_table_init - Initialize acpi_gbl_root_table_list.
+ *
+ * This function will parse RSDT or XSDT, find all tables' phys addr,
+ * initialize acpi_gbl_root_table_list, and record all tables' phys addr
+ * in acpi_gbl_root_table_list.
+ *
+ * acpi_table_init() is separate to allow reading SRAT without
+ * other side effects.
+ *
+ */
+void __init early_acpi_boot_table_init(void)
+{
+	dmi_check_system(acpi_dmi_table);
+
+	/* If acpi_disabled, bail out */
+	if (acpi_disabled)
+		return;
+
+	/* Initialize the ACPI boot-time table parser */
+	if (acpi_table_init()) {
+		disable_acpi();
+		return;
+	}
+}
+
 /*
  * acpi_boot_table_init() and acpi_boot_init()
  *  called from setup_arch(), always.
@@ -1504,9 +1530,6 @@ static struct dmi_system_id __initdata acpi_dmi_table_late[] = {
  *	2. enumerates lapics
  *	3. enumerates io-apics
  *
- * acpi_table_init() is separate to allow reading SRAT without
- * other side effects.
- *
  * side effects of acpi_boot_init:
  *	acpi_lapic = 1 if LAPIC found
  *	acpi_ioapic = 1 if IOAPIC found
@@ -1518,21 +1541,11 @@ static struct dmi_system_id __initdata acpi_dmi_table_late[] = {
 
 void __init acpi_boot_table_init(void)
 {
-	dmi_check_system(acpi_dmi_table);
-
-	/*
-	 * If acpi_disabled, bail out
-	 */
+	/* If acpi_disabled, bail out */
 	if (acpi_disabled)
-		return; 
-
-	/*
-	 * Initialize the ACPI boot-time table parser.
-	 */
-	if (acpi_table_init()) {
-		disable_acpi();
 		return;
-	}
+
+	acpi_install_root_table();
 
 	acpi_table_parse(ACPI_SIG_BOOT, acpi_parse_sbf);
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index f8ec578..53d4ac7 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1073,6 +1073,9 @@ void __init setup_arch(char **cmdline_p)
 	memblock.current_limit = ISA_END_ADDRESS;
 	memblock_x86_fill();
 
+	/* Initialize ACPI global root table list. */
+	early_acpi_boot_table_init();
+
 	/*
 	 * The EFI specification says that boot service code won't be called
 	 * after ExitBootServices(). This is, in fact, a lie.
diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
index 8860e79..60ecbb8 100644
--- a/drivers/acpi/tables.c
+++ b/drivers/acpi/tables.c
@@ -353,10 +353,13 @@ int __init acpi_table_init(void)
 	if (ACPI_FAILURE(status))
 		return 1;
 
-	acpi_tb_install_root_table();
+	return 0;
+}
 
+void __init acpi_install_root_table(void)
+{
+	acpi_tb_install_root_table();
 	check_multiple_madt();
-	return 0;
 }
 
 static int __init acpi_parse_apic_instance(char *str)
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 353ba25..b722183 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -92,10 +92,12 @@ void __acpi_unmap_table(char *map, unsigned long size);
 int early_acpi_boot_init(void);
 int acpi_boot_init (void);
 void acpi_boot_table_init (void);
+void early_acpi_boot_table_init(void);
 int acpi_mps_check (void);
 int acpi_numa_init (void);
 
 int acpi_table_init (void);
+void acpi_install_root_table(void);
 int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
 int __init acpi_table_parse_entries(char *id, unsigned long table_size,
 				    int entry_id,
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v2 RESEND 07/18] x86, ACPI: Also initialize signature and length when parsing root table.
  2013-08-02  9:14 ` Tang Chen
@ 2013-08-02  9:14   ` Tang Chen
  -1 siblings, 0 replies; 70+ messages in thread
From: Tang Chen @ 2013-08-02  9:14 UTC (permalink / raw)
  To: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, tj,
	trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

Besides the physical addresses of the ACPI tables, it is convenient to
also have the signature of each table in acpi_gbl_root_table_list early
in boot, so that SRAT can be found simply by comparing signatures.

This patch also records the signature, length and flags of each table in
acpi_gbl_root_table_list early in boot.
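
Finding a table by signature could then look roughly like the sketch
below. It is a userspace illustration under stated assumptions: struct
sketch_table_desc and sketch_find_table() are hypothetical names, and the
descriptor is a simplified stand-in for struct acpi_table_desc.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_SIG_LEN 4	/* ACPI table signatures are 4 bytes, no NUL */

/* Hypothetical, simplified table descriptor. */
struct sketch_table_desc {
	uint64_t address;		/* physical address of the table */
	uint32_t length;		/* table length taken from its header */
	char signature[SKETCH_SIG_LEN];
};

/* Return the first descriptor whose signature matches, or NULL. */
static const struct sketch_table_desc *
sketch_find_table(const struct sketch_table_desc *tables, size_t count,
		  const char *sig)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (memcmp(tables[i].signature, sig, SKETCH_SIG_LEN) == 0)
			return &tables[i];
	}
	return NULL;
}
```

With the signature stored next to the address, the lookup does not need
to map each table header again.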

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 drivers/acpi/acpica/tbutils.c |   22 ++++++++++++++++++++++
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/drivers/acpi/acpica/tbutils.c b/drivers/acpi/acpica/tbutils.c
index e3621cf..af942fe 100644
--- a/drivers/acpi/acpica/tbutils.c
+++ b/drivers/acpi/acpica/tbutils.c
@@ -438,6 +438,7 @@ acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
 	u32 i;
 	u32 table_count;
 	struct acpi_table_header *table;
+	struct acpi_table_desc *table_desc;
 	acpi_physical_address address;
 	acpi_physical_address uninitialized_var(rsdt_address);
 	u32 length;
@@ -577,6 +578,27 @@ acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
 	 */
 	acpi_os_unmap_memory(table, length);
 
+	/*
+	 * Also initialize the table entries here, so that later we can use them
+	 * to find SRAT very early and reserve hotpluggable memory.
+	 */
+	for (i = 2; i < acpi_gbl_root_table_list.current_table_count; i++) {
+		table = acpi_os_map_memory(
+				acpi_gbl_root_table_list.tables[i].address,
+				sizeof(struct acpi_table_header));
+		if (!table)
+			return_ACPI_STATUS(AE_NO_MEMORY);
+
+		table_desc = &acpi_gbl_root_table_list.tables[i];
+
+		table_desc->pointer = NULL;
+		table_desc->length = table->length;
+		table_desc->flags = ACPI_TABLE_ORIGIN_MAPPED;
+		ACPI_MOVE_32_TO_32(table_desc->signature.ascii, table->signature);
+
+		acpi_os_unmap_memory(table, sizeof(struct acpi_table_header));
+	}
+
 	return_ACPI_STATUS(AE_OK);
 }
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v2 RESEND 08/18] x86: get pg_data_t's memory from other node
  2013-08-02  9:14 ` Tang Chen
@ 2013-08-02  9:14   ` Tang Chen
  -1 siblings, 0 replies; 70+ messages in thread
From: Tang Chen @ 2013-08-02  9:14 UTC (permalink / raw)
  To: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, tj,
	trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

If the system can create a movable node, in which all of the node's memory
is allocated as ZONE_MOVABLE, setup_node_data() cannot allocate memory for
the node's pg_data_t from that node. So use memblock_alloc_try_nid() instead
of memblock_alloc_nid() to retry when the first allocation fails. Otherwise,
the system could fail to boot.
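
The difference between the two calls can be sketched in userspace as
follows. This only models the behaviour described above (try the
requested node, then any node); struct sketch_pool and the sketch_*
functions are hypothetical and are not the memblock implementation.

```c
#include <stddef.h>

#define SKETCH_MAX_NODES 4

/* Hypothetical per-node free byte counters standing in for memblock. */
struct sketch_pool {
	size_t free_bytes[SKETCH_MAX_NODES];
};

/* Node-local only: returns the node used, or -1 if that node cannot serve. */
static int sketch_alloc_nid(struct sketch_pool *p, size_t size, int nid)
{
	if (nid < 0 || nid >= SKETCH_MAX_NODES || p->free_bytes[nid] < size)
		return -1;

	p->free_bytes[nid] -= size;
	return nid;
}

/* Try the requested node first, then fall back to any node. */
static int sketch_alloc_try_nid(struct sketch_pool *p, size_t size, int nid)
{
	int n;
	int got = sketch_alloc_nid(p, size, nid);

	if (got >= 0)
		return got;

	for (n = 0; n < SKETCH_MAX_NODES; n++) {
		got = sketch_alloc_nid(p, size, n);
		if (got >= 0)
			return got;
	}
	return -1;
}
```

A node with no usable memory (node 1 in a pool of { 4096, 0, 4096, 0 })
fails the node-local call but succeeds through the fallback.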

The node_data could be on a hotpluggable node, and so could the pagetable
and vmemmap. But for now, doing so would break the memory hot-remove path.

A node could have several memory devices, and the device that holds the
node data should be hot-removed last. But at the NUMA level, we don't
know which memory_block (/sys/devices/system/node/nodeX/memoryXXX) belongs
to which memory device. We only have the node, so we can only do node
hotplug.

But in virtualization, developers are now working on memory hotplug in
qemu, which supports hotplug of a single memory device. So whole-node
hotplug will not satisfy virtualization users.

So in the end, we concluded that we'd better handle memory hotplug and
the node-local things (node data, pagetable, vmemmap, ...) in two steps.
Please refer to https://lkml.org/lkml/2013/6/19/73

For now, we put the node_data of a movable node on another node, and will
improve this in the future.

In later patches, a boot option will be introduced to enable/disable this
functionality. If users disable it, the node_data will still be put on the
local node.

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
---
 arch/x86/mm/numa.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 8bf93ba..d532b6d 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -209,10 +209,9 @@ static void __init setup_node_data(int nid, u64 start, u64 end)
 	 * Allocate node data.  Try node-local memory and then any node.
 	 * Never allocate in DMA zone.
 	 */
-	nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
+	nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
 	if (!nd_pa) {
-		pr_err("Cannot find %zu bytes in node %d\n",
-		       nd_size, nid);
+		pr_err("Cannot find %zu bytes in any node\n", nd_size);
 		return;
 	}
 	nd = __va(nd_pa);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v2 RESEND 09/18] x86: Make get_ramdisk_{image|size}() global.
  2013-08-02  9:14 ` Tang Chen
@ 2013-08-02  9:14   ` Tang Chen
  -1 siblings, 0 replies; 70+ messages in thread
From: Tang Chen @ 2013-08-02  9:14 UTC (permalink / raw)
  To: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, tj,
	trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

In the following patches, we need to call get_ramdisk_{image|size}()
to get the initrd file's address and size. So make these two functions
global.

v1 -> v2:
As tj suggested, make these two functions static inline in
arch/x86/include/asm/setup.h.
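
For readers unfamiliar with these helpers, the sketch below shows the
arithmetic they perform: the boot parameters carry the low and high 32
bits separately, and the helper joins them into one 64-bit value. The
struct and function names here are hypothetical, reduced stand-ins for
boot_params and get_ramdisk_image().

```c
#include <stdint.h>

/* Hypothetical, reduced stand-in for the relevant boot_params fields. */
struct sketch_ramdisk_fields {
	uint32_t ramdisk_image;		/* low 32 bits of the address */
	uint32_t ext_ramdisk_image;	/* high 32 bits of the address */
};

/* Join the two halves into a 64-bit physical address. */
static inline uint64_t
sketch_ramdisk_image(const struct sketch_ramdisk_fields *f)
{
	uint64_t addr = f->ramdisk_image;

	/* Widen before shifting, otherwise the high half would be lost. */
	addr |= (uint64_t)f->ext_ramdisk_image << 32;

	return addr;
}
```

For example, a low half of 0x7f000000 and a high half of 0x1 give the
address 0x17f000000.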

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 arch/x86/include/asm/setup.h |   21 +++++++++++++++++++++
 arch/x86/kernel/setup.c      |   18 ------------------
 2 files changed, 21 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index b7bf350..cfdb55d 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -106,6 +106,27 @@ void *extend_brk(size_t size, size_t align);
 	RESERVE_BRK(name, sizeof(type) * entries)
 
 extern void probe_roms(void);
+
+#ifdef CONFIG_BLK_DEV_INITRD
+static inline u64 __init get_ramdisk_image(void)
+{
+	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
+
+	ramdisk_image |= (u64)boot_params.ext_ramdisk_image << 32;
+
+	return ramdisk_image;
+}
+
+static inline u64 __init get_ramdisk_size(void)
+{
+	u64 ramdisk_size = boot_params.hdr.ramdisk_size;
+
+	ramdisk_size |= (u64)boot_params.ext_ramdisk_size << 32;
+
+	return ramdisk_size;
+}
+#endif /* CONFIG_BLK_DEV_INITRD */
+
 #ifdef __i386__
 
 void __init i386_start_kernel(void);
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 53d4ac7..b2ce0dc 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -296,24 +296,6 @@ static void __init reserve_brk(void)
 }
 
 #ifdef CONFIG_BLK_DEV_INITRD
-
-static u64 __init get_ramdisk_image(void)
-{
-	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
-
-	ramdisk_image |= (u64)boot_params.ext_ramdisk_image << 32;
-
-	return ramdisk_image;
-}
-static u64 __init get_ramdisk_size(void)
-{
-	u64 ramdisk_size = boot_params.hdr.ramdisk_size;
-
-	ramdisk_size |= (u64)boot_params.ext_ramdisk_size << 32;
-
-	return ramdisk_size;
-}
-
 #define MAX_MAP_CHUNK	(NR_FIX_BTMAPS << PAGE_SHIFT)
 static void __init relocate_initrd(void)
 {
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v2 RESEND 10/18] x86, acpi: Try to find if SRAT is overrided earlier.
  2013-08-02  9:14 ` Tang Chen
@ 2013-08-02  9:14   ` Tang Chen
  -1 siblings, 0 replies; 70+ messages in thread
From: Tang Chen @ 2013-08-02  9:14 UTC (permalink / raw)
  To: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, tj,
	trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

Linux cannot migrate pages used by the kernel because of the direct mapping
(va = pa + PAGE_OFFSET), so any memory used by the kernel cannot be
hot-removed. When using memory hotplug, we therefore have to prevent the
kernel from using hotpluggable memory.
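
A minimal sketch of the direct mapping, to show why moving a kernel page
is a problem. SKETCH_PAGE_OFFSET is an illustrative constant and the
helper names are hypothetical; the real PAGE_OFFSET depends on the
architecture and configuration.

```c
#include <stdint.h>

/* Illustrative value only; not taken from this patch. */
#define SKETCH_PAGE_OFFSET 0xffff880000000000ULL

/* Direct mapping: the virtual address is a fixed offset from the physical. */
static inline uint64_t sketch_phys_to_virt(uint64_t pa)
{
	return pa + SKETCH_PAGE_OFFSET;
}

static inline uint64_t sketch_virt_to_phys(uint64_t va)
{
	return va - SKETCH_PAGE_OFFSET;
}
```

Since the virtual address is derived from the physical one, moving the
data to a different physical page changes its virtual address too, and
every pointer holding the old address would have to be found and updated.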

The ACPI table SRAT (System Resource Affinity Table) specifies which memory
is hotpluggable. Once SRAT is parsed, we know which memory is hotpluggable.
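
As a rough sketch of what "knowing which memory is hotpluggable" means,
the code below walks a list of memory affinity entries and sums the
ranges flagged as hotpluggable. The struct and names are hypothetical;
the flag bits are assumed to follow the ACPI Memory Affinity layout
(bit 0 enabled, bit 1 hot pluggable). It does not parse a real SRAT.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed flag layout: bit 0 = enabled, bit 1 = hot pluggable. */
#define SKETCH_MEM_ENABLED	(1u << 0)
#define SKETCH_MEM_HOTPLUGGABLE	(1u << 1)

/* Hypothetical, reduced memory affinity entry. */
struct sketch_mem_affinity {
	uint64_t base;
	uint64_t length;
	uint32_t flags;
};

/* Sum the sizes of all enabled ranges that are marked hotpluggable. */
static uint64_t sketch_hotpluggable_bytes(const struct sketch_mem_affinity *e,
					  size_t n)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!(e[i].flags & SKETCH_MEM_ENABLED))
			continue;	/* disabled entries are ignored */
		if (e[i].flags & SKETCH_MEM_HOTPLUGGABLE)
			total += e[i].length;
	}
	return total;
}
```

The ranges selected this way are the ones the early allocator should
avoid handing to the kernel.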

Early in boot, SRAT has not been parsed yet, so the boot memory allocator
memblock may hand any memory to the kernel. So we need SRAT parsed before
memblock starts to allocate memory for the kernel.

In this patch, we are going to parse SRAT earlier, right after memblock is
ready.

Generally speaking, tables such as SRAT are provided by firmware. But the
ACPI_INITRD_TABLE_OVERRIDE functionality allows users to put their own
tables in the initrd and override the ones from firmware. So if we want to
parse SRAT earlier, we also need to do the SRAT override earlier.

First, we introduce early_acpi_override_srat() to check whether SRAT will be
overridden from the initrd.

Second, we introduce find_hotpluggable_memory() to reserve hotpluggable
memory. It first calls early_acpi_override_srat() to find out which memory
is hotpluggable in the override SRAT.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 arch/x86/kernel/setup.c        |   10 ++++++
 drivers/acpi/osl.c             |   61 ++++++++++++++++++++++++++++++++++++++++
 include/linux/acpi.h           |   14 ++++++++-
 include/linux/memory_hotplug.h |    2 +
 mm/memory_hotplug.c            |   25 +++++++++++++++-
 5 files changed, 109 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index b2ce0dc..c23e6a7 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1058,6 +1058,16 @@ void __init setup_arch(char **cmdline_p)
 	/* Initialize ACPI global root table list. */
 	early_acpi_boot_table_init();
 
+#ifdef CONFIG_ACPI_NUMA
+	/*
+	 * Linux kernel cannot migrate kernel pages, as a result, memory used
+	 * by the kernel cannot be hot-removed. Find and mark hotpluggable
+	 * memory in memblock to prevent memblock from allocating hotpluggable
+	 * memory for the kernel.
+	 */
+	find_hotpluggable_memory();
+#endif
+
 	/*
 	 * The EFI specification says that boot service code won't be called
 	 * after ExitBootServices(). This is, in fact, a lie.
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 0043e9f..dcbca3e 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -48,6 +48,7 @@
 
 #include <asm/io.h>
 #include <asm/uaccess.h>
+#include <asm/setup.h>
 
 #include <acpi/acpi.h>
 #include <acpi/acpi_bus.h>
@@ -631,6 +632,66 @@ int __init acpi_verify_table(struct cpio_data *file,
 	return 0;
 }
 
+#ifdef CONFIG_ACPI_NUMA
+/*******************************************************************************
+ *
+ * FUNCTION:    early_acpi_override_srat
+ *
+ * RETURN:      Phys addr of SRAT on success, 0 on error.
+ *
+ * DESCRIPTION: Try to get the phys addr of SRAT in initrd.
+ *              The ACPI_INITRD_TABLE_OVERRIDE procedure is able to use tables
+ *              in initrd file to override the ones provided by firmware. This
+ *              function checks if there is a SRAT in initrd at early time. If
+ *              so, return the phys addr of the SRAT.
+ *
+ ******************************************************************************/
+phys_addr_t __init early_acpi_override_srat(void)
+{
+	int i;
+	u32 length;
+	long offset;
+	void *ramdisk_vaddr;
+	struct acpi_table_header *table;
+	struct cpio_data file;
+	unsigned long map_step = NR_FIX_BTMAPS << PAGE_SHIFT;
+	phys_addr_t ramdisk_image = get_ramdisk_image();
+	char cpio_path[32] = "kernel/firmware/acpi/";
+
+	if (!ramdisk_image || !get_ramdisk_size())
+		return 0;
+
+	/* Try to find if SRAT is overridden */
+	for (i = 0; i < ACPI_OVERRIDE_TABLES; i++) {
+		ramdisk_vaddr = early_ioremap(ramdisk_image, map_step);
+
+		file = find_cpio_data(cpio_path, ramdisk_vaddr,
+				      map_step, &offset);
+		if (!file.data) {
+			early_iounmap(ramdisk_vaddr, map_step);
+			return 0;
+		}
+
+		table = file.data;
+		length = table->length;
+
+		if (acpi_verify_table(&file, cpio_path, ACPI_SIG_SRAT)) {
+			ramdisk_image += offset;
+			early_iounmap(ramdisk_vaddr, map_step);
+			continue;
+		}
+
+		/* Found SRAT */
+		early_iounmap(ramdisk_vaddr, map_step);
+		ramdisk_image = ramdisk_image + offset - length;
+
+		break;
+	}
+
+	return ramdisk_image;
+}
+#endif	/* CONFIG_ACPI_NUMA */
+
 void __init acpi_initrd_override(void *data, size_t size)
 {
 	int no, table_nr = 0, total_offset = 0;
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index b722183..d86455a 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -81,11 +81,21 @@ typedef int (*acpi_tbl_entry_handler)(struct acpi_subtable_header *header,
 
 #ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
 void acpi_initrd_override(void *data, size_t size);
-#else
+
+#ifdef CONFIG_ACPI_NUMA
+phys_addr_t early_acpi_override_srat(void);
+#endif	/* CONFIG_ACPI_NUMA */
+
+#else	/* CONFIG_ACPI_INITRD_TABLE_OVERRIDE */
 static inline void acpi_initrd_override(void *data, size_t size)
 {
 }
-#endif
+
+static inline phys_addr_t early_acpi_override_srat(void)
+{
+	return 0;
+}
+#endif	/* CONFIG_ACPI_INITRD_TABLE_OVERRIDE */
 
 char * __acpi_map_table (unsigned long phys_addr, unsigned long size);
 void __acpi_unmap_table(char *map, unsigned long size);
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index dd38e62..463efa9 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -104,6 +104,7 @@ extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
 /* reasonably generic interface to expand the physical pages in a zone  */
 extern int __add_pages(int nid, struct zone *zone, unsigned long start_pfn,
 	unsigned long nr_pages);
+extern void find_hotpluggable_memory(void);
 
 #ifdef CONFIG_NUMA
 extern int memory_add_physaddr_to_nid(u64 start);
@@ -181,6 +182,7 @@ static inline void register_page_bootmem_info_node(struct pglist_data *pgdat)
 {
 }
 #endif
+
 extern void put_page_bootmem(struct page *page);
 extern void get_page_bootmem(unsigned long ingo, struct page *page,
 			     unsigned long type);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index ca1dd3a..2a57888 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -30,6 +30,7 @@
 #include <linux/mm_inline.h>
 #include <linux/firmware-map.h>
 #include <linux/stop_machine.h>
+#include <linux/acpi.h>
 
 #include <asm/tlbflush.h>
 
@@ -62,7 +63,6 @@ void unlock_memory_hotplug(void)
 	mutex_unlock(&mem_hotplug_mutex);
 }
 
-
 /* add this memory to iomem resource */
 static struct resource *register_memory_resource(u64 start, u64 size)
 {
@@ -91,6 +91,29 @@ static void release_memory_resource(struct resource *res)
 	return;
 }
 
+#ifdef CONFIG_ACPI_NUMA
+/**
+ * find_hotpluggable_memory - Find out hotpluggable memory from ACPI SRAT.
+ *
+ * This function does the following:
+ * 1. Try to find if there is a SRAT in initrd file used to override the one
+ *    provided by firmware. If so, get its phys addr.
+ * 2. If there is no override SRAT, get the phys addr of the SRAT in firmware.
+ * 3. Parse SRAT, find out which memory is hotpluggable.
+ */
+void __init find_hotpluggable_memory(void)
+{
+	phys_addr_t srat_paddr;
+
+	/* Try to find if SRAT is overridden */
+	srat_paddr = early_acpi_override_srat();
+	if (!srat_paddr)
+		return;
+
+	/* Will parse SRAT and find out hotpluggable memory here */
+}
+#endif	/* CONFIG_ACPI_NUMA */
+
 #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
 void get_page_bootmem(unsigned long info,  struct page *page,
 		      unsigned long type)
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v2 RESEND 10/18] x86, acpi: Try to find if SRAT is overrided earlier.
@ 2013-08-02  9:14   ` Tang Chen
  0 siblings, 0 replies; 70+ messages in thread
From: Tang Chen @ 2013-08-02  9:14 UTC (permalink / raw)
  To: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, tj,
	trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

Linux cannot migrate pages used by the kernel because of the direct mapping
(va = pa + PAGE_OFFSET), so any memory used by the kernel cannot be
hot-removed. When using memory hotplug, we therefore have to prevent the
kernel from using hotpluggable memory.

The ACPI table SRAT (System Resource Affinity Table) specifies which memory
is hotpluggable. Once SRAT is parsed, we know which memory is hotpluggable.

Early in boot, SRAT has not been parsed yet, so the boot memory allocator
memblock may hand any memory to the kernel. So we need SRAT parsed before
memblock starts to allocate memory for the kernel.

In this patch, we are going to parse SRAT earlier, right after memblock is
ready.

Generally speaking, tables such as SRAT are provided by firmware. But the
ACPI_INITRD_TABLE_OVERRIDE functionality allows users to put their own
tables in the initrd and override the ones from firmware. So if we want to
parse SRAT earlier, we also need to do the SRAT override earlier.

First, we introduce early_acpi_override_srat() to check whether SRAT will be
overridden from the initrd.

Second, we introduce find_hotpluggable_memory() to reserve hotpluggable
memory. It first calls early_acpi_override_srat() to find out which memory
is hotpluggable in the override SRAT.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 arch/x86/kernel/setup.c        |   10 ++++++
 drivers/acpi/osl.c             |   61 ++++++++++++++++++++++++++++++++++++++++
 include/linux/acpi.h           |   14 ++++++++-
 include/linux/memory_hotplug.h |    2 +
 mm/memory_hotplug.c            |   25 +++++++++++++++-
 5 files changed, 109 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index b2ce0dc..c23e6a7 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1058,6 +1058,16 @@ void __init setup_arch(char **cmdline_p)
 	/* Initialize ACPI global root table list. */
 	early_acpi_boot_table_init();
 
+#ifdef CONFIG_ACPI_NUMA
+	/*
+	 * Linux kernel cannot migrate kernel pages, as a result, memory used
+	 * by the kernel cannot be hot-removed. Find and mark hotpluggable
+	 * memory in memblock to prevent memblock from allocating hotpluggable
+	 * memory for the kernel.
+	 */
+	find_hotpluggable_memory();
+#endif
+
 	/*
 	 * The EFI specification says that boot service code won't be called
 	 * after ExitBootServices(). This is, in fact, a lie.
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 0043e9f..dcbca3e 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -48,6 +48,7 @@
 
 #include <asm/io.h>
 #include <asm/uaccess.h>
+#include <asm/setup.h>
 
 #include <acpi/acpi.h>
 #include <acpi/acpi_bus.h>
@@ -631,6 +632,66 @@ int __init acpi_verify_table(struct cpio_data *file,
 	return 0;
 }
 
+#ifdef CONFIG_ACPI_NUMA
+/*******************************************************************************
+ *
+ * FUNCTION:    early_acpi_override_srat
+ *
+ * RETURN:      Phys addr of SRAT on success, 0 on error.
+ *
+ * DESCRIPTION: Try to get the phys addr of SRAT in initrd.
+ *              The ACPI_INITRD_TABLE_OVERRIDE procedure is able to use tables
+ *              in initrd file to override the ones provided by firmware. This
+ *              function checks if there is a SRAT in initrd at early time. If
+ *              so, return the phys addr of the SRAT.
+ *
+ ******************************************************************************/
+phys_addr_t __init early_acpi_override_srat(void)
+{
+	int i;
+	u32 length;
+	long offset;
+	void *ramdisk_vaddr;
+	struct acpi_table_header *table;
+	struct cpio_data file;
+	unsigned long map_step = NR_FIX_BTMAPS << PAGE_SHIFT;
+	phys_addr_t ramdisk_image = get_ramdisk_image();
+	char cpio_path[32] = "kernel/firmware/acpi/";
+
+	if (!ramdisk_image || !get_ramdisk_size())
+		return 0;
+
+	/* Try to find if SRAT is overridden */
+	for (i = 0; i < ACPI_OVERRIDE_TABLES; i++) {
+		ramdisk_vaddr = early_ioremap(ramdisk_image, map_step);
+
+		file = find_cpio_data(cpio_path, ramdisk_vaddr,
+				      map_step, &offset);
+		if (!file.data) {
+			early_iounmap(ramdisk_vaddr, map_step);
+			return 0;
+		}
+
+		table = file.data;
+		length = table->length;
+
+		if (acpi_verify_table(&file, cpio_path, ACPI_SIG_SRAT)) {
+			ramdisk_image += offset;
+			early_iounmap(ramdisk_vaddr, map_step);
+			continue;
+		}
+
+		/* Found SRAT */
+		early_iounmap(ramdisk_vaddr, map_step);
+		ramdisk_image = ramdisk_image + offset - length;
+
+		break;
+	}
+
+	return ramdisk_image;
+}
+#endif	/* CONFIG_ACPI_NUMA */
+
 void __init acpi_initrd_override(void *data, size_t size)
 {
 	int no, table_nr = 0, total_offset = 0;
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index b722183..d86455a 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -81,11 +81,21 @@ typedef int (*acpi_tbl_entry_handler)(struct acpi_subtable_header *header,
 
 #ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
 void acpi_initrd_override(void *data, size_t size);
-#else
+
+#ifdef CONFIG_ACPI_NUMA
+phys_addr_t early_acpi_override_srat(void);
+#endif	/* CONFIG_ACPI_NUMA */
+
+#else	/* CONFIG_ACPI_INITRD_TABLE_OVERRIDE */
 static inline void acpi_initrd_override(void *data, size_t size)
 {
 }
-#endif
+
+static inline phys_addr_t early_acpi_override_srat(void)
+{
+	return 0;
+}
+#endif	/* CONFIG_ACPI_INITRD_TABLE_OVERRIDE */
 
 char * __acpi_map_table (unsigned long phys_addr, unsigned long size);
 void __acpi_unmap_table(char *map, unsigned long size);
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index dd38e62..463efa9 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -104,6 +104,7 @@ extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
 /* reasonably generic interface to expand the physical pages in a zone  */
 extern int __add_pages(int nid, struct zone *zone, unsigned long start_pfn,
 	unsigned long nr_pages);
+extern void find_hotpluggable_memory(void);
 
 #ifdef CONFIG_NUMA
 extern int memory_add_physaddr_to_nid(u64 start);
@@ -181,6 +182,7 @@ static inline void register_page_bootmem_info_node(struct pglist_data *pgdat)
 {
 }
 #endif
+
 extern void put_page_bootmem(struct page *page);
 extern void get_page_bootmem(unsigned long ingo, struct page *page,
 			     unsigned long type);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index ca1dd3a..2a57888 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -30,6 +30,7 @@
 #include <linux/mm_inline.h>
 #include <linux/firmware-map.h>
 #include <linux/stop_machine.h>
+#include <linux/acpi.h>
 
 #include <asm/tlbflush.h>
 
@@ -62,7 +63,6 @@ void unlock_memory_hotplug(void)
 	mutex_unlock(&mem_hotplug_mutex);
 }
 
-
 /* add this memory to iomem resource */
 static struct resource *register_memory_resource(u64 start, u64 size)
 {
@@ -91,6 +91,29 @@ static void release_memory_resource(struct resource *res)
 	return;
 }
 
+#ifdef CONFIG_ACPI_NUMA
+/**
+ * find_hotpluggable_memory - Find out hotpluggable memory from ACPI SRAT.
+ *
+ * This function does the following:
+ * 1. Try to find if there is a SRAT in initrd file used to override the one
+ *    provided by firmware. If so, get its phys addr.
+ * 2. If there is no override SRAT, get the phys addr of the SRAT in firmware.
+ * 3. Parse SRAT, find out which memory is hotpluggable.
+ */
+void __init find_hotpluggable_memory(void)
+{
+	phys_addr_t srat_paddr;
+
+	/* Try to find if SRAT is overridden */
+	srat_paddr = early_acpi_override_srat();
+	if (!srat_paddr)
+		return;
+
+	/* Will parse SRAT and find out hotpluggable memory here */
+}
+#endif	/* CONFIG_ACPI_NUMA */
+
 #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
 void get_page_bootmem(unsigned long info,  struct page *page,
 		      unsigned long type)
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v2 RESEND 11/18] x86, acpi: Try to find SRAT in firmware earlier.
  2013-08-02  9:14 ` Tang Chen
@ 2013-08-02  9:14   ` Tang Chen
  -1 siblings, 0 replies; 70+ messages in thread
From: Tang Chen @ 2013-08-02  9:14 UTC (permalink / raw)
  To: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, tj,
	trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

This patch introduces early_acpi_firmware_srat() to find the
physical address of the SRAT provided by firmware, and calls it in
find_hotpluggable_memory().

Since we have initialized acpi_gbl_root_table_list earlier,
and stored all the tables' physical addresses and signatures in it,
it is easy to find the SRAT.
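
The lookup described above can be sketched roughly as follows. This is a
simplified, standalone illustration, not the code in this patch:
struct example_table_desc, example_find_table() and example_tables are
hypothetical stand-ins, and the addresses are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a root table list entry. */
struct example_table_desc {
	uint64_t address;	/* physical address of the table */
	char signature[4];	/* 4-byte ACPI signature, not NUL-terminated */
};

/*
 * Return the physical address of the first table whose signature matches,
 * or 0 if no such table is listed.
 */
uint64_t example_find_table(const struct example_table_desc *tables,
			    size_t count, const char *signature)
{
	size_t i;

	assert(signature != NULL);

	for (i = 0; i < count; i++) {
		if (memcmp(tables[i].signature, signature, 4) == 0)
			return tables[i].address;
	}

	return 0;
}

/* Sample list with made-up addresses. */
const struct example_table_desc example_tables[] = {
	{ 0x7ffe1000, { 'F', 'A', 'C', 'P' } },
	{ 0x7ffe2000, { 'S', 'R', 'A', 'T' } },
};
```

Returning 0 for "not found" matches the convention used by the early_acpi_*()
helpers in this series, where a zero physical address means no SRAT.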

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 drivers/acpi/acpica/tbxface.c |   32 ++++++++++++++++++++++++++++++++
 drivers/acpi/osl.c            |   22 ++++++++++++++++++++++
 include/acpi/acpixf.h         |    4 ++++
 include/linux/acpi.h          |    4 ++++
 mm/memory_hotplug.c           |    8 ++++++--
 5 files changed, 68 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/acpica/tbxface.c b/drivers/acpi/acpica/tbxface.c
index ad11162..6a92f12 100644
--- a/drivers/acpi/acpica/tbxface.c
+++ b/drivers/acpi/acpica/tbxface.c
@@ -181,6 +181,38 @@ acpi_status acpi_reallocate_root_table(void)
 	return_ACPI_STATUS(status);
 }
 
+/**
+ * acpi_get_table_desc - Get the acpi table descriptor of a specific table.
+ * @signature: The signature of the table to be found.
+ * @out_desc: The out returned descriptor.
+ *
+ * Iterate over acpi_gbl_root_table_list to find a specific table and then
+ * return its phys addr.
+ *
+ * NOTE: The caller has the responsibility to allocate memory for @out_desc.
+ *
+ * Return AE_OK on success, AE_NOT_FOUND if the table is not found.
+ */
+acpi_status acpi_get_table_desc(char *signature,
+				struct acpi_table_desc *out_desc)
+{
+	struct acpi_table_desc *desc;
+	int pos, count = acpi_gbl_root_table_list.current_table_count;
+
+	for (pos = 0; pos < count; pos++) {
+		desc = &acpi_gbl_root_table_list.tables[pos];
+
+		if (!ACPI_COMPARE_NAME(&desc->signature, signature))
+			continue;
+
+		memcpy(out_desc, desc, sizeof(struct acpi_table_desc));
+
+		return_ACPI_STATUS(AE_OK);
+	}
+
+	return_ACPI_STATUS(AE_NOT_FOUND);
+}
+
 /*******************************************************************************
  *
  * FUNCTION:    acpi_get_table_header
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index dcbca3e..ec490fe 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -53,6 +53,7 @@
 #include <acpi/acpi.h>
 #include <acpi/acpi_bus.h>
 #include <acpi/processor.h>
+#include <acpi/acpixf.h>
 
 #define _COMPONENT		ACPI_OS_SERVICES
 ACPI_MODULE_NAME("osl");
@@ -760,6 +761,27 @@ void __init acpi_initrd_override(void *data, size_t size)
 }
 #endif /* CONFIG_ACPI_INITRD_TABLE_OVERRIDE */
 
+#ifdef CONFIG_ACPI_NUMA
+/*******************************************************************************
+ *
+ * FUNCTION:    early_acpi_firmware_srat
+ *
+ * RETURN:      Phys addr of SRAT on success, 0 on error.
+ *
+ * DESCRIPTION: Get the phys addr of SRAT provided by firmware.
+ *
+ ******************************************************************************/
+phys_addr_t __init early_acpi_firmware_srat(void)
+{
+	struct acpi_table_desc table_desc;
+
+	if (acpi_get_table_desc(ACPI_SIG_SRAT, &table_desc))
+		return 0;
+
+	return table_desc.address;
+}
+#endif	/* CONFIG_ACPI_NUMA */
+
 static void acpi_table_taint(struct acpi_table_header *table)
 {
 	pr_warn(PREFIX
diff --git a/include/acpi/acpixf.h b/include/acpi/acpixf.h
index e9c9b88..e45be94 100644
--- a/include/acpi/acpixf.h
+++ b/include/acpi/acpixf.h
@@ -186,6 +186,10 @@ acpi_status acpi_find_root_pointer(acpi_size *rsdp_address);
 acpi_status acpi_unload_table_id(acpi_owner_id id);
 
 acpi_status
+acpi_get_table_desc(char *signature,
+		    struct acpi_table_desc *out_desc);
+
+acpi_status
 acpi_get_table_header(acpi_string signature,
 		      u32 instance, struct acpi_table_header *out_table_header);
 
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index d86455a..10dfda7 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -97,6 +97,10 @@ static inline phys_addr_t early_acpi_override_srat(void)
 }
 #endif	/* CONFIG_ACPI_INITRD_TABLE_OVERRIDE */
 
+#ifdef CONFIG_ACPI_NUMA
+phys_addr_t early_acpi_firmware_srat(void);
+#endif  /* CONFIG_ACPI_NUMA */
+
 char * __acpi_map_table (unsigned long phys_addr, unsigned long size);
 void __acpi_unmap_table(char *map, unsigned long size);
 int early_acpi_boot_init(void);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 2a57888..2dfb06f 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -107,8 +107,12 @@ void __init find_hotpluggable_memory(void)
 
 	/* Try to find if SRAT is overridden */
 	srat_paddr = early_acpi_override_srat();
-	if (!srat_paddr)
-		return;
+	if (!srat_paddr) {
+		/* Try to find SRAT from firmware if it wasn't overridden */
+		srat_paddr = early_acpi_firmware_srat();
+		if (!srat_paddr)
+			return;
+	}
 
 	/* Will parse SRAT and find out hotpluggable memory here */
 }
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v2 RESEND 12/18] x86, acpi, numa, mem_hotplug: Find hotpluggable memory in SRAT memory affinities.
  2013-08-02  9:14 ` Tang Chen
@ 2013-08-02  9:14   ` Tang Chen
  -1 siblings, 0 replies; 70+ messages in thread
From: Tang Chen @ 2013-08-02  9:14 UTC (permalink / raw)
  To: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, tj,
	trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

In the ACPI SRAT (System Resource Affinity Table), there is a memory affinity for
each memory range in the system. Each memory affinity has a field indicating
whether the memory range is hotpluggable.

This patch only parses the memory affinities in SRAT and finds all the
hotpluggable memory ranges in the system.

This patch doesn't mark hotpluggable memory in memblock. Memory marked as hotplug
won't be allocated to the kernel. If all the memory in the system is hotpluggable,
then the system won't have enough memory to boot. The basic idea to solve this
problem is to make the nodes the kernel resides in unhotpluggable. So, until that
is done, we don't mark any hotpluggable memory in memblock, to keep memblock
working as before.
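
The resumable walk over the affinity entries can be sketched as below. This is
a minimal standalone illustration under assumptions: all example_* names and
the structure layouts are hypothetical and much simpler than the real SRAT
structures, and the sample table contents are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified layout; the real SRAT structures differ. */
#define EXAMPLE_TYPE_CPU_AFFINITY	0
#define EXAMPLE_TYPE_MEMORY_AFFINITY	1
#define EXAMPLE_MEM_HOT_PLUGGABLE	(1u << 1)

struct example_table_header { uint32_t length; uint32_t reserved; };
struct example_subtable_header { uint8_t type; uint8_t length; };
struct example_cpu_affinity {
	struct example_subtable_header header;
	uint8_t reserved[6];
};
struct example_mem_affinity {
	struct example_subtable_header header;
	uint16_t reserved;
	uint32_t flags;
	uint64_t base;
	uint64_t size;
};
struct example_srat {
	struct example_table_header header;
	struct example_cpu_affinity cpu0;
	struct example_mem_affinity mem0, mem1, mem2;
};

/*
 * Find the next hotpluggable memory entry. *offset is 0 on the first call and
 * is updated to the offset of the entry found, so the next call resumes there.
 * Returns 1 if an entry was found, 0 otherwise.
 */
int example_next_hotpluggable(const void *table, uint64_t *base,
			      uint64_t *size, size_t *offset)
{
	const unsigned char *start = table;
	const struct example_table_header *hdr = table;
	const struct example_subtable_header *entry;
	const struct example_mem_affinity *ma;
	size_t pos;

	assert(table != NULL && offset != NULL);

	if (*offset) {
		/* Step past the entry returned by the previous call. */
		entry = (const void *)(start + *offset);
		pos = *offset + entry->length;
	} else {
		/* The first entry follows the table header. */
		pos = sizeof(*hdr);
	}

	while (pos + sizeof(*entry) <= hdr->length) {
		entry = (const void *)(start + pos);
		if (entry->length == 0)
			break;	/* malformed entry; avoid looping forever */

		if (entry->type == EXAMPLE_TYPE_MEMORY_AFFINITY &&
		    entry->length >= sizeof(*ma) &&
		    pos + entry->length <= hdr->length) {
			ma = (const void *)entry;
			if (ma->flags & EXAMPLE_MEM_HOT_PLUGGABLE) {
				*base = ma->base;
				*size = ma->size;
				*offset = pos;
				return 1;
			}
		}
		pos += entry->length;
	}

	return 0;
}

/* Caller loop: sum the sizes of all hotpluggable ranges. */
uint64_t example_total_hotpluggable(const void *table)
{
	uint64_t base, size, total = 0;
	size_t offset = 0;

	while (example_next_hotpluggable(table, &base, &size, &offset))
		total += size;

	return total;
}

/* Sample table: one CPU entry, one fixed and two hotpluggable ranges. */
const struct example_srat example_sample_srat = {
	.header = { .length = sizeof(struct example_srat) },
	.cpu0 = { .header = { EXAMPLE_TYPE_CPU_AFFINITY, sizeof(struct example_cpu_affinity) } },
	.mem0 = { .header = { EXAMPLE_TYPE_MEMORY_AFFINITY, sizeof(struct example_mem_affinity) },
		  .flags = 0x1, .base = 0x0, .size = 0x40000000 },
	.mem1 = { .header = { EXAMPLE_TYPE_MEMORY_AFFINITY, sizeof(struct example_mem_affinity) },
		  .flags = 0x1 | EXAMPLE_MEM_HOT_PLUGGABLE, .base = 0x40000000, .size = 0x40000000 },
	.mem2 = { .header = { EXAMPLE_TYPE_MEMORY_AFFINITY, sizeof(struct example_mem_affinity) },
		  .flags = 0x1 | EXAMPLE_MEM_HOT_PLUGGABLE, .base = 0x80000000, .size = 0x40000000 },
};
```

Passing the offset back in keeps the iterator stateless, which suits early boot
code that cannot allocate memory for an iterator object.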

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 drivers/acpi/osl.c   |   85 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/acpi.h |    2 +
 mm/memory_hotplug.c  |   22 ++++++++++++-
 3 files changed, 107 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index ec490fe..d01202d 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -780,6 +780,91 @@ phys_addr_t __init early_acpi_firmware_srat(void)
 
 	return table_desc.address;
 }
+
+/*******************************************************************************
+ *
+ * FUNCTION:    acpi_hotplug_mem_affinity
+ *
+ * PARAMETERS:  Srat_vaddr         - Virt addr of SRAT
+ *              Base               - The base address of the found hotpluggable
+ *                                   memory region
+ *              Size               - The size of the found hotpluggable memory
+ *                                   region
+ *              Offset             - Offset of the found memory affinity
+ *
+ * RETURN:      Status
+ *
+ * DESCRIPTION: This function iterates SRAT affinities list to find memory
+ *              affinities with hotpluggable memory one by one. Return the
+ *              offset of the found memory affinity through @offset. @offset
+ *              can be used to iterate the SRAT affinities list to find all the
+ *              hotpluggable memory affinities. If @offset is 0, it is the first
+ *              time of the iteration.
+ *
+ ******************************************************************************/
+acpi_status __init
+acpi_hotplug_mem_affinity(void *srat_vaddr, u64 *base, u64 *size,
+			  unsigned long *offset)
+{
+	struct acpi_table_header *table_header;
+	struct acpi_subtable_header *entry;
+	struct acpi_srat_mem_affinity *ma;
+	unsigned long table_end, curr;
+
+	if (!offset)
+		return_ACPI_STATUS(AE_BAD_PARAMETER);
+
+	table_header = (struct acpi_table_header *)srat_vaddr;
+	table_end = (unsigned long)table_header + table_header->length;
+
+	entry = (struct acpi_subtable_header *)
+		((unsigned long)table_header + *offset);
+
+	if (*offset) {
+		/*
+		 * @offset is the offset of the last affinity found in the
+		 * last call. So need to move to the next affinity.
+		 */
+		entry = (struct acpi_subtable_header *)
+			((unsigned long)entry + entry->length);
+	} else {
+		/*
+		 * Offset of the first affinity is the size of SRAT
+		 * table header.
+		 */
+		entry = (struct acpi_subtable_header *)
+			((unsigned long)entry + sizeof(struct acpi_table_srat));
+	}
+
+	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) <
+	       table_end) {
+		if (entry->length == 0)
+			break;
+
+		if (entry->type != ACPI_SRAT_TYPE_MEMORY_AFFINITY)
+			goto next;
+
+		ma = (struct acpi_srat_mem_affinity *)entry;
+
+		if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE))
+			goto next;
+
+		if (base)
+			*base = ma->base_address;
+
+		if (size)
+			*size = ma->length;
+
+		*offset = (unsigned long)entry - (unsigned long)srat_vaddr;
+		return_ACPI_STATUS(AE_OK);
+
+next:
+		entry = (struct acpi_subtable_header *)
+			((unsigned long)entry + entry->length);
+	}
+
+	return_ACPI_STATUS(AE_NOT_FOUND);
+}
 #endif	/* CONFIG_ACPI_NUMA */
 
 static void acpi_table_taint(struct acpi_table_header *table)
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 10dfda7..590559a 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -99,6 +99,8 @@ static inline phys_addr_t early_acpi_override_srat(void)
 
 #ifdef CONFIG_ACPI_NUMA
 phys_addr_t early_acpi_firmware_srat(void);
+acpi_status acpi_hotplug_mem_affinity(void *srat_vaddr, u64 *base,
+				      u64 *size, unsigned long *offset);
 #endif  /* CONFIG_ACPI_NUMA */
 
 char * __acpi_map_table (unsigned long phys_addr, unsigned long size);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 2dfb06f..ef9ccf8 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -103,7 +103,11 @@ static void release_memory_resource(struct resource *res)
  */
 void __init find_hotpluggable_memory(void)
 {
-	phys_addr_t srat_paddr;
+	void *srat_vaddr;
+	phys_addr_t srat_paddr, base, size;
+	u32 length;
+	struct acpi_table_header *srat_header;
+	unsigned long offset = 0;
 
 	/* Try to find if SRAT is overridden */
 	srat_paddr = early_acpi_override_srat();
@@ -114,7 +118,21 @@ void __init find_hotpluggable_memory(void)
 			return;
 	}
 
-	/* Will parse SRAT and find out hotpluggable memory here */
+	/* Get the length of SRAT */
+	srat_header = early_ioremap(srat_paddr,
+				    sizeof(struct acpi_table_header));
+	length = srat_header->length;
+	early_iounmap(srat_header, sizeof(struct acpi_table_header));
+
+	/* Find all the hotpluggable memory regions */
+	srat_vaddr = early_ioremap(srat_paddr, length);
+
+	while (ACPI_SUCCESS(acpi_hotplug_mem_affinity(srat_vaddr, &base,
+						      &size, &offset))) {
+		/* Will mark hotpluggable memory regions here */
+	}
+
+	early_iounmap(srat_vaddr, length);
 }
 #endif	/* CONFIG_ACPI_NUMA */
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v2 RESEND 13/18] x86, numa, mem_hotplug: Skip all the regions the kernel resides in.
  2013-08-02  9:14 ` Tang Chen
@ 2013-08-02  9:14   ` Tang Chen
  -1 siblings, 0 replies; 70+ messages in thread
From: Tang Chen @ 2013-08-02  9:14 UTC (permalink / raw)
  To: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, tj,
	trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

Early in boot, memblock reserves some memory for the kernel,
such as the kernel code and data segments, the initrd file, and so on,
which means the kernel resides in these memory regions.

Even if these memory regions are hotpluggable, we should not
mark them as hotpluggable. Otherwise the kernel won't have enough
memory to boot.

This patch finds out which memory regions the kernel resides in,
and skips them when finding all hotpluggable memory regions.
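
The overlap test behind this check can be sketched as follows. This is a
simplified standalone illustration, not the code in this patch: the example_*
names and the sample reserved list are hypothetical, and the sketch leaves out
the region flags check that the patch performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a reserved memory region. */
struct example_region {
	uint64_t base;
	uint64_t size;
};

/* Two half-open ranges [base, base + size) overlap if each starts before the other ends. */
bool example_ranges_overlap(uint64_t a_base, uint64_t a_size,
			    uint64_t b_base, uint64_t b_size)
{
	return a_base < b_base + b_size && b_base < a_base + a_size;
}

/* Return true if any reserved region intersects [base, base + length). */
bool example_reserved_in_range(const struct example_region *reserved,
			       size_t count, uint64_t base, uint64_t length)
{
	size_t i;

	assert(reserved != NULL || count == 0);

	for (i = 0; i < count; i++) {
		if (example_ranges_overlap(reserved[i].base, reserved[i].size,
					   base, length))
			return true;
	}

	return false;
}

/* Sample reserved list with made-up addresses. */
const struct example_region example_reserved[] = {
	{ 0x01000000, 0x00800000 },	/* e.g. kernel image */
	{ 0x7f000000, 0x01000000 },	/* e.g. initrd */
};
```

With half-open ranges, a reserved region that ends exactly where a hotpluggable
range begins does not count as an overlap.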

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 mm/memory_hotplug.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 45 insertions(+), 0 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index ef9ccf8..10a30ef 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -31,6 +31,7 @@
 #include <linux/firmware-map.h>
 #include <linux/stop_machine.h>
 #include <linux/acpi.h>
+#include <linux/memblock.h>
 
 #include <asm/tlbflush.h>
 
@@ -93,6 +94,40 @@ static void release_memory_resource(struct resource *res)
 
 #ifdef CONFIG_ACPI_NUMA
 /**
+ * kernel_resides_in_region - Check if kernel resides in a memory region.
+ * @base: The base address of the memory region.
+ * @length: The length of the memory region.
+ *
+ * This function is used at early time. It iterates memblock.reserved and checks
+ * if the kernel has used any memory in [@base, @base + @length).
+ *
+ * Return true if the kernel resides in the memory region, false otherwise.
+ */
+static bool __init kernel_resides_in_region(phys_addr_t base, u64 length)
+{
+	int i;
+	phys_addr_t start, end;
+	struct memblock_region *region;
+	struct memblock_type *reserved = &memblock.reserved;
+
+	for (i = 0; i < reserved->cnt; i++) {
+		region = &reserved->regions[i];
+
+		if (region->flags != MEMBLOCK_HOTPLUG)
+			continue;
+
+		start = region->base;
+		end = region->base + region->size;
+		if (end <= base || start >= base + length)
+			continue;
+
+		return true;
+	}
+
+	return false;
+}
+
+/**
  * find_hotpluggable_memory - Find out hotpluggable memory from ACPI SRAT.
  *
  * This function did the following:
@@ -129,6 +164,16 @@ void __init find_hotpluggable_memory(void)
 
 	while (ACPI_SUCCESS(acpi_hotplug_mem_affinity(srat_vaddr, &base,
 						      &size, &offset))) {
+		/*
+		 * At early time, memblock will reserve some memory for the
+		 * kernel, such as the kernel code and data segments, initrd
+		 * file, and so on, which means the kernel resides in these
+		 * memory regions. These regions should not be hotpluggable.
+		 * So do not mark them as hotpluggable.
+		 */
+		if (kernel_resides_in_region(base, size))
+			continue;
+
 		/* Will mark hotpluggable memory regions here */
 	}
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v2 RESEND 14/18] memblock, numa: Introduce flag into memblock.
  2013-08-02  9:14 ` Tang Chen
@ 2013-08-02  9:14   ` Tang Chen
  -1 siblings, 0 replies; 70+ messages in thread
From: Tang Chen @ 2013-08-02  9:14 UTC (permalink / raw)
  To: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, tj,
	trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

memblock has no flag describing what type of memory a region holds.
Sometimes we use memblock to reserve memory for a special purpose, and
later we want to know what kind of memory it is. So we need a way to
differentiate memory reserved for different uses.

In a hotplug environment, we want to reserve hotpluggable memory so that
the kernel cannot use it. When the system is up, we have to free this
hotpluggable memory to the buddy allocator. So we need to mark this memory
first.

To do so, we need to mark this special memory in memblock.
This patch introduces a new "flags" member in memblock_region:
   struct memblock_region {
           phys_addr_t base;
           phys_addr_t size;
           unsigned long flags;		/* This is new. */
   #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
           int nid;
   #endif
   };

This patch does the following things:
1) Add a "flags" member to memblock_region.
2) Modify the prototypes of the following APIs:
	memblock_add_region()
	memblock_insert_region()
3) Add memblock_reserve_region() to support reserving memory with flags, and
   keep memblock_reserve()'s prototype unmodified.
4) Modify other APIs to support flags, but keep their prototypes unmodified.
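
The effect of the new member on region merging can be sketched in plain C.
This is an illustrative model rather than kernel code: struct merge_region
and demo_merge_regions() are hypothetical names, and the array is assumed to
be sorted by base and free of overlaps. Neighbouring regions are coalesced
only when they are contiguous and agree on both node id and flags.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a region that carries a flags word. */
struct merge_region {
	uint64_t base;
	uint64_t size;
	unsigned long flags;
	int nid;
};

/*
 * Merge neighbouring regions of a sorted, non-overlapping array in place.
 * Returns the new number of regions.
 */
static size_t demo_merge_regions(struct merge_region *r, size_t cnt)
{
	size_t i = 0;

	while (cnt > 1 && i < cnt - 1) {
		struct merge_region *cur = &r[i];
		struct merge_region *next = &r[i + 1];

		/* Keep regions apart if they differ in any property. */
		if (cur->base + cur->size != next->base ||
		    cur->nid != next->nid ||
		    cur->flags != next->flags) {
			i++;
			continue;
		}

		/* Absorb @next into @cur and close the gap it leaves. */
		cur->size += next->size;
		memmove(next, next + 1, (cnt - (i + 2)) * sizeof(*next));
		cnt--;
	}

	return cnt;
}
```

Without the flags comparison, a flagged region adjacent to an unflagged one
on the same node would be folded into a single region and the marking would
be lost.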

The idea is from Wen Congyang <wency@cn.fujitsu.com> and Liu Jiang <jiang.liu@huawei.com>.

v1 -> v2:
As tj suggested, a zero flag MEMBLK_DEFAULT would confuse users. If we
want to specify any other flag, such as MEMBLK_HOTPLUG, users would not
know whether to use MEMBLK_DEFAULT | MEMBLK_HOTPLUG or just MEMBLK_HOTPLUG.
So remove MEMBLK_DEFAULT (which is 0), and just use 0 by default to avoid
confusing users.

Suggested-by: Wen Congyang <wency@cn.fujitsu.com>
Suggested-by: Liu Jiang <jiang.liu@huawei.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 include/linux/memblock.h |    1 +
 mm/memblock.c            |   53 +++++++++++++++++++++++++++++++++-------------
 2 files changed, 39 insertions(+), 15 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index f388203..e89e0cd 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -22,6 +22,7 @@
 struct memblock_region {
 	phys_addr_t base;
 	phys_addr_t size;
+	unsigned long flags;
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 	int nid;
 #endif
diff --git a/mm/memblock.c b/mm/memblock.c
index a847bfe..0841a6e 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -157,6 +157,7 @@ static void __init_memblock memblock_remove_region(struct memblock_type *type, u
 		type->cnt = 1;
 		type->regions[0].base = 0;
 		type->regions[0].size = 0;
+		type->regions[0].flags = 0;
 		memblock_set_region_node(&type->regions[0], MAX_NUMNODES);
 	}
 }
@@ -307,7 +308,8 @@ static void __init_memblock memblock_merge_regions(struct memblock_type *type)
 
 		if (this->base + this->size != next->base ||
 		    memblock_get_region_node(this) !=
-		    memblock_get_region_node(next)) {
+		    memblock_get_region_node(next) ||
+		    this->flags != next->flags) {
 			BUG_ON(this->base + this->size > next->base);
 			i++;
 			continue;
@@ -327,13 +329,15 @@ static void __init_memblock memblock_merge_regions(struct memblock_type *type)
  * @base:	base address of the new region
  * @size:	size of the new region
  * @nid:	node id of the new region
+ * @flags:	flags of the new region
  *
  * Insert new memblock region [@base,@base+@size) into @type at @idx.
  * @type must already have extra room to accomodate the new region.
  */
 static void __init_memblock memblock_insert_region(struct memblock_type *type,
 						   int idx, phys_addr_t base,
-						   phys_addr_t size, int nid)
+						   phys_addr_t size,
+						   int nid, unsigned long flags)
 {
 	struct memblock_region *rgn = &type->regions[idx];
 
@@ -341,6 +345,7 @@ static void __init_memblock memblock_insert_region(struct memblock_type *type,
 	memmove(rgn + 1, rgn, (type->cnt - idx) * sizeof(*rgn));
 	rgn->base = base;
 	rgn->size = size;
+	rgn->flags = flags;
 	memblock_set_region_node(rgn, nid);
 	type->cnt++;
 	type->total_size += size;
@@ -352,6 +357,7 @@ static void __init_memblock memblock_insert_region(struct memblock_type *type,
  * @base: base address of the new region
  * @size: size of the new region
  * @nid: nid of the new region
+ * @flags: flags of the new region
  *
  * Add new memblock region [@base,@base+@size) into @type.  The new region
  * is allowed to overlap with existing ones - overlaps don't affect already
@@ -362,7 +368,8 @@ static void __init_memblock memblock_insert_region(struct memblock_type *type,
  * 0 on success, -errno on failure.
  */
 static int __init_memblock memblock_add_region(struct memblock_type *type,
-				phys_addr_t base, phys_addr_t size, int nid)
+				phys_addr_t base, phys_addr_t size,
+				int nid, unsigned long flags)
 {
 	bool insert = false;
 	phys_addr_t obase = base;
@@ -377,6 +384,7 @@ static int __init_memblock memblock_add_region(struct memblock_type *type,
 		WARN_ON(type->cnt != 1 || type->total_size);
 		type->regions[0].base = base;
 		type->regions[0].size = size;
+		type->regions[0].flags = flags;
 		memblock_set_region_node(&type->regions[0], nid);
 		type->total_size = size;
 		return 0;
@@ -407,7 +415,8 @@ repeat:
 			nr_new++;
 			if (insert)
 				memblock_insert_region(type, i++, base,
-						       rbase - base, nid);
+						       rbase - base, nid,
+						       flags);
 		}
 		/* area below @rend is dealt with, forget about it */
 		base = min(rend, end);
@@ -417,7 +426,8 @@ repeat:
 	if (base < end) {
 		nr_new++;
 		if (insert)
-			memblock_insert_region(type, i, base, end - base, nid);
+			memblock_insert_region(type, i, base, end - base,
+					       nid, flags);
 	}
 
 	/*
@@ -439,12 +449,13 @@ repeat:
 int __init_memblock memblock_add_node(phys_addr_t base, phys_addr_t size,
 				       int nid)
 {
-	return memblock_add_region(&memblock.memory, base, size, nid);
+	return memblock_add_region(&memblock.memory, base, size, nid, 0);
 }
 
 int __init_memblock memblock_add(phys_addr_t base, phys_addr_t size)
 {
-	return memblock_add_region(&memblock.memory, base, size, MAX_NUMNODES);
+	return memblock_add_region(&memblock.memory, base, size,
+				   MAX_NUMNODES, 0);
 }
 
 /**
@@ -499,7 +510,8 @@ static int __init_memblock memblock_isolate_range(struct memblock_type *type,
 			rgn->size -= base - rbase;
 			type->total_size -= base - rbase;
 			memblock_insert_region(type, i, rbase, base - rbase,
-					       memblock_get_region_node(rgn));
+					       memblock_get_region_node(rgn),
+					       rgn->flags);
 		} else if (rend > end) {
 			/*
 			 * @rgn intersects from above.  Split and redo the
@@ -509,7 +521,8 @@ static int __init_memblock memblock_isolate_range(struct memblock_type *type,
 			rgn->size -= end - rbase;
 			type->total_size -= end - rbase;
 			memblock_insert_region(type, i--, rbase, end - rbase,
-					       memblock_get_region_node(rgn));
+					       memblock_get_region_node(rgn),
+					       rgn->flags);
 		} else {
 			/* @rgn is fully contained, record it */
 			if (!*end_rgn)
@@ -551,16 +564,24 @@ int __init_memblock memblock_free(phys_addr_t base, phys_addr_t size)
 	return __memblock_remove(&memblock.reserved, base, size);
 }
 
-int __init_memblock memblock_reserve(phys_addr_t base, phys_addr_t size)
+static int __init_memblock memblock_reserve_region(phys_addr_t base,
+						   phys_addr_t size,
+						   int nid,
+						   unsigned long flags)
 {
 	struct memblock_type *_rgn = &memblock.reserved;
 
-	memblock_dbg("memblock_reserve: [%#016llx-%#016llx] %pF\n",
+	memblock_dbg("memblock_reserve: [%#016llx-%#016llx] flags %#02lx %pF\n",
 		     (unsigned long long)base,
 		     (unsigned long long)base + size,
-		     (void *)_RET_IP_);
+		     flags, (void *)_RET_IP_);
+
+	return memblock_add_region(_rgn, base, size, nid, flags);
+}
 
-	return memblock_add_region(_rgn, base, size, MAX_NUMNODES);
+int __init_memblock memblock_reserve(phys_addr_t base, phys_addr_t size)
+{
+	return memblock_reserve_region(base, size, MAX_NUMNODES, 0);
 }
 
 /**
@@ -985,6 +1006,7 @@ void __init_memblock memblock_set_current_limit(phys_addr_t limit)
 static void __init_memblock memblock_dump(struct memblock_type *type, char *name)
 {
 	unsigned long long base, size;
+	unsigned long flags;
 	int i;
 
 	pr_info(" %s.cnt  = 0x%lx\n", name, type->cnt);
@@ -995,13 +1017,14 @@ static void __init_memblock memblock_dump(struct memblock_type *type, char *name
 
 		base = rgn->base;
 		size = rgn->size;
+		flags = rgn->flags;
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 		if (memblock_get_region_node(rgn) != MAX_NUMNODES)
 			snprintf(nid_buf, sizeof(nid_buf), " on node %d",
 				 memblock_get_region_node(rgn));
 #endif
-		pr_info(" %s[%#x]\t[%#016llx-%#016llx], %#llx bytes%s\n",
-			name, i, base, base + size - 1, size, nid_buf);
+		pr_info(" %s[%#x]\t[%#016llx-%#016llx], %#llx bytes%s flags: %#lx\n",
+			name, i, base, base + size - 1, size, nid_buf, flags);
 	}
 }
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v2 RESEND 15/18] memblock, mem_hotplug: Introduce MEMBLOCK_HOTPLUG flag to mark hotpluggable regions.
  2013-08-02  9:14 ` Tang Chen
@ 2013-08-02  9:14   ` Tang Chen
  -1 siblings, 0 replies; 70+ messages in thread
From: Tang Chen @ 2013-08-02  9:14 UTC (permalink / raw)
  To: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, tj,
	trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

In find_hotpluggable_memory(), once we find a memory region that is
hotpluggable, we want to mark it in memblock.memory, so that we can keep
the memblock allocator from allocating hotpluggable memory for the kernel
later.

To achieve this goal, we introduce MEMBLOCK_HOTPLUG flag to indicate the
hotpluggable memory regions in memblock and a function memblock_mark_hotplug()
to mark hotpluggable memory if we find one.
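
The isolate-then-mark idea can be modelled in standalone C. This is a
simplified sketch under assumptions, not the kernel implementation:
struct mark_table, demo_split_at(), demo_mark_hotplug(), DEMO_HOTPLUG and
DEMO_MAX_REGIONS are all hypothetical, the table is a fixed-size sorted
array, and the final merge step is omitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_HOTPLUG		0x1UL	/* hypothetical flag value */
#define DEMO_MAX_REGIONS	16	/* hypothetical table capacity */

struct mark_region {
	uint64_t base;
	uint64_t size;
	unsigned long flags;
};

struct mark_table {
	size_t cnt;
	struct mark_region regions[DEMO_MAX_REGIONS];
};

/* Split the region containing @addr so that a boundary falls on @addr. */
static int demo_split_at(struct mark_table *t, uint64_t addr)
{
	size_t i;

	for (i = 0; i < t->cnt; i++) {
		struct mark_region *r = &t->regions[i];

		if (addr <= r->base || addr >= r->base + r->size)
			continue;	/* @addr is not strictly inside @r */

		if (t->cnt == DEMO_MAX_REGIONS)
			return -1;	/* no room for the extra region */

		/* Duplicate @r, then trim the two halves around @addr. */
		memmove(r + 1, r, (t->cnt - i) * sizeof(*r));
		r[1].base = addr;
		r[1].size = r[0].base + r[0].size - addr;
		r[0].size = addr - r[0].base;
		t->cnt++;
		return 0;
	}

	return 0;	/* @addr already lies on a boundary or outside */
}

/* Isolate [base, base + size) and flag every region inside it. */
static int demo_mark_hotplug(struct mark_table *t, uint64_t base,
			     uint64_t size)
{
	uint64_t end = base + size;
	size_t i;

	if (demo_split_at(t, base) || demo_split_at(t, end))
		return -1;

	for (i = 0; i < t->cnt; i++) {
		struct mark_region *r = &t->regions[i];

		if (r->base >= base && r->base + r->size <= end)
			r->flags = DEMO_HOTPLUG;
	}

	return 0;
}
```

Splitting first means the flag only ever covers whole regions, so the part of
an existing region outside the requested range keeps its old flags.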

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 include/linux/memblock.h |   11 +++++++++++
 mm/memblock.c            |   26 ++++++++++++++++++++++++++
 mm/memory_hotplug.c      |    3 ++-
 3 files changed, 39 insertions(+), 1 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index e89e0cd..c0bd31c 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -19,6 +19,9 @@
 
 #define INIT_MEMBLOCK_REGIONS	128
 
+/* Definition of memblock flags. */
+#define MEMBLOCK_HOTPLUG	0x1	/* hotpluggable region */
+
 struct memblock_region {
 	phys_addr_t base;
 	phys_addr_t size;
@@ -60,6 +63,8 @@ int memblock_free(phys_addr_t base, phys_addr_t size);
 int memblock_reserve(phys_addr_t base, phys_addr_t size);
 void memblock_trim_memory(phys_addr_t align);
 
+int memblock_mark_hotplug(phys_addr_t base, phys_addr_t size);
+
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
 			  unsigned long *out_end_pfn, int *out_nid);
@@ -119,6 +124,12 @@ void __next_free_mem_range_rev(u64 *idx, int nid, phys_addr_t *out_start,
 	     i != (u64)ULLONG_MAX;					\
 	     __next_free_mem_range_rev(&i, nid, p_start, p_end, p_nid))
 
+static inline void memblock_set_region_flags(struct memblock_region *r,
+					     unsigned long flags)
+{
+	r->flags = flags;
+}
+
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 int memblock_set_node(phys_addr_t base, phys_addr_t size, int nid);
 
diff --git a/mm/memblock.c b/mm/memblock.c
index 0841a6e..ecd8568 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -585,6 +585,32 @@ int __init_memblock memblock_reserve(phys_addr_t base, phys_addr_t size)
 }
 
 /**
+ * memblock_mark_hotplug - Mark hotpluggable memory with flag MEMBLOCK_HOTPLUG.
+ * @base: the base phys addr of the region
+ * @size: the size of the region
+ *
+ * This function isolates region [@base, @base + @size), and marks it with flag
+ * MEMBLOCK_HOTPLUG.
+ *
+ * Return 0 on success, -errno on failure.
+ */
+int __init_memblock memblock_mark_hotplug(phys_addr_t base, phys_addr_t size)
+{
+	struct memblock_type *type = &memblock.memory;
+	int i, ret, start_rgn, end_rgn;
+
+	ret = memblock_isolate_range(type, base, size, &start_rgn, &end_rgn);
+	if (ret)
+		return ret;
+
+	for (i = start_rgn; i < end_rgn; i++)
+		memblock_set_region_flags(&type->regions[i], MEMBLOCK_HOTPLUG);
+
+	memblock_merge_regions(type);
+	return 0;
+}
+
+/**
  * __next_free_mem_range - next function for for_each_free_mem_range()
  * @idx: pointer to u64 loop variable
  * @nid: node selector, %MAX_NUMNODES for all nodes
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 10a30ef..4c6182d 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -174,7 +174,8 @@ void __init find_hotpluggable_memory(void)
 		if (kernel_resides_in_region(base, size))
 			continue;
 
-		/* Will mark hotpluggable memory regions here */
+		/* Mark hotpluggable memory regions in memblock.memory */
+		memblock_mark_hotplug(base, size);
 	}
 
 	early_iounmap(srat_vaddr, length);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v2 RESEND 15/18] memblock, mem_hotplug: Introduce MEMBLOCK_HOTPLUG flag to mark hotpluggable regions.
@ 2013-08-02  9:14   ` Tang Chen
  0 siblings, 0 replies; 70+ messages in thread
From: Tang Chen @ 2013-08-02  9:14 UTC (permalink / raw)
  To: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, tj,
	trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

In find_hotpluggable_memory, once we find out a memory region which is
hotpluggable, we want to mark them in memblock.memory. So that we could
control memblock allocator not to allocte hotpluggable memory for the kernel
later.

To achieve this goal, we introduce MEMBLOCK_HOTPLUG flag to indicate the
hotpluggable memory regions in memblock and a function memblock_mark_hotplug()
to mark hotpluggable memory if we find one.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 include/linux/memblock.h |   11 +++++++++++
 mm/memblock.c            |   26 ++++++++++++++++++++++++++
 mm/memory_hotplug.c      |    3 ++-
 3 files changed, 39 insertions(+), 1 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index e89e0cd..c0bd31c 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -19,6 +19,9 @@
 
 #define INIT_MEMBLOCK_REGIONS	128
 
+/* Definition of memblock flags. */
+#define MEMBLOCK_HOTPLUG	0x1	/* hotpluggable region */
+
 struct memblock_region {
 	phys_addr_t base;
 	phys_addr_t size;
@@ -60,6 +63,8 @@ int memblock_free(phys_addr_t base, phys_addr_t size);
 int memblock_reserve(phys_addr_t base, phys_addr_t size);
 void memblock_trim_memory(phys_addr_t align);
 
+int memblock_mark_hotplug(phys_addr_t base, phys_addr_t size);
+
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
 			  unsigned long *out_end_pfn, int *out_nid);
@@ -119,6 +124,12 @@ void __next_free_mem_range_rev(u64 *idx, int nid, phys_addr_t *out_start,
 	     i != (u64)ULLONG_MAX;					\
 	     __next_free_mem_range_rev(&i, nid, p_start, p_end, p_nid))
 
+static inline void memblock_set_region_flags(struct memblock_region *r,
+					     unsigned long flags)
+{
+	r->flags = flags;
+}
+
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 int memblock_set_node(phys_addr_t base, phys_addr_t size, int nid);
 
diff --git a/mm/memblock.c b/mm/memblock.c
index 0841a6e..ecd8568 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -585,6 +585,32 @@ int __init_memblock memblock_reserve(phys_addr_t base, phys_addr_t size)
 }
 
 /**
+ * memblock_mark_hotplug - Mark hotpluggable memory with flag MEMBLOCK_HOTPLUG.
+ * @base: the base phys addr of the region
+ * @size: the size of the region
+ *
+ * This function isolates region [@base, @base + @size), and marks it with flag
+ * MEMBLOCK_HOTPLUG.
+ *
+ * Return 0 on success, -errno on failure.
+ */
+int __init_memblock memblock_mark_hotplug(phys_addr_t base, phys_addr_t size)
+{
+	struct memblock_type *type = &memblock.memory;
+	int i, ret, start_rgn, end_rgn;
+
+	ret = memblock_isolate_range(type, base, size, &start_rgn, &end_rgn);
+	if (ret)
+		return ret;
+
+	for (i = start_rgn; i < end_rgn; i++)
+		memblock_set_region_flags(&type->regions[i], MEMBLOCK_HOTPLUG);
+
+	memblock_merge_regions(type);
+	return 0;
+}
+
+/**
  * __next_free_mem_range - next function for for_each_free_mem_range()
  * @idx: pointer to u64 loop variable
  * @nid: node selector, %MAX_NUMNODES for all nodes
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 10a30ef..4c6182d 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -174,7 +174,8 @@ void __init find_hotpluggable_memory(void)
 		if (kernel_resides_in_region(base, size))
 			continue;
 
-		/* Will mark hotpluggable memory regions here */
+		/* Mark hotpluggable memory regions in memblock.memory */
+		memblock_mark_hotplug(base, size);
 	}
 
 	early_iounmap(srat_vaddr, length);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v2 RESEND 16/18] memblock, mem_hotplug: Make memblock skip hotpluggable regions by default.
  2013-08-02  9:14 ` Tang Chen
@ 2013-08-02  9:14   ` Tang Chen
  -1 siblings, 0 replies; 70+ messages in thread
From: Tang Chen @ 2013-08-02  9:14 UTC (permalink / raw)
  To: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, tj,
	trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

The Linux kernel cannot migrate pages that it uses itself. As a result,
hotpluggable memory used by the kernel cannot be hot-removed. To solve this
problem, the basic idea is to prevent memblock from allocating hotpluggable
memory for the kernel during early boot, and to arrange all hotpluggable
memory described in the ACPI SRAT (System Resource Affinity Table) as
ZONE_MOVABLE when initializing zones.

The previous patches marked hotpluggable memory regions in memblock.memory
with the MEMBLOCK_HOTPLUG flag.

This patch makes memblock skip these hotpluggable memory regions in the
default allocation path.

memblock_find_in_range_node()
  |-->for_each_free_mem_range_reverse()
        |-->__next_free_mem_range_rev()

The above is the only place where __next_free_mem_range_rev() is used, so
it is enough to skip hotpluggable memory regions there, when iterating
memblock.memory to find free memory.

Later patches introduce a boot option named "movablenode" to enable or
disable using SRAT to arrange ZONE_MOVABLE.

NOTE: This check is always done. That is OK because if users don't specify
      the movablenode option, no hotpluggable memory is marked. The check
      then skips nothing, and the kernel acts as before.
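
To make the effect of the check concrete, here is a minimal userspace sketch
of a top-down search that skips flagged regions. It is only a model: it
ignores reserved regions and node selection, and struct region,
REGION_HOTPLUG, NO_RANGE and find_top_down() are hypothetical names, not
memblock symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REGION_HOTPLUG 0x1UL		/* hypothetical stand-in for MEMBLOCK_HOTPLUG */
#define NO_RANGE       UINT64_MAX	/* returned when nothing suitable is found */

struct region {
	uint64_t base;
	uint64_t size;
	unsigned long flags;
};

/*
 * Walk the regions from the highest index to the lowest and return the base
 * of a @size byte range taken from the top of the first usable region.
 * Regions flagged as hotpluggable are never considered.
 */
static uint64_t find_top_down(const struct region *regions, size_t cnt,
			      uint64_t size)
{
	for (size_t i = cnt; i-- > 0; ) {
		const struct region *m = &regions[i];

		/* skip hotpluggable memory regions */
		if (m->flags & REGION_HOTPLUG)
			continue;

		if (m->size >= size)
			return m->base + m->size - size;
	}
	return NO_RANGE;
}
```

When no region carries the flag, the search behaves exactly as an unfiltered
top-down search, which matches the NOTE above.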

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 mm/memblock.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index ecd8568..3ea4301 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -695,6 +695,10 @@ void __init_memblock __next_free_mem_range(u64 *idx, int nid,
  * @out_nid: ptr to int for nid of the range, can be %NULL
  *
  * Reverse of __next_free_mem_range().
+ *
+ * Linux kernel cannot migrate pages used by itself. Memory hotplug users won't
+ * be able to hot-remove hotpluggable memory used by the kernel. So this
+ * function skips hotpluggable regions when allocating memory for the kernel.
  */
 void __init_memblock __next_free_mem_range_rev(u64 *idx, int nid,
 					   phys_addr_t *out_start,
@@ -719,6 +723,10 @@ void __init_memblock __next_free_mem_range_rev(u64 *idx, int nid,
 		if (nid != MAX_NUMNODES && nid != memblock_get_region_node(m))
 			continue;
 
+		/* skip hotpluggable memory regions */
+		if (m->flags & MEMBLOCK_HOTPLUG)
+			continue;
+
 		/* scan areas before each reservation for intersection */
 		for ( ; ri >= 0; ri--) {
 			struct memblock_region *r = &rsv->regions[ri];
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v2 RESEND 17/18] mem-hotplug: Introduce movablenode boot option to {en|dis}able using SRAT.
  2013-08-02  9:14 ` Tang Chen
@ 2013-08-02  9:14   ` Tang Chen
  -1 siblings, 0 replies; 70+ messages in thread
From: Tang Chen @ 2013-08-02  9:14 UTC (permalink / raw)
  To: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, tj,
	trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

The Hot-Pluggable field in SRAT specifies which memory is hotpluggable.
As mentioned before, if hotpluggable memory is used by the kernel, it
cannot be hot-removed. So memory hotplug users may want to put all
hotpluggable memory in ZONE_MOVABLE so that the kernel won't use it.

Memory hotplug users may also set up a node as a movable node, which has
ZONE_MOVABLE only, so that the whole node can be hot-removed.

But the kernel cannot use memory in ZONE_MOVABLE, so it cannot use
memory in movable nodes. This degrades NUMA performance, which other
users may not want.

So we need a way to let users enable and disable this functionality.
This patch introduces the movablenode boot option, which lets users
choose whether to reserve hotpluggable memory and set it as ZONE_MOVABLE.

Users can specify "movablenode" on the kernel command line to enable this
functionality. Those who don't use memory hotplug, or who don't want to
lose NUMA performance, can simply not specify it, and the kernel will
work as before.
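
For readers unfamiliar with flag-style boot options, the opt-in behaviour
can be modelled in userspace as below. This is only a sketch: the kernel
uses early_param() rather than scanning the string like this, and
cmdline_has_flag() is a hypothetical helper that handles only a bare token,
not a "name=value" form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if @name appears in @cmdline as a complete space-separated
 * token. A longer token that merely starts with @name does not match.
 */
static bool cmdline_has_flag(const char *cmdline, const char *name)
{
	size_t len = strlen(name);
	const char *p = cmdline;

	while (*p) {
		size_t tok;

		/* Skip the separators in front of the next token. */
		while (*p == ' ')
			p++;

		/* Length of the token starting at p (0 at end of string). */
		tok = strcspn(p, " ");
		if (tok == len && strncmp(p, name, len) == 0)
			return true;
		p += tok;
	}
	return false;
}
```

The flag defaults to off: only a command line that contains the exact token
turns the feature on, so existing setups keep their current behaviour.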

Suggested-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 Documentation/kernel-parameters.txt |   15 +++++++++++++++
 arch/x86/kernel/setup.c             |   10 ++++++++--
 include/linux/memory_hotplug.h      |    3 +++
 mm/memory_hotplug.c                 |   11 +++++++++++
 4 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 15356ac..7349d1f 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1718,6 +1718,21 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			that the amount of memory usable for all allocations
 			is not too small.
 
+	movablenode		[KNL,X86] Make the kernel arrange hotpluggable
+			memory ranges recorded in the ACPI SRAT (System
+			Resource Affinity Table) as ZONE_MOVABLE, so that
+			this memory can be hot-removed while the system is up.
+			With this option, all hotpluggable memory is placed
+			in ZONE_MOVABLE, which the kernel cannot use for its
+			own allocations. This degrades NUMA performance, so
+			users who care about NUMA performance should not use it.
+			If all the memory ranges in the system are hotpluggable,
+			the ranges used by the kernel during early boot, such
+			as the kernel code and data segments and the initrd,
+			are not set as ZONE_MOVABLE and are not hotpluggable.
+			Otherwise the kernel would not have enough memory to
+			boot.
+
 	MTD_Partition=	[MTD]
 			Format: <name>,<region-number>,<size>,<offset>
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index c23e6a7..08029d4 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1058,14 +1058,20 @@ void __init setup_arch(char **cmdline_p)
 	/* Initialize ACPI global root table list. */
 	early_acpi_boot_table_init();
 
-#ifdef CONFIG_ACPI_NUMA
+#if defined(CONFIG_ACPI_NUMA) && defined(CONFIG_MOVABLE_NODE)
 	/*
 	 * Linux kernel cannot migrate kernel pages, as a result, memory used
 	 * by the kernel cannot be hot-removed. Find and mark hotpluggable
 	 * memory in memblock to prevent memblock from allocating hotpluggable
 	 * memory for the kernel.
+	 *
+	 * If all the memory in a node is hotpluggable, then the kernel won't
+	 * be able to use memory on that node, which degrades NUMA performance.
+	 * So by default, we don't reserve any hotpluggable memory. Users may
+	 * use the "movablenode" boot option to enable this functionality.
 	 */
-	find_hotpluggable_memory();
+	if (movablenode_enable_srat)
+		find_hotpluggable_memory();
 #endif
 
 	/*
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 463efa9..43eb373 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -33,6 +33,9 @@ enum {
 	ONLINE_MOVABLE,
 };
 
+/* Enable/disable SRAT in movablenode boot option */
+extern bool movablenode_enable_srat;
+
 /*
  * pgdat resizing functions
  */
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 4c6182d..b06d7bc 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -93,6 +93,17 @@ static void release_memory_resource(struct resource *res)
 }
 
 #ifdef CONFIG_ACPI_NUMA
+#ifdef CONFIG_MOVABLE_NODE
+bool __initdata movablenode_enable_srat;
+
+static int __init cmdline_parse_movablenode(char *p)
+{
+	movablenode_enable_srat = true;
+	return 0;
+}
+early_param("movablenode", cmdline_parse_movablenode);
+#endif	/* CONFIG_MOVABLE_NODE */
+
 /**
  * kernel_resides_in_range - Check if kernel resides in a memory region.
  * @base: The base address of the memory region.
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v2 RESEND 18/18] x86, numa, acpi, memory-hotplug: Make movablenode have higher priority.
  2013-08-02  9:14 ` Tang Chen
@ 2013-08-02  9:14   ` Tang Chen
  -1 siblings, 0 replies; 70+ messages in thread
From: Tang Chen @ 2013-08-02  9:14 UTC (permalink / raw)
  To: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, tj,
	trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

Arranging hotpluggable memory as ZONE_MOVABLE degrades NUMA performance,
because the kernel cannot use movable memory. Users who don't use memory
hotplug and don't want to lose NUMA performance need a way to disable this
functionality. So we improved the movablecore boot option.

If users specify the original movablecore=nn@ss boot option, the kernel
arranges [ss, ss+nn) as ZONE_MOVABLE. The kernelcore=nn@ss boot option is
similar, except that it specifies ZONE_NORMAL ranges.

Now, if users specify "movablenode" on the kernel command line, the kernel
arranges the hotpluggable memory described in SRAT as ZONE_MOVABLE. In that
case, all movablecore=nn@ss and kernelcore=nn@ss options are ignored.

Those who don't want this can simply specify nothing, and the kernel will
act as before.
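
The per-node calculation can be sketched in userspace as follows. This is a
simplified model, not the page_alloc.c code: struct region, REGION_HOTPLUG,
MAX_NODES, ALIGN_PAGES and find_movable_start() are hypothetical stand-ins,
PAGE_SHIFT is assumed to be 12, and every nid is assumed to lie in
[0, MAX_NODES).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REGION_HOTPLUG 0x1UL	/* hypothetical stand-in for MEMBLOCK_HOTPLUG */
#define PAGE_SHIFT     12	/* assumed 4 KiB pages */
#define MAX_NODES      4	/* assumed small node count for the sketch */
#define ALIGN_PAGES    1024UL	/* stands in for MAX_ORDER_NR_PAGES */

struct region {
	uint64_t base;
	uint64_t size;
	unsigned long flags;
	int nid;
};

/*
 * For each node, record the lowest start pfn among its hotpluggable regions
 * (0 means "no movable zone on this node"), then round every entry up to
 * the alignment, as the code after the "out:" label does.
 */
static void find_movable_start(const struct region *regions, size_t cnt,
			       unsigned long movable_pfn[MAX_NODES])
{
	for (size_t i = 0; i < cnt; i++) {
		const struct region *r = &regions[i];
		unsigned long start_pfn;

		if (!(r->flags & REGION_HOTPLUG))
			continue;

		start_pfn = (unsigned long)(r->base >> PAGE_SHIFT);
		if (!movable_pfn[r->nid] || start_pfn < movable_pfn[r->nid])
			movable_pfn[r->nid] = start_pfn;
	}

	for (int nid = 0; nid < MAX_NODES; nid++)
		movable_pfn[nid] = (movable_pfn[nid] + ALIGN_PAGES - 1) /
				   ALIGN_PAGES * ALIGN_PAGES;
}
```

Because this path jumps straight to the alignment step, any movablecore or
kernelcore sizing is never consulted when movablenode is in effect.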

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 include/linux/memblock.h |    1 +
 mm/memblock.c            |    5 +++++
 mm/page_alloc.c          |   31 ++++++++++++++++++++++++++++---
 3 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index c0bd31c..e78e32f 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -64,6 +64,7 @@ int memblock_reserve(phys_addr_t base, phys_addr_t size);
 void memblock_trim_memory(phys_addr_t align);
 
 int memblock_mark_hotplug(phys_addr_t base, phys_addr_t size);
+bool memblock_is_hotpluggable(struct memblock_region *region);
 
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
diff --git a/mm/memblock.c b/mm/memblock.c
index 3ea4301..c8eb5d2 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -610,6 +610,11 @@ int __init_memblock memblock_mark_hotplug(phys_addr_t base, phys_addr_t size)
 	return 0;
 }
 
+bool __init_memblock memblock_is_hotpluggable(struct memblock_region *region)
+{
+	return region->flags & MEMBLOCK_HOTPLUG;
+}
+
 /**
  * __next_free_mem_range - next function for for_each_free_mem_range()
  * @idx: pointer to u64 loop variable
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b100255..86d4381 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4948,9 +4948,35 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 	nodemask_t saved_node_state = node_states[N_MEMORY];
 	unsigned long totalpages = early_calculate_totalpages();
 	int usable_nodes = nodes_weight(node_states[N_MEMORY]);
+	struct memblock_type *type = &memblock.memory;
 
+	/* Need to find movable_zone earlier when movablenode is specified. */
+	find_usable_zone_for_movable();
+
+#ifdef CONFIG_MOVABLE_NODE
 	/*
-	 * If movablecore was specified, calculate what size of
+	 * If movablenode is specified, ignore kernelcore and movablecore
+	 * options.
+	 */
+	if (movablenode_enable_srat) {
+		for (i = 0; i < type->cnt; i++) {
+			if (!memblock_is_hotpluggable(&type->regions[i]))
+				continue;
+
+			nid = type->regions[i].nid;
+
+			usable_startpfn = PFN_DOWN(type->regions[i].base);
+			zone_movable_pfn[nid] = zone_movable_pfn[nid] ?
+				min(usable_startpfn, zone_movable_pfn[nid]) :
+				usable_startpfn;
+		}
+
+		goto out;
+	}
+#endif
+
+	/*
+	 * If movablecore=nn[KMG] was specified, calculate what size of
 	 * kernelcore that corresponds so that memory usable for
 	 * any allocation type is evenly spread. If both kernelcore
 	 * and movablecore are specified, then the value of kernelcore
@@ -4976,7 +5002,6 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 		goto out;
 
 	/* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */
-	find_usable_zone_for_movable();
 	usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
 
 restart:
@@ -5067,12 +5092,12 @@ restart:
 	if (usable_nodes && required_kernelcore > usable_nodes)
 		goto restart;
 
+out:
 	/* Align start of ZONE_MOVABLE on all nids to MAX_ORDER_NR_PAGES */
 	for (nid = 0; nid < MAX_NUMNODES; nid++)
 		zone_movable_pfn[nid] =
 			roundup(zone_movable_pfn[nid], MAX_ORDER_NR_PAGES);
 
-out:
 	/* restore the node_state */
 	node_states[N_MEMORY] = saved_node_state;
 }
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* Re: [PATCH v2 RESEND 05/18] x86, ACPICA: Split acpi_boot_table_init() into two parts.
  2013-08-02  9:14   ` Tang Chen
@ 2013-08-02 13:00     ` Rafael J. Wysocki
  -1 siblings, 0 replies; 70+ messages in thread
From: Rafael J. Wysocki @ 2013-08-02 13:00 UTC (permalink / raw)
  To: Tang Chen
  Cc: robert.moore, lv.zheng, lenb, tglx, mingo, hpa, akpm, tj, trenn,
	yinghai, jiang.liu, wency, laijs, isimatu.yasuaki, izumi.taku,
	mgorman, minchan, mina86, gong.chen, vasilis.liaskovitis,
	lwoodman, riel, jweiner, prarit, zhangyanfei, yanghy, x86,
	linux-doc, linux-kernel, linux-mm, linux-acpi

On Friday, August 02, 2013 05:14:24 PM Tang Chen wrote:
> In ACPI, SRAT(System Resource Affinity Table) contains NUMA info.
> The memory affinities in SRAT record every memory range in the
> system, and also, flags specifying if the memory range is
> hotpluggable.
> (Please refer to ACPI spec 5.0 5.2.16)
> 
> memblock starts to work at very early time, and SRAT has not been
> parsed. So we don't know which memory is hotpluggable. In order
> to use memblock to reserve hotpluggable memory, we need to obtain
> SRAT memory affinity info earlier.
> 
> In the current acpi_boot_table_init(), it does the following:
> 1. Parse RSDT, so that we can find all the tables.
> 2. Initialize acpi_gbl_root_table_list, an array of acpi table
>    descriptors used to store each table's address, length, signature,
>    and so on.
> 3. Check if there is any table in initrd intending to override
>    tables from firmware. If so, override the firmware tables.
> 4. Initialize all the data in acpi_gbl_root_table_list.
> 
> In order to parse SRAT at early time, we need to do similar job as
> step 1 and 2 above earlier to obtain SRAT. It will be very convenient
> if we have acpi_gbl_root_table_list initialized. We can use address
> and signature to find SRAT.
> 
> Since step 1 and 2 allocates no memory, it is OK to do these two
> steps earlier.
> 
> But step 3 will check acpi initrd table override, not just SRAT,
> but also all the other tables. So it is better to keep it untouched.
> 
> This patch splits acpi_boot_table_init() into two steps:
> 1. Parse RSDT, which cannot be overrided, and initialize
>    acpi_gbl_root_table_list. (step 1 + 2 above)
> 2. Install all ACPI tables into acpi_gbl_root_table_list.
>    (step 3 + 4 above)
> 
> In later patches, we will do step 1 + 2 earlier.

Please note that Linux is not the only user of the code you're modifying, so
you need to make it possible to use the existing functions.

In particular, acpi_tb_parse_root_table() can't be modified the way you did it,
because that would require all of the users of ACPICA to be modified.

> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> ---
>  drivers/acpi/acpica/tbutils.c |   25 ++++++++++++++++++++++---
>  drivers/acpi/tables.c         |    2 ++
>  include/acpi/acpixf.h         |    2 ++
>  3 files changed, 26 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/acpi/acpica/tbutils.c b/drivers/acpi/acpica/tbutils.c
> index bffdfc7..e3621cf 100644
> --- a/drivers/acpi/acpica/tbutils.c
> +++ b/drivers/acpi/acpica/tbutils.c
> @@ -577,9 +577,30 @@ acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
>  	 */
>  	acpi_os_unmap_memory(table, length);
>  
> +	return_ACPI_STATUS(AE_OK);
> +}
> +
> +/*******************************************************************************
> + *
> + * FUNCTION:    acpi_tb_install_root_table
> + *
> + * DESCRIPTION: This function installs all the ACPI tables in RSDT into
> + *              acpi_gbl_root_table_list.
> + *
> + ******************************************************************************/
> +
> +void __init
> +acpi_tb_install_root_table()
> +{
> +	int i;
> +
>  	/*
>  	 * Complete the initialization of the root table array by examining
> -	 * the header of each table
> +	 * the header of each table.
> +	 *
> +	 * First two entries in the table array are reserved for the DSDT
> +	 * and FACS, which are not actually present in the RSDT/XSDT - they
> +	 * come from the FADT.
>  	 */
>  	for (i = 2; i < acpi_gbl_root_table_list.current_table_count; i++) {
>  		acpi_tb_install_table(acpi_gbl_root_table_list.tables[i].
> @@ -593,6 +614,4 @@ acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
>  			acpi_tb_parse_fadt(i);
>  		}
>  	}
> -
> -	return_ACPI_STATUS(AE_OK);
>  }
> diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
> index d67a1fe..8860e79 100644
> --- a/drivers/acpi/tables.c
> +++ b/drivers/acpi/tables.c
> @@ -353,6 +353,8 @@ int __init acpi_table_init(void)
>  	if (ACPI_FAILURE(status))
>  		return 1;
>  
> +	acpi_tb_install_root_table();
> +
>  	check_multiple_madt();
>  	return 0;
>  }
> diff --git a/include/acpi/acpixf.h b/include/acpi/acpixf.h
> index 22d497e..e9c9b88 100644
> --- a/include/acpi/acpixf.h
> +++ b/include/acpi/acpixf.h
> @@ -118,6 +118,8 @@ acpi_status
>  acpi_initialize_tables(struct acpi_table_desc *initial_storage,
>  		       u32 initial_table_count, u8 allow_resize);
>  
> +void acpi_tb_install_root_table(void);
> +
>  acpi_status __init acpi_initialize_subsystem(void);
>  
>  acpi_status acpi_enable_subsystem(u32 flags);
> 

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v2 RESEND 07/18] x86, ACPI: Also initialize signature and length when parsing root table.
  2013-08-02  9:14   ` Tang Chen
@ 2013-08-02 13:03     ` Rafael J. Wysocki
  -1 siblings, 0 replies; 70+ messages in thread
From: Rafael J. Wysocki @ 2013-08-02 13:03 UTC (permalink / raw)
  To: Tang Chen
  Cc: robert.moore, lv.zheng, lenb, tglx, mingo, hpa, akpm, tj, trenn,
	yinghai, jiang.liu, wency, laijs, isimatu.yasuaki, izumi.taku,
	mgorman, minchan, mina86, gong.chen, vasilis.liaskovitis,
	lwoodman, riel, jweiner, prarit, zhangyanfei, yanghy, x86,
	linux-doc, linux-kernel, linux-mm, linux-acpi

On Friday, August 02, 2013 05:14:26 PM Tang Chen wrote:
> Besides the physical address of each ACPI table, it is very convenient
> to also have the signature of each table in acpi_gbl_root_table_list
> early in boot. We can then find SRAT easily by comparing signatures.
> 
> This patch also records the signature and some other info in
> acpi_gbl_root_table_list early in boot.
> 
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>

The subject is misleading, as the change is in ACPICA and therefore affects not
only x86.

Also, I think the same comment as for the other ACPICA patch in this series
applies: you shouldn't modify acpi_tb_parse_root_table() in ways that would
require the other OSes using ACPICA to be modified.

> ---
>  drivers/acpi/acpica/tbutils.c |   22 ++++++++++++++++++++++
>  1 files changed, 22 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/acpi/acpica/tbutils.c b/drivers/acpi/acpica/tbutils.c
> index e3621cf..af942fe 100644
> --- a/drivers/acpi/acpica/tbutils.c
> +++ b/drivers/acpi/acpica/tbutils.c
> @@ -438,6 +438,7 @@ acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
>  	u32 i;
>  	u32 table_count;
>  	struct acpi_table_header *table;
> +	struct acpi_table_desc *table_desc;
>  	acpi_physical_address address;
>  	acpi_physical_address uninitialized_var(rsdt_address);
>  	u32 length;
> @@ -577,6 +578,27 @@ acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
>  	 */
>  	acpi_os_unmap_memory(table, length);
>  
> +	/*
> +	 * Also initialize the table entries here, so that later we can use them
> +	 * to find SRAT very early, in order to reserve hotpluggable memory.
> +	 */
> +	for (i = 2; i < acpi_gbl_root_table_list.current_table_count; i++) {
> +		table = acpi_os_map_memory(
> +				acpi_gbl_root_table_list.tables[i].address,
> +				sizeof(struct acpi_table_header));
> +		if (!table)
> +			return_ACPI_STATUS(AE_NO_MEMORY);
> +
> +		table_desc = &acpi_gbl_root_table_list.tables[i];
> +
> +		table_desc->pointer = NULL;
> +		table_desc->length = table->length;
> +		table_desc->flags = ACPI_TABLE_ORIGIN_MAPPED;
> +		ACPI_MOVE_32_TO_32(table_desc->signature.ascii, table->signature);
> +
> +		acpi_os_unmap_memory(table, sizeof(struct acpi_table_header));
> +	}
> +
>  	return_ACPI_STATUS(AE_OK);
>  }
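
For reference, once each descriptor carries a signature, the early SRAT
lookup that this patch is preparing for reduces to a scan like the one
below. This is only a sketch under stated assumptions: the sketch_* names
and the struct layout are hypothetical and do not match struct
acpi_table_desc; the only things taken from the patch are the 4-byte
signature and the convention that slots 0 and 1 are reserved for DSDT and
FACS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_NAME_SIZE 4 /* ACPI table signatures are 4 bytes long */

/* Hypothetical descriptor; the real one has more fields. */
struct sketch_table_desc {
	unsigned long address;
	unsigned int length;
	char signature[SKETCH_NAME_SIZE]; /* not NUL-terminated */
};

/*
 * Return the index of the first table at or after slot 2 whose signature
 * matches, or -1 if there is none. Slots 0 and 1 are skipped because they
 * are reserved for DSDT and FACS.
 */
int sketch_find_table(const struct sketch_table_desc *tables, size_t count,
		      const char *signature)
{
	assert(tables != NULL && signature != NULL);

	for (size_t i = 2; i < count; i++) {
		if (memcmp(tables[i].signature, signature,
			   SKETCH_NAME_SIZE) == 0)
			return (int)i;
	}
	return -1;
}
```

A caller would pass "SRAT" as the signature and then map the table at the
returned descriptor's address.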

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v2 RESEND 07/18] x86, ACPI: Also initialize signature and length when parsing root table.
  2013-08-02 13:03     ` Rafael J. Wysocki
@ 2013-08-05  1:33       ` Tang Chen
  -1 siblings, 0 replies; 70+ messages in thread
From: Tang Chen @ 2013-08-05  1:33 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: robert.moore, lv.zheng, lenb, tglx, mingo, hpa, akpm, tj, trenn,
	yinghai, jiang.liu, wency, laijs, isimatu.yasuaki, izumi.taku,
	mgorman, minchan, mina86, gong.chen, vasilis.liaskovitis,
	lwoodman, riel, jweiner, prarit, zhangyanfei, yanghy, x86,
	linux-doc, linux-kernel, linux-mm, linux-acpi

[-- Attachment #1: Type: text/plain, Size: 1000 bytes --]

Hi Rafael,

On 08/02/2013 09:03 PM, Rafael J. Wysocki wrote:
> On Friday, August 02, 2013 05:14:26 PM Tang Chen wrote:
>> Besides the physical address of each ACPI table, it is very convenient
>> to also have the signature of each table in acpi_gbl_root_table_list
>> early in boot. We can then find SRAT easily by comparing signatures.
>>
>> This patch also records the signature and some other info in
>> acpi_gbl_root_table_list early in boot.
>>
>> Signed-off-by: Tang Chen<tangchen@cn.fujitsu.com>
>> Reviewed-by: Zhang Yanfei<zhangyanfei@cn.fujitsu.com>
>
> The subject is misleading, as the change is in ACPICA and therefore affects not
> only x86.

OK, will change it.

>
> Also, I think the same comment as for the other ACPICA patch in this series
> applies: you shouldn't modify acpi_tb_parse_root_table() in ways that would
> require the other OSes using ACPICA to be modified.
>

Thank you for the reminder. Please refer to the attachment.
What do you think of the idea from Zheng?

Thanks.

[-- Attachment #2: Re: [PATCH v2 05_18] x86, acpi: Split acpi_boot_table_init() into two parts..eml --]
[-- Type: message/rfc822, Size: 14802 bytes --]

From: "Zheng, Lv" <lv.zheng@intel.com>
To: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Toshi Kani <toshi.kani@hp.com>, "rjw@sisk.pl" <rjw@sisk.pl>, "lenb@kernel.org" <lenb@kernel.org>, "tglx@linutronix.de" <tglx@linutronix.de>, "mingo@elte.hu" <mingo@elte.hu>, "hpa@zytor.com" <hpa@zytor.com>, "akpm@linux-foundation.org" <akpm@linux-foundation.org>, "tj@kernel.org" <tj@kernel.org>, "trenn@suse.de" <trenn@suse.de>, "yinghai@kernel.org" <yinghai@kernel.org>, "jiang.liu@huawei.com" <jiang.liu@huawei.com>, "wency@cn.fujitsu.com" <wency@cn.fujitsu.com>, "laijs@cn.fujitsu.com" <laijs@cn.fujitsu.com>, "isimatu.yasuaki@jp.fujitsu.com" <isimatu.yasuaki@jp.fujitsu.com>, "izumi.taku@jp.fujitsu.com" <izumi.taku@jp.fujitsu.com>, "mgorman@suse.de" <mgorman@suse.de>, "minchan@kernel.org" <minchan@kernel.org>, "mina86@mina86.com" <mina86@mina86.com>, "gong.chen@linux.intel.com" <gong.chen@linux.intel.com>, "vasilis.liaskovitis@profitbricks.com" <vasilis.liaskovitis@profitbricks.com>, "lwoodman@redhat.com" <lwoodman@redhat.com>, "riel@redhat.com" <riel@redhat.com>, "jweiner@redhat.com" <jweiner@redhat.com>, "prarit@redhat.com" <prarit@redhat.com>, "zhangyanfei@cn.fujitsu.com" <zhangyanfei@cn.fujitsu.com>, "yanghy@cn.fujitsu.com" <yanghy@cn.fujitsu.com>, "x86@kernel.org" <x86@kernel.org>, "linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, "linux-mm@kvack.org" <linux-mm@kvack.org>, "linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>, "Moore, Robert" <robert.moore@intel.com>
Subject: RE: [PATCH v2 05/18] x86, acpi: Split acpi_boot_table_init() into two parts.
Date: Fri, 2 Aug 2013 08:11:15 +0000
Message-ID: <1AE640813FDE7649BE1B193DEA596E8802437C27@SHSMSX101.ccr.corp.intel.com>

> From: Tang Chen [mailto:tangchen@cn.fujitsu.com]
> Sent: Friday, August 02, 2013 3:01 PM
> 
> On 08/02/2013 01:25 PM, Zheng, Lv wrote:
> ......
> >>> index ce3d5db..9d68ffc 100644
> >>> --- a/drivers/acpi/acpica/tbutils.c
> >>> +++ b/drivers/acpi/acpica/tbutils.c
> >>> @@ -766,9 +766,30 @@
> >> acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
> >>>   	*/
> >>>   	acpi_os_unmap_memory(table, length);
> >>>
> >>> +	return_ACPI_STATUS(AE_OK);
> >>> +}
> >>> +
> >>>
> >
> > I don't think you can split the function here.
> > ACPICA still needs to parse the tables using the logic implemented
> > in acpi_tb_install_table() and acpi_tb_parse_fadt() (for example,
> > the endianness of the signature).
> > You had better keep them as is and split some code out of
> > acpi_tb_install_table() to form another function:
> > acpi_tb_override_table().
> 
> I'm sorry, I don't quite follow this.
> 
> I split acpi_tb_parse_root_table(), not acpi_tb_install_table() and
> acpi_tb_parse_fadt().
> If ACPICA wants to use these two functions somewhere else, I think it is
> OK, isn't it?
> 
> And the reason I did this, please see below.
> 
> ......
> >>> + *
> >>> + * FUNCTION:    acpi_tb_install_root_table
> >
> > I think this function should be acpi_tb_override_tables, and call
> > acpi_tb_override_table() inside this function for each table.
> 
> It is not just about acpi initrd table override.
> 
> acpi_tb_parse_root_table() was split into two steps:
> 1. initialize acpi_gbl_root_table_list
> 2. install tables into acpi_gbl_root_table_list
> 
> I need step 1 earlier because I want to find SRAT early.
> But I don't want step 2 earlier, because an ACPI initrd table override
> could happen before the firmware tables are installed. I want only
> SRAT, and I don't want to touch much existing code.

According to what you've explained, what you didn't want to be called earlier is exactly the "acpi initrd table override". Please move only this logic to step 2 and leave the rest where it is.
I think you should write a function named acpi_override_tables(), or something similar, in tbxface.c to be executed as the OSPM entry point for step 2.
Inside this function, acpi_tb_table_override() should be called.

268 void
269 acpi_tb_install_table(acpi_physical_address address,
270                       char *signature, u32 table_index)
271 {

I think you still need the following code to be called at the early stage.

272         struct acpi_table_header *table;
273         struct acpi_table_header *final_table;
274         struct acpi_table_desc *table_desc;
275 
276         if (!address) {
277                 ACPI_ERROR((AE_INFO,
278                             "Null physical address for ACPI table [%s]",
279                             signature));
280                 return;
281         }
282 
283         /* Map just the table header */
284 
285         table = acpi_os_map_memory(address, sizeof(struct acpi_table_header));
286         if (!table) {
287                 ACPI_ERROR((AE_INFO,
288                             "Could not map memory for table [%s] at %p",
289                             signature, ACPI_CAST_PTR(void, address)));
290                 return;
291         }
292 
293         /* If a particular signature is expected (DSDT/FACS), it must match */
294 
295         if (signature && !ACPI_COMPARE_NAME(table->signature, signature)) {
296                 ACPI_BIOS_ERROR((AE_INFO,
297                                  "Invalid signature 0x%X for ACPI table, expected [%s]",
298                                  *ACPI_CAST_PTR(u32, table->signature),
299                                  signature));
300                 goto unmap_and_exit;
301         }
302 
303         /*
304          * Initialize the table entry. Set the pointer to NULL, since the
305          * table is not fully mapped at this time.
306          */
307         table_desc = &acpi_gbl_root_table_list.tables[table_index];
308 
309         table_desc->address = address;
310         table_desc->pointer = NULL;
311         table_desc->length = table->length;
312         table_desc->flags = ACPI_TABLE_ORIGIN_MAPPED;
313         ACPI_MOVE_32_TO_32(table_desc->signature.ascii, table->signature);
314 

You should delete the following code:

315         /*
316          * ACPI Table Override:
317          *
318          * Before we install the table, let the host OS override it with a new
319          * one if desired. Any table within the RSDT/XSDT can be replaced,
320          * including the DSDT which is pointed to by the FADT.
321          *
322          * NOTE: If the table is overridden, then final_table will contain a
323          * mapped pointer to the full new table. If the table is not overridden,
324          * or if there has been a physical override, then the table will be
325          * fully mapped later (in verify table). In any case, we must
326          * unmap the header that was mapped above.
327          */
328         final_table = acpi_tb_table_override(table, table_desc);
329         if (!final_table) {
330                 final_table = table;    /* There was no override */
331         }
332 

You still need to keep the following logic.

333         acpi_tb_print_table_header(table_desc->address, final_table);
334 
335         /* Set the global integer width (based upon revision of the DSDT) */
336 
337         if (table_index == ACPI_TABLE_INDEX_DSDT) {
338                 acpi_ut_set_integer_width(final_table->revision);
339         }
340 

You should delete the following code:

341         /*
342          * If we have a physical override during this early loading of the ACPI
343          * tables, unmap the table for now. It will be mapped again later when
344          * it is actually used. This supports very early loading of ACPI tables,
345          * before virtual memory is fully initialized and running within the
346          * host OS. Note: A logical override has the ACPI_TABLE_ORIGIN_OVERRIDE
347          * flag set and will not be deleted below.
348          */
349         if (final_table != table) {
350                 acpi_tb_delete_table(table_desc);
351         }

Keep the following.

352 
353       unmap_and_exit:
354 
355         /* Always unmap the table header that we mapped above */
356 
357         acpi_os_unmap_memory(table, sizeof(struct acpi_table_header));
358 }
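
Put together, the annotations above amount to roughly the following shape.
This is a sketch only: all sketch_* names, the struct layouts, and the hook
type are hypothetical stand-ins, and the demo hook exists just to exercise
the code. The early step records what the firmware header says without
consulting any override, and a separate later pass gives the host OS its
chance to replace tables.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified table header and descriptor. */
struct sketch_header {
	char signature[4];
	unsigned int length;
};

struct sketch_desc {
	const struct sketch_header *source; /* table backing this slot */
	unsigned int length;
	char signature[4];
	int overridden;
};

/* Early step: record signature and length only; no override lookup. */
void sketch_install_early(struct sketch_desc *desc,
			  const struct sketch_header *fw)
{
	assert(desc != NULL && fw != NULL);

	desc->source = fw;
	desc->length = fw->length;
	memcpy(desc->signature, fw->signature, sizeof(desc->signature));
	desc->overridden = 0;
}

/* Hypothetical hook: returns a replacement table, or NULL for none. */
typedef const struct sketch_header *(*sketch_override_fn)(
	const struct sketch_desc *desc);

/* Later step: apply overrides; returns how many tables were replaced. */
size_t sketch_override_tables(struct sketch_desc *descs, size_t count,
			      sketch_override_fn hook)
{
	size_t replaced = 0;

	assert(descs != NULL && hook != NULL);

	for (size_t i = 0; i < count; i++) {
		const struct sketch_header *repl = hook(&descs[i]);

		if (!repl)
			continue; /* no override for this table */

		descs[i].source = repl;
		descs[i].length = repl->length;
		descs[i].overridden = 1;
		replaced++;
	}
	return replaced;
}

/* Demo hook for illustration: replaces only a table signed "DEMO". */
static const struct sketch_header sketch_demo_replacement = {
	{ 'D', 'E', 'M', 'O' }, 128
};

const struct sketch_header *sketch_demo_hook(const struct sketch_desc *desc)
{
	if (memcmp(desc->signature, "DEMO", 4) == 0)
		return &sketch_demo_replacement;
	return NULL;
}
```

The point of the split is that the early step stays usable before the
initrd is available, while the override pass can run at its current place
in the boot sequence.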

I'm not sure whether this makes my concerns clearer to you.

Thanks and best regards
-Lv

> 
> Would you please explain more about your comment ? I think maybe I
> missed something
> important to you guys. :)
> 
> And all the other ACPICA rules will be followed in the next version.
> 
> Thanks.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v2 RESEND 07/18] x86, ACPI: Also initialize signature and length when parsing root table.
@ 2013-08-05  1:33       ` Tang Chen
  0 siblings, 0 replies; 70+ messages in thread
From: Tang Chen @ 2013-08-05  1:33 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: robert.moore, lv.zheng, lenb, tglx, mingo, hpa, akpm, tj, trenn,
	yinghai, jiang.liu, wency, laijs, isimatu.yasuaki, izumi.taku,
	mgorman, minchan, mina86, gong.chen, vasilis.liaskovitis,
	lwoodman, riel, jweiner, prarit, zhangyanfei, yanghy, x86,
	linux-doc, linux-kernel, linux-mm, linux-acpi

[-- Attachment #1: Type: text/plain, Size: 1000 bytes --]

Hi Rafael,

On 08/02/2013 09:03 PM, Rafael J. Wysocki wrote:
> On Friday, August 02, 2013 05:14:26 PM Tang Chen wrote:
>> Besides the physical address of each ACPI table, it is very convenient
>> to also have the signature of each table in acpi_gbl_root_table_list
>> early in boot. We can then find SRAT easily by comparing signatures.
>>
>> This patch also records the signature and some other info in
>> acpi_gbl_root_table_list early in boot.
>>
>> Signed-off-by: Tang Chen<tangchen@cn.fujitsu.com>
>> Reviewed-by: Zhang Yanfei<zhangyanfei@cn.fujitsu.com>
>
> The subject is misleading, as the change is in ACPICA and therefore affects not
> only x86.

OK, will change it.

>
> Also, I think the same comment as for the other ACPICA patch in this series
> applies: you shouldn't modify acpi_tb_parse_root_table() in ways that would
> require the other OSes using ACPICA to be modified.
>

Thank you for the reminder. Please refer to the attachment.
What do you think of the idea from Zheng?

Thanks.

[-- Attachment #2: Re: [PATCH v2 05_18] x86, acpi: Split acpi_boot_table_init() into two parts..eml --]
[-- Type: message/rfc822, Size: 14804 bytes --]

From: "Zheng, Lv" <lv.zheng@intel.com>
To: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Toshi Kani <toshi.kani@hp.com>, "rjw@sisk.pl" <rjw@sisk.pl>, "lenb@kernel.org" <lenb@kernel.org>, "tglx@linutronix.de" <tglx@linutronix.de>, "mingo@elte.hu" <mingo@elte.hu>, "hpa@zytor.com" <hpa@zytor.com>, "akpm@linux-foundation.org" <akpm@linux-foundation.org>, "tj@kernel.org" <tj@kernel.org>, "trenn@suse.de" <trenn@suse.de>, "yinghai@kernel.org" <yinghai@kernel.org>, "jiang.liu@huawei.com" <jiang.liu@huawei.com>, "wency@cn.fujitsu.com" <wency@cn.fujitsu.com>, "laijs@cn.fujitsu.com" <laijs@cn.fujitsu.com>, "isimatu.yasuaki@jp.fujitsu.com" <isimatu.yasuaki@jp.fujitsu.com>, "izumi.taku@jp.fujitsu.com" <izumi.taku@jp.fujitsu.com>, "mgorman@suse.de" <mgorman@suse.de>, "minchan@kernel.org" <minchan@kernel.org>, "mina86@mina86.com" <mina86@mina86.com>, "gong.chen@linux.intel.com" <gong.chen@linux.intel.com>, "vasilis.liaskovitis@profitbricks.com" <vasilis.liaskovitis@profitbricks.com>, "lwoodman@redhat.com" <lwoodman@redhat.com>, "riel@redhat.com" <riel@redhat.com>, "jweiner@redhat.com" <jweiner@redhat.com>, "prarit@redhat.com" <prarit@redhat.com>, "zhangyanfei@cn.fujitsu.com" <zhangyanfei@cn.fujitsu.com>, "yanghy@cn.fujitsu.com" <yanghy@cn.fujitsu.com>, "x86@kernel.org" <x86@kernel.org>, "linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, "linux-mm@kvack.org" <linux-mm@kvack.org>, "linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>, "Moore, Robert" <robert.moore@intel.com>
Subject: RE: [PATCH v2 05/18] x86, acpi: Split acpi_boot_table_init() into two parts.
Date: Fri, 2 Aug 2013 08:11:15 +0000
Message-ID: <1AE640813FDE7649BE1B193DEA596E8802437C27@SHSMSX101.ccr.corp.intel.com>

> From: Tang Chen [mailto:tangchen@cn.fujitsu.com]
> Sent: Friday, August 02, 2013 3:01 PM
> 
> On 08/02/2013 01:25 PM, Zheng, Lv wrote:
> ......
> >>> index ce3d5db..9d68ffc 100644
> >>> --- a/drivers/acpi/acpica/tbutils.c
> >>> +++ b/drivers/acpi/acpica/tbutils.c
> >>> @@ -766,9 +766,30 @@
> >> acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
> >>>   	*/
> >>>   	acpi_os_unmap_memory(table, length);
> >>>
> >>> +	return_ACPI_STATUS(AE_OK);
> >>> +}
> >>> +
> >>>
> >
> > I don't think you can split the function here.
> > ACPICA still needs to parse the tables using the logic implemented
> > in acpi_tb_install_table() and acpi_tb_parse_fadt() (for example,
> > the endianness of the signature).
> > You had better keep them as is and split some code out of
> > acpi_tb_install_table() to form another function:
> > acpi_tb_override_table().
> 
> I'm sorry, I don't quite follow this.
> 
> I split acpi_tb_parse_root_table(), not acpi_tb_install_table() and
> acpi_tb_parse_fadt().
> If ACPICA wants to use these two functions somewhere else, I think it is
> OK, isn't it?
> 
> And the reason I did this, please see below.
> 
> ......
> >>> + *
> >>> + * FUNCTION:    acpi_tb_install_root_table
> >
> > I think this function should be acpi_tb_override_tables, and call
> > acpi_tb_override_table() inside this function for each table.
> 
> It is not just about acpi initrd table override.
> 
> acpi_tb_parse_root_table() was split into two steps:
> 1. initialize acpi_gbl_root_table_list
> 2. install tables into acpi_gbl_root_table_list
> 
> I need step 1 earlier because I want to find SRAT early.
> But I don't want step 2 earlier, because an ACPI initrd table override
> could happen before the firmware tables are installed. I want only
> SRAT, and I don't want to touch much existing code.

According to what you've explained, what you didn't want to be called earlier is exactly the "acpi initrd table override". Please move only this logic to step 2 and leave the rest where it is.
I think you should write a function named acpi_override_tables(), or something similar, in tbxface.c to be executed as the OSPM entry point for step 2.
Inside this function, acpi_tb_table_override() should be called.

268 void
269 acpi_tb_install_table(acpi_physical_address address,
270                       char *signature, u32 table_index)
271 {

I think you still need the following code to be called at the early stage.

        struct acpi_table_header *table;
        struct acpi_table_header *final_table;
        struct acpi_table_desc *table_desc;

        if (!address) {
                ACPI_ERROR((AE_INFO,
                            "Null physical address for ACPI table [%s]",
                            signature));
                return;
        }

        /* Map just the table header */

        table = acpi_os_map_memory(address, sizeof(struct acpi_table_header));
        if (!table) {
                ACPI_ERROR((AE_INFO,
                            "Could not map memory for table [%s] at %p",
                            signature, ACPI_CAST_PTR(void, address)));
                return;
        }

        /* If a particular signature is expected (DSDT/FACS), it must match */

        if (signature && !ACPI_COMPARE_NAME(table->signature, signature)) {
                ACPI_BIOS_ERROR((AE_INFO,
                                 "Invalid signature 0x%X for ACPI table, expected [%s]",
                                 *ACPI_CAST_PTR(u32, table->signature),
                                 signature));
                goto unmap_and_exit;
        }

        /*
         * Initialize the table entry. Set the pointer to NULL, since the
         * table is not fully mapped at this time.
         */
        table_desc = &acpi_gbl_root_table_list.tables[table_index];

        table_desc->address = address;
        table_desc->pointer = NULL;
        table_desc->length = table->length;
        table_desc->flags = ACPI_TABLE_ORIGIN_MAPPED;
        ACPI_MOVE_32_TO_32(table_desc->signature.ascii, table->signature);

You should delete the following code:

        /*
         * ACPI Table Override:
         *
         * Before we install the table, let the host OS override it with a new
         * one if desired. Any table within the RSDT/XSDT can be replaced,
         * including the DSDT which is pointed to by the FADT.
         *
         * NOTE: If the table is overridden, then final_table will contain a
         * mapped pointer to the full new table. If the table is not overridden,
         * or if there has been a physical override, then the table will be
         * fully mapped later (in verify table). In any case, we must
         * unmap the header that was mapped above.
         */
        final_table = acpi_tb_table_override(table, table_desc);
        if (!final_table) {
                final_table = table;    /* There was no override */
        }

You still need to keep the following logic.

        acpi_tb_print_table_header(table_desc->address, final_table);

        /* Set the global integer width (based upon revision of the DSDT) */

        if (table_index == ACPI_TABLE_INDEX_DSDT) {
                acpi_ut_set_integer_width(final_table->revision);
        }

You should delete the following code:

        /*
         * If we have a physical override during this early loading of the ACPI
         * tables, unmap the table for now. It will be mapped again later when
         * it is actually used. This supports very early loading of ACPI tables,
         * before virtual memory is fully initialized and running within the
         * host OS. Note: A logical override has the ACPI_TABLE_ORIGIN_OVERRIDE
         * flag set and will not be deleted below.
         */
        if (final_table != table) {
                acpi_tb_delete_table(table_desc);
        }

Keep the following.

      unmap_and_exit:

        /* Always unmap the table header that we mapped above */

        acpi_os_unmap_memory(table, sizeof(struct acpi_table_header));
}
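
The split described in this mail can be sketched in plain userspace C. This is only an illustration under stated assumptions: `table_desc`, `override_entry`, `root_table_list`, `install_table()` and `override_tables()` are hypothetical stand-ins, not ACPICA definitions. Only the idea (install firmware tables first, apply overrides in a separate later pass, as proposed for acpi_override_tables()) comes from the text above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an ACPICA table descriptor. */
struct table_desc {
	char signature[4];	/* ACPI signatures are exactly 4 bytes */
	const void *data;	/* table currently in use */
	int overridden;		/* set once the host OS replaced the table */
};

/* Hypothetical replacement table offered by the host OS (e.g. from initrd). */
struct override_entry {
	const char *signature;	/* assumed to be at least 4 bytes long */
	const void *data;
};

#define MAX_TABLES 8

struct table_desc root_table_list[MAX_TABLES];
size_t table_count;

/* Step 1: record a firmware-provided table; no override is attempted here. */
int install_table(const char *signature, const void *firmware_data)
{
	struct table_desc *desc;

	if (!firmware_data || table_count == MAX_TABLES)
		return -1;

	desc = &root_table_list[table_count++];
	memcpy(desc->signature, signature, 4);
	desc->data = firmware_data;
	desc->overridden = 0;
	return 0;
}

/* Step 2: walk the installed tables and apply overrides in a later pass.
 * Returns the number of tables that were replaced. */
size_t override_tables(const struct override_entry *overrides, size_t count)
{
	size_t i, j, replaced = 0;

	for (i = 0; i < table_count; i++) {
		for (j = 0; j < count; j++) {
			if (memcmp(root_table_list[i].signature,
				   overrides[j].signature, 4) != 0)
				continue;
			root_table_list[i].data = overrides[j].data;
			root_table_list[i].overridden = 1;
			replaced++;
			break;
		}
	}
	return replaced;
}
```

In this sketch, anything that must run between the two steps (such as reading SRAT) would be called after the install_table() loop and before override_tables().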

I'm not sure whether this makes my concerns any clearer for you.

Thanks and best regards
-Lv

> 
> Would you please explain more about your comment? I think maybe I missed
> something important to you guys. :)
> 
> And all the other ACPICA rules will be followed in the next version.
> 
> Thanks.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v2 RESEND 05/18] x86, ACPICA: Split acpi_boot_table_init() into two parts.
  2013-08-02 13:00     ` Rafael J. Wysocki
@ 2013-08-05  3:21       ` Tang Chen
  -1 siblings, 0 replies; 70+ messages in thread
From: Tang Chen @ 2013-08-05  3:21 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: robert.moore, lv.zheng, lenb, tglx, mingo, hpa, akpm, tj, trenn,
	yinghai, jiang.liu, wency, laijs, isimatu.yasuaki, izumi.taku,
	mgorman, minchan, mina86, gong.chen, vasilis.liaskovitis,
	lwoodman, riel, jweiner, prarit, zhangyanfei, yanghy, x86,
	linux-doc, linux-kernel, linux-mm, linux-acpi

Hi Rafael,

On 08/02/2013 09:00 PM, Rafael J. Wysocki wrote:
......
>> This patch splits acpi_boot_table_init() into two steps:
>> 1. Parse RSDT, which cannot be overridden, and initialize
>>     acpi_gbl_root_table_list. (step 1 + 2 above)
>> 2. Install all ACPI tables into acpi_gbl_root_table_list.
>>     (step 3 + 4 above)
>>
>> In later patches, we will do step 1 + 2 earlier.
>
> Please note that Linux is not the only user of the code you're modifying, so
> you need to make it possible to use the existing functions.
>
> In particular, acpi_tb_parse_root_table() can't be modified the way you did it,
> because that would require all of the users of ACPICA to be modified.

OK, I understand. Then how about acpi_tb_install_table()?

acpi_tb_install_table() is also an ACPICA API. But can we split the
acpi_initrd_table_override part out, as follows?

1. Initialize acpi_gbl_root_table_list earlier, and install all tables
   provided by firmware.
2. Find SRAT in initrd. If there is no overriding SRAT, get the SRAT from
   acpi_gbl_root_table_list directly, and mark hotpluggable memory. (This is
   the job I want to do.)
3. Do the acpi_initrd_table_override job.

In the end it will work like the current kernel. The only difference is the
order: before the patch-set, the kernel tries the override first and then
installs the firmware tables; after the patch-set, it installs the firmware
tables first and then does the override.
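
Step 2 of this plan (prefer an SRAT supplied via initrd, otherwise fall back to the firmware copy) can be sketched as follows. This is a userspace illustration only: `table_ref`, `find_table()` and `pick_srat()` are hypothetical names, and the arrays stand in for the initrd table set and acpi_gbl_root_table_list.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal table record; real ACPI headers carry far more fields. */
struct table_ref {
	char signature[4];	/* 4 bytes, not NUL-terminated */
	const void *body;
};

/* Return the first table whose signature matches sig, or NULL if none does. */
const struct table_ref *find_table(const struct table_ref *list, size_t count,
				   const char *sig)
{
	size_t i;

	for (i = 0; i < count; i++)
		if (memcmp(list[i].signature, sig, 4) == 0)
			return &list[i];
	return NULL;
}

/* Prefer an SRAT from initrd; otherwise use the firmware-provided one. */
const struct table_ref *pick_srat(const struct table_ref *initrd, size_t n_initrd,
				  const struct table_ref *firmware, size_t n_firmware)
{
	const struct table_ref *srat = find_table(initrd, n_initrd, "SRAT");

	return srat ? srat : find_table(firmware, n_firmware, "SRAT");
}
```

The caller would then parse the chosen SRAT to mark hotpluggable ranges, and only afterwards run the regular initrd override pass.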

Thanks.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v2 RESEND 13/18] x86, numa, mem_hotplug: Skip all the regions the kernel resides in.
  2013-08-02  9:14   ` Tang Chen
@ 2013-08-05  6:22     ` Tang Chen
  -1 siblings, 0 replies; 70+ messages in thread
From: Tang Chen @ 2013-08-05  6:22 UTC (permalink / raw)
  To: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, tj,
	trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

Hi tj,

I have resent the v2 patch-set. Would you please give some more
comments about the memblock and x86 booting code modification?

I'm also discussing the ACPI-side implementation with the ACPICA guys.
I hope we can make 3.12 this time.

Thanks.

On 08/02/2013 05:14 PM, Tang Chen wrote:
> At early boot, memblock reserves some memory for the kernel,
> such as the kernel code and data segments, the initrd file, and so on,
> which means the kernel resides in these memory regions.
>
> Even if these memory regions are hotpluggable, we should not
> mark them as hotpluggable. Otherwise the kernel won't have enough
> memory to boot.
>
> This patch finds out which memory regions the kernel resides in,
> and skips them when finding all hotpluggable memory regions.
>
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> ---
>   mm/memory_hotplug.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
>   1 files changed, 45 insertions(+), 0 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index ef9ccf8..10a30ef 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -31,6 +31,7 @@
>   #include <linux/firmware-map.h>
>   #include <linux/stop_machine.h>
>   #include <linux/acpi.h>
> +#include <linux/memblock.h>
>
>   #include <asm/tlbflush.h>
>
> @@ -93,6 +94,40 @@ static void release_memory_resource(struct resource *res)
>
>   #ifdef CONFIG_ACPI_NUMA
>   /**
> + * kernel_resides_in_region - Check if kernel resides in a memory region.
> + * @base: The base address of the memory region.
> + * @length: The length of the memory region.
> + *
> + * This function is used at early time. It iterates memblock.reserved and checks
> + * if the kernel has used any memory in [@base, @base + @length).
> + *
> + * Return true if the kernel resides in the memory region, false otherwise.
> + */
> +static bool __init kernel_resides_in_region(phys_addr_t base, u64 length)
> +{
> +	int i;
> +	phys_addr_t start, end;
> +	struct memblock_region *region;
> +	struct memblock_type *reserved = &memblock.reserved;
> +
> +	for (i = 0; i < reserved->cnt; i++) {
> +		region = &reserved->regions[i];
> +
> +		if (region->flags != MEMBLOCK_HOTPLUG)
> +			continue;
> +
> +		start = region->base;
> +		end = region->base + region->size;
> +		if (end <= base || start >= base + length)
> +			continue;
> +
> +		return true;
> +	}
> +
> +	return false;
> +}
> +
> +/**
>    * find_hotpluggable_memory - Find out hotpluggable memory from ACPI SRAT.
>    *
>    * This function did the following:
> @@ -129,6 +164,16 @@ void __init find_hotpluggable_memory(void)
>
>   	while (ACPI_SUCCESS(acpi_hotplug_mem_affinity(srat_vaddr, &base,
>   						&size, &offset))) {
> +		/*
> +		 * At early time, memblock will reserve some memory for the
> +		 * kernel, such as the kernel code and data segments, initrd
> +		 * file, and so on, which means the kernel resides in these
> +		 * memory regions. These regions should not be hotpluggable.
> +		 * So do not mark them as hotpluggable.
> +		 */
> +		if (kernel_resides_in_region(base, size))
> +			continue;
> +
>   		/* Will mark hotpluggable memory regions here */
>   	}
>
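
The core of the quoted check is a half-open range overlap test. A minimal userspace sketch of just that test follows. `struct region`, `ranges_overlap()` and `any_reserved_in_range()` are hypothetical names; the sketch leaves out the region->flags filter from the patch and assumes base + length does not overflow.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a reserved region covering [base, base + size). */
struct region {
	uint64_t base;
	uint64_t size;
};

/* Two half-open ranges overlap unless one ends at or before the other starts. */
bool ranges_overlap(uint64_t a_base, uint64_t a_len,
		    uint64_t b_base, uint64_t b_len)
{
	return a_base < b_base + b_len && b_base < a_base + a_len;
}

/* True if any reserved region overlaps [base, base + length). */
bool any_reserved_in_range(const struct region *reserved, size_t count,
			   uint64_t base, uint64_t length)
{
	size_t i;

	for (i = 0; i < count; i++)
		if (ranges_overlap(reserved[i].base, reserved[i].size,
				   base, length))
			return true;
	return false;
}
```

Because the ranges are half-open, a candidate range that merely touches a reserved region at its end address does not count as overlapping, which matches the `end <= base || start >= base + length` test in the patch.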

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v2 RESEND 05/18] x86, ACPICA: Split acpi_boot_table_init() into two parts.
  2013-08-05 13:26         ` Rafael J. Wysocki
@ 2013-08-05 13:23           ` Tang Chen
  -1 siblings, 0 replies; 70+ messages in thread
From: Tang Chen @ 2013-08-05 13:23 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: robert.moore, lv.zheng, lenb, tglx, mingo, hpa, akpm, tj, trenn,
	yinghai, jiang.liu, wency, laijs, isimatu.yasuaki, izumi.taku,
	mgorman, minchan, mina86, gong.chen, vasilis.liaskovitis,
	lwoodman, riel, jweiner, prarit, zhangyanfei, yanghy, x86,
	linux-doc, linux-kernel, linux-mm, linux-acpi

Hi Rafael,

On 08/05/2013 09:26 PM, Rafael J. Wysocki wrote:
......
>
> I think I understand what you're trying to achieve and I don't have objections
> against the goal, but the matter is *how* to do that.
>
> Why don't you do something like this:
> (1) Introduce two new functions that will each do part of
>      acpi_tb_parse_root_table() such that calling them in sequence, one right
>      after the other, will be exactly equivalent to the current
>      acpi_tb_parse_root_table().
> (2) Redefine acpi_tb_parse_root_table() as a wrapper calling those two new
>      functions one right after the other.
> (3) Make Linux use the two new functions directly instead of calling
>      acpi_tb_parse_root_table()?
>
> Then, Linux will use your new functions and won't call acpi_tb_parse_root_table()
> at all, but the other existing users of ACPICA may still call it without any
> modifications.
>
> Does this make sense to you?

Thank you for your advice. It does make sense. I'll try your idea.

Thanks.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v2 RESEND 05/18] x86, ACPICA: Split acpi_boot_table_init() into two parts.
  2013-08-05  3:21       ` Tang Chen
@ 2013-08-05 13:26         ` Rafael J. Wysocki
  -1 siblings, 0 replies; 70+ messages in thread
From: Rafael J. Wysocki @ 2013-08-05 13:26 UTC (permalink / raw)
  To: Tang Chen
  Cc: robert.moore, lv.zheng, lenb, tglx, mingo, hpa, akpm, tj, trenn,
	yinghai, jiang.liu, wency, laijs, isimatu.yasuaki, izumi.taku,
	mgorman, minchan, mina86, gong.chen, vasilis.liaskovitis,
	lwoodman, riel, jweiner, prarit, zhangyanfei, yanghy, x86,
	linux-doc, linux-kernel, linux-mm, linux-acpi

On Monday, August 05, 2013 11:21:51 AM Tang Chen wrote:
> Hi Rafael,
> 
> On 08/02/2013 09:00 PM, Rafael J. Wysocki wrote:
> ......
> >> This patch splits acpi_boot_table_init() into two steps:
> >> 1. Parse RSDT, which cannot be overridden, and initialize
> >>     acpi_gbl_root_table_list. (step 1 + 2 above)
> >> 2. Install all ACPI tables into acpi_gbl_root_table_list.
> >>     (step 3 + 4 above)
> >>
> >> In later patches, we will do step 1 + 2 earlier.
> >
> > Please note that Linux is not the only user of the code you're modifying, so
> > you need to make it possible to use the existing functions.
> >
> > In particular, acpi_tb_parse_root_table() can't be modified the way you did it,
> > because that would require all of the users of ACPICA to be modified.
> 
> OK, I understand. Then how about acpi_tb_install_table()?
> 
> acpi_tb_install_table() is also an ACPICA API. But can we split the
> acpi_initrd_table_override part out, as follows?

I'm not sure what you mean.  acpi_tb_install_table() doesn't call
acpi_initrd_table_override() directly.

Do you want to split the acpi_tb_table_override() call out of it?

I'm afraid that still wouldn't be OK.

> 1. Initialize acpi_gbl_root_table_list earlier, and install all tables
>    provided by firmware.
> 2. Find SRAT in initrd. If there is no overriding SRAT, get the SRAT from
>    acpi_gbl_root_table_list directly, and mark hotpluggable memory. (This is
>    the job I want to do.)
> 3. Do the acpi_initrd_table_override job.
> 
> In the end it will work like the current kernel. The only difference is the
> order: before the patch-set, the kernel tries the override first and then
> installs the firmware tables; after the patch-set, it installs the firmware
> tables first and then does the override.

I think I understand what you're trying to achieve and I don't have objections
against the goal, but the matter is *how* to do that.

Why don't you do something like this:
(1) Introduce two new functions that will each do part of
    acpi_tb_parse_root_table() such that calling them in sequence, one right
    after the other, will be exactly equivalent to the current
    acpi_tb_parse_root_table().
(2) Redefine acpi_tb_parse_root_table() as a wrapper calling those two new
    functions one right after the other.
(3) Make Linux use the two new functions directly instead of calling
    acpi_tb_parse_root_table()?

Then, Linux will use your new functions and won't call acpi_tb_parse_root_table()
at all, but the other existing users of ACPICA may still call it without any
modifications.
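
As a rough illustration of steps (1) to (3), here is a userspace sketch of the wrapper pattern. All names (`status_t`, `parse_root_table_early()`, `install_root_tables()`, `parse_root_table()`) and the counters are hypothetical placeholders, not ACPICA symbols; the counters merely stand in for the real side effects.

```c
#include <stdint.h>

typedef int status_t;	/* hypothetical status code: 0 means success */

/* Hypothetical counters standing in for the real side effects. */
int parsed_roots;
int installed_tables;

/* New function 1: first half of the old routine (parse the root table). */
status_t parse_root_table_early(uint64_t rsdp_address)
{
	if (!rsdp_address)
		return -1;
	parsed_roots++;
	return 0;
}

/* New function 2: second half (install the tables found by function 1). */
status_t install_root_tables(void)
{
	if (!parsed_roots)
		return -1;
	installed_tables++;
	return 0;
}

/* Existing entry point, now a thin wrapper: unmodified callers get the
 * same behaviour as before the split. */
status_t parse_root_table(uint64_t rsdp_address)
{
	status_t status = parse_root_table_early(rsdp_address);

	if (status)
		return status;
	return install_root_tables();
}
```

A caller that needs to do work between the two halves calls the two new functions itself, while every other caller keeps using the wrapper.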

Does this make sense to you?

Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v2 RESEND 07/18] x86, ACPI: Also initialize signature and length when parsing root table.
  2013-08-05  1:33       ` Tang Chen
@ 2013-08-05 13:28         ` Rafael J. Wysocki
  -1 siblings, 0 replies; 70+ messages in thread
From: Rafael J. Wysocki @ 2013-08-05 13:28 UTC (permalink / raw)
  To: Tang Chen
  Cc: robert.moore, lv.zheng, lenb, tglx, mingo, hpa, akpm, tj, trenn,
	yinghai, jiang.liu, wency, laijs, isimatu.yasuaki, izumi.taku,
	mgorman, minchan, mina86, gong.chen, vasilis.liaskovitis,
	lwoodman, riel, jweiner, prarit, zhangyanfei, yanghy, x86,
	linux-doc, linux-kernel, linux-mm, linux-acpi

On Monday, August 05, 2013 09:33:32 AM Tang Chen wrote:
> Hi Rafael,
> 
> On 08/02/2013 09:03 PM, Rafael J. Wysocki wrote:
> > On Friday, August 02, 2013 05:14:26 PM Tang Chen wrote:
> >> Besides the phys addr of the acpi tables, it will be very convenient if
> >> we also have the signature of each table in acpi_gbl_root_table_list at
> >> early time. We can find SRAT easily by comparing the signature.
> >>
> >> This patch also records signature and some other info in
> >> acpi_gbl_root_table_list at early time.
> >>
> >> Signed-off-by: Tang Chen<tangchen@cn.fujitsu.com>
> >> Reviewed-by: Zhang Yanfei<zhangyanfei@cn.fujitsu.com>
> >
> > The subject is misleading, as the change is in ACPICA and therefore affects not
> > only x86.
> 
> OK, will change it.
> 
> >
> > Also I think the same comments as for the other ACPICA patch in this series
> > apply: You shouldn't modify acpi_tb_parse_root_table() in ways that would
> > require the other OSes using ACPICA to be modified.
> >
> 
> Thank you for the reminder. Please refer to the attachment.
> What do you think of the idea from Zheng?

It's doable and, quite frankly, if the ACPICA maintainers are happy, I'm fine
with that too.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v2 RESEND 13/18] x86, numa, mem_hotplug: Skip all the regions the kernel resides in.
  2013-08-05  6:22     ` Tang Chen
@ 2013-08-05 14:52       ` Tejun Heo
  -1 siblings, 0 replies; 70+ messages in thread
From: Tejun Heo @ 2013-08-05 14:52 UTC (permalink / raw)
  To: Tang Chen
  Cc: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, trenn,
	yinghai, jiang.liu, wency, laijs, isimatu.yasuaki, izumi.taku,
	mgorman, minchan, mina86, gong.chen, vasilis.liaskovitis,
	lwoodman, riel, jweiner, prarit, zhangyanfei, yanghy, x86,
	linux-doc, linux-kernel, linux-mm, linux-acpi

On Mon, Aug 05, 2013 at 02:22:47PM +0800, Tang Chen wrote:
> I have resent the v2 patch-set. Would you please give some more
> comments about the memblock and x86 booting code modification?

Patch 13 still seems corrupt.  Is it a problem on my side maybe?
Nope, gmane raw message is corrupt too.

 http://article.gmane.org/gmane.linux.kernel.mm/104549/raw

Can you please verify your mail setup?  It's not very nice to repeat
the same problem.

-- 
tejun

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v2 RESEND 13/18] x86, numa, mem_hotplug: Skip all the regions the kernel resides in.
  2013-08-05 14:52       ` Tejun Heo
@ 2013-08-05 15:12         ` Zhang Yanfei
  -1 siblings, 0 replies; 70+ messages in thread
From: Zhang Yanfei @ 2013-08-05 15:12 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Tang Chen, robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa,
	akpm, trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy, x86, linux-doc, linux-kernel, linux-mm,
	linux-acpi

Hi tj,

On 08/05/2013 10:52 PM, Tejun Heo wrote:
> On Mon, Aug 05, 2013 at 02:22:47PM +0800, Tang Chen wrote:
>> I have resent the v2 patch-set. Would you please give some more
>> comments about the memblock and x86 booting code modification ?
> 
> Patch 13 still seems corrupt.  Is it a problem on my side maybe?
> Nope, gmane raw message is corrupt too.
> 
>  http://article.gmane.org/gmane.linux.kernel.mm/104549/raw
> 
> Can you please verify your mail setup?  It's not very nice to repeat
> the same problem.
> 

Sorry for the formatting problem again. Our mail client may indeed have a
problem. We will check tomorrow at the office, since it is night here now.

Could you please also review the other memblock and bootstrap related
patches, so that we can discuss them with you and come to an agreement as
soon as possible?

Thanks in advance!

-- 
Thanks.
Zhang Yanfei

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v2 RESEND 13/18] x86, numa, mem_hotplug: Skip all the regions the kernel resides in.
  2013-08-05 14:52       ` Tejun Heo
@ 2013-08-06  2:29         ` Tang Chen
  -1 siblings, 0 replies; 70+ messages in thread
From: Tang Chen @ 2013-08-06  2:29 UTC (permalink / raw)
  To: Tejun Heo
  Cc: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, trenn,
	yinghai, jiang.liu, wency, laijs, isimatu.yasuaki, izumi.taku,
	mgorman, minchan, mina86, gong.chen, vasilis.liaskovitis,
	lwoodman, riel, jweiner, prarit, zhangyanfei, yanghy, x86,
	linux-doc, linux-kernel, linux-mm, linux-acpi

On 08/05/2013 10:52 PM, Tejun Heo wrote:
> On Mon, Aug 05, 2013 at 02:22:47PM +0800, Tang Chen wrote:
>> I have resent the v2 patch-set. Would you please give some more
>> comments about the memblock and x86 booting code modification ?
>
> Patch 13 still seems corrupt.  Is it a problem on my side maybe?
> Nope, gmane raw message is corrupt too.
>
>   http://article.gmane.org/gmane.linux.kernel.mm/104549/raw
>
> Can you please verify your mail setup?  It's not very nice to repeat
> the same problem.

Hi tj,

I'm sorry, but the copy on lkml looks OK. The patch was formatted
by git and sent by git send-email.

   https://lkml.org/lkml/2013/8/2/135

I'll redo and resend this patch.

Thanks.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v2 RESEND 13/18] x86, numa, mem_hotplug: Skip all the regions the kernel resides in.
  2013-08-05 14:52       ` Tejun Heo
@ 2013-08-06  2:50         ` Tang Chen
  -1 siblings, 0 replies; 70+ messages in thread
From: Tang Chen @ 2013-08-06  2:50 UTC (permalink / raw)
  To: tj; +Cc: linux-kernel, linux-mm

Early in boot, memblock reserves some memory for the kernel,
such as the kernel code and data segments, the initrd file, and so on,
which means the kernel resides in these memory regions.

Even if these memory regions are hotpluggable, we should not
mark them as hotpluggable. Otherwise the kernel won't have enough
memory to boot.

This patch finds out which memory regions the kernel resides in,
and skips them when finding all hotpluggable memory regions.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 mm/memory_hotplug.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 45 insertions(+), 0 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index ef9ccf8..10a30ef 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -31,6 +31,7 @@
 #include <linux/firmware-map.h>
 #include <linux/stop_machine.h>
 #include <linux/acpi.h>
+#include <linux/memblock.h>
 
 #include <asm/tlbflush.h>
 
@@ -93,6 +94,40 @@ static void release_memory_resource(struct resource *res)
 
 #ifdef CONFIG_ACPI_NUMA
 /**
+ * kernel_resides_in_region - Check if kernel resides in a memory region.
+ * @base: The base address of the memory region.
+ * @length: The length of the memory region.
+ *
+ * This function is used early in boot. It iterates memblock.reserved and
+ * checks if the kernel has used any memory in [@base, @base + @length).
+ *
+ * Return true if the kernel resides in the memory region, false otherwise.
+ */
+static bool __init kernel_resides_in_region(phys_addr_t base, u64 length)
+{
+	int i;
+	phys_addr_t start, end;
+	struct memblock_region *region;
+	struct memblock_type *reserved = &memblock.reserved;
+
+	for (i = 0; i < reserved->cnt; i++) {
+		region = &reserved->regions[i];
+
+		if (region->flags != MEMBLOCK_HOTPLUG)
+			continue;
+
+		start = region->base;
+		end = region->base + region->size;
+		if (end <= base || start >= base + length)
+			continue;
+
+		return true;
+	}
+
+	return false;
+}
+
+/**
  * find_hotpluggable_memory - Find out hotpluggable memory from ACPI SRAT.
  *
  * This function did the following:
@@ -129,6 +164,16 @@ void __init find_hotpluggable_memory(void)
 
 	while (ACPI_SUCCESS(acpi_hotplug_mem_affinity(srat_vaddr, &base,
 						      &size, &offset))) {
+		/*
+		 * Early in boot, memblock will reserve some memory for the
+		 * kernel, such as the kernel code and data segments, initrd
+		 * file, and so on, which means the kernel resides in these
+		 * memory regions. These regions should not be hotpluggable.
+		 * So do not mark them as hotpluggable.
+		 */
+		if (kernel_resides_in_region(base, size))
+			continue;
+
 		/* Will mark hotpluggable memory regions here */
 	}
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread
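
The overlap test in kernel_resides_in_region() above treats both the
reserved region and the SRAT range as half-open intervals. A minimal
user-space sketch of that comparison follows. It assumes a hypothetical
struct region and the helper names ranges_overlap() and
any_reserved_in_range(), none of which are part of the patch; it leaves
out the region->flags filter and assumes base + length does not wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a memblock region covering [base, base + size). */
struct region {
	uint64_t base;
	uint64_t size;
};

/*
 * Two half-open ranges are disjoint when one ends at or before the start
 * of the other; otherwise they overlap. Assumes base + len does not wrap.
 */
static bool ranges_overlap(uint64_t a_base, uint64_t a_len,
			   uint64_t b_base, uint64_t b_len)
{
	uint64_t a_end = a_base + a_len;
	uint64_t b_end = b_base + b_len;

	return !(a_end <= b_base || a_base >= b_end);
}

/* Return true if any reserved region intersects [base, base + length). */
static bool any_reserved_in_range(const struct region *reserved, size_t cnt,
				  uint64_t base, uint64_t length)
{
	size_t i;

	assert(reserved != NULL || cnt == 0);

	for (i = 0; i < cnt; i++) {
		if (ranges_overlap(reserved[i].base, reserved[i].size,
				   base, length))
			return true;
	}

	return false;
}
```

With half-open intervals, a range that ends exactly where a reserved
region begins does not count as overlapping, which matches the
"end <= base || start >= base + length" test in the patch.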

* Re: [PATCH v2 RESEND 13/18] x86, numa, mem_hotplug: Skip all the regions the kernel resides in.
  2013-08-06  2:29         ` Tang Chen
@ 2013-08-06 15:10           ` Tejun Heo
  -1 siblings, 0 replies; 70+ messages in thread
From: Tejun Heo @ 2013-08-06 15:10 UTC (permalink / raw)
  To: Tang Chen
  Cc: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, trenn,
	yinghai, jiang.liu, wency, laijs, isimatu.yasuaki, izumi.taku,
	mgorman, minchan, mina86, gong.chen, vasilis.liaskovitis,
	lwoodman, riel, jweiner, prarit, zhangyanfei, yanghy, x86,
	linux-doc, linux-kernel, linux-mm, linux-acpi

On Tue, Aug 06, 2013 at 10:29:16AM +0800, Tang Chen wrote:
> I'm sorry, but the copy on lkml looks OK. The patch was formatted
> by git and sent by git send-email.
> 
>   https://lkml.org/lkml/2013/8/2/135

Yeah, I checked that too but I think lkml.org is doing the QP
decoding.  The raw link from gmane shows the raw message received so
I'm relatively sure that something on the sending side is doing QP
encoding as the mail travels out.  Can you please try sending it via
gmail?  gmail does smtp and you can set the sender address to whatever
you want too as long as you can receive messages on that address.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 70+ messages in thread
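
For context on the QP (quoted-printable) issue discussed above: in that
encoding, "=" followed by two hex digits stands for one byte, and "=" at
the end of a line is a soft line break. A patch that travels in this form
shows "=3D" wherever the source had "=", so it no longer applies unless it
is decoded first. The sketch below is a minimal decoder under those two
rules only. The names hex_value() and qp_decode() are illustrative and do
not come from any mail tool mentioned in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Return the value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/*
 * Decode the NUL-terminated quoted-printable text in @in into @out.
 * @out must have room for strlen(in) + 1 bytes. Returns the number of
 * bytes written, not counting the terminating NUL.
 */
static size_t qp_decode(const char *in, char *out)
{
	size_t n = 0;

	assert(in != NULL && out != NULL);

	while (*in) {
		if (in[0] == '=' && in[1] == '\n') {
			in += 2;	/* soft line break: drop "=\n" */
		} else if (in[0] == '=' && in[1] == '\r' && in[2] == '\n') {
			in += 3;	/* soft line break: drop "=\r\n" */
		} else if (in[0] == '=' && hex_value(in[1]) >= 0 &&
			   hex_value(in[2]) >= 0) {
			/* "=XY" encodes the single byte 0xXY */
			out[n++] = (char)(hex_value(in[1]) * 16 +
					  hex_value(in[2]));
			in += 3;
		} else {
			out[n++] = *in++;	/* literal byte */
		}
	}
	out[n] = '\0';

	return n;
}
```

Decoding a line such as "region->flags !=3D MEMBLOCK_HOTPLUG" restores the
"!=" operator, which is the kind of difference that separates a raw
archive copy from one shown by a web front end that decodes QP for display.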

* Re: [PATCH v2 RESEND 04/18] acpi: Introduce acpi_verify_initrd() to check if a table is invalid.
  2013-08-02  9:14   ` Tang Chen
@ 2013-08-06 23:02     ` Toshi Kani
  -1 siblings, 0 replies; 70+ messages in thread
From: Toshi Kani @ 2013-08-06 23:02 UTC (permalink / raw)
  To: Tang Chen
  Cc: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, tj,
	trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy, x86, linux-doc, linux-kernel, linux-mm,
	linux-acpi

On Fri, 2013-08-02 at 17:14 +0800, Tang Chen wrote:
> In acpi_initrd_override(), it checks several things to ensure the
> table it found is valid. In later patches, we need to do these check
> somewhere else. So this patch introduces a common function
> acpi_verify_initrd() to do all these checks, and reuse it in different

Typo: acpi_verify_initrd() -> acpi_verify_table()

-Toshi


> places. The function will be used in the subsequent patches.
> 
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> Acked-by: Toshi Kani <toshi.kani@hp.com>
> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/acpi/osl.c |   86 +++++++++++++++++++++++++++++++++++++---------------
>  1 files changed, 61 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index 3b8bab2..0043e9f 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -572,9 +572,68 @@ static const char * const table_sigs[] = {
>  /* Must not increase 10 or needs code modification below */
>  #define ACPI_OVERRIDE_TABLES 10
>  
> +/*******************************************************************************
> + *
> + * FUNCTION:    acpi_verify_table
> + *
> + * PARAMETERS:  File               - The initrd file
> + *              Path               - Path to acpi overriding tables in cpio file
> + *              Signature          - Signature of the table
> + *
> + * RETURN:      0 if it passes all the checks, -EINVAL if any check fails.
> + *
> + * DESCRIPTION: Check if an acpi table found in initrd is invalid.
> + *              @signature can be NULL. If it is NULL, the function will check
> + *              if the table signature matches any signature in table_sigs[].
> + *
> + ******************************************************************************/
> +int __init acpi_verify_table(struct cpio_data *file,
 :


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v2 RESEND 11/18] x86, acpi: Try to find SRAT in firmware earlier.
  2013-08-02  9:14   ` Tang Chen
@ 2013-08-06 23:33     ` Toshi Kani
  -1 siblings, 0 replies; 70+ messages in thread
From: Toshi Kani @ 2013-08-06 23:33 UTC (permalink / raw)
  To: Tang Chen
  Cc: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, tj,
	trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy, x86, linux-doc, linux-kernel, linux-mm,
	linux-acpi

On Fri, 2013-08-02 at 17:14 +0800, Tang Chen wrote:
> This patch introduce early_acpi_firmware_srat() to find the
> phys addr of SRAT provided by firmware. And call it in
> find_hotpluggable_memory().
> 
> Since we have initialized acpi_gbl_root_table_list earlier,
> and store all the tables' phys addrs and signatures in it,
> it is easy to find the SRAT.
> 
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> ---
>  drivers/acpi/acpica/tbxface.c |   32 ++++++++++++++++++++++++++++++++
>  drivers/acpi/osl.c            |   22 ++++++++++++++++++++++
>  include/acpi/acpixf.h         |    4 ++++
>  include/linux/acpi.h          |    4 ++++
>  mm/memory_hotplug.c           |    8 ++++++--
>  5 files changed, 68 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/acpi/acpica/tbxface.c b/drivers/acpi/acpica/tbxface.c

Please add "ACPICA" to the patch title.  This patch also needs to be
reviewed by ACPICA folks.

> index ad11162..6a92f12 100644
> --- a/drivers/acpi/acpica/tbxface.c
> +++ b/drivers/acpi/acpica/tbxface.c

 :

> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index dcbca3e..ec490fe 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -53,6 +53,7 @@
>  #include <acpi/acpi.h>
>  #include <acpi/acpi_bus.h>
>  #include <acpi/processor.h>
> +#include <acpi/acpixf.h>
>  
>  #define _COMPONENT		ACPI_OS_SERVICES
>  ACPI_MODULE_NAME("osl");
> @@ -760,6 +761,27 @@ void __init acpi_initrd_override(void *data, size_t size)
>  }
>  #endif /* CONFIG_ACPI_INITRD_TABLE_OVERRIDE */
>  
> +#ifdef CONFIG_ACPI_NUMA
> +/*******************************************************************************
> + *
> + * FUNCTION:    early_acpi_firmware_srat
> + *
> + * RETURN:      Phys addr of SRAT on success, 0 on error.
> + *
> + * DESCRIPTION: Get the phys addr of SRAT provided by firmware.
> + *
> + ******************************************************************************/
> +phys_addr_t __init early_acpi_firmware_srat(void)
> +{
> +	struct acpi_table_desc table_desc;
> +
> +	if (acpi_get_table_desc(ACPI_SIG_SRAT, &table_desc))

This check should use ACPI_FAILURE() macro:

  if (ACPI_FAILURE(acpi_get_table_desc(ACPI_SIG_SRAT, &table_desc)))

Thanks,
-Toshi



^ permalink raw reply	[flat|nested] 70+ messages in thread
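
The review comment above asks for the ACPICA status convention, where a
status of AE_OK (zero) means success and any other value is a failure.
The user-space model below sketches that convention. The macro
definitions are assumed to mirror the ones in include/acpi/acexcep.h, the
status values are illustrative, and model_get_table_addr() and
model_find_table() are hypothetical stand-ins, not the functions from the
patch. The address returned is made up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of the ACPICA status type and macros (assumed, see lead-in). */
typedef uint32_t acpi_status;

#define AE_OK		((acpi_status)0x0000)
#define AE_NOT_FOUND	((acpi_status)0x0005)

#define ACPI_SUCCESS(a)	(!(a))
#define ACPI_FAILURE(a)	(a)

/* Hypothetical lookup: only "SRAT" is known, at a made-up address. */
static acpi_status model_get_table_addr(const char *sig, uint64_t *addr)
{
	assert(sig != NULL && addr != NULL);

	if (strcmp(sig, "SRAT") != 0)
		return AE_NOT_FOUND;

	*addr = 0x7fe00000ULL;	/* illustrative value only */
	return AE_OK;
}

/* Return the table address on success, 0 on error. */
static uint64_t model_find_table(const char *sig)
{
	uint64_t addr;

	/* The macro names the intent; a bare "if (status)" hides it. */
	if (ACPI_FAILURE(model_get_table_addr(sig, &addr)))
		return 0;

	return addr;
}
```

Under these definitions the bare check and the macro check behave the
same, so the change requested in the review is about readability and
consistency with other ACPI code rather than behavior.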

* Re: [PATCH v2 RESEND 11/18] x86, acpi: Try to find SRAT in firmware earlier.
  2013-08-06 23:33     ` Toshi Kani
@ 2013-08-07  1:37       ` Tang Chen
  -1 siblings, 0 replies; 70+ messages in thread
From: Tang Chen @ 2013-08-07  1:37 UTC (permalink / raw)
  To: Toshi Kani
  Cc: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, tj,
	trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy, x86, linux-doc, linux-kernel, linux-mm,
	linux-acpi

On 08/07/2013 07:33 AM, Toshi Kani wrote:
> On Fri, 2013-08-02 at 17:14 +0800, Tang Chen wrote:
>> This patch introduce early_acpi_firmware_srat() to find the
>> phys addr of SRAT provided by firmware. And call it in
>> find_hotpluggable_memory().
>>
>> Since we have initialized acpi_gbl_root_table_list earlier,
>> and store all the tables' phys addrs and signatures in it,
>> it is easy to find the SRAT.
>>
>> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
>> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
>> ---
>>   drivers/acpi/acpica/tbxface.c |   32 ++++++++++++++++++++++++++++++++
>>   drivers/acpi/osl.c            |   22 ++++++++++++++++++++++
>>   include/acpi/acpixf.h         |    4 ++++
>>   include/linux/acpi.h          |    4 ++++
>>   mm/memory_hotplug.c           |    8 ++++++--
>>   5 files changed, 68 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/acpi/acpica/tbxface.c b/drivers/acpi/acpica/tbxface.c
>
> Please add "ACPICA" to the patch title.  This patch also needs to be
> reviewed by ACPICA folks.

OK, will do.

>
>> index ad11162..6a92f12 100644
>> --- a/drivers/acpi/acpica/tbxface.c
>> +++ b/drivers/acpi/acpica/tbxface.c
>
>   :
>
>> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
>> index dcbca3e..ec490fe 100644
>> --- a/drivers/acpi/osl.c
>> +++ b/drivers/acpi/osl.c
>> @@ -53,6 +53,7 @@
>>   #include <acpi/acpi.h>
>>   #include <acpi/acpi_bus.h>
>>   #include <acpi/processor.h>
>> +#include <acpi/acpixf.h>
>>
>>   #define _COMPONENT		ACPI_OS_SERVICES
>>   ACPI_MODULE_NAME("osl");
>> @@ -760,6 +761,27 @@ void __init acpi_initrd_override(void *data, size_t size)
>>   }
>>   #endif /* CONFIG_ACPI_INITRD_TABLE_OVERRIDE */
>>
>> +#ifdef CONFIG_ACPI_NUMA
>> +/*******************************************************************************
>> + *
>> + * FUNCTION:    early_acpi_firmware_srat
>> + *
>> + * RETURN:      Phys addr of SRAT on success, 0 on error.
>> + *
>> + * DESCRIPTION: Get the phys addr of SRAT provided by firmware.
>> + *
>> + ******************************************************************************/
>> +phys_addr_t __init early_acpi_firmware_srat(void)
>> +{
>> +	struct acpi_table_desc table_desc;
>> +
>> +	if (acpi_get_table_desc(ACPI_SIG_SRAT, &table_desc))
>
> This check should use ACPI_FAILURE() macro:
>
>    if (ACPI_FAILURE(acpi_get_table_desc(ACPI_SIG_SRAT, &table_desc)))

OK, will change it.

Thanks.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v2 RESEND 11/18] x86, acpi: Try to find SRAT in firmware earlier.
@ 2013-08-07  1:37       ` Tang Chen
  0 siblings, 0 replies; 70+ messages in thread
From: Tang Chen @ 2013-08-07  1:37 UTC (permalink / raw)
  To: Toshi Kani
  Cc: robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa, akpm, tj,
	trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy, x86, linux-doc, linux-kernel, linux-mm,
	linux-acpi

On 08/07/2013 07:33 AM, Toshi Kani wrote:
> On Fri, 2013-08-02 at 17:14 +0800, Tang Chen wrote:
>> This patch introduce early_acpi_firmware_srat() to find the
>> phys addr of SRAT provided by firmware. And call it in
>> find_hotpluggable_memory().
>>
>> Since we have initialized acpi_gbl_root_table_list earlier,
>> and store all the tables' phys addrs and signatures in it,
>> it is easy to find the SRAT.
>>
>> Signed-off-by: Tang Chen<tangchen@cn.fujitsu.com>
>> Reviewed-by: Zhang Yanfei<zhangyanfei@cn.fujitsu.com>
>> ---
>>   drivers/acpi/acpica/tbxface.c |   32 ++++++++++++++++++++++++++++++++
>>   drivers/acpi/osl.c            |   22 ++++++++++++++++++++++
>>   include/acpi/acpixf.h         |    4 ++++
>>   include/linux/acpi.h          |    4 ++++
>>   mm/memory_hotplug.c           |    8 ++++++--
>>   5 files changed, 68 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/acpi/acpica/tbxface.c b/drivers/acpi/acpica/tbxface.c
>
> Please add "ACPICA" to the patch title.  This patch also needs to be
> reviewed by ACPICA folks.

OK, followed.

>
>> index ad11162..6a92f12 100644
>> --- a/drivers/acpi/acpica/tbxface.c
>> +++ b/drivers/acpi/acpica/tbxface.c
>
>   :
>
>> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
>> index dcbca3e..ec490fe 100644
>> --- a/drivers/acpi/osl.c
>> +++ b/drivers/acpi/osl.c
>> @@ -53,6 +53,7 @@
>>   #include<acpi/acpi.h>
>>   #include<acpi/acpi_bus.h>
>>   #include<acpi/processor.h>
>> +#include<acpi/acpixf.h>
>>
>>   #define _COMPONENT		ACPI_OS_SERVICES
>>   ACPI_MODULE_NAME("osl");
>> @@ -760,6 +761,27 @@ void __init acpi_initrd_override(void *data, size_t size)
>>   }
>>   #endif /* CONFIG_ACPI_INITRD_TABLE_OVERRIDE */
>>
>> +#ifdef CONFIG_ACPI_NUMA
>> +/*******************************************************************************
>> + *
>> + * FUNCTION:    early_acpi_firmware_srat
>> + *
>> + * RETURN:      Phys addr of SRAT on success, 0 on error.
>> + *
>> + * DESCRIPTION: Get the phys addr of SRAT provided by firmware.
>> + *
>> + ******************************************************************************/
>> +phys_addr_t __init early_acpi_firmware_srat(void)
>> +{
>> +	struct acpi_table_desc table_desc;
>> +
>> +	if (acpi_get_table_desc(ACPI_SIG_SRAT,&table_desc))
>
> This check should use ACPI_FAILURE() macro:
>
>    if (ACPI_FAILURE(acpi_get_table_desc(ACPI_SIG_SRAT, &table_desc)))

OK, will change it.

Thanks.
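
For reference, the agreed change can be sketched outside the kernel as
below. This is only an illustration, not the actual next revision of the
patch: the typedefs and macros are stand-ins for the ACPICA definitions,
and sketch_tables, sketch_table_count, sketch_get_table_desc() and
sketch_firmware_srat() are hypothetical names that mimic the root table
list lookup so that the ACPI_FAILURE() check can be exercised in user
space.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for ACPICA definitions, so the sketch builds in user space. */
typedef uint32_t acpi_status;
#define AE_OK           ((acpi_status)0x0000)
#define AE_NOT_FOUND    ((acpi_status)0x0005)
#define ACPI_FAILURE(a) (a)
#define ACPI_SIG_SRAT   "SRAT"

typedef uint64_t phys_addr_t;

/* Simplified descriptor: only the fields this sketch needs. */
struct acpi_table_desc {
	phys_addr_t address;
	char signature[5];
};

/* Hypothetical root table list; the addresses are made up. */
struct acpi_table_desc sketch_tables[] = {
	{ 0x7fd00000ULL, "FACP" },
	{ 0x7fe00000ULL, "SRAT" },
};
size_t sketch_table_count = 2;

/* Hypothetical lookup by 4-byte signature, returning an acpi_status. */
acpi_status sketch_get_table_desc(const char *sig,
				  struct acpi_table_desc *out)
{
	size_t i;

	for (i = 0; i < sketch_table_count; i++) {
		if (memcmp(sketch_tables[i].signature, sig, 4) == 0) {
			*out = sketch_tables[i];
			return AE_OK;
		}
	}
	return AE_NOT_FOUND;
}

/* Returns the phys addr of the SRAT on success, 0 on error. */
phys_addr_t sketch_firmware_srat(void)
{
	struct acpi_table_desc table_desc;

	/* Test the status with ACPI_FAILURE() rather than as a bare int. */
	if (ACPI_FAILURE(sketch_get_table_desc(ACPI_SIG_SRAT, &table_desc)))
		return 0;

	return table_desc.address;
}
```

With the two sample entries above, sketch_firmware_srat() returns the
made-up SRAT address; if the SRAT entry is not in the list, it returns 0.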



end of thread, other threads:[~2013-08-07  1:39 UTC | newest]

Thread overview: 70+ messages
2013-08-02  9:14 [PATCH v2 RESEND 00/18] Arrange hotpluggable memory as ZONE_MOVABLE Tang Chen
2013-08-02  9:14 ` Tang Chen
2013-08-02  9:14 ` [PATCH v2 RESEND 01/18] acpi: Print Hot-Pluggable Field in SRAT Tang Chen
2013-08-02  9:14   ` Tang Chen
2013-08-02  9:14 ` [PATCH v2 RESEND 02/18] earlycpio.c: Fix the confusing comment of find_cpio_data() Tang Chen
2013-08-02  9:14   ` Tang Chen
2013-08-02  9:14 ` [PATCH v2 RESEND 03/18] acpi: Remove "continue" in macro INVALID_TABLE() Tang Chen
2013-08-02  9:14   ` Tang Chen
2013-08-02  9:14 ` [PATCH v2 RESEND 04/18] acpi: Introduce acpi_verify_initrd() to check if a table is invalid Tang Chen
2013-08-02  9:14   ` Tang Chen
2013-08-06 23:02   ` Toshi Kani
2013-08-06 23:02     ` Toshi Kani
2013-08-02  9:14 ` [PATCH v2 RESEND 05/18] x86, ACPICA: Split acpi_boot_table_init() into two parts Tang Chen
2013-08-02  9:14   ` Tang Chen
2013-08-02 13:00   ` Rafael J. Wysocki
2013-08-02 13:00     ` Rafael J. Wysocki
2013-08-05  3:21     ` Tang Chen
2013-08-05  3:21       ` Tang Chen
2013-08-05 13:26       ` Rafael J. Wysocki
2013-08-05 13:26         ` Rafael J. Wysocki
2013-08-05 13:23         ` Tang Chen
2013-08-05 13:23           ` Tang Chen
2013-08-02  9:14 ` [PATCH v2 RESEND 06/18] x86, acpi, ACPICA: Initialize ACPI root table list earlier Tang Chen
2013-08-02  9:14   ` Tang Chen
2013-08-02  9:14 ` [PATCH v2 RESEND 07/18] x86, ACPI: Also initialize signature and length when parsing root table Tang Chen
2013-08-02  9:14   ` Tang Chen
2013-08-02 13:03   ` Rafael J. Wysocki
2013-08-02 13:03     ` Rafael J. Wysocki
2013-08-05  1:33     ` Tang Chen
2013-08-05  1:33       ` Tang Chen
2013-08-05 13:28       ` Rafael J. Wysocki
2013-08-05 13:28         ` Rafael J. Wysocki
2013-08-02  9:14 ` [PATCH v2 RESEND 08/18] x86: get pg_data_t's memory from other node Tang Chen
2013-08-02  9:14   ` Tang Chen
2013-08-02  9:14 ` [PATCH v2 RESEND 09/18] x86: Make get_ramdisk_{image|size}() global Tang Chen
2013-08-02  9:14   ` Tang Chen
2013-08-02  9:14 ` [PATCH v2 RESEND 10/18] x86, acpi: Try to find if SRAT is overrided earlier Tang Chen
2013-08-02  9:14   ` Tang Chen
2013-08-02  9:14 ` [PATCH v2 RESEND 11/18] x86, acpi: Try to find SRAT in firmware earlier Tang Chen
2013-08-02  9:14   ` Tang Chen
2013-08-06 23:33   ` Toshi Kani
2013-08-06 23:33     ` Toshi Kani
2013-08-07  1:37     ` Tang Chen
2013-08-07  1:37       ` Tang Chen
2013-08-02  9:14 ` [PATCH v2 RESEND 12/18] x86, acpi, numa, mem_hotplug: Find hotpluggable memory in SRAT memory affinities Tang Chen
2013-08-02  9:14   ` Tang Chen
2013-08-02  9:14 ` [PATCH v2 RESEND 13/18] x86, numa, mem_hotplug: Skip all the regions the kernel resides in Tang Chen
2013-08-02  9:14   ` Tang Chen
2013-08-05  6:22   ` Tang Chen
2013-08-05  6:22     ` Tang Chen
2013-08-05 14:52     ` Tejun Heo
2013-08-05 14:52       ` Tejun Heo
2013-08-05 15:12       ` Zhang Yanfei
2013-08-05 15:12         ` Zhang Yanfei
2013-08-06  2:29       ` Tang Chen
2013-08-06  2:29         ` Tang Chen
2013-08-06 15:10         ` Tejun Heo
2013-08-06 15:10           ` Tejun Heo
2013-08-06  2:50       ` Tang Chen
2013-08-06  2:50         ` Tang Chen
2013-08-02  9:14 ` [PATCH v2 RESEND 14/18] memblock, numa: Introduce flag into memblock Tang Chen
2013-08-02  9:14   ` Tang Chen
2013-08-02  9:14 ` [PATCH v2 RESEND 15/18] memblock, mem_hotplug: Introduce MEMBLOCK_HOTPLUG flag to mark hotpluggable regions Tang Chen
2013-08-02  9:14   ` Tang Chen
2013-08-02  9:14 ` [PATCH v2 RESEND 16/18] memblock, mem_hotplug: Make memblock skip hotpluggable regions by default Tang Chen
2013-08-02  9:14   ` Tang Chen
2013-08-02  9:14 ` [PATCH v2 RESEND 17/18] mem-hotplug: Introduce movablenode boot option to {en|dis}able using SRAT Tang Chen
2013-08-02  9:14   ` Tang Chen
2013-08-02  9:14 ` [PATCH v2 RESEND 18/18] x86, numa, acpi, memory-hotplug: Make movablenode have higher priority Tang Chen
2013-08-02  9:14   ` Tang Chen