All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-21 10:15 ` Tang Chen
  0 siblings, 0 replies; 116+ messages in thread
From: Tang Chen @ 2013-08-21 10:15 UTC (permalink / raw)
  To: konrad.wilk, robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa,
	akpm, tj, trenn, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

This patch-set aims to move acpi_initrd_override() earlier on x86.
Some of the patches are from Yinghai's patch-set:
https://lkml.org/lkml/2013/6/14/561

The difference between this patch-set and Yinghai's original patch-set are:
1. This patch-set doesn't split acpi_initrd_override(), but call it as a
   whole operation at early time.
2. Allocate memory from BRK to store override tables.
   (This idea is also from Yinghai.)


[Current state]

The current Linux kernel will initialize acpi tables like the following:

1. Find all acpi override table provided by users in initrd.
   (Linux allows users to override acpi tables in firmware, by specifying
   their own tables in initrd.)

2. Use acpica code to initialize acpi global root table list and install all
   tables into it. If any override tables exists, use it to override the one
   provided by firmware.

Then others can parse these tables and get useful info.

Both of the two steps happen after direct mapping page tables are setup.

[Issues]

In the current Linux kernel, the initialization of acpi tables is too late for
new functionalities.

We have some issues about this:

* For memory hotplug, we need ACPI SRAT at early time to be aware of which memory
  ranges are hotpluggable, and prevent bootmem allocator from allocating memory
  for the kernel. (Kernel pages cannot be hotplugged because )

* As suggested by Yinghai Lu <yinghai@kernel.org>, we should allocate page tables
  in local node. This also needs SRAT before direct mapping page tables are setup.

* As mentioned by Toshi Kani <toshi.kani@hp.com>, ACPI SCPR/DBGP/DBG2 tables
  allow the OS to initialize serial console/debug ports at early boot time. The
  earlier it can be initialized, the better this feature will be.  These tables
  are not currently used by Linux due to a licensing issue, but it could be
  addressed some time soon.


[What are we doing]

We are trying to initialize acip tables as early as possible. But Linux kernel
allows users to override acpi tables by specifying their own tables in initrd.
So we have to do acpi_initrd_override() earlier first.


[About this patch-set]

This patch-set aims to move acpi_initrd_override() as early as possible on x86.
As suggested by Yinghai, we are trying to do it like this:

On 32bit: do it in head_32.S, before paging is enabled. In this case, we can
          access initrd with physical address without page tables.

On 64bit: do it in head_64.c, after paging is enabled but before direct mapping
          is setup.

And also, acpi_initrd_override() needs to allocate memory for override tables.
But at such an early time, there is no memory allocator works. So the basic idea
from Yinghai is to use BRK. We will extend BRK 256KB in this patch-set.


Tang Chen (6):
  x86, acpi: Move table_sigs[] to stack.
  x86, acpi, brk: Extend BRK 256KB to store acpi override tables.
  x86, brk: Make extend_brk() available with va/pa.
  x86, acpi: Make acpi_initrd_override() available with va or pa.
  x86, acpi, brk: Make early_alloc_acpi_override_tables_buf() available
    with va/pa.
  x86, acpi: Do acpi_initrd_override() earlier in head_32.S/head64.c.

Yinghai Lu (2):
  x86: Make get_ramdisk_{image|size}() global.
  x86, microcode: Use get_ramdisk_{image|size}() in microcode handling.

 arch/x86/include/asm/dmi.h              |    2 +-
 arch/x86/include/asm/setup.h            |   11 +++-
 arch/x86/kernel/head64.c                |    4 +
 arch/x86/kernel/head_32.S               |    4 +
 arch/x86/kernel/microcode_intel_early.c |    8 +-
 arch/x86/kernel/setup.c                 |   93 ++++++++++++++++------
 arch/x86/mm/init.c                      |    2 +-
 arch/x86/xen/enlighten.c                |    2 +-
 arch/x86/xen/mmu.c                      |    6 +-
 arch/x86/xen/p2m.c                      |   27 ++++---
 drivers/acpi/osl.c                      |  130 ++++++++++++++++++++-----------
 include/linux/acpi.h                    |    5 +-
 12 files changed, 196 insertions(+), 98 deletions(-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-21 10:15 ` Tang Chen
  0 siblings, 0 replies; 116+ messages in thread
From: Tang Chen @ 2013-08-21 10:15 UTC (permalink / raw)
  To: konrad.wilk, robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa,
	akpm, tj, trenn, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

This patch-set aims to move acpi_initrd_override() earlier on x86.
Some of the patches are from Yinghai's patch-set:
https://lkml.org/lkml/2013/6/14/561

The difference between this patch-set and Yinghai's original patch-set are:
1. This patch-set doesn't split acpi_initrd_override(), but call it as a
   whole operation at early time.
2. Allocate memory from BRK to store override tables.
   (This idea is also from Yinghai.)


[Current state]

The current Linux kernel will initialize acpi tables like the following:

1. Find all acpi override table provided by users in initrd.
   (Linux allows users to override acpi tables in firmware, by specifying
   their own tables in initrd.)

2. Use acpica code to initialize acpi global root table list and install all
   tables into it. If any override tables exists, use it to override the one
   provided by firmware.

Then others can parse these tables and get useful info.

Both of the two steps happen after direct mapping page tables are setup.

[Issues]

In the current Linux kernel, the initialization of acpi tables is too late for
new functionalities.

We have some issues about this:

* For memory hotplug, we need ACPI SRAT at early time to be aware of which memory
  ranges are hotpluggable, and prevent bootmem allocator from allocating memory
  for the kernel. (Kernel pages cannot be hotplugged because )

* As suggested by Yinghai Lu <yinghai@kernel.org>, we should allocate page tables
  in local node. This also needs SRAT before direct mapping page tables are setup.

* As mentioned by Toshi Kani <toshi.kani@hp.com>, ACPI SCPR/DBGP/DBG2 tables
  allow the OS to initialize serial console/debug ports at early boot time. The
  earlier it can be initialized, the better this feature will be.  These tables
  are not currently used by Linux due to a licensing issue, but it could be
  addressed some time soon.


[What are we doing]

We are trying to initialize acip tables as early as possible. But Linux kernel
allows users to override acpi tables by specifying their own tables in initrd.
So we have to do acpi_initrd_override() earlier first.


[About this patch-set]

This patch-set aims to move acpi_initrd_override() as early as possible on x86.
As suggested by Yinghai, we are trying to do it like this:

On 32bit: do it in head_32.S, before paging is enabled. In this case, we can
          access initrd with physical address without page tables.

On 64bit: do it in head_64.c, after paging is enabled but before direct mapping
          is setup.

And also, acpi_initrd_override() needs to allocate memory for override tables.
But at such an early time, there is no memory allocator works. So the basic idea
from Yinghai is to use BRK. We will extend BRK 256KB in this patch-set.


Tang Chen (6):
  x86, acpi: Move table_sigs[] to stack.
  x86, acpi, brk: Extend BRK 256KB to store acpi override tables.
  x86, brk: Make extend_brk() available with va/pa.
  x86, acpi: Make acpi_initrd_override() available with va or pa.
  x86, acpi, brk: Make early_alloc_acpi_override_tables_buf() available
    with va/pa.
  x86, acpi: Do acpi_initrd_override() earlier in head_32.S/head64.c.

Yinghai Lu (2):
  x86: Make get_ramdisk_{image|size}() global.
  x86, microcode: Use get_ramdisk_{image|size}() in microcode handling.

 arch/x86/include/asm/dmi.h              |    2 +-
 arch/x86/include/asm/setup.h            |   11 +++-
 arch/x86/kernel/head64.c                |    4 +
 arch/x86/kernel/head_32.S               |    4 +
 arch/x86/kernel/microcode_intel_early.c |    8 +-
 arch/x86/kernel/setup.c                 |   93 ++++++++++++++++------
 arch/x86/mm/init.c                      |    2 +-
 arch/x86/xen/enlighten.c                |    2 +-
 arch/x86/xen/mmu.c                      |    6 +-
 arch/x86/xen/p2m.c                      |   27 ++++---
 drivers/acpi/osl.c                      |  130 ++++++++++++++++++++-----------
 include/linux/acpi.h                    |    5 +-
 12 files changed, 196 insertions(+), 98 deletions(-)


^ permalink raw reply	[flat|nested] 116+ messages in thread

* [PATCH 1/8] x86: Make get_ramdisk_{image|size}() global.
  2013-08-21 10:15 ` Tang Chen
@ 2013-08-21 10:15   ` Tang Chen
  -1 siblings, 0 replies; 116+ messages in thread
From: Tang Chen @ 2013-08-21 10:15 UTC (permalink / raw)
  To: konrad.wilk, robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa,
	akpm, tj, trenn, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

From: Yinghai Lu <yinghai@kernel.org>

This patch does two things:

1. Make get_ramdisk_image() and get_ramdisk_size() global so that we can use
   them in the later patches.

2. In later patches, we are going to call them in head_32.S before paging is
   enabled. In that case, we can only use physical address to access global
   variable like boot_params. So make them take a boot_params pointer parameter
   so that we can pass va or pa to them.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Tested-by: Thomas Renninger <trenn@suse.de>
Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/include/asm/setup.h |    3 +++
 arch/x86/kernel/setup.c      |   28 ++++++++++++++--------------
 2 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index b7bf350..4f71d48 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -106,6 +106,9 @@ void *extend_brk(size_t size, size_t align);
 	RESERVE_BRK(name, sizeof(type) * entries)
 
 extern void probe_roms(void);
+u64 get_ramdisk_image(struct boot_params *bp);
+u64 get_ramdisk_size(struct boot_params *bp);
+
 #ifdef __i386__
 
 void __init i386_start_kernel(void);
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index f8ec578..5bfd4c8 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -297,19 +297,19 @@ static void __init reserve_brk(void)
 
 #ifdef CONFIG_BLK_DEV_INITRD
 
-static u64 __init get_ramdisk_image(void)
+u64 __init get_ramdisk_image(struct boot_params *bp)
 {
-	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
+	u64 ramdisk_image = bp->hdr.ramdisk_image;
 
-	ramdisk_image |= (u64)boot_params.ext_ramdisk_image << 32;
+	ramdisk_image |= (u64)bp->ext_ramdisk_image << 32;
 
 	return ramdisk_image;
 }
-static u64 __init get_ramdisk_size(void)
+u64 __init get_ramdisk_size(struct boot_params *bp)
 {
-	u64 ramdisk_size = boot_params.hdr.ramdisk_size;
+	u64 ramdisk_size = bp->hdr.ramdisk_size;
 
-	ramdisk_size |= (u64)boot_params.ext_ramdisk_size << 32;
+	ramdisk_size |= (u64)bp->ext_ramdisk_size << 32;
 
 	return ramdisk_size;
 }
@@ -318,8 +318,8 @@ static u64 __init get_ramdisk_size(void)
 static void __init relocate_initrd(void)
 {
 	/* Assume only end is not page aligned */
-	u64 ramdisk_image = get_ramdisk_image();
-	u64 ramdisk_size  = get_ramdisk_size();
+	u64 ramdisk_image = get_ramdisk_image(&boot_params);
+	u64 ramdisk_size  = get_ramdisk_size(&boot_params);
 	u64 area_size     = PAGE_ALIGN(ramdisk_size);
 	u64 ramdisk_here;
 	unsigned long slop, clen, mapaddr;
@@ -358,8 +358,8 @@ static void __init relocate_initrd(void)
 		ramdisk_size  -= clen;
 	}
 
-	ramdisk_image = get_ramdisk_image();
-	ramdisk_size  = get_ramdisk_size();
+	ramdisk_image = get_ramdisk_image(&boot_params);
+	ramdisk_size  = get_ramdisk_size(&boot_params);
 	printk(KERN_INFO "Move RAMDISK from [mem %#010llx-%#010llx] to"
 		" [mem %#010llx-%#010llx]\n",
 		ramdisk_image, ramdisk_image + ramdisk_size - 1,
@@ -369,8 +369,8 @@ static void __init relocate_initrd(void)
 static void __init early_reserve_initrd(void)
 {
 	/* Assume only end is not page aligned */
-	u64 ramdisk_image = get_ramdisk_image();
-	u64 ramdisk_size  = get_ramdisk_size();
+	u64 ramdisk_image = get_ramdisk_image(&boot_params);
+	u64 ramdisk_size  = get_ramdisk_size(&boot_params);
 	u64 ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
 
 	if (!boot_params.hdr.type_of_loader ||
@@ -382,8 +382,8 @@ static void __init early_reserve_initrd(void)
 static void __init reserve_initrd(void)
 {
 	/* Assume only end is not page aligned */
-	u64 ramdisk_image = get_ramdisk_image();
-	u64 ramdisk_size  = get_ramdisk_size();
+	u64 ramdisk_image = get_ramdisk_image(&boot_params);
+	u64 ramdisk_size  = get_ramdisk_size(&boot_params);
 	u64 ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
 	u64 mapped_size;
 
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH 1/8] x86: Make get_ramdisk_{image|size}() global.
@ 2013-08-21 10:15   ` Tang Chen
  0 siblings, 0 replies; 116+ messages in thread
From: Tang Chen @ 2013-08-21 10:15 UTC (permalink / raw)
  To: konrad.wilk, robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa,
	akpm, tj, trenn, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

From: Yinghai Lu <yinghai@kernel.org>

This patch does two things:

1. Make get_ramdisk_image() and get_ramdisk_size() global so that we can use
   them in the later patches.

2. In later patches, we are going to call them in head_32.S before paging is
   enabled. In that case, we can only use physical address to access global
   variable like boot_params. So make them take a boot_params pointer parameter
   so that we can pass va or pa to them.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Tested-by: Thomas Renninger <trenn@suse.de>
Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/include/asm/setup.h |    3 +++
 arch/x86/kernel/setup.c      |   28 ++++++++++++++--------------
 2 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index b7bf350..4f71d48 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -106,6 +106,9 @@ void *extend_brk(size_t size, size_t align);
 	RESERVE_BRK(name, sizeof(type) * entries)
 
 extern void probe_roms(void);
+u64 get_ramdisk_image(struct boot_params *bp);
+u64 get_ramdisk_size(struct boot_params *bp);
+
 #ifdef __i386__
 
 void __init i386_start_kernel(void);
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index f8ec578..5bfd4c8 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -297,19 +297,19 @@ static void __init reserve_brk(void)
 
 #ifdef CONFIG_BLK_DEV_INITRD
 
-static u64 __init get_ramdisk_image(void)
+u64 __init get_ramdisk_image(struct boot_params *bp)
 {
-	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
+	u64 ramdisk_image = bp->hdr.ramdisk_image;
 
-	ramdisk_image |= (u64)boot_params.ext_ramdisk_image << 32;
+	ramdisk_image |= (u64)bp->ext_ramdisk_image << 32;
 
 	return ramdisk_image;
 }
-static u64 __init get_ramdisk_size(void)
+u64 __init get_ramdisk_size(struct boot_params *bp)
 {
-	u64 ramdisk_size = boot_params.hdr.ramdisk_size;
+	u64 ramdisk_size = bp->hdr.ramdisk_size;
 
-	ramdisk_size |= (u64)boot_params.ext_ramdisk_size << 32;
+	ramdisk_size |= (u64)bp->ext_ramdisk_size << 32;
 
 	return ramdisk_size;
 }
@@ -318,8 +318,8 @@ static u64 __init get_ramdisk_size(void)
 static void __init relocate_initrd(void)
 {
 	/* Assume only end is not page aligned */
-	u64 ramdisk_image = get_ramdisk_image();
-	u64 ramdisk_size  = get_ramdisk_size();
+	u64 ramdisk_image = get_ramdisk_image(&boot_params);
+	u64 ramdisk_size  = get_ramdisk_size(&boot_params);
 	u64 area_size     = PAGE_ALIGN(ramdisk_size);
 	u64 ramdisk_here;
 	unsigned long slop, clen, mapaddr;
@@ -358,8 +358,8 @@ static void __init relocate_initrd(void)
 		ramdisk_size  -= clen;
 	}
 
-	ramdisk_image = get_ramdisk_image();
-	ramdisk_size  = get_ramdisk_size();
+	ramdisk_image = get_ramdisk_image(&boot_params);
+	ramdisk_size  = get_ramdisk_size(&boot_params);
 	printk(KERN_INFO "Move RAMDISK from [mem %#010llx-%#010llx] to"
 		" [mem %#010llx-%#010llx]\n",
 		ramdisk_image, ramdisk_image + ramdisk_size - 1,
@@ -369,8 +369,8 @@ static void __init relocate_initrd(void)
 static void __init early_reserve_initrd(void)
 {
 	/* Assume only end is not page aligned */
-	u64 ramdisk_image = get_ramdisk_image();
-	u64 ramdisk_size  = get_ramdisk_size();
+	u64 ramdisk_image = get_ramdisk_image(&boot_params);
+	u64 ramdisk_size  = get_ramdisk_size(&boot_params);
 	u64 ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
 
 	if (!boot_params.hdr.type_of_loader ||
@@ -382,8 +382,8 @@ static void __init early_reserve_initrd(void)
 static void __init reserve_initrd(void)
 {
 	/* Assume only end is not page aligned */
-	u64 ramdisk_image = get_ramdisk_image();
-	u64 ramdisk_size  = get_ramdisk_size();
+	u64 ramdisk_image = get_ramdisk_image(&boot_params);
+	u64 ramdisk_size  = get_ramdisk_size(&boot_params);
 	u64 ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
 	u64 mapped_size;
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH 2/8] x86, microcode: Use get_ramdisk_{image|size}() in microcode handling.
  2013-08-21 10:15 ` Tang Chen
@ 2013-08-21 10:15   ` Tang Chen
  -1 siblings, 0 replies; 116+ messages in thread
From: Tang Chen @ 2013-08-21 10:15 UTC (permalink / raw)
  To: konrad.wilk, robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa,
	akpm, tj, trenn, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi, Fenghua Yu

From: Yinghai Lu <yinghai@kernel.org>

Since we made get_ramdisk_{image|size}() global, use them when we want
to access ramdisk.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
Tested-by: Thomas Renninger <trenn@suse.de>
Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/kernel/microcode_intel_early.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/microcode_intel_early.c b/arch/x86/kernel/microcode_intel_early.c
index 1575deb..4c58a1b 100644
--- a/arch/x86/kernel/microcode_intel_early.c
+++ b/arch/x86/kernel/microcode_intel_early.c
@@ -743,8 +743,8 @@ load_ucode_intel_bsp(void)
 	struct boot_params *boot_params_p;
 
 	boot_params_p = (struct boot_params *)__pa_nodebug(&boot_params);
-	ramdisk_image = boot_params_p->hdr.ramdisk_image;
-	ramdisk_size  = boot_params_p->hdr.ramdisk_size;
+	ramdisk_image = get_ramdisk_image(boot_params_p);
+	ramdisk_size  = get_ramdisk_size(boot_params_p);
 	initrd_start_early = ramdisk_image;
 	initrd_end_early = initrd_start_early + ramdisk_size;
 
@@ -753,8 +753,8 @@ load_ucode_intel_bsp(void)
 		(unsigned long *)__pa_nodebug(&mc_saved_in_initrd),
 		initrd_start_early, initrd_end_early, &uci);
 #else
-	ramdisk_image = boot_params.hdr.ramdisk_image;
-	ramdisk_size  = boot_params.hdr.ramdisk_size;
+	ramdisk_image = get_ramdisk_image(&boot_params);
+	ramdisk_size  = get_ramdisk_size(&boot_params);
 	initrd_start_early = ramdisk_image + PAGE_OFFSET;
 	initrd_end_early = initrd_start_early + ramdisk_size;
 
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH 2/8] x86, microcode: Use get_ramdisk_{image|size}() in microcode handling.
@ 2013-08-21 10:15   ` Tang Chen
  0 siblings, 0 replies; 116+ messages in thread
From: Tang Chen @ 2013-08-21 10:15 UTC (permalink / raw)
  To: konrad.wilk, robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa,
	akpm, tj, trenn, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi, Fenghua Yu

From: Yinghai Lu <yinghai@kernel.org>

Since we made get_ramdisk_{image|size}() global, use them when we want
to access ramdisk.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
Tested-by: Thomas Renninger <trenn@suse.de>
Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/kernel/microcode_intel_early.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/microcode_intel_early.c b/arch/x86/kernel/microcode_intel_early.c
index 1575deb..4c58a1b 100644
--- a/arch/x86/kernel/microcode_intel_early.c
+++ b/arch/x86/kernel/microcode_intel_early.c
@@ -743,8 +743,8 @@ load_ucode_intel_bsp(void)
 	struct boot_params *boot_params_p;
 
 	boot_params_p = (struct boot_params *)__pa_nodebug(&boot_params);
-	ramdisk_image = boot_params_p->hdr.ramdisk_image;
-	ramdisk_size  = boot_params_p->hdr.ramdisk_size;
+	ramdisk_image = get_ramdisk_image(boot_params_p);
+	ramdisk_size  = get_ramdisk_size(boot_params_p);
 	initrd_start_early = ramdisk_image;
 	initrd_end_early = initrd_start_early + ramdisk_size;
 
@@ -753,8 +753,8 @@ load_ucode_intel_bsp(void)
 		(unsigned long *)__pa_nodebug(&mc_saved_in_initrd),
 		initrd_start_early, initrd_end_early, &uci);
 #else
-	ramdisk_image = boot_params.hdr.ramdisk_image;
-	ramdisk_size  = boot_params.hdr.ramdisk_size;
+	ramdisk_image = get_ramdisk_image(&boot_params);
+	ramdisk_size  = get_ramdisk_size(&boot_params);
 	initrd_start_early = ramdisk_image + PAGE_OFFSET;
 	initrd_end_early = initrd_start_early + ramdisk_size;
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH 3/8] x86, acpi: Move table_sigs[] to stack.
  2013-08-21 10:15 ` Tang Chen
@ 2013-08-21 10:15   ` Tang Chen
  -1 siblings, 0 replies; 116+ messages in thread
From: Tang Chen @ 2013-08-21 10:15 UTC (permalink / raw)
  To: konrad.wilk, robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa,
	akpm, tj, trenn, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

On 64bit, we will do acpi_initrd_override() in x86_64_start_kernel(). This is
after CPU enables paging, but before direct mapping page tables are setup.
This is OK because we have an early page fault handler to help to access data
without direct mapping page tables.

But on 32bit, in order to keep the x86_32 and x86_64 unified code clean, we want
to do acpi_initrd_override() in head_32.S, before CPU enables paging. Without
direct mapping page tables, we need to access data with physical address.

For global variables, if we access them, we are using va. So on 32bit, we have
to convert them into pa. But for a global array, it could be too messy.

So this patch move table_sigs[] to stack. Define it in acpi_initrd_override().
It is no more than 36 pointers, so it is OK to put it on stack.

Originally-From: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 drivers/acpi/osl.c |   23 +++++++++++------------
 1 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index e7effc1..06996d8 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -551,18 +551,6 @@ u8 __init acpi_table_checksum(u8 *buffer, u32 length)
 	return sum;
 }
 
-/* All but ACPI_SIG_RSDP and ACPI_SIG_FACS: */
-static const char * const table_sigs[] = {
-	ACPI_SIG_BERT, ACPI_SIG_CPEP, ACPI_SIG_ECDT, ACPI_SIG_EINJ,
-	ACPI_SIG_ERST, ACPI_SIG_HEST, ACPI_SIG_MADT, ACPI_SIG_MSCT,
-	ACPI_SIG_SBST, ACPI_SIG_SLIT, ACPI_SIG_SRAT, ACPI_SIG_ASF,
-	ACPI_SIG_BOOT, ACPI_SIG_DBGP, ACPI_SIG_DMAR, ACPI_SIG_HPET,
-	ACPI_SIG_IBFT, ACPI_SIG_IVRS, ACPI_SIG_MCFG, ACPI_SIG_MCHI,
-	ACPI_SIG_SLIC, ACPI_SIG_SPCR, ACPI_SIG_SPMI, ACPI_SIG_TCPA,
-	ACPI_SIG_UEFI, ACPI_SIG_WAET, ACPI_SIG_WDAT, ACPI_SIG_WDDT,
-	ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT,
-	ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, NULL };
-
 #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
 
 /* Must not increase 10 or needs code modification below */
@@ -577,6 +565,17 @@ void __init acpi_initrd_override(void *data, size_t size)
 	struct cpio_data file;
 	struct cpio_data early_initrd_files[ACPI_OVERRIDE_TABLES];
 	char *p;
+	/* All but ACPI_SIG_RSDP and ACPI_SIG_FACS: */
+	const char * const table_sigs[] = {
+		ACPI_SIG_BERT, ACPI_SIG_CPEP, ACPI_SIG_ECDT, ACPI_SIG_EINJ,
+		ACPI_SIG_ERST, ACPI_SIG_HEST, ACPI_SIG_MADT, ACPI_SIG_MSCT,
+		ACPI_SIG_SBST, ACPI_SIG_SLIT, ACPI_SIG_SRAT, ACPI_SIG_ASF,
+		ACPI_SIG_BOOT, ACPI_SIG_DBGP, ACPI_SIG_DMAR, ACPI_SIG_HPET,
+		ACPI_SIG_IBFT, ACPI_SIG_IVRS, ACPI_SIG_MCFG, ACPI_SIG_MCHI,
+		ACPI_SIG_SLIC, ACPI_SIG_SPCR, ACPI_SIG_SPMI, ACPI_SIG_TCPA,
+		ACPI_SIG_UEFI, ACPI_SIG_WAET, ACPI_SIG_WDAT, ACPI_SIG_WDDT,
+		ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT,
+		ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, NULL };
 
 	if (data == NULL || size == 0)
 		return;
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH 3/8] x86, acpi: Move table_sigs[] to stack.
@ 2013-08-21 10:15   ` Tang Chen
  0 siblings, 0 replies; 116+ messages in thread
From: Tang Chen @ 2013-08-21 10:15 UTC (permalink / raw)
  To: konrad.wilk, robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa,
	akpm, tj, trenn, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

On 64bit, we will do acpi_initrd_override() in x86_64_start_kernel(). This is
after CPU enables paging, but before direct mapping page tables are setup.
This is OK because we have an early page fault handler to help to access data
without direct mapping page tables.

But on 32bit, in order to keep the x86_32 and x86_64 unified code clean, we want
to do acpi_initrd_override() in head_32.S, before CPU enables paging. Without
direct mapping page tables, we need to access data with physical address.

For global variables, if we access them, we are using va. So on 32bit, we have
to convert them into pa. But for a global array, it could be too messy.

So this patch move table_sigs[] to stack. Define it in acpi_initrd_override().
It is no more than 36 pointers, so it is OK to put it on stack.

Originally-From: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 drivers/acpi/osl.c |   23 +++++++++++------------
 1 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index e7effc1..06996d8 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -551,18 +551,6 @@ u8 __init acpi_table_checksum(u8 *buffer, u32 length)
 	return sum;
 }
 
-/* All but ACPI_SIG_RSDP and ACPI_SIG_FACS: */
-static const char * const table_sigs[] = {
-	ACPI_SIG_BERT, ACPI_SIG_CPEP, ACPI_SIG_ECDT, ACPI_SIG_EINJ,
-	ACPI_SIG_ERST, ACPI_SIG_HEST, ACPI_SIG_MADT, ACPI_SIG_MSCT,
-	ACPI_SIG_SBST, ACPI_SIG_SLIT, ACPI_SIG_SRAT, ACPI_SIG_ASF,
-	ACPI_SIG_BOOT, ACPI_SIG_DBGP, ACPI_SIG_DMAR, ACPI_SIG_HPET,
-	ACPI_SIG_IBFT, ACPI_SIG_IVRS, ACPI_SIG_MCFG, ACPI_SIG_MCHI,
-	ACPI_SIG_SLIC, ACPI_SIG_SPCR, ACPI_SIG_SPMI, ACPI_SIG_TCPA,
-	ACPI_SIG_UEFI, ACPI_SIG_WAET, ACPI_SIG_WDAT, ACPI_SIG_WDDT,
-	ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT,
-	ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, NULL };
-
 #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
 
 /* Must not increase 10 or needs code modification below */
@@ -577,6 +565,17 @@ void __init acpi_initrd_override(void *data, size_t size)
 	struct cpio_data file;
 	struct cpio_data early_initrd_files[ACPI_OVERRIDE_TABLES];
 	char *p;
+	/* All but ACPI_SIG_RSDP and ACPI_SIG_FACS: */
+	const char * const table_sigs[] = {
+		ACPI_SIG_BERT, ACPI_SIG_CPEP, ACPI_SIG_ECDT, ACPI_SIG_EINJ,
+		ACPI_SIG_ERST, ACPI_SIG_HEST, ACPI_SIG_MADT, ACPI_SIG_MSCT,
+		ACPI_SIG_SBST, ACPI_SIG_SLIT, ACPI_SIG_SRAT, ACPI_SIG_ASF,
+		ACPI_SIG_BOOT, ACPI_SIG_DBGP, ACPI_SIG_DMAR, ACPI_SIG_HPET,
+		ACPI_SIG_IBFT, ACPI_SIG_IVRS, ACPI_SIG_MCFG, ACPI_SIG_MCHI,
+		ACPI_SIG_SLIC, ACPI_SIG_SPCR, ACPI_SIG_SPMI, ACPI_SIG_TCPA,
+		ACPI_SIG_UEFI, ACPI_SIG_WAET, ACPI_SIG_WDAT, ACPI_SIG_WDDT,
+		ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT,
+		ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, NULL };
 
 	if (data == NULL || size == 0)
 		return;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH 4/8] x86, acpi, brk: Extend BRK 256KB to store acpi override tables.
  2013-08-21 10:15 ` Tang Chen
@ 2013-08-21 10:15   ` Tang Chen
  -1 siblings, 0 replies; 116+ messages in thread
From: Tang Chen @ 2013-08-21 10:15 UTC (permalink / raw)
  To: konrad.wilk, robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa,
	akpm, tj, trenn, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

When finding acpi override tables in initrd, we need to allocate memory to
store these tables. But at such an early time, we don't have any memory
allocator. The basic idea is to use BRK.

This patch reserves 256KB in BRK, and allocate it to store override tables,
instead of memblock.

This idea is from Yinghai Lu <yinghai@kernel.org>.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/kernel/setup.c |    5 +++++
 drivers/acpi/osl.c      |   44 ++++++++++++++++++++++----------------------
 include/linux/acpi.h    |    1 +
 3 files changed, 28 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 5bfd4c8..51fcd5d 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1061,6 +1061,11 @@ void __init setup_arch(char **cmdline_p)
 
 	early_alloc_pgt_buf();
 
+#if defined(CONFIG_ACPI) && defined(CONFIG_BLK_DEV_INITRD)
+	/* Allocate buffer to store acpi override tables in brk. */
+	early_alloc_acpi_override_tables_buf();
+#endif
+
 	/*
 	 * Need to conclude brk, before memblock_x86_fill()
 	 *  it could use memblock_find_in_range, could overlap with
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 06996d8..4c1baa7 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -48,6 +48,7 @@
 
 #include <asm/io.h>
 #include <asm/uaccess.h>
+#include <asm/setup.h>
 
 #include <acpi/acpi.h>
 #include <acpi/acpi_bus.h>
@@ -556,6 +557,15 @@ u8 __init acpi_table_checksum(u8 *buffer, u32 length)
 /* Must not increase 10 or needs code modification below */
 #define ACPI_OVERRIDE_TABLES 10
 
+/* Reserve 256KB in BRK to store acpi override tables */
+#define ACPI_OVERRIDE_TABLES_SIZE (256 * 1024)
+RESERVE_BRK(acpi_override_tables_alloc, ACPI_OVERRIDE_TABLES_SIZE);
+void __init early_alloc_acpi_override_tables_buf(void)
+{
+	acpi_tables_addr = __pa(extend_brk(ACPI_OVERRIDE_TABLES_SIZE,
+					   PAGE_SIZE));
+}
+
 void __init acpi_initrd_override(void *data, size_t size)
 {
 	int sig, no, table_nr = 0, total_offset = 0;
@@ -619,7 +629,18 @@ void __init acpi_initrd_override(void *data, size_t size)
 		pr_info("%4.4s ACPI table found in initrd [%s%s][0x%x]\n",
 			table->signature, cpio_path, file.name, table->length);
 
+		/*
+		 * If the override tables in cpio file exceeds the BRK buffer,
+		 * ignore the current table and go for the next one.
+		 */
 		all_tables_size += table->length;
+		if (all_tables_size > ACPI_OVERRIDE_TABLES_SIZE) {
+			pr_warning("ACPI OVERRIDE: ACPI override tables exceeds buffer size."
+				   " Ignoring table %4.4s\n", table->signature);
+			all_tables_size -= table->length;
+			continue;
+		}
+
 		early_initrd_files[table_nr].data = file.data;
 		early_initrd_files[table_nr].size = file.size;
 		table_nr++;
@@ -627,34 +648,13 @@ void __init acpi_initrd_override(void *data, size_t size)
 	if (table_nr == 0)
 		return;
 
-	acpi_tables_addr =
-		memblock_find_in_range(0, max_low_pfn_mapped << PAGE_SHIFT,
-				       all_tables_size, PAGE_SIZE);
-	if (!acpi_tables_addr) {
-		WARN_ON(1);
-		return;
-	}
-	/*
-	 * Only calling e820_add_reserve does not work and the
-	 * tables are invalid (memory got used) later.
-	 * memblock_reserve works as expected and the tables won't get modified.
-	 * But it's not enough on X86 because ioremap will
-	 * complain later (used by acpi_os_map_memory) that the pages
-	 * that should get mapped are not marked "reserved".
-	 * Both memblock_reserve and e820_add_region (via arch_reserve_mem_area)
-	 * works fine.
-	 */
-	memblock_reserve(acpi_tables_addr, all_tables_size);
-	arch_reserve_mem_area(acpi_tables_addr, all_tables_size);
-
-	p = early_ioremap(acpi_tables_addr, all_tables_size);
+	p = __va(acpi_tables_addr);
 
 	for (no = 0; no < table_nr; no++) {
 		memcpy(p + total_offset, early_initrd_files[no].data,
 		       early_initrd_files[no].size);
 		total_offset += early_initrd_files[no].size;
 	}
-	early_iounmap(p, all_tables_size);
 }
 #endif /* CONFIG_ACPI_INITRD_TABLE_OVERRIDE */
 
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 353ba25..381579e 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -81,6 +81,7 @@ typedef int (*acpi_tbl_entry_handler)(struct acpi_subtable_header *header,
 
 #ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
 void acpi_initrd_override(void *data, size_t size);
+void early_alloc_acpi_override_tables_buf(void);
 #else
 static inline void acpi_initrd_override(void *data, size_t size)
 {
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH 4/8] x86, acpi, brk: Extend BRK 256KB to store acpi override tables.
@ 2013-08-21 10:15   ` Tang Chen
  0 siblings, 0 replies; 116+ messages in thread
From: Tang Chen @ 2013-08-21 10:15 UTC (permalink / raw)
  To: konrad.wilk, robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa,
	akpm, tj, trenn, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

When finding acpi override tables in initrd, we need to allocate memory to
store these tables. But at such an early time, we don't have any memory
allocator. The basic idea is to use BRK.

This patch reserves 256KB in BRK, and allocate it to store override tables,
instead of memblock.

This idea is from Yinghai Lu <yinghai@kernel.org>.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/kernel/setup.c |    5 +++++
 drivers/acpi/osl.c      |   44 ++++++++++++++++++++++----------------------
 include/linux/acpi.h    |    1 +
 3 files changed, 28 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 5bfd4c8..51fcd5d 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1061,6 +1061,11 @@ void __init setup_arch(char **cmdline_p)
 
 	early_alloc_pgt_buf();
 
+#if defined(CONFIG_ACPI) && defined(CONFIG_BLK_DEV_INITRD)
+	/* Allocate buffer to store acpi override tables in brk. */
+	early_alloc_acpi_override_tables_buf();
+#endif
+
 	/*
 	 * Need to conclude brk, before memblock_x86_fill()
 	 *  it could use memblock_find_in_range, could overlap with
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 06996d8..4c1baa7 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -48,6 +48,7 @@
 
 #include <asm/io.h>
 #include <asm/uaccess.h>
+#include <asm/setup.h>
 
 #include <acpi/acpi.h>
 #include <acpi/acpi_bus.h>
@@ -556,6 +557,15 @@ u8 __init acpi_table_checksum(u8 *buffer, u32 length)
 /* Must not increase 10 or needs code modification below */
 #define ACPI_OVERRIDE_TABLES 10
 
+/* Reserve 256KB in BRK to store acpi override tables */
+#define ACPI_OVERRIDE_TABLES_SIZE (256 * 1024)
+RESERVE_BRK(acpi_override_tables_alloc, ACPI_OVERRIDE_TABLES_SIZE);
+void __init early_alloc_acpi_override_tables_buf(void)
+{
+	acpi_tables_addr = __pa(extend_brk(ACPI_OVERRIDE_TABLES_SIZE,
+					   PAGE_SIZE));
+}
+
 void __init acpi_initrd_override(void *data, size_t size)
 {
 	int sig, no, table_nr = 0, total_offset = 0;
@@ -619,7 +629,18 @@ void __init acpi_initrd_override(void *data, size_t size)
 		pr_info("%4.4s ACPI table found in initrd [%s%s][0x%x]\n",
 			table->signature, cpio_path, file.name, table->length);
 
+		/*
+		 * If the override tables in cpio file exceeds the BRK buffer,
+		 * ignore the current table and go for the next one.
+		 */
 		all_tables_size += table->length;
+		if (all_tables_size > ACPI_OVERRIDE_TABLES_SIZE) {
+			pr_warning("ACPI OVERRIDE: ACPI override tables exceeds buffer size."
+				   " Ignoring table %4.4s\n", table->signature);
+			all_tables_size -= table->length;
+			continue;
+		}
+
 		early_initrd_files[table_nr].data = file.data;
 		early_initrd_files[table_nr].size = file.size;
 		table_nr++;
@@ -627,34 +648,13 @@ void __init acpi_initrd_override(void *data, size_t size)
 	if (table_nr == 0)
 		return;
 
-	acpi_tables_addr =
-		memblock_find_in_range(0, max_low_pfn_mapped << PAGE_SHIFT,
-				       all_tables_size, PAGE_SIZE);
-	if (!acpi_tables_addr) {
-		WARN_ON(1);
-		return;
-	}
-	/*
-	 * Only calling e820_add_reserve does not work and the
-	 * tables are invalid (memory got used) later.
-	 * memblock_reserve works as expected and the tables won't get modified.
-	 * But it's not enough on X86 because ioremap will
-	 * complain later (used by acpi_os_map_memory) that the pages
-	 * that should get mapped are not marked "reserved".
-	 * Both memblock_reserve and e820_add_region (via arch_reserve_mem_area)
-	 * works fine.
-	 */
-	memblock_reserve(acpi_tables_addr, all_tables_size);
-	arch_reserve_mem_area(acpi_tables_addr, all_tables_size);
-
-	p = early_ioremap(acpi_tables_addr, all_tables_size);
+	p = __va(acpi_tables_addr);
 
 	for (no = 0; no < table_nr; no++) {
 		memcpy(p + total_offset, early_initrd_files[no].data,
 		       early_initrd_files[no].size);
 		total_offset += early_initrd_files[no].size;
 	}
-	early_iounmap(p, all_tables_size);
 }
 #endif /* CONFIG_ACPI_INITRD_TABLE_OVERRIDE */
 
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 353ba25..381579e 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -81,6 +81,7 @@ typedef int (*acpi_tbl_entry_handler)(struct acpi_subtable_header *header,
 
 #ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
 void acpi_initrd_override(void *data, size_t size);
+void early_alloc_acpi_override_tables_buf(void);
 #else
 static inline void acpi_initrd_override(void *data, size_t size)
 {
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH 5/8] x86, brk: Make extend_brk() available with va/pa.
  2013-08-21 10:15 ` Tang Chen
@ 2013-08-21 10:15   ` Tang Chen
  -1 siblings, 0 replies; 116+ messages in thread
From: Tang Chen @ 2013-08-21 10:15 UTC (permalink / raw)
  To: konrad.wilk, robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa,
	akpm, tj, trenn, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

We are going to do acpi_initrd_override() at very early time:

On 32bit: do it in head_32.S, before paging is enabled. In this case, we can
          access initrd with physical address without page tables.

On 64bit: do it in head_64.c, after paging is enabled but before direct mapping
          is setup.

          On 64bit, we have an early page fault handler to help to access data
          with direct mapping page tables. So it is easy to do in head_64.c.

And we need to allocate memory to store override tables. At such an early time,
no memory allocator works. So we can only use BRK.

As mentioned above, on 32bit before paging is enabled, we have to access variables
with pa. So introduce a "bool is_phys" parameter to extend_brk(), and convert va
to pa is it is true.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/include/asm/dmi.h   |    2 +-
 arch/x86/include/asm/setup.h |    2 +-
 arch/x86/kernel/setup.c      |   20 ++++++++++++++------
 arch/x86/mm/init.c           |    2 +-
 arch/x86/xen/enlighten.c     |    2 +-
 arch/x86/xen/mmu.c           |    6 +++---
 arch/x86/xen/p2m.c           |   27 ++++++++++++++-------------
 drivers/acpi/osl.c           |    2 +-
 8 files changed, 36 insertions(+), 27 deletions(-)

diff --git a/arch/x86/include/asm/dmi.h b/arch/x86/include/asm/dmi.h
index fd8f9e2..3b51d81 100644
--- a/arch/x86/include/asm/dmi.h
+++ b/arch/x86/include/asm/dmi.h
@@ -9,7 +9,7 @@
 
 static __always_inline __init void *dmi_alloc(unsigned len)
 {
-	return extend_brk(len, sizeof(int));
+	return extend_brk(len, sizeof(int), false);
 }
 
 /* Use early IO mappings for DMI because it's initialized early */
diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index 4f71d48..96d00da 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -75,7 +75,7 @@ extern struct boot_params boot_params;
 
 /* exceedingly early brk-like allocator */
 extern unsigned long _brk_end;
-void *extend_brk(size_t size, size_t align);
+void *extend_brk(size_t size, size_t align, bool is_phys);
 
 /*
  * Reserve space in the brk section.  The name must be unique within
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 51fcd5d..a189909 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -259,19 +259,27 @@ static inline void __init copy_edd(void)
 }
 #endif
 
-void * __init extend_brk(size_t size, size_t align)
+void * __init extend_brk(size_t size, size_t align, bool is_phys)
 {
 	size_t mask = align - 1;
 	void *ret;
+	unsigned long *brk_start, *brk_end, *brk_limit;
 
-	BUG_ON(_brk_start == 0);
+	brk_start = is_phys ? (unsigned long *)__pa_nodebug(&_brk_start) :
+			      (unsigned long *)&_brk_start;
+	brk_end = is_phys ? (unsigned long *)__pa_nodebug(&_brk_end) :
+			    (unsigned long *)&_brk_end;
+	brk_limit = is_phys ? (unsigned long *)__pa_nodebug(__brk_limit) :
+			      (unsigned long *)__brk_limit;
+
+	BUG_ON(*brk_start == 0);
 	BUG_ON(align & mask);
 
-	_brk_end = (_brk_end + mask) & ~mask;
-	BUG_ON((char *)(_brk_end + size) > __brk_limit);
+	*brk_end = (*brk_end + mask) & ~mask;
+	BUG_ON((char *)(*brk_end + size) > brk_limit);
 
-	ret = (void *)_brk_end;
-	_brk_end += size;
+	ret = (void *)(*brk_end);
+	*brk_end += size;
 
 	memset(ret, 0, size);
 
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 2ec29ac..189a9e2 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -86,7 +86,7 @@ void  __init early_alloc_pgt_buf(void)
 	unsigned long tables = INIT_PGT_BUF_SIZE;
 	phys_addr_t base;
 
-	base = __pa(extend_brk(tables, PAGE_SIZE));
+	base = __pa(extend_brk(tables, PAGE_SIZE, false));
 
 	pgt_buf_start = base >> PAGE_SHIFT;
 	pgt_buf_end = pgt_buf_start;
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 193097e..2d5a34f 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1629,7 +1629,7 @@ void __ref xen_hvm_init_shared_info(void)
 
 	if (!shared_info_page)
 		shared_info_page = (struct shared_info *)
-			extend_brk(PAGE_SIZE, PAGE_SIZE);
+			extend_brk(PAGE_SIZE, PAGE_SIZE, false);
 	xatp.domid = DOMID_SELF;
 	xatp.idx = 0;
 	xatp.space = XENMAPSPACE_shared_info;
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index fdc3ba2..573bc50 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1768,7 +1768,7 @@ static void __init xen_map_identity_early(pmd_t *pmd, unsigned long max_pfn)
 	unsigned long pfn;
 
 	level1_ident_pgt = extend_brk(sizeof(pte_t) * LEVEL1_IDENT_ENTRIES,
-				      PAGE_SIZE);
+				      PAGE_SIZE, false);
 
 	ident_pte = 0;
 	pfn = 0;
@@ -1980,7 +1980,7 @@ static void __init xen_write_cr3_init(unsigned long cr3)
 	 * swapper_pg_dir.
 	 */
 	swapper_kernel_pmd =
-		extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE);
+		extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE, false);
 	copy_page(swapper_kernel_pmd, initial_kernel_pmd);
 	swapper_pg_dir[KERNEL_PGD_BOUNDARY] =
 		__pgd(__pa(swapper_kernel_pmd) | _PAGE_PRESENT);
@@ -2003,7 +2003,7 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
 	pmd_t *kernel_pmd;
 
 	initial_kernel_pmd =
-		extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE);
+		extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE, false);
 
 	max_pfn_mapped = PFN_DOWN(__pa(xen_start_info->pt_base) +
 				  xen_start_info->nr_pt_frames * PAGE_SIZE +
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 95fb2aa..bbdcf20 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -281,13 +281,13 @@ void __ref xen_build_mfn_list_list(void)
 
 	/* Pre-initialize p2m_top_mfn to be completely missing */
 	if (p2m_top_mfn == NULL) {
-		p2m_mid_missing_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE);
+		p2m_mid_missing_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
 		p2m_mid_mfn_init(p2m_mid_missing_mfn);
 
-		p2m_top_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE);
+		p2m_top_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
 		p2m_top_mfn_p_init(p2m_top_mfn_p);
 
-		p2m_top_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE);
+		p2m_top_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
 		p2m_top_mfn_init(p2m_top_mfn);
 	} else {
 		/* Reinitialise, mfn's all change after migration */
@@ -322,7 +322,7 @@ void __ref xen_build_mfn_list_list(void)
 			 * runtime.  extend_brk() will BUG if we call
 			 * it too late.
 			 */
-			mid_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE);
+			mid_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
 			p2m_mid_mfn_init(mid_mfn_p);
 
 			p2m_top_mfn_p[topidx] = mid_mfn_p;
@@ -351,16 +351,16 @@ void __init xen_build_dynamic_phys_to_machine(void)
 
 	xen_max_p2m_pfn = max_pfn;
 
-	p2m_missing = extend_brk(PAGE_SIZE, PAGE_SIZE);
+	p2m_missing = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
 	p2m_init(p2m_missing);
 
-	p2m_mid_missing = extend_brk(PAGE_SIZE, PAGE_SIZE);
+	p2m_mid_missing = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
 	p2m_mid_init(p2m_mid_missing);
 
-	p2m_top = extend_brk(PAGE_SIZE, PAGE_SIZE);
+	p2m_top = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
 	p2m_top_init(p2m_top);
 
-	p2m_identity = extend_brk(PAGE_SIZE, PAGE_SIZE);
+	p2m_identity = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
 	p2m_init(p2m_identity);
 
 	/*
@@ -373,7 +373,8 @@ void __init xen_build_dynamic_phys_to_machine(void)
 		unsigned mididx = p2m_mid_index(pfn);
 
 		if (p2m_top[topidx] == p2m_mid_missing) {
-			unsigned long **mid = extend_brk(PAGE_SIZE, PAGE_SIZE);
+			unsigned long **mid = extend_brk(PAGE_SIZE, PAGE_SIZE,
+							 false);
 			p2m_mid_init(mid);
 
 			p2m_top[topidx] = mid;
@@ -609,7 +610,7 @@ static bool __init early_alloc_p2m_middle(unsigned long pfn, bool check_boundary
 		return false;
 
 	/* Boundary cross-over for the edges: */
-	p2m = extend_brk(PAGE_SIZE, PAGE_SIZE);
+	p2m = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
 
 	p2m_init(p2m);
 
@@ -635,7 +636,7 @@ static bool __init early_alloc_p2m(unsigned long pfn)
 	mid = p2m_top[topidx];
 	mid_mfn_p = p2m_top_mfn_p[topidx];
 	if (mid == p2m_mid_missing) {
-		mid = extend_brk(PAGE_SIZE, PAGE_SIZE);
+		mid = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
 
 		p2m_mid_init(mid);
 
@@ -645,7 +646,7 @@ static bool __init early_alloc_p2m(unsigned long pfn)
 	}
 	/* And the save/restore P2M tables.. */
 	if (mid_mfn_p == p2m_mid_missing_mfn) {
-		mid_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE);
+		mid_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
 		p2m_mid_mfn_init(mid_mfn_p);
 
 		p2m_top_mfn_p[topidx] = mid_mfn_p;
@@ -858,7 +859,7 @@ static void __init m2p_override_init(void)
 	unsigned i;
 
 	m2p_overrides = extend_brk(sizeof(*m2p_overrides) * M2P_OVERRIDE_HASH,
-				   sizeof(unsigned long));
+				   sizeof(unsigned long), false);
 
 	for (i = 0; i < M2P_OVERRIDE_HASH; i++)
 		INIT_LIST_HEAD(&m2p_overrides[i]);
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 4c1baa7..dff7fcc 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -563,7 +563,7 @@ RESERVE_BRK(acpi_override_tables_alloc, ACPI_OVERRIDE_TABLES_SIZE);
 void __init early_alloc_acpi_override_tables_buf(void)
 {
 	acpi_tables_addr = __pa(extend_brk(ACPI_OVERRIDE_TABLES_SIZE,
-					   PAGE_SIZE));
+					   PAGE_SIZE, false));
 }
 
 void __init acpi_initrd_override(void *data, size_t size)
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH 5/8] x86, brk: Make extend_brk() available with va/pa.
@ 2013-08-21 10:15   ` Tang Chen
  0 siblings, 0 replies; 116+ messages in thread
From: Tang Chen @ 2013-08-21 10:15 UTC (permalink / raw)
  To: konrad.wilk, robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa,
	akpm, tj, trenn, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

We are going to do acpi_initrd_override() at very early time:

On 32bit: do it in head_32.S, before paging is enabled. In this case, we can
          access initrd with physical address without page tables.

On 64bit: do it in head_64.c, after paging is enabled but before direct mapping
          is setup.

          On 64bit, we have an early page fault handler to help to access data
          with direct mapping page tables. So it is easy to do in head_64.c.

And we need to allocate memory to store override tables. At such an early time,
no memory allocator works. So we can only use BRK.

As mentioned above, on 32bit before paging is enabled, we have to access variables
with pa. So introduce a "bool is_phys" parameter to extend_brk(), and convert va
to pa is it is true.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/include/asm/dmi.h   |    2 +-
 arch/x86/include/asm/setup.h |    2 +-
 arch/x86/kernel/setup.c      |   20 ++++++++++++++------
 arch/x86/mm/init.c           |    2 +-
 arch/x86/xen/enlighten.c     |    2 +-
 arch/x86/xen/mmu.c           |    6 +++---
 arch/x86/xen/p2m.c           |   27 ++++++++++++++-------------
 drivers/acpi/osl.c           |    2 +-
 8 files changed, 36 insertions(+), 27 deletions(-)

diff --git a/arch/x86/include/asm/dmi.h b/arch/x86/include/asm/dmi.h
index fd8f9e2..3b51d81 100644
--- a/arch/x86/include/asm/dmi.h
+++ b/arch/x86/include/asm/dmi.h
@@ -9,7 +9,7 @@
 
 static __always_inline __init void *dmi_alloc(unsigned len)
 {
-	return extend_brk(len, sizeof(int));
+	return extend_brk(len, sizeof(int), false);
 }
 
 /* Use early IO mappings for DMI because it's initialized early */
diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index 4f71d48..96d00da 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -75,7 +75,7 @@ extern struct boot_params boot_params;
 
 /* exceedingly early brk-like allocator */
 extern unsigned long _brk_end;
-void *extend_brk(size_t size, size_t align);
+void *extend_brk(size_t size, size_t align, bool is_phys);
 
 /*
  * Reserve space in the brk section.  The name must be unique within
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 51fcd5d..a189909 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -259,19 +259,27 @@ static inline void __init copy_edd(void)
 }
 #endif
 
-void * __init extend_brk(size_t size, size_t align)
+void * __init extend_brk(size_t size, size_t align, bool is_phys)
 {
 	size_t mask = align - 1;
 	void *ret;
+	unsigned long *brk_start, *brk_end, *brk_limit;
 
-	BUG_ON(_brk_start == 0);
+	brk_start = is_phys ? (unsigned long *)__pa_nodebug(&_brk_start) :
+			      (unsigned long *)&_brk_start;
+	brk_end = is_phys ? (unsigned long *)__pa_nodebug(&_brk_end) :
+			    (unsigned long *)&_brk_end;
+	brk_limit = is_phys ? (unsigned long *)__pa_nodebug(__brk_limit) :
+			      (unsigned long *)__brk_limit;
+
+	BUG_ON(*brk_start == 0);
 	BUG_ON(align & mask);
 
-	_brk_end = (_brk_end + mask) & ~mask;
-	BUG_ON((char *)(_brk_end + size) > __brk_limit);
+	*brk_end = (*brk_end + mask) & ~mask;
+	BUG_ON((char *)(*brk_end + size) > brk_limit);
 
-	ret = (void *)_brk_end;
-	_brk_end += size;
+	ret = (void *)(*brk_end);
+	*brk_end += size;
 
 	memset(ret, 0, size);
 
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 2ec29ac..189a9e2 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -86,7 +86,7 @@ void  __init early_alloc_pgt_buf(void)
 	unsigned long tables = INIT_PGT_BUF_SIZE;
 	phys_addr_t base;
 
-	base = __pa(extend_brk(tables, PAGE_SIZE));
+	base = __pa(extend_brk(tables, PAGE_SIZE, false));
 
 	pgt_buf_start = base >> PAGE_SHIFT;
 	pgt_buf_end = pgt_buf_start;
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 193097e..2d5a34f 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1629,7 +1629,7 @@ void __ref xen_hvm_init_shared_info(void)
 
 	if (!shared_info_page)
 		shared_info_page = (struct shared_info *)
-			extend_brk(PAGE_SIZE, PAGE_SIZE);
+			extend_brk(PAGE_SIZE, PAGE_SIZE, false);
 	xatp.domid = DOMID_SELF;
 	xatp.idx = 0;
 	xatp.space = XENMAPSPACE_shared_info;
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index fdc3ba2..573bc50 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1768,7 +1768,7 @@ static void __init xen_map_identity_early(pmd_t *pmd, unsigned long max_pfn)
 	unsigned long pfn;
 
 	level1_ident_pgt = extend_brk(sizeof(pte_t) * LEVEL1_IDENT_ENTRIES,
-				      PAGE_SIZE);
+				      PAGE_SIZE, false);
 
 	ident_pte = 0;
 	pfn = 0;
@@ -1980,7 +1980,7 @@ static void __init xen_write_cr3_init(unsigned long cr3)
 	 * swapper_pg_dir.
 	 */
 	swapper_kernel_pmd =
-		extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE);
+		extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE, false);
 	copy_page(swapper_kernel_pmd, initial_kernel_pmd);
 	swapper_pg_dir[KERNEL_PGD_BOUNDARY] =
 		__pgd(__pa(swapper_kernel_pmd) | _PAGE_PRESENT);
@@ -2003,7 +2003,7 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
 	pmd_t *kernel_pmd;
 
 	initial_kernel_pmd =
-		extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE);
+		extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE, false);
 
 	max_pfn_mapped = PFN_DOWN(__pa(xen_start_info->pt_base) +
 				  xen_start_info->nr_pt_frames * PAGE_SIZE +
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 95fb2aa..bbdcf20 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -281,13 +281,13 @@ void __ref xen_build_mfn_list_list(void)
 
 	/* Pre-initialize p2m_top_mfn to be completely missing */
 	if (p2m_top_mfn == NULL) {
-		p2m_mid_missing_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE);
+		p2m_mid_missing_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
 		p2m_mid_mfn_init(p2m_mid_missing_mfn);
 
-		p2m_top_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE);
+		p2m_top_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
 		p2m_top_mfn_p_init(p2m_top_mfn_p);
 
-		p2m_top_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE);
+		p2m_top_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
 		p2m_top_mfn_init(p2m_top_mfn);
 	} else {
 		/* Reinitialise, mfn's all change after migration */
@@ -322,7 +322,7 @@ void __ref xen_build_mfn_list_list(void)
 			 * runtime.  extend_brk() will BUG if we call
 			 * it too late.
 			 */
-			mid_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE);
+			mid_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
 			p2m_mid_mfn_init(mid_mfn_p);
 
 			p2m_top_mfn_p[topidx] = mid_mfn_p;
@@ -351,16 +351,16 @@ void __init xen_build_dynamic_phys_to_machine(void)
 
 	xen_max_p2m_pfn = max_pfn;
 
-	p2m_missing = extend_brk(PAGE_SIZE, PAGE_SIZE);
+	p2m_missing = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
 	p2m_init(p2m_missing);
 
-	p2m_mid_missing = extend_brk(PAGE_SIZE, PAGE_SIZE);
+	p2m_mid_missing = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
 	p2m_mid_init(p2m_mid_missing);
 
-	p2m_top = extend_brk(PAGE_SIZE, PAGE_SIZE);
+	p2m_top = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
 	p2m_top_init(p2m_top);
 
-	p2m_identity = extend_brk(PAGE_SIZE, PAGE_SIZE);
+	p2m_identity = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
 	p2m_init(p2m_identity);
 
 	/*
@@ -373,7 +373,8 @@ void __init xen_build_dynamic_phys_to_machine(void)
 		unsigned mididx = p2m_mid_index(pfn);
 
 		if (p2m_top[topidx] == p2m_mid_missing) {
-			unsigned long **mid = extend_brk(PAGE_SIZE, PAGE_SIZE);
+			unsigned long **mid = extend_brk(PAGE_SIZE, PAGE_SIZE,
+							 false);
 			p2m_mid_init(mid);
 
 			p2m_top[topidx] = mid;
@@ -609,7 +610,7 @@ static bool __init early_alloc_p2m_middle(unsigned long pfn, bool check_boundary
 		return false;
 
 	/* Boundary cross-over for the edges: */
-	p2m = extend_brk(PAGE_SIZE, PAGE_SIZE);
+	p2m = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
 
 	p2m_init(p2m);
 
@@ -635,7 +636,7 @@ static bool __init early_alloc_p2m(unsigned long pfn)
 	mid = p2m_top[topidx];
 	mid_mfn_p = p2m_top_mfn_p[topidx];
 	if (mid == p2m_mid_missing) {
-		mid = extend_brk(PAGE_SIZE, PAGE_SIZE);
+		mid = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
 
 		p2m_mid_init(mid);
 
@@ -645,7 +646,7 @@ static bool __init early_alloc_p2m(unsigned long pfn)
 	}
 	/* And the save/restore P2M tables.. */
 	if (mid_mfn_p == p2m_mid_missing_mfn) {
-		mid_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE);
+		mid_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
 		p2m_mid_mfn_init(mid_mfn_p);
 
 		p2m_top_mfn_p[topidx] = mid_mfn_p;
@@ -858,7 +859,7 @@ static void __init m2p_override_init(void)
 	unsigned i;
 
 	m2p_overrides = extend_brk(sizeof(*m2p_overrides) * M2P_OVERRIDE_HASH,
-				   sizeof(unsigned long));
+				   sizeof(unsigned long), false);
 
 	for (i = 0; i < M2P_OVERRIDE_HASH; i++)
 		INIT_LIST_HEAD(&m2p_overrides[i]);
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 4c1baa7..dff7fcc 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -563,7 +563,7 @@ RESERVE_BRK(acpi_override_tables_alloc, ACPI_OVERRIDE_TABLES_SIZE);
 void __init early_alloc_acpi_override_tables_buf(void)
 {
 	acpi_tables_addr = __pa(extend_brk(ACPI_OVERRIDE_TABLES_SIZE,
-					   PAGE_SIZE));
+					   PAGE_SIZE, false));
 }
 
 void __init acpi_initrd_override(void *data, size_t size)
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH 6/8] x86, acpi: Make acpi_initrd_override() available with va or pa.
  2013-08-21 10:15 ` Tang Chen
@ 2013-08-21 10:15   ` Tang Chen
  -1 siblings, 0 replies; 116+ messages in thread
From: Tang Chen @ 2013-08-21 10:15 UTC (permalink / raw)
  To: konrad.wilk, robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa,
	akpm, tj, trenn, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

We are using the same trick in previous patch.

Introduce a "bool is_phys" to acpi_initrd_override(). When it
is true, convert all golbal variables va to pa, so that we can
access them on 32bit before paging is enabled.

NOTE: Do not call printk() on 32bit before paging is enabled
      because it will use global variables.

Originally-From: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/kernel/setup.c |    3 +-
 drivers/acpi/osl.c      |   68 ++++++++++++++++++++++++++++++++++------------
 include/linux/acpi.h    |    4 +-
 3 files changed, 54 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index a189909..1290ea7 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1133,7 +1133,8 @@ void __init setup_arch(char **cmdline_p)
 	reserve_initrd();
 
 #if defined(CONFIG_ACPI) && defined(CONFIG_BLK_DEV_INITRD)
-	acpi_initrd_override((void *)initrd_start, initrd_end - initrd_start);
+	acpi_initrd_override((void *)initrd_start, initrd_end - initrd_start,
+			     false);
 #endif
 
 	reserve_crashkernel();
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index dff7fcc..ccdb5a6 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -566,7 +566,19 @@ void __init early_alloc_acpi_override_tables_buf(void)
 					   PAGE_SIZE, false));
 }
 
-void __init acpi_initrd_override(void *data, size_t size)
+/**
+ * acpi_initrd_override - Initialize acpi_tables_addr with acpi override tables
+ * @data: cpio file address (va or pa)
+ * @size: size of cpio file
+ * @is_phys: true if @data is pa, false otherwise
+ *
+ * This function will find all acpi override tables provided by initrd, and
+ * store the addresses in acpi_tables_addr.
+ *
+ * This function could be called before paging is enabled. Before paging is
+ * enabled, caller should use physical address, and set @is_phys as true.
+ */
+void __init acpi_initrd_override(void *data, size_t size, bool is_phys)
 {
 	int sig, no, table_nr = 0, total_offset = 0;
 	long offset = 0;
@@ -586,10 +598,17 @@ void __init acpi_initrd_override(void *data, size_t size)
 		ACPI_SIG_UEFI, ACPI_SIG_WAET, ACPI_SIG_WDAT, ACPI_SIG_WDDT,
 		ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT,
 		ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, NULL };
+	u64 *acpi_tables_addr_p = &acpi_tables_addr;
+	int *all_tables_size_p = &all_tables_size;
 
 	if (data == NULL || size == 0)
 		return;
 
+	if (is_phys) {
+		acpi_tables_addr_p = (u64 *)__pa_nodebug(&acpi_tables_addr);
+		all_tables_size_p = (int *)__pa_nodebug(&all_tables_size);
+	}
+
 	for (no = 0; no < ACPI_OVERRIDE_TABLES; no++) {
 		file = find_cpio_data(cpio_path, data, size, &offset);
 		if (!file.data)
@@ -599,8 +618,9 @@ void __init acpi_initrd_override(void *data, size_t size)
 		size -= offset;
 
 		if (file.size < sizeof(struct acpi_table_header)) {
-			pr_err("ACPI OVERRIDE: Table smaller than ACPI header [%s%s]\n",
-				cpio_path, file.name);
+			if (!is_phys)
+				pr_err("ACPI OVERRIDE: Table smaller than ACPI header [%s%s]\n",
+					cpio_path, file.name);
 			continue;
 		}
 
@@ -611,36 +631,48 @@ void __init acpi_initrd_override(void *data, size_t size)
 				break;
 
 		if (!table_sigs[sig]) {
-			pr_err("ACPI OVERRIDE: Unknown signature [%s%s]\n",
-				cpio_path, file.name);
+			if (!is_phys)
+				pr_err("ACPI OVERRIDE: Unknown signature [%s%s]\n",
+					cpio_path, file.name);
 			continue;
 		}
 		if (file.size != table->length) {
-			pr_err("ACPI OVERRIDE: File length does not match table length [%s%s]\n",
-				cpio_path, file.name);
+			if (!is_phys)
+				pr_err("ACPI OVERRIDE: File length does not match table length [%s%s]\n",
+					cpio_path, file.name);
 			continue;
 		}
 		if (acpi_table_checksum(file.data, table->length)) {
-			pr_err("ACPI OVERRIDE: Bad table checksum [%s%s]\n",
-				cpio_path, file.name);
+			if (!is_phys)
+				pr_err("ACPI OVERRIDE: Bad table checksum [%s%s]\n",
+					cpio_path, file.name);
 			continue;
 		}
 
-		pr_info("%4.4s ACPI table found in initrd [%s%s][0x%x]\n",
-			table->signature, cpio_path, file.name, table->length);
+		if (!is_phys)
+			pr_info("%4.4s ACPI table found in initrd [%s%s][0x%x]\n",
+				table->signature, cpio_path,
+				file.name, table->length);
 
 		/*
 		 * If the override tables in cpio file exceeds the BRK buffer,
 		 * ignore the current table and go for the next one.
 		 */
-		all_tables_size += table->length;
-		if (all_tables_size > ACPI_OVERRIDE_TABLES_SIZE) {
-			pr_warning("ACPI OVERRIDE: ACPI override tables exceeds buffer size."
-				   " Ignoring table %4.4s\n", table->signature);
-			all_tables_size -= table->length;
+		*all_tables_size_p += table->length;
+		if (*all_tables_size_p > ACPI_OVERRIDE_TABLES_SIZE) {
+			if (!is_phys)
+				pr_warning("ACPI OVERRIDE: ACPI override tables exceeds buffer size."
+					   " Ignoring table %4.4s\n",
+					   table->signature);
+			*all_tables_size_p -= table->length;
 			continue;
 		}
 
+		/*
+		 * file.data is the offset of the table in initrd. If @data is
+		 * pa, then we find pa. If @data is va, then we find va. No need
+		 * to convert.
+		 */
 		early_initrd_files[table_nr].data = file.data;
 		early_initrd_files[table_nr].size = file.size;
 		table_nr++;
@@ -648,8 +680,8 @@ void __init acpi_initrd_override(void *data, size_t size)
 	if (table_nr == 0)
 		return;
 
-	p = __va(acpi_tables_addr);
-
+	p = is_phys ? (char *)(*acpi_tables_addr_p) :
+		      (char *)__va(*acpi_tables_addr_p);
 	for (no = 0; no < table_nr; no++) {
 		memcpy(p + total_offset, early_initrd_files[no].data,
 		       early_initrd_files[no].size);
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 381579e..af4da51 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -80,10 +80,10 @@ typedef int (*acpi_tbl_entry_handler)(struct acpi_subtable_header *header,
 				      const unsigned long end);
 
 #ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
-void acpi_initrd_override(void *data, size_t size);
+void acpi_initrd_override(void *data, size_t size, bool is_phys);
 void early_alloc_acpi_override_tables_buf(void);
 #else
-static inline void acpi_initrd_override(void *data, size_t size)
+static inline void acpi_initrd_override(void *data, size_t size, bool is_phys)
 {
 }
 #endif
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH 6/8] x86, acpi: Make acpi_initrd_override() available with va or pa.
@ 2013-08-21 10:15   ` Tang Chen
  0 siblings, 0 replies; 116+ messages in thread
From: Tang Chen @ 2013-08-21 10:15 UTC (permalink / raw)
  To: konrad.wilk, robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa,
	akpm, tj, trenn, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

We are using the same trick in previous patch.

Introduce a "bool is_phys" to acpi_initrd_override(). When it
is true, convert all golbal variables va to pa, so that we can
access them on 32bit before paging is enabled.

NOTE: Do not call printk() on 32bit before paging is enabled
      because it will use global variables.

Originally-From: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/kernel/setup.c |    3 +-
 drivers/acpi/osl.c      |   68 ++++++++++++++++++++++++++++++++++------------
 include/linux/acpi.h    |    4 +-
 3 files changed, 54 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index a189909..1290ea7 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1133,7 +1133,8 @@ void __init setup_arch(char **cmdline_p)
 	reserve_initrd();
 
 #if defined(CONFIG_ACPI) && defined(CONFIG_BLK_DEV_INITRD)
-	acpi_initrd_override((void *)initrd_start, initrd_end - initrd_start);
+	acpi_initrd_override((void *)initrd_start, initrd_end - initrd_start,
+			     false);
 #endif
 
 	reserve_crashkernel();
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index dff7fcc..ccdb5a6 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -566,7 +566,19 @@ void __init early_alloc_acpi_override_tables_buf(void)
 					   PAGE_SIZE, false));
 }
 
-void __init acpi_initrd_override(void *data, size_t size)
+/**
+ * acpi_initrd_override - Initialize acpi_tables_addr with acpi override tables
+ * @data: cpio file address (va or pa)
+ * @size: size of cpio file
+ * @is_phys: true if @data is pa, false otherwise
+ *
+ * This function will find all acpi override tables provided by initrd, and
+ * store the addresses in acpi_tables_addr.
+ *
+ * This function could be called before paging is enabled. Before paging is
+ * enabled, caller should use physical address, and set @is_phys as true.
+ */
+void __init acpi_initrd_override(void *data, size_t size, bool is_phys)
 {
 	int sig, no, table_nr = 0, total_offset = 0;
 	long offset = 0;
@@ -586,10 +598,17 @@ void __init acpi_initrd_override(void *data, size_t size)
 		ACPI_SIG_UEFI, ACPI_SIG_WAET, ACPI_SIG_WDAT, ACPI_SIG_WDDT,
 		ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT,
 		ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, NULL };
+	u64 *acpi_tables_addr_p = &acpi_tables_addr;
+	int *all_tables_size_p = &all_tables_size;
 
 	if (data == NULL || size == 0)
 		return;
 
+	if (is_phys) {
+		acpi_tables_addr_p = (u64 *)__pa_nodebug(&acpi_tables_addr);
+		all_tables_size_p = (int *)__pa_nodebug(&all_tables_size);
+	}
+
 	for (no = 0; no < ACPI_OVERRIDE_TABLES; no++) {
 		file = find_cpio_data(cpio_path, data, size, &offset);
 		if (!file.data)
@@ -599,8 +618,9 @@ void __init acpi_initrd_override(void *data, size_t size)
 		size -= offset;
 
 		if (file.size < sizeof(struct acpi_table_header)) {
-			pr_err("ACPI OVERRIDE: Table smaller than ACPI header [%s%s]\n",
-				cpio_path, file.name);
+			if (!is_phys)
+				pr_err("ACPI OVERRIDE: Table smaller than ACPI header [%s%s]\n",
+					cpio_path, file.name);
 			continue;
 		}
 
@@ -611,36 +631,48 @@ void __init acpi_initrd_override(void *data, size_t size)
 				break;
 
 		if (!table_sigs[sig]) {
-			pr_err("ACPI OVERRIDE: Unknown signature [%s%s]\n",
-				cpio_path, file.name);
+			if (!is_phys)
+				pr_err("ACPI OVERRIDE: Unknown signature [%s%s]\n",
+					cpio_path, file.name);
 			continue;
 		}
 		if (file.size != table->length) {
-			pr_err("ACPI OVERRIDE: File length does not match table length [%s%s]\n",
-				cpio_path, file.name);
+			if (!is_phys)
+				pr_err("ACPI OVERRIDE: File length does not match table length [%s%s]\n",
+					cpio_path, file.name);
 			continue;
 		}
 		if (acpi_table_checksum(file.data, table->length)) {
-			pr_err("ACPI OVERRIDE: Bad table checksum [%s%s]\n",
-				cpio_path, file.name);
+			if (!is_phys)
+				pr_err("ACPI OVERRIDE: Bad table checksum [%s%s]\n",
+					cpio_path, file.name);
 			continue;
 		}
 
-		pr_info("%4.4s ACPI table found in initrd [%s%s][0x%x]\n",
-			table->signature, cpio_path, file.name, table->length);
+		if (!is_phys)
+			pr_info("%4.4s ACPI table found in initrd [%s%s][0x%x]\n",
+				table->signature, cpio_path,
+				file.name, table->length);
 
 		/*
 		 * If the override tables in cpio file exceeds the BRK buffer,
 		 * ignore the current table and go for the next one.
 		 */
-		all_tables_size += table->length;
-		if (all_tables_size > ACPI_OVERRIDE_TABLES_SIZE) {
-			pr_warning("ACPI OVERRIDE: ACPI override tables exceeds buffer size."
-				   " Ignoring table %4.4s\n", table->signature);
-			all_tables_size -= table->length;
+		*all_tables_size_p += table->length;
+		if (*all_tables_size_p > ACPI_OVERRIDE_TABLES_SIZE) {
+			if (!is_phys)
+				pr_warning("ACPI OVERRIDE: ACPI override tables exceeds buffer size."
+					   " Ignoring table %4.4s\n",
+					   table->signature);
+			*all_tables_size_p -= table->length;
 			continue;
 		}
 
+		/*
+		 * file.data is the offset of the table in initrd. If @data is
+		 * pa, then we find pa. If @data is va, then we find va. No need
+		 * to convert.
+		 */
 		early_initrd_files[table_nr].data = file.data;
 		early_initrd_files[table_nr].size = file.size;
 		table_nr++;
@@ -648,8 +680,8 @@ void __init acpi_initrd_override(void *data, size_t size)
 	if (table_nr == 0)
 		return;
 
-	p = __va(acpi_tables_addr);
-
+	p = is_phys ? (char *)(*acpi_tables_addr_p) :
+		      (char *)__va(*acpi_tables_addr_p);
 	for (no = 0; no < table_nr; no++) {
 		memcpy(p + total_offset, early_initrd_files[no].data,
 		       early_initrd_files[no].size);
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 381579e..af4da51 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -80,10 +80,10 @@ typedef int (*acpi_tbl_entry_handler)(struct acpi_subtable_header *header,
 				      const unsigned long end);
 
 #ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
-void acpi_initrd_override(void *data, size_t size);
+void acpi_initrd_override(void *data, size_t size, bool is_phys);
 void early_alloc_acpi_override_tables_buf(void);
 #else
-static inline void acpi_initrd_override(void *data, size_t size)
+static inline void acpi_initrd_override(void *data, size_t size, bool is_phys)
 {
 }
 #endif
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH 7/8] x86, acpi, brk: Make early_alloc_acpi_override_tables_buf() available with va/pa.
  2013-08-21 10:15 ` Tang Chen
@ 2013-08-21 10:15   ` Tang Chen
  -1 siblings, 0 replies; 116+ messages in thread
From: Tang Chen @ 2013-08-21 10:15 UTC (permalink / raw)
  To: konrad.wilk, robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa,
	akpm, tj, trenn, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

We are using the same trick in previous patch.

Introduce a "bool is_phys" to early_alloc_acpi_override_tables_buf(). When it
is true, convert all golbal variables va to pa, so that we can access them on
32bit before paging is enabled.

NOTE: Do not call printk() on 32bit before paging is enabled
      because it will use global variables.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/kernel/setup.c |    2 +-
 drivers/acpi/osl.c      |   11 ++++++++---
 include/linux/acpi.h    |    2 +-
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 1290ea7..5729cd2 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1071,7 +1071,7 @@ void __init setup_arch(char **cmdline_p)
 
 #if defined(CONFIG_ACPI) && defined(CONFIG_BLK_DEV_INITRD)
 	/* Allocate buffer to store acpi override tables in brk. */
-	early_alloc_acpi_override_tables_buf();
+	early_alloc_acpi_override_tables_buf(false);
 #endif
 
 	/*
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index ccdb5a6..25ba68d 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -560,10 +560,15 @@ u8 __init acpi_table_checksum(u8 *buffer, u32 length)
 /* Reserve 256KB in BRK to store acpi override tables */
 #define ACPI_OVERRIDE_TABLES_SIZE (256 * 1024)
 RESERVE_BRK(acpi_override_tables_alloc, ACPI_OVERRIDE_TABLES_SIZE);
-void __init early_alloc_acpi_override_tables_buf(void)
+void __init early_alloc_acpi_override_tables_buf(bool is_phys)
 {
-	acpi_tables_addr = __pa(extend_brk(ACPI_OVERRIDE_TABLES_SIZE,
-					   PAGE_SIZE, false));
+	u64 *acpi_tables_addr_p;
+
+	acpi_tables_addr_p = is_phys ? (u64 *)__pa_nodebug(&acpi_tables_addr) :
+				       (u64 *)&acpi_tables_addr;
+
+	*acpi_tables_addr_p = __pa_nodebug(extend_brk(ACPI_OVERRIDE_TABLES_SIZE,
+					      PAGE_SIZE, is_phys));
 }
 
 /**
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index af4da51..17f2e8e 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -81,7 +81,7 @@ typedef int (*acpi_tbl_entry_handler)(struct acpi_subtable_header *header,
 
 #ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
 void acpi_initrd_override(void *data, size_t size, bool is_phys);
-void early_alloc_acpi_override_tables_buf(void);
+void early_alloc_acpi_override_tables_buf(bool is_phys);
 #else
 static inline void acpi_initrd_override(void *data, size_t size, bool is_phys)
 {
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH 7/8] x86, acpi, brk: Make early_alloc_acpi_override_tables_buf() available with va/pa.
@ 2013-08-21 10:15   ` Tang Chen
  0 siblings, 0 replies; 116+ messages in thread
From: Tang Chen @ 2013-08-21 10:15 UTC (permalink / raw)
  To: konrad.wilk, robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa,
	akpm, tj, trenn, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

We are using the same trick in previous patch.

Introduce a "bool is_phys" to early_alloc_acpi_override_tables_buf(). When it
is true, convert all golbal variables va to pa, so that we can access them on
32bit before paging is enabled.

NOTE: Do not call printk() on 32bit before paging is enabled
      because it will use global variables.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/kernel/setup.c |    2 +-
 drivers/acpi/osl.c      |   11 ++++++++---
 include/linux/acpi.h    |    2 +-
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 1290ea7..5729cd2 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1071,7 +1071,7 @@ void __init setup_arch(char **cmdline_p)
 
 #if defined(CONFIG_ACPI) && defined(CONFIG_BLK_DEV_INITRD)
 	/* Allocate buffer to store acpi override tables in brk. */
-	early_alloc_acpi_override_tables_buf();
+	early_alloc_acpi_override_tables_buf(false);
 #endif
 
 	/*
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index ccdb5a6..25ba68d 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -560,10 +560,15 @@ u8 __init acpi_table_checksum(u8 *buffer, u32 length)
 /* Reserve 256KB in BRK to store acpi override tables */
 #define ACPI_OVERRIDE_TABLES_SIZE (256 * 1024)
 RESERVE_BRK(acpi_override_tables_alloc, ACPI_OVERRIDE_TABLES_SIZE);
-void __init early_alloc_acpi_override_tables_buf(void)
+void __init early_alloc_acpi_override_tables_buf(bool is_phys)
 {
-	acpi_tables_addr = __pa(extend_brk(ACPI_OVERRIDE_TABLES_SIZE,
-					   PAGE_SIZE, false));
+	u64 *acpi_tables_addr_p;
+
+	acpi_tables_addr_p = is_phys ? (u64 *)__pa_nodebug(&acpi_tables_addr) :
+				       (u64 *)&acpi_tables_addr;
+
+	*acpi_tables_addr_p = __pa_nodebug(extend_brk(ACPI_OVERRIDE_TABLES_SIZE,
+					      PAGE_SIZE, is_phys));
 }
 
 /**
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index af4da51..17f2e8e 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -81,7 +81,7 @@ typedef int (*acpi_tbl_entry_handler)(struct acpi_subtable_header *header,
 
 #ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
 void acpi_initrd_override(void *data, size_t size, bool is_phys);
-void early_alloc_acpi_override_tables_buf(void);
+void early_alloc_acpi_override_tables_buf(bool is_phys);
 #else
 static inline void acpi_initrd_override(void *data, size_t size, bool is_phys)
 {
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH 8/8] x86, acpi: Do acpi_initrd_override() earlier in head_32.S/head64.c.
  2013-08-21 10:15 ` Tang Chen
@ 2013-08-21 10:15   ` Tang Chen
  -1 siblings, 0 replies; 116+ messages in thread
From: Tang Chen @ 2013-08-21 10:15 UTC (permalink / raw)
  To: konrad.wilk, robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa,
	akpm, tj, trenn, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

Introduce x86_acpi_initrd_override() to do acpi table override job. This function
can be called before or after paging is enabled. On 32bit, it will be called before
paging is enabled. On 64bit, it will be called after paging is enabled but before
direct mapping page tables are setup.

Originally-From: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/include/asm/setup.h |    6 +++++
 arch/x86/kernel/head64.c     |    4 +++
 arch/x86/kernel/head_32.S    |    4 +++
 arch/x86/kernel/setup.c      |   51 ++++++++++++++++++++++++++++++++---------
 4 files changed, 54 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index 96d00da..9f32cb4 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -42,6 +42,12 @@ extern void visws_early_detect(void);
 static inline void visws_early_detect(void) { }
 #endif
 
+#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
+void x86_acpi_initrd_override(void);
+#else
+static inline void x86_acpi_initrd_override(void) { }
+#endif	/* CONFIG_ACPI_INITRD_TABLE_OVERRIDE */
+
 extern unsigned long saved_video_mode;
 
 extern void reserve_standard_io_resources(void);
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 55b6761..88e19b4 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -175,6 +175,10 @@ void __init x86_64_start_kernel(char * real_mode_data)
 	if (console_loglevel == 10)
 		early_printk("Kernel alive\n");
 
+#if defined(CONFIG_ACPI) && defined(CONFIG_BLK_DEV_INITRD)
+	x86_acpi_initrd_override();
+#endif
+
 	clear_page(init_level4_pgt);
 	/* set init_level4_pgt kernel high mapping*/
 	init_level4_pgt[511] = early_level4_pgt[511];
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 5dd87a8..e04e13b 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -149,6 +149,10 @@ ENTRY(startup_32)
 	call load_ucode_bsp
 #endif
 
+#if defined(CONFIG_ACPI) && defined(CONFIG_BLK_DEV_INITRD)
+	call x86_acpi_initrd_override
+#endif
+
 /*
  * Initialize page tables.  This creates a PDE and a set of page
  * tables, which are located immediately beyond __brk_base.  The variable
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 5729cd2..b48a0ff 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -833,7 +833,46 @@ static void __init trim_low_memory_range(void)
 {
 	memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE));
 }
-	
+
+#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
+/**
+ * x86_acpi_initrd_override - Find all acpi override tables in initrd, and copy
+ *                            them to acpi_tables_addr.
+ *
+ * On 32bit platform, this function is call in head_32.S, before paging is
+ * enabled. So we have to use physical address.
+ *
+ * On 64bit platform, this function is call in head_64.c, after paging is
+ * enabled but before direct mapping page tables are set up. Since we have an
+ * early page fault handler on 64bit, so it is OK to use virtual address.
+ */
+void __init x86_acpi_initrd_override(void)
+{
+	unsigned long ramdisk_image, ramdisk_size;
+	void *p = NULL;
+
+#ifdef CONFIG_X86_32
+	struct boot_params *boot_params_p;
+
+	boot_params_p = (struct boot_params *)__pa(&boot_params);
+	ramdisk_image = get_ramdisk_image(boot_params_p);
+	ramdisk_size  = get_ramdisk_size(boot_params_p);
+	p = (void *)ramdisk_image;
+
+	early_alloc_acpi_override_tables_buf(true);
+	acpi_initrd_override(p, ramdisk_size, true);
+#else
+	ramdisk_image = get_ramdisk_image(&boot_params);
+	ramdisk_size  = get_ramdisk_size(&boot_params);
+	if (ramdisk_image)
+		p = (void *)__va(ramdisk_image);
+
+	early_alloc_acpi_override_tables_buf(false);
+	acpi_initrd_override(p, ramdisk_size, false);
+#endif	/* CONFIG_X86_32 */
+}
+#endif	/* CONFIG_ACPI_INITRD_TABLE_OVERRIDE */
+
 /*
  * Determine if we were loaded by an EFI loader.  If so, then we have also been
  * passed the efi memmap, systab, etc., so we should use these data structures
@@ -1069,11 +1108,6 @@ void __init setup_arch(char **cmdline_p)
 
 	early_alloc_pgt_buf();
 
-#if defined(CONFIG_ACPI) && defined(CONFIG_BLK_DEV_INITRD)
-	/* Allocate buffer to store acpi override tables in brk. */
-	early_alloc_acpi_override_tables_buf(false);
-#endif
-
 	/*
 	 * Need to conclude brk, before memblock_x86_fill()
 	 *  it could use memblock_find_in_range, could overlap with
@@ -1132,11 +1166,6 @@ void __init setup_arch(char **cmdline_p)
 
 	reserve_initrd();
 
-#if defined(CONFIG_ACPI) && defined(CONFIG_BLK_DEV_INITRD)
-	acpi_initrd_override((void *)initrd_start, initrd_end - initrd_start,
-			     false);
-#endif
-
 	reserve_crashkernel();
 
 	vsmp_init();
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH 8/8] x86, acpi: Do acpi_initrd_override() earlier in head_32.S/head64.c.
@ 2013-08-21 10:15   ` Tang Chen
  0 siblings, 0 replies; 116+ messages in thread
From: Tang Chen @ 2013-08-21 10:15 UTC (permalink / raw)
  To: konrad.wilk, robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa,
	akpm, tj, trenn, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

Introduce x86_acpi_initrd_override() to do acpi table override job. This function
can be called before or after paging is enabled. On 32bit, it will be called before
paging is enabled. On 64bit, it will be called after paging is enabled but before
direct mapping page tables are setup.

Originally-From: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/include/asm/setup.h |    6 +++++
 arch/x86/kernel/head64.c     |    4 +++
 arch/x86/kernel/head_32.S    |    4 +++
 arch/x86/kernel/setup.c      |   51 ++++++++++++++++++++++++++++++++---------
 4 files changed, 54 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index 96d00da..9f32cb4 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -42,6 +42,12 @@ extern void visws_early_detect(void);
 static inline void visws_early_detect(void) { }
 #endif
 
+#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
+void x86_acpi_initrd_override(void);
+#else
+static inline void x86_acpi_initrd_override(void) { }
+#endif	/* CONFIG_ACPI_INITRD_TABLE_OVERRIDE */
+
 extern unsigned long saved_video_mode;
 
 extern void reserve_standard_io_resources(void);
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 55b6761..88e19b4 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -175,6 +175,10 @@ void __init x86_64_start_kernel(char * real_mode_data)
 	if (console_loglevel == 10)
 		early_printk("Kernel alive\n");
 
+#if defined(CONFIG_ACPI) && defined(CONFIG_BLK_DEV_INITRD)
+	x86_acpi_initrd_override();
+#endif
+
 	clear_page(init_level4_pgt);
 	/* set init_level4_pgt kernel high mapping*/
 	init_level4_pgt[511] = early_level4_pgt[511];
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 5dd87a8..e04e13b 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -149,6 +149,10 @@ ENTRY(startup_32)
 	call load_ucode_bsp
 #endif
 
+#if defined(CONFIG_ACPI) && defined(CONFIG_BLK_DEV_INITRD)
+	call x86_acpi_initrd_override
+#endif
+
 /*
  * Initialize page tables.  This creates a PDE and a set of page
  * tables, which are located immediately beyond __brk_base.  The variable
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 5729cd2..b48a0ff 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -833,7 +833,46 @@ static void __init trim_low_memory_range(void)
 {
 	memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE));
 }
-	
+
+#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
+/**
+ * x86_acpi_initrd_override - Find all acpi override tables in initrd, and copy
+ *                            them to acpi_tables_addr.
+ *
+ * On 32bit platform, this function is call in head_32.S, before paging is
+ * enabled. So we have to use physical address.
+ *
+ * On 64bit platform, this function is call in head_64.c, after paging is
+ * enabled but before direct mapping page tables are set up. Since we have an
+ * early page fault handler on 64bit, so it is OK to use virtual address.
+ */
+void __init x86_acpi_initrd_override(void)
+{
+	unsigned long ramdisk_image, ramdisk_size;
+	void *p = NULL;
+
+#ifdef CONFIG_X86_32
+	struct boot_params *boot_params_p;
+
+	boot_params_p = (struct boot_params *)__pa(&boot_params);
+	ramdisk_image = get_ramdisk_image(boot_params_p);
+	ramdisk_size  = get_ramdisk_size(boot_params_p);
+	p = (void *)ramdisk_image;
+
+	early_alloc_acpi_override_tables_buf(true);
+	acpi_initrd_override(p, ramdisk_size, true);
+#else
+	ramdisk_image = get_ramdisk_image(&boot_params);
+	ramdisk_size  = get_ramdisk_size(&boot_params);
+	if (ramdisk_image)
+		p = (void *)__va(ramdisk_image);
+
+	early_alloc_acpi_override_tables_buf(false);
+	acpi_initrd_override(p, ramdisk_size, false);
+#endif	/* CONFIG_X86_32 */
+}
+#endif	/* CONFIG_ACPI_INITRD_TABLE_OVERRIDE */
+
 /*
  * Determine if we were loaded by an EFI loader.  If so, then we have also been
  * passed the efi memmap, systab, etc., so we should use these data structures
@@ -1069,11 +1108,6 @@ void __init setup_arch(char **cmdline_p)
 
 	early_alloc_pgt_buf();
 
-#if defined(CONFIG_ACPI) && defined(CONFIG_BLK_DEV_INITRD)
-	/* Allocate buffer to store acpi override tables in brk. */
-	early_alloc_acpi_override_tables_buf(false);
-#endif
-
 	/*
 	 * Need to conclude brk, before memblock_x86_fill()
 	 *  it could use memblock_find_in_range, could overlap with
@@ -1132,11 +1166,6 @@ void __init setup_arch(char **cmdline_p)
 
 	reserve_initrd();
 
-#if defined(CONFIG_ACPI) && defined(CONFIG_BLK_DEV_INITRD)
-	acpi_initrd_override((void *)initrd_start, initrd_end - initrd_start,
-			     false);
-#endif
-
 	reserve_crashkernel();
 
 	vsmp_init();
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-21 10:15 ` Tang Chen
@ 2013-08-21 10:42   ` Tang Chen
  -1 siblings, 0 replies; 116+ messages in thread
From: Tang Chen @ 2013-08-21 10:42 UTC (permalink / raw)
  To: konrad.wilk, robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa,
	akpm, tj, trenn, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy, Toshi Kani, Luck, Tony
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

Hi all,

This patch-set has not been fully tested. I sent them first for you
to review. Please comment if we can agree on this solution.

Thanks.:)

On 08/21/2013 06:15 PM, Tang Chen wrote:
> This patch-set aims to move acpi_initrd_override() earlier on x86.
> Some of the patches are from Yinghai's patch-set:
> https://lkml.org/lkml/2013/6/14/561
>
> The difference between this patch-set and Yinghai's original patch-set are:
> 1. This patch-set doesn't split acpi_initrd_override(), but call it as a
>     whole operation at early time.
> 2. Allocate memory from BRK to store override tables.
>     (This idea is also from Yinghai.)
>
>
> [Current state]
>
> The current Linux kernel will initialize acpi tables like the following:
>
> 1. Find all acpi override table provided by users in initrd.
>     (Linux allows users to override acpi tables in firmware, by specifying
>     their own tables in initrd.)
>
> 2. Use acpica code to initialize acpi global root table list and install all
>     tables into it. If any override tables exists, use it to override the one
>     provided by firmware.
>
> Then others can parse these tables and get useful info.
>
> Both of the two steps happen after direct mapping page tables are setup.
>
> [Issues]
>
> In the current Linux kernel, the initialization of acpi tables is too late for
> new functionalities.
>
> We have some issues about this:
>
> * For memory hotplug, we need ACPI SRAT at early time to be aware of which memory
>    ranges are hotpluggable, and prevent bootmem allocator from allocating memory
>    for the kernel. (Kernel pages cannot be hotplugged because )
>
> * As suggested by Yinghai Lu<yinghai@kernel.org>, we should allocate page tables
>    in local node. This also needs SRAT before direct mapping page tables are setup.
>
> * As mentioned by Toshi Kani<toshi.kani@hp.com>, ACPI SCPR/DBGP/DBG2 tables
>    allow the OS to initialize serial console/debug ports at early boot time. The
>    earlier it can be initialized, the better this feature will be.  These tables
>    are not currently used by Linux due to a licensing issue, but it could be
>    addressed some time soon.
>
>
> [What are we doing]
>
> We are trying to initialize acip tables as early as possible. But Linux kernel
> allows users to override acpi tables by specifying their own tables in initrd.
> So we have to do acpi_initrd_override() earlier first.
>
>
> [About this patch-set]
>
> This patch-set aims to move acpi_initrd_override() as early as possible on x86.
> As suggested by Yinghai, we are trying to do it like this:
>
> On 32bit: do it in head_32.S, before paging is enabled. In this case, we can
>            access initrd with physical address without page tables.
>
> On 64bit: do it in head_64.c, after paging is enabled but before direct mapping
>            is setup.
>
> And also, acpi_initrd_override() needs to allocate memory for override tables.
> But at such an early time, there is no memory allocator works. So the basic idea
> from Yinghai is to use BRK. We will extend BRK 256KB in this patch-set.
>
>
> Tang Chen (6):
>    x86, acpi: Move table_sigs[] to stack.
>    x86, acpi, brk: Extend BRK 256KB to store acpi override tables.
>    x86, brk: Make extend_brk() available with va/pa.
>    x86, acpi: Make acpi_initrd_override() available with va or pa.
>    x86, acpi, brk: Make early_alloc_acpi_override_tables_buf() available
>      with va/pa.
>    x86, acpi: Do acpi_initrd_override() earlier in head_32.S/head64.c.
>
> Yinghai Lu (2):
>    x86: Make get_ramdisk_{image|size}() global.
>    x86, microcode: Use get_ramdisk_{image|size}() in microcode handling.
>
>   arch/x86/include/asm/dmi.h              |    2 +-
>   arch/x86/include/asm/setup.h            |   11 +++-
>   arch/x86/kernel/head64.c                |    4 +
>   arch/x86/kernel/head_32.S               |    4 +
>   arch/x86/kernel/microcode_intel_early.c |    8 +-
>   arch/x86/kernel/setup.c                 |   93 ++++++++++++++++------
>   arch/x86/mm/init.c                      |    2 +-
>   arch/x86/xen/enlighten.c                |    2 +-
>   arch/x86/xen/mmu.c                      |    6 +-
>   arch/x86/xen/p2m.c                      |   27 ++++---
>   drivers/acpi/osl.c                      |  130 ++++++++++++++++++++-----------
>   include/linux/acpi.h                    |    5 +-
>   12 files changed, 196 insertions(+), 98 deletions(-)
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email:<a href=mailto:"dont@kvack.org">  email@kvack.org</a>
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-21 10:42   ` Tang Chen
  0 siblings, 0 replies; 116+ messages in thread
From: Tang Chen @ 2013-08-21 10:42 UTC (permalink / raw)
  To: konrad.wilk, robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa,
	akpm, tj, trenn, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy, Toshi Kani, Luck, Tony
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

Hi all,

This patch-set has not been fully tested. I sent them first for you
to review. Please comment if we can agree on this solution.

Thanks.:)

On 08/21/2013 06:15 PM, Tang Chen wrote:
> This patch-set aims to move acpi_initrd_override() earlier on x86.
> Some of the patches are from Yinghai's patch-set:
> https://lkml.org/lkml/2013/6/14/561
>
> The difference between this patch-set and Yinghai's original patch-set are:
> 1. This patch-set doesn't split acpi_initrd_override(), but call it as a
>     whole operation at early time.
> 2. Allocate memory from BRK to store override tables.
>     (This idea is also from Yinghai.)
>
>
> [Current state]
>
> The current Linux kernel will initialize acpi tables like the following:
>
> 1. Find all acpi override table provided by users in initrd.
>     (Linux allows users to override acpi tables in firmware, by specifying
>     their own tables in initrd.)
>
> 2. Use acpica code to initialize acpi global root table list and install all
>     tables into it. If any override tables exists, use it to override the one
>     provided by firmware.
>
> Then others can parse these tables and get useful info.
>
> Both of the two steps happen after direct mapping page tables are setup.
>
> [Issues]
>
> In the current Linux kernel, the initialization of acpi tables is too late for
> new functionalities.
>
> We have some issues about this:
>
> * For memory hotplug, we need ACPI SRAT at early time to be aware of which memory
>    ranges are hotpluggable, and prevent bootmem allocator from allocating memory
>    for the kernel. (Kernel pages cannot be hotplugged because )
>
> * As suggested by Yinghai Lu<yinghai@kernel.org>, we should allocate page tables
>    in local node. This also needs SRAT before direct mapping page tables are setup.
>
> * As mentioned by Toshi Kani<toshi.kani@hp.com>, ACPI SCPR/DBGP/DBG2 tables
>    allow the OS to initialize serial console/debug ports at early boot time. The
>    earlier it can be initialized, the better this feature will be.  These tables
>    are not currently used by Linux due to a licensing issue, but it could be
>    addressed some time soon.
>
>
> [What are we doing]
>
> We are trying to initialize acip tables as early as possible. But Linux kernel
> allows users to override acpi tables by specifying their own tables in initrd.
> So we have to do acpi_initrd_override() earlier first.
>
>
> [About this patch-set]
>
> This patch-set aims to move acpi_initrd_override() as early as possible on x86.
> As suggested by Yinghai, we are trying to do it like this:
>
> On 32bit: do it in head_32.S, before paging is enabled. In this case, we can
>            access initrd with physical address without page tables.
>
> On 64bit: do it in head_64.c, after paging is enabled but before direct mapping
>            is setup.
>
> And also, acpi_initrd_override() needs to allocate memory for override tables.
> But at such an early time, there is no memory allocator works. So the basic idea
> from Yinghai is to use BRK. We will extend BRK 256KB in this patch-set.
>
>
> Tang Chen (6):
>    x86, acpi: Move table_sigs[] to stack.
>    x86, acpi, brk: Extend BRK 256KB to store acpi override tables.
>    x86, brk: Make extend_brk() available with va/pa.
>    x86, acpi: Make acpi_initrd_override() available with va or pa.
>    x86, acpi, brk: Make early_alloc_acpi_override_tables_buf() available
>      with va/pa.
>    x86, acpi: Do acpi_initrd_override() earlier in head_32.S/head64.c.
>
> Yinghai Lu (2):
>    x86: Make get_ramdisk_{image|size}() global.
>    x86, microcode: Use get_ramdisk_{image|size}() in microcode handling.
>
>   arch/x86/include/asm/dmi.h              |    2 +-
>   arch/x86/include/asm/setup.h            |   11 +++-
>   arch/x86/kernel/head64.c                |    4 +
>   arch/x86/kernel/head_32.S               |    4 +
>   arch/x86/kernel/microcode_intel_early.c |    8 +-
>   arch/x86/kernel/setup.c                 |   93 ++++++++++++++++------
>   arch/x86/mm/init.c                      |    2 +-
>   arch/x86/xen/enlighten.c                |    2 +-
>   arch/x86/xen/mmu.c                      |    6 +-
>   arch/x86/xen/p2m.c                      |   27 ++++---
>   drivers/acpi/osl.c                      |  130 ++++++++++++++++++++-----------
>   include/linux/acpi.h                    |    5 +-
>   12 files changed, 196 insertions(+), 98 deletions(-)
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email:<a href=mailto:"dont@kvack.org">  email@kvack.org</a>
>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 5/8] x86, brk: Make extend_brk() available with va/pa.
  2013-08-21 10:15   ` Tang Chen
@ 2013-08-21 12:26     ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 116+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-08-21 12:26 UTC (permalink / raw)
  To: Tang Chen, robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa,
	akpm, tj, trenn, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

Tang Chen <tangchen@cn.fujitsu.com> wrote:
>We are going to do acpi_initrd_override() at very early time:
>
>On 32bit: do it in head_32.S, before paging is enabled. In this case,
>we can
>          access initrd with physical address without page tables.
>
>On 64bit: do it in head_64.c, after paging is enabled but before direct
>mapping
>          is setup.
>
>   On 64bit, we have an early page fault handler to help to access data
>     with direct mapping page tables. So it is easy to do in head_64.c.
>
>And we need to allocate memory to store override tables. At such an
>early time,
>no memory allocator works. So we can only use BRK.
>
>As mentioned above, on 32bit before paging is enabled, we have to
>access variables
>with pa. So introduce a "bool is_phys" parameter to extend_brk(), and
>convert va
>to pa is it is true.

Could you do it differently? Meaning have a global symbol (paging_enabled) which will be used by most of the functions you changed in this patch and the next ones? It would naturally be enabled when paging is on and __va addresses can be used.

That could also be used in the printk case to do a BUG_ON before paging is enabled on 32bit. Or perhaps use a different code path to deal with using __pa address.

?
>
>Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
>---
> arch/x86/include/asm/dmi.h   |    2 +-
> arch/x86/include/asm/setup.h |    2 +-
> arch/x86/kernel/setup.c      |   20 ++++++++++++++------
> arch/x86/mm/init.c           |    2 +-
> arch/x86/xen/enlighten.c     |    2 +-
> arch/x86/xen/mmu.c           |    6 +++---
> arch/x86/xen/p2m.c           |   27 ++++++++++++++-------------
> drivers/acpi/osl.c           |    2 +-
> 8 files changed, 36 insertions(+), 27 deletions(-)
>
>diff --git a/arch/x86/include/asm/dmi.h b/arch/x86/include/asm/dmi.h
>index fd8f9e2..3b51d81 100644
>--- a/arch/x86/include/asm/dmi.h
>+++ b/arch/x86/include/asm/dmi.h
>@@ -9,7 +9,7 @@
>
> static __always_inline __init void *dmi_alloc(unsigned len)
> {
>-	return extend_brk(len, sizeof(int));
>+	return extend_brk(len, sizeof(int), false);
> }
>
> /* Use early IO mappings for DMI because it's initialized early */
>diff --git a/arch/x86/include/asm/setup.h
>b/arch/x86/include/asm/setup.h
>index 4f71d48..96d00da 100644
>--- a/arch/x86/include/asm/setup.h
>+++ b/arch/x86/include/asm/setup.h
>@@ -75,7 +75,7 @@ extern struct boot_params boot_params;
>
> /* exceedingly early brk-like allocator */
> extern unsigned long _brk_end;
>-void *extend_brk(size_t size, size_t align);
>+void *extend_brk(size_t size, size_t align, bool is_phys);
>
> /*
>  * Reserve space in the brk section.  The name must be unique within
>diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>index 51fcd5d..a189909 100644
>--- a/arch/x86/kernel/setup.c
>+++ b/arch/x86/kernel/setup.c
>@@ -259,19 +259,27 @@ static inline void __init copy_edd(void)
> }
> #endif
>
>-void * __init extend_brk(size_t size, size_t align)
>+void * __init extend_brk(size_t size, size_t align, bool is_phys)
> {
> 	size_t mask = align - 1;
> 	void *ret;
>+	unsigned long *brk_start, *brk_end, *brk_limit;
>
>-	BUG_ON(_brk_start == 0);
>+	brk_start = is_phys ? (unsigned long *)__pa_nodebug(&_brk_start) :
>+			      (unsigned long *)&_brk_start;
>+	brk_end = is_phys ? (unsigned long *)__pa_nodebug(&_brk_end) :
>+			    (unsigned long *)&_brk_end;
>+	brk_limit = is_phys ? (unsigned long *)__pa_nodebug(__brk_limit) :
>+			      (unsigned long *)__brk_limit;
>+
>+	BUG_ON(*brk_start == 0);
> 	BUG_ON(align & mask);
>
>-	_brk_end = (_brk_end + mask) & ~mask;
>-	BUG_ON((char *)(_brk_end + size) > __brk_limit);
>+	*brk_end = (*brk_end + mask) & ~mask;
>+	BUG_ON((char *)(*brk_end + size) > brk_limit);
>
>-	ret = (void *)_brk_end;
>-	_brk_end += size;
>+	ret = (void *)(*brk_end);
>+	*brk_end += size;
>
> 	memset(ret, 0, size);
>
>diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
>index 2ec29ac..189a9e2 100644
>--- a/arch/x86/mm/init.c
>+++ b/arch/x86/mm/init.c
>@@ -86,7 +86,7 @@ void  __init early_alloc_pgt_buf(void)
> 	unsigned long tables = INIT_PGT_BUF_SIZE;
> 	phys_addr_t base;
>
>-	base = __pa(extend_brk(tables, PAGE_SIZE));
>+	base = __pa(extend_brk(tables, PAGE_SIZE, false));
>
> 	pgt_buf_start = base >> PAGE_SHIFT;
> 	pgt_buf_end = pgt_buf_start;
>diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>index 193097e..2d5a34f 100644
>--- a/arch/x86/xen/enlighten.c
>+++ b/arch/x86/xen/enlighten.c
>@@ -1629,7 +1629,7 @@ void __ref xen_hvm_init_shared_info(void)
>
> 	if (!shared_info_page)
> 		shared_info_page = (struct shared_info *)
>-			extend_brk(PAGE_SIZE, PAGE_SIZE);
>+			extend_brk(PAGE_SIZE, PAGE_SIZE, false);
> 	xatp.domid = DOMID_SELF;
> 	xatp.idx = 0;
> 	xatp.space = XENMAPSPACE_shared_info;
>diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
>index fdc3ba2..573bc50 100644
>--- a/arch/x86/xen/mmu.c
>+++ b/arch/x86/xen/mmu.c
>@@ -1768,7 +1768,7 @@ static void __init xen_map_identity_early(pmd_t
>*pmd, unsigned long max_pfn)
> 	unsigned long pfn;
>
> 	level1_ident_pgt = extend_brk(sizeof(pte_t) * LEVEL1_IDENT_ENTRIES,
>-				      PAGE_SIZE);
>+				      PAGE_SIZE, false);
>
> 	ident_pte = 0;
> 	pfn = 0;
>@@ -1980,7 +1980,7 @@ static void __init xen_write_cr3_init(unsigned
>long cr3)
> 	 * swapper_pg_dir.
> 	 */
> 	swapper_kernel_pmd =
>-		extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE);
>+		extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE, false);
> 	copy_page(swapper_kernel_pmd, initial_kernel_pmd);
> 	swapper_pg_dir[KERNEL_PGD_BOUNDARY] =
> 		__pgd(__pa(swapper_kernel_pmd) | _PAGE_PRESENT);
>@@ -2003,7 +2003,7 @@ void __init xen_setup_kernel_pagetable(pgd_t
>*pgd, unsigned long max_pfn)
> 	pmd_t *kernel_pmd;
>
> 	initial_kernel_pmd =
>-		extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE);
>+		extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE, false);
>
> 	max_pfn_mapped = PFN_DOWN(__pa(xen_start_info->pt_base) +
> 				  xen_start_info->nr_pt_frames * PAGE_SIZE +
>diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
>index 95fb2aa..bbdcf20 100644
>--- a/arch/x86/xen/p2m.c
>+++ b/arch/x86/xen/p2m.c
>@@ -281,13 +281,13 @@ void __ref xen_build_mfn_list_list(void)
>
> 	/* Pre-initialize p2m_top_mfn to be completely missing */
> 	if (p2m_top_mfn == NULL) {
>-		p2m_mid_missing_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE);
>+		p2m_mid_missing_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
> 		p2m_mid_mfn_init(p2m_mid_missing_mfn);
>
>-		p2m_top_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE);
>+		p2m_top_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
> 		p2m_top_mfn_p_init(p2m_top_mfn_p);
>
>-		p2m_top_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE);
>+		p2m_top_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
> 		p2m_top_mfn_init(p2m_top_mfn);
> 	} else {
> 		/* Reinitialise, mfn's all change after migration */
>@@ -322,7 +322,7 @@ void __ref xen_build_mfn_list_list(void)
> 			 * runtime.  extend_brk() will BUG if we call
> 			 * it too late.
> 			 */
>-			mid_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE);
>+			mid_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
> 			p2m_mid_mfn_init(mid_mfn_p);
>
> 			p2m_top_mfn_p[topidx] = mid_mfn_p;
>@@ -351,16 +351,16 @@ void __init
>xen_build_dynamic_phys_to_machine(void)
>
> 	xen_max_p2m_pfn = max_pfn;
>
>-	p2m_missing = extend_brk(PAGE_SIZE, PAGE_SIZE);
>+	p2m_missing = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
> 	p2m_init(p2m_missing);
>
>-	p2m_mid_missing = extend_brk(PAGE_SIZE, PAGE_SIZE);
>+	p2m_mid_missing = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
> 	p2m_mid_init(p2m_mid_missing);
>
>-	p2m_top = extend_brk(PAGE_SIZE, PAGE_SIZE);
>+	p2m_top = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
> 	p2m_top_init(p2m_top);
>
>-	p2m_identity = extend_brk(PAGE_SIZE, PAGE_SIZE);
>+	p2m_identity = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
> 	p2m_init(p2m_identity);
>
> 	/*
>@@ -373,7 +373,8 @@ void __init xen_build_dynamic_phys_to_machine(void)
> 		unsigned mididx = p2m_mid_index(pfn);
>
> 		if (p2m_top[topidx] == p2m_mid_missing) {
>-			unsigned long **mid = extend_brk(PAGE_SIZE, PAGE_SIZE);
>+			unsigned long **mid = extend_brk(PAGE_SIZE, PAGE_SIZE,
>+							 false);
> 			p2m_mid_init(mid);
>
> 			p2m_top[topidx] = mid;
>@@ -609,7 +610,7 @@ static bool __init early_alloc_p2m_middle(unsigned
>long pfn, bool check_boundary
> 		return false;
>
> 	/* Boundary cross-over for the edges: */
>-	p2m = extend_brk(PAGE_SIZE, PAGE_SIZE);
>+	p2m = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
>
> 	p2m_init(p2m);
>
>@@ -635,7 +636,7 @@ static bool __init early_alloc_p2m(unsigned long
>pfn)
> 	mid = p2m_top[topidx];
> 	mid_mfn_p = p2m_top_mfn_p[topidx];
> 	if (mid == p2m_mid_missing) {
>-		mid = extend_brk(PAGE_SIZE, PAGE_SIZE);
>+		mid = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
>
> 		p2m_mid_init(mid);
>
>@@ -645,7 +646,7 @@ static bool __init early_alloc_p2m(unsigned long
>pfn)
> 	}
> 	/* And the save/restore P2M tables.. */
> 	if (mid_mfn_p == p2m_mid_missing_mfn) {
>-		mid_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE);
>+		mid_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
> 		p2m_mid_mfn_init(mid_mfn_p);
>
> 		p2m_top_mfn_p[topidx] = mid_mfn_p;
>@@ -858,7 +859,7 @@ static void __init m2p_override_init(void)
> 	unsigned i;
>
>	m2p_overrides = extend_brk(sizeof(*m2p_overrides) * M2P_OVERRIDE_HASH,
>-				   sizeof(unsigned long));
>+				   sizeof(unsigned long), false);
>
> 	for (i = 0; i < M2P_OVERRIDE_HASH; i++)
> 		INIT_LIST_HEAD(&m2p_overrides[i]);
>diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
>index 4c1baa7..dff7fcc 100644
>--- a/drivers/acpi/osl.c
>+++ b/drivers/acpi/osl.c
>@@ -563,7 +563,7 @@ RESERVE_BRK(acpi_override_tables_alloc,
>ACPI_OVERRIDE_TABLES_SIZE);
> void __init early_alloc_acpi_override_tables_buf(void)
> {
> 	acpi_tables_addr = __pa(extend_brk(ACPI_OVERRIDE_TABLES_SIZE,
>-					   PAGE_SIZE));
>+					   PAGE_SIZE, false));
> }
>
> void __init acpi_initrd_override(void *data, size_t size)


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 5/8] x86, brk: Make extend_brk() available with va/pa.
@ 2013-08-21 12:26     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 116+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-08-21 12:26 UTC (permalink / raw)
  To: Tang Chen, robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa,
	akpm, tj, trenn, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

Tang Chen <tangchen@cn.fujitsu.com> wrote:
>We are going to do acpi_initrd_override() at very early time:
>
>On 32bit: do it in head_32.S, before paging is enabled. In this case,
>we can
>          access initrd with physical address without page tables.
>
>On 64bit: do it in head_64.c, after paging is enabled but before direct
>mapping
>          is setup.
>
>   On 64bit, we have an early page fault handler to help to access data
>     with direct mapping page tables. So it is easy to do in head_64.c.
>
>And we need to allocate memory to store override tables. At such an
>early time,
>no memory allocator works. So we can only use BRK.
>
>As mentioned above, on 32bit before paging is enabled, we have to
>access variables
>with pa. So introduce a "bool is_phys" parameter to extend_brk(), and
>convert va
>to pa is it is true.

Could you do it differently? Meaning have a global symbol (paging_enabled) which will be used by most of the functions you changed in this patch and the next ones? It would naturally be enabled when paging is on and __va addresses can be used. 

That could also be used in the printk case to do a BUG_ON before paging is enabled on 32bit. Or perhaps use a different code path to deal with using __pa address. 

? 
>
>Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
>---
> arch/x86/include/asm/dmi.h   |    2 +-
> arch/x86/include/asm/setup.h |    2 +-
> arch/x86/kernel/setup.c      |   20 ++++++++++++++------
> arch/x86/mm/init.c           |    2 +-
> arch/x86/xen/enlighten.c     |    2 +-
> arch/x86/xen/mmu.c           |    6 +++---
> arch/x86/xen/p2m.c           |   27 ++++++++++++++-------------
> drivers/acpi/osl.c           |    2 +-
> 8 files changed, 36 insertions(+), 27 deletions(-)
>
>diff --git a/arch/x86/include/asm/dmi.h b/arch/x86/include/asm/dmi.h
>index fd8f9e2..3b51d81 100644
>--- a/arch/x86/include/asm/dmi.h
>+++ b/arch/x86/include/asm/dmi.h
>@@ -9,7 +9,7 @@
> 
> static __always_inline __init void *dmi_alloc(unsigned len)
> {
>-	return extend_brk(len, sizeof(int));
>+	return extend_brk(len, sizeof(int), false);
> }
> 
> /* Use early IO mappings for DMI because it's initialized early */
>diff --git a/arch/x86/include/asm/setup.h
>b/arch/x86/include/asm/setup.h
>index 4f71d48..96d00da 100644
>--- a/arch/x86/include/asm/setup.h
>+++ b/arch/x86/include/asm/setup.h
>@@ -75,7 +75,7 @@ extern struct boot_params boot_params;
> 
> /* exceedingly early brk-like allocator */
> extern unsigned long _brk_end;
>-void *extend_brk(size_t size, size_t align);
>+void *extend_brk(size_t size, size_t align, bool is_phys);
> 
> /*
>  * Reserve space in the brk section.  The name must be unique within
>diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>index 51fcd5d..a189909 100644
>--- a/arch/x86/kernel/setup.c
>+++ b/arch/x86/kernel/setup.c
>@@ -259,19 +259,27 @@ static inline void __init copy_edd(void)
> }
> #endif
> 
>-void * __init extend_brk(size_t size, size_t align)
>+void * __init extend_brk(size_t size, size_t align, bool is_phys)
> {
> 	size_t mask = align - 1;
> 	void *ret;
>+	unsigned long *brk_start, *brk_end, *brk_limit;
> 
>-	BUG_ON(_brk_start == 0);
>+	brk_start = is_phys ? (unsigned long *)__pa_nodebug(&_brk_start) :
>+			      (unsigned long *)&_brk_start;
>+	brk_end = is_phys ? (unsigned long *)__pa_nodebug(&_brk_end) :
>+			    (unsigned long *)&_brk_end;
>+	brk_limit = is_phys ? (unsigned long *)__pa_nodebug(__brk_limit) :
>+			      (unsigned long *)__brk_limit;
>+
>+	BUG_ON(*brk_start == 0);
> 	BUG_ON(align & mask);
> 
>-	_brk_end = (_brk_end + mask) & ~mask;
>-	BUG_ON((char *)(_brk_end + size) > __brk_limit);
>+	*brk_end = (*brk_end + mask) & ~mask;
>+	BUG_ON((char *)(*brk_end + size) > brk_limit);
> 
>-	ret = (void *)_brk_end;
>-	_brk_end += size;
>+	ret = (void *)(*brk_end);
>+	*brk_end += size;
> 
> 	memset(ret, 0, size);
> 
>diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
>index 2ec29ac..189a9e2 100644
>--- a/arch/x86/mm/init.c
>+++ b/arch/x86/mm/init.c
>@@ -86,7 +86,7 @@ void  __init early_alloc_pgt_buf(void)
> 	unsigned long tables = INIT_PGT_BUF_SIZE;
> 	phys_addr_t base;
> 
>-	base = __pa(extend_brk(tables, PAGE_SIZE));
>+	base = __pa(extend_brk(tables, PAGE_SIZE, false));
> 
> 	pgt_buf_start = base >> PAGE_SHIFT;
> 	pgt_buf_end = pgt_buf_start;
>diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>index 193097e..2d5a34f 100644
>--- a/arch/x86/xen/enlighten.c
>+++ b/arch/x86/xen/enlighten.c
>@@ -1629,7 +1629,7 @@ void __ref xen_hvm_init_shared_info(void)
> 
> 	if (!shared_info_page)
> 		shared_info_page = (struct shared_info *)
>-			extend_brk(PAGE_SIZE, PAGE_SIZE);
>+			extend_brk(PAGE_SIZE, PAGE_SIZE, false);
> 	xatp.domid = DOMID_SELF;
> 	xatp.idx = 0;
> 	xatp.space = XENMAPSPACE_shared_info;
>diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
>index fdc3ba2..573bc50 100644
>--- a/arch/x86/xen/mmu.c
>+++ b/arch/x86/xen/mmu.c
>@@ -1768,7 +1768,7 @@ static void __init xen_map_identity_early(pmd_t
>*pmd, unsigned long max_pfn)
> 	unsigned long pfn;
> 
> 	level1_ident_pgt = extend_brk(sizeof(pte_t) * LEVEL1_IDENT_ENTRIES,
>-				      PAGE_SIZE);
>+				      PAGE_SIZE, false);
> 
> 	ident_pte = 0;
> 	pfn = 0;
>@@ -1980,7 +1980,7 @@ static void __init xen_write_cr3_init(unsigned
>long cr3)
> 	 * swapper_pg_dir.
> 	 */
> 	swapper_kernel_pmd =
>-		extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE);
>+		extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE, false);
> 	copy_page(swapper_kernel_pmd, initial_kernel_pmd);
> 	swapper_pg_dir[KERNEL_PGD_BOUNDARY] =
> 		__pgd(__pa(swapper_kernel_pmd) | _PAGE_PRESENT);
>@@ -2003,7 +2003,7 @@ void __init xen_setup_kernel_pagetable(pgd_t
>*pgd, unsigned long max_pfn)
> 	pmd_t *kernel_pmd;
> 
> 	initial_kernel_pmd =
>-		extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE);
>+		extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE, false);
> 
> 	max_pfn_mapped = PFN_DOWN(__pa(xen_start_info->pt_base) +
> 				  xen_start_info->nr_pt_frames * PAGE_SIZE +
>diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
>index 95fb2aa..bbdcf20 100644
>--- a/arch/x86/xen/p2m.c
>+++ b/arch/x86/xen/p2m.c
>@@ -281,13 +281,13 @@ void __ref xen_build_mfn_list_list(void)
> 
> 	/* Pre-initialize p2m_top_mfn to be completely missing */
> 	if (p2m_top_mfn == NULL) {
>-		p2m_mid_missing_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE);
>+		p2m_mid_missing_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
> 		p2m_mid_mfn_init(p2m_mid_missing_mfn);
> 
>-		p2m_top_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE);
>+		p2m_top_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
> 		p2m_top_mfn_p_init(p2m_top_mfn_p);
> 
>-		p2m_top_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE);
>+		p2m_top_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
> 		p2m_top_mfn_init(p2m_top_mfn);
> 	} else {
> 		/* Reinitialise, mfn's all change after migration */
>@@ -322,7 +322,7 @@ void __ref xen_build_mfn_list_list(void)
> 			 * runtime.  extend_brk() will BUG if we call
> 			 * it too late.
> 			 */
>-			mid_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE);
>+			mid_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
> 			p2m_mid_mfn_init(mid_mfn_p);
> 
> 			p2m_top_mfn_p[topidx] = mid_mfn_p;
>@@ -351,16 +351,16 @@ void __init
>xen_build_dynamic_phys_to_machine(void)
> 
> 	xen_max_p2m_pfn = max_pfn;
> 
>-	p2m_missing = extend_brk(PAGE_SIZE, PAGE_SIZE);
>+	p2m_missing = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
> 	p2m_init(p2m_missing);
> 
>-	p2m_mid_missing = extend_brk(PAGE_SIZE, PAGE_SIZE);
>+	p2m_mid_missing = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
> 	p2m_mid_init(p2m_mid_missing);
> 
>-	p2m_top = extend_brk(PAGE_SIZE, PAGE_SIZE);
>+	p2m_top = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
> 	p2m_top_init(p2m_top);
> 
>-	p2m_identity = extend_brk(PAGE_SIZE, PAGE_SIZE);
>+	p2m_identity = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
> 	p2m_init(p2m_identity);
> 
> 	/*
>@@ -373,7 +373,8 @@ void __init xen_build_dynamic_phys_to_machine(void)
> 		unsigned mididx = p2m_mid_index(pfn);
> 
> 		if (p2m_top[topidx] == p2m_mid_missing) {
>-			unsigned long **mid = extend_brk(PAGE_SIZE, PAGE_SIZE);
>+			unsigned long **mid = extend_brk(PAGE_SIZE, PAGE_SIZE,
>+							 false);
> 			p2m_mid_init(mid);
> 
> 			p2m_top[topidx] = mid;
>@@ -609,7 +610,7 @@ static bool __init early_alloc_p2m_middle(unsigned
>long pfn, bool check_boundary
> 		return false;
> 
> 	/* Boundary cross-over for the edges: */
>-	p2m = extend_brk(PAGE_SIZE, PAGE_SIZE);
>+	p2m = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
> 
> 	p2m_init(p2m);
> 
>@@ -635,7 +636,7 @@ static bool __init early_alloc_p2m(unsigned long
>pfn)
> 	mid = p2m_top[topidx];
> 	mid_mfn_p = p2m_top_mfn_p[topidx];
> 	if (mid == p2m_mid_missing) {
>-		mid = extend_brk(PAGE_SIZE, PAGE_SIZE);
>+		mid = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
> 
> 		p2m_mid_init(mid);
> 
>@@ -645,7 +646,7 @@ static bool __init early_alloc_p2m(unsigned long
>pfn)
> 	}
> 	/* And the save/restore P2M tables.. */
> 	if (mid_mfn_p == p2m_mid_missing_mfn) {
>-		mid_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE);
>+		mid_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
> 		p2m_mid_mfn_init(mid_mfn_p);
> 
> 		p2m_top_mfn_p[topidx] = mid_mfn_p;
>@@ -858,7 +859,7 @@ static void __init m2p_override_init(void)
> 	unsigned i;
> 
>	m2p_overrides = extend_brk(sizeof(*m2p_overrides) * M2P_OVERRIDE_HASH,
>-				   sizeof(unsigned long));
>+				   sizeof(unsigned long), false);
> 
> 	for (i = 0; i < M2P_OVERRIDE_HASH; i++)
> 		INIT_LIST_HEAD(&m2p_overrides[i]);
>diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
>index 4c1baa7..dff7fcc 100644
>--- a/drivers/acpi/osl.c
>+++ b/drivers/acpi/osl.c
>@@ -563,7 +563,7 @@ RESERVE_BRK(acpi_override_tables_alloc,
>ACPI_OVERRIDE_TABLES_SIZE);
> void __init early_alloc_acpi_override_tables_buf(void)
> {
> 	acpi_tables_addr = __pa(extend_brk(ACPI_OVERRIDE_TABLES_SIZE,
>-					   PAGE_SIZE));
>+					   PAGE_SIZE, false));
> }
> 
> void __init acpi_initrd_override(void *data, size_t size)



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 5/8] x86, brk: Make extend_brk() available with va/pa.
  2013-08-21 12:26     ` Konrad Rzeszutek Wilk
@ 2013-08-21 12:35       ` H. Peter Anvin
  -1 siblings, 0 replies; 116+ messages in thread
From: H. Peter Anvin @ 2013-08-21 12:35 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Tang Chen, robert.moore, lv.zheng, rjw,
	lenb, tglx, mingo, akpm, tj, trenn, yinghai, jiang.liu, wency,
	laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86,
	gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

Global symbols are inaccessible in physical mode.

This is incidentally yet another example of "PV/weird platform violence", since in their absence it would be trivial to work around this by using segmentation.

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>Tang Chen <tangchen@cn.fujitsu.com> wrote:
>>We are going to do acpi_initrd_override() at very early time:
>>
>>On 32bit: do it in head_32.S, before paging is enabled. In this case,
>>we can
>>          access initrd with physical address without page tables.
>>
>>On 64bit: do it in head_64.c, after paging is enabled but before
>direct
>>mapping
>>          is setup.
>>
>>   On 64bit, we have an early page fault handler to help to access
>data
>>     with direct mapping page tables. So it is easy to do in
>head_64.c.
>>
>>And we need to allocate memory to store override tables. At such an
>>early time,
>>no memory allocator works. So we can only use BRK.
>>
>>As mentioned above, on 32bit before paging is enabled, we have to
>>access variables
>>with pa. So introduce a "bool is_phys" parameter to extend_brk(), and
>>convert va
>>to pa is it is true.
>
>Could you do it differently? Meaning have a global symbol
>(paging_enabled) which will be used by most of the functions you
>changed in this patch and the next ones? It would naturally be enabled
>when paging is on and __va addresses can be used. 
>
>That could also be used in the printk case to do a BUG_ON before paging
>is enabled on 32bit. Or perhaps use a different code path to deal with
>using __pa address. 
>
>? 
>>
>>Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
>>---
>> arch/x86/include/asm/dmi.h   |    2 +-
>> arch/x86/include/asm/setup.h |    2 +-
>> arch/x86/kernel/setup.c      |   20 ++++++++++++++------
>> arch/x86/mm/init.c           |    2 +-
>> arch/x86/xen/enlighten.c     |    2 +-
>> arch/x86/xen/mmu.c           |    6 +++---
>> arch/x86/xen/p2m.c           |   27 ++++++++++++++-------------
>> drivers/acpi/osl.c           |    2 +-
>> 8 files changed, 36 insertions(+), 27 deletions(-)
>>
>>diff --git a/arch/x86/include/asm/dmi.h b/arch/x86/include/asm/dmi.h
>>index fd8f9e2..3b51d81 100644
>>--- a/arch/x86/include/asm/dmi.h
>>+++ b/arch/x86/include/asm/dmi.h
>>@@ -9,7 +9,7 @@
>> 
>> static __always_inline __init void *dmi_alloc(unsigned len)
>> {
>>-	return extend_brk(len, sizeof(int));
>>+	return extend_brk(len, sizeof(int), false);
>> }
>> 
>> /* Use early IO mappings for DMI because it's initialized early */
>>diff --git a/arch/x86/include/asm/setup.h
>>b/arch/x86/include/asm/setup.h
>>index 4f71d48..96d00da 100644
>>--- a/arch/x86/include/asm/setup.h
>>+++ b/arch/x86/include/asm/setup.h
>>@@ -75,7 +75,7 @@ extern struct boot_params boot_params;
>> 
>> /* exceedingly early brk-like allocator */
>> extern unsigned long _brk_end;
>>-void *extend_brk(size_t size, size_t align);
>>+void *extend_brk(size_t size, size_t align, bool is_phys);
>> 
>> /*
>>  * Reserve space in the brk section.  The name must be unique within
>>diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>>index 51fcd5d..a189909 100644
>>--- a/arch/x86/kernel/setup.c
>>+++ b/arch/x86/kernel/setup.c
>>@@ -259,19 +259,27 @@ static inline void __init copy_edd(void)
>> }
>> #endif
>> 
>>-void * __init extend_brk(size_t size, size_t align)
>>+void * __init extend_brk(size_t size, size_t align, bool is_phys)
>> {
>> 	size_t mask = align - 1;
>> 	void *ret;
>>+	unsigned long *brk_start, *brk_end, *brk_limit;
>> 
>>-	BUG_ON(_brk_start == 0);
>>+	brk_start = is_phys ? (unsigned long *)__pa_nodebug(&_brk_start) :
>>+			      (unsigned long *)&_brk_start;
>>+	brk_end = is_phys ? (unsigned long *)__pa_nodebug(&_brk_end) :
>>+			    (unsigned long *)&_brk_end;
>>+	brk_limit = is_phys ? (unsigned long *)__pa_nodebug(__brk_limit) :
>>+			      (unsigned long *)__brk_limit;
>>+
>>+	BUG_ON(*brk_start == 0);
>> 	BUG_ON(align & mask);
>> 
>>-	_brk_end = (_brk_end + mask) & ~mask;
>>-	BUG_ON((char *)(_brk_end + size) > __brk_limit);
>>+	*brk_end = (*brk_end + mask) & ~mask;
>>+	BUG_ON((char *)(*brk_end + size) > brk_limit);
>> 
>>-	ret = (void *)_brk_end;
>>-	_brk_end += size;
>>+	ret = (void *)(*brk_end);
>>+	*brk_end += size;
>> 
>> 	memset(ret, 0, size);
>> 
>>diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
>>index 2ec29ac..189a9e2 100644
>>--- a/arch/x86/mm/init.c
>>+++ b/arch/x86/mm/init.c
>>@@ -86,7 +86,7 @@ void  __init early_alloc_pgt_buf(void)
>> 	unsigned long tables = INIT_PGT_BUF_SIZE;
>> 	phys_addr_t base;
>> 
>>-	base = __pa(extend_brk(tables, PAGE_SIZE));
>>+	base = __pa(extend_brk(tables, PAGE_SIZE, false));
>> 
>> 	pgt_buf_start = base >> PAGE_SHIFT;
>> 	pgt_buf_end = pgt_buf_start;
>>diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>>index 193097e..2d5a34f 100644
>>--- a/arch/x86/xen/enlighten.c
>>+++ b/arch/x86/xen/enlighten.c
>>@@ -1629,7 +1629,7 @@ void __ref xen_hvm_init_shared_info(void)
>> 
>> 	if (!shared_info_page)
>> 		shared_info_page = (struct shared_info *)
>>-			extend_brk(PAGE_SIZE, PAGE_SIZE);
>>+			extend_brk(PAGE_SIZE, PAGE_SIZE, false);
>> 	xatp.domid = DOMID_SELF;
>> 	xatp.idx = 0;
>> 	xatp.space = XENMAPSPACE_shared_info;
>>diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
>>index fdc3ba2..573bc50 100644
>>--- a/arch/x86/xen/mmu.c
>>+++ b/arch/x86/xen/mmu.c
>>@@ -1768,7 +1768,7 @@ static void __init xen_map_identity_early(pmd_t
>>*pmd, unsigned long max_pfn)
>> 	unsigned long pfn;
>> 
>> 	level1_ident_pgt = extend_brk(sizeof(pte_t) * LEVEL1_IDENT_ENTRIES,
>>-				      PAGE_SIZE);
>>+				      PAGE_SIZE, false);
>> 
>> 	ident_pte = 0;
>> 	pfn = 0;
>>@@ -1980,7 +1980,7 @@ static void __init xen_write_cr3_init(unsigned
>>long cr3)
>> 	 * swapper_pg_dir.
>> 	 */
>> 	swapper_kernel_pmd =
>>-		extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE);
>>+		extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE, false);
>> 	copy_page(swapper_kernel_pmd, initial_kernel_pmd);
>> 	swapper_pg_dir[KERNEL_PGD_BOUNDARY] =
>> 		__pgd(__pa(swapper_kernel_pmd) | _PAGE_PRESENT);
>>@@ -2003,7 +2003,7 @@ void __init xen_setup_kernel_pagetable(pgd_t
>>*pgd, unsigned long max_pfn)
>> 	pmd_t *kernel_pmd;
>> 
>> 	initial_kernel_pmd =
>>-		extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE);
>>+		extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE, false);
>> 
>> 	max_pfn_mapped = PFN_DOWN(__pa(xen_start_info->pt_base) +
>> 				  xen_start_info->nr_pt_frames * PAGE_SIZE +
>>diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
>>index 95fb2aa..bbdcf20 100644
>>--- a/arch/x86/xen/p2m.c
>>+++ b/arch/x86/xen/p2m.c
>>@@ -281,13 +281,13 @@ void __ref xen_build_mfn_list_list(void)
>> 
>> 	/* Pre-initialize p2m_top_mfn to be completely missing */
>> 	if (p2m_top_mfn == NULL) {
>>-		p2m_mid_missing_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE);
>>+		p2m_mid_missing_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
>> 		p2m_mid_mfn_init(p2m_mid_missing_mfn);
>> 
>>-		p2m_top_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE);
>>+		p2m_top_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
>> 		p2m_top_mfn_p_init(p2m_top_mfn_p);
>> 
>>-		p2m_top_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE);
>>+		p2m_top_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
>> 		p2m_top_mfn_init(p2m_top_mfn);
>> 	} else {
>> 		/* Reinitialise, mfn's all change after migration */
>>@@ -322,7 +322,7 @@ void __ref xen_build_mfn_list_list(void)
>> 			 * runtime.  extend_brk() will BUG if we call
>> 			 * it too late.
>> 			 */
>>-			mid_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE);
>>+			mid_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
>> 			p2m_mid_mfn_init(mid_mfn_p);
>> 
>> 			p2m_top_mfn_p[topidx] = mid_mfn_p;
>>@@ -351,16 +351,16 @@ void __init
>>xen_build_dynamic_phys_to_machine(void)
>> 
>> 	xen_max_p2m_pfn = max_pfn;
>> 
>>-	p2m_missing = extend_brk(PAGE_SIZE, PAGE_SIZE);
>>+	p2m_missing = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
>> 	p2m_init(p2m_missing);
>> 
>>-	p2m_mid_missing = extend_brk(PAGE_SIZE, PAGE_SIZE);
>>+	p2m_mid_missing = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
>> 	p2m_mid_init(p2m_mid_missing);
>> 
>>-	p2m_top = extend_brk(PAGE_SIZE, PAGE_SIZE);
>>+	p2m_top = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
>> 	p2m_top_init(p2m_top);
>> 
>>-	p2m_identity = extend_brk(PAGE_SIZE, PAGE_SIZE);
>>+	p2m_identity = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
>> 	p2m_init(p2m_identity);
>> 
>> 	/*
>>@@ -373,7 +373,8 @@ void __init
>xen_build_dynamic_phys_to_machine(void)
>> 		unsigned mididx = p2m_mid_index(pfn);
>> 
>> 		if (p2m_top[topidx] == p2m_mid_missing) {
>>-			unsigned long **mid = extend_brk(PAGE_SIZE, PAGE_SIZE);
>>+			unsigned long **mid = extend_brk(PAGE_SIZE, PAGE_SIZE,
>>+							 false);
>> 			p2m_mid_init(mid);
>> 
>> 			p2m_top[topidx] = mid;
>>@@ -609,7 +610,7 @@ static bool __init early_alloc_p2m_middle(unsigned
>>long pfn, bool check_boundary
>> 		return false;
>> 
>> 	/* Boundary cross-over for the edges: */
>>-	p2m = extend_brk(PAGE_SIZE, PAGE_SIZE);
>>+	p2m = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
>> 
>> 	p2m_init(p2m);
>> 
>>@@ -635,7 +636,7 @@ static bool __init early_alloc_p2m(unsigned long
>>pfn)
>> 	mid = p2m_top[topidx];
>> 	mid_mfn_p = p2m_top_mfn_p[topidx];
>> 	if (mid == p2m_mid_missing) {
>>-		mid = extend_brk(PAGE_SIZE, PAGE_SIZE);
>>+		mid = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
>> 
>> 		p2m_mid_init(mid);
>> 
>>@@ -645,7 +646,7 @@ static bool __init early_alloc_p2m(unsigned long
>>pfn)
>> 	}
>> 	/* And the save/restore P2M tables.. */
>> 	if (mid_mfn_p == p2m_mid_missing_mfn) {
>>-		mid_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE);
>>+		mid_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
>> 		p2m_mid_mfn_init(mid_mfn_p);
>> 
>> 		p2m_top_mfn_p[topidx] = mid_mfn_p;
>>@@ -858,7 +859,7 @@ static void __init m2p_override_init(void)
>> 	unsigned i;
>> 
>>	m2p_overrides = extend_brk(sizeof(*m2p_overrides) *
>M2P_OVERRIDE_HASH,
>>-				   sizeof(unsigned long));
>>+				   sizeof(unsigned long), false);
>> 
>> 	for (i = 0; i < M2P_OVERRIDE_HASH; i++)
>> 		INIT_LIST_HEAD(&m2p_overrides[i]);
>>diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
>>index 4c1baa7..dff7fcc 100644
>>--- a/drivers/acpi/osl.c
>>+++ b/drivers/acpi/osl.c
>>@@ -563,7 +563,7 @@ RESERVE_BRK(acpi_override_tables_alloc,
>>ACPI_OVERRIDE_TABLES_SIZE);
>> void __init early_alloc_acpi_override_tables_buf(void)
>> {
>> 	acpi_tables_addr = __pa(extend_brk(ACPI_OVERRIDE_TABLES_SIZE,
>>-					   PAGE_SIZE));
>>+					   PAGE_SIZE, false));
>> }
>> 
>> void __init acpi_initrd_override(void *data, size_t size)

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 5/8] x86, brk: Make extend_brk() available with va/pa.
@ 2013-08-21 12:35       ` H. Peter Anvin
  0 siblings, 0 replies; 116+ messages in thread
From: H. Peter Anvin @ 2013-08-21 12:35 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Tang Chen, robert.moore, lv.zheng, rjw,
	lenb, tglx, mingo, akpm, tj, trenn, yinghai, jiang.liu, wency,
	laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86,
	gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

Global symbols are inaccessible in physical mode.

This is incidentally yet another example of "PV/weird platform violence", since in their absence it would be trivial to work around this by using segmentation.

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>Tang Chen <tangchen@cn.fujitsu.com> wrote:
>>We are going to do acpi_initrd_override() at very early time:
>>
>>On 32bit: do it in head_32.S, before paging is enabled. In this case,
>>we can
>>          access initrd with physical address without page tables.
>>
>>On 64bit: do it in head_64.c, after paging is enabled but before
>direct
>>mapping
>>          is setup.
>>
>>   On 64bit, we have an early page fault handler to help to access
>data
>>     with direct mapping page tables. So it is easy to do in
>head_64.c.
>>
>>And we need to allocate memory to store override tables. At such an
>>early time,
>>no memory allocator works. So we can only use BRK.
>>
>>As mentioned above, on 32bit before paging is enabled, we have to
>>access variables
>>with pa. So introduce a "bool is_phys" parameter to extend_brk(), and
>>convert va
>>to pa is it is true.
>
>Could you do it differently? Meaning have a global symbol
>(paging_enabled) which will be used by most of the functions you
>changed in this patch and the next ones? It would naturally be enabled
>when paging is on and __va addresses can be used. 
>
>That could also be used in the printk case to do a BUG_ON before paging
>is enabled on 32bit. Or perhaps use a different code path to deal with
>using __pa address. 
>
>? 
>>
>>Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
>>---
>> arch/x86/include/asm/dmi.h   |    2 +-
>> arch/x86/include/asm/setup.h |    2 +-
>> arch/x86/kernel/setup.c      |   20 ++++++++++++++------
>> arch/x86/mm/init.c           |    2 +-
>> arch/x86/xen/enlighten.c     |    2 +-
>> arch/x86/xen/mmu.c           |    6 +++---
>> arch/x86/xen/p2m.c           |   27 ++++++++++++++-------------
>> drivers/acpi/osl.c           |    2 +-
>> 8 files changed, 36 insertions(+), 27 deletions(-)
>>
>>diff --git a/arch/x86/include/asm/dmi.h b/arch/x86/include/asm/dmi.h
>>index fd8f9e2..3b51d81 100644
>>--- a/arch/x86/include/asm/dmi.h
>>+++ b/arch/x86/include/asm/dmi.h
>>@@ -9,7 +9,7 @@
>> 
>> static __always_inline __init void *dmi_alloc(unsigned len)
>> {
>>-	return extend_brk(len, sizeof(int));
>>+	return extend_brk(len, sizeof(int), false);
>> }
>> 
>> /* Use early IO mappings for DMI because it's initialized early */
>>diff --git a/arch/x86/include/asm/setup.h
>>b/arch/x86/include/asm/setup.h
>>index 4f71d48..96d00da 100644
>>--- a/arch/x86/include/asm/setup.h
>>+++ b/arch/x86/include/asm/setup.h
>>@@ -75,7 +75,7 @@ extern struct boot_params boot_params;
>> 
>> /* exceedingly early brk-like allocator */
>> extern unsigned long _brk_end;
>>-void *extend_brk(size_t size, size_t align);
>>+void *extend_brk(size_t size, size_t align, bool is_phys);
>> 
>> /*
>>  * Reserve space in the brk section.  The name must be unique within
>>diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>>index 51fcd5d..a189909 100644
>>--- a/arch/x86/kernel/setup.c
>>+++ b/arch/x86/kernel/setup.c
>>@@ -259,19 +259,27 @@ static inline void __init copy_edd(void)
>> }
>> #endif
>> 
>>-void * __init extend_brk(size_t size, size_t align)
>>+void * __init extend_brk(size_t size, size_t align, bool is_phys)
>> {
>> 	size_t mask = align - 1;
>> 	void *ret;
>>+	unsigned long *brk_start, *brk_end, *brk_limit;
>> 
>>-	BUG_ON(_brk_start == 0);
>>+	brk_start = is_phys ? (unsigned long *)__pa_nodebug(&_brk_start) :
>>+			      (unsigned long *)&_brk_start;
>>+	brk_end = is_phys ? (unsigned long *)__pa_nodebug(&_brk_end) :
>>+			    (unsigned long *)&_brk_end;
>>+	brk_limit = is_phys ? (unsigned long *)__pa_nodebug(__brk_limit) :
>>+			      (unsigned long *)__brk_limit;
>>+
>>+	BUG_ON(*brk_start == 0);
>> 	BUG_ON(align & mask);
>> 
>>-	_brk_end = (_brk_end + mask) & ~mask;
>>-	BUG_ON((char *)(_brk_end + size) > __brk_limit);
>>+	*brk_end = (*brk_end + mask) & ~mask;
>>+	BUG_ON((char *)(*brk_end + size) > brk_limit);
>> 
>>-	ret = (void *)_brk_end;
>>-	_brk_end += size;
>>+	ret = (void *)(*brk_end);
>>+	*brk_end += size;
>> 
>> 	memset(ret, 0, size);
>> 
>>diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
>>index 2ec29ac..189a9e2 100644
>>--- a/arch/x86/mm/init.c
>>+++ b/arch/x86/mm/init.c
>>@@ -86,7 +86,7 @@ void  __init early_alloc_pgt_buf(void)
>> 	unsigned long tables = INIT_PGT_BUF_SIZE;
>> 	phys_addr_t base;
>> 
>>-	base = __pa(extend_brk(tables, PAGE_SIZE));
>>+	base = __pa(extend_brk(tables, PAGE_SIZE, false));
>> 
>> 	pgt_buf_start = base >> PAGE_SHIFT;
>> 	pgt_buf_end = pgt_buf_start;
>>diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>>index 193097e..2d5a34f 100644
>>--- a/arch/x86/xen/enlighten.c
>>+++ b/arch/x86/xen/enlighten.c
>>@@ -1629,7 +1629,7 @@ void __ref xen_hvm_init_shared_info(void)
>> 
>> 	if (!shared_info_page)
>> 		shared_info_page = (struct shared_info *)
>>-			extend_brk(PAGE_SIZE, PAGE_SIZE);
>>+			extend_brk(PAGE_SIZE, PAGE_SIZE, false);
>> 	xatp.domid = DOMID_SELF;
>> 	xatp.idx = 0;
>> 	xatp.space = XENMAPSPACE_shared_info;
>>diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
>>index fdc3ba2..573bc50 100644
>>--- a/arch/x86/xen/mmu.c
>>+++ b/arch/x86/xen/mmu.c
>>@@ -1768,7 +1768,7 @@ static void __init xen_map_identity_early(pmd_t
>>*pmd, unsigned long max_pfn)
>> 	unsigned long pfn;
>> 
>> 	level1_ident_pgt = extend_brk(sizeof(pte_t) * LEVEL1_IDENT_ENTRIES,
>>-				      PAGE_SIZE);
>>+				      PAGE_SIZE, false);
>> 
>> 	ident_pte = 0;
>> 	pfn = 0;
>>@@ -1980,7 +1980,7 @@ static void __init xen_write_cr3_init(unsigned
>>long cr3)
>> 	 * swapper_pg_dir.
>> 	 */
>> 	swapper_kernel_pmd =
>>-		extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE);
>>+		extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE, false);
>> 	copy_page(swapper_kernel_pmd, initial_kernel_pmd);
>> 	swapper_pg_dir[KERNEL_PGD_BOUNDARY] =
>> 		__pgd(__pa(swapper_kernel_pmd) | _PAGE_PRESENT);
>>@@ -2003,7 +2003,7 @@ void __init xen_setup_kernel_pagetable(pgd_t
>>*pgd, unsigned long max_pfn)
>> 	pmd_t *kernel_pmd;
>> 
>> 	initial_kernel_pmd =
>>-		extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE);
>>+		extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE, false);
>> 
>> 	max_pfn_mapped = PFN_DOWN(__pa(xen_start_info->pt_base) +
>> 				  xen_start_info->nr_pt_frames * PAGE_SIZE +
>>diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
>>index 95fb2aa..bbdcf20 100644
>>--- a/arch/x86/xen/p2m.c
>>+++ b/arch/x86/xen/p2m.c
>>@@ -281,13 +281,13 @@ void __ref xen_build_mfn_list_list(void)
>> 
>> 	/* Pre-initialize p2m_top_mfn to be completely missing */
>> 	if (p2m_top_mfn == NULL) {
>>-		p2m_mid_missing_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE);
>>+		p2m_mid_missing_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
>> 		p2m_mid_mfn_init(p2m_mid_missing_mfn);
>> 
>>-		p2m_top_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE);
>>+		p2m_top_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
>> 		p2m_top_mfn_p_init(p2m_top_mfn_p);
>> 
>>-		p2m_top_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE);
>>+		p2m_top_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
>> 		p2m_top_mfn_init(p2m_top_mfn);
>> 	} else {
>> 		/* Reinitialise, mfn's all change after migration */
>>@@ -322,7 +322,7 @@ void __ref xen_build_mfn_list_list(void)
>> 			 * runtime.  extend_brk() will BUG if we call
>> 			 * it too late.
>> 			 */
>>-			mid_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE);
>>+			mid_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
>> 			p2m_mid_mfn_init(mid_mfn_p);
>> 
>> 			p2m_top_mfn_p[topidx] = mid_mfn_p;
>>@@ -351,16 +351,16 @@ void __init
>>xen_build_dynamic_phys_to_machine(void)
>> 
>> 	xen_max_p2m_pfn = max_pfn;
>> 
>>-	p2m_missing = extend_brk(PAGE_SIZE, PAGE_SIZE);
>>+	p2m_missing = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
>> 	p2m_init(p2m_missing);
>> 
>>-	p2m_mid_missing = extend_brk(PAGE_SIZE, PAGE_SIZE);
>>+	p2m_mid_missing = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
>> 	p2m_mid_init(p2m_mid_missing);
>> 
>>-	p2m_top = extend_brk(PAGE_SIZE, PAGE_SIZE);
>>+	p2m_top = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
>> 	p2m_top_init(p2m_top);
>> 
>>-	p2m_identity = extend_brk(PAGE_SIZE, PAGE_SIZE);
>>+	p2m_identity = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
>> 	p2m_init(p2m_identity);
>> 
>> 	/*
>>@@ -373,7 +373,8 @@ void __init
>xen_build_dynamic_phys_to_machine(void)
>> 		unsigned mididx = p2m_mid_index(pfn);
>> 
>> 		if (p2m_top[topidx] == p2m_mid_missing) {
>>-			unsigned long **mid = extend_brk(PAGE_SIZE, PAGE_SIZE);
>>+			unsigned long **mid = extend_brk(PAGE_SIZE, PAGE_SIZE,
>>+							 false);
>> 			p2m_mid_init(mid);
>> 
>> 			p2m_top[topidx] = mid;
>>@@ -609,7 +610,7 @@ static bool __init early_alloc_p2m_middle(unsigned
>>long pfn, bool check_boundary
>> 		return false;
>> 
>> 	/* Boundary cross-over for the edges: */
>>-	p2m = extend_brk(PAGE_SIZE, PAGE_SIZE);
>>+	p2m = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
>> 
>> 	p2m_init(p2m);
>> 
>>@@ -635,7 +636,7 @@ static bool __init early_alloc_p2m(unsigned long
>>pfn)
>> 	mid = p2m_top[topidx];
>> 	mid_mfn_p = p2m_top_mfn_p[topidx];
>> 	if (mid == p2m_mid_missing) {
>>-		mid = extend_brk(PAGE_SIZE, PAGE_SIZE);
>>+		mid = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
>> 
>> 		p2m_mid_init(mid);
>> 
>>@@ -645,7 +646,7 @@ static bool __init early_alloc_p2m(unsigned long
>>pfn)
>> 	}
>> 	/* And the save/restore P2M tables.. */
>> 	if (mid_mfn_p == p2m_mid_missing_mfn) {
>>-		mid_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE);
>>+		mid_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE, false);
>> 		p2m_mid_mfn_init(mid_mfn_p);
>> 
>> 		p2m_top_mfn_p[topidx] = mid_mfn_p;
>>@@ -858,7 +859,7 @@ static void __init m2p_override_init(void)
>> 	unsigned i;
>> 
>>	m2p_overrides = extend_brk(sizeof(*m2p_overrides) *
>M2P_OVERRIDE_HASH,
>>-				   sizeof(unsigned long));
>>+				   sizeof(unsigned long), false);
>> 
>> 	for (i = 0; i < M2P_OVERRIDE_HASH; i++)
>> 		INIT_LIST_HEAD(&m2p_overrides[i]);
>>diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
>>index 4c1baa7..dff7fcc 100644
>>--- a/drivers/acpi/osl.c
>>+++ b/drivers/acpi/osl.c
>>@@ -563,7 +563,7 @@ RESERVE_BRK(acpi_override_tables_alloc,
>>ACPI_OVERRIDE_TABLES_SIZE);
>> void __init early_alloc_acpi_override_tables_buf(void)
>> {
>> 	acpi_tables_addr = __pa(extend_brk(ACPI_OVERRIDE_TABLES_SIZE,
>>-					   PAGE_SIZE));
>>+					   PAGE_SIZE, false));
>> }
>> 
>> void __init acpi_initrd_override(void *data, size_t size)

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-21 10:15 ` Tang Chen
@ 2013-08-21 13:06   ` Tejun Heo
  -1 siblings, 0 replies; 116+ messages in thread
From: Tejun Heo @ 2013-08-21 13:06 UTC (permalink / raw)
  To: Tang Chen
  Cc: konrad.wilk, robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa,
	akpm, trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy, x86, linux-doc, linux-kernel, linux-mm,
	linux-acpi

Hello,

On Wed, Aug 21, 2013 at 06:15:35PM +0800, Tang Chen wrote:
> [What are we doing]
> 
> We are trying to initialize acip tables as early as possible. But Linux kernel
> allows users to override acpi tables by specifying their own tables in initrd.
> So we have to do acpi_initrd_override() earlier first.

So, are we now back to making SRAT info as early as possible?  What
happened to just co-locating early allocations close to kernel image?
What'd be the benefit of doing this over that?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-21 13:06   ` Tejun Heo
  0 siblings, 0 replies; 116+ messages in thread
From: Tejun Heo @ 2013-08-21 13:06 UTC (permalink / raw)
  To: Tang Chen
  Cc: konrad.wilk, robert.moore, lv.zheng, rjw, lenb, tglx, mingo, hpa,
	akpm, trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy, x86, linux-doc, linux-kernel, linux-mm,
	linux-acpi

Hello,

On Wed, Aug 21, 2013 at 06:15:35PM +0800, Tang Chen wrote:
> [What are we doing]
> 
> We are trying to initialize acip tables as early as possible. But Linux kernel
> allows users to override acpi tables by specifying their own tables in initrd.
> So we have to do acpi_initrd_override() earlier first.

So, are we now back to making SRAT info as early as possible?  What
happened to just co-locating early allocations close to kernel image?
What'd be the benefit of doing this over that?

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 5/8] x86, brk: Make extend_brk() available with va/pa.
  2013-08-21 12:35       ` H. Peter Anvin
@ 2013-08-21 14:42         ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 116+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-08-21 14:42 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Tang Chen, robert.moore, lv.zheng, rjw, lenb, tglx, mingo, akpm,
	tj, trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy, x86, linux-doc, linux-kernel, linux-mm,
	linux-acpi

On Wed, Aug 21, 2013 at 02:35:36PM +0200, H. Peter Anvin wrote:
> Global symbols are inaccessible in physical mode.

Even if they are embedded in the assembler code and use GLOBAL(paging_enabled) ?

> 
> This is incidentally yet another example of "PV/weird platform violence", since in their absence it would be trivial to work around this by using segmentation.

I don't follow why it could not.

Why can't there be a __pa_symbol(paging_enabled) that is used. Won't
that in effect allow you to check the contents of that 'global
constant' even when you don't have paging enabled?

> >>As mentioned above, on 32bit before paging is enabled, we have to
> >>access variables
> >>with pa. So introduce a "bool is_phys" parameter to extend_brk(), and
> >>convert va
> >>to pa is it is true.
> >
> >Could you do it differently? Meaning have a global symbol
> >(paging_enabled) which will be used by most of the functions you
> >changed in this patch and the next ones? It would naturally be enabled
> >when paging is on and __va addresses can be used. 
> >
> >That could also be used in the printk case to do a BUG_ON before paging
> >is enabled on 32bit. Or perhaps use a different code path to deal with
> >using __pa address. 
> >
> >? 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 5/8] x86, brk: Make extend_brk() available with va/pa.
@ 2013-08-21 14:42         ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 116+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-08-21 14:42 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Tang Chen, robert.moore, lv.zheng, rjw, lenb, tglx, mingo, akpm,
	tj, trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy, x86, linux-doc, linux-kernel, linux-mm,
	linux-acpi

On Wed, Aug 21, 2013 at 02:35:36PM +0200, H. Peter Anvin wrote:
> Global symbols are inaccessible in physical mode.

Even if they are embedded in the assembler code and use GLOBAL(paging_enabled) ?

> 
> This is incidentally yet another example of "PV/weird platform violence", since in their absence it would be trivial to work around this by using segmentation.

I don't follow why it could not.

Why can't there be a __pa_symbol(paging_enabled) that is used. Won't
that in effect allow you to check the contents of that 'global
constant' even when you don't have paging enabled?

> >>As mentioned above, on 32bit before paging is enabled, we have to
> >>access variables
> >>with pa. So introduce a "bool is_phys" parameter to extend_brk(), and
> >>convert va
> >>to pa is it is true.
> >
> >Could you do it differently? Meaning have a global symbol
> >(paging_enabled) which will be used by most of the functions you
> >changed in this patch and the next ones? It would naturally be enabled
> >when paging is on and __va addresses can be used. 
> >
> >That could also be used in the printk case to do a BUG_ON before paging
> >is enabled on 32bit. Or perhaps use a different code path to deal with
> >using __pa address. 
> >
> >? 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-21 13:06   ` Tejun Heo
@ 2013-08-21 15:00     ` Zhang Yanfei
  -1 siblings, 0 replies; 116+ messages in thread
From: Zhang Yanfei @ 2013-08-21 15:00 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Tang Chen, konrad.wilk, robert.moore, lv.zheng, rjw, lenb, tglx,
	mingo, hpa, akpm, trenn, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy, x86, linux-doc, linux-kernel, linux-mm,
	linux-acpi

Hi tejun,

On 08/21/2013 09:06 PM, Tejun Heo wrote:
> Hello,
> 
> On Wed, Aug 21, 2013 at 06:15:35PM +0800, Tang Chen wrote:
>> [What are we doing]
>>
>> We are trying to initialize acip tables as early as possible. But Linux kernel
>> allows users to override acpi tables by specifying their own tables in initrd.
>> So we have to do acpi_initrd_override() earlier first.
> 
> So, are we now back to making SRAT info as early as possible?  What
> happened to just co-locating early allocations close to kernel image?
> What'd be the benefit of doing this over that?

We know you are trying to give the direction to make the change more natural and
robust and very thankful for your comments. We have taken your comments and suggestions
about co-locating early allocations close to kernel image into consideration, but
still we found that not that easy.

In current boot order, before we get the SRAT, we have a big consumer of early
allocations: we are setting up the page table in top-down (The idea was proposed by HPA,
Link: https://lkml.org/lkml/2012/10/4/701). That said, this kind of page table
setup will make the page tables as high as possible in memory, since memory at low 
addresses is precious (for stupid DMA devices, for things like  kexec/kdump, and so on.)

So if we are trying to make early allocations close to kernel image, we should
rewrite the way we are setting up page table totally. That is not a easy thing
to do.

As for the benefits of the patchset, just as Tang said in this patch,

* For memory hotplug, we need ACPI SRAT at early time to be aware of which memory
  ranges are hotpluggable, and tell the kernel to try to stay away from hotpluggable
  nodes.

This one is the current requirement of us but may be very helpful for future change:

* As suggested by Yinghai, we should allocate page tables in local node. This also
  needs SRAT before direct mapping page tables are setup.

* As mentioned by Toshi Kani <toshi.kani@hp.com>, ACPI SCPR/DBGP/DBG2 tables
  allow the OS to initialize serial console/debug ports at early boot time. The
  earlier it can be initialized, the better this feature will be.  These tables
  are not currently used by Linux due to a licensing issue, but it could be
  addressed some time soon.

So we decided to firstly make ACPI override earlier and use BRK (this is obviously
near the kernel image range) to store the found ACPI tables.

-- 
Thanks.
Zhang Yanfei

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-21 15:00     ` Zhang Yanfei
  0 siblings, 0 replies; 116+ messages in thread
From: Zhang Yanfei @ 2013-08-21 15:00 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Tang Chen, konrad.wilk, robert.moore, lv.zheng, rjw, lenb, tglx,
	mingo, hpa, akpm, trenn, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy, x86, linux-doc, linux-kernel, linux-mm,
	linux-acpi

Hi tejun,

On 08/21/2013 09:06 PM, Tejun Heo wrote:
> Hello,
> 
> On Wed, Aug 21, 2013 at 06:15:35PM +0800, Tang Chen wrote:
>> [What are we doing]
>>
>> We are trying to initialize acip tables as early as possible. But Linux kernel
>> allows users to override acpi tables by specifying their own tables in initrd.
>> So we have to do acpi_initrd_override() earlier first.
> 
> So, are we now back to making SRAT info as early as possible?  What
> happened to just co-locating early allocations close to kernel image?
> What'd be the benefit of doing this over that?

We know you are trying to give the direction to make the change more natural and
robust and very thankful for your comments. We have taken your comments and suggestions
about co-locating early allocations close to kernel image into consideration, but
still we found that not that easy.

In current boot order, before we get the SRAT, we have a big consumer of early
allocations: we are setting up the page table in top-down (The idea was proposed by HPA,
Link: https://lkml.org/lkml/2012/10/4/701). That said, this kind of page table
setup will make the page tables as high as possible in memory, since memory at low 
addresses is precious (for stupid DMA devices, for things like  kexec/kdump, and so on.)

So if we are trying to make early allocations close to kernel image, we should
rewrite the way we are setting up page table totally. That is not a easy thing
to do.

As for the benefits of the patchset, just as Tang said in this patch,

* For memory hotplug, we need ACPI SRAT at early time to be aware of which memory
  ranges are hotpluggable, and tell the kernel to try to stay away from hotpluggable
  nodes.

This one is the current requirement of us but may be very helpful for future change:

* As suggested by Yinghai, we should allocate page tables in local node. This also
  needs SRAT before direct mapping page tables are setup.

* As mentioned by Toshi Kani <toshi.kani@hp.com>, ACPI SCPR/DBGP/DBG2 tables
  allow the OS to initialize serial console/debug ports at early boot time. The
  earlier it can be initialized, the better this feature will be.  These tables
  are not currently used by Linux due to a licensing issue, but it could be
  addressed some time soon.

So we decided to firstly make ACPI override earlier and use BRK (this is obviously
near the kernel image range) to store the found ACPI tables.

-- 
Thanks.
Zhang Yanfei

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 5/8] x86, brk: Make extend_brk() available with va/pa.
  2013-08-21 14:42         ` Konrad Rzeszutek Wilk
@ 2013-08-21 15:04           ` H. Peter Anvin
  -1 siblings, 0 replies; 116+ messages in thread
From: H. Peter Anvin @ 2013-08-21 15:04 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Tang Chen, robert.moore, lv.zheng, rjw, lenb, tglx, mingo, akpm,
	tj, trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy, x86, linux-doc, linux-kernel, linux-mm,
	linux-acpi



Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>On Wed, Aug 21, 2013 at 02:35:36PM +0200, H. Peter Anvin wrote:
>> Global symbols are inaccessible in physical mode.
>
>Even if they are embedded in the assembler code and use
>GLOBAL(paging_enabled) ?

Yes, because the address is different in physical mode.  Think about it.  You could do a *function* like:

paging_enabled:
	mov (%esp),%edx
	xor %eax,%eax
	cmpl $PAGE_OFFSET,%edx
	setae %al
	ret

>> 
>> This is incidentally yet another example of "PV/weird platform
>violence", since in their absence it would be trivial to work around
>this by using segmentation.
>
>I don't follow why it could not.
>
>Why can't there be a __pa_symbol(paging_enabled) that is used. Won't
>that in effect allow you to check the contents of that 'global
>constant' even when you don't have paging enabled?

Yes.  But not once paging has been turned on.

>> >>As mentioned above, on 32bit before paging is enabled, we have to
>> >>access variables
>> >>with pa. So introduce a "bool is_phys" parameter to extend_brk(),
>and
>> >>convert va
>> >>to pa is it is true.
>> >
>> >Could you do it differently? Meaning have a global symbol
>> >(paging_enabled) which will be used by most of the functions you
>> >changed in this patch and the next ones? It would naturally be
>enabled
>> >when paging is on and __va addresses can be used. 
>> >
>> >That could also be used in the printk case to do a BUG_ON before
>paging
>> >is enabled on 32bit. Or perhaps use a different code path to deal
>with
>> >using __pa address. 
>> >
>> >? 

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 5/8] x86, brk: Make extend_brk() available with va/pa.
@ 2013-08-21 15:04           ` H. Peter Anvin
  0 siblings, 0 replies; 116+ messages in thread
From: H. Peter Anvin @ 2013-08-21 15:04 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Tang Chen, robert.moore, lv.zheng, rjw, lenb, tglx, mingo, akpm,
	tj, trenn, yinghai, jiang.liu, wency, laijs, isimatu.yasuaki,
	izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy, x86, linux-doc, linux-kernel, linux-mm,
	linux-acpi



Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>On Wed, Aug 21, 2013 at 02:35:36PM +0200, H. Peter Anvin wrote:
>> Global symbols are inaccessible in physical mode.
>
>Even if they are embedded in the assembler code and use
>GLOBAL(paging_enabled) ?

Yes, because the address is different in physical mode.  Think about it.  You could do a *function* like:

paging_enabled:
	mov (%esp),%edx
	xor %eax,%eax
	cmpl $PAGE_OFFSET,%edx
	setae %al
	ret

>> 
>> This is incidentally yet another example of "PV/weird platform
>violence", since in their absence it would be trivial to work around
>this by using segmentation.
>
>I don't follow why it could not.
>
>Why can't there be a __pa_symbol(paging_enabled) that is used. Won't
>that in effect allow you to check the contents of that 'global
>constant' even when you don't have paging enabled?

Yes.  But not once paging has been turned on.

>> >>As mentioned above, on 32bit before paging is enabled, we have to
>> >>access variables
>> >>with pa. So introduce a "bool is_phys" parameter to extend_brk(),
>and
>> >>convert va
>> >>to pa is it is true.
>> >
>> >Could you do it differently? Meaning have a global symbol
>> >(paging_enabled) which will be used by most of the functions you
>> >changed in this patch and the next ones? It would naturally be
>enabled
>> >when paging is on and __va addresses can be used. 
>> >
>> >That could also be used in the printk case to do a BUG_ON before
>paging
>> >is enabled on 32bit. Or perhaps use a different code path to deal
>with
>> >using __pa address. 
>> >
>> >? 

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-21 15:00     ` Zhang Yanfei
@ 2013-08-21 15:36       ` Tejun Heo
  -1 siblings, 0 replies; 116+ messages in thread
From: Tejun Heo @ 2013-08-21 15:36 UTC (permalink / raw)
  To: Zhang Yanfei
  Cc: Tang Chen, konrad.wilk, robert.moore, lv.zheng, rjw, lenb, tglx,
	mingo, hpa, akpm, trenn, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy, x86, linux-doc, linux-kernel, linux-mm,
	linux-acpi

Hello,

On Wed, Aug 21, 2013 at 11:00:26PM +0800, Zhang Yanfei wrote:
> In current boot order, before we get the SRAT, we have a big consumer of early
> allocations: we are setting up the page table in top-down (The idea was proposed by HPA,
> Link: https://lkml.org/lkml/2012/10/4/701). That said, this kind of page table
> setup will make the page tables as high as possible in memory, since memory at low 
> addresses is precious (for stupid DMA devices, for things like  kexec/kdump, and so on.)

With huge mappings, they are fairly small, right?  And this whole
thing needs a kernel param anyway at this point, so the allocation
direction can be made dependent on that or huge mapping availability
and, even with 4k mappings, we aren't talking about gigabytes of
memory, are we?

> So if we are trying to make early allocations close to kernel image, we should
> rewrite the way we are setting up page table totally. That is not a easy thing
> to do.

It has been a while since I looked at the code so can you please
elaborate why that is not easy?  It's pretty simple conceptually.

> * For memory hotplug, we need ACPI SRAT at early time to be aware of which memory
>   ranges are hotpluggable, and tell the kernel to try to stay away from hotpluggable
>   nodes.
> 
> This one is the current requirement of us but may be very helpful for future change:
> 
> * As suggested by Yinghai, we should allocate page tables in local node. This also
>   needs SRAT before direct mapping page tables are setup.

Does this even matter for huge mappings?

> * As mentioned by Toshi Kani <toshi.kani@hp.com>, ACPI SCPR/DBGP/DBG2 tables
>   allow the OS to initialize serial console/debug ports at early boot time. The
>   earlier it can be initialized, the better this feature will be.  These tables
>   are not currently used by Linux due to a licensing issue, but it could be
>   addressed some time soon.
> 
> So we decided to firstly make ACPI override earlier and use BRK (this is obviously
> near the kernel image range) to store the found ACPI tables.

I don't know.  The whole effort seems way overcomplicated compared to
the benefits it would bring.  For NUMA memory hotunplug, what's the
point of doing all this when the kernel doesn't have any control over
where its image is gonna be?  Some megabytes at the tail aren't gonna
make a huge difference and if you wanna do this properly, you need to
determine the load address of the kernel considering the node
boundaries and hotpluggability of each node, which has to happen
before the early kernel boot code executes.  And if there's a code
piece which does that, that might as well place the kernel image such
that extra allocation afterwards doesn't interfere with memory
hotunplugging.

It looks like a lot of code changes for a mechanism which doesn't seem
all that useful.  This code is already too late in boot sequence to be
a proper solution so I don't see the point in pushing the coverage to
the maximum from here.  It's kinda silly.

The last point - early init of debug facility - makes some sense but
again how extra coverage are we talking about?  The code path between
the two points is fairly short and the change doesn't come free.  It
means we add more fragile firmware-specific code path before the
execution environment is stable and get to do things like traveling
the same code paths multiple times in different environments.  Doesn't
seem like a win.  We want to reach stable execution environment as
soon as possible.  Shoving whole more logic before that in the name of
"earlier debugging" doesn't make a lot of sense.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-21 15:36       ` Tejun Heo
  0 siblings, 0 replies; 116+ messages in thread
From: Tejun Heo @ 2013-08-21 15:36 UTC (permalink / raw)
  To: Zhang Yanfei
  Cc: Tang Chen, konrad.wilk, robert.moore, lv.zheng, rjw, lenb, tglx,
	mingo, hpa, akpm, trenn, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy, x86, linux-doc, linux-kernel, linux-mm,
	linux-acpi

Hello,

On Wed, Aug 21, 2013 at 11:00:26PM +0800, Zhang Yanfei wrote:
> In current boot order, before we get the SRAT, we have a big consumer of early
> allocations: we are setting up the page table in top-down (The idea was proposed by HPA,
> Link: https://lkml.org/lkml/2012/10/4/701). That said, this kind of page table
> setup will make the page tables as high as possible in memory, since memory at low 
> addresses is precious (for stupid DMA devices, for things like  kexec/kdump, and so on.)

With huge mappings, they are fairly small, right?  And this whole
thing needs a kernel param anyway at this point, so the allocation
direction can be made dependent on that or huge mapping availability
and, even with 4k mappings, we aren't talking about gigabytes of
memory, are we?

> So if we are trying to make early allocations close to kernel image, we should
> rewrite the way we are setting up page table totally. That is not a easy thing
> to do.

It has been a while since I looked at the code so can you please
elaborate why that is not easy?  It's pretty simple conceptually.

> * For memory hotplug, we need ACPI SRAT at early time to be aware of which memory
>   ranges are hotpluggable, and tell the kernel to try to stay away from hotpluggable
>   nodes.
> 
> This one is the current requirement of us but may be very helpful for future change:
> 
> * As suggested by Yinghai, we should allocate page tables in local node. This also
>   needs SRAT before direct mapping page tables are setup.

Does this even matter for huge mappings?

> * As mentioned by Toshi Kani <toshi.kani@hp.com>, ACPI SCPR/DBGP/DBG2 tables
>   allow the OS to initialize serial console/debug ports at early boot time. The
>   earlier it can be initialized, the better this feature will be.  These tables
>   are not currently used by Linux due to a licensing issue, but it could be
>   addressed some time soon.
> 
> So we decided to firstly make ACPI override earlier and use BRK (this is obviously
> near the kernel image range) to store the found ACPI tables.

I don't know.  The whole effort seems way overcomplicated compared to
the benefits it would bring.  For NUMA memory hotunplug, what's the
point of doing all this when the kernel doesn't have any control over
where its image is gonna be?  Some megabytes at the tail aren't gonna
make a huge difference and if you wanna do this properly, you need to
determine the load address of the kernel considering the node
boundaries and hotpluggability of each node, which has to happen
before the early kernel boot code executes.  And if there's a code
piece which does that, that might as well place the kernel image such
that extra allocation afterwards doesn't interfere with memory
hotunplugging.

It looks like a lot of code changes for a mechanism which doesn't seem
all that useful.  This code is already too late in boot sequence to be
a proper solution so I don't see the point in pushing the coverage to
the maximum from here.  It's kinda silly.

The last point - early init of debug facility - makes some sense but
again how extra coverage are we talking about?  The code path between
the two points is fairly short and the change doesn't come free.  It
means we add more fragile firmware-specific code path before the
execution environment is stable and get to do things like traveling
the same code paths multiple times in different environments.  Doesn't
seem like a win.  We want to reach stable execution environment as
soon as possible.  Shoving whole more logic before that in the name of
"earlier debugging" doesn't make a lot of sense.

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-21 15:36       ` Tejun Heo
@ 2013-08-21 19:31         ` Toshi Kani
  -1 siblings, 0 replies; 116+ messages in thread
From: Toshi Kani @ 2013-08-21 19:31 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On Wed, 2013-08-21 at 11:36 -0400, Tejun Heo wrote:
> Hello,
> 
> On Wed, Aug 21, 2013 at 11:00:26PM +0800, Zhang Yanfei wrote:
> > In current boot order, before we get the SRAT, we have a big consumer of early
> > allocations: we are setting up the page table in top-down (The idea was proposed by HPA,
> > Link: https://lkml.org/lkml/2012/10/4/701). That said, this kind of page table
> > setup will make the page tables as high as possible in memory, since memory at low 
> > addresses is precious (for stupid DMA devices, for things like  kexec/kdump, and so on.)
> 
> With huge mappings, they are fairly small, right?  And this whole
> thing needs a kernel param anyway at this point, so the allocation
> direction can be made dependent on that or huge mapping availability
> and, even with 4k mappings, we aren't talking about gigabytes of
> memory, are we?
> 
> > So if we are trying to make early allocations close to kernel image, we should
> > rewrite the way we are setting up page table totally. That is not a easy thing
> > to do.
> 
> It has been a while since I looked at the code so can you please
> elaborate why that is not easy?  It's pretty simple conceptually.
> 
> > * For memory hotplug, we need ACPI SRAT at early time to be aware of which memory
> >   ranges are hotpluggable, and tell the kernel to try to stay away from hotpluggable
> >   nodes.
> > 
> > This one is the current requirement of us but may be very helpful for future change:
> > 
> > * As suggested by Yinghai, we should allocate page tables in local node. This also
> >   needs SRAT before direct mapping page tables are setup.
> 
> Does this even matter for huge mappings?
> 
> > * As mentioned by Toshi Kani <toshi.kani@hp.com>, ACPI SCPR/DBGP/DBG2 tables
> >   allow the OS to initialize serial console/debug ports at early boot time. The
> >   earlier it can be initialized, the better this feature will be.  These tables
> >   are not currently used by Linux due to a licensing issue, but it could be
> >   addressed some time soon.
> > 
> > So we decided to firstly make ACPI override earlier and use BRK (this is obviously
> > near the kernel image range) to store the found ACPI tables.
> 
> I don't know.  The whole effort seems way overcomplicated compared to
> the benefits it would bring.  For NUMA memory hotunplug, what's the
> point of doing all this when the kernel doesn't have any control over
> where its image is gonna be?  Some megabytes at the tail aren't gonna
> make a huge difference and if you wanna do this properly, you need to
> determine the load address of the kernel considering the node
> boundaries and hotpluggability of each node, which has to happen
> before the early kernel boot code executes.  And if there's a code
> piece which does that, that might as well place the kernel image such
> that extra allocation afterwards doesn't interfere with memory
> hotunplugging.
> 
> It looks like a lot of code changes for a mechanism which doesn't seem
> all that useful.  This code is already too late in boot sequence to be
> a proper solution so I don't see the point in pushing the coverage to
> the maximum from here.  It's kinda silly.
> 
> The last point - early init of debug facility - makes some sense but
> again how extra coverage are we talking about?  The code path between
> the two points is fairly short and the change doesn't come free.  It
> means we add more fragile firmware-specific code path before the
> execution environment is stable and get to do things like traveling
> the same code paths multiple times in different environments.  Doesn't
> seem like a win.  We want to reach stable execution environment as
> soon as possible.  Shoving whole more logic before that in the name of
> "earlier debugging" doesn't make a lot of sense.

Well, there is reason why we have earlyprintk feature today.  So, let's
not debate on this feature now.  There was previous attempt to support
this feature with ACPI tables below.  As described, it had the same
ordering issue.

https://lkml.org/lkml/2012/10/8/498

There is a basic problem that when we try to use ACPI tables that
extends or replaces legacy interfaces (ex. SRAT extending e820), we hit
this ordering issue because ACPI is not available as early as the legacy
interfaces.

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-21 19:31         ` Toshi Kani
  0 siblings, 0 replies; 116+ messages in thread
From: Toshi Kani @ 2013-08-21 19:31 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On Wed, 2013-08-21 at 11:36 -0400, Tejun Heo wrote:
> Hello,
> 
> On Wed, Aug 21, 2013 at 11:00:26PM +0800, Zhang Yanfei wrote:
> > In current boot order, before we get the SRAT, we have a big consumer of early
> > allocations: we are setting up the page table in top-down (The idea was proposed by HPA,
> > Link: https://lkml.org/lkml/2012/10/4/701). That said, this kind of page table
> > setup will make the page tables as high as possible in memory, since memory at low 
> > addresses is precious (for stupid DMA devices, for things like  kexec/kdump, and so on.)
> 
> With huge mappings, they are fairly small, right?  And this whole
> thing needs a kernel param anyway at this point, so the allocation
> direction can be made dependent on that or huge mapping availability
> and, even with 4k mappings, we aren't talking about gigabytes of
> memory, are we?
> 
> > So if we are trying to make early allocations close to kernel image, we should
> > rewrite the way we are setting up page table totally. That is not a easy thing
> > to do.
> 
> It has been a while since I looked at the code so can you please
> elaborate why that is not easy?  It's pretty simple conceptually.
> 
> > * For memory hotplug, we need ACPI SRAT at early time to be aware of which memory
> >   ranges are hotpluggable, and tell the kernel to try to stay away from hotpluggable
> >   nodes.
> > 
> > This one is the current requirement of us but may be very helpful for future change:
> > 
> > * As suggested by Yinghai, we should allocate page tables in local node. This also
> >   needs SRAT before direct mapping page tables are setup.
> 
> Does this even matter for huge mappings?
> 
> > * As mentioned by Toshi Kani <toshi.kani@hp.com>, ACPI SCPR/DBGP/DBG2 tables
> >   allow the OS to initialize serial console/debug ports at early boot time. The
> >   earlier it can be initialized, the better this feature will be.  These tables
> >   are not currently used by Linux due to a licensing issue, but it could be
> >   addressed some time soon.
> > 
> > So we decided to firstly make ACPI override earlier and use BRK (this is obviously
> > near the kernel image range) to store the found ACPI tables.
> 
> I don't know.  The whole effort seems way overcomplicated compared to
> the benefits it would bring.  For NUMA memory hotunplug, what's the
> point of doing all this when the kernel doesn't have any control over
> where its image is gonna be?  Some megabytes at the tail aren't gonna
> make a huge difference and if you wanna do this properly, you need to
> determine the load address of the kernel considering the node
> boundaries and hotpluggability of each node, which has to happen
> before the early kernel boot code executes.  And if there's a code
> piece which does that, that might as well place the kernel image such
> that extra allocation afterwards doesn't interfere with memory
> hotunplugging.
> 
> It looks like a lot of code changes for a mechanism which doesn't seem
> all that useful.  This code is already too late in boot sequence to be
> a proper solution so I don't see the point in pushing the coverage to
> the maximum from here.  It's kinda silly.
> 
> The last point - early init of debug facility - makes some sense but
> again how extra coverage are we talking about?  The code path between
> the two points is fairly short and the change doesn't come free.  It
> means we add more fragile firmware-specific code path before the
> execution environment is stable and get to do things like traveling
> the same code paths multiple times in different environments.  Doesn't
> seem like a win.  We want to reach stable execution environment as
> soon as possible.  Shoving whole more logic before that in the name of
> "earlier debugging" doesn't make a lot of sense.

Well, there is reason why we have earlyprintk feature today.  So, let's
not debate on this feature now.  There was previous attempt to support
this feature with ACPI tables below.  As described, it had the same
ordering issue.

https://lkml.org/lkml/2012/10/8/498

There is a basic problem that when we try to use ACPI tables that
extends or replaces legacy interfaces (ex. SRAT extending e820), we hit
this ordering issue because ACPI is not available as early as the legacy
interfaces.

Thanks,
-Toshi

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-21 19:31         ` Toshi Kani
@ 2013-08-21 19:54           ` Tejun Heo
  -1 siblings, 0 replies; 116+ messages in thread
From: Tejun Heo @ 2013-08-21 19:54 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

Hello, Toshi.

On Wed, Aug 21, 2013 at 01:31:43PM -0600, Toshi Kani wrote:
> Well, there is reason why we have earlyprintk feature today.  So, let's
> not debate on this feature now.  There was previous attempt to support

Are you saying the existing earlyprintk automatically justifies
addition of more complex mechanism?  The added complex of course
should be traded off against the benefits of gaining ACPI based early
boot.  You aren't gonna suggest implementing netconsole based
earlyprintk, right?

> this feature with ACPI tables below.  As described, it had the same
> ordering issue.
> 
> https://lkml.org/lkml/2012/10/8/498
> 
> There is a basic problem that when we try to use ACPI tables that
> extends or replaces legacy interfaces (ex. SRAT extending e820), we hit
> this ordering issue because ACPI is not available as early as the legacy
> interfaces.

Do we even want ACPI parsing and all that that early?  Parsing SRAT
early doesn't buy us much and I'm not sure whether adding ACPI
earlyprintk would increase or decrease debuggability during earlyboot.
It adds whole lot more code paths where things can go wrong while the
basic execution environment is unstable.  Why do that?

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-21 19:54           ` Tejun Heo
  0 siblings, 0 replies; 116+ messages in thread
From: Tejun Heo @ 2013-08-21 19:54 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

Hello, Toshi.

On Wed, Aug 21, 2013 at 01:31:43PM -0600, Toshi Kani wrote:
> Well, there is reason why we have earlyprintk feature today.  So, let's
> not debate on this feature now.  There was previous attempt to support

Are you saying the existing earlyprintk automatically justifies
addition of more complex mechanism?  The added complex of course
should be traded off against the benefits of gaining ACPI based early
boot.  You aren't gonna suggest implementing netconsole based
earlyprintk, right?

> this feature with ACPI tables below.  As described, it had the same
> ordering issue.
> 
> https://lkml.org/lkml/2012/10/8/498
> 
> There is a basic problem that when we try to use ACPI tables that
> extends or replaces legacy interfaces (ex. SRAT extending e820), we hit
> this ordering issue because ACPI is not available as early as the legacy
> interfaces.

Do we even want ACPI parsing and all that that early?  Parsing SRAT
early doesn't buy us much and I'm not sure whether adding ACPI
earlyprintk would increase or decrease debuggability during earlyboot.
It adds whole lot more code paths where things can go wrong while the
basic execution environment is unstable.  Why do that?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-21 19:54           ` Tejun Heo
@ 2013-08-21 20:29             ` Toshi Kani
  -1 siblings, 0 replies; 116+ messages in thread
From: Toshi Kani @ 2013-08-21 20:29 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

Hello Tejun,

On Wed, 2013-08-21 at 15:54 -0400, Tejun Heo wrote:
> On Wed, Aug 21, 2013 at 01:31:43PM -0600, Toshi Kani wrote:
> > Well, there is reason why we have earlyprintk feature today.  So, let's
> > not debate on this feature now.  There was previous attempt to support
> 
> Are you saying the existing earlyprintk automatically justifies
> addition of more complex mechanism?  The added complex of course
> should be traded off against the benefits of gaining ACPI based early
> boot.  You aren't gonna suggest implementing netconsole based
> earlyprintk, right?

Platforms vendors (which care Linux) need to support the existing Linux
features.  This means that they have to implement legacy interfaces on
x86 until the kernel supports an alternative method.  For instance, some
platforms are legacy-free and do not have legacy COM ports.  These ACPI
tables were defined so that non-legacy COM ports can be described and
informed to the OS.  Without this support, such platforms may have to
emulate the legacy COM ports for Linux, or drop Linux support.

> > this feature with ACPI tables below.  As described, it had the same
> > ordering issue.
> > 
> > https://lkml.org/lkml/2012/10/8/498
> > 
> > There is a basic problem that when we try to use ACPI tables that
> > extends or replaces legacy interfaces (ex. SRAT extending e820), we hit
> > this ordering issue because ACPI is not available as early as the legacy
> > interfaces.
> 
> Do we even want ACPI parsing and all that that early?  Parsing SRAT
> early doesn't buy us much and I'm not sure whether adding ACPI
> earlyprintk would increase or decrease debuggability during earlyboot.
> It adds whole lot more code paths where things can go wrong while the
> basic execution environment is unstable.  Why do that?

I think the kernel boot-up sequence should be designed in such a way
that can support legacy-free and/or NUMA platforms properly.

Thanks,
-Toshi



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-21 20:29             ` Toshi Kani
  0 siblings, 0 replies; 116+ messages in thread
From: Toshi Kani @ 2013-08-21 20:29 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

Hello Tejun,

On Wed, 2013-08-21 at 15:54 -0400, Tejun Heo wrote:
> On Wed, Aug 21, 2013 at 01:31:43PM -0600, Toshi Kani wrote:
> > Well, there is reason why we have earlyprintk feature today.  So, let's
> > not debate on this feature now.  There was previous attempt to support
> 
> Are you saying the existing earlyprintk automatically justifies
> addition of more complex mechanism?  The added complex of course
> should be traded off against the benefits of gaining ACPI based early
> boot.  You aren't gonna suggest implementing netconsole based
> earlyprintk, right?

Platforms vendors (which care Linux) need to support the existing Linux
features.  This means that they have to implement legacy interfaces on
x86 until the kernel supports an alternative method.  For instance, some
platforms are legacy-free and do not have legacy COM ports.  These ACPI
tables were defined so that non-legacy COM ports can be described and
informed to the OS.  Without this support, such platforms may have to
emulate the legacy COM ports for Linux, or drop Linux support.

> > this feature with ACPI tables below.  As described, it had the same
> > ordering issue.
> > 
> > https://lkml.org/lkml/2012/10/8/498
> > 
> > There is a basic problem that when we try to use ACPI tables that
> > extends or replaces legacy interfaces (ex. SRAT extending e820), we hit
> > this ordering issue because ACPI is not available as early as the legacy
> > interfaces.
> 
> Do we even want ACPI parsing and all that that early?  Parsing SRAT
> early doesn't buy us much and I'm not sure whether adding ACPI
> earlyprintk would increase or decrease debuggability during earlyboot.
> It adds whole lot more code paths where things can go wrong while the
> basic execution environment is unstable.  Why do that?

I think the kernel boot-up sequence should be designed in such a way
that can support legacy-free and/or NUMA platforms properly.

Thanks,
-Toshi


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-21 20:29             ` Toshi Kani
@ 2013-08-21 20:40               ` Tejun Heo
  -1 siblings, 0 replies; 116+ messages in thread
From: Tejun Heo @ 2013-08-21 20:40 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

Hello, Toshi.

On Wed, Aug 21, 2013 at 02:29:28PM -0600, Toshi Kani wrote:
> Platforms vendors (which care Linux) need to support the existing Linux
> features.  This means that they have to implement legacy interfaces on
> x86 until the kernel supports an alternative method.  For instance, some
> platforms are legacy-free and do not have legacy COM ports.  These ACPI
> tables were defined so that non-legacy COM ports can be described and
> informed to the OS.  Without this support, such platforms may have to
> emulate the legacy COM ports for Linux, or drop Linux support.

Are you seriously saying that vendors are gonna drop linux support for
lacking ACPI earlyprintk support?  Please...

Please take a look at the existing earlyprintk code and how compact
and self-contained they are.  If you want to add ACPI earlyprintk, do
similar stuff.  Forget about firmware blob override from initrd or
ACPICA.  Just implement the bare minimum to get the thing working.  Do
not add dependency to large body of code from earlyboot.  It's a bad
idea through and through.

> I think the kernel boot-up sequence should be designed in such a way
> that can support legacy-free and/or NUMA platforms properly.

Blanket statements like the above don't mean much.  There are many
separate stages of boot and you're talking about one of the very first
stages where we traditionally have always depended upon only the very
bare minimum of the platform both in hardware itself and configuration
information.  We've been doing that for *very* good reasons.  If you
screw up there, it's mighty tricky to figure out what went wrong
especially on the machines that you can't physically kick.  You're now
suggesting to add whole ACPI parsing including overloading from initrd
into that stage with pretty weak rationale.

Seriously, if you want ACPI based earlyprintk, implement it in a
discrete minimal code which is easy to verify and won't get affected
when the rest of ACPI machinery is updated.  We really don't want
earlyboot to fail because someone screwed up ACPI or initrd handling.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-21 20:40               ` Tejun Heo
  0 siblings, 0 replies; 116+ messages in thread
From: Tejun Heo @ 2013-08-21 20:40 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

Hello, Toshi.

On Wed, Aug 21, 2013 at 02:29:28PM -0600, Toshi Kani wrote:
> Platforms vendors (which care Linux) need to support the existing Linux
> features.  This means that they have to implement legacy interfaces on
> x86 until the kernel supports an alternative method.  For instance, some
> platforms are legacy-free and do not have legacy COM ports.  These ACPI
> tables were defined so that non-legacy COM ports can be described and
> informed to the OS.  Without this support, such platforms may have to
> emulate the legacy COM ports for Linux, or drop Linux support.

Are you seriously saying that vendors are gonna drop linux support for
lacking ACPI earlyprintk support?  Please...

Please take a look at the existing earlyprintk code and how compact
and self-contained they are.  If you want to add ACPI earlyprintk, do
similar stuff.  Forget about firmware blob override from initrd or
ACPICA.  Just implement the bare minimum to get the thing working.  Do
not add dependency to large body of code from earlyboot.  It's a bad
idea through and through.

> I think the kernel boot-up sequence should be designed in such a way
> that can support legacy-free and/or NUMA platforms properly.

Blanket statements like the above don't mean much.  There are many
separate stages of boot and you're talking about one of the very first
stages where we traditionally have always depended upon only the very
bare minimum of the platform both in hardware itself and configuration
information.  We've been doing that for *very* good reasons.  If you
screw up there, it's mighty tricky to figure out what went wrong
especially on the machines that you can't physically kick.  You're now
suggesting to add whole ACPI parsing including overloading from initrd
into that stage with pretty weak rationale.

Seriously, if you want ACPI based earlyprintk, implement it in a
discrete minimal code which is easy to verify and won't get affected
when the rest of ACPI machinery is updated.  We really don't want
earlyboot to fail because someone screwed up ACPI or initrd handling.

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-21 20:40               ` Tejun Heo
@ 2013-08-21 22:36                 ` Toshi Kani
  -1 siblings, 0 replies; 116+ messages in thread
From: Toshi Kani @ 2013-08-21 22:36 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

Hello Tejun,

On Wed, 2013-08-21 at 16:40 -0400, Tejun Heo wrote:
> On Wed, Aug 21, 2013 at 02:29:28PM -0600, Toshi Kani wrote:
> > Platforms vendors (which care Linux) need to support the existing Linux
> > features.  This means that they have to implement legacy interfaces on
> > x86 until the kernel supports an alternative method.  For instance, some
> > platforms are legacy-free and do not have legacy COM ports.  These ACPI
> > tables were defined so that non-legacy COM ports can be described and
> > informed to the OS.  Without this support, such platforms may have to
> > emulate the legacy COM ports for Linux, or drop Linux support.
> 
> Are you seriously saying that vendors are gonna drop linux support for
> lacking ACPI earlyprintk support?  Please...

earlyprintk is an example of the issues.  The point is that vendors are
required to support legacy stuff for Linux.

> Please take a look at the existing earlyprintk code and how compact
> and self-contained they are.  If you want to add ACPI earlyprintk, do
> similar stuff.  Forget about firmware blob override from initrd or
> ACPICA.  Just implement the bare minimum to get the thing working.  Do
> not add dependency to large body of code from earlyboot.  It's a bad
> idea through and through.

I am not saying that ACPI earlyprintk must be available at exactly the
same point.  How early it can reasonably be is a subject of discussion.

> > I think the kernel boot-up sequence should be designed in such a way
> > that can support legacy-free and/or NUMA platforms properly.
> 
> Blanket statements like the above don't mean much.  There are many
> separate stages of boot and you're talking about one of the very first
> stages where we traditionally have always depended upon only the very
> bare minimum of the platform both in hardware itself and configuration
> information.  We've been doing that for *very* good reasons.  If you
> screw up there, it's mighty tricky to figure out what went wrong
> especially on the machines that you can't physically kick.  You're now
> suggesting to add whole ACPI parsing including overloading from initrd
> into that stage with pretty weak rationale.

I agree that ACPI is rather complicated stuff.  But in my experience,
the majority complication comes from ACPI namespace and methods, not
from ACPI tables.  Do you really think ACPI table init is that risky?  I
consider ACPI tables are part of the minimum config info, esp. for
legacy-free platforms.

> Seriously, if you want ACPI based earlyprintk, implement it in a
> discrete minimal code which is easy to verify and won't get affected
> when the rest of ACPI machinery is updated.  We really don't want
> earlyboot to fail because someone screwed up ACPI or initrd handling.

earlyprintk is just another example to this SRAT issue.  The local page
table is yet another example.  My hope here is for us to be able to
utilize ACPI tables properly without hitting this kind of ordering
issues again and again, which requires considerable time & effort to
address.

Thanks,
-Toshi


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-21 22:36                 ` Toshi Kani
  0 siblings, 0 replies; 116+ messages in thread
From: Toshi Kani @ 2013-08-21 22:36 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

Hello Tejun,

On Wed, 2013-08-21 at 16:40 -0400, Tejun Heo wrote:
> On Wed, Aug 21, 2013 at 02:29:28PM -0600, Toshi Kani wrote:
> > Platforms vendors (which care Linux) need to support the existing Linux
> > features.  This means that they have to implement legacy interfaces on
> > x86 until the kernel supports an alternative method.  For instance, some
> > platforms are legacy-free and do not have legacy COM ports.  These ACPI
> > tables were defined so that non-legacy COM ports can be described and
> > informed to the OS.  Without this support, such platforms may have to
> > emulate the legacy COM ports for Linux, or drop Linux support.
> 
> Are you seriously saying that vendors are gonna drop linux support for
> lacking ACPI earlyprintk support?  Please...

earlyprintk is an example of the issues.  The point is that vendors are
required to support legacy stuff for Linux.

> Please take a look at the existing earlyprintk code and how compact
> and self-contained they are.  If you want to add ACPI earlyprintk, do
> similar stuff.  Forget about firmware blob override from initrd or
> ACPICA.  Just implement the bare minimum to get the thing working.  Do
> not add dependency to large body of code from earlyboot.  It's a bad
> idea through and through.

I am not saying that ACPI earlyprintk must be available at exactly the
same point.  How early it can reasonably be is a subject of discussion.

> > I think the kernel boot-up sequence should be designed in such a way
> > that can support legacy-free and/or NUMA platforms properly.
> 
> Blanket statements like the above don't mean much.  There are many
> separate stages of boot and you're talking about one of the very first
> stages where we traditionally have always depended upon only the very
> bare minimum of the platform both in hardware itself and configuration
> information.  We've been doing that for *very* good reasons.  If you
> screw up there, it's mighty tricky to figure out what went wrong
> especially on the machines that you can't physically kick.  You're now
> suggesting to add whole ACPI parsing including overloading from initrd
> into that stage with pretty weak rationale.

I agree that ACPI is rather complicated stuff.  But in my experience,
the majority complication comes from ACPI namespace and methods, not
from ACPI tables.  Do you really think ACPI table init is that risky?  I
consider ACPI tables are part of the minimum config info, esp. for
legacy-free platforms.

> Seriously, if you want ACPI based earlyprintk, implement it in a
> discrete minimal code which is easy to verify and won't get affected
> when the rest of ACPI machinery is updated.  We really don't want
> earlyboot to fail because someone screwed up ACPI or initrd handling.

earlyprintk is just another example to this SRAT issue.  The local page
table is yet another example.  My hope here is for us to be able to
utilize ACPI tables properly without hitting this kind of ordering
issues again and again, which requires considerable time & effort to
address.

Thanks,
-Toshi



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-21 22:36                 ` Toshi Kani
@ 2013-08-22  3:32                   ` Tejun Heo
  -1 siblings, 0 replies; 116+ messages in thread
From: Tejun Heo @ 2013-08-22  3:32 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

Hello,

On Wed, Aug 21, 2013 at 04:36:35PM -0600, Toshi Kani wrote:
> I agree that ACPI is rather complicated stuff.  But in my experience,
> the majority complication comes from ACPI namespace and methods, not
> from ACPI tables.  Do you really think ACPI table init is that risky?  I
> consider ACPI tables are part of the minimum config info, esp. for
> legacy-free platforms.

It's just that we're talking about the very first stage of boot.  We
really don't do much there and pulling in ACPI code into that stage is
a lot by comparison.  If that's gonna happen, it needs pretty strong
justification.

> earlyprintk is just another example to this SRAT issue.  The local page
> table is yet another example.  My hope here is for us to be able to
> utilize ACPI tables properly without hitting this kind of ordering
> issues again and again, which requires considerable time & effort to
> address.

So, the two things brought up at this point are early parsing of SRAT,
which can't really solve the problem at hand anyway, and earlyprintk
which should be implemented in minimal way which is not activated
unless specifically enabled with earlyprintk boot param.  Neither
seems to justify pulling in full ACPI into early boot, right?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-22  3:32                   ` Tejun Heo
  0 siblings, 0 replies; 116+ messages in thread
From: Tejun Heo @ 2013-08-22  3:32 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

Hello,

On Wed, Aug 21, 2013 at 04:36:35PM -0600, Toshi Kani wrote:
> I agree that ACPI is rather complicated stuff.  But in my experience,
> the majority complication comes from ACPI namespace and methods, not
> from ACPI tables.  Do you really think ACPI table init is that risky?  I
> consider ACPI tables are part of the minimum config info, esp. for
> legacy-free platforms.

It's just that we're talking about the very first stage of boot.  We
really don't do much there and pulling in ACPI code into that stage is
a lot by comparison.  If that's gonna happen, it needs pretty strong
justification.

> earlyprintk is just another example to this SRAT issue.  The local page
> table is yet another example.  My hope here is for us to be able to
> utilize ACPI tables properly without hitting this kind of ordering
> issues again and again, which requires considerable time & effort to
> address.

So, the two things brought up at this point are early parsing of SRAT,
which can't really solve the problem at hand anyway, and earlyprintk
which should be implemented in minimal way which is not activated
unless specifically enabled with earlyprintk boot param.  Neither
seems to justify pulling in full ACPI into early boot, right?

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-22  3:32                   ` Tejun Heo
@ 2013-08-22 15:52                     ` Toshi Kani
  -1 siblings, 0 replies; 116+ messages in thread
From: Toshi Kani @ 2013-08-22 15:52 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

Hello Tejun,

On Wed, 2013-08-21 at 23:32 -0400, Tejun Heo wrote:
> On Wed, Aug 21, 2013 at 04:36:35PM -0600, Toshi Kani wrote:
> > I agree that ACPI is rather complicated stuff.  But in my experience,
> > the majority complication comes from ACPI namespace and methods, not
> > from ACPI tables.  Do you really think ACPI table init is that risky?  I
> > consider ACPI tables are part of the minimum config info, esp. for
> > legacy-free platforms.
> 
> It's just that we're talking about the very first stage of boot.  We
> really don't do much there and pulling in ACPI code into that stage is
> a lot by comparison.  If that's gonna happen, it needs pretty strong
> justification.

It moves up the ACPI table init code, which itself is simple.  And ACPI
tables are defined to be pursed at early boot-time, which is why they
exist in addition to ACPI namespace/methods.  They are similar to EFI
memory table.  Firmware publishes tables in one way or the other.

I understand that you are concerned about stability of the ACPI stuff,
which I think is a valid point, but most of (if not all) of the
ACPI-related issues come from ACPI namespace/methods, which is a very
different thing.  Please do not mix up those two.  The ACPI
namespace/methods stuff remains the same and continues to be initialized
at very late in the boot sequence.

What's making the patchset complicated is acpi_initrd_override(), which
is intended for developers and allows overwriting ACPI bits at their own
risk.  This feature won't be used by regular users. 

> > earlyprintk is just another example to this SRAT issue.  The local page
> > table is yet another example.  My hope here is for us to be able to
> > utilize ACPI tables properly without hitting this kind of ordering
> > issues again and again, which requires considerable time & effort to
> > address.
> 
> So, the two things brought up at this point are early parsing of SRAT,
> which can't really solve the problem at hand anyway, 

If you are referring the issue of kernel image location, it is a
limitation in the current implementation, not a technical limitation.  I
know other OS that supports movable memory and puts the kernel image
into a movable memory with SRAT by changing the bootloader.

Also, how do you support local page tables without pursing SRAT early?

> and earlyprintk
> which should be implemented in minimal way which is not activated
> unless specifically enabled with earlyprintk boot param.  Neither
> seems to justify pulling in full ACPI into early boot, right?

Initializing page tables on large systems may take a long time, and I do
think that earlyprink needs to be available before that point.

Thanks,
-Toshi

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-22 15:52                     ` Toshi Kani
  0 siblings, 0 replies; 116+ messages in thread
From: Toshi Kani @ 2013-08-22 15:52 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

Hello Tejun,

On Wed, 2013-08-21 at 23:32 -0400, Tejun Heo wrote:
> On Wed, Aug 21, 2013 at 04:36:35PM -0600, Toshi Kani wrote:
> > I agree that ACPI is rather complicated stuff.  But in my experience,
> > the majority complication comes from ACPI namespace and methods, not
> > from ACPI tables.  Do you really think ACPI table init is that risky?  I
> > consider ACPI tables are part of the minimum config info, esp. for
> > legacy-free platforms.
> 
> It's just that we're talking about the very first stage of boot.  We
> really don't do much there and pulling in ACPI code into that stage is
> a lot by comparison.  If that's gonna happen, it needs pretty strong
> justification.

It moves up the ACPI table init code, which itself is simple.  And ACPI
tables are defined to be pursed at early boot-time, which is why they
exist in addition to ACPI namespace/methods.  They are similar to EFI
memory table.  Firmware publishes tables in one way or the other.

I understand that you are concerned about stability of the ACPI stuff,
which I think is a valid point, but most of (if not all) of the
ACPI-related issues come from ACPI namespace/methods, which is a very
different thing.  Please do not mix up those two.  The ACPI
namespace/methods stuff remains the same and continues to be initialized
at very late in the boot sequence.

What's making the patchset complicated is acpi_initrd_override(), which
is intended for developers and allows overwriting ACPI bits at their own
risk.  This feature won't be used by regular users. 

> > earlyprintk is just another example to this SRAT issue.  The local page
> > table is yet another example.  My hope here is for us to be able to
> > utilize ACPI tables properly without hitting this kind of ordering
> > issues again and again, which requires considerable time & effort to
> > address.
> 
> So, the two things brought up at this point are early parsing of SRAT,
> which can't really solve the problem at hand anyway, 

If you are referring the issue of kernel image location, it is a
limitation in the current implementation, not a technical limitation.  I
know other OS that supports movable memory and puts the kernel image
into a movable memory with SRAT by changing the bootloader.

Also, how do you support local page tables without pursing SRAT early?

> and earlyprintk
> which should be implemented in minimal way which is not activated
> unless specifically enabled with earlyprintk boot param.  Neither
> seems to justify pulling in full ACPI into early boot, right?

Initializing page tables on large systems may take a long time, and I do
think that earlyprink needs to be available before that point.

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-22 15:52                     ` Toshi Kani
@ 2013-08-22 18:31                       ` Tejun Heo
  -1 siblings, 0 replies; 116+ messages in thread
From: Tejun Heo @ 2013-08-22 18:31 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

Hello,

On Thu, Aug 22, 2013 at 09:52:09AM -0600, Toshi Kani wrote:
> I understand that you are concerned about stability of the ACPI stuff,
> which I think is a valid point, but most of (if not all) of the
> ACPI-related issues come from ACPI namespace/methods, which is a very
> different thing.  Please do not mix up those two.  The ACPI

I have no objection to implementing self-conftained earlyprintk
support.  If that's all you want to do, please go ahead but do not
pull in initrd override or ACPICA into it.

> namespace/methods stuff remains the same and continues to be initialized
> at very late in the boot sequence.
> 
> What's making the patchset complicated is acpi_initrd_override(), which
> is intended for developers and allows overwriting ACPI bits at their own
> risk.  This feature won't be used by regular users. 

Yeah, please forget about that in earlyboot.  It doesn't make any
sense to fiddle with initrd that early during boot.

> If you are referring the issue of kernel image location, it is a
> limitation in the current implementation, not a technical limitation.  I
> know other OS that supports movable memory and puts the kernel image
> into a movable memory with SRAT by changing the bootloader.

I'm not saying that problem shouldn't be solved.  I'm saying what you
guys are pushing doesn't help solving it at all.  It's too late in the
boot process.  It needs to be handled either by bootloader or earlier
kernel kexecing the actual one and super-early SRAT doens't help at
all in either case, so what's the point of pulling ACPI code in when
it doesn't contribute to solving the problem properly?

> Also, how do you support local page tables without pursing SRAT early?

Does it even matter with huge mappings?  It's gonna be contained in a
single page anyway, right?

> Initializing page tables on large systems may take a long time, and I do
> think that earlyprink needs to be available before that point.

Yeah, sure, implement it in *minimal* way which doesn't affect
anything if not explicitly enabled by kernel param like other
earlyprintks.  It doens't make any sense to add dependency to acpi
from early boot for that.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-22 18:31                       ` Tejun Heo
  0 siblings, 0 replies; 116+ messages in thread
From: Tejun Heo @ 2013-08-22 18:31 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

Hello,

On Thu, Aug 22, 2013 at 09:52:09AM -0600, Toshi Kani wrote:
> I understand that you are concerned about stability of the ACPI stuff,
> which I think is a valid point, but most of (if not all) of the
> ACPI-related issues come from ACPI namespace/methods, which is a very
> different thing.  Please do not mix up those two.  The ACPI

I have no objection to implementing self-conftained earlyprintk
support.  If that's all you want to do, please go ahead but do not
pull in initrd override or ACPICA into it.

> namespace/methods stuff remains the same and continues to be initialized
> at very late in the boot sequence.
> 
> What's making the patchset complicated is acpi_initrd_override(), which
> is intended for developers and allows overwriting ACPI bits at their own
> risk.  This feature won't be used by regular users. 

Yeah, please forget about that in earlyboot.  It doesn't make any
sense to fiddle with initrd that early during boot.

> If you are referring the issue of kernel image location, it is a
> limitation in the current implementation, not a technical limitation.  I
> know other OS that supports movable memory and puts the kernel image
> into a movable memory with SRAT by changing the bootloader.

I'm not saying that problem shouldn't be solved.  I'm saying what you
guys are pushing doesn't help solving it at all.  It's too late in the
boot process.  It needs to be handled either by bootloader or earlier
kernel kexecing the actual one and super-early SRAT doens't help at
all in either case, so what's the point of pulling ACPI code in when
it doesn't contribute to solving the problem properly?

> Also, how do you support local page tables without pursing SRAT early?

Does it even matter with huge mappings?  It's gonna be contained in a
single page anyway, right?

> Initializing page tables on large systems may take a long time, and I do
> think that earlyprink needs to be available before that point.

Yeah, sure, implement it in *minimal* way which doesn't affect
anything if not explicitly enabled by kernel param like other
earlyprintks.  It doens't make any sense to add dependency to acpi
from early boot for that.

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-22 18:31                       ` Tejun Heo
@ 2013-08-22 19:39                         ` Zhang Yanfei
  -1 siblings, 0 replies; 116+ messages in thread
From: Zhang Yanfei @ 2013-08-22 19:39 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Toshi Kani, Tang Chen, konrad.wilk, robert.moore, lv.zheng, rjw,
	lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu, wency,
	laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86,
	gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy, x86, linux-doc, linux-kernel, linux-mm,
	linux-acpi

Hello tejun,

On 08/23/2013 02:31 AM, Tejun Heo wrote:
> Hello,
> 
> On Thu, Aug 22, 2013 at 09:52:09AM -0600, Toshi Kani wrote:
>> I understand that you are concerned about stability of the ACPI stuff,
>> which I think is a valid point, but most of (if not all) of the
>> ACPI-related issues come from ACPI namespace/methods, which is a very
>> different thing.  Please do not mix up those two.  The ACPI
> 
> I have no objection to implementing self-conftained earlyprintk
> support.  If that's all you want to do, please go ahead but do not
> pull in initrd override or ACPICA into it.
> 
>> namespace/methods stuff remains the same and continues to be initialized
>> at very late in the boot sequence.
>>
>> What's making the patchset complicated is acpi_initrd_override(), which
>> is intended for developers and allows overwriting ACPI bits at their own
>> risk.  This feature won't be used by regular users. 
> 
> Yeah, please forget about that in earlyboot.  It doesn't make any
> sense to fiddle with initrd that early during boot.

What do you mean by "earlyboot"? And also in your previous mail, I am also
a little confused by what you said "the very first stage of boot". Does
this mean the stage we are in head_32 or head64.c?

If so, could we just do something just as Yinghai did before, that is, Split
acpi_override into 2 parts: find and copy. And in "earlyboot", we just do
the find, and I think that is less of risk. Or we can just do ACPI override
earlier in setup_arch(), not pulling this process that early during boot?

Thanks

-- 
Thanks.
Zhang Yanfei

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-22 19:39                         ` Zhang Yanfei
  0 siblings, 0 replies; 116+ messages in thread
From: Zhang Yanfei @ 2013-08-22 19:39 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Toshi Kani, Tang Chen, konrad.wilk, robert.moore, lv.zheng, rjw,
	lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu, wency,
	laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86,
	gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy, x86, linux-doc, linux-kernel, linux-mm,
	linux-acpi

Hello tejun,

On 08/23/2013 02:31 AM, Tejun Heo wrote:
> Hello,
> 
> On Thu, Aug 22, 2013 at 09:52:09AM -0600, Toshi Kani wrote:
>> I understand that you are concerned about stability of the ACPI stuff,
>> which I think is a valid point, but most of (if not all) of the
>> ACPI-related issues come from ACPI namespace/methods, which is a very
>> different thing.  Please do not mix up those two.  The ACPI
> 
> I have no objection to implementing self-conftained earlyprintk
> support.  If that's all you want to do, please go ahead but do not
> pull in initrd override or ACPICA into it.
> 
>> namespace/methods stuff remains the same and continues to be initialized
>> at very late in the boot sequence.
>>
>> What's making the patchset complicated is acpi_initrd_override(), which
>> is intended for developers and allows overwriting ACPI bits at their own
>> risk.  This feature won't be used by regular users. 
> 
> Yeah, please forget about that in earlyboot.  It doesn't make any
> sense to fiddle with initrd that early during boot.

What do you mean by "earlyboot"? And also in your previous mail, I am also
a little confused by what you said "the very first stage of boot". Does
this mean the stage we are in head_32 or head64.c?

If so, could we just do something just as Yinghai did before, that is, Split
acpi_override into 2 parts: find and copy. And in "earlyboot", we just do
the find, and I think that is less of risk. Or we can just do ACPI override
earlier in setup_arch(), not pulling this process that early during boot?

Thanks

-- 
Thanks.
Zhang Yanfei

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-22 19:39                         ` Zhang Yanfei
@ 2013-08-22 19:45                           ` Tejun Heo
  -1 siblings, 0 replies; 116+ messages in thread
From: Tejun Heo @ 2013-08-22 19:45 UTC (permalink / raw)
  To: Zhang Yanfei
  Cc: Toshi Kani, Tang Chen, konrad.wilk, robert.moore, lv.zheng, rjw,
	lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu, wency,
	laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86,
	gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy, x86, linux-doc, linux-kernel, linux-mm,
	linux-acpi

Hello,

On Fri, Aug 23, 2013 at 03:39:53AM +0800, Zhang Yanfei wrote:
> What do you mean by "earlyboot"? And also in your previous mail, I am also
> a little confused by what you said "the very first stage of boot". Does
> this mean the stage we are in head_32 or head64.c?

Mostly referring to the state where we don't have basic environment
set up yet including page tables.

> If so, could we just do something just as Yinghai did before, that is, Split
> acpi_override into 2 parts: find and copy. And in "earlyboot", we just do
> the find, and I think that is less of risk. Or we can just do ACPI override
> earlier in setup_arch(), not pulling this process that early during boot?

But *WHY*?  It doesn't really buy us anything substantial.  What are
you trying to achieve here?  "Making ACPI info available early" can't
be a goal in itself and the two benefits cited in this thread seem
pretty dubious to me.  Why are you guys trying to push this
convolution when it doesn't bring any substantial gain?

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-22 19:45                           ` Tejun Heo
  0 siblings, 0 replies; 116+ messages in thread
From: Tejun Heo @ 2013-08-22 19:45 UTC (permalink / raw)
  To: Zhang Yanfei
  Cc: Toshi Kani, Tang Chen, konrad.wilk, robert.moore, lv.zheng, rjw,
	lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu, wency,
	laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86,
	gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy, x86, linux-doc, linux-kernel, linux-mm,
	linux-acpi

Hello,

On Fri, Aug 23, 2013 at 03:39:53AM +0800, Zhang Yanfei wrote:
> What do you mean by "earlyboot"? And also in your previous mail, I am also
> a little confused by what you said "the very first stage of boot". Does
> this mean the stage we are in head_32 or head64.c?

Mostly referring to the state where we don't have basic environment
set up yet including page tables.

> If so, could we just do something just as Yinghai did before, that is, Split
> acpi_override into 2 parts: find and copy. And in "earlyboot", we just do
> the find, and I think that is less of risk. Or we can just do ACPI override
> earlier in setup_arch(), not pulling this process that early during boot?

But *WHY*?  It doesn't really buy us anything substantial.  What are
you trying to achieve here?  "Making ACPI info available early" can't
be a goal in itself and the two benefits cited in this thread seem
pretty dubious to me.  Why are you guys trying to push this
convolution when it doesn't bring any substantial gain?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-22 18:31                       ` Tejun Heo
@ 2013-08-22 20:11                         ` Toshi Kani
  -1 siblings, 0 replies; 116+ messages in thread
From: Toshi Kani @ 2013-08-22 20:11 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

Hello Tejun,

On Thu, 2013-08-22 at 14:31 -0400, Tejun Heo wrote:
> On Thu, Aug 22, 2013 at 09:52:09AM -0600, Toshi Kani wrote:
> > I understand that you are concerned about stability of the ACPI stuff,
> > which I think is a valid point, but most of (if not all) of the
> > ACPI-related issues come from ACPI namespace/methods, which is a very
> > different thing.  Please do not mix up those two.  The ACPI
> 
> I have no objection to implementing self-conftained earlyprintk
> support.  If that's all you want to do, please go ahead but do not
> pull in initrd override or ACPICA into it.

If you are referring ACPICA as the AML interpreter, right, we do not
move it up as I explained before.  We are trying to move up the ACPI
table init code (which is part of ACPICA, but has nothing to do with
AML.)

Note that ia64 also uses ACPI, and calls acpi_table_init() in
setup_arch() before initializing the bootmap in find_memory().

> > namespace/methods stuff remains the same and continues to be initialized
> > at very late in the boot sequence.
> > 
> > What's making the patchset complicated is acpi_initrd_override(), which
> > is intended for developers and allows overwriting ACPI bits at their own
> > risk.  This feature won't be used by regular users. 
> 
> Yeah, please forget about that in earlyboot.  It doesn't make any
> sense to fiddle with initrd that early during boot.

I think the reason why Tang is working on this stuff again is that his
previous change (which was once accepted) had broken initrd.  So, he'd
have to support it this time...

> > If you are referring the issue of kernel image location, it is a
> > limitation in the current implementation, not a technical limitation.  I
> > know other OS that supports movable memory and puts the kernel image
> > into a movable memory with SRAT by changing the bootloader.
> 
> I'm not saying that problem shouldn't be solved.  I'm saying what you
> guys are pushing doesn't help solving it at all.  It's too late in the
> boot process.  It needs to be handled either by bootloader or earlier
> kernel kexecing the actual one and super-early SRAT doens't help at
> all in either case, so what's the point of pulling ACPI code in when
> it doesn't contribute to solving the problem properly?

It's too late for the kernel image itself, but it prevents allocating
kernel memory from movable ranges after that.  I'd say it solves a half
of the issue this time.

> > Also, how do you support local page tables without pursing SRAT early?
> 
> Does it even matter with huge mappings?  It's gonna be contained in a
> single page anyway, right?

Are the huge mappings always used?  We cannot force user programs to use
huge pages, can we?

> > Initializing page tables on large systems may take a long time, and I do
> > think that earlyprink needs to be available before that point.
> 
> Yeah, sure, implement it in *minimal* way which doesn't affect
> anything if not explicitly enabled by kernel param like other
> earlyprintks.  It doens't make any sense to add dependency to acpi
> from early boot for that.

It makes sense because it needs to obtain the config info from ACPI
tables.

As for the maintainability, I am far more concerned with your suggestion
of having a separate page table init code when SRAT is used.  This kind
of divergence is a recipe of breakage.

Thanks,
-Toshi

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-22 20:11                         ` Toshi Kani
  0 siblings, 0 replies; 116+ messages in thread
From: Toshi Kani @ 2013-08-22 20:11 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

Hello Tejun,

On Thu, 2013-08-22 at 14:31 -0400, Tejun Heo wrote:
> On Thu, Aug 22, 2013 at 09:52:09AM -0600, Toshi Kani wrote:
> > I understand that you are concerned about stability of the ACPI stuff,
> > which I think is a valid point, but most of (if not all) of the
> > ACPI-related issues come from ACPI namespace/methods, which is a very
> > different thing.  Please do not mix up those two.  The ACPI
> 
> I have no objection to implementing self-conftained earlyprintk
> support.  If that's all you want to do, please go ahead but do not
> pull in initrd override or ACPICA into it.

If you are referring ACPICA as the AML interpreter, right, we do not
move it up as I explained before.  We are trying to move up the ACPI
table init code (which is part of ACPICA, but has nothing to do with
AML.)

Note that ia64 also uses ACPI, and calls acpi_table_init() in
setup_arch() before initializing the bootmap in find_memory().

> > namespace/methods stuff remains the same and continues to be initialized
> > at very late in the boot sequence.
> > 
> > What's making the patchset complicated is acpi_initrd_override(), which
> > is intended for developers and allows overwriting ACPI bits at their own
> > risk.  This feature won't be used by regular users. 
> 
> Yeah, please forget about that in earlyboot.  It doesn't make any
> sense to fiddle with initrd that early during boot.

I think the reason why Tang is working on this stuff again is that his
previous change (which was once accepted) had broken initrd.  So, he'd
have to support it this time...

> > If you are referring the issue of kernel image location, it is a
> > limitation in the current implementation, not a technical limitation.  I
> > know other OS that supports movable memory and puts the kernel image
> > into a movable memory with SRAT by changing the bootloader.
> 
> I'm not saying that problem shouldn't be solved.  I'm saying what you
> guys are pushing doesn't help solving it at all.  It's too late in the
> boot process.  It needs to be handled either by bootloader or earlier
> kernel kexecing the actual one and super-early SRAT doens't help at
> all in either case, so what's the point of pulling ACPI code in when
> it doesn't contribute to solving the problem properly?

It's too late for the kernel image itself, but it prevents allocating
kernel memory from movable ranges after that.  I'd say it solves a half
of the issue this time.

> > Also, how do you support local page tables without pursing SRAT early?
> 
> Does it even matter with huge mappings?  It's gonna be contained in a
> single page anyway, right?

Are the huge mappings always used?  We cannot force user programs to use
huge pages, can we?

> > Initializing page tables on large systems may take a long time, and I do
> > think that earlyprink needs to be available before that point.
> 
> Yeah, sure, implement it in *minimal* way which doesn't affect
> anything if not explicitly enabled by kernel param like other
> earlyprintks.  It doens't make any sense to add dependency to acpi
> from early boot for that.

It makes sense because it needs to obtain the config info from ACPI
tables.

As for the maintainability, I am far more concerned with your suggestion
of having a separate page table init code when SRAT is used.  This kind
of divergence is a recipe of breakage.

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-22 20:11                         ` Toshi Kani
@ 2013-08-22 20:21                           ` Tejun Heo
  -1 siblings, 0 replies; 116+ messages in thread
From: Tejun Heo @ 2013-08-22 20:21 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

Hello,

On Thu, Aug 22, 2013 at 02:11:32PM -0600, Toshi Kani wrote:
> It's too late for the kernel image itself, but it prevents allocating
> kernel memory from movable ranges after that.  I'd say it solves a half
> of the issue this time.

That works if such half solution eventually leads to the full
solution.  This is just a distraction.  You are already too late in
the boot sequence.  It doesn't even qualify as a half solution.  It's
like obsessing about a speck on your shirt without your trousers on.
If you want to solve this, do that from a place where it actually is
solvable.

> > > Also, how do you support local page tables without pursing SRAT early?
> > 
> > Does it even matter with huge mappings?  It's gonna be contained in a
> > single page anyway, right?
> 
> Are the huge mappings always used?  We cannot force user programs to use
> huge pages, can we?

Everything is a trade-off.  Should we do all this just to support the
off chance someone tries to use memory hotplug on a machine which
doesn't support huge mapping when virtually all CPUs on market
supports it?

> As for the maintainability, I am far more concerned with your suggestion
> of having a separate page table init code when SRAT is used.  This kind
> of divergence is a recipe of breakage.

I don't buy that.  The only thing which needs to change is the
directionality of allocation and we probably don't even need to do
that if huge mapping is in use.

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-22 20:21                           ` Tejun Heo
  0 siblings, 0 replies; 116+ messages in thread
From: Tejun Heo @ 2013-08-22 20:21 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

Hello,

On Thu, Aug 22, 2013 at 02:11:32PM -0600, Toshi Kani wrote:
> It's too late for the kernel image itself, but it prevents allocating
> kernel memory from movable ranges after that.  I'd say it solves a half
> of the issue this time.

That works if such half solution eventually leads to the full
solution.  This is just a distraction.  You are already too late in
the boot sequence.  It doesn't even qualify as a half solution.  It's
like obsessing about a speck on your shirt without your trousers on.
If you want to solve this, do that from a place where it actually is
solvable.

> > > Also, how do you support local page tables without pursing SRAT early?
> > 
> > Does it even matter with huge mappings?  It's gonna be contained in a
> > single page anyway, right?
> 
> Are the huge mappings always used?  We cannot force user programs to use
> huge pages, can we?

Everything is a trade-off.  Should we do all this just to support the
off chance someone tries to use memory hotplug on a machine which
doesn't support huge mapping when virtually all CPUs on market
supports it?

> As for the maintainability, I am far more concerned with your suggestion
> of having a separate page table init code when SRAT is used.  This kind
> of divergence is a recipe of breakage.

I don't buy that.  The only thing which needs to change is the
directionality of allocation and we probably don't even need to do
that if huge mapping is in use.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-22 20:21                           ` Tejun Heo
@ 2013-08-22 20:35                             ` Tejun Heo
  -1 siblings, 0 replies; 116+ messages in thread
From: Tejun Heo @ 2013-08-22 20:35 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

A bit of addition.

On Thu, Aug 22, 2013 at 04:21:58PM -0400, Tejun Heo wrote:
> That works if such half solution eventually leads to the full
> solution.  This is just a distraction.  You are already too late in
> the boot sequence.  It doesn't even qualify as a half solution.  It's
> like obsessing about a speck on your shirt without your trousers on.
> If you want to solve this, do that from a place where it actually is
> solvable.

Seriously, what's the end game here?  How do you guys see this
eventually reaching full solution?  If you don't see that and this
kinda-sorta-working solution is fine, then that's fine too but we
aren't gonna make a lot of invasive changes for that.  If you can at
least envision the full solution, please try to fit this effort into
the bigger picture.

In all possible solutions that I can think of, there needs to be
earlier handling of SRAT informtaion before the kernel proper starts
executing be that either the actual bootloader or earlier kernel
serving as kexec host.  If a proper solution needs such processing
earlier anyway, it can set up things so that either the default
booting behavior doesn't harm hotpluggability or feed the necessary
information to the kernel.  In both cases, doing ACPI super early in
the booting kernel doesn't buy us anything.

So, then, what the hell are we doing here with all these relocations,
careful double execution of the same code from different execution
contexts, worrying about initrd firmware override even before the
kernel page table is set up?  If we're doing all those to just make
the temporary half-assed-anyway solution minutely better, that's just
plain stupid.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-22 20:35                             ` Tejun Heo
  0 siblings, 0 replies; 116+ messages in thread
From: Tejun Heo @ 2013-08-22 20:35 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

A bit of addition.

On Thu, Aug 22, 2013 at 04:21:58PM -0400, Tejun Heo wrote:
> That works if such half solution eventually leads to the full
> solution.  This is just a distraction.  You are already too late in
> the boot sequence.  It doesn't even qualify as a half solution.  It's
> like obsessing about a speck on your shirt without your trousers on.
> If you want to solve this, do that from a place where it actually is
> solvable.

Seriously, what's the end game here?  How do you guys see this
eventually reaching full solution?  If you don't see that and this
kinda-sorta-working solution is fine, then that's fine too but we
aren't gonna make a lot of invasive changes for that.  If you can at
least envision the full solution, please try to fit this effort into
the bigger picture.

In all possible solutions that I can think of, there needs to be
earlier handling of SRAT informtaion before the kernel proper starts
executing be that either the actual bootloader or earlier kernel
serving as kexec host.  If a proper solution needs such processing
earlier anyway, it can set up things so that either the default
booting behavior doesn't harm hotpluggability or feed the necessary
information to the kernel.  In both cases, doing ACPI super early in
the booting kernel doesn't buy us anything.

So, then, what the hell are we doing here with all these relocations,
careful double execution of the same code from different execution
contexts, worrying about initrd firmware override even before the
kernel page table is set up?  If we're doing all those to just make
the temporary half-assed-anyway solution minutely better, that's just
plain stupid.

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-22 20:21                           ` Tejun Heo
@ 2013-08-22 21:06                             ` Toshi Kani
  -1 siblings, 0 replies; 116+ messages in thread
From: Toshi Kani @ 2013-08-22 21:06 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On Thu, 2013-08-22 at 16:21 -0400, Tejun Heo wrote:
> On Thu, Aug 22, 2013 at 02:11:32PM -0600, Toshi Kani wrote:
> > It's too late for the kernel image itself, but it prevents allocating
> > kernel memory from movable ranges after that.  I'd say it solves a half
> > of the issue this time.
> 
> That works if such half solution eventually leads to the full
> solution.  This is just a distraction.  You are already too late in
> the boot sequence.  It doesn't even qualify as a half solution.  It's
> like obsessing about a speck on your shirt without your trousers on.
> If you want to solve this, do that from a place where it actually is
> solvable.

Since some node(s) won't be ejectable, this solution is reasonable as
the first step.  I do not think it is a distraction.  I view your
suggestion as a distraction of supporting local page tables, though.

> > > > Also, how do you support local page tables without pursing SRAT early?
> > > 
> > > Does it even matter with huge mappings?  It's gonna be contained in a
> > > single page anyway, right?
> > 
> > Are the huge mappings always used?  We cannot force user programs to use
> > huge pages, can we?
> 
> Everything is a trade-off.  Should we do all this just to support the
> off chance someone tries to use memory hotplug on a machine which
> doesn't support huge mapping when virtually all CPUs on market
> supports it?

Local page table and memory hotplug are two separate things.  That is,
local page tables can be supported on all NUMA platforms without hotplug
support.  Are you sure huge mapping will solve everything for all types
of applications, and therefore local page tables won't be needed at all?

> > As for the maintainability, I am far more concerned with your suggestion
> > of having a separate page table init code when SRAT is used.  This kind
> > of divergence is a recipe of breakage.
> 
> I don't buy that.  The only thing which needs to change is the
> directionality of allocation and we probably don't even need to do
> that if huge mapping is in use.

When someone changes the page table init code, who will test it with the
special allocation code?

Thanks,
-Toshi

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-22 21:06                             ` Toshi Kani
  0 siblings, 0 replies; 116+ messages in thread
From: Toshi Kani @ 2013-08-22 21:06 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

On Thu, 2013-08-22 at 16:21 -0400, Tejun Heo wrote:
> On Thu, Aug 22, 2013 at 02:11:32PM -0600, Toshi Kani wrote:
> > It's too late for the kernel image itself, but it prevents allocating
> > kernel memory from movable ranges after that.  I'd say it solves a half
> > of the issue this time.
> 
> That works if such half solution eventually leads to the full
> solution.  This is just a distraction.  You are already too late in
> the boot sequence.  It doesn't even qualify as a half solution.  It's
> like obsessing about a speck on your shirt without your trousers on.
> If you want to solve this, do that from a place where it actually is
> solvable.

Since some node(s) won't be ejectable, this solution is reasonable as
the first step.  I do not think it is a distraction.  I view your
suggestion as a distraction of supporting local page tables, though.

> > > > Also, how do you support local page tables without pursing SRAT early?
> > > 
> > > Does it even matter with huge mappings?  It's gonna be contained in a
> > > single page anyway, right?
> > 
> > Are the huge mappings always used?  We cannot force user programs to use
> > huge pages, can we?
> 
> Everything is a trade-off.  Should we do all this just to support the
> off chance someone tries to use memory hotplug on a machine which
> doesn't support huge mapping when virtually all CPUs on market
> supports it?

Local page table and memory hotplug are two separate things.  That is,
local page tables can be supported on all NUMA platforms without hotplug
support.  Are you sure huge mapping will solve everything for all types
of applications, and therefore local page tables won't be needed at all?

> > As for the maintainability, I am far more concerned with your suggestion
> > of having a separate page table init code when SRAT is used.  This kind
> > of divergence is a recipe of breakage.
> 
> I don't buy that.  The only thing which needs to change is the
> directionality of allocation and we probably don't even need to do
> that if huge mapping is in use.

When someone changes the page table init code, who will test it with the
special allocation code?

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-22 21:06                             ` Toshi Kani
@ 2013-08-22 21:21                               ` Tejun Heo
  -1 siblings, 0 replies; 116+ messages in thread
From: Tejun Heo @ 2013-08-22 21:21 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

Hello, Toshi.

On Thu, Aug 22, 2013 at 03:06:38PM -0600, Toshi Kani wrote:
> Since some node(s) won't be ejectable, this solution is reasonable as
> the first step.  I do not think it is a distraction.  I view your

But does this contribute to reaching the next step?  If so, how?
I can't see how and that's why I said this was a distraction.

> suggestion as a distraction of supporting local page tables, though.

Hmmm...

> Local page table and memory hotplug are two separate things.  That is,
> local page tables can be supported on all NUMA platforms without hotplug
> support.  Are you sure huge mapping will solve everything for all types
> of applications, and therefore local page tables won't be needed at all?

When you throw around terms like "all" and "at all", you can't reach
rational discussion about engineering trade-offs.  I was asking you
whether it was reasonable to do per-node page table when most machines
support huge page mappings which makes the whole thing rather
pointless.  Of course there will be some niche cases where this might
not be optimal but do you think that would be enough to justify the
added complexity and churn?  If you think so, can you please
elaborate?

> When someone changes the page table init code, who will test it with the
> special allocation code?

What are you worrying about?  Are you saying that allocating page
table towards top or bottom of memory would be more disruptive and
difficult to debug than pulling in ACPI init and SRAT information into
the process?  Am I missing something here?

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-22 21:21                               ` Tejun Heo
  0 siblings, 0 replies; 116+ messages in thread
From: Tejun Heo @ 2013-08-22 21:21 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

Hello, Toshi.

On Thu, Aug 22, 2013 at 03:06:38PM -0600, Toshi Kani wrote:
> Since some node(s) won't be ejectable, this solution is reasonable as
> the first step.  I do not think it is a distraction.  I view your

But does this contribute to reaching the next step?  If so, how?
I can't see how and that's why I said this was a distraction.

> suggestion as a distraction of supporting local page tables, though.

Hmmm...

> Local page table and memory hotplug are two separate things.  That is,
> local page tables can be supported on all NUMA platforms without hotplug
> support.  Are you sure huge mapping will solve everything for all types
> of applications, and therefore local page tables won't be needed at all?

When you throw around terms like "all" and "at all", you can't reach
rational discussion about engineering trade-offs.  I was asking you
whether it was reasonable to do per-node page table when most machines
support huge page mappings which makes the whole thing rather
pointless.  Of course there will be some niche cases where this might
not be optimal but do you think that would be enough to justify the
added complexity and churn?  If you think so, can you please
elaborate?

> When someone changes the page table init code, who will test it with the
> special allocation code?

What are you worrying about?  Are you saying that allocating page
table towards top or bottom of memory would be more disruptive and
difficult to debug than pulling in ACPI init and SRAT information into
the process?  Am I missing something here?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-22 21:21                               ` Tejun Heo
@ 2013-08-22 22:17                                 ` Toshi Kani
  -1 siblings, 0 replies; 116+ messages in thread
From: Toshi Kani @ 2013-08-22 22:17 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

Hello Tejun,

On Thu, 2013-08-22 at 17:21 -0400, Tejun Heo wrote:
 :
> > Local page table and memory hotplug are two separate things.  That is,
> > local page tables can be supported on all NUMA platforms without hotplug
> > support.  Are you sure huge mapping will solve everything for all types
> > of applications, and therefore local page tables won't be needed at all?
> 
> When you throw around terms like "all" and "at all", you can't reach
> rational discussion about engineering trade-offs.  I was asking you
> whether it was reasonable to do per-node page table when most machines
> support huge page mappings which makes the whole thing rather
> pointless.  Of course there will be some niche cases where this might
> not be optimal but do you think that would be enough to justify the
> added complexity and churn?  If you think so, can you please
> elaborate?

I am relatively new to Linux, so I am not a good person to elaborate
this.  From my experience on other OS, huge pages helped for the kernel,
but did not necessarily help user applications.  It depended on
applications, which were not niche cases.  But Linux may be different,
so I asked since you seemed confident.  I'd appreciate if you can point
us some data that endorses your statement.

> > When someone changes the page table init code, who will test it with the
> > special allocation code?
> 
> What are you worrying about?  Are you saying that allocating page
> table towards top or bottom of memory would be more disruptive and
> difficult to debug than pulling in ACPI init and SRAT information into
> the process?  Am I missing something here?

My worry is that the code is unlikely tested with the special logic when
someone makes code changes to the page tables.  Such code can easily be
broken in future.

To answer your other question/email, I believe Tang's next step is to
support local page tables.  This is why we think pursing SRAT earlier is
the right direction.

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-22 22:17                                 ` Toshi Kani
  0 siblings, 0 replies; 116+ messages in thread
From: Toshi Kani @ 2013-08-22 22:17 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

Hello Tejun,

On Thu, 2013-08-22 at 17:21 -0400, Tejun Heo wrote:
 :
> > Local page table and memory hotplug are two separate things.  That is,
> > local page tables can be supported on all NUMA platforms without hotplug
> > support.  Are you sure huge mapping will solve everything for all types
> > of applications, and therefore local page tables won't be needed at all?
> 
> When you throw around terms like "all" and "at all", you can't reach
> rational discussion about engineering trade-offs.  I was asking you
> whether it was reasonable to do per-node page table when most machines
> support huge page mappings which makes the whole thing rather
> pointless.  Of course there will be some niche cases where this might
> not be optimal but do you think that would be enough to justify the
> added complexity and churn?  If you think so, can you please
> elaborate?

I am relatively new to Linux, so I am not a good person to elaborate
this.  From my experience on other OS, huge pages helped for the kernel,
but did not necessarily help user applications.  It depended on
applications, which were not niche cases.  But Linux may be different,
so I asked since you seemed confident.  I'd appreciate if you can point
us some data that endorses your statement.

> > When someone changes the page table init code, who will test it with the
> > special allocation code?
> 
> What are you worrying about?  Are you saying that allocating page
> table towards top or bottom of memory would be more disruptive and
> difficult to debug than pulling in ACPI init and SRAT information into
> the process?  Am I missing something here?

My worry is that the code is unlikely tested with the special logic when
someone makes code changes to the page tables.  Such code can easily be
broken in future.

To answer your other question/email, I believe Tang's next step is to
support local page tables.  This is why we think pursing SRAT earlier is
the right direction.

Thanks,
-Toshi

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-22 22:17                                 ` Toshi Kani
@ 2013-08-23 13:04                                   ` Tejun Heo
  -1 siblings, 0 replies; 116+ messages in thread
From: Tejun Heo @ 2013-08-23 13:04 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

Hello, Toshi.

On Thu, Aug 22, 2013 at 04:17:41PM -0600, Toshi Kani wrote:
> I am relatively new to Linux, so I am not a good person to elaborate
> this.  From my experience on other OS, huge pages helped for the kernel,
> but did not necessarily help user applications.  It depended on
> applications, which were not niche cases.  But Linux may be different,
> so I asked since you seemed confident.  I'd appreciate if you can point
> us some data that endorses your statement.

We are talking about the kernel linear mapping which is created during
early boot, so if it's available and useable there's no reason not to
use it.  Exceptions would be earlier processors which didn't do 1G
mappings or e820 maps with a lot of holes.  For CPUs used in NUMA
configurations, the former has been history for a bit now.  Can't be
sure about the latter but it'd be surprising for that to affect large
amount of memory in the systems that are of interest here.  Ooh, that
reminds me that we probably wanna go back to 1G + MTRR mapping under
4G.  We're currently creating a lot of mapping holes.

> My worry is that the code is unlikely tested with the special logic when
> someone makes code changes to the page tables.  Such code can easily be
> broken in future.

Well, I wouldn't consider flipping the direction of allocation to be
particularly difficult to get right especially when compared to
bringing in ACPI tables into the mix.

> To answer your other question/email, I believe Tang's next step is to
> support local page tables.  This is why we think pursing SRAT earlier is
> the right direction.

Given 1G mappings, is that even a worthwhile effort?  I'm getting even
more more skeptical.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-23 13:04                                   ` Tejun Heo
  0 siblings, 0 replies; 116+ messages in thread
From: Tejun Heo @ 2013-08-23 13:04 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

Hello, Toshi.

On Thu, Aug 22, 2013 at 04:17:41PM -0600, Toshi Kani wrote:
> I am relatively new to Linux, so I am not a good person to elaborate
> this.  From my experience on other OS, huge pages helped for the kernel,
> but did not necessarily help user applications.  It depended on
> applications, which were not niche cases.  But Linux may be different,
> so I asked since you seemed confident.  I'd appreciate if you can point
> us some data that endorses your statement.

We are talking about the kernel linear mapping which is created during
early boot, so if it's available and useable there's no reason not to
use it.  Exceptions would be earlier processors which didn't do 1G
mappings or e820 maps with a lot of holes.  For CPUs used in NUMA
configurations, the former has been history for a bit now.  Can't be
sure about the latter but it'd be surprising for that to affect large
amount of memory in the systems that are of interest here.  Ooh, that
reminds me that we probably wanna go back to 1G + MTRR mapping under
4G.  We're currently creating a lot of mapping holes.

> My worry is that the code is unlikely tested with the special logic when
> someone makes code changes to the page tables.  Such code can easily be
> broken in future.

Well, I wouldn't consider flipping the direction of allocation to be
particularly difficult to get right especially when compared to
bringing in ACPI tables into the mix.

> To answer your other question/email, I believe Tang's next step is to
> support local page tables.  This is why we think pursing SRAT earlier is
> the right direction.

Given 1G mappings, is that even a worthwhile effort?  I'm getting even
more more skeptical.

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-23 13:04                                   ` Tejun Heo
@ 2013-08-23 13:08                                     ` H. Peter Anvin
  -1 siblings, 0 replies; 116+ messages in thread
From: H. Peter Anvin @ 2013-08-23 13:08 UTC (permalink / raw)
  To: Tejun Heo, Toshi Kani
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, akpm, trenn, yinghai, jiang.liu, wency,
	laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86,
	gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy, x86, linux-doc, linux-kernel, linux-mm,
	linux-acpi

What is the point of 1G+MTRR?  If there are caching differences the TLB will fracture the pages anyway.

Tejun Heo <tj@kernel.org> wrote:
>Hello, Toshi.
>
>On Thu, Aug 22, 2013 at 04:17:41PM -0600, Toshi Kani wrote:
>> I am relatively new to Linux, so I am not a good person to elaborate
>> this.  From my experience on other OS, huge pages helped for the
>kernel,
>> but did not necessarily help user applications.  It depended on
>> applications, which were not niche cases.  But Linux may be
>different,
>> so I asked since you seemed confident.  I'd appreciate if you can
>point
>> us some data that endorses your statement.
>
>We are talking about the kernel linear mapping which is created during
>early boot, so if it's available and useable there's no reason not to
>use it.  Exceptions would be earlier processors which didn't do 1G
>mappings or e820 maps with a lot of holes.  For CPUs used in NUMA
>configurations, the former has been history for a bit now.  Can't be
>sure about the latter but it'd be surprising for that to affect large
>amount of memory in the systems that are of interest here.  Ooh, that
>reminds me that we probably wanna go back to 1G + MTRR mapping under
>4G.  We're currently creating a lot of mapping holes.
>
>> My worry is that the code is unlikely tested with the special logic
>when
>> someone makes code changes to the page tables.  Such code can easily
>be
>> broken in future.
>
>Well, I wouldn't consider flipping the direction of allocation to be
>particularly difficult to get right especially when compared to
>bringing in ACPI tables into the mix.
>
>> To answer your other question/email, I believe Tang's next step is to
>> support local page tables.  This is why we think pursing SRAT earlier
>is
>> the right direction.
>
>Given 1G mappings, is that even a worthwhile effort?  I'm getting even
>more more skeptical.
>
>Thanks.

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-23 13:08                                     ` H. Peter Anvin
  0 siblings, 0 replies; 116+ messages in thread
From: H. Peter Anvin @ 2013-08-23 13:08 UTC (permalink / raw)
  To: Tejun Heo, Toshi Kani
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, akpm, trenn, yinghai, jiang.liu, wency,
	laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86,
	gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy, x86, linux-doc, linux-kernel, linux-mm,
	linux-acpi

What is the point of 1G+MTRR?  If there are caching differences the TLB will fracture the pages anyway.

Tejun Heo <tj@kernel.org> wrote:
>Hello, Toshi.
>
>On Thu, Aug 22, 2013 at 04:17:41PM -0600, Toshi Kani wrote:
>> I am relatively new to Linux, so I am not a good person to elaborate
>> this.  From my experience on other OS, huge pages helped for the
>kernel,
>> but did not necessarily help user applications.  It depended on
>> applications, which were not niche cases.  But Linux may be
>different,
>> so I asked since you seemed confident.  I'd appreciate if you can
>point
>> us some data that endorses your statement.
>
>We are talking about the kernel linear mapping which is created during
>early boot, so if it's available and useable there's no reason not to
>use it.  Exceptions would be earlier processors which didn't do 1G
>mappings or e820 maps with a lot of holes.  For CPUs used in NUMA
>configurations, the former has been history for a bit now.  Can't be
>sure about the latter but it'd be surprising for that to affect large
>amount of memory in the systems that are of interest here.  Ooh, that
>reminds me that we probably wanna go back to 1G + MTRR mapping under
>4G.  We're currently creating a lot of mapping holes.
>
>> My worry is that the code is unlikely tested with the special logic
>when
>> someone makes code changes to the page tables.  Such code can easily
>be
>> broken in future.
>
>Well, I wouldn't consider flipping the direction of allocation to be
>particularly difficult to get right especially when compared to
>bringing in ACPI tables into the mix.
>
>> To answer your other question/email, I believe Tang's next step is to
>> support local page tables.  This is why we think pursing SRAT earlier
>is
>> the right direction.
>
>Given 1G mappings, is that even a worthwhile effort?  I'm getting even
>more more skeptical.
>
>Thanks.

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-23 13:08                                     ` H. Peter Anvin
@ 2013-08-23 14:19                                       ` Tejun Heo
  -1 siblings, 0 replies; 116+ messages in thread
From: Tejun Heo @ 2013-08-23 14:19 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Toshi Kani, Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore,
	lv.zheng, rjw, lenb, tglx, mingo, akpm, trenn, yinghai,
	jiang.liu, wency, laijs, isimatu.yasuaki, izumi.taku, mgorman,
	minchan, mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel,
	jweiner, prarit, zhangyanfei, yanghy, x86, linux-doc,
	linux-kernel, linux-mm, linux-acpi

Hello,

On Fri, Aug 23, 2013 at 03:08:55PM +0200, H. Peter Anvin wrote:
> What is the point of 1G+MTRR?  If there are caching differences the
> TLB will fracture the pages anyway.

Ah, right.  Consuming less memory / cachelines would still be a small
advantage tho unless creating split TLB from larger mapping is
noticeably less efficient.  If the extra logic to do that is small,
which I think it'd be, it'd be a gain at almost no cost.

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-23 14:19                                       ` Tejun Heo
  0 siblings, 0 replies; 116+ messages in thread
From: Tejun Heo @ 2013-08-23 14:19 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Toshi Kani, Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore,
	lv.zheng, rjw, lenb, tglx, mingo, akpm, trenn, yinghai,
	jiang.liu, wency, laijs, isimatu.yasuaki, izumi.taku, mgorman,
	minchan, mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel,
	jweiner, prarit, zhangyanfei, yanghy, x86, linux-doc,
	linux-kernel, linux-mm, linux-acpi

Hello,

On Fri, Aug 23, 2013 at 03:08:55PM +0200, H. Peter Anvin wrote:
> What is the point of 1G+MTRR?  If there are caching differences the
> TLB will fracture the pages anyway.

Ah, right.  Consuming less memory / cachelines would still be a small
advantage tho unless creating split TLB from larger mapping is
noticeably less efficient.  If the extra logic to do that is small,
which I think it'd be, it'd be a gain at almost no cost.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-23 14:19                                       ` Tejun Heo
  (?)
@ 2013-08-23 14:24                                         ` H. Peter Anvin
  -1 siblings, 0 replies; 116+ messages in thread
From: H. Peter Anvin @ 2013-08-23 14:24 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Toshi Kani, Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore,
	lv.zheng, rjw, lenb, tglx, mingo, akpm, trenn, yinghai,
	jiang.liu, wency, laijs, isimatu.yasuaki, izumi.taku, mgorman,
	minchan, mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel,
	jweiner, prarit, zhangyanfei, yanghy, x86, linux-doc,
	linux-kernel, linux-mm, linux-acpi

Well... relying on MTRRs is a big cost in complexity and failure modes.

Tejun Heo <tj@kernel.org> wrote:
>Hello,
>
>On Fri, Aug 23, 2013 at 03:08:55PM +0200, H. Peter Anvin wrote:
>> What is the point of 1G+MTRR?  If there are caching differences the
>> TLB will fracture the pages anyway.
>
>Ah, right.  Consuming less memory / cachelines would still be a small
>advantage tho unless creating split TLB from larger mapping is
>noticeably less efficient.  If the extra logic to do that is small,
>which I think it'd be, it'd be a gain at almost no cost.
>
>Thanks.

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-23 14:24                                         ` H. Peter Anvin
  0 siblings, 0 replies; 116+ messages in thread
From: H. Peter Anvin @ 2013-08-23 14:24 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Toshi Kani, Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore,
	lv.zheng, rjw, lenb, tglx, mingo, akpm, trenn, yinghai,
	jiang.liu, wency, laijs, isimatu.yasuaki, izumi.taku, mgorman,
	minchan, mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel,
	jweiner, prarit, zhangyanfei, yanghy, x86, linux-doc,
	linux-kernel, linux-mm, linux-acpi

Well... relying on MTRRs is a big cost in complexity and failure modes.

Tejun Heo <tj@kernel.org> wrote:
>Hello,
>
>On Fri, Aug 23, 2013 at 03:08:55PM +0200, H. Peter Anvin wrote:
>> What is the point of 1G+MTRR?  If there are caching differences the
>> TLB will fracture the pages anyway.
>
>Ah, right.  Consuming less memory / cachelines would still be a small
>advantage tho unless creating split TLB from larger mapping is
>noticeably less efficient.  If the extra logic to do that is small,
>which I think it'd be, it'd be a gain at almost no cost.
>
>Thanks.

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-23 14:24                                         ` H. Peter Anvin
  0 siblings, 0 replies; 116+ messages in thread
From: H. Peter Anvin @ 2013-08-23 14:24 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Toshi Kani, Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore,
	lv.zheng, rjw, lenb, tglx, mingo, akpm, trenn, yinghai,
	jiang.liu, wency, laijs, isimatu.yasuaki, izumi.taku, mgorman,
	minchan, mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel,
	jweiner, prarit, zhangyanfei, yanghy, x86, linux-doc,
	linux-kernel, linux-mm, linux-acpi

Well... relying on MTRRs is a big cost in complexity and failure modes.

Tejun Heo <tj@kernel.org> wrote:
>Hello,
>
>On Fri, Aug 23, 2013 at 03:08:55PM +0200, H. Peter Anvin wrote:
>> What is the point of 1G+MTRR?  If there are caching differences the
>> TLB will fracture the pages anyway.
>
>Ah, right.  Consuming less memory / cachelines would still be a small
>advantage tho unless creating split TLB from larger mapping is
>noticeably less efficient.  If the extra logic to do that is small,
>which I think it'd be, it'd be a gain at almost no cost.
>
>Thanks.

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-23 14:24                                         ` H. Peter Anvin
@ 2013-08-23 14:35                                           ` Tejun Heo
  -1 siblings, 0 replies; 116+ messages in thread
From: Tejun Heo @ 2013-08-23 14:35 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Toshi Kani, Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore,
	lv.zheng, rjw, lenb, tglx, mingo, akpm, trenn, yinghai,
	jiang.liu, wency, laijs, isimatu.yasuaki, izumi.taku, mgorman,
	minchan, mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel,
	jweiner, prarit, zhangyanfei, yanghy, x86, linux-doc,
	linux-kernel, linux-mm, linux-acpi

Hello,

On Fri, Aug 23, 2013 at 04:24:06PM +0200, H. Peter Anvin wrote:
> Well... relying on MTRRs is a big cost in complexity and failure modes.

Yeah, it's true that MTRRs are nasty.  On the other hand, we've been
doing that for over a decade and are still doing it anyway if I'm not
mistaken.  It probably isn't a big difference but it's still a bit sad
that this is likely causing small performance regression out in the
wild.

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-23 14:35                                           ` Tejun Heo
  0 siblings, 0 replies; 116+ messages in thread
From: Tejun Heo @ 2013-08-23 14:35 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Toshi Kani, Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore,
	lv.zheng, rjw, lenb, tglx, mingo, akpm, trenn, yinghai,
	jiang.liu, wency, laijs, isimatu.yasuaki, izumi.taku, mgorman,
	minchan, mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel,
	jweiner, prarit, zhangyanfei, yanghy, x86, linux-doc,
	linux-kernel, linux-mm, linux-acpi

Hello,

On Fri, Aug 23, 2013 at 04:24:06PM +0200, H. Peter Anvin wrote:
> Well... relying on MTRRs is a big cost in complexity and failure modes.

Yeah, it's true that MTRRs are nasty.  On the other hand, we've been
doing that for over a decade and are still doing it anyway if I'm not
mistaken.  It probably isn't a big difference but it's still a bit sad
that this is likely causing small performance regression out in the
wild.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-23 14:35                                           ` Tejun Heo
@ 2013-08-23 14:57                                             ` Tejun Heo
  -1 siblings, 0 replies; 116+ messages in thread
From: Tejun Heo @ 2013-08-23 14:57 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Toshi Kani, Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore,
	lv.zheng, rjw, lenb, tglx, mingo, akpm, trenn, yinghai,
	jiang.liu, wency, laijs, isimatu.yasuaki, izumi.taku, mgorman,
	minchan, mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel,
	jweiner, prarit, zhangyanfei, yanghy, x86, linux-doc,
	linux-kernel, linux-mm, linux-acpi

On Fri, Aug 23, 2013 at 10:35:07AM -0400, Tejun Heo wrote:
> Yeah, it's true that MTRRs are nasty.  On the other hand, we've been
> doing that for over a decade and are still doing it anyway if I'm not
> mistaken.  It probably isn't a big difference but it's still a bit sad
> that this is likely causing small performance regression out in the
> wild.

Just went over the processor manual and it doesn't seem like doing the
above would be a good idea.


  System Programming Guide, Part 1

  11.11.9 Large Page Size Considerations

 ... 
 Because the memory type for a large page is cached in the TLB, the
 processor can behave in an undefined manner if a large page is mapped
 to a region of memory that MTRRs have mapped with multiple memory
 types.
 ...
 If a large page maps to a region of memory containing different
 MTRR-defined memory types, the PCD and PWT flags in the page-table
 entry should be set for the most conservative memory type for that
 range. For example, a large page used for memory mapped I/O and
 regular memory 11-48 Vol. 3A MEMORY CACHE CONTROL
 ...

 The Pentium 4, Intel Xeon, and P6 family processors provide special
 support for the physical memory range from 0 to 4 MBytes,
 ...
 Here, the processor maps the memory range as multiple 4-KByte pages
 within the TLB. This operation insures correct behavior at the cost
 of performance. To avoid this performance penalty, operating-system
 software should reserve the large page option for regions of memory
 at addresses greater than or equal to 4 MBytes.

So, yeah, the current behavior seems like the right thing to do.

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-23 14:57                                             ` Tejun Heo
  0 siblings, 0 replies; 116+ messages in thread
From: Tejun Heo @ 2013-08-23 14:57 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Toshi Kani, Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore,
	lv.zheng, rjw, lenb, tglx, mingo, akpm, trenn, yinghai,
	jiang.liu, wency, laijs, isimatu.yasuaki, izumi.taku, mgorman,
	minchan, mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel,
	jweiner, prarit, zhangyanfei, yanghy, x86, linux-doc,
	linux-kernel, linux-mm, linux-acpi

On Fri, Aug 23, 2013 at 10:35:07AM -0400, Tejun Heo wrote:
> Yeah, it's true that MTRRs are nasty.  On the other hand, we've been
> doing that for over a decade and are still doing it anyway if I'm not
> mistaken.  It probably isn't a big difference but it's still a bit sad
> that this is likely causing small performance regression out in the
> wild.

Just went over the processor manual and it doesn't seem like doing the
above would be a good idea.


  System Programming Guide, Part 1

  11.11.9 Large Page Size Considerations

 ... 
 Because the memory type for a large page is cached in the TLB, the
 processor can behave in an undefined manner if a large page is mapped
 to a region of memory that MTRRs have mapped with multiple memory
 types.
 ...
 If a large page maps to a region of memory containing different
 MTRR-defined memory types, the PCD and PWT flags in the page-table
 entry should be set for the most conservative memory type for that
 range. For example, a large page used for memory mapped I/O and
 regular memory 11-48 Vol. 3A MEMORY CACHE CONTROL
 ...

 The Pentium 4, Intel Xeon, and P6 family processors provide special
 support for the physical memory range from 0 to 4 MBytes,
 ...
 Here, the processor maps the memory range as multiple 4-KByte pages
 within the TLB. This operation insures correct behavior at the cost
 of performance. To avoid this performance penalty, operating-system
 software should reserve the large page option for regions of memory
 at addresses greater than or equal to 4 MBytes.

So, yeah, the current behavior seems like the right thing to do.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-23 13:04                                   ` Tejun Heo
@ 2013-08-23 16:14                                     ` Toshi Kani
  -1 siblings, 0 replies; 116+ messages in thread
From: Toshi Kani @ 2013-08-23 16:14 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

Hello,

On Fri, 2013-08-23 at 09:04 -0400, Tejun Heo wrote:
> On Thu, Aug 22, 2013 at 04:17:41PM -0600, Toshi Kani wrote:
> > I am relatively new to Linux, so I am not a good person to elaborate
> > this.  From my experience on other OS, huge pages helped for the kernel,
> > but did not necessarily help user applications.  It depended on
> > applications, which were not niche cases.  But Linux may be different,
> > so I asked since you seemed confident.  I'd appreciate if you can point
> > us some data that endorses your statement.
> 
> We are talking about the kernel linear mapping which is created during
> early boot, so if it's available and useable there's no reason not to
> use it.  Exceptions would be earlier processors which didn't do 1G
> mappings or e820 maps with a lot of holes.  For CPUs used in NUMA
> configurations, the former has been history for a bit now.  Can't be
> sure about the latter but it'd be surprising for that to affect large
> amount of memory in the systems that are of interest here.  Ooh, that
> reminds me that we probably wanna go back to 1G + MTRR mapping under
> 4G.  We're currently creating a lot of mapping holes.

Thanks for the explanation.

> > My worry is that the code is unlikely tested with the special logic when
> > someone makes code changes to the page tables.  Such code can easily be
> > broken in future.
> 
> Well, I wouldn't consider flipping the direction of allocation to be
> particularly difficult to get right especially when compared to
> bringing in ACPI tables into the mix.
> 
> > To answer your other question/email, I believe Tang's next step is to
> > support local page tables.  This is why we think pursing SRAT earlier is
> > the right direction.
> 
> Given 1G mappings, is that even a worthwhile effort?  I'm getting even
> more more skeptical.

With 1G mappings, I agree that it won't make much difference.

I still think acpi table info should be available earlier, but I do not
think I can convince you on this.  This can be religious debate.

Tang, what do you think?  Are you OK to try Tejun's suggestion as well? 

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-23 16:14                                     ` Toshi Kani
  0 siblings, 0 replies; 116+ messages in thread
From: Toshi Kani @ 2013-08-23 16:14 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

Hello,

On Fri, 2013-08-23 at 09:04 -0400, Tejun Heo wrote:
> On Thu, Aug 22, 2013 at 04:17:41PM -0600, Toshi Kani wrote:
> > I am relatively new to Linux, so I am not a good person to elaborate
> > this.  From my experience on other OS, huge pages helped for the kernel,
> > but did not necessarily help user applications.  It depended on
> > applications, which were not niche cases.  But Linux may be different,
> > so I asked since you seemed confident.  I'd appreciate if you can point
> > us some data that endorses your statement.
> 
> We are talking about the kernel linear mapping which is created during
> early boot, so if it's available and useable there's no reason not to
> use it.  Exceptions would be earlier processors which didn't do 1G
> mappings or e820 maps with a lot of holes.  For CPUs used in NUMA
> configurations, the former has been history for a bit now.  Can't be
> sure about the latter but it'd be surprising for that to affect large
> amount of memory in the systems that are of interest here.  Ooh, that
> reminds me that we probably wanna go back to 1G + MTRR mapping under
> 4G.  We're currently creating a lot of mapping holes.

Thanks for the explanation.

> > My worry is that the code is unlikely tested with the special logic when
> > someone makes code changes to the page tables.  Such code can easily be
> > broken in future.
> 
> Well, I wouldn't consider flipping the direction of allocation to be
> particularly difficult to get right especially when compared to
> bringing in ACPI tables into the mix.
> 
> > To answer your other question/email, I believe Tang's next step is to
> > support local page tables.  This is why we think pursing SRAT earlier is
> > the right direction.
> 
> Given 1G mappings, is that even a worthwhile effort?  I'm getting even
> more more skeptical.

With 1G mappings, I agree that it won't make much difference.

I still think acpi table info should be available earlier, but I do not
think I can convince you on this.  This can be religious debate.

Tang, what do you think?  Are you OK to try Tejun's suggestion as well? 

Thanks,
-Toshi

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-23 16:14                                     ` Toshi Kani
@ 2013-08-23 16:24                                       ` Tejun Heo
  -1 siblings, 0 replies; 116+ messages in thread
From: Tejun Heo @ 2013-08-23 16:24 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

Hello,

On Fri, Aug 23, 2013 at 10:14:08AM -0600, Toshi Kani wrote:
> I still think acpi table info should be available earlier, but I do not
> think I can convince you on this.  This can be religious debate.

I'm curious.  If there aren't substantial enough benefits, why would
you still want to pull it earlier when it brings in things like initrd
override and crafting the code carefully so that it's safe to execute
it from different address modes and so on?  Please note that x86 is
not ia64.  The early environment is completely different not only
technically but also in its diversity and suckiness.  It wasn't too
long ago that vendors were screwing up ACPI left and right.  It has
been getting better but there's a reason why, for example, we still
consider e820 to be the authoritative information over ACPI.

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-23 16:24                                       ` Tejun Heo
  0 siblings, 0 replies; 116+ messages in thread
From: Tejun Heo @ 2013-08-23 16:24 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

Hello,

On Fri, Aug 23, 2013 at 10:14:08AM -0600, Toshi Kani wrote:
> I still think acpi table info should be available earlier, but I do not
> think I can convince you on this.  This can be religious debate.

I'm curious.  If there aren't substantial enough benefits, why would
you still want to pull it earlier when it brings in things like initrd
override and crafting the code carefully so that it's safe to execute
it from different address modes and so on?  Please note that x86 is
not ia64.  The early environment is completely different not only
technically but also in its diversity and suckiness.  It wasn't too
long ago that vendors were screwing up ACPI left and right.  It has
been getting better but there's a reason why, for example, we still
consider e820 to be the authoritative information over ACPI.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-23 16:14                                     ` Toshi Kani
@ 2013-08-23 16:54                                       ` Zhang Yanfei
  -1 siblings, 0 replies; 116+ messages in thread
From: Zhang Yanfei @ 2013-08-23 16:54 UTC (permalink / raw)
  To: Toshi Kani, Tejun Heo
  Cc: Tang Chen, konrad.wilk, robert.moore, lv.zheng, rjw, lenb, tglx,
	mingo, hpa, akpm, trenn, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy, x86, linux-doc, linux-kernel, linux-mm,
	linux-acpi

Hello

On 08/24/2013 12:14 AM, Toshi Kani wrote:
> Hello,
> 
> On Fri, 2013-08-23 at 09:04 -0400, Tejun Heo wrote:
>> On Thu, Aug 22, 2013 at 04:17:41PM -0600, Toshi Kani wrote:
>>> I am relatively new to Linux, so I am not a good person to elaborate
>>> this.  From my experience on other OS, huge pages helped for the kernel,
>>> but did not necessarily help user applications.  It depended on
>>> applications, which were not niche cases.  But Linux may be different,
>>> so I asked since you seemed confident.  I'd appreciate if you can point
>>> us some data that endorses your statement.
>>
>> We are talking about the kernel linear mapping which is created during
>> early boot, so if it's available and useable there's no reason not to
>> use it.  Exceptions would be earlier processors which didn't do 1G
>> mappings or e820 maps with a lot of holes.  For CPUs used in NUMA
>> configurations, the former has been history for a bit now.  Can't be
>> sure about the latter but it'd be surprising for that to affect large
>> amount of memory in the systems that are of interest here.  Ooh, that
>> reminds me that we probably wanna go back to 1G + MTRR mapping under
>> 4G.  We're currently creating a lot of mapping holes.
> 
> Thanks for the explanation.
> 
>>> My worry is that the code is unlikely tested with the special logic when
>>> someone makes code changes to the page tables.  Such code can easily be
>>> broken in future.
>>
>> Well, I wouldn't consider flipping the direction of allocation to be
>> particularly difficult to get right especially when compared to
>> bringing in ACPI tables into the mix.
>>
>>> To answer your other question/email, I believe Tang's next step is to
>>> support local page tables.  This is why we think pursing SRAT earlier is
>>> the right direction.
>>
>> Given 1G mappings, is that even a worthwhile effort?  I'm getting even
>> more more skeptical.
> 
> With 1G mappings, I agree that it won't make much difference.
> 
> I still think acpi table info should be available earlier, but I do not
> think I can convince you on this.  This can be religious debate.
> 
> Tang, what do you think?  Are you OK to try Tejun's suggestion as well? 
> 

By saying TJ's suggestion, you mean, we will let memblock to control the
behaviour, that said, we will do early allocations near the kernel image
range before we get the SRAT info?

If so, yeah, we have been working on this direction. By doing this, we may
have two main changes:

1. change some of memblock's APIs to make it have the ability to allocate
   memory from low address.
2. setup kernel page table down-top. Concretely, we first map the memory
   just after the kernel image to the top, then, we map 0 - kernel image end.

Do you guys think this is reasonable and acceptable?

-- 
Thanks.
Zhang Yanfei

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-23 16:54                                       ` Zhang Yanfei
  0 siblings, 0 replies; 116+ messages in thread
From: Zhang Yanfei @ 2013-08-23 16:54 UTC (permalink / raw)
  To: Toshi Kani, Tejun Heo
  Cc: Tang Chen, konrad.wilk, robert.moore, lv.zheng, rjw, lenb, tglx,
	mingo, hpa, akpm, trenn, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy, x86, linux-doc, linux-kernel, linux-mm,
	linux-acpi

Hello

On 08/24/2013 12:14 AM, Toshi Kani wrote:
> Hello,
> 
> On Fri, 2013-08-23 at 09:04 -0400, Tejun Heo wrote:
>> On Thu, Aug 22, 2013 at 04:17:41PM -0600, Toshi Kani wrote:
>>> I am relatively new to Linux, so I am not a good person to elaborate
>>> this.  From my experience on other OS, huge pages helped for the kernel,
>>> but did not necessarily help user applications.  It depended on
>>> applications, which were not niche cases.  But Linux may be different,
>>> so I asked since you seemed confident.  I'd appreciate if you can point
>>> us some data that endorses your statement.
>>
>> We are talking about the kernel linear mapping which is created during
>> early boot, so if it's available and useable there's no reason not to
>> use it.  Exceptions would be earlier processors which didn't do 1G
>> mappings or e820 maps with a lot of holes.  For CPUs used in NUMA
>> configurations, the former has been history for a bit now.  Can't be
>> sure about the latter but it'd be surprising for that to affect large
>> amount of memory in the systems that are of interest here.  Ooh, that
>> reminds me that we probably wanna go back to 1G + MTRR mapping under
>> 4G.  We're currently creating a lot of mapping holes.
> 
> Thanks for the explanation.
> 
>>> My worry is that the code is unlikely tested with the special logic when
>>> someone makes code changes to the page tables.  Such code can easily be
>>> broken in future.
>>
>> Well, I wouldn't consider flipping the direction of allocation to be
>> particularly difficult to get right especially when compared to
>> bringing in ACPI tables into the mix.
>>
>>> To answer your other question/email, I believe Tang's next step is to
>>> support local page tables.  This is why we think pursing SRAT earlier is
>>> the right direction.
>>
>> Given 1G mappings, is that even a worthwhile effort?  I'm getting even
>> more more skeptical.
> 
> With 1G mappings, I agree that it won't make much difference.
> 
> I still think acpi table info should be available earlier, but I do not
> think I can convince you on this.  This can be religious debate.
> 
> Tang, what do you think?  Are you OK to try Tejun's suggestion as well? 
> 

By saying TJ's suggestion, you mean, we will let memblock to control the
behaviour, that said, we will do early allocations near the kernel image
range before we get the SRAT info?

If so, yeah, we have been working on this direction. By doing this, we may
have two main changes:

1. change some of memblock's APIs to make it have the ability to allocate
   memory from low address.
2. setup kernel page table down-top. Concretely, we first map the memory
   just after the kernel image to the top, then, we map 0 - kernel image end.

Do you guys think this is reasonable and acceptable?

-- 
Thanks.
Zhang Yanfei

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-23 16:24                                       ` Tejun Heo
@ 2013-08-23 17:13                                         ` Toshi Kani
  -1 siblings, 0 replies; 116+ messages in thread
From: Toshi Kani @ 2013-08-23 17:13 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

Hello,

On Fri, 2013-08-23 at 12:24 -0400, Tejun Heo wrote:
> On Fri, Aug 23, 2013 at 10:14:08AM -0600, Toshi Kani wrote:
> > I still think acpi table info should be available earlier, but I do not
> > think I can convince you on this.  This can be religious debate.
> 
> I'm curious.  If there aren't substantial enough benefits, why would
> you still want to pull it earlier when it brings in things like initrd
> override and crafting the code carefully so that it's safe to execute
> it from different address modes and so on?  Please note that x86 is
> not ia64.  The early environment is completely different not only
> technically but also in its diversity and suckiness.  It wasn't too
> long ago that vendors were screwing up ACPI left and right.  It has
> been getting better but there's a reason why, for example, we still
> consider e820 to be the authoritative information over ACPI.

Firmware generates tables, and provides them via some interface.  Memory
map table can be provided via e820 or EFI memory map.  Memory topology
table is provided via ACPI.  I agree to prioritize one table over the
other when there is overlap.  But in the end, it is the firmware that
generates the tables.  Because it is provided via ACPI does not make it
suddenly unreliable.  I think table info from e820/EFI/ACPI should be
available at the same time.  To me, it makes more sense to use the
hotplug info to initialize memblock than try to find a way to workaround
without it.  I think we will continue to be in that way to find a
workaround in this direction. 

I came from ia64 background, and am not very familiar with x86.  So, you
may be very right about that x86 is different.  I also agree that initrd
is making it unnecessarily complicated.  We may see some initial issues,
but my hope is that the code gets matured over the time.

Thanks,
-Toshi

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-23 17:13                                         ` Toshi Kani
  0 siblings, 0 replies; 116+ messages in thread
From: Toshi Kani @ 2013-08-23 17:13 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Zhang Yanfei, Tang Chen, konrad.wilk, robert.moore, lv.zheng,
	rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, yanghy, x86, linux-doc, linux-kernel,
	linux-mm, linux-acpi

Hello,

On Fri, 2013-08-23 at 12:24 -0400, Tejun Heo wrote:
> On Fri, Aug 23, 2013 at 10:14:08AM -0600, Toshi Kani wrote:
> > I still think acpi table info should be available earlier, but I do not
> > think I can convince you on this.  This can be religious debate.
> 
> I'm curious.  If there aren't substantial enough benefits, why would
> you still want to pull it earlier when it brings in things like initrd
> override and crafting the code carefully so that it's safe to execute
> it from different address modes and so on?  Please note that x86 is
> not ia64.  The early environment is completely different not only
> technically but also in its diversity and suckiness.  It wasn't too
> long ago that vendors were screwing up ACPI left and right.  It has
> been getting better but there's a reason why, for example, we still
> consider e820 to be the authoritative information over ACPI.

Firmware generates tables, and provides them via some interface.  Memory
map table can be provided via e820 or EFI memory map.  Memory topology
table is provided via ACPI.  I agree to prioritize one table over the
other when there is overlap.  But in the end, it is the firmware that
generates the tables.  Because it is provided via ACPI does not make it
suddenly unreliable.  I think table info from e820/EFI/ACPI should be
available at the same time.  To me, it makes more sense to use the
hotplug info to initialize memblock than try to find a way to workaround
without it.  I think we will continue to be in that way to find a
workaround in this direction. 

I came from ia64 background, and am not very familiar with x86.  So, you
may be very right about that x86 is different.  I also agree that initrd
is making it unnecessarily complicated.  We may see some initial issues,
but my hope is that the code gets matured over the time.

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-23 17:13                                         ` Toshi Kani
@ 2013-08-23 17:29                                           ` Zhang Yanfei
  -1 siblings, 0 replies; 116+ messages in thread
From: Zhang Yanfei @ 2013-08-23 17:29 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Tejun Heo, Tang Chen, konrad.wilk, robert.moore, lv.zheng, rjw,
	lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu, wency,
	laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86,
	gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy, x86, linux-doc, linux-kernel, linux-mm,
	linux-acpi

Hi Toshi,

On 08/24/2013 01:13 AM, Toshi Kani wrote:
> Hello,
> 
> On Fri, 2013-08-23 at 12:24 -0400, Tejun Heo wrote:
>> On Fri, Aug 23, 2013 at 10:14:08AM -0600, Toshi Kani wrote:
>>> I still think acpi table info should be available earlier, but I do not
>>> think I can convince you on this.  This can be religious debate.
>>
>> I'm curious.  If there aren't substantial enough benefits, why would
>> you still want to pull it earlier when it brings in things like initrd
>> override and crafting the code carefully so that it's safe to execute
>> it from different address modes and so on?  Please note that x86 is
>> not ia64.  The early environment is completely different not only
>> technically but also in its diversity and suckiness.  It wasn't too
>> long ago that vendors were screwing up ACPI left and right.  It has
>> been getting better but there's a reason why, for example, we still
>> consider e820 to be the authoritative information over ACPI.
> 
> Firmware generates tables, and provides them via some interface.  Memory
> map table can be provided via e820 or EFI memory map.  Memory topology
> table is provided via ACPI.  I agree to prioritize one table over the
> other when there is overlap.  But in the end, it is the firmware that
> generates the tables.  Because it is provided via ACPI does not make it
> suddenly unreliable.  I think table info from e820/EFI/ACPI should be
> available at the same time.  To me, it makes more sense to use the
> hotplug info to initialize memblock than try to find a way to workaround
> without it.  

Yeah, agreed. But sigh.... on x86, we have ACPI initrd override, so we still
cannot convince Tj....

I think we will continue to be in that way to find a
> workaround in this direction. 
> 
> I came from ia64 background, and am not very familiar with x86.  So, you
> may be very right about that x86 is different.  I also agree that initrd
> is making it unnecessarily complicated.  We may see some initial issues,
> but my hope is that the code gets matured over the time.
> 
> Thanks,
> -Toshi
> 


-- 
Thanks.
Zhang Yanfei

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-23 17:29                                           ` Zhang Yanfei
  0 siblings, 0 replies; 116+ messages in thread
From: Zhang Yanfei @ 2013-08-23 17:29 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Tejun Heo, Tang Chen, konrad.wilk, robert.moore, lv.zheng, rjw,
	lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu, wency,
	laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86,
	gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy, x86, linux-doc, linux-kernel, linux-mm,
	linux-acpi

Hi Toshi,

On 08/24/2013 01:13 AM, Toshi Kani wrote:
> Hello,
> 
> On Fri, 2013-08-23 at 12:24 -0400, Tejun Heo wrote:
>> On Fri, Aug 23, 2013 at 10:14:08AM -0600, Toshi Kani wrote:
>>> I still think acpi table info should be available earlier, but I do not
>>> think I can convince you on this.  This can be religious debate.
>>
>> I'm curious.  If there aren't substantial enough benefits, why would
>> you still want to pull it earlier when it brings in things like initrd
>> override and crafting the code carefully so that it's safe to execute
>> it from different address modes and so on?  Please note that x86 is
>> not ia64.  The early environment is completely different not only
>> technically but also in its diversity and suckiness.  It wasn't too
>> long ago that vendors were screwing up ACPI left and right.  It has
>> been getting better but there's a reason why, for example, we still
>> consider e820 to be the authoritative information over ACPI.
> 
> Firmware generates tables, and provides them via some interface.  Memory
> map table can be provided via e820 or EFI memory map.  Memory topology
> table is provided via ACPI.  I agree to prioritize one table over the
> other when there is overlap.  But in the end, it is the firmware that
> generates the tables.  Because it is provided via ACPI does not make it
> suddenly unreliable.  I think table info from e820/EFI/ACPI should be
> available at the same time.  To me, it makes more sense to use the
> hotplug info to initialize memblock than try to find a way to workaround
> without it.  

Yeah, agreed. But sigh.... on x86, we have ACPI initrd override, so we still
cannot convince Tj....

I think we will continue to be in that way to find a
> workaround in this direction. 
> 
> I came from ia64 background, and am not very familiar with x86.  So, you
> may be very right about that x86 is different.  I also agree that initrd
> is making it unnecessarily complicated.  We may see some initial issues,
> but my hope is that the code gets matured over the time.
> 
> Thanks,
> -Toshi
> 


-- 
Thanks.
Zhang Yanfei

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-23 16:54                                       ` Zhang Yanfei
@ 2013-08-23 18:18                                         ` Yinghai Lu
  -1 siblings, 0 replies; 116+ messages in thread
From: Yinghai Lu @ 2013-08-23 18:18 UTC (permalink / raw)
  To: Zhang Yanfei
  Cc: Toshi Kani, Tejun Heo, Tang Chen, Konrad Rzeszutek Wilk,
	Bob Moore, Lv Zheng, Rafael J. Wysocki, Len Brown,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Jiang Liu, Wen Congyang, Lai Jiangshan,
	Yasuaki Ishimatsu, Taku Izumi, Mel Gorman, Minchan Kim, mina86,
	gong.chen, Vasilis Liaskovitis

[trimmed the CC list, assume too long list will not go through LKML]

On Fri, Aug 23, 2013 at 9:54 AM, Zhang Yanfei <zhangyanfei.yes@gmail.com> wrote:



> By saying TJ's suggestion, you mean, we will let memblock to control the
> behaviour, that said, we will do early allocations near the kernel image
> range before we get the SRAT info?

put the acpi override table in BRK, we still need ok from HPA.
I have impression that he did not like it, so want to confirm from him.

>
> If so, yeah, we have been working on this direction. By doing this, we may
> have two main changes:
>
> 1. change some of memblock's APIs to make it have the ability to allocate
>    memory from low address.
> 2. setup kernel page table down-top. Concretely, we first map the memory
>    just after the kernel image to the top, then, we map 0 - kernel image end.

how about kexec/kdump ?

when load high with kexec/dump, the second kernel could be very high near
TOHM.

>
> Do you guys think this is reasonable and acceptable?

current boot flow that need to have all cpu and mem and pci discovered
are not scalable.

for numa system, we should boot system with cpu/mem/pci in PXM(X) only.
and assume that PXM are not hot-removed later.
Later during booting late stage hot add other PXM in parallel.

That case, we could reduce boot time, and also could solve other PXM
hotplug problem.

Thanks

Yinghai

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-23 18:18                                         ` Yinghai Lu
  0 siblings, 0 replies; 116+ messages in thread
From: Yinghai Lu @ 2013-08-23 18:18 UTC (permalink / raw)
  To: Zhang Yanfei
  Cc: Toshi Kani, Tejun Heo, Tang Chen, Konrad Rzeszutek Wilk,
	Bob Moore, Lv Zheng, Rafael J. Wysocki, Len Brown,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Jiang Liu, Wen Congyang, Lai Jiangshan,
	Yasuaki Ishimatsu, Taku Izumi, Mel Gorman, Minchan Kim, mina86,
	gong.chen, Vasilis Liaskovitis, lwoodman, Rik van Riel, jweiner,
	Prarit Bhargava, Zhang Yanfei, yanghy, the arch/x86 maintainers,
	linux-doc, Linux Kernel Mailing List, Linux MM,
	ACPI Devel Maling List

[trimmed the CC list, assume too long list will not go through LKML]

On Fri, Aug 23, 2013 at 9:54 AM, Zhang Yanfei <zhangyanfei.yes@gmail.com> wrote:



> By saying TJ's suggestion, you mean, we will let memblock to control the
> behaviour, that said, we will do early allocations near the kernel image
> range before we get the SRAT info?

put the acpi override table in BRK, we still need ok from HPA.
I have impression that he did not like it, so want to confirm from him.

>
> If so, yeah, we have been working on this direction. By doing this, we may
> have two main changes:
>
> 1. change some of memblock's APIs to make it have the ability to allocate
>    memory from low address.
> 2. setup kernel page table down-top. Concretely, we first map the memory
>    just after the kernel image to the top, then, we map 0 - kernel image end.

how about kexec/kdump ?

when load high with kexec/dump, the second kernel could be very high near
TOHM.

>
> Do you guys think this is reasonable and acceptable?

current boot flow that need to have all cpu and mem and pci discovered
are not scalable.

for numa system, we should boot system with cpu/mem/pci in PXM(X) only.
and assume that PXM are not hot-removed later.
Later during booting late stage hot add other PXM in parallel.

That case, we could reduce boot time, and also could solve other PXM
hotplug problem.

Thanks

Yinghai

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-23 18:18                                         ` Yinghai Lu
  (?)
@ 2013-08-23 18:25                                         ` H. Peter Anvin
  2013-08-23 20:08                                           ` Yinghai Lu
  -1 siblings, 1 reply; 116+ messages in thread
From: H. Peter Anvin @ 2013-08-23 18:25 UTC (permalink / raw)
  To: Yinghai Lu, Zhang Yanfei
  Cc: Toshi Kani, Tejun Heo, Tang Chen, Konrad Rzeszutek Wilk,
	Bob Moore, Lv Zheng, Rafael J. Wysocki, Len Brown,
	Thomas Gleixner, Ingo Molnar, Andrew Morton, Thomas Renninger,
	Jiang Liu, Wen Congyang, Lai Jiangshan, Yasuaki Ishimatsu,
	Taku Izumi, Mel Gorman, Minchan Kim, mina86, gong.chen,
	Vasilis Liaskovitis, lwoodman, Rik van Riel, jweiner,
	Prarit Bhargava, Zhang.Yanfei

<zhangyanfei@cn.fujitsu.com>,"yanghy@cn.fujitsu.com" <yanghy@cn.fujitsu.com>,the arch/x86 maintainers <x86@kernel.org>,"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,Linux MM <linux-mm@kvack.org>,ACPI Devel Maling List <linux-acpi@vger.kernel.org>
Message-ID: <aad28f9f-be10-43d8-bb98-e28d46101c44@email.android.com>

BRK makes sense as long as you can set a sane O(1) size limit.

Yinghai Lu <yinghai@kernel.org> wrote:
>[trimmed the CC list, assume too long list will not go through LKML]
>
>On Fri, Aug 23, 2013 at 9:54 AM, Zhang Yanfei
><zhangyanfei.yes@gmail.com> wrote:
>
>
>
>> By saying TJ's suggestion, you mean, we will let memblock to control
>the
>> behaviour, that said, we will do early allocations near the kernel
>image
>> range before we get the SRAT info?
>
>put the acpi override table in BRK, we still need ok from HPA.
>I have impression that he did not like it, so want to confirm from him.
>
>>
>> If so, yeah, we have been working on this direction. By doing this,
>we may
>> have two main changes:
>>
>> 1. change some of memblock's APIs to make it have the ability to
>allocate
>>    memory from low address.
>> 2. setup kernel page table down-top. Concretely, we first map the
>memory
>>    just after the kernel image to the top, then, we map 0 - kernel
>image end.
>
>how about kexec/kdump ?
>
>when load high with kexec/dump, the second kernel could be very high
>near
>TOHM.
>
>>
>> Do you guys think this is reasonable and acceptable?
>
>current boot flow that need to have all cpu and mem and pci discovered
>are not scalable.
>
>for numa system, we should boot system with cpu/mem/pci in PXM(X) only.
>and assume that PXM are not hot-removed later.
>Later during booting late stage hot add other PXM in parallel.
>
>That case, we could reduce boot time, and also could solve other PXM
>hotplug problem.
>
>Thanks
>
>Yinghai

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-23 16:54                                       ` Zhang Yanfei
@ 2013-08-23 18:29                                         ` Toshi Kani
  -1 siblings, 0 replies; 116+ messages in thread
From: Toshi Kani @ 2013-08-23 18:29 UTC (permalink / raw)
  To: Zhang Yanfei
  Cc: Tejun Heo, Tang Chen, konrad.wilk, robert.moore, lv.zheng, rjw,
	lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu, wency,
	laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86,
	gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy, x86, linux-doc, linux-kernel, linux-mm,
	linux-acpi

Hello Zhang,

On Sat, 2013-08-24 at 00:54 +0800, Zhang Yanfei wrote:
> > Tang, what do you think?  Are you OK to try Tejun's suggestion as well? 
> > 
> 
> By saying TJ's suggestion, you mean, we will let memblock to control the
> behaviour, that said, we will do early allocations near the kernel image
> range before we get the SRAT info?

Right.

> If so, yeah, we have been working on this direction. 

Great!

> By doing this, we may
> have two main changes:
> 
> 1. change some of memblock's APIs to make it have the ability to allocate
>    memory from low address.
> 2. setup kernel page table down-top. Concretely, we first map the memory
>    just after the kernel image to the top, then, we map 0 - kernel image end.
> 
> Do you guys think this is reasonable and acceptable?

Have you also looked at Yinghai's comments below?

http://www.spinics.net/lists/linux-mm/msg61362.html

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-23 18:29                                         ` Toshi Kani
  0 siblings, 0 replies; 116+ messages in thread
From: Toshi Kani @ 2013-08-23 18:29 UTC (permalink / raw)
  To: Zhang Yanfei
  Cc: Tejun Heo, Tang Chen, konrad.wilk, robert.moore, lv.zheng, rjw,
	lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu, wency,
	laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan, mina86,
	gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner, prarit,
	zhangyanfei, yanghy, x86, linux-doc, linux-kernel, linux-mm,
	linux-acpi

Hello Zhang,

On Sat, 2013-08-24 at 00:54 +0800, Zhang Yanfei wrote:
> > Tang, what do you think?  Are you OK to try Tejun's suggestion as well? 
> > 
> 
> By saying TJ's suggestion, you mean, we will let memblock to control the
> behaviour, that said, we will do early allocations near the kernel image
> range before we get the SRAT info?

Right.

> If so, yeah, we have been working on this direction. 

Great!

> By doing this, we may
> have two main changes:
> 
> 1. change some of memblock's APIs to make it have the ability to allocate
>    memory from low address.
> 2. setup kernel page table down-top. Concretely, we first map the memory
>    just after the kernel image to the top, then, we map 0 - kernel image end.
> 
> Do you guys think this is reasonable and acceptable?

Have you also looked at Yinghai's comments below?

http://www.spinics.net/lists/linux-mm/msg61362.html

Thanks,
-Toshi

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-23 18:25                                         ` H. Peter Anvin
@ 2013-08-23 20:08                                           ` Yinghai Lu
  2013-08-23 20:30                                             ` Russ Anderson
  2013-08-24  2:41                                             ` Russ Anderson
  0 siblings, 2 replies; 116+ messages in thread
From: Yinghai Lu @ 2013-08-23 20:08 UTC (permalink / raw)
  To: H. Peter Anvin, Russ Anderson
  Cc: Zhang Yanfei, Toshi Kani, Tejun Heo, Tang Chen,
	Konrad Rzeszutek Wilk, Bob Moore, Lv Zheng, Rafael J. Wysocki,
	Ingo Molnar, Andrew Morton, Thomas Renninger, Yasuaki Ishimatsu,
	Mel Gorman, Linux Kernel Mailing List

On Fri, Aug 23, 2013 at 11:25 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>
> BRK makes sense as long as you can set a sane O(1) size limit.
>
>>
>>put the acpi override table in BRK, we still need ok from HPA.
>>I have impression that he did not like it, so want to confirm from him.

on 8 sockets system:
-rw-r--r-- 1 root root   3532 Aug 22 10:26 APIC.dat
-rw-r--r-- 1 root root     48 Aug 22 10:26 BDAT.dat
-rw-r--r-- 1 root root    824 Aug 22 10:26 DMAR.dat
-rw-r--r-- 1 root root  83509 Aug 22 10:26 DSDT.dat
-rw-r--r-- 1 root root    244 Aug 22 10:26 FACP.dat
-rw-r--r-- 1 root root     64 Aug 22 10:26 FACS.dat
-rw-r--r-- 1 root root     68 Aug 22 10:26 FPDT.dat
-rw-r--r-- 1 root root     56 Aug 22 10:26 HPET.dat
-rw-r--r-- 1 root root    304 Aug 22 10:26 MCEJ.dat
-rw-r--r-- 1 root root     60 Aug 22 10:26 MCFG.dat
-rw-r--r-- 1 root root   6712 Aug 22 10:26 MPST.dat
-rw-r--r-- 1 root root    232 Aug 22 10:26 MSCT.dat
-rw-r--r-- 1 root root    172 Aug 22 10:26 PCCT.dat
-rw-r--r-- 1 root root     96 Aug 22 10:26 PMCT.dat
-rw-r--r-- 1 root root     48 Aug 22 10:26 RASF.dat
-rw-r--r-- 1 root root    108 Aug 22 10:26 SLIT.dat
-rw-r--r-- 1 root root     80 Aug 22 10:26 SPCR.dat
-rw-r--r-- 1 root root     65 Aug 22 10:26 SPMI.dat
-rw-r--r-- 1 root root   6448 Aug 22 10:26 SRAT.dat
-rw-r--r-- 1 root root    100 Aug 22 10:26 SSDT1.dat
-rw-r--r-- 1 root root 283527 Aug 22 10:26 SSDT2.dat
-rw-r--r-- 1 root root     66 Aug 22 10:26 UEFI.dat
-rw-r--r-- 1 root root     64 Aug 22 10:26 WDDT.dat

assume for 32sockets will have four times bigger with DSDT and SSDT.
(with more pci and cpus)

So we can not have O(1) the size.

Russ, What is ACPI table size on your big machine?

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-23 20:08                                           ` Yinghai Lu
@ 2013-08-23 20:30                                             ` Russ Anderson
  2013-08-23 20:48                                               ` Yinghai Lu
  2013-08-24  2:41                                             ` Russ Anderson
  1 sibling, 1 reply; 116+ messages in thread
From: Russ Anderson @ 2013-08-23 20:30 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Zhang Yanfei, Toshi Kani, Tejun Heo, Tang Chen,
	Konrad Rzeszutek Wilk, Bob Moore, Lv Zheng, Rafael J. Wysocki,
	Ingo Molnar, Andrew Morton, Thomas Renninger, Yasuaki Ishimatsu,
	Mel Gorman, Linux Kernel Mailing List

On Fri, Aug 23, 2013 at 01:08:56PM -0700, Yinghai Lu wrote:
> On Fri, Aug 23, 2013 at 11:25 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> >
> > BRK makes sense as long as you can set a sane O(1) size limit.
> >
> >>
> >>put the acpi override table in BRK, we still need ok from HPA.
> >>I have impression that he did not like it, so want to confirm from him.
> 
> on 8 sockets system:
> -rw-r--r-- 1 root root   3532 Aug 22 10:26 APIC.dat
> -rw-r--r-- 1 root root     48 Aug 22 10:26 BDAT.dat
> -rw-r--r-- 1 root root    824 Aug 22 10:26 DMAR.dat
> -rw-r--r-- 1 root root  83509 Aug 22 10:26 DSDT.dat
> -rw-r--r-- 1 root root    244 Aug 22 10:26 FACP.dat
> -rw-r--r-- 1 root root     64 Aug 22 10:26 FACS.dat
> -rw-r--r-- 1 root root     68 Aug 22 10:26 FPDT.dat
> -rw-r--r-- 1 root root     56 Aug 22 10:26 HPET.dat
> -rw-r--r-- 1 root root    304 Aug 22 10:26 MCEJ.dat
> -rw-r--r-- 1 root root     60 Aug 22 10:26 MCFG.dat
> -rw-r--r-- 1 root root   6712 Aug 22 10:26 MPST.dat
> -rw-r--r-- 1 root root    232 Aug 22 10:26 MSCT.dat
> -rw-r--r-- 1 root root    172 Aug 22 10:26 PCCT.dat
> -rw-r--r-- 1 root root     96 Aug 22 10:26 PMCT.dat
> -rw-r--r-- 1 root root     48 Aug 22 10:26 RASF.dat
> -rw-r--r-- 1 root root    108 Aug 22 10:26 SLIT.dat
> -rw-r--r-- 1 root root     80 Aug 22 10:26 SPCR.dat
> -rw-r--r-- 1 root root     65 Aug 22 10:26 SPMI.dat
> -rw-r--r-- 1 root root   6448 Aug 22 10:26 SRAT.dat
> -rw-r--r-- 1 root root    100 Aug 22 10:26 SSDT1.dat
> -rw-r--r-- 1 root root 283527 Aug 22 10:26 SSDT2.dat
> -rw-r--r-- 1 root root     66 Aug 22 10:26 UEFI.dat
> -rw-r--r-- 1 root root     64 Aug 22 10:26 WDDT.dat
> 
> assume for 32sockets will have four times bigger with DSDT and SSDT.
> (with more pci and cpus)
> 
> So we can not have O(1) the size.
> 
> Russ, What is ACPI table size on your big machine?

This is from a 256 socket 32TB system.

 Reserving 256MB of memory at 66973408MB for crashkernel (System RAM: 32501719MB)
 ACPI: RSDP 000000007ef3d014 00024 (v02 INTEL )
 ACPI: XSDT 000000007ef3d120 0007C (v01 INTEL  TIANO    00000000      01000013)
 ACPI: FACP 000000007ef3a000 000F4 (v04 INTEL  TIANO    00000000 MSFT 01000013)
 ACPI: DSDT 000000007e6c3000 7F493E (v02   SGI2      UVX 00000002 MSFT 01000013)
 ACPI: FACS 000000007d147000 00040
 ACPI: UEFI 000000007ef3c000 0012A (v01  INTEL  RstScuO 00000000      00000000)
 ACPI: UEFI 000000007ef3b000 0005C (v01  INTEL  RstScuV 00000000      00000000)
 ACPI: HPET 000000007ef39000 00038 (v01 INTEL  TIANO    00000001 MSFT 01000013)
 ACPI: SSDT 000000007ef33000 05352 (v02  INTEL ROSECITY 00000003 INTL 20070508)
 ACPI: SLIT 000000007ef10000 1002C (v01   SGI2      UVX 00000002 MSFT 00000001)
 ACPI: APIC 000000007eeee000 10070 (v03   SGI2      UVX 00000002 MSFT 00000001)
 ACPI: SRAT 000000007eeb8000 1A830 (v03   SGI2      UVX 00000002 MSFT 00000001)
 ACPI: MCFG 000000007d6d4000 0105C (v01   SGI2      UVX 00000002 MSFT 00000001)
 ACPI: SPCR 000000007e6c2000 00050 (v01                 00000000      00000000)
 ACPI: DMAR 000000007d6d3000 0013C (v01 INTEL  TIANO    00000001 MSFT 01000013)

-- 
Russ Anderson, OS RAS/Partitioning Project Lead  
SGI - Silicon Graphics Inc          rja@sgi.com

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-23 18:18                                         ` Yinghai Lu
@ 2013-08-23 20:33                                           ` chen tang
  -1 siblings, 0 replies; 116+ messages in thread
From: chen tang @ 2013-08-23 20:33 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Zhang Yanfei, Toshi Kani, Tejun Heo, Tang Chen,
	Konrad Rzeszutek Wilk, Bob Moore, Lv Zheng, Rafael J. Wysocki,
	Len Brown, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Morton, Thomas Renninger, Jiang Liu, Wen Congyang,
	Lai Jiangshan, Yasuaki Ishimatsu, Taku Izumi, Mel Gorman,
	Minchan Kim, mina86, gong.chen, Vasilis

Hi Yinghai,

2013/8/24 Yinghai Lu <yinghai@kernel.org>:
......
>> Do you guys think this is reasonable and acceptable?
>
> current boot flow that need to have all cpu and mem and pci discovered
> are not scalable.
>
> for numa system, we should boot system with cpu/mem/pci in PXM(X) only.
> and assume that PXM are not hot-removed later.
> Later during booting late stage hot add other PXM in parallel.
>
> That case, we could reduce boot time, and also could solve other PXM
> hotplug problem.
>

This is a good point, I think. Actually, I had a similar thinking
before. This can
solve the hotplug issue, and also the local node page table problem.

I found that the current kernel will do the memory hot-add procedure too in
later boot sequence. And we could get the following message:

System RAM resource ...... cannot be added

This message was from :
add_memory()
  |->register_memory_resource()

because we have found and mapped all the memory in the system at early time.

But it is not easy to solve.
1. We still have to know how much memory and cpus the boot PXM has.
    How could we know that ?  SRAT again ?
2. The boot PXM could have little memory, and we meet the kexec and kdump
    problem again.
3. I'm not quite sure is there any important benefit that the kernel
initializes all
    the memory at beginning ?

And also, the memory hotplug schedule is very tough for us. We really want the
movablenode functionality could be available soon. And this idea could
be a long
way to go. So I also think this would be the next step.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-23 20:33                                           ` chen tang
  0 siblings, 0 replies; 116+ messages in thread
From: chen tang @ 2013-08-23 20:33 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Zhang Yanfei, Toshi Kani, Tejun Heo, Tang Chen,
	Konrad Rzeszutek Wilk, Bob Moore, Lv Zheng, Rafael J. Wysocki,
	Len Brown, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Morton, Thomas Renninger, Jiang Liu, Wen Congyang,
	Lai Jiangshan, Yasuaki Ishimatsu, Taku Izumi, Mel Gorman,
	Minchan Kim, mina86, gong.chen, Vasilis Liaskovitis, lwoodman,
	Rik van Riel, jweiner, Prarit Bhargava, Zhang Yanfei, yanghy,
	the arch/x86 maintainers, linux-doc, Linux Kernel Mailing List,
	Linux MM, ACPI Devel Maling List

Hi Yinghai,

2013/8/24 Yinghai Lu <yinghai@kernel.org>:
......
>> Do you guys think this is reasonable and acceptable?
>
> current boot flow that need to have all cpu and mem and pci discovered
> are not scalable.
>
> for numa system, we should boot system with cpu/mem/pci in PXM(X) only.
> and assume that PXM are not hot-removed later.
> Later during booting late stage hot add other PXM in parallel.
>
> That case, we could reduce boot time, and also could solve other PXM
> hotplug problem.
>

This is a good point, I think. Actually, I had a similar thinking
before. This can
solve the hotplug issue, and also the local node page table problem.

I found that the current kernel will do the memory hot-add procedure too in
later boot sequence. And we could get the following message:

System RAM resource ...... cannot be added

This message was from :
add_memory()
  |->register_memory_resource()

because we have found and mapped all the memory in the system at early time.

But it is not easy to solve.
1. We still have to know how much memory and cpus the boot PXM has.
    How could we know that ?  SRAT again ?
2. The boot PXM could have little memory, and we meet the kexec and kdump
    problem again.
3. I'm not quite sure is there any important benefit that the kernel
initializes all
    the memory at beginning ?

And also, the memory hotplug schedule is very tough for us. We really want the
movablenode functionality could be available soon. And this idea could
be a long
way to go. So I also think this would be the next step.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-23 20:30                                             ` Russ Anderson
@ 2013-08-23 20:48                                               ` Yinghai Lu
  2013-08-23 21:50                                                 ` chen tang
  0 siblings, 1 reply; 116+ messages in thread
From: Yinghai Lu @ 2013-08-23 20:48 UTC (permalink / raw)
  To: Russ Anderson
  Cc: H. Peter Anvin, Zhang Yanfei, Toshi Kani, Tejun Heo, Tang Chen,
	Konrad Rzeszutek Wilk, Bob Moore, Lv Zheng, Rafael J. Wysocki,
	Ingo Molnar, Andrew Morton, Thomas Renninger, Yasuaki Ishimatsu,
	Mel Gorman, Linux Kernel Mailing List

On Fri, Aug 23, 2013 at 1:30 PM, Russ Anderson <rja@sgi.com> wrote:
> On Fri, Aug 23, 2013 at 01:08:56PM -0700, Yinghai Lu wrote:
>> On Fri, Aug 23, 2013 at 11:25 AM, H. Peter Anvin <hpa@zytor.com> wrote:

>> Russ, What is ACPI table size on your big machine?
>
> This is from a 256 socket 32TB system.
>
>  Reserving 256MB of memory at 66973408MB for crashkernel (System RAM: 32501719MB)
>  ACPI: RSDP 000000007ef3d014 00024 (v02 INTEL )
>  ACPI: XSDT 000000007ef3d120 0007C (v01 INTEL  TIANO    00000000      01000013)
>  ACPI: FACP 000000007ef3a000 000F4 (v04 INTEL  TIANO    00000000 MSFT 01000013)
>  ACPI: DSDT 000000007e6c3000 7F493E (v02   SGI2      UVX 00000002 MSFT 01000013)
>  ACPI: FACS 000000007d147000 00040
>  ACPI: UEFI 000000007ef3c000 0012A (v01  INTEL  RstScuO 00000000      00000000)
>  ACPI: UEFI 000000007ef3b000 0005C (v01  INTEL  RstScuV 00000000      00000000)
>  ACPI: HPET 000000007ef39000 00038 (v01 INTEL  TIANO    00000001 MSFT 01000013)
>  ACPI: SSDT 000000007ef33000 05352 (v02  INTEL ROSECITY 00000003 INTL 20070508)
>  ACPI: SLIT 000000007ef10000 1002C (v01   SGI2      UVX 00000002 MSFT 00000001)
>  ACPI: APIC 000000007eeee000 10070 (v03   SGI2      UVX 00000002 MSFT 00000001)
>  ACPI: SRAT 000000007eeb8000 1A830 (v03   SGI2      UVX 00000002 MSFT 00000001)
>  ACPI: MCFG 000000007d6d4000 0105C (v01   SGI2      UVX 00000002 MSFT 00000001)
>  ACPI: SPCR 000000007e6c2000 00050 (v01                 00000000      00000000)
>  ACPI: DMAR 000000007d6d3000 0013C (v01 INTEL  TIANO    00000001 MSFT 01000013)
>

so the DSDT is 7F493E, and total is more than 8M.

that will need BRK to be extended 16M?

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-23 20:33                                           ` chen tang
@ 2013-08-23 21:08                                             ` Yinghai Lu
  -1 siblings, 0 replies; 116+ messages in thread
From: Yinghai Lu @ 2013-08-23 21:08 UTC (permalink / raw)
  To: chen tang
  Cc: Zhang Yanfei, Toshi Kani, Tejun Heo, Tang Chen,
	Konrad Rzeszutek Wilk, Bob Moore, Lv Zheng, Rafael J. Wysocki,
	Len Brown, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Morton, Thomas Renninger, Jiang Liu, Wen Congyang,
	Lai Jiangshan, Yasuaki Ishimatsu, Taku Izumi, Mel Gorman,
	Minchan Kim, mina86, gong.chen, Vasilis

> because we have found and mapped all the memory in the system at early time.
>
> But it is not easy to solve.
> 1. We still have to know how much memory and cpus the boot PXM has.
>     How could we know that ?  SRAT again ?

yes, need to parse numa info early, and only use RAM for PXM(X) at first.

> 2. The boot PXM could have little memory, and we meet the kexec and kdump
>     problem again.

could extend to have more PXM at first, like boot pxm mask.

> 3. I'm not quite sure is there any important benefit that the kernel
> initializes all
>     the memory at beginning ?
>
> And also, the memory hotplug schedule is very tough for us. We really want the
> movablenode functionality could be available soon. And this idea could
> be a long way to go. So I also think this would be the next step.

uh?
I already tried best to help. Even spent time to produce patchset
to separate acpi_override find/copy, and numa_info parse early
and put page tables on local nodes.

Let me know what else I could help.

Thanks

Yinghai

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-23 21:08                                             ` Yinghai Lu
  0 siblings, 0 replies; 116+ messages in thread
From: Yinghai Lu @ 2013-08-23 21:08 UTC (permalink / raw)
  To: chen tang
  Cc: Zhang Yanfei, Toshi Kani, Tejun Heo, Tang Chen,
	Konrad Rzeszutek Wilk, Bob Moore, Lv Zheng, Rafael J. Wysocki,
	Len Brown, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Morton, Thomas Renninger, Jiang Liu, Wen Congyang,
	Lai Jiangshan, Yasuaki Ishimatsu, Taku Izumi, Mel Gorman,
	Minchan Kim, mina86, gong.chen, Vasilis Liaskovitis, lwoodman,
	Rik van Riel, jweiner, Prarit Bhargava, Zhang Yanfei, yanghy,
	the arch/x86 maintainers, linux-doc, Linux Kernel Mailing List,
	Linux MM, ACPI Devel Maling List

> because we have found and mapped all the memory in the system at early time.
>
> But it is not easy to solve.
> 1. We still have to know how much memory and cpus the boot PXM has.
>     How could we know that ?  SRAT again ?

yes, need to parse numa info early, and only use RAM for PXM(X) at first.

> 2. The boot PXM could have little memory, and we meet the kexec and kdump
>     problem again.

could extend to have more PXM at first, like boot pxm mask.

> 3. I'm not quite sure is there any important benefit that the kernel
> initializes all
>     the memory at beginning ?
>
> And also, the memory hotplug schedule is very tough for us. We really want the
> movablenode functionality could be available soon. And this idea could
> be a long way to go. So I also think this would be the next step.

uh?
I already tried best to help. Even spent time to produce patchset
to separate acpi_override find/copy, and numa_info parse early
and put page tables on local nodes.

Let me know what else I could help.

Thanks

Yinghai

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-23 18:29                                         ` Toshi Kani
@ 2013-08-23 21:37                                           ` chen tang
  -1 siblings, 0 replies; 116+ messages in thread
From: chen tang @ 2013-08-23 21:37 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Zhang Yanfei, Tejun Heo, Tang Chen, Konrad Rzeszutek Wilk,
	Bob Moore, Lv Zheng, Rafael J. Wysocki, Len Brown,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Yinghai Lu, Jiang Liu, Wen Congyang,
	Lai Jiangshan, Yasuaki Ishimatsu, Taku Izumi, Mel Gorman,
	Minchan Kim, mina86, gong.chen, Vasilis

Hi Toshi, tj,

Really sorry for the delay. There were so many discussions in this thread and
it took me a lot of time to read and follow them.

2013/8/24 Toshi Kani <toshi.kani@hp.com>:
......
> On Sat, 2013-08-24 at 00:54 +0800, Zhang Yanfei wrote:
>> > Tang, what do you think?  Are you OK to try Tejun's suggestion as well?
>> >

We have been working on this for a while. And will send the patches soon.
We are trying this idea and will see how it goes.

>>
>> By saying TJ's suggestion, you mean, we will let memblock to control the
>> behaviour, that said, we will do early allocations near the kernel image
>> range before we get the SRAT info?
>
> Right.
>
>> If so, yeah, we have been working on this direction.
>
> Great!
>
>> By doing this, we may
>> have two main changes:
>>
>> 1. change some of memblock's APIs to make it have the ability to allocate
>>    memory from low address.
>> 2. setup kernel page table down-top. Concretely, we first map the memory
>>    just after the kernel image to the top, then, we map 0 - kernel image end.
>>
>> Do you guys think this is reasonable and acceptable?
>
> Have you also looked at Yinghai's comments below?
>
> http://www.spinics.net/lists/linux-mm/msg61362.html
>

We have read the comments from Yinghai. Reordering relocated_initrd and
reserve_crashkernel is doable, and the most difficult part is change the page
tables initialization logic. And as Zhang has mentioned above, we are not sure
if this could be acceptable.

Actually I also stand with Toshi that we should get SRAT earlier. This
will solve
memory hotplug issue, and also the following local page table problem.

And as tj concerned about the stability of the kernel boot sequence, then how
about this:

We don't do acpi_initrd_override() that early in head_32.S and head_64.c. We do
it after early_ioremap is available. I have mentioned this before.

With the help of early_ioremap, which is used by many others, we can copy all
override tables into memory. In my understanding, the page tables setup by
early_ioremap will work just like direct mapping page tables, right ?
Then we unmap
the memory, it is done. And we don't need to split the whole procedure
into fnd & copy.

So to tj, this approach won't affect acpica and very early boot
sequence. Are you OK
with this ?

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-23 21:37                                           ` chen tang
  0 siblings, 0 replies; 116+ messages in thread
From: chen tang @ 2013-08-23 21:37 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Zhang Yanfei, Tejun Heo, Tang Chen, Konrad Rzeszutek Wilk,
	Bob Moore, Lv Zheng, Rafael J. Wysocki, Len Brown,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Yinghai Lu, Jiang Liu, Wen Congyang,
	Lai Jiangshan, Yasuaki Ishimatsu, Taku Izumi, Mel Gorman,
	Minchan Kim, mina86, gong.chen, Vasilis Liaskovitis, lwoodman,
	Rik van Riel, jweiner, Prarit Bhargava, Zhang Yanfei, yanghy,
	the arch/x86 maintainers, linux-doc, Linux Kernel Mailing List,
	Linux MM, ACPI Devel Maling List

Hi Toshi, tj,

Really sorry for the delay. There were so many discussions in this thread and
it took me a lot of time to read and follow them.

2013/8/24 Toshi Kani <toshi.kani@hp.com>:
......
> On Sat, 2013-08-24 at 00:54 +0800, Zhang Yanfei wrote:
>> > Tang, what do you think?  Are you OK to try Tejun's suggestion as well?
>> >

We have been working on this for a while. And will send the patches soon.
We are trying this idea and will see how it goes.

>>
>> By saying TJ's suggestion, you mean, we will let memblock to control the
>> behaviour, that said, we will do early allocations near the kernel image
>> range before we get the SRAT info?
>
> Right.
>
>> If so, yeah, we have been working on this direction.
>
> Great!
>
>> By doing this, we may
>> have two main changes:
>>
>> 1. change some of memblock's APIs to make it have the ability to allocate
>>    memory from low address.
>> 2. setup kernel page table down-top. Concretely, we first map the memory
>>    just after the kernel image to the top, then, we map 0 - kernel image end.
>>
>> Do you guys think this is reasonable and acceptable?
>
> Have you also looked at Yinghai's comments below?
>
> http://www.spinics.net/lists/linux-mm/msg61362.html
>

We have read the comments from Yinghai. Reordering relocated_initrd and
reserve_crashkernel is doable, and the most difficult part is change the page
tables initialization logic. And as Zhang has mentioned above, we are not sure
if this could be acceptable.

Actually I also stand with Toshi that we should get SRAT earlier. This
will solve
memory hotplug issue, and also the following local page table problem.

And as tj concerned about the stability of the kernel boot sequence, then how
about this:

We don't do acpi_initrd_override() that early in head_32.S and head_64.c. We do
it after early_ioremap is available. I have mentioned this before.

With the help of early_ioremap, which is used by many others, we can copy all
override tables into memory. In my understanding, the page tables setup by
early_ioremap will work just like direct mapping page tables, right ?
Then we unmap
the memory, it is done. And we don't need to split the whole procedure
into fnd & copy.

So to tj, this approach won't affect acpica and very early boot
sequence. Are you OK
with this ?

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-23 20:48                                               ` Yinghai Lu
@ 2013-08-23 21:50                                                 ` chen tang
  2013-08-23 21:52                                                   ` Moore, Robert
  2013-08-23 22:08                                                   ` Yinghai Lu
  0 siblings, 2 replies; 116+ messages in thread
From: chen tang @ 2013-08-23 21:50 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Russ Anderson, H. Peter Anvin, Zhang Yanfei, Toshi Kani,
	Tejun Heo, Tang Chen, Konrad Rzeszutek Wilk, Bob Moore, Lv Zheng,
	Rafael J. Wysocki, Ingo Molnar, Andrew Morton, Thomas Renninger,
	Yasuaki Ishimatsu, Mel Gorman, Linux Kernel Mailing List

Hi Yinghai,

2013/8/24 Yinghai Lu <yinghai@kernel.org>:
> On Fri, Aug 23, 2013 at 1:30 PM, Russ Anderson <rja@sgi.com> wrote:
>> On Fri, Aug 23, 2013 at 01:08:56PM -0700, Yinghai Lu wrote:
>>> On Fri, Aug 23, 2013 at 11:25 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>
>>> Russ, What is ACPI table size on your big machine?
>>
>> This is from a 256 socket 32TB system.
>>
>>  Reserving 256MB of memory at 66973408MB for crashkernel (System RAM: 32501719MB)
>>  ACPI: RSDP 000000007ef3d014 00024 (v02 INTEL )
>>  ACPI: XSDT 000000007ef3d120 0007C (v01 INTEL  TIANO    00000000      01000013)
>>  ACPI: FACP 000000007ef3a000 000F4 (v04 INTEL  TIANO    00000000 MSFT 01000013)
>>  ACPI: DSDT 000000007e6c3000 7F493E (v02   SGI2      UVX 00000002 MSFT 01000013)
>>  ACPI: FACS 000000007d147000 00040
>>  ACPI: UEFI 000000007ef3c000 0012A (v01  INTEL  RstScuO 00000000      00000000)
>>  ACPI: UEFI 000000007ef3b000 0005C (v01  INTEL  RstScuV 00000000      00000000)
>>  ACPI: HPET 000000007ef39000 00038 (v01 INTEL  TIANO    00000001 MSFT 01000013)
>>  ACPI: SSDT 000000007ef33000 05352 (v02  INTEL ROSECITY 00000003 INTL 20070508)
>>  ACPI: SLIT 000000007ef10000 1002C (v01   SGI2      UVX 00000002 MSFT 00000001)
>>  ACPI: APIC 000000007eeee000 10070 (v03   SGI2      UVX 00000002 MSFT 00000001)
>>  ACPI: SRAT 000000007eeb8000 1A830 (v03   SGI2      UVX 00000002 MSFT 00000001)
>>  ACPI: MCFG 000000007d6d4000 0105C (v01   SGI2      UVX 00000002 MSFT 00000001)
>>  ACPI: SPCR 000000007e6c2000 00050 (v01                 00000000      00000000)
>>  ACPI: DMAR 000000007d6d3000 0013C (v01 INTEL  TIANO    00000001 MSFT 01000013)
>>
>
> so the DSDT is 7F493E, and total is more than 8M.
>
> that will need BRK to be extended 16M?
>

Then how about use early_ioremap(), and don't do it that early in
head_32 and head64 ?

Thanks.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* RE: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-23 21:50                                                 ` chen tang
@ 2013-08-23 21:52                                                   ` Moore, Robert
  2013-08-23 22:05                                                     ` Yinghai Lu
  2013-08-23 22:08                                                   ` Yinghai Lu
  1 sibling, 1 reply; 116+ messages in thread
From: Moore, Robert @ 2013-08-23 21:52 UTC (permalink / raw)
  To: chen tang, Yinghai Lu
  Cc: Russ Anderson, H. Peter Anvin, Zhang Yanfei, Toshi Kani,
	Tejun Heo, Tang Chen, Konrad Rzeszutek Wilk, Zheng, Lv,
	Rafael J. Wysocki, Ingo Molnar, Andrew Morton, Thomas Renninger,
	Yasuaki Ishimatsu, Mel Gorman, Linux Kernel Mailing List

While we're at it:

Can someone send me the acpidump for this machine? We very much would like to test all of ACPICA with such a large DSDT.

Thanks,
Bob


> -----Original Message-----
> From: chen tang [mailto:imtangchen@gmail.com]
> Sent: Friday, August 23, 2013 2:51 PM
> To: Yinghai Lu
> Cc: Russ Anderson; H. Peter Anvin; Zhang Yanfei; Toshi Kani; Tejun Heo;
> Tang Chen; Konrad Rzeszutek Wilk; Moore, Robert; Zheng, Lv; Rafael J.
> Wysocki; Ingo Molnar; Andrew Morton; Thomas Renninger; Yasuaki Ishimatsu;
> Mel Gorman; Linux Kernel Mailing List
> Subject: Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
> 
> Hi Yinghai,
> 
> 2013/8/24 Yinghai Lu <yinghai@kernel.org>:
> > On Fri, Aug 23, 2013 at 1:30 PM, Russ Anderson <rja@sgi.com> wrote:
> >> On Fri, Aug 23, 2013 at 01:08:56PM -0700, Yinghai Lu wrote:
> >>> On Fri, Aug 23, 2013 at 11:25 AM, H. Peter Anvin <hpa@zytor.com>
> wrote:
> >
> >>> Russ, What is ACPI table size on your big machine?
> >>
> >> This is from a 256 socket 32TB system.
> >>
> >>  Reserving 256MB of memory at 66973408MB for crashkernel (System RAM:
> 32501719MB)
> >>  ACPI: RSDP 000000007ef3d014 00024 (v02 INTEL )
> >>  ACPI: XSDT 000000007ef3d120 0007C (v01 INTEL  TIANO    00000000
> 01000013)
> >>  ACPI: FACP 000000007ef3a000 000F4 (v04 INTEL  TIANO    00000000 MSFT
> 01000013)
> >>  ACPI: DSDT 000000007e6c3000 7F493E (v02   SGI2      UVX 00000002 MSFT
> 01000013)
> >>  ACPI: FACS 000000007d147000 00040
> >>  ACPI: UEFI 000000007ef3c000 0012A (v01  INTEL  RstScuO 00000000
> 00000000)
> >>  ACPI: UEFI 000000007ef3b000 0005C (v01  INTEL  RstScuV 00000000
> 00000000)
> >>  ACPI: HPET 000000007ef39000 00038 (v01 INTEL  TIANO    00000001 MSFT
> 01000013)
> >>  ACPI: SSDT 000000007ef33000 05352 (v02  INTEL ROSECITY 00000003 INTL
> 20070508)
> >>  ACPI: SLIT 000000007ef10000 1002C (v01   SGI2      UVX 00000002 MSFT
> 00000001)
> >>  ACPI: APIC 000000007eeee000 10070 (v03   SGI2      UVX 00000002 MSFT
> 00000001)
> >>  ACPI: SRAT 000000007eeb8000 1A830 (v03   SGI2      UVX 00000002 MSFT
> 00000001)
> >>  ACPI: MCFG 000000007d6d4000 0105C (v01   SGI2      UVX 00000002 MSFT
> 00000001)
> >>  ACPI: SPCR 000000007e6c2000 00050 (v01                 00000000
> 00000000)
> >>  ACPI: DMAR 000000007d6d3000 0013C (v01 INTEL  TIANO    00000001 MSFT
> 01000013)
> >>
> >
> > so the DSDT is 7F493E, and total is more than 8M.
> >
> > that will need BRK to be extended 16M?
> >
> 
> Then how about use early_ioremap(), and don't do it that early in
> head_32 and head64 ?
> 
> Thanks.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-23 21:37                                           ` chen tang
@ 2013-08-23 21:52                                             ` Tejun Heo
  -1 siblings, 0 replies; 116+ messages in thread
From: Tejun Heo @ 2013-08-23 21:52 UTC (permalink / raw)
  To: chen tang
  Cc: Toshi Kani, Zhang Yanfei, Tang Chen, Konrad Rzeszutek Wilk,
	Bob Moore, Lv Zheng, Rafael J. Wysocki, Len Brown,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Yinghai Lu, Jiang Liu, Wen Congyang,
	Lai Jiangshan, Yasuaki Ishimatsu, Taku Izumi, Mel Gorman,
	Minchan Kim, mina86, gong.chen, Vasilis

Hello,

On Sat, Aug 24, 2013 at 05:37:48AM +0800, chen tang wrote:
> We have read the comments from Yinghai. Reordering relocated_initrd and
> reserve_crashkernel is doable, and the most difficult part is change the page
> tables initialization logic. And as Zhang has mentioned above, we are not sure
> if this could be acceptable.

Maybe I'm missing something but why is that so hard?  All it does is
allocating memory in a different place.  Why is that so complicated?
Can somebody please elaborate the issues here?  If it is actually
hairy, where does the hairiness come from?  Is it an inherent problem
or just an issue with how the code is organized currently?

> Actually I also stand with Toshi that we should get SRAT earlier. This
> will solve
> memory hotplug issue, and also the following local page table problem.

Do you mind responding to the points raised in the discussion?  I
really don't wnat to repeat the whole discussion anew and your
statements don't really add anything new.

> And as tj concerned about the stability of the kernel boot sequence, then how
> about this:

I guess my answer remains the same.  Why?  What actual benefits does
doing so buy us and why is changing the allocation direction, which
conceptually is extremely simple, so complicated?  What you guys are
trying to do adds significant amount of complexity and convolution,
which in itself doesn't necessarily disqualify the changes but it
needs good enough justifications.

I get that you guys want it but I still fail to see why.  It *can't*
be proper solution to the hotplug issue.  We don't want earlyprintk to
involve huge chunk of logic and the benefits of node-affine page
tables for kernel linear mapping seem dubious.  So, what do we gain by
doing this?  What am I missing here?

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-23 21:52                                             ` Tejun Heo
  0 siblings, 0 replies; 116+ messages in thread
From: Tejun Heo @ 2013-08-23 21:52 UTC (permalink / raw)
  To: chen tang
  Cc: Toshi Kani, Zhang Yanfei, Tang Chen, Konrad Rzeszutek Wilk,
	Bob Moore, Lv Zheng, Rafael J. Wysocki, Len Brown,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Yinghai Lu, Jiang Liu, Wen Congyang,
	Lai Jiangshan, Yasuaki Ishimatsu, Taku Izumi, Mel Gorman,
	Minchan Kim, mina86, gong.chen, Vasilis Liaskovitis, lwoodman,
	Rik van Riel, jweiner, Prarit Bhargava, Zhang Yanfei, yanghy,
	the arch/x86 maintainers, linux-doc, Linux Kernel Mailing List,
	Linux MM, ACPI Devel Maling List

Hello,

On Sat, Aug 24, 2013 at 05:37:48AM +0800, chen tang wrote:
> We have read the comments from Yinghai. Reordering relocated_initrd and
> reserve_crashkernel is doable, and the most difficult part is change the page
> tables initialization logic. And as Zhang has mentioned above, we are not sure
> if this could be acceptable.

Maybe I'm missing something but why is that so hard?  All it does is
allocating memory in a different place.  Why is that so complicated?
Can somebody please elaborate the issues here?  If it is actually
hairy, where does the hairiness come from?  Is it an inherent problem
or just an issue with how the code is organized currently?

> Actually I also stand with Toshi that we should get SRAT earlier. This
> will solve
> memory hotplug issue, and also the following local page table problem.

Do you mind responding to the points raised in the discussion?  I
really don't wnat to repeat the whole discussion anew and your
statements don't really add anything new.

> And as tj concerned about the stability of the kernel boot sequence, then how
> about this:

I guess my answer remains the same.  Why?  What actual benefits does
doing so buy us and why is changing the allocation direction, which
conceptually is extremely simple, so complicated?  What you guys are
trying to do adds significant amount of complexity and convolution,
which in itself doesn't necessarily disqualify the changes but it
needs good enough justifications.

I get that you guys want it but I still fail to see why.  It *can't*
be proper solution to the hotplug issue.  We don't want earlyprintk to
involve huge chunk of logic and the benefits of node-affine page
tables for kernel linear mapping seem dubious.  So, what do we gain by
doing this?  What am I missing here?

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-23 21:52                                                   ` Moore, Robert
@ 2013-08-23 22:05                                                     ` Yinghai Lu
  0 siblings, 0 replies; 116+ messages in thread
From: Yinghai Lu @ 2013-08-23 22:05 UTC (permalink / raw)
  To: Moore, Robert, Russ Anderson
  Cc: chen tang, H. Peter Anvin, Zhang Yanfei, Toshi Kani, Tejun Heo,
	Tang Chen, Konrad Rzeszutek Wilk, Zheng, Lv, Rafael J. Wysocki,
	Ingo Molnar, Andrew Morton, Thomas Renninger, Yasuaki Ishimatsu,
	Mel Gorman, Linux Kernel Mailing List

On Fri, Aug 23, 2013 at 2:52 PM, Moore, Robert <robert.moore@intel.com> wrote:
> While we're at it:
>
> Can someone send me the acpidump for this machine? We very much would like to test all of ACPICA with such a large DSDT.

That is Russ.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-23 21:50                                                 ` chen tang
  2013-08-23 21:52                                                   ` Moore, Robert
@ 2013-08-23 22:08                                                   ` Yinghai Lu
  2013-08-23 22:40                                                     ` chen tang
  1 sibling, 1 reply; 116+ messages in thread
From: Yinghai Lu @ 2013-08-23 22:08 UTC (permalink / raw)
  To: chen tang
  Cc: Russ Anderson, H. Peter Anvin, Zhang Yanfei, Toshi Kani,
	Tejun Heo, Tang Chen, Konrad Rzeszutek Wilk, Bob Moore, Lv Zheng,
	Rafael J. Wysocki, Ingo Molnar, Andrew Morton, Thomas Renninger,
	Yasuaki Ishimatsu, Mel Gorman, Linux Kernel Mailing List

On Fri, Aug 23, 2013 at 2:50 PM, chen tang <imtangchen@gmail.com> wrote:
>>
>> so the DSDT is 7F493E, and total is more than 8M.
>>
>> that will need BRK to be extended 16M?
>>
>
> Then how about use early_ioremap(), and don't do it that early in
> head_32 and head64 ?

why could early_ioremap() help?

when to use early_ioremap()? what for?

Yinghai

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-23 21:08                                             ` Yinghai Lu
@ 2013-08-23 22:27                                               ` chen tang
  -1 siblings, 0 replies; 116+ messages in thread
From: chen tang @ 2013-08-23 22:27 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Zhang Yanfei, Toshi Kani, Tejun Heo, Tang Chen,
	Konrad Rzeszutek Wilk, Bob Moore, Lv Zheng, Rafael J. Wysocki,
	Len Brown, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Morton, Thomas Renninger, Jiang Liu, Wen Congyang,
	Lai Jiangshan, Yasuaki Ishimatsu, Taku Izumi, Mel Gorman,
	Minchan Kim, mina86, gong.chen, Vasilis

Hi Yinghai,

2013/8/24 Yinghai Lu <yinghai@kernel.org>:
......
>
>> 3. I'm not quite sure is there any important benefit that the kernel
>> initializes all
>>     the memory at beginning ?
>>
>> And also, the memory hotplug schedule is very tough for us. We really want the
>> movablenode functionality could be available soon. And this idea could
>> be a long way to go. So I also think this would be the next step.
>
> uh?
> I already tried best to help. Even spent time to produce patchset
> to separate acpi_override find/copy, and numa_info parse early
> and put page tables on local nodes.
>
> Let me know what else I could help.
>

Yes, we are very appreciated your help. We can see all the people working on
this topic are very helpful. :)

I'm just saying the "boot with pxm(x), and hot-add the rest" idea could be a
long way and need more discussion and confirmation.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-23 22:27                                               ` chen tang
  0 siblings, 0 replies; 116+ messages in thread
From: chen tang @ 2013-08-23 22:27 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Zhang Yanfei, Toshi Kani, Tejun Heo, Tang Chen,
	Konrad Rzeszutek Wilk, Bob Moore, Lv Zheng, Rafael J. Wysocki,
	Len Brown, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Morton, Thomas Renninger, Jiang Liu, Wen Congyang,
	Lai Jiangshan, Yasuaki Ishimatsu, Taku Izumi, Mel Gorman,
	Minchan Kim, mina86, gong.chen, Vasilis Liaskovitis, lwoodman,
	Rik van Riel, jweiner, Prarit Bhargava, Zhang Yanfei, yanghy,
	the arch/x86 maintainers, linux-doc, Linux Kernel Mailing List,
	Linux MM, ACPI Devel Maling List

Hi Yinghai,

2013/8/24 Yinghai Lu <yinghai@kernel.org>:
......
>
>> 3. I'm not quite sure is there any important benefit that the kernel
>> initializes all
>>     the memory at beginning ?
>>
>> And also, the memory hotplug schedule is very tough for us. We really want the
>> movablenode functionality could be available soon. And this idea could
>> be a long way to go. So I also think this would be the next step.
>
> uh?
> I already tried best to help. Even spent time to produce patchset
> to separate acpi_override find/copy, and numa_info parse early
> and put page tables on local nodes.
>
> Let me know what else I could help.
>

Yes, we are very appreciated your help. We can see all the people working on
this topic are very helpful. :)

I'm just saying the "boot with pxm(x), and hot-add the rest" idea could be a
long way and need more discussion and confirmation.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-23 22:08                                                   ` Yinghai Lu
@ 2013-08-23 22:40                                                     ` chen tang
  2013-08-23 23:04                                                       ` Yinghai Lu
  0 siblings, 1 reply; 116+ messages in thread
From: chen tang @ 2013-08-23 22:40 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Russ Anderson, H. Peter Anvin, Zhang Yanfei, Toshi Kani,
	Tejun Heo, Tang Chen, Konrad Rzeszutek Wilk, Bob Moore, Lv Zheng,
	Rafael J. Wysocki, Ingo Molnar, Andrew Morton, Thomas Renninger,
	Yasuaki Ishimatsu, Mel Gorman, Linux Kernel Mailing List

Hi Yinghai,

2013/8/24 Yinghai Lu <yinghai@kernel.org>:
> On Fri, Aug 23, 2013 at 2:50 PM, chen tang <imtangchen@gmail.com> wrote:
>>>
>>> so the DSDT is 7F493E, and total is more than 8M.
>>>
>>> that will need BRK to be extended 16M?
>>>
>>
>> Then how about use early_ioremap(), and don't do it that early in
>> head_32 and head64 ?
>
> why could early_ioremap() help?
>
> when to use early_ioremap()? what for?
>

In my understanding, acpica framework needs users to copy the override tables
somewhere in the memory. And acpica will get these user specified tables when
installing firmware tables. This is the acpica logic, which cannot be
changed, I think.

So we need to allocate memory. That is why you suggested to use BRK, right ?
And the size seems to be a problem.

So I suggest to use early_ioremap().

1. After paging is enabled, before direct mapping page tables are
setup, we map the
    initrd with early_ioremap(). And we are able to access it with va,
even on 32bit.
    Then we can find all tables.
2. We still use memblock to allocate memory. Maybe it will be
hotpluggable memory,
    but this memory can be freed when all the acpi tables are parsed, right ?

So I want to try early_ioremap(). All these should be done in setup_arch().

Thanks.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-23 22:40                                                     ` chen tang
@ 2013-08-23 23:04                                                       ` Yinghai Lu
  0 siblings, 0 replies; 116+ messages in thread
From: Yinghai Lu @ 2013-08-23 23:04 UTC (permalink / raw)
  To: chen tang
  Cc: Russ Anderson, H. Peter Anvin, Zhang Yanfei, Toshi Kani,
	Tejun Heo, Tang Chen, Konrad Rzeszutek Wilk, Bob Moore, Lv Zheng,
	Rafael J. Wysocki, Ingo Molnar, Andrew Morton, Thomas Renninger,
	Yasuaki Ishimatsu, Mel Gorman, Linux Kernel Mailing List

> So we need to allocate memory. That is why you suggested to use BRK, right ?
> And the size seems to be a problem.
>
> So I suggest to use early_ioremap().
>
> 1. After paging is enabled, before direct mapping page tables are
> setup, we map the
>     initrd with early_ioremap(). And we are able to access it with va,
> even on 32bit.
>     Then we can find all tables.
> 2. We still use memblock to allocate memory. Maybe it will be
> hotpluggable memory,
>     but this memory can be freed when all the acpi tables are parsed, right ?
>
> So I want to try early_ioremap(). All these should be done in setup_arch().

no.
cpio search need to take whole range virtual address,
and early_ioremap has size limitation.
you will have to update cpio search to take mapping function.
could be too messy.

Yinghai

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-23 21:52                                             ` Tejun Heo
@ 2013-08-23 23:56                                               ` chen tang
  -1 siblings, 0 replies; 116+ messages in thread
From: chen tang @ 2013-08-23 23:56 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Toshi Kani, Zhang Yanfei, Tang Chen, Konrad Rzeszutek Wilk,
	Bob Moore, Lv Zheng, Rafael J. Wysocki, Len Brown,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Yinghai Lu, Jiang Liu, Wen Congyang,
	Lai Jiangshan, Yasuaki Ishimatsu, Taku Izumi, Mel Gorman,
	Minchan Kim, mina86, gong.chen, Vasilis

Hi tj,

2013/8/24 Tejun Heo <tj@kernel.org>:
> Hello,
>
> On Sat, Aug 24, 2013 at 05:37:48AM +0800, chen tang wrote:
>> We have read the comments from Yinghai. Reordering relocated_initrd and
>> reserve_crashkernel is doable, and the most difficult part is change the page
>> tables initialization logic. And as Zhang has mentioned above, we are not sure
>> if this could be acceptable.
>
> Maybe I'm missing something but why is that so hard?  All it does is
> allocating memory in a different place.  Why is that so complicated?
> Can somebody please elaborate the issues here?  If it is actually
> hairy, where does the hairiness come from?  Is it an inherent problem
> or just an issue with how the code is organized currently?

Your idea is doable, and we are doing it now. I think it is just an issue
with how to organize the code.

The main problem is like Yinghai said, memory hotplug kernel and regular
kernel won't be able to share the code. We use a boot option to control it.

And I'll send a patch-set next week and then we can see how it goes.

......
>> And as tj concerned about the stability of the kernel boot sequence, then how
>> about this:
>
> I guess my answer remains the same.  Why?  What actual benefits does
> doing so buy us and why is changing the allocation direction, which
> conceptually is extremely simple, so complicated?  What you guys are
> trying to do adds significant amount of complexity and convolution,
> which in itself doesn't necessarily disqualify the changes but it
> needs good enough justifications.
>
> I get that you guys want it but I still fail to see why.  It *can't*
> be proper solution to the hotplug issue.  We don't want earlyprintk to
> involve huge chunk of logic and the benefits of node-affine page
> tables for kernel linear mapping seem dubious.  So, what do we gain by
> doing this?  What am I missing here?
>

Sorry for the unclear description. To summaries, the whole picture is:

We are going to provide a system with some nodes hotpluggable, not all.
These nodes are movable nodes, with only ZONE_MOVABLE. So we need
a way to ensure the movable nodes have ZONE_MOVABLE only.

We want SRAT earlier because before it is parsed, memblock starts to
allocate memory for kernel, which won't be in ZONE_MOVABLE.

It is just that simple. You are not missing anything here, I think.

My suggestion is parsing SRAT earlier:
1. The nodes kernel resides in will be un-movable.
2. The nodes with un-hotpluggable memory in SRAT will be un-movable.
3. The nodes full of hotpluggable memory in SRAT will be movable nodes.

No offence, but seeing from me, it is a solution. The only thing it could buy
me is that I can achieve the goal. I was trying my best to avoid bad influence
to other code.
And other things, early_printk, and local node page table are some other
cases that also need acpi tables earlier.

That is exactly all the things I want to do. That is all I have, and I
don't have
anything new here.

I read the whole series, and I understand your points:
1. Even if move SRAT earlier, it is too late. It should be done in
bootloader and
    kexec/kdump.
2. Doing so will make boot sequence more complicated and error-prone.
3. early_printk and local node page table won't by us so much when
using huge page.

...maybe some other points, I won't elaborate them all.

Since we have discussed so much, I don't want debate more. We are working on
patch-set according to your idea, and I also agree it is doable. I
have no objection to
your idea. Sending this patch-set is just one of my another try. So,
let's see how
it will go next week. :)

Thank you all for your comments and patient. :)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
@ 2013-08-23 23:56                                               ` chen tang
  0 siblings, 0 replies; 116+ messages in thread
From: chen tang @ 2013-08-23 23:56 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Toshi Kani, Zhang Yanfei, Tang Chen, Konrad Rzeszutek Wilk,
	Bob Moore, Lv Zheng, Rafael J. Wysocki, Len Brown,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Thomas Renninger, Yinghai Lu, Jiang Liu, Wen Congyang,
	Lai Jiangshan, Yasuaki Ishimatsu, Taku Izumi, Mel Gorman,
	Minchan Kim, mina86, gong.chen, Vasilis Liaskovitis, lwoodman,
	Rik van Riel, jweiner, Prarit Bhargava, Zhang Yanfei, yanghy,
	the arch/x86 maintainers, linux-doc, Linux Kernel Mailing List,
	Linux MM, ACPI Devel Maling List

Hi tj,

2013/8/24 Tejun Heo <tj@kernel.org>:
> Hello,
>
> On Sat, Aug 24, 2013 at 05:37:48AM +0800, chen tang wrote:
>> We have read the comments from Yinghai. Reordering relocated_initrd and
>> reserve_crashkernel is doable, and the most difficult part is change the page
>> tables initialization logic. And as Zhang has mentioned above, we are not sure
>> if this could be acceptable.
>
> Maybe I'm missing something but why is that so hard?  All it does is
> allocating memory in a different place.  Why is that so complicated?
> Can somebody please elaborate the issues here?  If it is actually
> hairy, where does the hairiness come from?  Is it an inherent problem
> or just an issue with how the code is organized currently?

Your idea is doable, and we are doing it now. I think it is just an issue
with how to organize the code.

The main problem is like Yinghai said, memory hotplug kernel and regular
kernel won't be able to share the code. We use a boot option to control it.

And I'll send a patch-set next week and then we can see how it goes.

......
>> And as tj concerned about the stability of the kernel boot sequence, then how
>> about this:
>
> I guess my answer remains the same.  Why?  What actual benefits does
> doing so buy us and why is changing the allocation direction, which
> conceptually is extremely simple, so complicated?  What you guys are
> trying to do adds significant amount of complexity and convolution,
> which in itself doesn't necessarily disqualify the changes but it
> needs good enough justifications.
>
> I get that you guys want it but I still fail to see why.  It *can't*
> be proper solution to the hotplug issue.  We don't want earlyprintk to
> involve huge chunk of logic and the benefits of node-affine page
> tables for kernel linear mapping seem dubious.  So, what do we gain by
> doing this?  What am I missing here?
>

Sorry for the unclear description. To summaries, the whole picture is:

We are going to provide a system with some nodes hotpluggable, not all.
These nodes are movable nodes, with only ZONE_MOVABLE. So we need
a way to ensure the movable nodes have ZONE_MOVABLE only.

We want SRAT earlier because before it is parsed, memblock starts to
allocate memory for kernel, which won't be in ZONE_MOVABLE.

It is just that simple. You are not missing anything here, I think.

My suggestion is parsing SRAT earlier:
1. The nodes kernel resides in will be un-movable.
2. The nodes with un-hotpluggable memory in SRAT will be un-movable.
3. The nodes full of hotpluggable memory in SRAT will be movable nodes.

No offence, but seeing from me, it is a solution. The only thing it could buy
me is that I can achieve the goal. I was trying my best to avoid bad influence
to other code.
And other things, early_printk, and local node page table are some other
cases that also need acpi tables earlier.

That is exactly all the things I want to do. That is all I have, and I
don't have
anything new here.

I read the whole series, and I understand your points:
1. Even if move SRAT earlier, it is too late. It should be done in
bootloader and
    kexec/kdump.
2. Doing so will make boot sequence more complicated and error-prone.
3. early_printk and local node page table won't by us so much when
using huge page.

...maybe some other points, I won't elaborate them all.

Since we have discussed so much, I don't want debate more. We are working on
patch-set according to your idea, and I also agree it is doable. I
have no objection to
your idea. Sending this patch-set is just one of my another try. So,
let's see how
it will go next week. :)

Thank you all for your comments and patient. :)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
  2013-08-23 20:08                                           ` Yinghai Lu
  2013-08-23 20:30                                             ` Russ Anderson
@ 2013-08-24  2:41                                             ` Russ Anderson
  1 sibling, 0 replies; 116+ messages in thread
From: Russ Anderson @ 2013-08-24  2:41 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Zhang Yanfei, Toshi Kani, Tejun Heo, Tang Chen,
	Konrad Rzeszutek Wilk, Bob Moore, Lv Zheng, Rafael J. Wysocki,
	Ingo Molnar, Andrew Morton, Thomas Renninger, Yasuaki Ishimatsu,
	Mel Gorman, Linux Kernel Mailing List

On Fri, Aug 23, 2013 at 01:08:56PM -0700, Yinghai Lu wrote:
> On Fri, Aug 23, 2013 at 11:25 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> >
> > BRK makes sense as long as you can set a sane O(1) size limit.
> >
> >>
> >>put the acpi override table in BRK, we still need ok from HPA.
> >>I have impression that he did not like it, so want to confirm from him.
> 
> on 8 sockets system:
> -rw-r--r-- 1 root root   3532 Aug 22 10:26 APIC.dat
> -rw-r--r-- 1 root root     48 Aug 22 10:26 BDAT.dat
> -rw-r--r-- 1 root root    824 Aug 22 10:26 DMAR.dat
> -rw-r--r-- 1 root root  83509 Aug 22 10:26 DSDT.dat
> -rw-r--r-- 1 root root    244 Aug 22 10:26 FACP.dat
> -rw-r--r-- 1 root root     64 Aug 22 10:26 FACS.dat
> -rw-r--r-- 1 root root     68 Aug 22 10:26 FPDT.dat
> -rw-r--r-- 1 root root     56 Aug 22 10:26 HPET.dat
> -rw-r--r-- 1 root root    304 Aug 22 10:26 MCEJ.dat
> -rw-r--r-- 1 root root     60 Aug 22 10:26 MCFG.dat
> -rw-r--r-- 1 root root   6712 Aug 22 10:26 MPST.dat
> -rw-r--r-- 1 root root    232 Aug 22 10:26 MSCT.dat
> -rw-r--r-- 1 root root    172 Aug 22 10:26 PCCT.dat
> -rw-r--r-- 1 root root     96 Aug 22 10:26 PMCT.dat
> -rw-r--r-- 1 root root     48 Aug 22 10:26 RASF.dat
> -rw-r--r-- 1 root root    108 Aug 22 10:26 SLIT.dat
> -rw-r--r-- 1 root root     80 Aug 22 10:26 SPCR.dat
> -rw-r--r-- 1 root root     65 Aug 22 10:26 SPMI.dat
> -rw-r--r-- 1 root root   6448 Aug 22 10:26 SRAT.dat
> -rw-r--r-- 1 root root    100 Aug 22 10:26 SSDT1.dat
> -rw-r--r-- 1 root root 283527 Aug 22 10:26 SSDT2.dat
> -rw-r--r-- 1 root root     66 Aug 22 10:26 UEFI.dat
> -rw-r--r-- 1 root root     64 Aug 22 10:26 WDDT.dat
> 
> assume for 32sockets will have four times bigger with DSDT and SSDT.
> (with more pci and cpus)
> 
> So we can not have O(1) the size.
> 
> Russ, What is ACPI table size on your big machine?

This is from a 255 socket, 4080 cpu, 15TB system.

-------------------------------------------------------
-rw-r--r-- 1 root root   65392 Aug 23 21:23 apic.dat
-rw-r--r-- 1 root root     316 Aug 23 21:23 dmar.dat
-rw-r--r-- 1 root root 8309249 Aug 23 21:23 dsdt.dat
-rw-r--r-- 1 root root     244 Aug 23 21:23 facp.dat
-rw-r--r-- 1 root root      64 Aug 23 21:23 facs.dat
-rw-r--r-- 1 root root      56 Aug 23 21:23 hpet.dat
-rw-r--r-- 1 root root    4172 Aug 23 21:23 mcfg.dat
-rw-r--r-- 1 root root      36 Aug 23 21:23 rsdp.dat
-rw-r--r-- 1 root root      80 Aug 23 21:23 rsdt.dat
-rw-r--r-- 1 root root   65069 Aug 23 21:23 slit.dat
-rw-r--r-- 1 root root      80 Aug 23 21:23 spcr.dat
-rw-r--r-- 1 root root  108168 Aug 23 21:23 srat.dat
-rw-r--r-- 1 root root   21330 Aug 23 21:23 ssdt.dat
-rw-r--r-- 1 root root      92 Aug 23 21:23 uefi1.dat
-rw-r--r-- 1 root root     298 Aug 23 21:23 uefi.dat
-rw-r--r-- 1 root root     124 Aug 23 21:23 xsdt.dat
-------------------------------------------------------

-- 
Russ Anderson, OS RAS/Partitioning Project Lead  
SGI - Silicon Graphics Inc          rja@sgi.com

^ permalink raw reply	[flat|nested] 116+ messages in thread

end of thread, other threads:[~2013-08-24  2:41 UTC | newest]

Thread overview: 116+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-21 10:15 [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier Tang Chen
2013-08-21 10:15 ` Tang Chen
2013-08-21 10:15 ` [PATCH 1/8] x86: Make get_ramdisk_{image|size}() global Tang Chen
2013-08-21 10:15   ` Tang Chen
2013-08-21 10:15 ` [PATCH 2/8] x86, microcode: Use get_ramdisk_{image|size}() in microcode handling Tang Chen
2013-08-21 10:15   ` Tang Chen
2013-08-21 10:15 ` [PATCH 3/8] x86, acpi: Move table_sigs[] to stack Tang Chen
2013-08-21 10:15   ` Tang Chen
2013-08-21 10:15 ` [PATCH 4/8] x86, acpi, brk: Extend BRK 256KB to store acpi override tables Tang Chen
2013-08-21 10:15   ` Tang Chen
2013-08-21 10:15 ` [PATCH 5/8] x86, brk: Make extend_brk() available with va/pa Tang Chen
2013-08-21 10:15   ` Tang Chen
2013-08-21 12:26   ` Konrad Rzeszutek Wilk
2013-08-21 12:26     ` Konrad Rzeszutek Wilk
2013-08-21 12:35     ` H. Peter Anvin
2013-08-21 12:35       ` H. Peter Anvin
2013-08-21 14:42       ` Konrad Rzeszutek Wilk
2013-08-21 14:42         ` Konrad Rzeszutek Wilk
2013-08-21 15:04         ` H. Peter Anvin
2013-08-21 15:04           ` H. Peter Anvin
2013-08-21 10:15 ` [PATCH 6/8] x86, acpi: Make acpi_initrd_override() available with va or pa Tang Chen
2013-08-21 10:15   ` Tang Chen
2013-08-21 10:15 ` [PATCH 7/8] x86, acpi, brk: Make early_alloc_acpi_override_tables_buf() available with va/pa Tang Chen
2013-08-21 10:15   ` Tang Chen
2013-08-21 10:15 ` [PATCH 8/8] x86, acpi: Do acpi_initrd_override() earlier in head_32.S/head64.c Tang Chen
2013-08-21 10:15   ` Tang Chen
2013-08-21 10:42 ` [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier Tang Chen
2013-08-21 10:42   ` Tang Chen
2013-08-21 13:06 ` Tejun Heo
2013-08-21 13:06   ` Tejun Heo
2013-08-21 15:00   ` Zhang Yanfei
2013-08-21 15:00     ` Zhang Yanfei
2013-08-21 15:36     ` Tejun Heo
2013-08-21 15:36       ` Tejun Heo
2013-08-21 19:31       ` Toshi Kani
2013-08-21 19:31         ` Toshi Kani
2013-08-21 19:54         ` Tejun Heo
2013-08-21 19:54           ` Tejun Heo
2013-08-21 20:29           ` Toshi Kani
2013-08-21 20:29             ` Toshi Kani
2013-08-21 20:40             ` Tejun Heo
2013-08-21 20:40               ` Tejun Heo
2013-08-21 22:36               ` Toshi Kani
2013-08-21 22:36                 ` Toshi Kani
2013-08-22  3:32                 ` Tejun Heo
2013-08-22  3:32                   ` Tejun Heo
2013-08-22 15:52                   ` Toshi Kani
2013-08-22 15:52                     ` Toshi Kani
2013-08-22 18:31                     ` Tejun Heo
2013-08-22 18:31                       ` Tejun Heo
2013-08-22 19:39                       ` Zhang Yanfei
2013-08-22 19:39                         ` Zhang Yanfei
2013-08-22 19:45                         ` Tejun Heo
2013-08-22 19:45                           ` Tejun Heo
2013-08-22 20:11                       ` Toshi Kani
2013-08-22 20:11                         ` Toshi Kani
2013-08-22 20:21                         ` Tejun Heo
2013-08-22 20:21                           ` Tejun Heo
2013-08-22 20:35                           ` Tejun Heo
2013-08-22 20:35                             ` Tejun Heo
2013-08-22 21:06                           ` Toshi Kani
2013-08-22 21:06                             ` Toshi Kani
2013-08-22 21:21                             ` Tejun Heo
2013-08-22 21:21                               ` Tejun Heo
2013-08-22 22:17                               ` Toshi Kani
2013-08-22 22:17                                 ` Toshi Kani
2013-08-23 13:04                                 ` Tejun Heo
2013-08-23 13:04                                   ` Tejun Heo
2013-08-23 13:08                                   ` H. Peter Anvin
2013-08-23 13:08                                     ` H. Peter Anvin
2013-08-23 14:19                                     ` Tejun Heo
2013-08-23 14:19                                       ` Tejun Heo
2013-08-23 14:24                                       ` H. Peter Anvin
2013-08-23 14:24                                         ` H. Peter Anvin
2013-08-23 14:24                                         ` H. Peter Anvin
2013-08-23 14:35                                         ` Tejun Heo
2013-08-23 14:35                                           ` Tejun Heo
2013-08-23 14:57                                           ` Tejun Heo
2013-08-23 14:57                                             ` Tejun Heo
2013-08-23 16:14                                   ` Toshi Kani
2013-08-23 16:14                                     ` Toshi Kani
2013-08-23 16:24                                     ` Tejun Heo
2013-08-23 16:24                                       ` Tejun Heo
2013-08-23 17:13                                       ` Toshi Kani
2013-08-23 17:13                                         ` Toshi Kani
2013-08-23 17:29                                         ` Zhang Yanfei
2013-08-23 17:29                                           ` Zhang Yanfei
2013-08-23 16:54                                     ` Zhang Yanfei
2013-08-23 16:54                                       ` Zhang Yanfei
2013-08-23 18:18                                       ` Yinghai Lu
2013-08-23 18:18                                         ` Yinghai Lu
2013-08-23 18:25                                         ` H. Peter Anvin
2013-08-23 20:08                                           ` Yinghai Lu
2013-08-23 20:30                                             ` Russ Anderson
2013-08-23 20:48                                               ` Yinghai Lu
2013-08-23 21:50                                                 ` chen tang
2013-08-23 21:52                                                   ` Moore, Robert
2013-08-23 22:05                                                     ` Yinghai Lu
2013-08-23 22:08                                                   ` Yinghai Lu
2013-08-23 22:40                                                     ` chen tang
2013-08-23 23:04                                                       ` Yinghai Lu
2013-08-24  2:41                                             ` Russ Anderson
2013-08-23 20:33                                         ` chen tang
2013-08-23 20:33                                           ` chen tang
2013-08-23 21:08                                           ` Yinghai Lu
2013-08-23 21:08                                             ` Yinghai Lu
2013-08-23 22:27                                             ` chen tang
2013-08-23 22:27                                               ` chen tang
2013-08-23 18:29                                       ` Toshi Kani
2013-08-23 18:29                                         ` Toshi Kani
2013-08-23 21:37                                         ` chen tang
2013-08-23 21:37                                           ` chen tang
2013-08-23 21:52                                           ` Tejun Heo
2013-08-23 21:52                                             ` Tejun Heo
2013-08-23 23:56                                             ` chen tang
2013-08-23 23:56                                               ` chen tang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.