* [RESEND PATCH v2 0/9] x86, memblock: Allocate memory near kernel image before SRAT parsed.
@ 2013-09-12  9:52 ` Tang Chen
From: Tang Chen @ 2013-09-12  9:52 UTC (permalink / raw)
  To: tj, rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, toshi.kani
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

This patch-set is based on tj's suggestion and is not fully tested yet;
it is posted for review and discussion. Following tj's suggestion, it
implements a new function, memblock_alloc_bottom_up(), to allocate
memory from the bottom upwards, which simplifies the code.

This patch-set is based on the latest kernel (3.11).
HEAD is:
commit d5d04bb48f0eb89c14e76779bb46212494de0bec
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Wed Sep 11 19:55:12 2013 -0700


[Problem]

Currently, Linux cannot migrate pages used by the kernel because of the
kernel direct mapping: in kernel space, va = pa + PAGE_OFFSET. When the
pa changes, we cannot simply update the pagetable while keeping the va
unmodified. So kernel pages are not migratable.
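
For illustration, a simplified sketch of the direct-mapping relation
(a hypothetical helper; the real x86 __va()/__pa() macros also
special-case the kernel text mapping):

	/* Each direct-mapped page has one fixed va derived from its pa. */
	static inline void *direct_map_va(unsigned long pa)
	{
		return (void *)(pa + PAGE_OFFSET);
	}

Since the va is fixed by the pa, moving a page to a different pa would
change its va, which in-use kernel mappings cannot tolerate.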

There are also other issues that make kernel pages non-migratable. For
example, a physical address may be cached somewhere and used later, and
it is not feasible to update all such caches.

When doing memory hotplug in Linux, we first migrate all the pages in one
memory device somewhere else, and then remove the device. But if pages are
used by the kernel, they are not migratable. As a result, memory used by
the kernel cannot be hot-removed.

Modifying the kernel direct mapping mechanism would be too difficult,
and it could degrade kernel performance and stability. So we take the
following approach to memory hotplug.


[What we are doing]

In Linux, memory in a NUMA node is divided into several zones. One of
the zones is ZONE_MOVABLE, which the kernel won't use.

In order to implement memory hotplug in Linux, we are going to arrange
all hotpluggable memory in ZONE_MOVABLE so that the kernel won't use it.
To do this, we need ACPI's help.

In ACPI, the SRAT (System Resource Affinity Table) contains NUMA info.
The memory affinities in SRAT record every memory range in the system,
and also flags specifying whether each memory range is hotpluggable.
(Please refer to ACPI spec 5.0 5.2.16)
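
For reference, the memory affinity structure as the kernel declares it
(abridged from include/acpi/actbl1.h; the layout follows ACPI 5.0
section 5.2.16.2):

	struct acpi_srat_mem_affinity {
		struct acpi_subtable_header header;
		u32 proximity_domain;
		u16 reserved;		/* Reserved, must be zero */
		u64 base_address;
		u64 length;
		u32 reserved1;
		u32 flags;
		u64 reserved2;		/* Reserved, must be zero */
	};

	#define ACPI_SRAT_MEM_ENABLED		(1)	/* 00: Use affinity structure */
	#define ACPI_SRAT_MEM_HOT_PLUGGABLE	(1<<1)	/* 01: Memory region is hot pluggable */
	#define ACPI_SRAT_MEM_NON_VOLATILE	(1<<2)	/* 02: Memory region is non-volatile */

The ACPI_SRAT_MEM_HOT_PLUGGABLE bit in flags is what tells us whether a
memory range can be hot-removed.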

With the help of SRAT, we have to do the following two things to achieve our
goal:

1. When doing memory hot-add, allow users to arrange hotpluggable
   memory as ZONE_MOVABLE.
   (This has been done by the MOVABLE_NODE functionality in Linux.)

2. When the system is booting, prevent the bootmem allocator from
   allocating hotpluggable memory for the kernel before memory
   initialization finishes.

Problem 2 is the key problem we are going to solve. But before solving
it, we need some preparation. Please see below.


[Preparation]

The bootloader has to load the kernel image into memory, and that memory
must not be hotpluggable; we cannot prevent this anyway. So in a memory
hotplug system, we can assume that any node the kernel resides in is not
hotpluggable.

Before SRAT is parsed, we don't know which memory ranges are hotpluggable. But
memblock has already started to work. In the current kernel, memblock allocates 
the following memory before SRAT is parsed:

setup_arch()
 |->memblock_x86_fill()            /* memblock is ready */
 |......
 |->early_reserve_e820_mpc_new()   /* allocate memory under 1MB */
 |->reserve_real_mode()            /* allocate memory under 1MB */
 |->init_mem_mapping()             /* allocate page tables, about 2MB to map 1GB memory */
 |->dma_contiguous_reserve()       /* specified by user, should be low */
 |->setup_log_buf()                /* specified by user, several mega bytes */
 |->relocate_initrd()              /* could be large, but will be freed after boot, should reorder */
 |->acpi_initrd_override()         /* several mega bytes */
 |->reserve_crashkernel()          /* could be large, should reorder */
 |......
 |->initmem_init()                 /* Parse SRAT */

According to Tejun's advice, before SRAT is parsed, we should try our
best to allocate memory near the kernel image. Since the whole node the
kernel resides in won't be hotpluggable, and a node in a modern server
may have at least 16GB of memory, allocating several megabytes around
the kernel image won't cross into hotpluggable memory.


[About this patch-set]

So this patch-set does the following:

1. Make memblock able to allocate memory from low address to high address.

2. Improve all functions that need to allocate memory before SRAT is
   parsed to support allocating memory from low address to high address.

3. Introduce the "movablenode" boot option to enable and disable this functionality.
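
For example, enabling the functionality would just mean appending the
option to the kernel command line (an illustrative bootloader entry;
patch 9 defines the option, which we assume takes no arguments):

	kernel /boot/vmlinuz-3.11 ro root=/dev/sda1 movablenode

Without the option, memblock keeps its default top-down behavior.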

PS: Reordering of relocate_initrd() has not been done yet.
    acpi_initrd_override() needs to access the initrd via its virtual
    address, so relocate_initrd() must be done before
    acpi_initrd_override().


Change log v1 -> v2:
1. According to tj's suggestion, implemented a new function
   memblock_alloc_bottom_up() to allocate memory from bottom upwards,
   which can simplify the code.


Tang Chen (9):
  memblock: Introduce allocation direction to memblock.
  x86, memblock: Introduce memblock_alloc_bottom_up() to memblock.
  x86, dma: Support allocate memory from bottom upwards in
    dma_contiguous_reserve().
  x86: Support allocate memory from bottom upwards in setup_log_buf().
  x86: Support allocate memory from bottom upwards in
    relocate_initrd().
  x86, acpi: Support allocate memory from bottom upwards in
    acpi_initrd_override().
  x86, acpi, crash, kdump: Do reserve_crashkernel() after SRAT is
    parsed.
  x86, mem-hotplug: Support initialize page tables from low to high.
  mem-hotplug: Introduce movablenode boot option to control memblock
    allocation direction.

 Documentation/kernel-parameters.txt |   15 ++++
 arch/x86/kernel/setup.c             |   54 ++++++++++++++-
 arch/x86/mm/init.c                  |  133 +++++++++++++++++++++++++++--------
 drivers/acpi/osl.c                  |   11 +++
 drivers/base/dma-contiguous.c       |   17 ++++-
 include/linux/memblock.h            |   24 ++++++
 include/linux/memory_hotplug.h      |    5 ++
 kernel/printk/printk.c              |   11 +++
 mm/memblock.c                       |   51 +++++++++++++
 mm/memory_hotplug.c                 |    9 +++
 10 files changed, 296 insertions(+), 34 deletions(-)

* [RESEND PATCH v2 1/9] memblock: Introduce allocation direction to memblock.
@ 2013-09-12  9:52   ` Tang Chen
From: Tang Chen @ 2013-09-12  9:52 UTC (permalink / raw)
  To: tj, rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, toshi.kani
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

The Linux kernel cannot migrate pages used by the kernel. As a result, kernel
pages cannot be hot-removed. So we cannot allocate hotpluggable memory for
the kernel.

ACPI SRAT (System Resource Affinity Table) contains the memory hotplug info.
But before SRAT is parsed, memblock has already started to allocate memory
for the kernel. So we need to prevent memblock from doing this.

In a memory hotplug system, any NUMA node the kernel resides in should
not be hotpluggable. And for a modern server, each node could have at
least 16GB of memory. So memory around the kernel image is very likely
not hotpluggable.

So the basic idea is: allocate memory upwards from the end of the kernel
image. Since memory allocation before SRAT is parsed won't be too much,
it will very likely stay in the same node as the kernel image.

The current memblock can only allocate memory from high address to low.
So this patch introduces an allocation direction to memblock. It can be
used to tell memblock to allocate memory from high to low or from low
to high.
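
A minimal usage sketch of the new interface (a hypothetical caller, for
illustration only; the constants and helpers are the ones added below):

	/* Switch early allocations to bottom-up... */
	memblock_set_current_direction(MEMBLOCK_DIRECTION_LOW_TO_HIGH);

	if (memblock_direction_bottom_up()) {
		/* callers can now prefer ranges near the kernel image */
	}

	/* ...and restore the default top-down direction afterwards. */
	memblock_set_current_direction(MEMBLOCK_DIRECTION_DEFAULT);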

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 include/linux/memblock.h |   22 ++++++++++++++++++++++
 mm/memblock.c            |   13 +++++++++++++
 2 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 31e95ac..a7d3436 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -19,6 +19,11 @@
 
 #define INIT_MEMBLOCK_REGIONS	128
 
+/* Allocation order. */
+#define MEMBLOCK_DIRECTION_HIGH_TO_LOW	0
+#define MEMBLOCK_DIRECTION_LOW_TO_HIGH	1
+#define MEMBLOCK_DIRECTION_DEFAULT	MEMBLOCK_DIRECTION_HIGH_TO_LOW
+
 struct memblock_region {
 	phys_addr_t base;
 	phys_addr_t size;
@@ -35,6 +40,7 @@ struct memblock_type {
 };
 
 struct memblock {
+	int current_direction;      /* allocate from higher or lower address */
 	phys_addr_t current_limit;
 	struct memblock_type memory;
 	struct memblock_type reserved;
@@ -148,6 +154,12 @@ phys_addr_t memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, int nid)
 
 phys_addr_t memblock_alloc(phys_addr_t size, phys_addr_t align);
 
+static inline bool memblock_direction_bottom_up(void)
+{
+	return memblock.current_direction == MEMBLOCK_DIRECTION_LOW_TO_HIGH;
+}
+
+
 /* Flags for memblock_alloc_base() amd __memblock_alloc_base() */
 #define MEMBLOCK_ALLOC_ANYWHERE	(~(phys_addr_t)0)
 #define MEMBLOCK_ALLOC_ACCESSIBLE	0
@@ -175,6 +187,16 @@ static inline void memblock_dump_all(void)
 }
 
 /**
+ * memblock_set_current_direction - Set current allocation direction to allow
+ *                                  allocating memory from higher to lower
+ *                                  address or from lower to higher address
+ *
+ * @direction: In which order to allocate memory. Could be
+ *             MEMBLOCK_DIRECTION_{HIGH_TO_LOW|LOW_TO_HIGH}
+ */
+void memblock_set_current_direction(int direction);
+
+/**
  * memblock_set_current_limit - Set the current allocation limit to allow
  *                         limiting allocations to what is currently
  *                         accessible during boot
diff --git a/mm/memblock.c b/mm/memblock.c
index 0ac412a..f24ca2e 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -32,6 +32,7 @@ struct memblock memblock __initdata_memblock = {
 	.reserved.cnt		= 1,	/* empty dummy entry */
 	.reserved.max		= INIT_MEMBLOCK_REGIONS,
 
+	.current_direction	= MEMBLOCK_DIRECTION_DEFAULT,
 	.current_limit		= MEMBLOCK_ALLOC_ANYWHERE,
 };
 
@@ -995,6 +996,18 @@ void __init_memblock memblock_trim_memory(phys_addr_t align)
 	}
 }
 
+void __init_memblock memblock_set_current_direction(int direction)
+{
+	if (direction != MEMBLOCK_DIRECTION_HIGH_TO_LOW &&
+	    direction != MEMBLOCK_DIRECTION_LOW_TO_HIGH) {
+		pr_warn("memblock: Failed to set allocation order. "
+			"Invalid order type: %d\n", direction);
+		return;
+	}
+
+	memblock.current_direction = direction;
+}
+
 void __init_memblock memblock_set_current_limit(phys_addr_t limit)
 {
 	memblock.current_limit = limit;
-- 
1.7.1

* [RESEND PATCH v2 2/9] x86, memblock: Introduce memblock_alloc_bottom_up() to memblock.
@ 2013-09-12  9:52   ` Tang Chen
From: Tang Chen @ 2013-09-12  9:52 UTC (permalink / raw)
  To: tj, rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, toshi.kani
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

This patch introduces a new API, memblock_alloc_bottom_up(), to enable
memblock to allocate memory from the bottom upwards.

During early boot, if bottom-up mode is set, first try allocating
bottom-up from the end of the kernel image; if that fails, fall back to
the normal top-down allocation.
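
A sketch of the intended calling pattern, mirroring what the later
patches in this series do (limit, size and align are placeholders; note
that memblock_alloc_bottom_up() only finds a free range, so the caller
still reserves it separately):

	phys_addr_t addr = 0;

	if (memblock_direction_bottom_up())
		addr = memblock_alloc_bottom_up(MEMBLOCK_ALLOC_ACCESSIBLE,
						limit, size, align);
	if (!addr)
		addr = memblock_find_in_range(0, limit, size, align);
	if (addr)
		memblock_reserve(addr, size);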

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 include/linux/memblock.h |    2 ++
 mm/memblock.c            |   38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index a7d3436..3dff812 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -153,6 +153,8 @@ phys_addr_t memblock_alloc_nid(phys_addr_t size, phys_addr_t align, int nid);
 phys_addr_t memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, int nid);
 
 phys_addr_t memblock_alloc(phys_addr_t size, phys_addr_t align);
+phys_addr_t memblock_alloc_bottom_up(phys_addr_t start, phys_addr_t end,
+				     phys_addr_t size, phys_addr_t align);
 
 static inline bool memblock_direction_bottom_up(void)
 {
diff --git a/mm/memblock.c b/mm/memblock.c
index f24ca2e..2eb19f3 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -20,6 +20,8 @@
 #include <linux/seq_file.h>
 #include <linux/memblock.h>
 
+#include <asm-generic/sections.h>
+
 static struct memblock_region memblock_memory_init_regions[INIT_MEMBLOCK_REGIONS] __initdata_memblock;
 static struct memblock_region memblock_reserved_init_regions[INIT_MEMBLOCK_REGIONS] __initdata_memblock;
 
@@ -786,6 +788,42 @@ static phys_addr_t __init memblock_alloc_base_nid(phys_addr_t size,
 	return 0;
 }
 
+/**
+ * memblock_alloc_bottom_up - allocate memory from bottom upwards
+ * @start: start of candidate range, can be %MEMBLOCK_ALLOC_ACCESSIBLE
+ * @end: end of candidate range, can be %MEMBLOCK_ALLOC_{ANYWHERE|ACCESSIBLE}
+ * @size: size of free area to allocate
+ * @align: alignment of free area to allocate
+ *
+ * Allocate @size free area aligned to @align from the end of the kernel image
+ * upwards.
+ *
+ * RETURNS: Found address on success, %0 on failure.
+ */
+phys_addr_t __init_memblock memblock_alloc_bottom_up(phys_addr_t start,
+					phys_addr_t end, phys_addr_t size,
+					phys_addr_t align)
+{
+	phys_addr_t this_start, this_end, cand;
+	u64 i;
+
+	if (start == MEMBLOCK_ALLOC_ACCESSIBLE)
+		start = __pa_symbol(_end);	/* End of kernel image. */
+	if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
+		end = memblock.current_limit;
+
+	for_each_free_mem_range(i, MAX_NUMNODES, &this_start, &this_end, NULL) {
+		this_start = clamp(this_start, start, end);
+		this_end = clamp(this_end, start, end);
+
+		cand = round_up(this_start, align);
+		if (cand < this_end && this_end - cand >= size)
+			return cand;
+	}
+
+	return 0;
+}
+
 phys_addr_t __init memblock_alloc_nid(phys_addr_t size, phys_addr_t align, int nid)
 {
 	return memblock_alloc_base_nid(size, align, MEMBLOCK_ALLOC_ACCESSIBLE, nid);
-- 
1.7.1

* [RESEND PATCH v2 3/9] x86, dma: Support allocate memory from bottom upwards in dma_contiguous_reserve().
@ 2013-09-12  9:52   ` Tang Chen
From: Tang Chen @ 2013-09-12  9:52 UTC (permalink / raw)
  To: tj, rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, toshi.kani
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

During early boot, if bottom-up mode is set, first try allocating
bottom-up from the end of the kernel image; if that fails, fall back to
the normal top-down allocation.

This patch adds the above logic to dma_contiguous_reserve().

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 drivers/base/dma-contiguous.c |   17 ++++++++++++++---
 1 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
index 99802d6..aada945 100644
--- a/drivers/base/dma-contiguous.c
+++ b/drivers/base/dma-contiguous.c
@@ -228,17 +228,28 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base,
 			goto err;
 		}
 	} else {
+		phys_addr_t addr;
+
+		if (memblock_direction_bottom_up()) {
+			addr = memblock_alloc_bottom_up(
+						MEMBLOCK_ALLOC_ACCESSIBLE,
+						limit, size, alignment);
+			if (addr)
+				goto success;
+		}
+
 		/*
 		 * Use __memblock_alloc_base() since
 		 * memblock_alloc_base() panic()s.
 		 */
-		phys_addr_t addr = __memblock_alloc_base(size, alignment, limit);
+		addr = __memblock_alloc_base(size, alignment, limit);
 		if (!addr) {
 			ret = -ENOMEM;
 			goto err;
-		} else {
-			base = addr;
 		}
+
+success:
+		base = addr;
 	}
 
 	/*
-- 
1.7.1

* [RESEND PATCH v2 4/9] x86: Support allocate memory from bottom upwards in setup_log_buf().
@ 2013-09-12  9:52   ` Tang Chen
From: Tang Chen @ 2013-09-12  9:52 UTC (permalink / raw)
  To: tj, rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, toshi.kani
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

During early boot, if bottom-up mode is set, first try allocating
bottom-up from the end of the kernel image; if that fails, fall back to
the normal top-down allocation.

This patch adds the above logic to setup_log_buf().

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 kernel/printk/printk.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index b4e8500..2958118 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -759,9 +759,20 @@ void __init setup_log_buf(int early)
 	if (early) {
 		unsigned long mem;
 
+		if (memblock_direction_bottom_up()) {
+			mem = memblock_alloc_bottom_up(
+						MEMBLOCK_ALLOC_ACCESSIBLE,
+						MEMBLOCK_ALLOC_ACCESSIBLE,
+						new_log_buf_len, PAGE_SIZE);
+			if (mem)
+				goto success;
+		}
+
 		mem = memblock_alloc(new_log_buf_len, PAGE_SIZE);
 		if (!mem)
 			return;
+
+success:
 		new_log_buf = __va(mem);
 	} else {
 		new_log_buf = alloc_bootmem_nopanic(new_log_buf_len);
-- 
1.7.1

* [RESEND PATCH v2 5/9] x86: Support allocate memory from bottom upwards in relocate_initrd().
@ 2013-09-12  9:52   ` Tang Chen
From: Tang Chen @ 2013-09-12  9:52 UTC (permalink / raw)
  To: tj, rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, toshi.kani
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

During early boot, if bottom-up mode is set, first try allocating
bottom-up from the end of the kernel image; if that fails, fall back to
the normal top-down allocation.

This patch adds the above logic to relocate_initrd().

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 arch/x86/kernel/setup.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index f0de629..7372be7 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -326,6 +326,15 @@ static void __init relocate_initrd(void)
 	char *p, *q;
 
 	/* We need to move the initrd down into directly mapped mem */
+	if (memblock_direction_bottom_up()) {
+		ramdisk_here = memblock_alloc_bottom_up(
+						MEMBLOCK_ALLOC_ACCESSIBLE,
+						PFN_PHYS(max_pfn_mapped),
+						area_size, PAGE_SIZE);
+		if (ramdisk_here)
+			goto success;
+	}
+
 	ramdisk_here = memblock_find_in_range(0, PFN_PHYS(max_pfn_mapped),
 						 area_size, PAGE_SIZE);
 
@@ -333,6 +342,7 @@ static void __init relocate_initrd(void)
 		panic("Cannot find place for new RAMDISK of size %lld\n",
 			 ramdisk_size);
 
+success:
 	/* Note: this includes all the mem currently occupied by
 	   the initrd, we rely on that fact to keep the data intact. */
 	memblock_reserve(ramdisk_here, area_size);
-- 
1.7.1

* [RESEND PATCH v2 6/9] x86, acpi: Support allocate memory from bottom upwards in acpi_initrd_override().
@ 2013-09-12  9:52   ` Tang Chen
From: Tang Chen @ 2013-09-12  9:52 UTC (permalink / raw)
  To: tj, rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, toshi.kani
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

During early boot, if bottom-up mode is set, first try allocating
bottom-up from the end of the kernel image; if that fails, fall back to
the normal top-down allocation.

This patch adds the above logic to acpi_initrd_override().

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 drivers/acpi/osl.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index e5f416c..978dcfa 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -632,6 +632,15 @@ void __init acpi_initrd_override(void *data, size_t size)
 	if (table_nr == 0)
 		return;
 
+	if (memblock_direction_bottom_up()) {
+		acpi_tables_addr = memblock_alloc_bottom_up(
+					MEMBLOCK_ALLOC_ACCESSIBLE,
+					max_low_pfn_mapped << PAGE_SHIFT,
+					all_tables_size, PAGE_SIZE);
+		if (acpi_tables_addr)
+			goto success;
+	}
+
 	acpi_tables_addr =
 		memblock_find_in_range(0, max_low_pfn_mapped << PAGE_SHIFT,
 				       all_tables_size, PAGE_SIZE);
@@ -639,6 +648,8 @@ void __init acpi_initrd_override(void *data, size_t size)
 		WARN_ON(1);
 		return;
 	}
+
+success:
 	/*
 	 * Only calling e820_add_reserve does not work and the
 	 * tables are invalid (memory got used) later.
-- 
1.7.1

* [RESEND PATCH v2 7/9] x86, acpi, crash, kdump: Do reserve_crashkernel() after SRAT is parsed.
@ 2013-09-12  9:52   ` Tang Chen
From: Tang Chen @ 2013-09-12  9:52 UTC (permalink / raw)
  To: tj, rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, toshi.kani
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

Memory reserved for the crash kernel could be large, so we should not
allocate this memory bottom-up from the end of the kernel image.

Once SRAT is parsed, we will know which memory is hotpluggable and can
avoid allocating it for the kernel. So reorder reserve_crashkernel()
to after SRAT is parsed.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 arch/x86/kernel/setup.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 7372be7..fa56a57 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1130,8 +1130,6 @@ void __init setup_arch(char **cmdline_p)
 	acpi_initrd_override((void *)initrd_start, initrd_end - initrd_start);
 #endif
 
-	reserve_crashkernel();
-
 	vsmp_init();
 
 	io_delay_init();
@@ -1146,6 +1144,12 @@ void __init setup_arch(char **cmdline_p)
 	initmem_init();
 	memblock_find_dma_reserve();
 
+	/*
+	 * Reserve memory for crash kernel after SRAT is parsed so that it
+	 * won't consume hotpluggable memory.
+	 */
+	reserve_crashkernel();
+
 #ifdef CONFIG_KVM_GUEST
 	kvmclock_init();
 #endif
-- 
1.7.1

* [RESEND PATCH v2 8/9] x86, mem-hotplug: Support initialize page tables from low to high.
@ 2013-09-12  9:52   ` Tang Chen
From: Tang Chen @ 2013-09-12  9:52 UTC (permalink / raw)
  To: tj, rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, toshi.kani
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

init_mem_mapping() is called before SRAT is parsed, and memblock
allocates memory for page tables at that point. To prevent page tables
from being allocated within hotpluggable memory, allocate them upwards
from the end of the kernel image.
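
In bottom-up mode, the mapping then proceeds in two stages above the ISA
range before the ISA range itself is mapped (a condensed view of the new
init_mem_mapping() flow in the diff below):

	kernel_end = round_up(__pa_symbol(_end), PMD_SIZE);

	memory_map_from_low(kernel_end, end);		  /* above the kernel first */
	memory_map_from_low(ISA_END_ADDRESS, kernel_end); /* then below it */
	init_memory_mapping(0, ISA_END_ADDRESS);	  /* ISA range last */

This way, the earliest page-table pages land just above the kernel
image, the region least likely to be hotpluggable.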

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 arch/x86/mm/init.c |  133 ++++++++++++++++++++++++++++++++++++++++-----------
 1 files changed, 104 insertions(+), 29 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 04664cd..7dae4e3 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -54,11 +54,23 @@ __ref void *alloc_low_pages(unsigned int num)
 		unsigned long ret;
 		if (min_pfn_mapped >= max_pfn_mapped)
 			panic("alloc_low_page: ran out of memory");
+
+		if (memblock_direction_bottom_up()) {
+			ret = memblock_alloc_bottom_up(
+						MEMBLOCK_ALLOC_ACCESSIBLE,
+						max_pfn_mapped << PAGE_SHIFT,
+						PAGE_SIZE * num, PAGE_SIZE);
+			if (ret)
+				goto reserve;
+		}
+
 		ret = memblock_find_in_range(min_pfn_mapped << PAGE_SHIFT,
 					max_pfn_mapped << PAGE_SHIFT,
 					PAGE_SIZE * num , PAGE_SIZE);
 		if (!ret)
 			panic("alloc_low_page: can not alloc memory");
+
+reserve:
 		memblock_reserve(ret, PAGE_SIZE * num);
 		pfn = ret >> PAGE_SHIFT;
 	} else {
@@ -401,13 +413,79 @@ static unsigned long __init init_range_memory_mapping(
 
 /* (PUD_SHIFT-PMD_SHIFT)/2 */
 #define STEP_SIZE_SHIFT 5
-void __init init_mem_mapping(void)
+
+#ifdef CONFIG_MOVABLE_NODE
+/**
+ * memory_map_from_low - Map [start, end) from low to high
+ * @start: start address of the target memory range
+ * @end: end address of the target memory range
+ *
+ * This function will setup direct mapping for memory range [start, end) in a
+ * heuristic way. In the beginning, step_size is small. The more memory we
+ * map in the current loop, the more memory we will map in the next loop.
+ */
+static void __init memory_map_from_low(unsigned long start, unsigned long end)
+{
+	unsigned long next, new_mapped_ram_size;
+	unsigned long mapped_ram_size = 0;
+	/* step_size need to be small so pgt_buf from BRK could cover it */
+	unsigned long step_size = PMD_SIZE;
+
+	while (start < end) {
+		if (end - start > step_size) {
+			next = round_up(start + 1, step_size);
+			if (next > end)
+				next = end;
+		} else
+			next = end;
+
+		new_mapped_ram_size = init_range_memory_mapping(start, next);
+		min_pfn_mapped = start >> PAGE_SHIFT;
+		start = next;
+
+		if (new_mapped_ram_size > mapped_ram_size)
+			step_size <<= STEP_SIZE_SHIFT;
+		mapped_ram_size += new_mapped_ram_size;
+	}
+}
+#endif /* CONFIG_MOVABLE_NODE */
+
+/**
+ * memory_map_from_high - Map [start, end) from high to low
+ * @start: start address of the target memory range
+ * @end: end address of the target memory range
+ *
+ * This function is similar to memory_map_from_low() except it maps memory
+ * from high to low.
+ */
+static void __init memory_map_from_high(unsigned long start, unsigned long end)
 {
-	unsigned long end, real_end, start, last_start;
-	unsigned long step_size;
-	unsigned long addr;
+	unsigned long prev, new_mapped_ram_size;
 	unsigned long mapped_ram_size = 0;
-	unsigned long new_mapped_ram_size;
+	/* step_size need to be small so pgt_buf from BRK could cover it */
+	unsigned long step_size = PMD_SIZE;
+
+	while (start < end) {
+		if (end > step_size) {
+			prev = round_down(end - 1, step_size);
+			if (prev < start)
+				prev = start;
+		} else
+			prev = start;
+
+		new_mapped_ram_size = init_range_memory_mapping(prev, end);
+		min_pfn_mapped = prev >> PAGE_SHIFT;
+		end = prev;
+
+		if (new_mapped_ram_size > mapped_ram_size)
+			step_size <<= STEP_SIZE_SHIFT;
+		mapped_ram_size += new_mapped_ram_size;
+	}
+}
+
+void __init init_mem_mapping(void)
+{
+	unsigned long end;
 
 	probe_page_size_mask();
 
@@ -417,45 +495,42 @@ void __init init_mem_mapping(void)
 	end = max_low_pfn << PAGE_SHIFT;
 #endif
 
-	/* the ISA range is always mapped regardless of memory holes */
-	init_memory_mapping(0, ISA_END_ADDRESS);
+	max_pfn_mapped = 0; /* will get exact value next */
+	min_pfn_mapped = end >> PAGE_SHIFT;
+
+#ifdef CONFIG_MOVABLE_NODE
+	unsigned long kernel_end;
+
+	if (memblock_direction_bottom_up()) {
+		kernel_end = round_up(__pa_symbol(_end), PMD_SIZE);
+
+		memory_map_from_low(kernel_end, end);
+		memory_map_from_low(ISA_END_ADDRESS, kernel_end);
+		goto out;
+	}
+#endif /* CONFIG_MOVABLE_NODE */
+
+	unsigned long addr, real_end;
 
 	/* xen has big range in reserved near end of ram, skip it at first.*/
 	addr = memblock_find_in_range(ISA_END_ADDRESS, end, PMD_SIZE, PMD_SIZE);
 	real_end = addr + PMD_SIZE;
 
-	/* step_size need to be small so pgt_buf from BRK could cover it */
-	step_size = PMD_SIZE;
-	max_pfn_mapped = 0; /* will get exact value next */
-	min_pfn_mapped = real_end >> PAGE_SHIFT;
-	last_start = start = real_end;
-
 	/*
 	 * We start from the top (end of memory) and go to the bottom.
 	 * The memblock_find_in_range() gets us a block of RAM from the
 	 * end of RAM in [min_pfn_mapped, max_pfn_mapped) used as new pages
 	 * for page table.
 	 */
-	while (last_start > ISA_END_ADDRESS) {
-		if (last_start > step_size) {
-			start = round_down(last_start - 1, step_size);
-			if (start < ISA_END_ADDRESS)
-				start = ISA_END_ADDRESS;
-		} else
-			start = ISA_END_ADDRESS;
-		new_mapped_ram_size = init_range_memory_mapping(start,
-							last_start);
-		last_start = start;
-		min_pfn_mapped = last_start >> PAGE_SHIFT;
-		/* only increase step_size after big range get mapped */
-		if (new_mapped_ram_size > mapped_ram_size)
-			step_size <<= STEP_SIZE_SHIFT;
-		mapped_ram_size += new_mapped_ram_size;
-	}
+	memory_map_from_high(ISA_END_ADDRESS, real_end);
 
 	if (real_end < end)
 		init_range_memory_mapping(real_end, end);
 
+out:
+	/* the ISA range is always mapped regardless of memory holes */
+	init_memory_mapping(0, ISA_END_ADDRESS);
+
 #ifdef CONFIG_X86_64
 	if (max_pfn > max_low_pfn) {
 		/* can we preseve max_low_pfn ?*/
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread
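
The step-size heuristic used by memory_map_from_low() and
memory_map_from_high() above is easier to see in isolation. The
following minimal user-space C sketch (an illustration only, not part
of the patch) simulates the low-to-high walk with the same doubling
rule; the PMD_SIZE and STEP_SIZE_SHIFT values match the patch, and
map_range() is a stand-in for init_range_memory_mapping():

	#include <stdio.h>

	#define PMD_SIZE        (2UL << 20)  /* 2 MiB, as on x86_64 */
	#define STEP_SIZE_SHIFT 5            /* (PUD_SHIFT - PMD_SHIFT) / 2 */

	/* Stand-in for init_range_memory_mapping(): report the range size. */
	static unsigned long map_range(unsigned long start, unsigned long end)
	{
		printf("map [%#lx, %#lx)\n", start, end);
		return end - start;
	}

	static unsigned long round_up_to(unsigned long x, unsigned long align)
	{
		return (x + align - 1) & ~(align - 1);	/* align: power of two */
	}

	int main(void)	/* assumes a 64-bit host */
	{
		unsigned long start = 0x1000000;     /* say, end of kernel image */
		unsigned long end = 0x100000000UL;   /* say, 4 GiB of RAM */
		unsigned long step_size = PMD_SIZE;  /* small, so BRK pgt_buf covers it */
		unsigned long mapped = 0, next, got;

		while (start < end) {
			if (end - start > step_size) {
				next = round_up_to(start + 1, step_size);
				if (next > end)
					next = end;
			} else {
				next = end;
			}
			got = map_range(start, next);
			start = next;
			/* only grow the step after a big range got mapped */
			if (got > mapped)
				step_size <<= STEP_SIZE_SHIFT;
			mapped += got;
		}
		return 0;
	}

Each successfully mapped chunk provides page-table space for a chunk
up to 32 times larger, so all of RAM is covered in a handful of
iterations while the very first chunk still fits in the BRK pgt_buf.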

* [RESEND PATCH v2 9/9] mem-hotplug: Introduce movablenode boot option to control memblock allocation direction.
  2013-09-12  9:52 ` Tang Chen
@ 2013-09-12  9:52   ` Tang Chen
  -1 siblings, 0 replies; 26+ messages in thread
From: Tang Chen @ 2013-09-12  9:52 UTC (permalink / raw)
  To: tj, rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, toshi.kani
  Cc: x86, linux-doc, linux-kernel, linux-mm, linux-acpi

The Hot-Pluggable field in SRAT specifies which memory ranges are
hotpluggable. As we mentioned before, if hotpluggable memory is used
by the kernel, it cannot be hot-removed. So memory hotplug users may
want to place all hotpluggable memory in ZONE_MOVABLE so that the
kernel won't use it.

Memory hotplug users may also set a node as a movable node, which has
ZONE_MOVABLE only, so that the whole node can be hot-removed.

But the kernel cannot use memory in ZONE_MOVABLE, so it cannot use
any memory in movable nodes either. This degrades NUMA performance,
and users who don't need memory hotplug may be unhappy.

So we need a way to let users enable and disable this functionality.
This patch introduces the movablenode boot option, which lets users
choose whether to reserve hotpluggable memory as ZONE_MOVABLE.

Users can specify "movablenode" on the kernel command line to enable
this functionality. Those who don't use memory hotplug, or who don't
want to lose NUMA performance, simply don't specify anything, and the
kernel works as before.

After memblock is ready, but before SRAT is parsed, we should allocate
memory near the kernel image. So this patch does the following:

1. After memblock is ready, make memblock allocate memory from low
   addresses to high.
2. After SRAT is parsed, revert memblock to its default behavior,
   allocating memory from high addresses to low.

This behavior is controlled by the movablenode boot option.

Suggested-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 Documentation/kernel-parameters.txt |   15 ++++++++++++++
 arch/x86/kernel/setup.c             |   36 +++++++++++++++++++++++++++++++++++
 include/linux/memory_hotplug.h      |    5 ++++
 mm/memory_hotplug.c                 |    9 ++++++++
 4 files changed, 65 insertions(+), 0 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 1a036cd..8c056c4 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1769,6 +1769,21 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			that the amount of memory usable for all allocations
 			is not too small.
 
+	movablenode		[KNL,X86] This parameter enables the kernel to
+			arrange hotpluggable memory ranges recorded in the
+			ACPI SRAT (System Resource Affinity Table) as
+			ZONE_MOVABLE, so that they can be hot-removed while
+			the system is running.
+			With this option, all hotpluggable memory is placed
+			in ZONE_MOVABLE, which the kernel cannot use. This
+			can degrade NUMA performance, so users who care
+			about NUMA performance should not specify it.
+			If all memory ranges in the system are hotpluggable,
+			the ranges used by the kernel at early boot, such as
+			the kernel code and data segments and the initrd,
+			will not be set as ZONE_MOVABLE and thus will not be
+			hotpluggable; otherwise the kernel could not boot.
+
 	MTD_Partition=	[MTD]
 			Format: <name>,<region-number>,<size>,<offset>
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index fa56a57..b87069b 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1104,6 +1104,31 @@ void __init setup_arch(char **cmdline_p)
 	trim_platform_memory_ranges();
 	trim_low_memory_range();
 
+#ifdef CONFIG_MOVABLE_NODE
+	if (movablenode_enable_srat) {
+		/*
+		 * Memory used by the kernel cannot be hot-removed because
+		 * Linux cannot migrate kernel pages. When memory hotplug is
+		 * enabled, we should prevent memblock from allocating
+		 * hotpluggable memory for the kernel.
+		 *
+		 * ACPI SRAT records all hotpluggable memory ranges, but
+		 * before SRAT is parsed we don't know which ranges they are.
+		 *
+		 * The kernel image is loaded into memory very early, and we
+		 * cannot prevent that. So on a NUMA system, we treat any
+		 * node the kernel resides in as non-hotpluggable.
+		 *
+		 * Since a node on a modern server can hold tens of gigabytes
+		 * of memory, we can assume the memory around the kernel
+		 * image is also non-hotpluggable. So before SRAT is parsed,
+		 * just allocate memory near the kernel image, to do our best
+		 * to keep the kernel away from hotpluggable memory.
+		 */
+		memblock_set_current_direction(MEMBLOCK_DIRECTION_LOW_TO_HIGH);
+	}
+#endif /* CONFIG_MOVABLE_NODE */
+
 	init_mem_mapping();
 
 	early_trap_pf_init();
@@ -1142,6 +1167,17 @@ void __init setup_arch(char **cmdline_p)
 	early_acpi_boot_init();
 
 	initmem_init();
+
+#ifdef CONFIG_MOVABLE_NODE
+	if (movablenode_enable_srat) {
+		/*
+		 * Now that ACPI SRAT has been parsed (done in initmem_init()),
+		 * set memblock back to its default behavior.
+		 */
+		memblock_set_current_direction(MEMBLOCK_DIRECTION_DEFAULT);
+	}
+#endif /* CONFIG_MOVABLE_NODE */
+
 	memblock_find_dma_reserve();
 
 	/*
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index dd38e62..5d2c07b 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -33,6 +33,11 @@ enum {
 	ONLINE_MOVABLE,
 };
 
+#ifdef CONFIG_MOVABLE_NODE
+/* True if the "movablenode" boot option was specified */
+extern bool movablenode_enable_srat;
+#endif /* CONFIG_MOVABLE_NODE */
+
 /*
  * pgdat resizing functions
  */
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0eb1a1d..8a4c8ff 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1390,6 +1390,15 @@ static bool can_offline_normal(struct zone *zone, unsigned long nr_pages)
 {
 	return true;
 }
+
+bool __initdata movablenode_enable_srat;
+
+static int __init cmdline_parse_movablenode(char *p)
+{
+	movablenode_enable_srat = true;
+	return 0;
+}
+early_param("movablenode", cmdline_parse_movablenode);
 #else /* CONFIG_MOVABLE_NODE */
 /* ensure the node has NORMAL memory if it is still online */
 static bool can_offline_normal(struct zone *zone, unsigned long nr_pages)
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread
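
The division of labor in this patch (flip the allocation direction
before init_mem_mapping(), flip it back once initmem_init() has
parsed SRAT) can be modeled in a few lines. Below is a self-contained
C sketch; the enum and function names mirror patches 1/9 and 2/9 of
this series, but the bodies here are simplified stand-ins:

	#include <stdbool.h>
	#include <stdio.h>

	enum memblock_direction {
		MEMBLOCK_DIRECTION_DEFAULT,	/* allocate from high to low */
		MEMBLOCK_DIRECTION_LOW_TO_HIGH,	/* allocate from low to high */
	};

	static enum memblock_direction current_direction =
						MEMBLOCK_DIRECTION_DEFAULT;

	static void memblock_set_current_direction(enum memblock_direction dir)
	{
		current_direction = dir;
	}

	static bool memblock_direction_bottom_up(void)
	{
		return current_direction == MEMBLOCK_DIRECTION_LOW_TO_HIGH;
	}

	int main(void)
	{
		bool movablenode_enable_srat = true;	/* as if set by early_param() */

		/* setup_arch(), before init_mem_mapping(): go bottom up */
		if (movablenode_enable_srat)
			memblock_set_current_direction(MEMBLOCK_DIRECTION_LOW_TO_HIGH);
		printf("before SRAT is parsed: bottom up = %d\n",
		       memblock_direction_bottom_up());

		/* initmem_init() parses SRAT; hotplug info is now known */
		if (movablenode_enable_srat)
			memblock_set_current_direction(MEMBLOCK_DIRECTION_DEFAULT);
		printf("after SRAT is parsed:  bottom up = %d\n",
		       memblock_direction_bottom_up());
		return 0;
	}

The point of the design is that the window in which bottom-up
allocation is active is exactly the window in which hotplug
information is still unknown.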

* Re: [RESEND PATCH v2 3/9] x86, dma: Support allocate memory from bottom upwards in dma_contiguous_reserve().
  2013-09-12  9:52   ` Tang Chen
@ 2013-09-12 19:22     ` Toshi Kani
  -1 siblings, 0 replies; 26+ messages in thread
From: Toshi Kani @ 2013-09-12 19:22 UTC (permalink / raw)
  To: Tang Chen
  Cc: tj, rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, x86, linux-doc, linux-kernel, linux-mm,
	linux-acpi

On Thu, 2013-09-12 at 17:52 +0800, Tang Chen wrote:
> During early boot, if the bottom up mode is set, just
> try allocating bottom up from the end of kernel image,
> and if that fails, do normal top down allocation.
> 
> So in function dma_contiguous_reserve(), we add the
> above logic.
> 
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> ---
>  drivers/base/dma-contiguous.c |   17 ++++++++++++++---
>  1 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
> index 99802d6..aada945 100644
> --- a/drivers/base/dma-contiguous.c
> +++ b/drivers/base/dma-contiguous.c
> @@ -228,17 +228,28 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base,
>  			goto err;
>  		}
>  	} else {
> +		phys_addr_t addr;
> +
> +		if (memblock_direction_bottom_up()) {
> +			addr = memblock_alloc_bottom_up(
> +						MEMBLOCK_ALLOC_ACCESSIBLE,
> +						limit, size, alignment);
> +			if (addr)
> +				goto success;
> +		}

I am afraid that this version went in the wrong direction.  Allocating
from the bottom up needs to be internal logic within the memblock
allocator.  It should not require the callers to be aware of the
direction and make a special request.

Thanks,
-Toshi


> +
>  		/*
>  		 * Use __memblock_alloc_base() since
>  		 * memblock_alloc_base() panic()s.
>  		 */
> -		phys_addr_t addr = __memblock_alloc_base(size, alignment, limit);
> +		addr = __memblock_alloc_base(size, alignment, limit);
>  		if (!addr) {
>  			ret = -ENOMEM;
>  			goto err;
> -		} else {
> -			base = addr;
>  		}
> +
> +success:
> +		base = addr;
>  	}
>  
>  	/*



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RESEND PATCH v2 3/9] x86, dma: Support allocate memory from bottom upwards in dma_contiguous_reserve().
  2013-09-12 19:22     ` Toshi Kani
@ 2013-09-13  3:36       ` Tang Chen
  -1 siblings, 0 replies; 26+ messages in thread
From: Tang Chen @ 2013-09-13  3:36 UTC (permalink / raw)
  To: Toshi Kani
  Cc: tj, rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, x86, linux-doc, linux-kernel, linux-mm,
	linux-acpi

Hi Toshi,

On 09/13/2013 03:22 AM, Toshi Kani wrote:
......
>> +		if (memblock_direction_bottom_up()) {
>> +			addr = memblock_alloc_bottom_up(
>> +						MEMBLOCK_ALLOC_ACCESSIBLE,
>> +						limit, size, alignment);
>> +			if (addr)
>> +				goto success;
>> +		}
>
> I am afraid that this version went to a wrong direction.  Allocating
> from the bottom up needs to be an internal logic within the memblock
> allocator.  It should not require the callers to be aware of the
> direction and make a special request.
>

I think my v1 patch-set was trying to do so. Was it too complicated?

So shall I just move this logic into memblock_find_in_range_node(); is
this OK?

Thanks.


^ permalink raw reply	[flat|nested] 26+ messages in thread
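
For readers following the thread: moving the fallback into
memblock_find_in_range_node(), as proposed here, means callers such as
dma_contiguous_reserve_area() keep calling one allocator entry point
and never see the direction. A small self-contained C model of that
shape (the region table and the function names are illustrative only,
not code from this series):

	#include <stdio.h>

	typedef unsigned long phys_addr_t;

	struct region { phys_addr_t base, size; };

	/* A toy "free list": two free ranges of physical memory. */
	static struct region free_regions[] = {
		{ 0x01000000, 0x00800000 },	/* 16 MiB .. 24 MiB */
		{ 0x80000000, 0x10000000 },	/* 2 GiB .. 2.25 GiB */
	};
	#define NR_REGIONS (sizeof(free_regions) / sizeof(free_regions[0]))

	static int bottom_up;	/* toy stand-in for the direction flag */

	static phys_addr_t find_range_bottom_up(phys_addr_t size)
	{
		for (unsigned int i = 0; i < NR_REGIONS; i++)
			if (free_regions[i].size >= size)
				return free_regions[i].base;
		return 0;
	}

	static phys_addr_t find_range_top_down(phys_addr_t size)
	{
		for (int i = NR_REGIONS - 1; i >= 0; i--)
			if (free_regions[i].size >= size)
				return free_regions[i].base +
				       free_regions[i].size - size;
		return 0;
	}

	/* Single entry point: callers never see the direction. */
	static phys_addr_t find_in_range(phys_addr_t size)
	{
		if (bottom_up) {
			phys_addr_t ret = find_range_bottom_up(size);
			if (ret)
				return ret;
			/* fall back to the default top-down search */
		}
		return find_range_top_down(size);
	}

	int main(void)
	{
		bottom_up = 1;
		printf("bottom up: %#lx\n", find_in_range(0x100000));
		bottom_up = 0;
		printf("top down:  %#lx\n", find_in_range(0x100000));
		return 0;
	}

With this shape, enabling or disabling bottom-up allocation becomes a
policy switch inside the allocator, which is what the review above
asks for.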

* Re: [RESEND PATCH v2 3/9] x86, dma: Support allocate memory from bottom upwards in dma_contiguous_reserve().
  2013-09-13  3:36       ` Tang Chen
@ 2013-09-13 21:47         ` Toshi Kani
  -1 siblings, 0 replies; 26+ messages in thread
From: Toshi Kani @ 2013-09-13 21:47 UTC (permalink / raw)
  To: Tang Chen
  Cc: tj, rjw, lenb, tglx, mingo, hpa, akpm, trenn, yinghai, jiang.liu,
	wency, laijs, isimatu.yasuaki, izumi.taku, mgorman, minchan,
	mina86, gong.chen, vasilis.liaskovitis, lwoodman, riel, jweiner,
	prarit, zhangyanfei, x86, linux-doc, linux-kernel, linux-mm,
	linux-acpi

On Fri, 2013-09-13 at 11:36 +0800, Tang Chen wrote:
> Hi Toshi,
> 
> On 09/13/2013 03:22 AM, Toshi Kani wrote:
> ......
> >> +		if (memblock_direction_bottom_up()) {
> >> +			addr = memblock_alloc_bottom_up(
> >> +						MEMBLOCK_ALLOC_ACCESSIBLE,
> >> +						limit, size, alignment);
> >> +			if (addr)
> >> +				goto success;
> >> +		}
> >
> > I am afraid that this version went to a wrong direction.  Allocating
> > from the bottom up needs to be an internal logic within the memblock
> > allocator.  It should not require the callers to be aware of the
> > direction and make a special request.
> >
> 
> I think my v1 patch-set was trying to do so. Was it too complicated?
> 
> So shall I just move this logic into memblock_find_in_range_node(); is
> this OK?

Yes, the new version looks good on this.

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2013-09-13 21:49 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-09-12  9:52 [RESEND PATCH v2 0/9] x86, memblock: Allocate memory near kernel image before SRAT parsed Tang Chen
2013-09-12  9:52 ` Tang Chen
2013-09-12  9:52 ` [RESEND PATCH v2 1/9] memblock: Introduce allocation direction to memblock Tang Chen
2013-09-12  9:52   ` Tang Chen
2013-09-12  9:52 ` [RESEND PATCH v2 2/9] x86, memblock: Introduce memblock_alloc_bottom_up() " Tang Chen
2013-09-12  9:52   ` Tang Chen
2013-09-12  9:52 ` [RESEND PATCH v2 3/9] x86, dma: Support allocate memory from bottom upwards in dma_contiguous_reserve() Tang Chen
2013-09-12  9:52   ` Tang Chen
2013-09-12 19:22   ` Toshi Kani
2013-09-12 19:22     ` Toshi Kani
2013-09-13  3:36     ` Tang Chen
2013-09-13  3:36       ` Tang Chen
2013-09-13 21:47       ` Toshi Kani
2013-09-13 21:47         ` Toshi Kani
2013-09-12  9:52 ` [RESEND PATCH v2 4/9] x86: Support allocate memory from bottom upwards in setup_log_buf() Tang Chen
2013-09-12  9:52   ` Tang Chen
2013-09-12  9:52 ` [RESEND PATCH v2 5/9] x86: Support allocate memory from bottom upwards in relocate_initrd() Tang Chen
2013-09-12  9:52   ` Tang Chen
2013-09-12  9:52 ` [RESEND PATCH v2 6/9] x86, acpi: Support allocate memory from bottom upwards in acpi_initrd_override() Tang Chen
2013-09-12  9:52   ` Tang Chen
2013-09-12  9:52 ` [RESEND PATCH v2 7/9] x86, acpi, crash, kdump: Do reserve_crashkernel() after SRAT is parsed Tang Chen
2013-09-12  9:52   ` Tang Chen
2013-09-12  9:52 ` [RESEND PATCH v2 8/9] x86, mem-hotplug: Support initialize page tables from low to high Tang Chen
2013-09-12  9:52   ` Tang Chen
2013-09-12  9:52 ` [RESEND PATCH v2 9/9] mem-hotplug: Introduce movablenode boot option to control memblock allocation direction Tang Chen
2013-09-12  9:52   ` Tang Chen
