linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/2] mm: Introduce kernelcore=mirror option
@ 2015-12-09  3:18 Taku Izumi
  2015-12-09  3:19 ` [PATCH v3 1/2] mm: Calculate zone_start_pfn at zone_spanned_pages_in_node() Taku Izumi
  2015-12-09  3:19 ` [PATCH v3 2/2] mm: Introduce kernelcore=mirror option Taku Izumi
  0 siblings, 2 replies; 20+ messages in thread
From: Taku Izumi @ 2015-12-09  3:18 UTC (permalink / raw)
  To: linux-kernel, linux-mm, akpm
  Cc: tony.luck, qiuxishi, kamezawa.hiroyu, mel, dave.hansen, matt, Taku Izumi

Xeon E7 v3 based systems supports Address Range Mirroring
and UEFI BIOS complied with UEFI spec 2.5 can notify which
ranges are mirrored (reliable) via EFI memory map.
Now Linux kernel utilize its information and allocates 
boot time memory from reliable region. 

My requirement is:
  - allocate kernel memory from mirrored region 
  - allocate user memory from non-mirrored region

In order to meet my requirement, ZONE_MOVABLE is useful.
By arranging non-mirrored range into ZONE_MOVABLE, 
mirrored memory is used for kernel allocations.

My idea is to extend existing "kernelcore" option and 
introduces kernelcore=mirror option. By specifying
"mirror" instead of specifying the amount of memory,
non-mirrored region will be arranged into ZONE_MOVABLE.  

Earlier discussions are at: 
 https://lkml.org/lkml/2015/10/9/24
 https://lkml.org/lkml/2015/10/15/9
 https://lkml.org/lkml/2015/11/27/18

For example, suppose 2-nodes system with the following memory
 range: 
  node 0 [mem 0x0000000000001000-0x000000109fffffff] 
  node 1 [mem 0x00000010a0000000-0x000000209fffffff]

and the following ranges are marked as reliable (mirrored):
  [0x0000000000000000-0x0000000100000000] 
  [0x0000000100000000-0x0000000180000000] 
  [0x0000000800000000-0x0000000880000000] 
  [0x00000010a0000000-0x0000001120000000]
  [0x00000017a0000000-0x0000001820000000] 

If you specify kernelcore=mirror, ZONE_NORMAL and ZONE_MOVABLE
are arranged like bellow:

 - node 0:
  ZONE_NORMAL : [0x0000000100000000-0x00000010a0000000]
  ZONE_MOVABLE: [0x0000000180000000-0x00000010a0000000]
 - node 1: 
  ZONE_NORMAL : [0x00000010a0000000-0x00000020a0000000]
  ZONE_MOVABLE: [0x0000001120000000-0x00000020a0000000] 

In overlapped range, pages to be ZONE_MOVABLE in ZONE_NORMAL
are treated as absent pages, and vice versa.

v1 -> v2:
 Refine so that the above example case also can be
 handled properly:
v2 -> v3:
 Change the option name from kernelcore=reliable
 into kernelcore=mirror and some documentation fix
 according to Andrew Morton's point

 
Taku Izumi (2):
  mm: Calculate zone_start_pfn at zone_spanned_pages_in_node()
  mm: Introduce kernelcore=mirror option

 Documentation/kernel-parameters.txt |  11 ++-
 mm/page_alloc.c                     | 140 +++++++++++++++++++++++++++++++-----
 2 files changed, 133 insertions(+), 18 deletions(-)

-- 
1.9.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v3 1/2] mm: Calculate zone_start_pfn at zone_spanned_pages_in_node()
  2015-12-09  3:18 [PATCH v3 0/2] mm: Introduce kernelcore=mirror option Taku Izumi
@ 2015-12-09  3:19 ` Taku Izumi
  2015-12-09  3:19 ` [PATCH v3 2/2] mm: Introduce kernelcore=mirror option Taku Izumi
  1 sibling, 0 replies; 20+ messages in thread
From: Taku Izumi @ 2015-12-09  3:19 UTC (permalink / raw)
  To: linux-kernel, linux-mm, akpm
  Cc: tony.luck, qiuxishi, kamezawa.hiroyu, mel, dave.hansen, matt, Taku Izumi

Currently each zone's zone_start_pfn is calculated at
free_area_init_core(). However zone's range is fixed at
the time when invoking zone_spanned_pages_in_node().

This patch changes each zone->zone_start_pfn is
calculated at zone_spanned_pages_in_node().

Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
---
 mm/page_alloc.c | 30 +++++++++++++++++++-----------
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 17a3c66..acb0b4e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4928,31 +4928,31 @@ static unsigned long __meminit zone_spanned_pages_in_node(int nid,
 					unsigned long zone_type,
 					unsigned long node_start_pfn,
 					unsigned long node_end_pfn,
+					unsigned long *zone_start_pfn,
+					unsigned long *zone_end_pfn,
 					unsigned long *ignored)
 {
-	unsigned long zone_start_pfn, zone_end_pfn;
-
 	/* When hotadd a new node from cpu_up(), the node should be empty */
 	if (!node_start_pfn && !node_end_pfn)
 		return 0;
 
 	/* Get the start and end of the zone */
-	zone_start_pfn = arch_zone_lowest_possible_pfn[zone_type];
-	zone_end_pfn = arch_zone_highest_possible_pfn[zone_type];
+	*zone_start_pfn = arch_zone_lowest_possible_pfn[zone_type];
+	*zone_end_pfn = arch_zone_highest_possible_pfn[zone_type];
 	adjust_zone_range_for_zone_movable(nid, zone_type,
 				node_start_pfn, node_end_pfn,
-				&zone_start_pfn, &zone_end_pfn);
+				zone_start_pfn, zone_end_pfn);
 
 	/* Check that this node has pages within the zone's required range */
-	if (zone_end_pfn < node_start_pfn || zone_start_pfn > node_end_pfn)
+	if (*zone_end_pfn < node_start_pfn || *zone_start_pfn > node_end_pfn)
 		return 0;
 
 	/* Move the zone boundaries inside the node if necessary */
-	zone_end_pfn = min(zone_end_pfn, node_end_pfn);
-	zone_start_pfn = max(zone_start_pfn, node_start_pfn);
+	*zone_end_pfn = min(*zone_end_pfn, node_end_pfn);
+	*zone_start_pfn = max(*zone_start_pfn, node_start_pfn);
 
 	/* Return the spanned pages */
-	return zone_end_pfn - zone_start_pfn;
+	return *zone_end_pfn - *zone_start_pfn;
 }
 
 /*
@@ -5017,6 +5017,8 @@ static inline unsigned long __meminit zone_spanned_pages_in_node(int nid,
 					unsigned long zone_type,
 					unsigned long node_start_pfn,
 					unsigned long node_end_pfn,
+					unsigned long *zone_start_pfn,
+					unsigned long *zone_end_pfn,
 					unsigned long *zones_size)
 {
 	return zones_size[zone_type];
@@ -5047,15 +5049,22 @@ static void __meminit calculate_node_totalpages(struct pglist_data *pgdat,
 
 	for (i = 0; i < MAX_NR_ZONES; i++) {
 		struct zone *zone = pgdat->node_zones + i;
+		unsigned long zone_start_pfn, zone_end_pfn;
 		unsigned long size, real_size;
 
 		size = zone_spanned_pages_in_node(pgdat->node_id, i,
 						  node_start_pfn,
 						  node_end_pfn,
+						  &zone_start_pfn,
+						  &zone_end_pfn,
 						  zones_size);
 		real_size = size - zone_absent_pages_in_node(pgdat->node_id, i,
 						  node_start_pfn, node_end_pfn,
 						  zholes_size);
+		if (size)
+			zone->zone_start_pfn = zone_start_pfn;
+		else
+			zone->zone_start_pfn = 0;
 		zone->spanned_pages = size;
 		zone->present_pages = real_size;
 
@@ -5176,7 +5185,6 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 {
 	enum zone_type j;
 	int nid = pgdat->node_id;
-	unsigned long zone_start_pfn = pgdat->node_start_pfn;
 	int ret;
 
 	pgdat_resize_init(pgdat);
@@ -5192,6 +5200,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 	for (j = 0; j < MAX_NR_ZONES; j++) {
 		struct zone *zone = pgdat->node_zones + j;
 		unsigned long size, realsize, freesize, memmap_pages;
+		unsigned long zone_start_pfn = zone->zone_start_pfn;
 
 		size = zone->spanned_pages;
 		realsize = freesize = zone->present_pages;
@@ -5260,7 +5269,6 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 		ret = init_currently_empty_zone(zone, zone_start_pfn, size);
 		BUG_ON(ret);
 		memmap_init(size, nid, j, zone_start_pfn);
-		zone_start_pfn += size;
 	}
 }
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3 2/2] mm: Introduce kernelcore=mirror option
  2015-12-09  3:18 [PATCH v3 0/2] mm: Introduce kernelcore=mirror option Taku Izumi
  2015-12-09  3:19 ` [PATCH v3 1/2] mm: Calculate zone_start_pfn at zone_spanned_pages_in_node() Taku Izumi
@ 2015-12-09  3:19 ` Taku Izumi
  2015-12-09  3:28   ` Xishi Qiu
                     ` (2 more replies)
  1 sibling, 3 replies; 20+ messages in thread
From: Taku Izumi @ 2015-12-09  3:19 UTC (permalink / raw)
  To: linux-kernel, linux-mm, akpm
  Cc: tony.luck, qiuxishi, kamezawa.hiroyu, mel, dave.hansen, matt, Taku Izumi

This patch extends existing "kernelcore" option and
introduces kernelcore=mirror option. By specifying
"mirror" instead of specifying the amount of memory,
non-mirrored (non-reliable) region will be arranged
into ZONE_MOVABLE.

v1 -> v2:
 - Refine so that the following case also can be
   handled properly:

 Node X:  |MMMMMM------MMMMMM--------|
   (legend) M: mirrored  -: not mirrrored

 In this case, ZONE_NORMAL and ZONE_MOVABLE are
 arranged like bellow:

 Node X:  |MMMMMM------MMMMMM--------|
          |ooooooxxxxxxooooooxxxxxxxx| ZONE_NORMAL
                |ooooooxxxxxxoooooooo| ZONE_MOVABLE
   (legend) o: present  x: absent

v2 -> v3:
 - change the option name from kernelcore=reliable
   into kernelcore=mirror
 - documentation fix so that users can understand
   nn[KMS] and mirror are exclusive

Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
---
 Documentation/kernel-parameters.txt |  11 +++-
 mm/page_alloc.c                     | 110 ++++++++++++++++++++++++++++++++++--
 2 files changed, 114 insertions(+), 7 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index f8aae63..b0ffc76 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1695,7 +1695,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 
 	keepinitrd	[HW,ARM]
 
-	kernelcore=nn[KMG]	[KNL,X86,IA-64,PPC] This parameter
+	kernelcore=	Format: nn[KMG] | "mirror"
+			[KNL,X86,IA-64,PPC] This parameter
 			specifies the amount of memory usable by the kernel
 			for non-movable allocations.  The requested amount is
 			spread evenly throughout all nodes in the system. The
@@ -1711,6 +1712,14 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			use the HighMem zone if it exists, and the Normal
 			zone if it does not.
 
+			Instead of specifying the amount of memory (nn[KMS]),
+			you can specify "mirror" option. In case "mirror"
+			option is specified, mirrored (reliable) memory is used
+			for non-movable allocations and remaining memory is used
+			for Movable pages. nn[KMS] and "mirror" are exclusive,
+			so you can NOT specify nn[KMG] and "mirror" at the same
+			time.
+
 	kgdbdbgp=	[KGDB,HW] kgdb over EHCI usb debug port.
 			Format: <Controller#>[,poll interval]
 			The controller # is the number of the ehci usb debug
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index acb0b4e..4157476 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -251,6 +251,7 @@ static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
 static unsigned long __initdata required_kernelcore;
 static unsigned long __initdata required_movablecore;
 static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
+static bool mirrored_kernelcore;
 
 /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
 int movable_zone;
@@ -4472,6 +4473,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 	unsigned long pfn;
 	struct zone *z;
 	unsigned long nr_initialised = 0;
+	struct memblock_region *r = NULL, *tmp;
 
 	if (highest_memmap_pfn < end_pfn - 1)
 		highest_memmap_pfn = end_pfn - 1;
@@ -4491,6 +4493,38 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 			if (!update_defer_init(pgdat, pfn, end_pfn,
 						&nr_initialised))
 				break;
+
+			/*
+			 * if not mirrored_kernelcore and ZONE_MOVABLE exists,
+			 * range from zone_movable_pfn[nid] to end of each node
+			 * should be ZONE_MOVABLE not ZONE_NORMAL. skip it.
+			 */
+			if (!mirrored_kernelcore && zone_movable_pfn[nid])
+				if (zone == ZONE_NORMAL &&
+				    pfn >= zone_movable_pfn[nid])
+					continue;
+
+			/*
+			 * check given memblock attribute by firmware which
+			 * can affect kernel memory layout.
+			 * if zone==ZONE_MOVABLE but memory is mirrored,
+			 * it's an overlapped memmap init. skip it.
+			 */
+			if (mirrored_kernelcore && zone == ZONE_MOVABLE) {
+				if (!r ||
+				    pfn >= memblock_region_memory_end_pfn(r)) {
+					for_each_memblock(memory, tmp)
+						if (pfn < memblock_region_memory_end_pfn(tmp))
+							break;
+					r = tmp;
+				}
+				if (pfn >= memblock_region_memory_base_pfn(r) &&
+				    memblock_is_mirror(r)) {
+					/* already initialized as NORMAL */
+					pfn = memblock_region_memory_end_pfn(r);
+					continue;
+				}
+			}
 		}
 
 		/*
@@ -4909,11 +4943,6 @@ static void __meminit adjust_zone_range_for_zone_movable(int nid,
 			*zone_end_pfn = min(node_end_pfn,
 				arch_zone_highest_possible_pfn[movable_zone]);
 
-		/* Adjust for ZONE_MOVABLE starting within this range */
-		} else if (*zone_start_pfn < zone_movable_pfn[nid] &&
-				*zone_end_pfn > zone_movable_pfn[nid]) {
-			*zone_end_pfn = zone_movable_pfn[nid];
-
 		/* Check if this whole range is within ZONE_MOVABLE */
 		} else if (*zone_start_pfn >= zone_movable_pfn[nid])
 			*zone_start_pfn = *zone_end_pfn;
@@ -4998,6 +5027,7 @@ static unsigned long __meminit zone_absent_pages_in_node(int nid,
 	unsigned long zone_low = arch_zone_lowest_possible_pfn[zone_type];
 	unsigned long zone_high = arch_zone_highest_possible_pfn[zone_type];
 	unsigned long zone_start_pfn, zone_end_pfn;
+	unsigned long nr_absent;
 
 	/* When hotadd a new node from cpu_up(), the node should be empty */
 	if (!node_start_pfn && !node_end_pfn)
@@ -5009,7 +5039,39 @@ static unsigned long __meminit zone_absent_pages_in_node(int nid,
 	adjust_zone_range_for_zone_movable(nid, zone_type,
 			node_start_pfn, node_end_pfn,
 			&zone_start_pfn, &zone_end_pfn);
-	return __absent_pages_in_range(nid, zone_start_pfn, zone_end_pfn);
+	nr_absent = __absent_pages_in_range(nid, zone_start_pfn, zone_end_pfn);
+
+	/*
+	 * ZONE_MOVABLE handling.
+	 * Treat pages to be ZONE_MOVABLE in ZONE_NORMAL as absent pages
+	 * and vice versa.
+	 */
+	if (zone_movable_pfn[nid]) {
+		if (mirrored_kernelcore) {
+			unsigned long start_pfn, end_pfn;
+			struct memblock_region *r;
+
+			for_each_memblock(memory, r) {
+				start_pfn = clamp(memblock_region_memory_base_pfn(r),
+						  zone_start_pfn, zone_end_pfn);
+				end_pfn = clamp(memblock_region_memory_end_pfn(r),
+						zone_start_pfn, zone_end_pfn);
+
+				if (zone_type == ZONE_MOVABLE &&
+				    memblock_is_mirror(r))
+					nr_absent += end_pfn - start_pfn;
+
+				if (zone_type == ZONE_NORMAL &&
+				    !memblock_is_mirror(r))
+					nr_absent += end_pfn - start_pfn;
+			}
+		} else {
+			if (zone_type == ZONE_NORMAL)
+				nr_absent += node_end_pfn - zone_movable_pfn[nid];
+		}
+	}
+
+	return nr_absent;
 }
 
 #else /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
@@ -5507,6 +5569,36 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 	}
 
 	/*
+	 * If kernelcore=mirror is specified, ignore movablecore option
+	 */
+	if (mirrored_kernelcore) {
+		bool mem_below_4gb_not_mirrored = false;
+
+		for_each_memblock(memory, r) {
+			if (memblock_is_mirror(r))
+				continue;
+
+			nid = r->nid;
+
+			usable_startpfn = memblock_region_memory_base_pfn(r);
+
+			if (usable_startpfn < 0x100000) {
+				mem_below_4gb_not_mirrored = true;
+				continue;
+			}
+
+			zone_movable_pfn[nid] = zone_movable_pfn[nid] ?
+				min(usable_startpfn, zone_movable_pfn[nid]) :
+				usable_startpfn;
+		}
+
+		if (mem_below_4gb_not_mirrored)
+			pr_warn("This configuration results in unmirrored kernel memory.");
+
+		goto out2;
+	}
+
+	/*
 	 * If movablecore=nn[KMG] was specified, calculate what size of
 	 * kernelcore that corresponds so that memory usable for
 	 * any allocation type is evenly spread. If both kernelcore
@@ -5766,6 +5858,12 @@ static int __init cmdline_parse_core(char *p, unsigned long *core)
  */
 static int __init cmdline_parse_kernelcore(char *p)
 {
+	/* parse kernelcore=mirror */
+	if (parse_option_str(p, "mirror")) {
+		mirrored_kernelcore = true;
+		return 0;
+	}
+
 	return cmdline_parse_core(p, &required_kernelcore);
 }
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 2/2] mm: Introduce kernelcore=mirror option
  2015-12-09  3:19 ` [PATCH v3 2/2] mm: Introduce kernelcore=mirror option Taku Izumi
@ 2015-12-09  3:28   ` Xishi Qiu
  2015-12-09 21:59     ` Luck, Tony
  2015-12-28 22:21   ` Andrew Morton
  2015-12-28 22:32   ` Andrew Morton
  2 siblings, 1 reply; 20+ messages in thread
From: Xishi Qiu @ 2015-12-09  3:28 UTC (permalink / raw)
  To: Taku Izumi
  Cc: linux-kernel, linux-mm, akpm, tony.luck, kamezawa.hiroyu, mel,
	dave.hansen, matt

On 2015/12/9 11:19, Taku Izumi wrote:

> This patch extends existing "kernelcore" option and
> introduces kernelcore=mirror option. By specifying
> "mirror" instead of specifying the amount of memory,
> non-mirrored (non-reliable) region will be arranged
> into ZONE_MOVABLE.
> 
> v1 -> v2:
>  - Refine so that the following case also can be
>    handled properly:
> 
>  Node X:  |MMMMMM------MMMMMM--------|
>    (legend) M: mirrored  -: not mirrrored
> 
>  In this case, ZONE_NORMAL and ZONE_MOVABLE are
>  arranged like bellow:
> 
>  Node X:  |MMMMMM------MMMMMM--------|
>           |ooooooxxxxxxooooooxxxxxxxx| ZONE_NORMAL
>                 |ooooooxxxxxxoooooooo| ZONE_MOVABLE
>    (legend) o: present  x: absent
> 
> v2 -> v3:
>  - change the option name from kernelcore=reliable
>    into kernelcore=mirror
>  - documentation fix so that users can understand
>    nn[KMS] and mirror are exclusive
> 
> Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
> ---
>  Documentation/kernel-parameters.txt |  11 +++-
>  mm/page_alloc.c                     | 110 ++++++++++++++++++++++++++++++++++--
>  2 files changed, 114 insertions(+), 7 deletions(-)
> 
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index f8aae63..b0ffc76 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -1695,7 +1695,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>  
>  	keepinitrd	[HW,ARM]
>  
> -	kernelcore=nn[KMG]	[KNL,X86,IA-64,PPC] This parameter
> +	kernelcore=	Format: nn[KMG] | "mirror"
> +			[KNL,X86,IA-64,PPC] This parameter
>  			specifies the amount of memory usable by the kernel
>  			for non-movable allocations.  The requested amount is
>  			spread evenly throughout all nodes in the system. The
> @@ -1711,6 +1712,14 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>  			use the HighMem zone if it exists, and the Normal
>  			zone if it does not.
>  
> +			Instead of specifying the amount of memory (nn[KMS]),
> +			you can specify "mirror" option. In case "mirror"
> +			option is specified, mirrored (reliable) memory is used
> +			for non-movable allocations and remaining memory is used
> +			for Movable pages. nn[KMS] and "mirror" are exclusive,
> +			so you can NOT specify nn[KMG] and "mirror" at the same
> +			time.
> +

Hi Taku,

How about add some comment, if mirrored memroy is too small, then the
normal zone is small, so it may be oom.
The mirrored memory is at least 1/64 of whole memory, because struct
pages usually take 64 bytes per page.

Thanks,
Xishi Qiu

>  	kgdbdbgp=	[KGDB,HW] kgdb over EHCI usb debug port.
>  			Format: <Controller#>[,poll interval]
>  			The controller # is the number of the ehci usb debug
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index acb0b4e..4157476 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -251,6 +251,7 @@ static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
>  static unsigned long __initdata required_kernelcore;
>  static unsigned long __initdata required_movablecore;
>  static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
> +static bool mirrored_kernelcore;
>  
>  /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
>  int movable_zone;
> @@ -4472,6 +4473,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>  	unsigned long pfn;
>  	struct zone *z;
>  	unsigned long nr_initialised = 0;
> +	struct memblock_region *r = NULL, *tmp;
>  
>  	if (highest_memmap_pfn < end_pfn - 1)
>  		highest_memmap_pfn = end_pfn - 1;
> @@ -4491,6 +4493,38 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>  			if (!update_defer_init(pgdat, pfn, end_pfn,
>  						&nr_initialised))
>  				break;
> +
> +			/*
> +			 * if not mirrored_kernelcore and ZONE_MOVABLE exists,
> +			 * range from zone_movable_pfn[nid] to end of each node
> +			 * should be ZONE_MOVABLE not ZONE_NORMAL. skip it.
> +			 */
> +			if (!mirrored_kernelcore && zone_movable_pfn[nid])
> +				if (zone == ZONE_NORMAL &&
> +				    pfn >= zone_movable_pfn[nid])
> +					continue;
> +
> +			/*
> +			 * check given memblock attribute by firmware which
> +			 * can affect kernel memory layout.
> +			 * if zone==ZONE_MOVABLE but memory is mirrored,
> +			 * it's an overlapped memmap init. skip it.
> +			 */
> +			if (mirrored_kernelcore && zone == ZONE_MOVABLE) {
> +				if (!r ||
> +				    pfn >= memblock_region_memory_end_pfn(r)) {
> +					for_each_memblock(memory, tmp)
> +						if (pfn < memblock_region_memory_end_pfn(tmp))
> +							break;
> +					r = tmp;
> +				}
> +				if (pfn >= memblock_region_memory_base_pfn(r) &&
> +				    memblock_is_mirror(r)) {
> +					/* already initialized as NORMAL */
> +					pfn = memblock_region_memory_end_pfn(r);
> +					continue;
> +				}
> +			}
>  		}
>  
>  		/*
> @@ -4909,11 +4943,6 @@ static void __meminit adjust_zone_range_for_zone_movable(int nid,
>  			*zone_end_pfn = min(node_end_pfn,
>  				arch_zone_highest_possible_pfn[movable_zone]);
>  
> -		/* Adjust for ZONE_MOVABLE starting within this range */
> -		} else if (*zone_start_pfn < zone_movable_pfn[nid] &&
> -				*zone_end_pfn > zone_movable_pfn[nid]) {
> -			*zone_end_pfn = zone_movable_pfn[nid];
> -
>  		/* Check if this whole range is within ZONE_MOVABLE */
>  		} else if (*zone_start_pfn >= zone_movable_pfn[nid])
>  			*zone_start_pfn = *zone_end_pfn;
> @@ -4998,6 +5027,7 @@ static unsigned long __meminit zone_absent_pages_in_node(int nid,
>  	unsigned long zone_low = arch_zone_lowest_possible_pfn[zone_type];
>  	unsigned long zone_high = arch_zone_highest_possible_pfn[zone_type];
>  	unsigned long zone_start_pfn, zone_end_pfn;
> +	unsigned long nr_absent;
>  
>  	/* When hotadd a new node from cpu_up(), the node should be empty */
>  	if (!node_start_pfn && !node_end_pfn)
> @@ -5009,7 +5039,39 @@ static unsigned long __meminit zone_absent_pages_in_node(int nid,
>  	adjust_zone_range_for_zone_movable(nid, zone_type,
>  			node_start_pfn, node_end_pfn,
>  			&zone_start_pfn, &zone_end_pfn);
> -	return __absent_pages_in_range(nid, zone_start_pfn, zone_end_pfn);
> +	nr_absent = __absent_pages_in_range(nid, zone_start_pfn, zone_end_pfn);
> +
> +	/*
> +	 * ZONE_MOVABLE handling.
> +	 * Treat pages to be ZONE_MOVABLE in ZONE_NORMAL as absent pages
> +	 * and vice versa.
> +	 */
> +	if (zone_movable_pfn[nid]) {
> +		if (mirrored_kernelcore) {
> +			unsigned long start_pfn, end_pfn;
> +			struct memblock_region *r;
> +
> +			for_each_memblock(memory, r) {
> +				start_pfn = clamp(memblock_region_memory_base_pfn(r),
> +						  zone_start_pfn, zone_end_pfn);
> +				end_pfn = clamp(memblock_region_memory_end_pfn(r),
> +						zone_start_pfn, zone_end_pfn);
> +
> +				if (zone_type == ZONE_MOVABLE &&
> +				    memblock_is_mirror(r))
> +					nr_absent += end_pfn - start_pfn;
> +
> +				if (zone_type == ZONE_NORMAL &&
> +				    !memblock_is_mirror(r))
> +					nr_absent += end_pfn - start_pfn;
> +			}
> +		} else {
> +			if (zone_type == ZONE_NORMAL)
> +				nr_absent += node_end_pfn - zone_movable_pfn[nid];
> +		}
> +	}
> +
> +	return nr_absent;
>  }
>  
>  #else /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
> @@ -5507,6 +5569,36 @@ static void __init find_zone_movable_pfns_for_nodes(void)
>  	}
>  
>  	/*
> +	 * If kernelcore=mirror is specified, ignore movablecore option
> +	 */
> +	if (mirrored_kernelcore) {
> +		bool mem_below_4gb_not_mirrored = false;
> +
> +		for_each_memblock(memory, r) {
> +			if (memblock_is_mirror(r))
> +				continue;
> +
> +			nid = r->nid;
> +
> +			usable_startpfn = memblock_region_memory_base_pfn(r);
> +
> +			if (usable_startpfn < 0x100000) {
> +				mem_below_4gb_not_mirrored = true;
> +				continue;
> +			}
> +
> +			zone_movable_pfn[nid] = zone_movable_pfn[nid] ?
> +				min(usable_startpfn, zone_movable_pfn[nid]) :
> +				usable_startpfn;
> +		}
> +
> +		if (mem_below_4gb_not_mirrored)
> +			pr_warn("This configuration results in unmirrored kernel memory.");
> +
> +		goto out2;
> +	}
> +
> +	/*
>  	 * If movablecore=nn[KMG] was specified, calculate what size of
>  	 * kernelcore that corresponds so that memory usable for
>  	 * any allocation type is evenly spread. If both kernelcore
> @@ -5766,6 +5858,12 @@ static int __init cmdline_parse_core(char *p, unsigned long *core)
>   */
>  static int __init cmdline_parse_kernelcore(char *p)
>  {
> +	/* parse kernelcore=mirror */
> +	if (parse_option_str(p, "mirror")) {
> +		mirrored_kernelcore = true;
> +		return 0;
> +	}
> +
>  	return cmdline_parse_core(p, &required_kernelcore);
>  }
>  




^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v3 2/2] mm: Introduce kernelcore=mirror option
  2015-12-09  3:28   ` Xishi Qiu
@ 2015-12-09 21:59     ` Luck, Tony
  2015-12-10  1:14       ` Xishi Qiu
  0 siblings, 1 reply; 20+ messages in thread
From: Luck, Tony @ 2015-12-09 21:59 UTC (permalink / raw)
  To: Xishi Qiu, Taku Izumi
  Cc: linux-kernel, linux-mm, akpm, kamezawa.hiroyu, mel, Hansen, Dave, matt

> How about add some comment, if mirrored memroy is too small, then the
> normal zone is small, so it may be oom.
> The mirrored memory is at least 1/64 of whole memory, because struct
> pages usually take 64 bytes per page.

1/64th is the absolute lower bound (for the page structures as you say). I
expect people will need to configure 10% or more to run any real workloads.

I made the memblock boot time allocator fall back to non-mirrored memory
if mirrored memory ran out.  What happens in the run time allocator if the
non-movable zones run out of pages? Will we allocate kernel pages from movable
memory?

-Tony

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 2/2] mm: Introduce kernelcore=mirror option
  2015-12-09 21:59     ` Luck, Tony
@ 2015-12-10  1:14       ` Xishi Qiu
  2015-12-10  5:37         ` Izumi, Taku
  0 siblings, 1 reply; 20+ messages in thread
From: Xishi Qiu @ 2015-12-10  1:14 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Taku Izumi, linux-kernel, linux-mm, akpm, kamezawa.hiroyu, mel,
	Hansen, Dave, matt

On 2015/12/10 5:59, Luck, Tony wrote:

>> How about add some comment, if mirrored memroy is too small, then the
>> normal zone is small, so it may be oom.
>> The mirrored memory is at least 1/64 of whole memory, because struct
>> pages usually take 64 bytes per page.
> 
> 1/64th is the absolute lower bound (for the page structures as you say). I
> expect people will need to configure 10% or more to run any real workloads.
> 
> I made the memblock boot time allocator fall back to non-mirrored memory
> if mirrored memory ran out.  What happens in the run time allocator if the
> non-movable zones run out of pages? Will we allocate kernel pages from movable
> memory?
> 

As I know, the kernel pages will not allocated from movable zone.

Thanks,
Xishi Qiu

> -Tony
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
> .
> 




^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v3 2/2] mm: Introduce kernelcore=mirror option
  2015-12-10  1:14       ` Xishi Qiu
@ 2015-12-10  5:37         ` Izumi, Taku
  2015-12-10  6:13           ` Xishi Qiu
  0 siblings, 1 reply; 20+ messages in thread
From: Izumi, Taku @ 2015-12-10  5:37 UTC (permalink / raw)
  To: Xishi Qiu, Luck, Tony
  Cc: linux-kernel, linux-mm, akpm, Kamezawa, Hiroyuki, mel, Hansen,
	Dave, matt

Dear Tony, Xishi,

> >> How about add some comment, if mirrored memroy is too small, then the
> >> normal zone is small, so it may be oom.
> >> The mirrored memory is at least 1/64 of whole memory, because struct
> >> pages usually take 64 bytes per page.
> >
> > 1/64th is the absolute lower bound (for the page structures as you say). I
> > expect people will need to configure 10% or more to run any real workloads.

> >
> > I made the memblock boot time allocator fall back to non-mirrored memory
> > if mirrored memory ran out.  What happens in the run time allocator if the
> > non-movable zones run out of pages? Will we allocate kernel pages from movable
> > memory?
> >
> 
> As I know, the kernel pages will not allocated from movable zone.

 Yes, kernel pages are not allocated from ZONE_MOVABLE.

 In this case administrator must review and reconfigure the mirror ratio via 
 "MirrorRequest" EFI variable.
 
  Sincerely,
  Taku Izumi

>
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> >
> > .
> >
> 
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 2/2] mm: Introduce kernelcore=mirror option
  2015-12-10  5:37         ` Izumi, Taku
@ 2015-12-10  6:13           ` Xishi Qiu
  2015-12-11  5:53             ` Izumi, Taku
  0 siblings, 1 reply; 20+ messages in thread
From: Xishi Qiu @ 2015-12-10  6:13 UTC (permalink / raw)
  To: Izumi, Taku
  Cc: Luck, Tony, linux-kernel, linux-mm, akpm, Kamezawa, Hiroyuki,
	mel, Hansen, Dave, matt

On 2015/12/10 13:37, Izumi, Taku wrote:

> Dear Tony, Xishi,
> 
>>>> How about add some comment, if mirrored memroy is too small, then the
>>>> normal zone is small, so it may be oom.
>>>> The mirrored memory is at least 1/64 of whole memory, because struct
>>>> pages usually take 64 bytes per page.
>>>
>>> 1/64th is the absolute lower bound (for the page structures as you say). I
>>> expect people will need to configure 10% or more to run any real workloads.
> 
>>>
>>> I made the memblock boot time allocator fall back to non-mirrored memory
>>> if mirrored memory ran out.  What happens in the run time allocator if the
>>> non-movable zones run out of pages? Will we allocate kernel pages from movable
>>> memory?
>>>
>>
>> As I know, the kernel pages will not allocated from movable zone.
> 
>  Yes, kernel pages are not allocated from ZONE_MOVABLE.
> 
>  In this case administrator must review and reconfigure the mirror ratio via 
>  "MirrorRequest" EFI variable.
>  
>   Sincerely,
>   Taku Izumi
> 

Hi Taku,

Whether it is possible that we rewrite the fallback function in buddy system
when zone_movable and mirrored_kernelcore are both enabled?

It seems something like that we add a new zone but the name is zone_movable,
not zone_mirror. And the prerequisite is that we won't enable these two
features(movable memory and mirrored memory) at the same time. Thus we can
reuse the code of movable zone.

Thanks,
Xishi Qiu

>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>> Please read the FAQ at  http://www.tux.org/lkml/
>>>
>>> .
>>>
>>
>>
> 
> 
> .
> 




^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v3 2/2] mm: Introduce kernelcore=mirror option
  2015-12-10  6:13           ` Xishi Qiu
@ 2015-12-11  5:53             ` Izumi, Taku
  2015-12-11  9:44               ` Xishi Qiu
  0 siblings, 1 reply; 20+ messages in thread
From: Izumi, Taku @ 2015-12-11  5:53 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Luck, Tony, linux-kernel, linux-mm, akpm, Kamezawa, Hiroyuki,
	mel, Hansen, Dave, matt

Dear Xishi,

> Hi Taku,
> 
> Whether it is possible that we rewrite the fallback function in buddy system
> when zone_movable and mirrored_kernelcore are both enabled?

  What does "when zone_movable and mirrored_kernelcore are both enabled?" mean ?
  
  My patchset just provides a new way to create ZONE_MOVABLE.

  Sincerely,
  Taku Izumi
> 
> It seems something like that we add a new zone but the name is zone_movable,
> not zone_mirror. And the prerequisite is that we won't enable these two
> features(movable memory and mirrored memory) at the same time. Thus we can
> reuse the code of movable zone.
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 2/2] mm: Introduce kernelcore=mirror option
  2015-12-11  5:53             ` Izumi, Taku
@ 2015-12-11  9:44               ` Xishi Qiu
  2015-12-17  1:38                 ` Izumi, Taku
  0 siblings, 1 reply; 20+ messages in thread
From: Xishi Qiu @ 2015-12-11  9:44 UTC (permalink / raw)
  To: Izumi, Taku
  Cc: Luck, Tony, linux-kernel, linux-mm, akpm, Kamezawa, Hiroyuki,
	mel, Hansen, Dave, matt

On 2015/12/11 13:53, Izumi, Taku wrote:

> Dear Xishi,
> 
>> Hi Taku,
>>
>> Whether it is possible that we rewrite the fallback function in buddy system
>> when zone_movable and mirrored_kernelcore are both enabled?
> 
>   What does "when zone_movable and mirrored_kernelcore are both enabled?" mean ?
>   
>   My patchset just provides a new way to create ZONE_MOVABLE.
> 

Hi Taku,

I mean when zone_movable is from kernelcore=mirror, not kernelcore=nn[KMG].

Thanks,
Xishi Qiu

>   Sincerely,
>   Taku Izumi
>>
>> It seems something like that we add a new zone but the name is zone_movable,
>> not zone_mirror. And the prerequisite is that we won't enable these two
>> features(movable memory and mirrored memory) at the same time. Thus we can
>> reuse the code of movable zone.
>>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
> .
> 




^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v3 2/2] mm: Introduce kernelcore=mirror option
  2015-12-11  9:44               ` Xishi Qiu
@ 2015-12-17  1:38                 ` Izumi, Taku
  2015-12-17  2:47                   ` Xishi Qiu
  0 siblings, 1 reply; 20+ messages in thread
From: Izumi, Taku @ 2015-12-17  1:38 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Luck, Tony, linux-kernel, linux-mm, akpm, Kamezawa, Hiroyuki,
	mel, Hansen, Dave, matt

Dear Xishi,

 Sorry for late.

> -----Original Message-----
> From: Xishi Qiu [mailto:qiuxishi@huawei.com]
> Sent: Friday, December 11, 2015 6:44 PM
> To: Izumi, Taku/泉 拓
> Cc: Luck, Tony; linux-kernel@vger.kernel.org; linux-mm@kvack.org; akpm@linux-foundation.org; Kamezawa, Hiroyuki/亀澤 寛
> 之; mel@csn.ul.ie; Hansen, Dave; matt@codeblueprint.co.uk
> Subject: Re: [PATCH v3 2/2] mm: Introduce kernelcore=mirror option
> 
> On 2015/12/11 13:53, Izumi, Taku wrote:
> 
> > Dear Xishi,
> >
> >> Hi Taku,
> >>
> >> Whether it is possible that we rewrite the fallback function in buddy system
> >> when zone_movable and mirrored_kernelcore are both enabled?
> >
> >   What does "when zone_movable and mirrored_kernelcore are both enabled?" mean ?
> >
> >   My patchset just provides a new way to create ZONE_MOVABLE.
> >
> 
> Hi Taku,
> 
> I mean when zone_movable is from kernelcore=mirror, not kernelcore=nn[KMG].

  I'm not quite sure what you are saying, but if you want to screen user memory
  so that one is allocated from mirrored zone and another is from non-mirrored zone,
  I think it is possible to reuse my patchset.

  Sincerely,
  Taku Izumi

> Thanks,
> Xishi Qiu
> 
> >   Sincerely,
> >   Taku Izumi
> >>
> >> It seems something like that we add a new zone but the name is zone_movable,
> >> not zone_mirror. And the prerequisite is that we won't enable these two
> >> features(movable memory and mirrored memory) at the same time. Thus we can
> >> reuse the code of movable zone.
> >>
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> >
> > .
> >
> 
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 2/2] mm: Introduce kernelcore=mirror option
  2015-12-17  1:38                 ` Izumi, Taku
@ 2015-12-17  2:47                   ` Xishi Qiu
  2015-12-17  2:53                     ` Kamezawa Hiroyuki
  0 siblings, 1 reply; 20+ messages in thread
From: Xishi Qiu @ 2015-12-17  2:47 UTC (permalink / raw)
  To: Izumi, Taku
  Cc: Luck, Tony, linux-kernel, linux-mm, akpm, Kamezawa, Hiroyuki,
	mel, Hansen, Dave, matt

On 2015/12/17 9:38, Izumi, Taku wrote:

> Dear Xishi,
> 
>  Sorry for late.
> 
>> -----Original Message-----
>> From: Xishi Qiu [mailto:qiuxishi@huawei.com]
>> Sent: Friday, December 11, 2015 6:44 PM
>> To: Izumi, Taku/泉 拓
>> Cc: Luck, Tony; linux-kernel@vger.kernel.org; linux-mm@kvack.org; akpm@linux-foundation.org; Kamezawa, Hiroyuki/亀澤 寛
>> 之; mel@csn.ul.ie; Hansen, Dave; matt@codeblueprint.co.uk
>> Subject: Re: [PATCH v3 2/2] mm: Introduce kernelcore=mirror option
>>
>> On 2015/12/11 13:53, Izumi, Taku wrote:
>>
>>> Dear Xishi,
>>>
>>>> Hi Taku,
>>>>
>>>> Whether it is possible that we rewrite the fallback function in buddy system
>>>> when zone_movable and mirrored_kernelcore are both enabled?
>>>
>>>   What does "when zone_movable and mirrored_kernelcore are both enabled?" mean ?
>>>
>>>   My patchset just provides a new way to create ZONE_MOVABLE.
>>>
>>
>> Hi Taku,
>>

Hi Taku,

We can NOT specify kernelcore= "nn[KMG]" and "mirror" at the same time.
So when we use "mirror", in fact, the movable zone is a new zone. I think it is
more appropriate with this name "mirrored zone", and also we can rewrite the
fallback function in buddy system in this case.

Thanks,
Xishi Qiu

>> I mean when zone_movable is from kernelcore=mirror, not kernelcore=nn[KMG].
> 
>   I'm not quite sure what you are saying, but if you want to screen user memory
>   so that one is allocated from mirrored zone and another is from non-mirrored zone,
>   I think it is possible to reuse my patchset.
> 
>   Sincerely,
>   Taku Izumi
> 
>> Thanks,
>> Xishi Qiu
>>
>>>   Sincerely,
>>>   Taku Izumi
>>>>
>>>> It seems something like that we add a new zone but the name is zone_movable,
>>>> not zone_mirror. And the prerequisite is that we won't enable these two
>>>> features(movable memory and mirrored memory) at the same time. Thus we can
>>>> reuse the code of movable zone.
>>>>
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>> Please read the FAQ at  http://www.tux.org/lkml/
>>>
>>> .
>>>
>>
>>
> 
> 
> .
> 




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 2/2] mm: Introduce kernelcore=mirror option
  2015-12-17  2:47                   ` Xishi Qiu
@ 2015-12-17  2:53                     ` Kamezawa Hiroyuki
  2015-12-17  4:48                       ` Xishi Qiu
  0 siblings, 1 reply; 20+ messages in thread
From: Kamezawa Hiroyuki @ 2015-12-17  2:53 UTC (permalink / raw)
  To: Xishi Qiu, Izumi, Taku
  Cc: Luck, Tony, linux-kernel, linux-mm, akpm, mel, Hansen, Dave, matt

On 2015/12/17 11:47, Xishi Qiu wrote:
> On 2015/12/17 9:38, Izumi, Taku wrote:
> 
>> Dear Xishi,
>>
>>   Sorry for late.
>>
>>> -----Original Message-----
>>> From: Xishi Qiu [mailto:qiuxishi@huawei.com]
>>> Sent: Friday, December 11, 2015 6:44 PM
>>> To: Izumi, Taku/泉 拓
>>> Cc: Luck, Tony; linux-kernel@vger.kernel.org; linux-mm@kvack.org; akpm@linux-foundation.org; Kamezawa, Hiroyuki/亀澤 寛
>>> 之; mel@csn.ul.ie; Hansen, Dave; matt@codeblueprint.co.uk
>>> Subject: Re: [PATCH v3 2/2] mm: Introduce kernelcore=mirror option
>>>
>>> On 2015/12/11 13:53, Izumi, Taku wrote:
>>>
>>>> Dear Xishi,
>>>>
>>>>> Hi Taku,
>>>>>
>>>>> Whether it is possible that we rewrite the fallback function in buddy system
>>>>> when zone_movable and mirrored_kernelcore are both enabled?
>>>>
>>>>    What does "when zone_movable and mirrored_kernelcore are both enabled?" mean ?
>>>>
>>>>    My patchset just provides a new way to create ZONE_MOVABLE.
>>>>
>>>
>>> Hi Taku,
>>>
> 
> Hi Taku,
> 
> We can NOT specify kernelcore= "nn[KMG]" and "mirror" at the same time.
> So when we use "mirror", in fact, the movable zone is a new zone. I think it is
> more appropriate with this name "mirrored zone", and also we can rewrite the
> fallback function in buddy system in this case.

kernelcore ="mirrored zone" ?

BTW, let me confirm.

  ZONE_NORMAL = mirrored
  ZONE_MOVABLE = not mirrored.

so, the new zone is "not-mirrored" zone.

Now, fallback function is

   movable -> normal -> DMA.

As Tony requested, we may need a knob to stop a fallback in "movable->normal", later.

Thanks,
-Kame







^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 2/2] mm: Introduce kernelcore=mirror option
  2015-12-17  2:53                     ` Kamezawa Hiroyuki
@ 2015-12-17  4:48                       ` Xishi Qiu
  2015-12-17  5:01                         ` Kamezawa Hiroyuki
  0 siblings, 1 reply; 20+ messages in thread
From: Xishi Qiu @ 2015-12-17  4:48 UTC (permalink / raw)
  To: Kamezawa Hiroyuki
  Cc: Izumi, Taku, Luck, Tony, linux-kernel, linux-mm, akpm, mel,
	Hansen, Dave, matt

On 2015/12/17 10:53, Kamezawa Hiroyuki wrote:

> On 2015/12/17 11:47, Xishi Qiu wrote:
>> On 2015/12/17 9:38, Izumi, Taku wrote:
>>
>>> Dear Xishi,
>>>
>>>   Sorry for late.
>>>
>>>> -----Original Message-----
>>>> From: Xishi Qiu [mailto:qiuxishi@huawei.com]
>>>> Sent: Friday, December 11, 2015 6:44 PM
>>>> To: Izumi, Taku/泉 拓
>>>> Cc: Luck, Tony; linux-kernel@vger.kernel.org; linux-mm@kvack.org; akpm@linux-foundation.org; Kamezawa, Hiroyuki/亀澤 寛
>>>> 之; mel@csn.ul.ie; Hansen, Dave; matt@codeblueprint.co.uk
>>>> Subject: Re: [PATCH v3 2/2] mm: Introduce kernelcore=mirror option
>>>>
>>>> On 2015/12/11 13:53, Izumi, Taku wrote:
>>>>
>>>>> Dear Xishi,
>>>>>
>>>>>> Hi Taku,
>>>>>>
>>>>>> Whether it is possible that we rewrite the fallback function in buddy system
>>>>>> when zone_movable and mirrored_kernelcore are both enabled?
>>>>>
>>>>>    What does "when zone_movable and mirrored_kernelcore are both enabled?" mean ?
>>>>>
>>>>>    My patchset just provides a new way to create ZONE_MOVABLE.
>>>>>
>>>>
>>>> Hi Taku,
>>>>
>>
>> Hi Taku,
>>
>> We can NOT specify kernelcore= "nn[KMG]" and "mirror" at the same time.
>> So when we use "mirror", in fact, the movable zone is a new zone. I think it is
>> more appropriate with this name "mirrored zone", and also we can rewrite the
>> fallback function in buddy system in this case.
> 
> kernelcore ="mirrored zone" ?

No, it's zone_names[MAX_NR_ZONES]
How about "Movable", -> "Non-mirrored"?

> 
> BTW, let me confirm.
> 
>   ZONE_NORMAL = mirrored
>   ZONE_MOVABLE = not mirrored.
> 

Yes, 

> so, the new zone is "not-mirrored" zone.
> 
> Now, fallback function is
> 
>    movable -> normal -> DMA.
> 
> As Tony requested, we may need a knob to stop a fallback in "movable->normal", later.
> 

If the mirrored memory is small and the other is large,
I think we can both enable "non-mirrored -> normal" and "normal -> non-mirrored".

Thanks,
Xishi Qiu

> Thanks,
> -Kame
> 
> 
> 
> 
> 
> 
> 
> .
> 




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 2/2] mm: Introduce kernelcore=mirror option
  2015-12-17  4:48                       ` Xishi Qiu
@ 2015-12-17  5:01                         ` Kamezawa Hiroyuki
  2015-12-17 18:43                           ` Luck, Tony
  0 siblings, 1 reply; 20+ messages in thread
From: Kamezawa Hiroyuki @ 2015-12-17  5:01 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Izumi, Taku, Luck, Tony, linux-kernel, linux-mm, akpm, mel,
	Hansen, Dave, matt

On 2015/12/17 13:48, Xishi Qiu wrote:
> On 2015/12/17 10:53, Kamezawa Hiroyuki wrote:
> 
>> On 2015/12/17 11:47, Xishi Qiu wrote:
>>> On 2015/12/17 9:38, Izumi, Taku wrote:
>>>
>>>> Dear Xishi,
>>>>
>>>>    Sorry for late.
>>>>
>>>>> -----Original Message-----
>>>>> From: Xishi Qiu [mailto:qiuxishi@huawei.com]
>>>>> Sent: Friday, December 11, 2015 6:44 PM
>>>>> To: Izumi, Taku/泉 拓
>>>>> Cc: Luck, Tony; linux-kernel@vger.kernel.org; linux-mm@kvack.org; akpm@linux-foundation.org; Kamezawa, Hiroyuki/亀澤 寛
>>>>> 之; mel@csn.ul.ie; Hansen, Dave; matt@codeblueprint.co.uk
>>>>> Subject: Re: [PATCH v3 2/2] mm: Introduce kernelcore=mirror option
>>>>>
>>>>> On 2015/12/11 13:53, Izumi, Taku wrote:
>>>>>
>>>>>> Dear Xishi,
>>>>>>
>>>>>>> Hi Taku,
>>>>>>>
>>>>>>> Whether it is possible that we rewrite the fallback function in buddy system
>>>>>>> when zone_movable and mirrored_kernelcore are both enabled?
>>>>>>
>>>>>>     What does "when zone_movable and mirrored_kernelcore are both enabled?" mean ?
>>>>>>
>>>>>>     My patchset just provides a new way to create ZONE_MOVABLE.
>>>>>>
>>>>>
>>>>> Hi Taku,
>>>>>
>>>
>>> Hi Taku,
>>>
>>> We can NOT specify kernelcore= "nn[KMG]" and "mirror" at the same time.
>>> So when we use "mirror", in fact, the movable zone is a new zone. I think it is
>>> more appropriate with this name "mirrored zone", and also we can rewrite the
>>> fallback function in buddy system in this case.
>>
>> kernelcore ="mirrored zone" ?
> 
> No, it's zone_names[MAX_NR_ZONES]
> How about "Movable", -> "Non-mirrored"?
> 
That will break many user apps. I think we don't have enough reason. 

>>
>> BTW, let me confirm.
>>
>>    ZONE_NORMAL = mirrored
>>    ZONE_MOVABLE = not mirrored.
>>
> 
> Yes,
> 
>> so, the new zone is "not-mirrored" zone.
>>
>> Now, fallback function is
>>
>>     movable -> normal -> DMA.
>>
>> As Tony requested, we may need a knob to stop a fallback in "movable->normal", later.
>>
> 
> If the mirrored memory is small and the other is large,
> I think we can both enable "non-mirrored -> normal" and "normal -> non-mirrored".

Size of mirrored memory can be configured by software(EFI var).
So, having both is just overkill and normal->non-mirroed fallback is meaningless considering
what the feature want to guarantee.

Thanks,
-Kame





^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v3 2/2] mm: Introduce kernelcore=mirror option
  2015-12-17  5:01                         ` Kamezawa Hiroyuki
@ 2015-12-17 18:43                           ` Luck, Tony
  2015-12-18  2:12                             ` Kamezawa Hiroyuki
  0 siblings, 1 reply; 20+ messages in thread
From: Luck, Tony @ 2015-12-17 18:43 UTC (permalink / raw)
  To: Kamezawa Hiroyuki, Xishi Qiu
  Cc: Izumi, Taku, linux-kernel, linux-mm, akpm, mel, Hansen, Dave, matt

>>> As Tony requested, we may need a knob to stop a fallback in "movable->normal", later.
>>>
>> 
>> If the mirrored memory is small and the other is large,
>> I think we can both enable "non-mirrored -> normal" and "normal -> non-mirrored".
>
> Size of mirrored memory can be configured by software(EFI var).
> So, having both is just overkill and normal->non-mirroed fallback is meaningless considering
> what the feature want to guarantee.

In the original removable usage we wanted to guarantee that Linux did not allocate any
kernel objects in removable memory - because that would prevent later removal of that
memory.

Mirror case is the same - we don't want to allocate kernel structures in non-mirrored memory
because an uncorrectable error in one of them would crash the system.

But I think some users might like some flexibility here.  If the system doesn't have enough
memory for the kernel (non-movable or mirrored), then it seems odd to end up crashing
the system at the point of memory exhaustion (a likely result ... the kernel can try to reclaim
some pages from SLAB, but that might only return a few pages, if the shortage continues
the system will perform poorly and eventually fail).

The whole point of removable memory or mirrored memory is to provide better availability.

I'd vote for a mode where running out of memory for kernel results in a

   warn_on_once("Ran out of mirrored/non-removable memory for kernel - now allocating from all zones\n")

because I think most people would like the system to stay up rather than worry about some future problem that may never happen.

-Tony




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 2/2] mm: Introduce kernelcore=mirror option
  2015-12-17 18:43                           ` Luck, Tony
@ 2015-12-18  2:12                             ` Kamezawa Hiroyuki
  2015-12-18  6:59                               ` Luck, Tony
  0 siblings, 1 reply; 20+ messages in thread
From: Kamezawa Hiroyuki @ 2015-12-18  2:12 UTC (permalink / raw)
  To: Luck, Tony, Xishi Qiu
  Cc: Izumi, Taku, linux-kernel, linux-mm, akpm, mel, Hansen, Dave, matt

On 2015/12/18 3:43, Luck, Tony wrote:
>>>> As Tony requested, we may need a knob to stop a fallback in "movable->normal", later.
>>>>
>>>
>>> If the mirrored memory is small and the other is large,
>>> I think we can both enable "non-mirrored -> normal" and "normal -> non-mirrored".
>>
>> Size of mirrored memory can be configured by software(EFI var).
>> So, having both is just overkill and normal->non-mirroed fallback is meaningless considering
>> what the feature want to guarantee.
>
> In the original removable usage we wanted to guarantee that Linux did not allocate any
> kernel objects in removable memory - because that would prevent later removal of that
> memory.
>
> Mirror case is the same - we don't want to allocate kernel structures in non-mirrored memory
> because an uncorrectable error in one of them would crash the system.
>
> But I think some users might like some flexibility here.  If the system doesn't have enough
> memory for the kernel (non-movable or mirrored), then it seems odd to end up crashing
> the system at the point of memory exhaustion (a likely result ... the kernel can try to reclaim
> some pages from SLAB, but that might only return a few pages, if the shortage continues
> the system will perform poorly and eventually fail).
>
> The whole point of removable memory or mirrored memory is to provide better availability.
>
> I'd vote for a mode where running out of memory for kernel results in a
>
>     warn_on_once("Ran out of mirrored/non-removable memory for kernel - now allocating from all zones\n")
>
> because I think most people would like the system to stay up rather than worry about some future problem that may never happen.

Hmm...like this ?
       sysctl.vm.fallback_mirror_memory = 0  // never fallback  # default.
       sysctl.vm.fallback_mirror_memory = 1  // the user memory may be allocated from mirrored zone.
       sysctl.vm.fallback_mirror_memory = 2  // usually kernel allocates memory from mirrored zone before OOM.
       sysctl.vm.fallback_mirror_memory = 3  // 1+2

However I believe my customer's choice is always 0, above implementation can be done in a clean way.
(adding a flag to zones (mirrored or not) and controlling fallback zonelist walk.)

BTW, we need this Taku's patch to make a progress. I think other devs should be done in another
development cycle. What does he need to get your Acks ?

Thanks,
-Kame





^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v3 2/2] mm: Introduce kernelcore=mirror option
  2015-12-18  2:12                             ` Kamezawa Hiroyuki
@ 2015-12-18  6:59                               ` Luck, Tony
  0 siblings, 0 replies; 20+ messages in thread
From: Luck, Tony @ 2015-12-18  6:59 UTC (permalink / raw)
  To: Kamezawa Hiroyuki, Xishi Qiu
  Cc: Izumi, Taku, linux-kernel, linux-mm, akpm, mel, Hansen, Dave, matt

>Hmm...like this ?
>       sysctl.vm.fallback_mirror_memory = 0  // never fallback  # default.
>       sysctl.vm.fallback_mirror_memory = 1  // the user memory may be allocated from mirrored zone.
>       sysctl.vm.fallback_mirror_memory = 2  // usually kernel allocates memory from mirrored zone before OOM.
>       sysctl.vm.fallback_mirror_memory = 3  // 1+2

Should option 2 say: // allow kernel to allocate from non-mirror zone to avoid OOM

> However I believe my customer's choice is always 0, above implementation can be done in a clean way.
> (adding a flag to zones (mirrored or not) and controlling fallback zonelist walk.)

Modes allow us to make all of the people happy (I hope).

> BTW, we need this Taku's patch to make a progress. I think other devs should be done in another
> development cycle. What does he need to get your Acks ?

The concept is great.   It's even "Tested-by: Tony Luck <tony.luck@intel.com>".
I need to read the code more carefully before Acked-by.

-Tony

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 2/2] mm: Introduce kernelcore=mirror option
  2015-12-09  3:19 ` [PATCH v3 2/2] mm: Introduce kernelcore=mirror option Taku Izumi
  2015-12-09  3:28   ` Xishi Qiu
@ 2015-12-28 22:21   ` Andrew Morton
  2015-12-28 22:32   ` Andrew Morton
  2 siblings, 0 replies; 20+ messages in thread
From: Andrew Morton @ 2015-12-28 22:21 UTC (permalink / raw)
  To: Taku Izumi
  Cc: linux-kernel, linux-mm, tony.luck, qiuxishi, kamezawa.hiroyu,
	mel, dave.hansen, matt

On Wed,  9 Dec 2015 12:19:37 +0900 Taku Izumi <izumi.taku@jp.fujitsu.com> wrote:

> This patch extends existing "kernelcore" option and
> introduces kernelcore=mirror option. By specifying
> "mirror" instead of specifying the amount of memory,
> non-mirrored (non-reliable) region will be arranged
> into ZONE_MOVABLE.
> 
> v1 -> v2:
>  - Refine so that the following case also can be
>    handled properly:
> 
>  Node X:  |MMMMMM------MMMMMM--------|
>    (legend) M: mirrored  -: not mirrrored
> 
>  In this case, ZONE_NORMAL and ZONE_MOVABLE are
>  arranged like bellow:
> 
>  Node X:  |MMMMMM------MMMMMM--------|
>           |ooooooxxxxxxooooooxxxxxxxx| ZONE_NORMAL
>                 |ooooooxxxxxxoooooooo| ZONE_MOVABLE
>    (legend) o: present  x: absent
> 
> v2 -> v3:
>  - change the option name from kernelcore=reliable
>    into kernelcore=mirror
>  - documentation fix so that users can understand
>    nn[KMS] and mirror are exclusive

My earlier concern with this approach is the assumption that *all* of
the mirrored memory will be used to kernelcore.  The user might want to
use half the machine's mirrored memory for kernelcore and half for
regular memory but cannot do so.

However I think what I'm seeing from the discussion is that in this
case, the user can alter the amount of mirrored memory via EFI to
achieve the same effect.

Is this correct?  If so, should we document this somewhere and provide
our users with instructions on how to make the required EFI changes? 
Or is this all so totally obvious to them that there is no need?


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 2/2] mm: Introduce kernelcore=mirror option
  2015-12-09  3:19 ` [PATCH v3 2/2] mm: Introduce kernelcore=mirror option Taku Izumi
  2015-12-09  3:28   ` Xishi Qiu
  2015-12-28 22:21   ` Andrew Morton
@ 2015-12-28 22:32   ` Andrew Morton
  2 siblings, 0 replies; 20+ messages in thread
From: Andrew Morton @ 2015-12-28 22:32 UTC (permalink / raw)
  To: Taku Izumi
  Cc: linux-kernel, linux-mm, tony.luck, qiuxishi, kamezawa.hiroyu,
	mel, dave.hansen, matt

On Wed,  9 Dec 2015 12:19:37 +0900 Taku Izumi <izumi.taku@jp.fujitsu.com> wrote:

> This patch extends existing "kernelcore" option and
> introduces kernelcore=mirror option. By specifying
> "mirror" instead of specifying the amount of memory,
> non-mirrored (non-reliable) region will be arranged
> into ZONE_MOVABLE.
> 
> v1 -> v2:
>  - Refine so that the following case also can be
>    handled properly:
> 
>  Node X:  |MMMMMM------MMMMMM--------|
>    (legend) M: mirrored  -: not mirrrored
> 
>  In this case, ZONE_NORMAL and ZONE_MOVABLE are
>  arranged like bellow:
> 
>  Node X:  |MMMMMM------MMMMMM--------|
>           |ooooooxxxxxxooooooxxxxxxxx| ZONE_NORMAL
>                 |ooooooxxxxxxoooooooo| ZONE_MOVABLE
>    (legend) o: present  x: absent
> 
> ...
>
>
> @@ -5507,6 +5569,36 @@ static void __init find_zone_movable_pfns_for_nodes(void)
>  	}
>  
>  	/*
> +	 * If kernelcore=mirror is specified, ignore movablecore option
> +	 */
> +	if (mirrored_kernelcore) {
> +		bool mem_below_4gb_not_mirrored = false;
> +
> +		for_each_memblock(memory, r) {
> +			if (memblock_is_mirror(r))
> +				continue;
> +
> +			nid = r->nid;
> +
> +			usable_startpfn = memblock_region_memory_base_pfn(r);
> +
> +			if (usable_startpfn < 0x100000) {
> +				mem_below_4gb_not_mirrored = true;
> +				continue;
> +			}
> +
> +			zone_movable_pfn[nid] = zone_movable_pfn[nid] ?
> +				min(usable_startpfn, zone_movable_pfn[nid]) :
> +				usable_startpfn;
> +		}
> +
> +		if (mem_below_4gb_not_mirrored)
> +			pr_warn("This configuration results in unmirrored kernel memory.");

This looks a bit strange.

What's the code actually doing?  Checking to see that all memory at
physical addresses < 0x100000000 is mirrored, I think?

If so, what's up with that hard-wired 0x100000?  That's not going to
work on PAGE_SIZE>4k machines, and this is generic code.

Also, I don't think the magical 0x100000000 is necessarily going to be
relevant for other architectures (presumably powerpc will be next...)

I guess this is all OK as an "initial implementation for x86_64 only"
for now, and this stuff will be changed later if/when another
architecture comes along.  However it would be nice to make things more
generic from the outset, to provide a guide for how things should be
implemented by other architectures.



Separately, what feedback does the user get about the success of the
kernelcore=mirror comment?  We didn't include an explicit printk which
displays the outcome, so how is the user to find out that it all worked
OK and that the kernel is now using mirrored memory?


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2015-12-28 22:32 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-09  3:18 [PATCH v3 0/2] mm: Introduce kernelcore=mirror option Taku Izumi
2015-12-09  3:19 ` [PATCH v3 1/2] mm: Calculate zone_start_pfn at zone_spanned_pages_in_node() Taku Izumi
2015-12-09  3:19 ` [PATCH v3 2/2] mm: Introduce kernelcore=mirror option Taku Izumi
2015-12-09  3:28   ` Xishi Qiu
2015-12-09 21:59     ` Luck, Tony
2015-12-10  1:14       ` Xishi Qiu
2015-12-10  5:37         ` Izumi, Taku
2015-12-10  6:13           ` Xishi Qiu
2015-12-11  5:53             ` Izumi, Taku
2015-12-11  9:44               ` Xishi Qiu
2015-12-17  1:38                 ` Izumi, Taku
2015-12-17  2:47                   ` Xishi Qiu
2015-12-17  2:53                     ` Kamezawa Hiroyuki
2015-12-17  4:48                       ` Xishi Qiu
2015-12-17  5:01                         ` Kamezawa Hiroyuki
2015-12-17 18:43                           ` Luck, Tony
2015-12-18  2:12                             ` Kamezawa Hiroyuki
2015-12-18  6:59                               ` Luck, Tony
2015-12-28 22:21   ` Andrew Morton
2015-12-28 22:32   ` Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).