linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations
@ 2015-06-04 12:54 Xishi Qiu
  2015-06-04 12:56 ` [RFC PATCH 01/12] mm: add a new config to manage the code Xishi Qiu
                   ` (13 more replies)
  0 siblings, 14 replies; 62+ messages in thread
From: Xishi Qiu @ 2015-06-04 12:54 UTC (permalink / raw)
  To: Andrew Morton, nao.horiguchi, Yinghai Lu, H. Peter Anvin,
	Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo, Luck, Tony
  Cc: Xishi Qiu, Linux MM, LKML

Intel Xeon processor E7 v3 product family-based platforms introduces support
for partial memory mirroring called as 'Address Range Mirroring'. This feature
allows BIOS to specify a subset of total available memory to be mirrored (and
optionally also specify whether to mirror the range 0-4 GB). This capability
allows user to make an appropriate tradeoff between non-mirrored memory range
and mirrored memory range thus optimizing total available memory and still
achieving highly reliable memory range for mission critical workloads and/or
kernel space.

Tony has already send a patchset to supprot this feature at boot time.
https://lkml.org/lkml/2015/5/8/521

This patchset can support the feature after boot time. It introduces mirror_info
to save the mirrored memory range. Then use __GFP_MIRROR to allocate mirrored 
pages. 

I think add a new migratetype is btter and easier than a new zone, so I use
MIGRATE_MIRROR to manage the mirrored pages. However it changed some code in the
core file, please review and comment, thanks.

TBD: 
1) call add_mirror_info() to fill mirrored memory info.
2) add compatibility with memory online/offline.
3) add more interface? others?

Xishi Qiu (12):
  mm: add a new config to manage the code
  mm: introduce mirror_info
  mm: introduce MIGRATE_MIRROR to manage the mirrored pages
  mm: add mirrored pages to buddy system
  mm: introduce a new zone_stat_item NR_FREE_MIRROR_PAGES
  mm: add free mirrored pages info
  mm: introduce __GFP_MIRROR to allocate mirrored pages
  mm: use mirrorable to switch allocate mirrored memory
  mm: enable allocate mirrored memory at boot time
  mm: add the buddy system interface
  mm: add the PCP interface
  mm: let slab/slub/slob use mirrored memory

 arch/x86/mm/numa.c     |   3 ++
 drivers/base/node.c    |  17 ++++---
 fs/proc/meminfo.c      |   6 +++
 include/linux/gfp.h    |   5 +-
 include/linux/mmzone.h |  23 +++++++++
 include/linux/vmstat.h |   2 +
 kernel/sysctl.c        |   9 ++++
 mm/Kconfig             |   8 +++
 mm/page_alloc.c        | 134 ++++++++++++++++++++++++++++++++++++++++++++++---
 mm/slab.c              |   3 +-
 mm/slob.c              |   2 +-
 mm/slub.c              |   2 +-
 mm/vmstat.c            |   4 ++
 13 files changed, 202 insertions(+), 16 deletions(-)

-- 
2.0.0



^ permalink raw reply	[flat|nested] 62+ messages in thread

* [RFC PATCH 01/12] mm: add a new config to manage the code
  2015-06-04 12:54 [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations Xishi Qiu
@ 2015-06-04 12:56 ` Xishi Qiu
  2015-06-08 11:52   ` Leon Romanovsky
  2015-06-09  6:44   ` Kamezawa Hiroyuki
  2015-06-04 12:57 ` [RFC PATCH 02/12] mm: introduce mirror_info Xishi Qiu
                   ` (12 subsequent siblings)
  13 siblings, 2 replies; 62+ messages in thread
From: Xishi Qiu @ 2015-06-04 12:56 UTC (permalink / raw)
  To: Xishi Qiu, Andrew Morton, nao.horiguchi, Yinghai Lu,
	H. Peter Anvin, Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo,
	Luck, Tony
  Cc: Linux MM, LKML

This patch introduces a new config called "CONFIG_ACPI_MIRROR_MEMORY", it is
used to on/off the feature.

Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
---
 mm/Kconfig | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/Kconfig b/mm/Kconfig
index 390214d..4f2a726 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -200,6 +200,14 @@ config MEMORY_HOTREMOVE
 	depends on MEMORY_HOTPLUG && ARCH_ENABLE_MEMORY_HOTREMOVE
 	depends on MIGRATION
 
+config MEMORY_MIRROR
+	bool "Address range mirroring support"
+	depends on X86 && NUMA
+	default y
+	help
+	  This feature depends on hardware and firmware support.
+	  ACPI or EFI records the mirror info.
+
 #
 # If we have space for more page flags then we can enable additional
 # optimizations and functionality.
-- 
2.0.0



^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [RFC PATCH 02/12] mm: introduce mirror_info
  2015-06-04 12:54 [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations Xishi Qiu
  2015-06-04 12:56 ` [RFC PATCH 01/12] mm: add a new config to manage the code Xishi Qiu
@ 2015-06-04 12:57 ` Xishi Qiu
  2015-06-04 16:57   ` Luck, Tony
  2015-06-09  6:48   ` Kamezawa Hiroyuki
  2015-06-04 12:58 ` [RFC PATCH 03/12] mm: introduce MIGRATE_MIRROR to manage the mirrored, pages Xishi Qiu
                   ` (11 subsequent siblings)
  13 siblings, 2 replies; 62+ messages in thread
From: Xishi Qiu @ 2015-06-04 12:57 UTC (permalink / raw)
  To: Xishi Qiu, Andrew Morton, nao.horiguchi, Yinghai Lu,
	H. Peter Anvin, Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo,
	Luck, Tony
  Cc: Linux MM, LKML

This patch introduces a new struct called "mirror_info", it is used to storage
the mirror address range which reported by EFI or ACPI.

TBD: call add_mirror_info() to fill it.

Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
---
 arch/x86/mm/numa.c     |  3 +++
 include/linux/mmzone.h | 15 +++++++++++++++
 mm/page_alloc.c        | 33 +++++++++++++++++++++++++++++++++
 3 files changed, 51 insertions(+)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 4053bb5..781fd68 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -619,6 +619,9 @@ static int __init numa_init(int (*init_func)(void))
 	/* In case that parsing SRAT failed. */
 	WARN_ON(memblock_clear_hotplug(0, ULLONG_MAX));
 	numa_reset_distance();
+#ifdef CONFIG_MEMORY_MIRROR
+	memset(&mirror_info, 0, sizeof(mirror_info));
+#endif
 
 	ret = init_func();
 	if (ret < 0)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 54d74f6..1fae07b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -69,6 +69,21 @@ enum {
 #  define is_migrate_cma(migratetype) false
 #endif
 
+#ifdef CONFIG_MEMORY_MIRROR
+struct numa_mirror_info {
+	int node;
+	unsigned long start;
+	unsigned long size;
+};
+
+struct mirror_info {
+	int count;
+	struct numa_mirror_info info[MAX_NUMNODES];
+};
+
+extern struct mirror_info mirror_info;
+#endif
+
 #define for_each_migratetype_order(order, type) \
 	for (order = 0; order < MAX_ORDER; order++) \
 		for (type = 0; type < MIGRATE_TYPES; type++)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ebffa0e..41a95a7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -210,6 +210,10 @@ static char * const zone_names[MAX_NR_ZONES] = {
 int min_free_kbytes = 1024;
 int user_min_free_kbytes = -1;
 
+#ifdef CONFIG_MEMORY_MIRROR
+struct mirror_info mirror_info;
+#endif
+
 static unsigned long __meminitdata nr_kernel_pages;
 static unsigned long __meminitdata nr_all_pages;
 static unsigned long __meminitdata dma_reserve;
@@ -545,6 +549,31 @@ static inline int page_is_buddy(struct page *page, struct page *buddy,
 	return 0;
 }
 
+#ifdef CONFIG_MEMORY_MIRROR
+static void __init add_mirror_info(int node,
+			unsigned long start, unsigned long size)
+{
+	mirror_info.info[mirror_info.count].node = node;
+	mirror_info.info[mirror_info.count].start = start;
+	mirror_info.info[mirror_info.count].size = size;
+
+	mirror_info.count++;
+}
+
+static void __init print_mirror_info(void)
+{
+	int i;
+
+	printk("Mirror info\n");
+	for (i = 0; i < mirror_info.count; i++)
+		printk("  node %3d: [mem %#010lx-%#010lx]\n",
+			mirror_info.info[i].node,
+			mirror_info.info[i].start,
+			mirror_info.info[i].start +
+				mirror_info.info[i].size - 1);
+}
+#endif
+
 /*
  * Freeing function for a buddy system allocator.
  *
@@ -5438,6 +5467,10 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
 			       (u64)zone_movable_pfn[i] << PAGE_SHIFT);
 	}
 
+#ifdef CONFIG_MEMORY_MIRROR
+	print_mirror_info();
+#endif
+
 	/* Print out the early node map */
 	pr_info("Early memory node ranges\n");
 	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid)
-- 
2.0.0



^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [RFC PATCH 03/12] mm: introduce MIGRATE_MIRROR to manage the mirrored, pages
  2015-06-04 12:54 [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations Xishi Qiu
  2015-06-04 12:56 ` [RFC PATCH 01/12] mm: add a new config to manage the code Xishi Qiu
  2015-06-04 12:57 ` [RFC PATCH 02/12] mm: introduce mirror_info Xishi Qiu
@ 2015-06-04 12:58 ` Xishi Qiu
  2015-06-09  6:54   ` Kamezawa Hiroyuki
  2015-06-04 12:59 ` [RFC PATCH 04/12] mm: add mirrored pages to buddy system Xishi Qiu
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 62+ messages in thread
From: Xishi Qiu @ 2015-06-04 12:58 UTC (permalink / raw)
  To: Xishi Qiu, Andrew Morton, nao.horiguchi, Yinghai Lu,
	H. Peter Anvin, Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo,
	Luck, Tony
  Cc: Linux MM, LKML

This patch introduces a new MIGRATE_TYPES called "MIGRATE_MIRROR", it is used
to storage the mirrored pages list.
When cat /proc/pagetypeinfo, you can see the count of free mirrored blocks.

e.g.
euler-linux:~ # cat /proc/pagetypeinfo
Page block order: 9
Pages per block:  512

Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
Node    0, zone      DMA, type    Unmovable      1      1      0      0      2      1      1      0      1      0      0
Node    0, zone      DMA, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type      Movable      0      0      0      0      0      0      0      0      0      0      3
Node    0, zone      DMA, type       Mirror      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type      Reserve      0      0      0      0      0      0      0      0      0      1      0
Node    0, zone      DMA, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type    Unmovable      0      0      1      0      0      0      0      1      1      1      0
Node    0, zone    DMA32, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type      Movable      1      2      6      6      6      4      5      3      3      2    738
Node    0, zone    DMA32, type       Mirror      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type      Reserve      0      0      0      0      0      0      0      0      0      0      1
Node    0, zone    DMA32, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type    Unmovable      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type      Movable      0      0      1      1      0      0      0      2      1      0   4254
Node    0, zone   Normal, type       Mirror    148    104     63     70     26     11      2      2      1      1    973
Node    0, zone   Normal, type      Reserve      0      0      0      0      0      0      0      0      0      0      1
Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0

Number of blocks type     Unmovable  Reclaimable      Movable       Mirror      Reserve      Isolate
Node 0, zone      DMA            1            0            6            0            1            0
Node 0, zone    DMA32            2            0         1525            0            1            0
Node 0, zone   Normal            0            0         8702         2048            2            0
Page block order: 9
Pages per block:  512

Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
Node    1, zone   Normal, type    Unmovable      0      0      0      0      0      0      0      0      0      0      0
Node    1, zone   Normal, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
Node    1, zone   Normal, type      Movable      2      2      1      1      2      1      2      2      2      3   3996
Node    1, zone   Normal, type       Mirror     68     94     57      6      8      1      0      0      3      1   2003
Node    1, zone   Normal, type      Reserve      0      0      0      0      0      0      0      0      0      0      1
Node    1, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0

Number of blocks type     Unmovable  Reclaimable      Movable       Mirror      Reserve      Isolate
Node 1, zone   Normal            0            0         8190         4096            2            0


Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
---
 include/linux/mmzone.h | 6 ++++++
 mm/page_alloc.c        | 3 +++
 mm/vmstat.c            | 3 +++
 3 files changed, 12 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 1fae07b..b444335 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -39,6 +39,9 @@ enum {
 	MIGRATE_UNMOVABLE,
 	MIGRATE_RECLAIMABLE,
 	MIGRATE_MOVABLE,
+#ifdef CONFIG_MEMORY_MIRROR
+	MIGRATE_MIRROR,
+#endif
 	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
 	MIGRATE_RESERVE = MIGRATE_PCPTYPES,
 #ifdef CONFIG_CMA
@@ -82,6 +85,9 @@ struct mirror_info {
 };
 
 extern struct mirror_info mirror_info;
+#  define is_migrate_mirror(migratetype) unlikely((migratetype) == MIGRATE_MIRROR)
+#else
+#  define is_migrate_mirror(migratetype) false
 #endif
 
 #define for_each_migratetype_order(order, type) \
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 41a95a7..3b2ff46 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3245,6 +3245,9 @@ static void show_migration_types(unsigned char type)
 		[MIGRATE_UNMOVABLE]	= 'U',
 		[MIGRATE_RECLAIMABLE]	= 'E',
 		[MIGRATE_MOVABLE]	= 'M',
+#ifdef CONFIG_MEMORY_MIRROR
+		[MIGRATE_MIRROR]	= 'O',
+#endif
 		[MIGRATE_RESERVE]	= 'R',
 #ifdef CONFIG_CMA
 		[MIGRATE_CMA]		= 'C',
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 4f5cd97..d0323e0 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -901,6 +901,9 @@ static char * const migratetype_names[MIGRATE_TYPES] = {
 	"Unmovable",
 	"Reclaimable",
 	"Movable",
+#ifdef CONFIG_MEMORY_MIRROR
+	"Mirror",
+#endif
 	"Reserve",
 #ifdef CONFIG_CMA
 	"CMA",
-- 
2.0.0



^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [RFC PATCH 04/12] mm: add mirrored pages to buddy system
  2015-06-04 12:54 [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations Xishi Qiu
                   ` (2 preceding siblings ...)
  2015-06-04 12:58 ` [RFC PATCH 03/12] mm: introduce MIGRATE_MIRROR to manage the mirrored, pages Xishi Qiu
@ 2015-06-04 12:59 ` Xishi Qiu
  2015-06-04 13:00 ` [RFC PATCH 05/12] mm: introduce a new zone_stat_item NR_FREE_MIRROR_PAGES Xishi Qiu
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 62+ messages in thread
From: Xishi Qiu @ 2015-06-04 12:59 UTC (permalink / raw)
  To: Xishi Qiu, Andrew Morton, nao.horiguchi, Yinghai Lu,
	H. Peter Anvin, Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo,
	Luck, Tony
  Cc: Linux MM, LKML

Set mirrored pageblock's migratetype to MIGRATE_MIRROR, so they could free to
buddy system's MIGRATE_MIRROR list when free bootmem.

Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
---
 mm/page_alloc.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3b2ff46..8fe0187 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -572,6 +572,25 @@ static void __init print_mirror_info(void)
 			mirror_info.info[i].start +
 				mirror_info.info[i].size - 1);
 }
+
+static inline bool is_mirror_pfn(unsigned long pfn)
+{
+	int i;
+	unsigned long addr = pfn << PAGE_SHIFT;
+
+	/* 0-4G is always mirrored, so ignore it */
+	if (addr < (4UL << 30))
+		return false;
+
+	for (i = 0; i < mirror_info.count; i++) {
+		if (addr >= mirror_info.info[i].start &&
+		    addr < mirror_info.info[i].start +
+			   mirror_info.info[i].size)
+			return true;
+	}
+
+	return false;
+}
 #endif
 
 /*
@@ -4147,6 +4166,9 @@ static void setup_zone_migrate_reserve(struct zone *zone)
 
 		block_migratetype = get_pageblock_migratetype(page);
 
+		if (is_migrate_mirror(block_migratetype))
+			continue;
+
 		/* Only test what is necessary when the reserves are not met */
 		if (reserve > 0) {
 			/*
@@ -4246,6 +4268,11 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 		    && !(pfn & (pageblock_nr_pages - 1)))
 			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
 
+#ifdef CONFIG_MEMORY_MIRROR
+		if (is_mirror_pfn(pfn))
+			set_pageblock_migratetype(page, MIGRATE_MIRROR);
+#endif
+
 		INIT_LIST_HEAD(&page->lru);
 #ifdef WANT_PAGE_VIRTUAL
 		/* The shift won't overflow because ZONE_NORMAL is below 4G. */
-- 
2.0.0



^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [RFC PATCH 05/12] mm: introduce a new zone_stat_item NR_FREE_MIRROR_PAGES
  2015-06-04 12:54 [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations Xishi Qiu
                   ` (3 preceding siblings ...)
  2015-06-04 12:59 ` [RFC PATCH 04/12] mm: add mirrored pages to buddy system Xishi Qiu
@ 2015-06-04 13:00 ` Xishi Qiu
  2015-06-04 13:01 ` [RFC PATCH 06/12] mm: add free mirrored pages info Xishi Qiu
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 62+ messages in thread
From: Xishi Qiu @ 2015-06-04 13:00 UTC (permalink / raw)
  To: Xishi Qiu, Andrew Morton, nao.horiguchi, Yinghai Lu,
	H. Peter Anvin, Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo,
	Luck, Tony
  Cc: Linux MM, LKML

This patch introduces a new zone_stat_item called "NR_FREE_MIRROR_PAGES", it is
used to storage free mirrored pages count.

Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
---
 include/linux/mmzone.h | 1 +
 include/linux/vmstat.h | 2 ++
 mm/vmstat.c            | 1 +
 3 files changed, 4 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index b444335..f82e3ae 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -178,6 +178,7 @@ enum zone_stat_item {
 	WORKINGSET_NODERECLAIM,
 	NR_ANON_TRANSPARENT_HUGEPAGES,
 	NR_FREE_CMA_PAGES,
+	NR_FREE_MIRROR_PAGES,
 	NR_VM_ZONE_STAT_ITEMS };
 
 /*
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 82e7db7..d0a7268 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -283,6 +283,8 @@ static inline void __mod_zone_freepage_state(struct zone *zone, int nr_pages,
 	__mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages);
 	if (is_migrate_cma(migratetype))
 		__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, nr_pages);
+	if (is_migrate_mirror(migratetype))
+		__mod_zone_page_state(zone, NR_FREE_MIRROR_PAGES, nr_pages);
 }
 
 extern const char * const vmstat_text[];
diff --git a/mm/vmstat.c b/mm/vmstat.c
index d0323e0..7ee11ca 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -739,6 +739,7 @@ const char * const vmstat_text[] = {
 	"workingset_nodereclaim",
 	"nr_anon_transparent_hugepages",
 	"nr_free_cma",
+	"nr_free_mirror",
 
 	/* enum writeback_stat_item counters */
 	"nr_dirty_threshold",
-- 
2.0.0



^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [RFC PATCH 06/12] mm: add free mirrored pages info
  2015-06-04 12:54 [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations Xishi Qiu
                   ` (4 preceding siblings ...)
  2015-06-04 13:00 ` [RFC PATCH 05/12] mm: introduce a new zone_stat_item NR_FREE_MIRROR_PAGES Xishi Qiu
@ 2015-06-04 13:01 ` Xishi Qiu
  2015-06-04 13:02 ` [RFC PATCH 07/12] mm: introduce __GFP_MIRROR to allocate mirrored pages Xishi Qiu
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 62+ messages in thread
From: Xishi Qiu @ 2015-06-04 13:01 UTC (permalink / raw)
  To: Xishi Qiu, Andrew Morton, nao.horiguchi, Yinghai Lu,
	H. Peter Anvin, Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo,
	Luck, Tony
  Cc: Linux MM, LKML

Add the count of free mirrored pages in the following paths:
/proc/meminfo
/proc/zoneinfo
/sys/devices/system/node/node XX/meminfo
/sys/devices/system/node/node XX/vmstat

Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
---
 drivers/base/node.c | 17 +++++++++++------
 fs/proc/meminfo.c   |  6 ++++++
 mm/page_alloc.c     |  7 +++++--
 3 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index a2aa65b..d1a3556 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -114,6 +114,9 @@ static ssize_t node_read_meminfo(struct device *dev,
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 		       "Node %d AnonHugePages:  %8lu kB\n"
 #endif
+#ifdef CONFIG_MEMORY_MIRROR
+		       "Node %d MirrorFree:     %8lu kB\n"
+#endif
 			,
 		       nid, K(node_page_state(nid, NR_FILE_DIRTY)),
 		       nid, K(node_page_state(nid, NR_WRITEBACK)),
@@ -130,14 +133,16 @@ static ssize_t node_read_meminfo(struct device *dev,
 		       nid, K(node_page_state(nid, NR_SLAB_RECLAIMABLE) +
 				node_page_state(nid, NR_SLAB_UNRECLAIMABLE)),
 		       nid, K(node_page_state(nid, NR_SLAB_RECLAIMABLE)),
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
 		       nid, K(node_page_state(nid, NR_SLAB_UNRECLAIMABLE))
-			, nid,
-			K(node_page_state(nid, NR_ANON_TRANSPARENT_HUGEPAGES) *
-			HPAGE_PMD_NR));
-#else
-		       nid, K(node_page_state(nid, NR_SLAB_UNRECLAIMABLE)));
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+		     , nid, K(node_page_state(nid, NR_ANON_TRANSPARENT_HUGEPAGES) *
+				HPAGE_PMD_NR)
+#endif
+#ifdef CONFIG_MEMORY_MIRROR
+		     , nid, K(node_page_state(nid, NR_FREE_MIRROR_PAGES))
 #endif
+			);
+
 	n += hugetlb_report_node_meminfo(nid, buf + n);
 	return n;
 }
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index d3ebf2e..d1ebb20 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -145,6 +145,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 		"CmaTotal:       %8lu kB\n"
 		"CmaFree:        %8lu kB\n"
 #endif
+#ifdef CONFIG_MEMORY_MIRROR
+		"MirrorFree:     %8lu kB\n"
+#endif
 		,
 		K(i.totalram),
 		K(i.freeram),
@@ -204,6 +207,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 		, K(totalcma_pages)
 		, K(global_page_state(NR_FREE_CMA_PAGES))
 #endif
+#ifdef CONFIG_MEMORY_MIRROR
+		, K(global_page_state(NR_FREE_MIRROR_PAGES))
+#endif
 		);
 
 	hugetlb_report_meminfo(m);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8fe0187..249a8f6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3316,7 +3316,7 @@ void show_free_areas(unsigned int filter)
 		" unevictable:%lu dirty:%lu writeback:%lu unstable:%lu\n"
 		" slab_reclaimable:%lu slab_unreclaimable:%lu\n"
 		" mapped:%lu shmem:%lu pagetables:%lu bounce:%lu\n"
-		" free:%lu free_pcp:%lu free_cma:%lu\n",
+		" free:%lu free_pcp:%lu free_cma:%lu free_mirror:%lu\n",
 		global_page_state(NR_ACTIVE_ANON),
 		global_page_state(NR_INACTIVE_ANON),
 		global_page_state(NR_ISOLATED_ANON),
@@ -3335,7 +3335,8 @@ void show_free_areas(unsigned int filter)
 		global_page_state(NR_BOUNCE),
 		global_page_state(NR_FREE_PAGES),
 		free_pcp,
-		global_page_state(NR_FREE_CMA_PAGES));
+		global_page_state(NR_FREE_CMA_PAGES),
+		global_page_state(NR_FREE_MIRROR_PAGES));
 
 	for_each_populated_zone(zone) {
 		int i;
@@ -3376,6 +3377,7 @@ void show_free_areas(unsigned int filter)
 			" free_pcp:%lukB"
 			" local_pcp:%ukB"
 			" free_cma:%lukB"
+			" free_mirror:%lukB"
 			" writeback_tmp:%lukB"
 			" pages_scanned:%lu"
 			" all_unreclaimable? %s"
@@ -3409,6 +3411,7 @@ void show_free_areas(unsigned int filter)
 			K(free_pcp),
 			K(this_cpu_read(zone->pageset->pcp.count)),
 			K(zone_page_state(zone, NR_FREE_CMA_PAGES)),
+			K(zone_page_state(zone, NR_FREE_MIRROR_PAGES)),
 			K(zone_page_state(zone, NR_WRITEBACK_TEMP)),
 			K(zone_page_state(zone, NR_PAGES_SCANNED)),
 			(!zone_reclaimable(zone) ? "yes" : "no")
-- 
2.0.0



^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [RFC PATCH 07/12] mm: introduce __GFP_MIRROR to allocate mirrored pages
  2015-06-04 12:54 [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations Xishi Qiu
                   ` (5 preceding siblings ...)
  2015-06-04 13:01 ` [RFC PATCH 06/12] mm: add free mirrored pages info Xishi Qiu
@ 2015-06-04 13:02 ` Xishi Qiu
  2015-06-09  7:01   ` Kamezawa Hiroyuki
  2015-06-04 13:02 ` [RFC PATCH 08/12] mm: use mirrorable to switch allocate mirrored memory Xishi Qiu
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 62+ messages in thread
From: Xishi Qiu @ 2015-06-04 13:02 UTC (permalink / raw)
  To: Xishi Qiu, Andrew Morton, nao.horiguchi, Yinghai Lu,
	H. Peter Anvin, Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo,
	Luck, Tony
  Cc: Linux MM, LKML

This patch introduces a new gfp flag called "__GFP_MIRROR", it is used to
allocate mirrored pages through buddy system.

Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
---
 include/linux/gfp.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 15928f0..89d0091 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -35,6 +35,7 @@ struct vm_area_struct;
 #define ___GFP_NO_KSWAPD	0x400000u
 #define ___GFP_OTHER_NODE	0x800000u
 #define ___GFP_WRITE		0x1000000u
+#define ___GFP_MIRROR		0x2000000u
 /* If the above are modified, __GFP_BITS_SHIFT may need updating */
 
 /*
@@ -95,13 +96,15 @@ struct vm_area_struct;
 #define __GFP_OTHER_NODE ((__force gfp_t)___GFP_OTHER_NODE) /* On behalf of other node */
 #define __GFP_WRITE	((__force gfp_t)___GFP_WRITE)	/* Allocator intends to dirty page */
 
+#define __GFP_MIRROR	((__force gfp_t)___GFP_MIRROR)	/* Allocate mirrored memory */
+
 /*
  * This may seem redundant, but it's a way of annotating false positives vs.
  * allocations that simply cannot be supported (e.g. page tables).
  */
 #define __GFP_NOTRACK_FALSE_POSITIVE (__GFP_NOTRACK)
 
-#define __GFP_BITS_SHIFT 25	/* Room for N __GFP_FOO bits */
+#define __GFP_BITS_SHIFT 26	/* Room for N __GFP_FOO bits */
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
 
 /* This equals 0, but use constants in case they ever change */
-- 
2.0.0



^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [RFC PATCH 08/12] mm: use mirrorable to switch allocate mirrored memory
  2015-06-04 12:54 [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations Xishi Qiu
                   ` (6 preceding siblings ...)
  2015-06-04 13:02 ` [RFC PATCH 07/12] mm: introduce __GFP_MIRROR to allocate mirrored pages Xishi Qiu
@ 2015-06-04 13:02 ` Xishi Qiu
  2015-06-04 17:01   ` Luck, Tony
                     ` (3 more replies)
  2015-06-04 13:03 ` [RFC PATCH 09/12] mm: enable allocate mirrored memory at boot time Xishi Qiu
                   ` (5 subsequent siblings)
  13 siblings, 4 replies; 62+ messages in thread
From: Xishi Qiu @ 2015-06-04 13:02 UTC (permalink / raw)
  To: Xishi Qiu, Andrew Morton, nao.horiguchi, Yinghai Lu,
	H. Peter Anvin, Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo,
	Luck, Tony
  Cc: Linux MM, LKML

Add a new interface in path /proc/sys/vm/mirrorable. When set to 1, it means
we should allocate mirrored memory for both user and kernel processes.

Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
---
 include/linux/mmzone.h | 1 +
 kernel/sysctl.c        | 9 +++++++++
 mm/page_alloc.c        | 1 +
 3 files changed, 11 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index f82e3ae..20888dd 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -85,6 +85,7 @@ struct mirror_info {
 };
 
 extern struct mirror_info mirror_info;
+extern int sysctl_mirrorable;
 #  define is_migrate_mirror(migratetype) unlikely((migratetype) == MIGRATE_MIRROR)
 #else
 #  define is_migrate_mirror(migratetype) false
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 2082b1a..dc2625e 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1514,6 +1514,15 @@ static struct ctl_table vm_table[] = {
 		.extra2		= &one,
 	},
 #endif
+#ifdef CONFIG_MEMORY_MIRROR
+	{
+		.procname	= "mirrorable",
+		.data		= &sysctl_mirrorable,
+		.maxlen		= sizeof(sysctl_mirrorable),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+	},
+#endif
 	{
 		.procname	= "user_reserve_kbytes",
 		.data		= &sysctl_user_reserve_kbytes,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 249a8f6..63b90ca 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -212,6 +212,7 @@ int user_min_free_kbytes = -1;
 
 #ifdef CONFIG_MEMORY_MIRROR
 struct mirror_info mirror_info;
+int sysctl_mirrorable = 0;
 #endif
 
 static unsigned long __meminitdata nr_kernel_pages;
-- 
2.0.0



^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [RFC PATCH 09/12] mm: enable allocate mirrored memory at boot time
  2015-06-04 12:54 [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations Xishi Qiu
                   ` (7 preceding siblings ...)
  2015-06-04 13:02 ` [RFC PATCH 08/12] mm: use mirrorable to switch allocate mirrored memory Xishi Qiu
@ 2015-06-04 13:03 ` Xishi Qiu
  2015-06-04 13:04 ` [RFC PATCH 10/12] mm: add the buddy system interface Xishi Qiu
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 62+ messages in thread
From: Xishi Qiu @ 2015-06-04 13:03 UTC (permalink / raw)
  To: Xishi Qiu, Andrew Morton, nao.horiguchi, Yinghai Lu,
	H. Peter Anvin, Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo,
	Luck, Tony
  Cc: Linux MM, LKML

Add a boot option called "mirrorable" to allocate mirrored memory at boot time
(after bootmem free).

Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
---
 mm/page_alloc.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 63b90ca..d4d2066 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -213,6 +213,13 @@ int user_min_free_kbytes = -1;
 #ifdef CONFIG_MEMORY_MIRROR
 struct mirror_info mirror_info;
 int sysctl_mirrorable = 0;
+
+static int __init set_mirrorable(char *p)
+{
+	sysctl_mirrorable = 1;
+	return 0;
+}
+early_param("mirrorable", set_mirrorable);
 #endif
 
 static unsigned long __meminitdata nr_kernel_pages;
-- 
2.0.0



^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [RFC PATCH 10/12] mm: add the buddy system interface
  2015-06-04 12:54 [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations Xishi Qiu
                   ` (8 preceding siblings ...)
  2015-06-04 13:03 ` [RFC PATCH 09/12] mm: enable allocate mirrored memory at boot time Xishi Qiu
@ 2015-06-04 13:04 ` Xishi Qiu
  2015-06-04 17:09   ` Luck, Tony
  2015-06-09  7:12   ` Kamezawa Hiroyuki
  2015-06-04 13:04 ` [RFC PATCH 11/12] mm: add the PCP interface Xishi Qiu
                   ` (3 subsequent siblings)
  13 siblings, 2 replies; 62+ messages in thread
From: Xishi Qiu @ 2015-06-04 13:04 UTC (permalink / raw)
  To: Xishi Qiu, Andrew Morton, nao.horiguchi, Yinghai Lu,
	H. Peter Anvin, Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo,
	Luck, Tony
  Cc: Linux MM, LKML

Add the buddy system interface for address range mirroring feature.
Allocate mirrored pages in MIGRATE_MIRROR list. If there is no mirrored pages
left, use other types pages.

Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
---
 mm/page_alloc.c | 40 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d4d2066..0fb55288 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -599,6 +599,26 @@ static inline bool is_mirror_pfn(unsigned long pfn)
 
 	return false;
 }
+
+static inline bool change_to_mirror(gfp_t gfp_flags, int high_zoneidx)
+{
+	/*
+	 * Do not alloc mirrored memory below 4G, because 0-4G is
+	 * all mirrored by default, and the list is always empty.
+	 */
+	if (high_zoneidx < ZONE_NORMAL)
+		return false;
+
+	/* Alloc mirrored memory for only kernel */
+	if (gfp_flags & __GFP_MIRROR)
+		return true;
+
+	/* Alloc mirrored memory for both user and kernel */
+	if (sysctl_mirrorable)
+		return true;
+
+	return false;
+}
 #endif
 
 /*
@@ -1796,7 +1816,10 @@ struct page *buffered_rmqueue(struct zone *preferred_zone,
 			WARN_ON_ONCE(order > 1);
 		}
 		spin_lock_irqsave(&zone->lock, flags);
-		page = __rmqueue(zone, order, migratetype);
+		if (is_migrate_mirror(migratetype))
+			page = __rmqueue_smallest(zone, order, migratetype);
+		else
+			page = __rmqueue(zone, order, migratetype);
 		spin_unlock(&zone->lock);
 		if (!page)
 			goto failed;
@@ -2928,6 +2951,11 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 	if (IS_ENABLED(CONFIG_CMA) && ac.migratetype == MIGRATE_MOVABLE)
 		alloc_flags |= ALLOC_CMA;
 
+#ifdef CONFIG_MEMORY_MIRROR
+	if (change_to_mirror(gfp_mask, ac.high_zoneidx))
+		ac.migratetype = MIGRATE_MIRROR;
+#endif
+
 retry_cpuset:
 	cpuset_mems_cookie = read_mems_allowed_begin();
 
@@ -2943,9 +2971,19 @@ retry_cpuset:
 
 	/* First allocation attempt */
 	alloc_mask = gfp_mask|__GFP_HARDWALL;
+retry:
 	page = get_page_from_freelist(alloc_mask, order, alloc_flags, &ac);
 	if (unlikely(!page)) {
 		/*
+		 * If there is no mirrored memory, we will alloc other
+		 * types memory.
+		 */
+		if (is_migrate_mirror(ac.migratetype)) {
+			ac.migratetype = gfpflags_to_migratetype(gfp_mask);
+			goto retry;
+		}
+
+		/*
 		 * Runtime PM, block IO and its error handling path
 		 * can deadlock because I/O on the device might not
 		 * complete.
-- 
2.0.0



^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [RFC PATCH 11/12] mm: add the PCP interface
  2015-06-04 12:54 [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations Xishi Qiu
                   ` (9 preceding siblings ...)
  2015-06-04 13:04 ` [RFC PATCH 10/12] mm: add the buddy system interface Xishi Qiu
@ 2015-06-04 13:04 ` Xishi Qiu
  2015-06-04 18:44   ` Dave Hansen
  2015-06-04 13:05 ` [RFC PATCH 12/12] mm: let slab/slub/slob use mirrored memory Xishi Qiu
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 62+ messages in thread
From: Xishi Qiu @ 2015-06-04 13:04 UTC (permalink / raw)
  To: Xishi Qiu, Andrew Morton, nao.horiguchi, Yinghai Lu,
	H. Peter Anvin, Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo,
	Luck, Tony
  Cc: Linux MM, LKML

Add the PCP interface for address range mirroring feature.

Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
---
 mm/page_alloc.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0fb55288..cf3b7cb 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1401,11 +1401,16 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 			unsigned long count, struct list_head *list,
 			int migratetype, bool cold)
 {
-	int i;
+	int i, mt;
 
 	spin_lock(&zone->lock);
 	for (i = 0; i < count; ++i) {
-		struct page *page = __rmqueue(zone, order, migratetype);
+		struct page *page;
+
+		if (is_migrate_mirror(migratetype))
+			page = __rmqueue_smallest(zone, order, migratetype);
+		else
+			page = __rmqueue(zone, order, migratetype);
 		if (unlikely(page == NULL))
 			break;
 
@@ -1423,9 +1428,14 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 		else
 			list_add_tail(&page->lru, list);
 		list = &page->lru;
-		if (is_migrate_cma(get_freepage_migratetype(page)))
+
+		mt = get_freepage_migratetype(page);
+		if (is_migrate_cma(mt))
 			__mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
 					      -(1 << order));
+		if (is_migrate_mirror(mt))
+			__mod_zone_page_state(zone, NR_FREE_MIRROR_PAGES,
+					      -(1 << order));
 	}
 	__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
 	spin_unlock(&zone->lock);
-- 
2.0.0



^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [RFC PATCH 12/12] mm: let slab/slub/slob use mirrored memory
  2015-06-04 12:54 [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations Xishi Qiu
                   ` (10 preceding siblings ...)
  2015-06-04 13:04 ` [RFC PATCH 11/12] mm: add the PCP interface Xishi Qiu
@ 2015-06-04 13:05 ` Xishi Qiu
  2015-06-04 17:14   ` Luck, Tony
  2015-06-12  8:42 ` [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations Naoya Horiguchi
  2015-06-16  7:53 ` Vlastimil Babka
  13 siblings, 1 reply; 62+ messages in thread
From: Xishi Qiu @ 2015-06-04 13:05 UTC (permalink / raw)
  To: Xishi Qiu, Andrew Morton, nao.horiguchi, Yinghai Lu,
	H. Peter Anvin, Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo,
	Luck, Tony
  Cc: Linux MM, LKML

Add __GFP_MIRROR flag when allocate a new slab.

Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
---
 mm/slab.c | 3 ++-
 mm/slob.c | 2 +-
 mm/slub.c | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 7eb38dd..3b3ef22 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1594,7 +1594,8 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
 	if (memcg_charge_slab(cachep, flags, cachep->gfporder))
 		return NULL;
 
-	page = alloc_pages_exact_node(nodeid, flags | __GFP_NOTRACK, cachep->gfporder);
+	page = alloc_pages_exact_node(nodeid, flags | __GFP_NOTRACK | __GFP_MIRROR,
+					cachep->gfporder);
 	if (!page) {
 		memcg_uncharge_slab(cachep, cachep->gfporder);
 		slab_out_of_memory(cachep, flags, nodeid);
diff --git a/mm/slob.c b/mm/slob.c
index 4765f65..4ff9bde 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -452,7 +452,7 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
 
 		if (likely(order))
 			gfp |= __GFP_COMP;
-		ret = slob_new_pages(gfp, order, node);
+		ret = slob_new_pages(gfp | __GFP_MIRROR, order, node);
 
 		trace_kmalloc_node(caller, ret,
 				   size, PAGE_SIZE << order, gfp, node);
diff --git a/mm/slub.c b/mm/slub.c
index 54c0876..1219e33 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1315,7 +1315,7 @@ static inline struct page *alloc_slab_page(struct kmem_cache *s,
 	struct page *page;
 	int order = oo_order(oo);
 
-	flags |= __GFP_NOTRACK;
+	flags |= __GFP_NOTRACK | __GFP_MIRROR;
 
 	if (memcg_charge_slab(s, flags, order))
 		return NULL;
-- 
2.0.0



^ permalink raw reply related	[flat|nested] 62+ messages in thread

* RE: [RFC PATCH 02/12] mm: introduce mirror_info
  2015-06-04 12:57 ` [RFC PATCH 02/12] mm: introduce mirror_info Xishi Qiu
@ 2015-06-04 16:57   ` Luck, Tony
  2015-06-05  1:53     ` Xishi Qiu
  2015-06-09  6:48   ` Kamezawa Hiroyuki
  1 sibling, 1 reply; 62+ messages in thread
From: Luck, Tony @ 2015-06-04 16:57 UTC (permalink / raw)
  To: Xishi Qiu, Andrew Morton, nao.horiguchi, Yinghai Lu,
	H. Peter Anvin, Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo
  Cc: Linux MM, LKML

+#ifdef CONFIG_MEMORY_MIRROR
+struct numa_mirror_info {
+	int node;
+	unsigned long start;
+	unsigned long size;
+};
+
+struct mirror_info {
+	int count;
+	struct numa_mirror_info info[MAX_NUMNODES];
+};

Do we really need this?  My patch series leaves all the mirrored memory in
the memblock allocator tagged with the MEMBLOCK_MIRROR flag.  Can't
we use that information when freeing the boot memory into the runtime
free lists?

If we can't ... then [MAX_NUMNODES] may not be enough.  We may have
more than one mirrored range on each node. Current h/w allows two ranges
per node.

-Tony

^ permalink raw reply	[flat|nested] 62+ messages in thread

* RE: [RFC PATCH 08/12] mm: use mirrorable to switch allocate mirrored memory
  2015-06-04 13:02 ` [RFC PATCH 08/12] mm: use mirrorable to switch allocate mirrored memory Xishi Qiu
@ 2015-06-04 17:01   ` Luck, Tony
  2015-06-04 18:41   ` Dave Hansen
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 62+ messages in thread
From: Luck, Tony @ 2015-06-04 17:01 UTC (permalink / raw)
  To: Xishi Qiu, Andrew Morton, nao.horiguchi, Yinghai Lu,
	H. Peter Anvin, Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo
  Cc: Linux MM, LKML

> Add a new interface in path /proc/sys/vm/mirrorable. When set to 1, it means
> we should allocate mirrored memory for both user and kernel processes.

With some "to be defined later" mechanism for how the user requests mirror vs.
not mirror.  Plus some capability/ulimit pieces that restrict who can do this and how
much they can get???

-Tony

^ permalink raw reply	[flat|nested] 62+ messages in thread

* RE: [RFC PATCH 10/12] mm: add the buddy system interface
  2015-06-04 13:04 ` [RFC PATCH 10/12] mm: add the buddy system interface Xishi Qiu
@ 2015-06-04 17:09   ` Luck, Tony
  2015-06-05  3:14     ` Xishi Qiu
  2015-06-09  7:12   ` Kamezawa Hiroyuki
  1 sibling, 1 reply; 62+ messages in thread
From: Luck, Tony @ 2015-06-04 17:09 UTC (permalink / raw)
  To: Xishi Qiu, Andrew Morton, nao.horiguchi, Yinghai Lu,
	H. Peter Anvin, Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo
  Cc: Linux MM, LKML

+#ifdef CONFIG_MEMORY_MIRROR
+	if (change_to_mirror(gfp_mask, ac.high_zoneidx))
+		ac.migratetype = MIGRATE_MIRROR;
+#endif

We may have to be smarter than this here. I'd like to encourage the
enterprise Linux distributions to set CONFIG_MEMORY_MIRROR=y
But the reality is that most systems will not configure any mirrored
memory - so we don't want the common code path for memory
allocation to call functions that set the migrate type, try to allocate
and then fall back to a non-mirror when that may be a complete waste
of time.

Maybe a global "got_mirror" that is true if we have some mirrored
memory.  Then code is

	if (got_mirror && change_to_mirror(...))

^ permalink raw reply	[flat|nested] 62+ messages in thread

* RE: [RFC PATCH 12/12] mm: let slab/slub/slob use mirrored memory
  2015-06-04 13:05 ` [RFC PATCH 12/12] mm: let slab/slub/slob use mirrored memory Xishi Qiu
@ 2015-06-04 17:14   ` Luck, Tony
  0 siblings, 0 replies; 62+ messages in thread
From: Luck, Tony @ 2015-06-04 17:14 UTC (permalink / raw)
  To: Xishi Qiu, Andrew Morton, nao.horiguchi, Yinghai Lu,
	H. Peter Anvin, Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo
  Cc: Linux MM, LKML

-	page = alloc_pages_exact_node(nodeid, flags | __GFP_NOTRACK, cachep->gfporder);
+	page = alloc_pages_exact_node(nodeid, flags | __GFP_NOTRACK | __GFP_MIRROR,
+					cachep->gfporder);
 
Set some global "got_mirror"[*] if we have any mirrored memory to __GFP_MIRROR, else to 0.

then
	
	page = alloc_pages_exact_node(nodeid, flags | __GFP_NOTRACK | got_mirror,
					cachep->gfporder);

-Tony

[*] Someone will suggest a better name. I'm bad at picking names.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 08/12] mm: use mirrorable to switch allocate mirrored memory
  2015-06-04 13:02 ` [RFC PATCH 08/12] mm: use mirrorable to switch allocate mirrored memory Xishi Qiu
  2015-06-04 17:01   ` Luck, Tony
@ 2015-06-04 18:41   ` Dave Hansen
  2015-06-05  3:13     ` Xishi Qiu
  2015-06-09  7:06   ` Kamezawa Hiroyuki
  2015-06-12  8:05   ` Naoya Horiguchi
  3 siblings, 1 reply; 62+ messages in thread
From: Dave Hansen @ 2015-06-04 18:41 UTC (permalink / raw)
  To: Xishi Qiu, Andrew Morton, nao.horiguchi, Yinghai Lu,
	H. Peter Anvin, Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo,
	Luck, Tony
  Cc: Linux MM, LKML

On 06/04/2015 06:02 AM, Xishi Qiu wrote:
> Add a new interface in path /proc/sys/vm/mirrorable. When set to 1, it means
> we should allocate mirrored memory for both user and kernel processes.

That's a pretty dangerously short name. :)

How would this end up getting used?  It seems like it would be dangerous
to use once userspace was very far along.  So would the kernel set it to
1 and then let (early??) userspace set it back to 0?  That would let
important userspace like /bin/init get mirrored memory without having to
actually change much in userspace.

This definitely needs some good documentation.

Also, if it's insane to turn it back *on*, maybe it should be a one-way
trip to turn off.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 11/12] mm: add the PCP interface
  2015-06-04 13:04 ` [RFC PATCH 11/12] mm: add the PCP interface Xishi Qiu
@ 2015-06-04 18:44   ` Dave Hansen
  0 siblings, 0 replies; 62+ messages in thread
From: Dave Hansen @ 2015-06-04 18:44 UTC (permalink / raw)
  To: Xishi Qiu, Andrew Morton, nao.horiguchi, Yinghai Lu,
	H. Peter Anvin, Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo,
	Luck, Tony
  Cc: Linux MM, LKML

On 06/04/2015 06:04 AM, Xishi Qiu wrote:
>  	spin_lock(&zone->lock);
>  	for (i = 0; i < count; ++i) {
> -		struct page *page = __rmqueue(zone, order, migratetype);
> +		struct page *page;
> +
> +		if (is_migrate_mirror(migratetype))
> +			page = __rmqueue_smallest(zone, order, migratetype);
> +		else
> +			page = __rmqueue(zone, order, migratetype);
>  		if (unlikely(page == NULL))
>  			break;

Why is this necessary/helpful?  The changelog doesn't tell me either. :(

Why was this code modified in stead of putting the changes in
__rmqueue() itself (like CMA did)?

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 02/12] mm: introduce mirror_info
  2015-06-04 16:57   ` Luck, Tony
@ 2015-06-05  1:53     ` Xishi Qiu
  0 siblings, 0 replies; 62+ messages in thread
From: Xishi Qiu @ 2015-06-05  1:53 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Andrew Morton, nao.horiguchi, Yinghai Lu, H. Peter Anvin,
	Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo, Linux MM, LKML

On 2015/6/5 0:57, Luck, Tony wrote:

> +#ifdef CONFIG_MEMORY_MIRROR
> +struct numa_mirror_info {
> +	int node;
> +	unsigned long start;
> +	unsigned long size;
> +};
> +
> +struct mirror_info {
> +	int count;
> +	struct numa_mirror_info info[MAX_NUMNODES];
> +};
> 
> Do we really need this?  My patch series leaves all the mirrored memory in
> the memblock allocator tagged with the MEMBLOCK_MIRROR flag.  Can't
> we use that information when freeing the boot memory into the runtime
> free lists?
> 

Hi Tony,

I used this code for testing before, so when your patchset added to mainline,
I'll rewrite it, use MEMBLOCK_MIRROR, not mirror_info. 

I find Andrew has added your patches to mm-tree, right?

Thanks,
Xishi Qiu

> If we can't ... then [MAX_NUMNODES] may not be enough.  We may have
> more than one mirrored range on each node. Current h/w allows two ranges
> per node.
> 
> -Tony
> 
> .
> 




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 08/12] mm: use mirrorable to switch allocate mirrored memory
  2015-06-04 18:41   ` Dave Hansen
@ 2015-06-05  3:13     ` Xishi Qiu
  0 siblings, 0 replies; 62+ messages in thread
From: Xishi Qiu @ 2015-06-05  3:13 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andrew Morton, nao.horiguchi, Yinghai Lu, H. Peter Anvin,
	Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo, Luck, Tony,
	Linux MM, LKML

On 2015/6/5 2:41, Dave Hansen wrote:

> On 06/04/2015 06:02 AM, Xishi Qiu wrote:
>> Add a new interface in path /proc/sys/vm/mirrorable. When set to 1, it means
>> we should allocate mirrored memory for both user and kernel processes.
> 
> That's a pretty dangerously short name. :)
> 

Hi Dave,

Thanks for your comment. I'm not sure whether we should add this interface
for user processes. However some important userspace(e.g. /bin/init, key
business like datebase) may be want mirrored memory to improve reliability.

If we want this interface, I think the code need more change.

Thanks,
Xishi Qiu

> How would this end up getting used?  It seems like it would be dangerous
> to use once userspace was very far along.  So would the kernel set it to
> 1 and then let (early??) userspace set it back to 0?  That would let
> important userspace like /bin/init get mirrored memory without having to
> actually change much in userspace.
> 
> This definitely needs some good documentation.
> 
> Also, if it's insane to turn it back *on*, maybe it should be a one-way
> trip to turn off.
> 
> .
> 




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 10/12] mm: add the buddy system interface
  2015-06-04 17:09   ` Luck, Tony
@ 2015-06-05  3:14     ` Xishi Qiu
  0 siblings, 0 replies; 62+ messages in thread
From: Xishi Qiu @ 2015-06-05  3:14 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Andrew Morton, nao.horiguchi, Yinghai Lu, H. Peter Anvin,
	Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo, Linux MM, LKML

On 2015/6/5 1:09, Luck, Tony wrote:

> +#ifdef CONFIG_MEMORY_MIRROR
> +	if (change_to_mirror(gfp_mask, ac.high_zoneidx))
> +		ac.migratetype = MIGRATE_MIRROR;
> +#endif
> 
> We may have to be smarter than this here. I'd like to encourage the
> enterprise Linux distributions to set CONFIG_MEMORY_MIRROR=y
> But the reality is that most systems will not configure any mirrored
> memory - so we don't want the common code path for memory
> allocation to call functions that set the migrate type, try to allocate
> and then fall back to a non-mirror when that may be a complete waste
> of time.
> 
> Maybe a global "got_mirror" that is true if we have some mirrored
> memory.  Then code is
> 
> 	if (got_mirror && change_to_mirror(...))
> 

Yes, I will change next time.

Thanks,

> .
> 




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 01/12] mm: add a new config to manage the code
  2015-06-04 12:56 ` [RFC PATCH 01/12] mm: add a new config to manage the code Xishi Qiu
@ 2015-06-08 11:52   ` Leon Romanovsky
  2015-06-08 15:14     ` Luck, Tony
  2015-06-09  6:44   ` Kamezawa Hiroyuki
  1 sibling, 1 reply; 62+ messages in thread
From: Leon Romanovsky @ 2015-06-08 11:52 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Andrew Morton, nao.horiguchi, Yinghai Lu, H. Peter Anvin,
	Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo, Luck, Tony,
	Linux MM, LKML

On Thu, Jun 4, 2015 at 3:56 PM, Xishi Qiu <qiuxishi@huawei.com> wrote:
> This patch introduces a new config called "CONFIG_ACPI_MIRROR_MEMORY", it is
> used to on/off the feature.
>
> Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
> ---
>  mm/Kconfig | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 390214d..4f2a726 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -200,6 +200,14 @@ config MEMORY_HOTREMOVE
>         depends on MEMORY_HOTPLUG && ARCH_ENABLE_MEMORY_HOTREMOVE
>         depends on MIGRATION
>
> +config MEMORY_MIRROR
> +       bool "Address range mirroring support"
> +       depends on X86 && NUMA
> +       default y
Is it correct for the systems (NOT xeon) without memory support built in?

> +       help
> +         This feature depends on hardware and firmware support.
> +         ACPI or EFI records the mirror info.
> +
>  #
>  # If we have space for more page flags then we can enable additional
>  # optimizations and functionality.
> --
> 2.0.0
>
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>



-- 
Leon Romanovsky | Independent Linux Consultant
        www.leon.nu | leon@leon.nu

^ permalink raw reply	[flat|nested] 62+ messages in thread

* RE: [RFC PATCH 01/12] mm: add a new config to manage the code
  2015-06-08 11:52   ` Leon Romanovsky
@ 2015-06-08 15:14     ` Luck, Tony
  2015-06-08 16:36       ` Leon Romanovsky
  0 siblings, 1 reply; 62+ messages in thread
From: Luck, Tony @ 2015-06-08 15:14 UTC (permalink / raw)
  To: Leon Romanovsky, Xishi Qiu
  Cc: Andrew Morton, nao.horiguchi, Yinghai Lu, H. Peter Anvin,
	Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo, Linux MM, LKML

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 596 bytes --]

> > +config MEMORY_MIRROR
> > +       bool "Address range mirroring support"
> > +       depends on X86 && NUMA
> > +       default y
> Is it correct for the systems (NOT xeon) without memory support built in?

Is the "&& NUMA" doing that?  If you support NUMA, then you are not a minimal
config for a tablet or laptop.

If you want a symbol that has a stronger correlation to high end Xeon features
then perhaps MEMORY_FAILURE?

-Tony
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 01/12] mm: add a new config to manage the code
  2015-06-08 15:14     ` Luck, Tony
@ 2015-06-08 16:36       ` Leon Romanovsky
  0 siblings, 0 replies; 62+ messages in thread
From: Leon Romanovsky @ 2015-06-08 16:36 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Xishi Qiu, Andrew Morton, nao.horiguchi, Yinghai Lu,
	H. Peter Anvin, Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo,
	Linux MM, LKML

On Mon, Jun 8, 2015 at 6:14 PM, Luck, Tony <tony.luck@intel.com> wrote:
>> > +config MEMORY_MIRROR
>> > +       bool "Address range mirroring support"
>> > +       depends on X86 && NUMA
>> > +       default y
>> Is it correct for the systems (NOT xeon) without memory support built in?
>
> Is the "&& NUMA" doing that?  If you support NUMA, then you are not a minimal
> config for a tablet or laptop.
>
> If you want a symbol that has a stronger correlation to high end Xeon features
> then perhaps MEMORY_FAILURE?
I would like to see the default set to be "n".
On my machine (x86_64) defconfig enables this feature and I don't know
if this feature can work there.

➜  linux-mm git:(dev) ✗ make defconfig ARCH=x86
  HOSTCC  scripts/basic/fixdep
  HOSTCC  scripts/basic/bin2c
  HOSTCC  scripts/kconfig/conf.o
  HOSTCC  scripts/kconfig/zconf.tab.o
  HOSTLD  scripts/kconfig/conf
*** Default configuration is based on 'x86_64_defconfig'
#
# configuration written to .config
#
➜  linux-mm git:(dev) ✗ grep CONFIG_MEMORY_MIRROR .config
CONFIG_MEMORY_MIRROR=y


>
> -Tony



-- 
Leon Romanovsky | Independent Linux Consultant
        www.leon.nu | leon@leon.nu

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 01/12] mm: add a new config to manage the code
  2015-06-04 12:56 ` [RFC PATCH 01/12] mm: add a new config to manage the code Xishi Qiu
  2015-06-08 11:52   ` Leon Romanovsky
@ 2015-06-09  6:44   ` Kamezawa Hiroyuki
  2015-06-09 10:10     ` Xishi Qiu
  1 sibling, 1 reply; 62+ messages in thread
From: Kamezawa Hiroyuki @ 2015-06-09  6:44 UTC (permalink / raw)
  To: Xishi Qiu, Andrew Morton, nao.horiguchi, Yinghai Lu,
	H. Peter Anvin, Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo,
	Luck, Tony
  Cc: Linux MM, LKML

On 2015/06/04 21:56, Xishi Qiu wrote:
> This patch introduces a new config called "CONFIG_ACPI_MIRROR_MEMORY", it is
> used to on/off the feature.
>
> Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
> ---
>   mm/Kconfig | 8 ++++++++
>   1 file changed, 8 insertions(+)
>
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 390214d..4f2a726 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -200,6 +200,14 @@ config MEMORY_HOTREMOVE
>   	depends on MEMORY_HOTPLUG && ARCH_ENABLE_MEMORY_HOTREMOVE
>   	depends on MIGRATION
>
> +config MEMORY_MIRROR
> +	bool "Address range mirroring support"
> +	depends on X86 && NUMA
> +	default y
> +	help
> +	  This feature depends on hardware and firmware support.
> +	  ACPI or EFI records the mirror info.

default y...no runtime influence when the user doesn't use memory mirror ?

Thanks,
-Kame




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 02/12] mm: introduce mirror_info
  2015-06-04 12:57 ` [RFC PATCH 02/12] mm: introduce mirror_info Xishi Qiu
  2015-06-04 16:57   ` Luck, Tony
@ 2015-06-09  6:48   ` Kamezawa Hiroyuki
  1 sibling, 0 replies; 62+ messages in thread
From: Kamezawa Hiroyuki @ 2015-06-09  6:48 UTC (permalink / raw)
  To: Xishi Qiu, Andrew Morton, nao.horiguchi, Yinghai Lu,
	H. Peter Anvin, Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo,
	Luck, Tony
  Cc: Linux MM, LKML

On 2015/06/04 21:57, Xishi Qiu wrote:
> This patch introduces a new struct called "mirror_info", it is used to storage
> the mirror address range which reported by EFI or ACPI.
>
> TBD: call add_mirror_info() to fill it.
>
> Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
> ---
>   arch/x86/mm/numa.c     |  3 +++
>   include/linux/mmzone.h | 15 +++++++++++++++
>   mm/page_alloc.c        | 33 +++++++++++++++++++++++++++++++++
>   3 files changed, 51 insertions(+)
>
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index 4053bb5..781fd68 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -619,6 +619,9 @@ static int __init numa_init(int (*init_func)(void))
>   	/* In case that parsing SRAT failed. */
>   	WARN_ON(memblock_clear_hotplug(0, ULLONG_MAX));
>   	numa_reset_distance();
> +#ifdef CONFIG_MEMORY_MIRROR
> +	memset(&mirror_info, 0, sizeof(mirror_info));
> +#endif
>
>   	ret = init_func();
>   	if (ret < 0)
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 54d74f6..1fae07b 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -69,6 +69,21 @@ enum {
>   #  define is_migrate_cma(migratetype) false
>   #endif
>
> +#ifdef CONFIG_MEMORY_MIRROR
> +struct numa_mirror_info {
> +	int node;
> +	unsigned long start;
> +	unsigned long size;
> +};
> +
> +struct mirror_info {
> +	int count;
> +	struct numa_mirror_info info[MAX_NUMNODES];
> +};

MAX_NUMNODE may not be enough when the firmware cannot use contiguous
address for mirroing.


> +
> +extern struct mirror_info mirror_info;
> +#endif

If this structure will not be updated after boot, read_mostly should be
helpful.


> +
>   #define for_each_migratetype_order(order, type) \
>   	for (order = 0; order < MAX_ORDER; order++) \
>   		for (type = 0; type < MIGRATE_TYPES; type++)
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index ebffa0e..41a95a7 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -210,6 +210,10 @@ static char * const zone_names[MAX_NR_ZONES] = {
>   int min_free_kbytes = 1024;
>   int user_min_free_kbytes = -1;
>
> +#ifdef CONFIG_MEMORY_MIRROR
> +struct mirror_info mirror_info;
> +#endif
> +
>   static unsigned long __meminitdata nr_kernel_pages;
>   static unsigned long __meminitdata nr_all_pages;
>   static unsigned long __meminitdata dma_reserve;
> @@ -545,6 +549,31 @@ static inline int page_is_buddy(struct page *page, struct page *buddy,
>   	return 0;
>   }
>
> +#ifdef CONFIG_MEMORY_MIRROR
> +static void __init add_mirror_info(int node,
> +			unsigned long start, unsigned long size)
> +{
> +	mirror_info.info[mirror_info.count].node = node;
> +	mirror_info.info[mirror_info.count].start = start;
> +	mirror_info.info[mirror_info.count].size = size;
> +
> +	mirror_info.count++;
> +}
> +
> +static void __init print_mirror_info(void)
> +{
> +	int i;
> +
> +	printk("Mirror info\n");
> +	for (i = 0; i < mirror_info.count; i++)
> +		printk("  node %3d: [mem %#010lx-%#010lx]\n",
> +			mirror_info.info[i].node,
> +			mirror_info.info[i].start,
> +			mirror_info.info[i].start +
> +				mirror_info.info[i].size - 1);
> +}
> +#endif
> +
>   /*
>    * Freeing function for a buddy system allocator.
>    *
> @@ -5438,6 +5467,10 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
>   			       (u64)zone_movable_pfn[i] << PAGE_SHIFT);
>   	}
>
> +#ifdef CONFIG_MEMORY_MIRROR
> +	print_mirror_info();
> +#endif
> +
>   	/* Print out the early node map */
>   	pr_info("Early memory node ranges\n");
>   	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid)
>



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 03/12] mm: introduce MIGRATE_MIRROR to manage the mirrored, pages
  2015-06-04 12:58 ` [RFC PATCH 03/12] mm: introduce MIGRATE_MIRROR to manage the mirrored, pages Xishi Qiu
@ 2015-06-09  6:54   ` Kamezawa Hiroyuki
  0 siblings, 0 replies; 62+ messages in thread
From: Kamezawa Hiroyuki @ 2015-06-09  6:54 UTC (permalink / raw)
  To: Xishi Qiu, Andrew Morton, nao.horiguchi, Yinghai Lu,
	H. Peter Anvin, Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo,
	Luck, Tony
  Cc: Linux MM, LKML

On 2015/06/04 21:58, Xishi Qiu wrote:
> This patch introduces a new MIGRATE_TYPES called "MIGRATE_MIRROR", it is used
> to storage the mirrored pages list.
> When cat /proc/pagetypeinfo, you can see the count of free mirrored blocks.
>

I guess you need to add Mel to CC.

> e.g.
> euler-linux:~ # cat /proc/pagetypeinfo
> Page block order: 9
> Pages per block:  512
>
> Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
> Node    0, zone      DMA, type    Unmovable      1      1      0      0      2      1      1      0      1      0      0
> Node    0, zone      DMA, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
> Node    0, zone      DMA, type      Movable      0      0      0      0      0      0      0      0      0      0      3
> Node    0, zone      DMA, type       Mirror      0      0      0      0      0      0      0      0      0      0      0
> Node    0, zone      DMA, type      Reserve      0      0      0      0      0      0      0      0      0      1      0
> Node    0, zone      DMA, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
> Node    0, zone    DMA32, type    Unmovable      0      0      1      0      0      0      0      1      1      1      0
> Node    0, zone    DMA32, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
> Node    0, zone    DMA32, type      Movable      1      2      6      6      6      4      5      3      3      2    738
> Node    0, zone    DMA32, type       Mirror      0      0      0      0      0      0      0      0      0      0      0
> Node    0, zone    DMA32, type      Reserve      0      0      0      0      0      0      0      0      0      0      1
> Node    0, zone    DMA32, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
> Node    0, zone   Normal, type    Unmovable      0      0      0      0      0      0      0      0      0      0      0
> Node    0, zone   Normal, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
> Node    0, zone   Normal, type      Movable      0      0      1      1      0      0      0      2      1      0   4254
> Node    0, zone   Normal, type       Mirror    148    104     63     70     26     11      2      2      1      1    973
> Node    0, zone   Normal, type      Reserve      0      0      0      0      0      0      0      0      0      0      1
> Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
>
> Number of blocks type     Unmovable  Reclaimable      Movable       Mirror      Reserve      Isolate
> Node 0, zone      DMA            1            0            6            0            1            0
> Node 0, zone    DMA32            2            0         1525            0            1            0
> Node 0, zone   Normal            0            0         8702         2048            2            0
> Page block order: 9
> Pages per block:  512



>
> Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
> Node    1, zone   Normal, type    Unmovable      0      0      0      0      0      0      0      0      0      0      0
> Node    1, zone   Normal, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
> Node    1, zone   Normal, type      Movable      2      2      1      1      2      1      2      2      2      3   3996
> Node    1, zone   Normal, type       Mirror     68     94     57      6      8      1      0      0      3      1   2003
> Node    1, zone   Normal, type      Reserve      0      0      0      0      0      0      0      0      0      0      1
> Node    1, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
>
> Number of blocks type     Unmovable  Reclaimable      Movable       Mirror      Reserve      Isolate
> Node 1, zone   Normal            0            0         8190         4096            2            0
>
>
> Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
> ---
>   include/linux/mmzone.h | 6 ++++++
>   mm/page_alloc.c        | 3 +++
>   mm/vmstat.c            | 3 +++
>   3 files changed, 12 insertions(+)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 1fae07b..b444335 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -39,6 +39,9 @@ enum {
>   	MIGRATE_UNMOVABLE,
>   	MIGRATE_RECLAIMABLE,
>   	MIGRATE_MOVABLE,
> +#ifdef CONFIG_MEMORY_MIRROR
> +	MIGRATE_MIRROR,
> +#endif

I can't imagine how the fallback logic will work at reading this patch.
I think an update for fallback order array should be in this patch...

>   	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
>   	MIGRATE_RESERVE = MIGRATE_PCPTYPES,
>   #ifdef CONFIG_CMA
> @@ -82,6 +85,9 @@ struct mirror_info {
>   };
>
>   extern struct mirror_info mirror_info;
> +#  define is_migrate_mirror(migratetype) unlikely((migratetype) == MIGRATE_MIRROR)
> +#else
> +#  define is_migrate_mirror(migratetype) false
>   #endif
>
>   #define for_each_migratetype_order(order, type) \
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 41a95a7..3b2ff46 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3245,6 +3245,9 @@ static void show_migration_types(unsigned char type)
>   		[MIGRATE_UNMOVABLE]	= 'U',
>   		[MIGRATE_RECLAIMABLE]	= 'E',
>   		[MIGRATE_MOVABLE]	= 'M',
> +#ifdef CONFIG_MEMORY_MIRROR
> +		[MIGRATE_MIRROR]	= 'O',
> +#endif
>   		[MIGRATE_RESERVE]	= 'R',
>   #ifdef CONFIG_CMA
>   		[MIGRATE_CMA]		= 'C',
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 4f5cd97..d0323e0 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -901,6 +901,9 @@ static char * const migratetype_names[MIGRATE_TYPES] = {
>   	"Unmovable",
>   	"Reclaimable",
>   	"Movable",
> +#ifdef CONFIG_MEMORY_MIRROR
> +	"Mirror",
> +#endif
>   	"Reserve",
>   #ifdef CONFIG_CMA
>   	"CMA",
>



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 07/12] mm: introduce __GFP_MIRROR to allocate mirrored pages
  2015-06-04 13:02 ` [RFC PATCH 07/12] mm: introduce __GFP_MIRROR to allocate mirrored pages Xishi Qiu
@ 2015-06-09  7:01   ` Kamezawa Hiroyuki
  0 siblings, 0 replies; 62+ messages in thread
From: Kamezawa Hiroyuki @ 2015-06-09  7:01 UTC (permalink / raw)
  To: Xishi Qiu, Andrew Morton, nao.horiguchi, Yinghai Lu,
	H. Peter Anvin, Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo,
	Luck, Tony
  Cc: Linux MM, LKML

On 2015/06/04 22:02, Xishi Qiu wrote:
> This patch introduces a new gfp flag called "__GFP_MIRROR", it is used to
> allocate mirrored pages through buddy system.
>
> Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>

In Tony's original proposal, the motivation was to mirror all kernel memory.

Is the purpose of this patch making mirrored range available for user space ?

But, hmm... I don't think adding a new GFP flag is a good idea. It adds many conditional jumps.

How about using GFP_KERNEL for user memory if the user wants mirrored memory with mirroring
all kernel memory?

Thanks,
-Kame

> ---
>   include/linux/gfp.h | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 15928f0..89d0091 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -35,6 +35,7 @@ struct vm_area_struct;
>   #define ___GFP_NO_KSWAPD	0x400000u
>   #define ___GFP_OTHER_NODE	0x800000u
>   #define ___GFP_WRITE		0x1000000u
> +#define ___GFP_MIRROR		0x2000000u
>   /* If the above are modified, __GFP_BITS_SHIFT may need updating */
>
>   /*
> @@ -95,13 +96,15 @@ struct vm_area_struct;
>   #define __GFP_OTHER_NODE ((__force gfp_t)___GFP_OTHER_NODE) /* On behalf of other node */
>   #define __GFP_WRITE	((__force gfp_t)___GFP_WRITE)	/* Allocator intends to dirty page */
>
> +#define __GFP_MIRROR	((__force gfp_t)___GFP_MIRROR)	/* Allocate mirrored memory */
> +
>   /*
>    * This may seem redundant, but it's a way of annotating false positives vs.
>    * allocations that simply cannot be supported (e.g. page tables).
>    */
>   #define __GFP_NOTRACK_FALSE_POSITIVE (__GFP_NOTRACK)
>
> -#define __GFP_BITS_SHIFT 25	/* Room for N __GFP_FOO bits */
> +#define __GFP_BITS_SHIFT 26	/* Room for N __GFP_FOO bits */
>   #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
>
>   /* This equals 0, but use constants in case they ever change */
>



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 08/12] mm: use mirrorable to switch allocate mirrored memory
  2015-06-04 13:02 ` [RFC PATCH 08/12] mm: use mirrorable to switch allocate mirrored memory Xishi Qiu
  2015-06-04 17:01   ` Luck, Tony
  2015-06-04 18:41   ` Dave Hansen
@ 2015-06-09  7:06   ` Kamezawa Hiroyuki
  2015-06-09 10:09     ` Xishi Qiu
  2015-06-12  8:05   ` Naoya Horiguchi
  3 siblings, 1 reply; 62+ messages in thread
From: Kamezawa Hiroyuki @ 2015-06-09  7:06 UTC (permalink / raw)
  To: Xishi Qiu, Andrew Morton, nao.horiguchi, Yinghai Lu,
	H. Peter Anvin, Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo,
	Luck, Tony
  Cc: Linux MM, LKML

On 2015/06/04 22:02, Xishi Qiu wrote:
> Add a new interface in path /proc/sys/vm/mirrorable. When set to 1, it means
> we should allocate mirrored memory for both user and kernel processes.
>
> Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>

I can't see why do we need this switch. If this is set, all GFP_HIGHUSER will use
mirrored memory ?

Or will you add special MMAP/madvise flag to use mirrored memory ?

Thanks,
-Kame

> ---
>   include/linux/mmzone.h | 1 +
>   kernel/sysctl.c        | 9 +++++++++
>   mm/page_alloc.c        | 1 +
>   3 files changed, 11 insertions(+)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index f82e3ae..20888dd 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -85,6 +85,7 @@ struct mirror_info {
>   };
>
>   extern struct mirror_info mirror_info;
> +extern int sysctl_mirrorable;
>   #  define is_migrate_mirror(migratetype) unlikely((migratetype) == MIGRATE_MIRROR)
>   #else
>   #  define is_migrate_mirror(migratetype) false
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 2082b1a..dc2625e 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1514,6 +1514,15 @@ static struct ctl_table vm_table[] = {
>   		.extra2		= &one,
>   	},
>   #endif
> +#ifdef CONFIG_MEMORY_MIRROR
> +	{
> +		.procname	= "mirrorable",
> +		.data		= &sysctl_mirrorable,
> +		.maxlen		= sizeof(sysctl_mirrorable),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec_minmax,
> +	},
> +#endif
>   	{
>   		.procname	= "user_reserve_kbytes",
>   		.data		= &sysctl_user_reserve_kbytes,
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 249a8f6..63b90ca 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -212,6 +212,7 @@ int user_min_free_kbytes = -1;
>
>   #ifdef CONFIG_MEMORY_MIRROR
>   struct mirror_info mirror_info;
> +int sysctl_mirrorable = 0;
>   #endif
>
>   static unsigned long __meminitdata nr_kernel_pages;
>



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 10/12] mm: add the buddy system interface
  2015-06-04 13:04 ` [RFC PATCH 10/12] mm: add the buddy system interface Xishi Qiu
  2015-06-04 17:09   ` Luck, Tony
@ 2015-06-09  7:12   ` Kamezawa Hiroyuki
  2015-06-09 10:04     ` Xishi Qiu
  1 sibling, 1 reply; 62+ messages in thread
From: Kamezawa Hiroyuki @ 2015-06-09  7:12 UTC (permalink / raw)
  To: Xishi Qiu, Andrew Morton, nao.horiguchi, Yinghai Lu,
	H. Peter Anvin, Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo,
	Luck, Tony
  Cc: Linux MM, LKML

On 2015/06/04 22:04, Xishi Qiu wrote:
> Add the buddy system interface for address range mirroring feature.
> Allocate mirrored pages in MIGRATE_MIRROR list. If there is no mirrored pages
> left, use other types pages.
>
> Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
> ---
>   mm/page_alloc.c | 40 +++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 39 insertions(+), 1 deletion(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d4d2066..0fb55288 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -599,6 +599,26 @@ static inline bool is_mirror_pfn(unsigned long pfn)
>
>   	return false;
>   }
> +
> +static inline bool change_to_mirror(gfp_t gfp_flags, int high_zoneidx)
> +{
> +	/*
> +	 * Do not alloc mirrored memory below 4G, because 0-4G is
> +	 * all mirrored by default, and the list is always empty.
> +	 */
> +	if (high_zoneidx < ZONE_NORMAL)
> +		return false;
> +
> +	/* Alloc mirrored memory for only kernel */
> +	if (gfp_flags & __GFP_MIRROR)
> +		return true;

GFP_KERNEL itself should imply mirror, I think.

> +
> +	/* Alloc mirrored memory for both user and kernel */
> +	if (sysctl_mirrorable)
> +		return true;

Reading this, I think this sysctl is not good. The user cannot know what is mirrored
because memory may not be mirrored until the sysctl is set.

Thanks,
-Kame


> +
> +	return false;
> +}
>   #endif
>
>   /*
> @@ -1796,7 +1816,10 @@ struct page *buffered_rmqueue(struct zone *preferred_zone,
>   			WARN_ON_ONCE(order > 1);
>   		}
>   		spin_lock_irqsave(&zone->lock, flags);
> -		page = __rmqueue(zone, order, migratetype);
> +		if (is_migrate_mirror(migratetype))
> +			page = __rmqueue_smallest(zone, order, migratetype);
> +		else
> +			page = __rmqueue(zone, order, migratetype);
>   		spin_unlock(&zone->lock);
>   		if (!page)
>   			goto failed;
> @@ -2928,6 +2951,11 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
>   	if (IS_ENABLED(CONFIG_CMA) && ac.migratetype == MIGRATE_MOVABLE)
>   		alloc_flags |= ALLOC_CMA;
>
> +#ifdef CONFIG_MEMORY_MIRROR
> +	if (change_to_mirror(gfp_mask, ac.high_zoneidx))
> +		ac.migratetype = MIGRATE_MIRROR;
> +#endif
> +
>   retry_cpuset:
>   	cpuset_mems_cookie = read_mems_allowed_begin();
>
> @@ -2943,9 +2971,19 @@ retry_cpuset:
>
>   	/* First allocation attempt */
>   	alloc_mask = gfp_mask|__GFP_HARDWALL;
> +retry:
>   	page = get_page_from_freelist(alloc_mask, order, alloc_flags, &ac);
>   	if (unlikely(!page)) {
>   		/*
> +		 * If there is no mirrored memory, we will alloc other
> +		 * types memory.
> +		 */
> +		if (is_migrate_mirror(ac.migratetype)) {
> +			ac.migratetype = gfpflags_to_migratetype(gfp_mask);
> +			goto retry;
> +		}
> +
> +		/*
>   		 * Runtime PM, block IO and its error handling path
>   		 * can deadlock because I/O on the device might not
>   		 * complete.
>



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 10/12] mm: add the buddy system interface
  2015-06-09  7:12   ` Kamezawa Hiroyuki
@ 2015-06-09 10:04     ` Xishi Qiu
  2015-06-10  3:06       ` Kamezawa Hiroyuki
  0 siblings, 1 reply; 62+ messages in thread
From: Xishi Qiu @ 2015-06-09 10:04 UTC (permalink / raw)
  To: Kamezawa Hiroyuki
  Cc: Andrew Morton, nao.horiguchi, Yinghai Lu, H. Peter Anvin,
	Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo, Luck, Tony,
	Linux MM, LKML

On 2015/6/9 15:12, Kamezawa Hiroyuki wrote:

> On 2015/06/04 22:04, Xishi Qiu wrote:
>> Add the buddy system interface for address range mirroring feature.
>> Allocate mirrored pages in MIGRATE_MIRROR list. If there is no mirrored pages
>> left, use other types pages.
>>
>> Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
>> ---
>>   mm/page_alloc.c | 40 +++++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 39 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index d4d2066..0fb55288 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -599,6 +599,26 @@ static inline bool is_mirror_pfn(unsigned long pfn)
>>
>>       return false;
>>   }
>> +
>> +static inline bool change_to_mirror(gfp_t gfp_flags, int high_zoneidx)
>> +{
>> +    /*
>> +     * Do not alloc mirrored memory below 4G, because 0-4G is
>> +     * all mirrored by default, and the list is always empty.
>> +     */
>> +    if (high_zoneidx < ZONE_NORMAL)
>> +        return false;
>> +
>> +    /* Alloc mirrored memory for only kernel */
>> +    if (gfp_flags & __GFP_MIRROR)
>> +        return true;
> 
> GFP_KERNEL itself should imply mirror, I think.
> 

Hi Kame,

How about like this: #define GFP_KERNEL (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_MIRROR) ?

Thanks,
Xishi Qiu

>> +
>> +    /* Alloc mirrored memory for both user and kernel */
>> +    if (sysctl_mirrorable)
>> +        return true;
> 
> Reading this, I think this sysctl is not good. The user cannot know what is mirrored
> because memory may not be mirrored until the sysctl is set.
> 
> Thanks,
> -Kame
> 
> 
>> +
>> +    return false;
>> +}
>>   #endif
>>
>>   /*
>> @@ -1796,7 +1816,10 @@ struct page *buffered_rmqueue(struct zone *preferred_zone,
>>               WARN_ON_ONCE(order > 1);
>>           }
>>           spin_lock_irqsave(&zone->lock, flags);
>> -        page = __rmqueue(zone, order, migratetype);
>> +        if (is_migrate_mirror(migratetype))
>> +            page = __rmqueue_smallest(zone, order, migratetype);
>> +        else
>> +            page = __rmqueue(zone, order, migratetype);
>>           spin_unlock(&zone->lock);
>>           if (!page)
>>               goto failed;
>> @@ -2928,6 +2951,11 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
>>       if (IS_ENABLED(CONFIG_CMA) && ac.migratetype == MIGRATE_MOVABLE)
>>           alloc_flags |= ALLOC_CMA;
>>
>> +#ifdef CONFIG_MEMORY_MIRROR
>> +    if (change_to_mirror(gfp_mask, ac.high_zoneidx))
>> +        ac.migratetype = MIGRATE_MIRROR;
>> +#endif
>> +
>>   retry_cpuset:
>>       cpuset_mems_cookie = read_mems_allowed_begin();
>>
>> @@ -2943,9 +2971,19 @@ retry_cpuset:
>>
>>       /* First allocation attempt */
>>       alloc_mask = gfp_mask|__GFP_HARDWALL;
>> +retry:
>>       page = get_page_from_freelist(alloc_mask, order, alloc_flags, &ac);
>>       if (unlikely(!page)) {
>>           /*
>> +         * If there is no mirrored memory, we will alloc other
>> +         * types memory.
>> +         */
>> +        if (is_migrate_mirror(ac.migratetype)) {
>> +            ac.migratetype = gfpflags_to_migratetype(gfp_mask);
>> +            goto retry;
>> +        }
>> +
>> +        /*
>>            * Runtime PM, block IO and its error handling path
>>            * can deadlock because I/O on the device might not
>>            * complete.
>>
> 
> 
> 
> .
> 




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 08/12] mm: use mirrorable to switch allocate mirrored memory
  2015-06-09  7:06   ` Kamezawa Hiroyuki
@ 2015-06-09 10:09     ` Xishi Qiu
  2015-06-10  3:09       ` Kamezawa Hiroyuki
  0 siblings, 1 reply; 62+ messages in thread
From: Xishi Qiu @ 2015-06-09 10:09 UTC (permalink / raw)
  To: Kamezawa Hiroyuki
  Cc: Andrew Morton, nao.horiguchi, Yinghai Lu, H. Peter Anvin,
	Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo, Luck, Tony,
	Linux MM, LKML

On 2015/6/9 15:06, Kamezawa Hiroyuki wrote:

> On 2015/06/04 22:02, Xishi Qiu wrote:
>> Add a new interface in path /proc/sys/vm/mirrorable. When set to 1, it means
>> we should allocate mirrored memory for both user and kernel processes.
>>
>> Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
> 
> I can't see why do we need this switch. If this is set, all GFP_HIGHUSER will use
> mirrored memory ?
> 
> Or will you add special MMAP/madvise flag to use mirrored memory ?
> 

Hi Kame,

Yes, 

MMAP/madvise 
	-> add VM_MIRROR 
		-> add GFP_MIRROR
			-> use MIGRATE_MIRROR list to alloc mirrored pages

So user can use mirrored memory. What do you think?

Thanks,
Xishi Qiu

> Thanks,
> -Kame
> 
>> ---
>>   include/linux/mmzone.h | 1 +
>>   kernel/sysctl.c        | 9 +++++++++
>>   mm/page_alloc.c        | 1 +
>>   3 files changed, 11 insertions(+)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index f82e3ae..20888dd 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -85,6 +85,7 @@ struct mirror_info {
>>   };
>>
>>   extern struct mirror_info mirror_info;
>> +extern int sysctl_mirrorable;
>>   #  define is_migrate_mirror(migratetype) unlikely((migratetype) == MIGRATE_MIRROR)
>>   #else
>>   #  define is_migrate_mirror(migratetype) false
>> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
>> index 2082b1a..dc2625e 100644
>> --- a/kernel/sysctl.c
>> +++ b/kernel/sysctl.c
>> @@ -1514,6 +1514,15 @@ static struct ctl_table vm_table[] = {
>>           .extra2        = &one,
>>       },
>>   #endif
>> +#ifdef CONFIG_MEMORY_MIRROR
>> +    {
>> +        .procname    = "mirrorable",
>> +        .data        = &sysctl_mirrorable,
>> +        .maxlen        = sizeof(sysctl_mirrorable),
>> +        .mode        = 0644,
>> +        .proc_handler    = proc_dointvec_minmax,
>> +    },
>> +#endif
>>       {
>>           .procname    = "user_reserve_kbytes",
>>           .data        = &sysctl_user_reserve_kbytes,
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 249a8f6..63b90ca 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -212,6 +212,7 @@ int user_min_free_kbytes = -1;
>>
>>   #ifdef CONFIG_MEMORY_MIRROR
>>   struct mirror_info mirror_info;
>> +int sysctl_mirrorable = 0;
>>   #endif
>>
>>   static unsigned long __meminitdata nr_kernel_pages;
>>
> 
> 
> 
> .
> 




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 01/12] mm: add a new config to manage the code
  2015-06-09  6:44   ` Kamezawa Hiroyuki
@ 2015-06-09 10:10     ` Xishi Qiu
  2015-06-10  3:07       ` Kamezawa Hiroyuki
  0 siblings, 1 reply; 62+ messages in thread
From: Xishi Qiu @ 2015-06-09 10:10 UTC (permalink / raw)
  To: Kamezawa Hiroyuki
  Cc: Andrew Morton, nao.horiguchi, Yinghai Lu, H. Peter Anvin,
	Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo, Luck, Tony,
	Linux MM, LKML

On 2015/6/9 14:44, Kamezawa Hiroyuki wrote:

> On 2015/06/04 21:56, Xishi Qiu wrote:
>> This patch introduces a new config called "CONFIG_ACPI_MIRROR_MEMORY", it is
>> used to on/off the feature.
>>
>> Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
>> ---
>>   mm/Kconfig | 8 ++++++++
>>   1 file changed, 8 insertions(+)
>>
>> diff --git a/mm/Kconfig b/mm/Kconfig
>> index 390214d..4f2a726 100644
>> --- a/mm/Kconfig
>> +++ b/mm/Kconfig
>> @@ -200,6 +200,14 @@ config MEMORY_HOTREMOVE
>>       depends on MEMORY_HOTPLUG && ARCH_ENABLE_MEMORY_HOTREMOVE
>>       depends on MIGRATION
>>
>> +config MEMORY_MIRROR
>> +    bool "Address range mirroring support"
>> +    depends on X86 && NUMA
>> +    default y
>> +    help
>> +      This feature depends on hardware and firmware support.
>> +      ACPI or EFI records the mirror info.
> 
> default y...no runtime influence when the user doesn't use memory mirror ?
> 

It is a new feature, so how about like this: default y -> n?

Thanks,
Xishi Qiu

> Thanks,
> -Kame
> 
> 
> 
> 
> .
> 




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 10/12] mm: add the buddy system interface
  2015-06-09 10:04     ` Xishi Qiu
@ 2015-06-10  3:06       ` Kamezawa Hiroyuki
  2015-06-10 20:40         ` Luck, Tony
  2015-06-25  9:44         ` Xishi Qiu
  0 siblings, 2 replies; 62+ messages in thread
From: Kamezawa Hiroyuki @ 2015-06-10  3:06 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Andrew Morton, nao.horiguchi, Yinghai Lu, H. Peter Anvin,
	Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo, Luck, Tony,
	Linux MM, LKML

On 2015/06/09 19:04, Xishi Qiu wrote:
> On 2015/6/9 15:12, Kamezawa Hiroyuki wrote:
>
>> On 2015/06/04 22:04, Xishi Qiu wrote:
>>> Add the buddy system interface for address range mirroring feature.
>>> Allocate mirrored pages in MIGRATE_MIRROR list. If there is no mirrored pages
>>> left, use other types pages.
>>>
>>> Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
>>> ---
>>>    mm/page_alloc.c | 40 +++++++++++++++++++++++++++++++++++++++-
>>>    1 file changed, 39 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index d4d2066..0fb55288 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -599,6 +599,26 @@ static inline bool is_mirror_pfn(unsigned long pfn)
>>>
>>>        return false;
>>>    }
>>> +
>>> +static inline bool change_to_mirror(gfp_t gfp_flags, int high_zoneidx)
>>> +{
>>> +    /*
>>> +     * Do not alloc mirrored memory below 4G, because 0-4G is
>>> +     * all mirrored by default, and the list is always empty.
>>> +     */
>>> +    if (high_zoneidx < ZONE_NORMAL)
>>> +        return false;
>>> +
>>> +    /* Alloc mirrored memory for only kernel */
>>> +    if (gfp_flags & __GFP_MIRROR)
>>> +        return true;
>>
>> GFP_KERNEL itself should imply mirror, I think.
>>
>
> Hi Kame,
>
> How about like this: #define GFP_KERNEL (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_MIRROR) ?
>

Hm.... it cannot cover GFP_ATOMIC at el.

I guess, mirrored memory should be allocated if !__GFP_HIGHMEM or !__GFP_MOVABLE

thanks,
-Kame

Thanks,
-Kame




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 01/12] mm: add a new config to manage the code
  2015-06-09 10:10     ` Xishi Qiu
@ 2015-06-10  3:07       ` Kamezawa Hiroyuki
  0 siblings, 0 replies; 62+ messages in thread
From: Kamezawa Hiroyuki @ 2015-06-10  3:07 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Andrew Morton, nao.horiguchi, Yinghai Lu, H. Peter Anvin,
	Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo, Luck, Tony,
	Linux MM, LKML

On 2015/06/09 19:10, Xishi Qiu wrote:
> On 2015/6/9 14:44, Kamezawa Hiroyuki wrote:
>
>> On 2015/06/04 21:56, Xishi Qiu wrote:
>>> This patch introduces a new config called "CONFIG_ACPI_MIRROR_MEMORY", it is
>>> used to on/off the feature.
>>>
>>> Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
>>> ---
>>>    mm/Kconfig | 8 ++++++++
>>>    1 file changed, 8 insertions(+)
>>>
>>> diff --git a/mm/Kconfig b/mm/Kconfig
>>> index 390214d..4f2a726 100644
>>> --- a/mm/Kconfig
>>> +++ b/mm/Kconfig
>>> @@ -200,6 +200,14 @@ config MEMORY_HOTREMOVE
>>>        depends on MEMORY_HOTPLUG && ARCH_ENABLE_MEMORY_HOTREMOVE
>>>        depends on MIGRATION
>>>
>>> +config MEMORY_MIRROR
>>> +    bool "Address range mirroring support"
>>> +    depends on X86 && NUMA
>>> +    default y
>>> +    help
>>> +      This feature depends on hardware and firmware support.
>>> +      ACPI or EFI records the mirror info.
>>
>> default y...no runtime influence when the user doesn't use memory mirror ?
>>
>
> It is a new feature, so how about like this: default y -> n?
>

It's okay to me. But it's better to check performance impact before merge
because you modified core code of memory management.

Thanks,
-Kame
  



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 08/12] mm: use mirrorable to switch allocate mirrored memory
  2015-06-09 10:09     ` Xishi Qiu
@ 2015-06-10  3:09       ` Kamezawa Hiroyuki
  0 siblings, 0 replies; 62+ messages in thread
From: Kamezawa Hiroyuki @ 2015-06-10  3:09 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Andrew Morton, nao.horiguchi, Yinghai Lu, H. Peter Anvin,
	Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo, Luck, Tony,
	Linux MM, LKML

On 2015/06/09 19:09, Xishi Qiu wrote:
> On 2015/6/9 15:06, Kamezawa Hiroyuki wrote:
>
>> On 2015/06/04 22:02, Xishi Qiu wrote:
>>> Add a new interface in path /proc/sys/vm/mirrorable. When set to 1, it means
>>> we should allocate mirrored memory for both user and kernel processes.
>>>
>>> Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
>>
>> I can't see why do we need this switch. If this is set, all GFP_HIGHUSER will use
>> mirrored memory ?
>>
>> Or will you add special MMAP/madvise flag to use mirrored memory ?
>>
>
> Hi Kame,
>
> Yes,
>
> MMAP/madvise
> 	-> add VM_MIRROR
> 		-> add GFP_MIRROR
> 			-> use MIGRATE_MIRROR list to alloc mirrored pages
>
> So user can use mirrored memory. What do you think?
>

I see. please explain it (your final plan) in patch description or in cover page of patches.

Thanks,
-Kame



^ permalink raw reply	[flat|nested] 62+ messages in thread

* RE: [RFC PATCH 10/12] mm: add the buddy system interface
  2015-06-10  3:06       ` Kamezawa Hiroyuki
@ 2015-06-10 20:40         ` Luck, Tony
  2015-06-15  8:47           ` Kamezawa Hiroyuki
  2015-06-25  9:44         ` Xishi Qiu
  1 sibling, 1 reply; 62+ messages in thread
From: Luck, Tony @ 2015-06-10 20:40 UTC (permalink / raw)
  To: Kamezawa Hiroyuki, Xishi Qiu
  Cc: Andrew Morton, nao.horiguchi, Yinghai Lu, H. Peter Anvin,
	Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo, Linux MM, LKML

> I guess, mirrored memory should be allocated if !__GFP_HIGHMEM or !__GFP_MOVABLE

HIGHMEM shouldn't matter - partial memory mirror only makes any sense on X86_64 systems ... 32-bit kernels
don't even boot on systems with 64GB, and the minimum rational configuration for a machine that supports
mirror is 128GB (4 cpu sockets * 2 memory controller per socket * 4 channels per controller * 4GB DIMM ...
leaving any channels empty likely leaves you short of memory bandwidth for these high core count processors).

MOVABLE is mostly the opposite of MIRROR - we never want to fill a kernel allocation from a MOVABLE page. I
want all kernel allocations to be from MIRROR.

-Tony



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 08/12] mm: use mirrorable to switch allocate mirrored memory
  2015-06-04 13:02 ` [RFC PATCH 08/12] mm: use mirrorable to switch allocate mirrored memory Xishi Qiu
                     ` (2 preceding siblings ...)
  2015-06-09  7:06   ` Kamezawa Hiroyuki
@ 2015-06-12  8:05   ` Naoya Horiguchi
  3 siblings, 0 replies; 62+ messages in thread
From: Naoya Horiguchi @ 2015-06-12  8:05 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Andrew Morton, nao.horiguchi, Yinghai Lu, H. Peter Anvin,
	Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo, Luck, Tony,
	Linux MM, LKML

On Thu, Jun 04, 2015 at 09:02:49PM +0800, Xishi Qiu wrote:
> Add a new interface in path /proc/sys/vm/mirrorable. When set to 1, it means
> we should allocate mirrored memory for both user and kernel processes.

As Dave and Kamezawa-san commented, documentation is not enough, so please
add a section in Documentation/sysctl/vm.txt for this new tuning parameter.

Thanks,
Naoya Horiguchi

> 
> Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
> ---
>  include/linux/mmzone.h | 1 +
>  kernel/sysctl.c        | 9 +++++++++
>  mm/page_alloc.c        | 1 +
>  3 files changed, 11 insertions(+)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index f82e3ae..20888dd 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -85,6 +85,7 @@ struct mirror_info {
>  };
>  
>  extern struct mirror_info mirror_info;
> +extern int sysctl_mirrorable;
>  #  define is_migrate_mirror(migratetype) unlikely((migratetype) == MIGRATE_MIRROR)
>  #else
>  #  define is_migrate_mirror(migratetype) false
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 2082b1a..dc2625e 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1514,6 +1514,15 @@ static struct ctl_table vm_table[] = {
>  		.extra2		= &one,
>  	},
>  #endif
> +#ifdef CONFIG_MEMORY_MIRROR
> +	{
> +		.procname	= "mirrorable",
> +		.data		= &sysctl_mirrorable,
> +		.maxlen		= sizeof(sysctl_mirrorable),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec_minmax,
> +	},
> +#endif
>  	{
>  		.procname	= "user_reserve_kbytes",
>  		.data		= &sysctl_user_reserve_kbytes,
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 249a8f6..63b90ca 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -212,6 +212,7 @@ int user_min_free_kbytes = -1;
>  
>  #ifdef CONFIG_MEMORY_MIRROR
>  struct mirror_info mirror_info;
> +int sysctl_mirrorable = 0;
>  #endif
>  
>  static unsigned long __meminitdata nr_kernel_pages;
> -- 
> 2.0.0
> 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations
  2015-06-04 12:54 [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations Xishi Qiu
                   ` (11 preceding siblings ...)
  2015-06-04 13:05 ` [RFC PATCH 12/12] mm: let slab/slub/slob use mirrored memory Xishi Qiu
@ 2015-06-12  8:42 ` Naoya Horiguchi
  2015-06-12  9:09   ` Xishi Qiu
  2015-06-12 19:03   ` Luck, Tony
  2015-06-16  7:53 ` Vlastimil Babka
  13 siblings, 2 replies; 62+ messages in thread
From: Naoya Horiguchi @ 2015-06-12  8:42 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Andrew Morton, nao.horiguchi, Yinghai Lu, H. Peter Anvin,
	Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo, Luck, Tony,
	Linux MM, LKML

On Thu, Jun 04, 2015 at 08:54:22PM +0800, Xishi Qiu wrote:
> Intel Xeon processor E7 v3 product family-based platforms introduces support
> for partial memory mirroring called as 'Address Range Mirroring'. This feature
> allows BIOS to specify a subset of total available memory to be mirrored (and
> optionally also specify whether to mirror the range 0-4 GB). This capability
> allows user to make an appropriate tradeoff between non-mirrored memory range
> and mirrored memory range thus optimizing total available memory and still
> achieving highly reliable memory range for mission critical workloads and/or
> kernel space.
> 
> Tony has already send a patchset to supprot this feature at boot time.
> https://lkml.org/lkml/2015/5/8/521
> 
> This patchset can support the feature after boot time. It introduces mirror_info
> to save the mirrored memory range. Then use __GFP_MIRROR to allocate mirrored 
> pages. 
> 
> I think add a new migratetype is btter and easier than a new zone, so I use
> MIGRATE_MIRROR to manage the mirrored pages. However it changed some code in the
> core file, please review and comment, thanks.
> 
> TBD: 
> 1) call add_mirror_info() to fill mirrored memory info.
> 2) add compatibility with memory online/offline.

Maybe simply disabling memory offlining of memory block including MIGRATE_MIRROR?

> 3) add more interface? others?

4?) I don't have the whole picture of how address ranging mirroring works,
but I'm curious about what happens when an uncorrected memory error happens
on the a mirror page. If HW/FW do some useful work invisible from kernel,
please document it somewhere. And my questions are:
 - can the kernel with this patchset really continue its operation without
   breaking consistency? More specifically, the corrupted page is replaced with
   its mirror page, but can any other pages which have references (like struct
   page or pfn) for the corrupted page properly switch these references to the
   mirror page? Or no worry about that?  (This is difficult for kernel pages
   like slab, and that's why currently hwpoison doesn't handle any kernel pages.)
 - How can we test/confirm that the whole scheme works fine?  Is current memory
   error injection framework enough?

It's really nice if any roadmap including testing is shared.

# And please CC me as n-horiguchi@ah.nec.com (my primary email address :)

Thanks,
Naoya Horiguchi

> Xishi Qiu (12):
>   mm: add a new config to manage the code
>   mm: introduce mirror_info
>   mm: introduce MIGRATE_MIRROR to manage the mirrored pages
>   mm: add mirrored pages to buddy system
>   mm: introduce a new zone_stat_item NR_FREE_MIRROR_PAGES
>   mm: add free mirrored pages info
>   mm: introduce __GFP_MIRROR to allocate mirrored pages
>   mm: use mirrorable to switch allocate mirrored memory
>   mm: enable allocate mirrored memory at boot time
>   mm: add the buddy system interface
>   mm: add the PCP interface
>   mm: let slab/slub/slob use mirrored memory
> 
>  arch/x86/mm/numa.c     |   3 ++
>  drivers/base/node.c    |  17 ++++---
>  fs/proc/meminfo.c      |   6 +++
>  include/linux/gfp.h    |   5 +-
>  include/linux/mmzone.h |  23 +++++++++
>  include/linux/vmstat.h |   2 +
>  kernel/sysctl.c        |   9 ++++
>  mm/Kconfig             |   8 +++
>  mm/page_alloc.c        | 134 ++++++++++++++++++++++++++++++++++++++++++++++---
>  mm/slab.c              |   3 +-
>  mm/slob.c              |   2 +-
>  mm/slub.c              |   2 +-
>  mm/vmstat.c            |   4 ++
>  13 files changed, 202 insertions(+), 16 deletions(-)
> 
> -- 
> 2.0.0
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations
  2015-06-12  8:42 ` [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations Naoya Horiguchi
@ 2015-06-12  9:09   ` Xishi Qiu
  2015-06-12 19:03   ` Luck, Tony
  1 sibling, 0 replies; 62+ messages in thread
From: Xishi Qiu @ 2015-06-12  9:09 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Andrew Morton, nao.horiguchi, Yinghai Lu, H. Peter Anvin,
	Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo, Luck, Tony,
	Linux MM, LKML, n-horiguchi

On 2015/6/12 16:42, Naoya Horiguchi wrote:

> On Thu, Jun 04, 2015 at 08:54:22PM +0800, Xishi Qiu wrote:
>> Intel Xeon processor E7 v3 product family-based platforms introduces support
>> for partial memory mirroring called as 'Address Range Mirroring'. This feature
>> allows BIOS to specify a subset of total available memory to be mirrored (and
>> optionally also specify whether to mirror the range 0-4 GB). This capability
>> allows user to make an appropriate tradeoff between non-mirrored memory range
>> and mirrored memory range thus optimizing total available memory and still
>> achieving highly reliable memory range for mission critical workloads and/or
>> kernel space.
>>
>> Tony has already send a patchset to supprot this feature at boot time.
>> https://lkml.org/lkml/2015/5/8/521
>>
>> This patchset can support the feature after boot time. It introduces mirror_info
>> to save the mirrored memory range. Then use __GFP_MIRROR to allocate mirrored 
>> pages. 
>>
>> I think add a new migratetype is btter and easier than a new zone, so I use
>> MIGRATE_MIRROR to manage the mirrored pages. However it changed some code in the
>> core file, please review and comment, thanks.
>>
>> TBD: 
>> 1) call add_mirror_info() to fill mirrored memory info.
>> 2) add compatibility with memory online/offline.
> 
> Maybe simply disabling memory offlining of memory block including MIGRATE_MIRROR?
> 
>> 3) add more interface? others?
> 
> 4?) I don't have the whole picture of how address ranging mirroring works,
> but I'm curious about what happens when an uncorrected memory error happens
> on the a mirror page. If HW/FW do some useful work invisible from kernel,
> please document it somewhere. And my questions are:

Hi Naoya,

I think the hardware and BIOS will do the work when page corrupted, and it is 
invisible to kernel. The kernel just use the mirrored memory (alloc pages in
special physical address).

Thanks,
Xishi Qiu

>  - can the kernel with this patchset really continue its operation without
>    breaking consistency? More specifically, the corrupted page is replaced with
>    its mirror page, but can any other pages which have references (like struct
>    page or pfn) for the corrupted page properly switch these references to the
>    mirror page? Or no worry about that?  (This is difficult for kernel pages
>    like slab, and that's why currently hwpoison doesn't handle any kernel pages.)
>  - How can we test/confirm that the whole scheme works fine?  Is current memory
>    error injection framework enough?
> 
> It's really nice if any roadmap including testing is shared.
> 
> # And please CC me as n-horiguchi@ah.nec.com (my primary email address :)
> 
> Thanks,
> Naoya Horiguchi
> 
>> Xishi Qiu (12):
>>   mm: add a new config to manage the code
>>   mm: introduce mirror_info
>>   mm: introduce MIGRATE_MIRROR to manage the mirrored pages
>>   mm: add mirrored pages to buddy system
>>   mm: introduce a new zone_stat_item NR_FREE_MIRROR_PAGES
>>   mm: add free mirrored pages info
>>   mm: introduce __GFP_MIRROR to allocate mirrored pages
>>   mm: use mirrorable to switch allocate mirrored memory
>>   mm: enable allocate mirrored memory at boot time
>>   mm: add the buddy system interface
>>   mm: add the PCP interface
>>   mm: let slab/slub/slob use mirrored memory
>>
>>  arch/x86/mm/numa.c     |   3 ++
>>  drivers/base/node.c    |  17 ++++---
>>  fs/proc/meminfo.c      |   6 +++
>>  include/linux/gfp.h    |   5 +-
>>  include/linux/mmzone.h |  23 +++++++++
>>  include/linux/vmstat.h |   2 +
>>  kernel/sysctl.c        |   9 ++++
>>  mm/Kconfig             |   8 +++
>>  mm/page_alloc.c        | 134 ++++++++++++++++++++++++++++++++++++++++++++++---
>>  mm/slab.c              |   3 +-
>>  mm/slob.c              |   2 +-
>>  mm/slub.c              |   2 +-
>>  mm/vmstat.c            |   4 ++
>>  13 files changed, 202 insertions(+), 16 deletions(-)
>>
>> -- 
>> 2.0.0
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
> .
> 




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations
  2015-06-12  8:42 ` [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations Naoya Horiguchi
  2015-06-12  9:09   ` Xishi Qiu
@ 2015-06-12 19:03   ` Luck, Tony
  2015-06-15  0:25     ` Naoya Horiguchi
  1 sibling, 1 reply; 62+ messages in thread
From: Luck, Tony @ 2015-06-12 19:03 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Xishi Qiu, Andrew Morton, nao.horiguchi, Yinghai Lu,
	H. Peter Anvin, Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo,
	Linux MM, LKML

On Fri, Jun 12, 2015 at 08:42:33AM +0000, Naoya Horiguchi wrote:
> 4?) I don't have the whole picture of how address ranging mirroring works,
> but I'm curious about what happens when an uncorrected memory error happens
> on the a mirror page. If HW/FW do some useful work invisible from kernel,
> please document it somewhere. And my questions are:
>  - can the kernel with this patchset really continue its operation without
>    breaking consistency? More specifically, the corrupted page is replaced with
>    its mirror page, but can any other pages which have references (like struct
>    page or pfn) for the corrupted page properly switch these references to the
>    mirror page? Or no worry about that?  (This is difficult for kernel pages
>    like slab, and that's why currently hwpoison doesn't handle any kernel pages.)

The mirror is operated by h/w (perhaps with some platform firmware
intervention when things start breaking badly).

In normal operation there are two DIMM addresses backing each
system physical address in the mirrored range (thus total system
memory capacity is reduced when mirror is enabled).  Memory writes
are directed to both locations. Memory reads are interleaved to
maintain bandwidth, so could come from either address.

When a read returns with an ECC failure the h/w automatically:
 1) Re-issues the read to the other DIMM address. If that also fails - then
    we do the normal machine check processing for an uncorrected error
 2) But if the other side of the mirror is good, we can send the good
    data to the reader (cpu, or dma) and, in parallel try to fix the
    bad side by writing the good data to it.
 3) A corrected error will be logged, it may indicate whether the
    attempt to fix succeeded or not.
 4) If platform firmware wants, it can be notified of the correction
    and it may keep statistics on the rate of errors, correction status,
    etc.  If things get very bad it may "break" the mirror and direct
    all future reads to the remaining "good" side. If does this it will
    likely tell the OS via some ACPI method.

All of this is done at much less than page granularity. Cache coherence
is maintained ... apart from some small performance glitches and the corrected
error logs, the OS is unware of all of this.

Note that in current implementations the mirror copies are both behind
the same memory controller ... so this isn't intended to cope with high
level failure of a memory controller ... just to deal with randomly
distributed ECC errors.

>  - How can we test/confirm that the whole scheme works fine?  Is current memory
>    error injection framework enough?

Still working on that piece. To validate you need to be able to
inject errors to just one side of the mirror, and I'm not really
sure that the ACPI/EINJ interface is up to the task.

-Tony

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations
  2015-06-12 19:03   ` Luck, Tony
@ 2015-06-15  0:25     ` Naoya Horiguchi
  0 siblings, 0 replies; 62+ messages in thread
From: Naoya Horiguchi @ 2015-06-15  0:25 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Xishi Qiu, Andrew Morton, nao.horiguchi, Yinghai Lu,
	H. Peter Anvin, Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo,
	Linux MM, LKML

On Fri, Jun 12, 2015 at 12:03:35PM -0700, Luck, Tony wrote:
> On Fri, Jun 12, 2015 at 08:42:33AM +0000, Naoya Horiguchi wrote:
> > 4?) I don't have the whole picture of how address ranging mirroring works,
> > but I'm curious about what happens when an uncorrected memory error happens
> > on the a mirror page. If HW/FW do some useful work invisible from kernel,
> > please document it somewhere. And my questions are:
> >  - can the kernel with this patchset really continue its operation without
> >    breaking consistency? More specifically, the corrupted page is replaced with
> >    its mirror page, but can any other pages which have references (like struct
> >    page or pfn) for the corrupted page properly switch these references to the
> >    mirror page? Or no worry about that?  (This is difficult for kernel pages
> >    like slab, and that's why currently hwpoison doesn't handle any kernel pages.)
> 
> The mirror is operated by h/w (perhaps with some platform firmware
> intervention when things start breaking badly).
> 
> In normal operation there are two DIMM addresses backing each
> system physical address in the mirrored range (thus total system
> memory capacity is reduced when mirror is enabled).  Memory writes
> are directed to both locations. Memory reads are interleaved to
> maintain bandwidth, so could come from either address.

I misunderstood that both of mirrored page and mirroring page are visible
to OS, which is incorrect.

> When a read returns with an ECC failure the h/w automatically:
>  1) Re-issues the read to the other DIMM address. If that also fails - then
>     we do the normal machine check processing for an uncorrected error
>  2) But if the other side of the mirror is good, we can send the good
>     data to the reader (cpu, or dma) and, in parallel try to fix the
>     bad side by writing the good data to it.
>  3) A corrected error will be logged, it may indicate whether the
>     attempt to fix succeeded or not.
>  4) If platform firmware wants, it can be notified of the correction
>     and it may keep statistics on the rate of errors, correction status,
>     etc.  If things get very bad it may "break" the mirror and direct
>     all future reads to the remaining "good" side. If does this it will
>     likely tell the OS via some ACPI method.

Thanks, this fully answered my question. 

> All of this is done at much less than page granularity. Cache coherence
> is maintained ... apart from some small performance glitches and the corrected
> error logs, the OS is unware of all of this.
> 
> Note that in current implementations the mirror copies are both behind
> the same memory controller ... so this isn't intended to cope with high
> level failure of a memory controller ... just to deal with randomly
> distributed ECC errors.

OK, I looked at "Memory Address Range Mirroring Validation Guide" and Fig 2-2
clearly shows that.

> >  - How can we test/confirm that the whole scheme works fine?  Is current memory
> >    error injection framework enough?
> 
> Still working on that piece. To validate you need to be able to
> inject errors to just one side of the mirror, and I'm not really
> sure that the ACPI/EINJ interface is up to the task.

OK.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 10/12] mm: add the buddy system interface
  2015-06-10 20:40         ` Luck, Tony
@ 2015-06-15  8:47           ` Kamezawa Hiroyuki
  2015-06-15 17:20             ` Luck, Tony
  0 siblings, 1 reply; 62+ messages in thread
From: Kamezawa Hiroyuki @ 2015-06-15  8:47 UTC (permalink / raw)
  To: Luck, Tony, Xishi Qiu
  Cc: Andrew Morton, nao.horiguchi, Yinghai Lu, H. Peter Anvin,
	Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo, Linux MM, LKML

On 2015/06/11 5:40, Luck, Tony wrote:
>> I guess, mirrored memory should be allocated if !__GFP_HIGHMEM or !__GFP_MOVABLE
>
> HIGHMEM shouldn't matter - partial memory mirror only makes any sense on X86_64 systems ... 32-bit kernels
> don't even boot on systems with 64GB, and the minimum rational configuration for a machine that supports
> mirror is 128GB (4 cpu sockets * 2 memory controller per socket * 4 channels per controller * 4GB DIMM ...
> leaving any channels empty likely leaves you short of memory bandwidth for these high core count processors).
>
> MOVABLE is mostly the opposite of MIRROR - we never want to fill a kernel allocation from a MOVABLE page. I
> want all kernel allocations to be from MIRROR.
>

So, there are 3 ideas.

  (1) kernel only from MIRROR / user only from MOVABLE (Tony)
  (2) kernel only from MIRROR / user from MOVABLE + MIRROR(ASAP)  (AKPM suggested)
      This makes use of the fact MOVABLE memory is reclaimable but Tony pointed out
      the memory reclaim can be critical for GFP_ATOMIC.
  (3) kernel only from MIRROR / user from MOVABLE, special user from MIRROR (Xishi)

2 Implementation ideas.
   - creating ZONE
   - creating new alloation attribute

I don't convince whether we need some new structure in mm. Isn't it good to use
ZONE_MOVABLE for not-mirrored memory ?
Then, disable fallback from ZONE_MOVABLE -> ZONE_NORMAL for (1) and (3)

Thanks,
-Kame



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 10/12] mm: add the buddy system interface
  2015-06-15  8:47           ` Kamezawa Hiroyuki
@ 2015-06-15 17:20             ` Luck, Tony
  2015-06-16  0:31               ` Kamezawa Hiroyuki
  0 siblings, 1 reply; 62+ messages in thread
From: Luck, Tony @ 2015-06-15 17:20 UTC (permalink / raw)
  To: Kamezawa Hiroyuki
  Cc: Xishi Qiu, Andrew Morton, nao.horiguchi, Yinghai Lu,
	H. Peter Anvin, Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo,
	Linux MM, LKML

On Mon, Jun 15, 2015 at 05:47:27PM +0900, Kamezawa Hiroyuki wrote:
> So, there are 3 ideas.
> 
>  (1) kernel only from MIRROR / user only from MOVABLE (Tony)
>  (2) kernel only from MIRROR / user from MOVABLE + MIRROR(ASAP)  (AKPM suggested)
>      This makes use of the fact MOVABLE memory is reclaimable but Tony pointed out
>      the memory reclaim can be critical for GFP_ATOMIC.
>  (3) kernel only from MIRROR / user from MOVABLE, special user from MIRROR (Xishi)
> 
> 2 Implementation ideas.
>   - creating ZONE
>   - creating new alloation attribute
> 
> I don't convince whether we need some new structure in mm. Isn't it good to use
> ZONE_MOVABLE for not-mirrored memory ?
> Then, disable fallback from ZONE_MOVABLE -> ZONE_NORMAL for (1) and (3)

We might need to rename it ... right now the memory hotplug
people use ZONE_MOVABLE to indicate regions of physical memory
that can be removed from the system.  I'm wondering whether
people will want systems that have both removable and mirrored
areas?  Then we have four attribute combinations:

mirror=no  removable=no  - prefer to use for user, could use for kernel if we run out of mirror
mirror=no  removable=yes - can only be used for user (kernel allocation makes it not-removable)
mirror=yes removable=no  - use for kernel, possibly for special users if we define some interface
mirror=yes removable=yes - must not use for kernel ... would have to give to user ... seems like a bad idea to configure a system this way

-Tony

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 10/12] mm: add the buddy system interface
  2015-06-15 17:20             ` Luck, Tony
@ 2015-06-16  0:31               ` Kamezawa Hiroyuki
  0 siblings, 0 replies; 62+ messages in thread
From: Kamezawa Hiroyuki @ 2015-06-16  0:31 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Xishi Qiu, Andrew Morton, nao.horiguchi, Yinghai Lu,
	H. Peter Anvin, Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo,
	Linux MM, LKML

On 2015/06/16 2:20, Luck, Tony wrote:
> On Mon, Jun 15, 2015 at 05:47:27PM +0900, Kamezawa Hiroyuki wrote:
>> So, there are 3 ideas.
>>
>>   (1) kernel only from MIRROR / user only from MOVABLE (Tony)
>>   (2) kernel only from MIRROR / user from MOVABLE + MIRROR(ASAP)  (AKPM suggested)
>>       This makes use of the fact MOVABLE memory is reclaimable but Tony pointed out
>>       the memory reclaim can be critical for GFP_ATOMIC.
>>   (3) kernel only from MIRROR / user from MOVABLE, special user from MIRROR (Xishi)
>>
>> 2 Implementation ideas.
>>    - creating ZONE
>>    - creating new alloation attribute
>>
>> I don't convince whether we need some new structure in mm. Isn't it good to use
>> ZONE_MOVABLE for not-mirrored memory ?
>> Then, disable fallback from ZONE_MOVABLE -> ZONE_NORMAL for (1) and (3)
>
> We might need to rename it ... right now the memory hotplug
> people use ZONE_MOVABLE to indicate regions of physical memory
> that can be removed from the system.  I'm wondering whether
> people will want systems that have both removable and mirrored
> areas?  Then we have four attribute combinations:
>
> mirror=no  removable=no  - prefer to use for user, could use for kernel if we run out of mirror
> mirror=no  removable=yes - can only be used for user (kernel allocation makes it not-removable)
> mirror=yes removable=no  - use for kernel, possibly for special users if we define some interface
> mirror=yes removable=yes - must not use for kernel ... would have to give to user ... seems like a bad idea to configure a system this way
>

Thank you for clarification. I see "mirror=no, removable=no" case may require a new name.

IMHO, the value of Address-Based-Memory-Mirror is that users can protect their system's
important functions without using full-memory mirror. So, I feel thinking
"mirror=no, removable=no" just makes our discussion/implemenation complex without real
user value.

Shouldn't we start with just thiking 2 cases of
  mirror=no  removable=yes
  mirror=yes removable=no
?

And then, if the naming is problem, alias name can be added.

Thanks,
-Kame







^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations
  2015-06-04 12:54 [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations Xishi Qiu
                   ` (12 preceding siblings ...)
  2015-06-12  8:42 ` [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations Naoya Horiguchi
@ 2015-06-16  7:53 ` Vlastimil Babka
  2015-06-16  8:17   ` Xishi Qiu
  13 siblings, 1 reply; 62+ messages in thread
From: Vlastimil Babka @ 2015-06-16  7:53 UTC (permalink / raw)
  To: Xishi Qiu, Andrew Morton, nao.horiguchi, Yinghai Lu,
	H. Peter Anvin, Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo,
	Luck, Tony
  Cc: Linux MM, LKML

On 06/04/2015 02:54 PM, Xishi Qiu wrote:
> Intel Xeon processor E7 v3 product family-based platforms introduces support
> for partial memory mirroring called as 'Address Range Mirroring'. This feature
> allows BIOS to specify a subset of total available memory to be mirrored (and
> optionally also specify whether to mirror the range 0-4 GB). This capability
> allows user to make an appropriate tradeoff between non-mirrored memory range
> and mirrored memory range thus optimizing total available memory and still
> achieving highly reliable memory range for mission critical workloads and/or
> kernel space.
> 
> Tony has already send a patchset to supprot this feature at boot time.
> https://lkml.org/lkml/2015/5/8/521
> 
> This patchset can support the feature after boot time. It introduces mirror_info
> to save the mirrored memory range. Then use __GFP_MIRROR to allocate mirrored 
> pages. 
> 
> I think add a new migratetype is btter and easier than a new zone, so I use

If the mirrored memory is in a single reasonably compact (no large holes) range
(per NUMA node) and won't dynamically change its size, then zone might be a
better option. For one thing, it will still allow distinguishing movable and
unmovable allocations within the mirrored memory.

We had enough fun with MIGRATE_CMA and all kinds of checks it added to allocator
hot paths, and even CMA is now considering moving to a separate zone.

> MIGRATE_MIRROR to manage the mirrored pages. However it changed some code in the
> core file, please review and comment, thanks.
> 
> TBD: 
> 1) call add_mirror_info() to fill mirrored memory info.
> 2) add compatibility with memory online/offline.
> 3) add more interface? others?
> 
> Xishi Qiu (12):
>   mm: add a new config to manage the code
>   mm: introduce mirror_info
>   mm: introduce MIGRATE_MIRROR to manage the mirrored pages
>   mm: add mirrored pages to buddy system
>   mm: introduce a new zone_stat_item NR_FREE_MIRROR_PAGES
>   mm: add free mirrored pages info
>   mm: introduce __GFP_MIRROR to allocate mirrored pages
>   mm: use mirrorable to switch allocate mirrored memory
>   mm: enable allocate mirrored memory at boot time
>   mm: add the buddy system interface
>   mm: add the PCP interface
>   mm: let slab/slub/slob use mirrored memory
> 
>  arch/x86/mm/numa.c     |   3 ++
>  drivers/base/node.c    |  17 ++++---
>  fs/proc/meminfo.c      |   6 +++
>  include/linux/gfp.h    |   5 +-
>  include/linux/mmzone.h |  23 +++++++++
>  include/linux/vmstat.h |   2 +
>  kernel/sysctl.c        |   9 ++++
>  mm/Kconfig             |   8 +++
>  mm/page_alloc.c        | 134 ++++++++++++++++++++++++++++++++++++++++++++++---
>  mm/slab.c              |   3 +-
>  mm/slob.c              |   2 +-
>  mm/slub.c              |   2 +-
>  mm/vmstat.c            |   4 ++
>  13 files changed, 202 insertions(+), 16 deletions(-)
> 


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations
  2015-06-16  7:53 ` Vlastimil Babka
@ 2015-06-16  8:17   ` Xishi Qiu
  2015-06-16  9:46     ` Vlastimil Babka
  0 siblings, 1 reply; 62+ messages in thread
From: Xishi Qiu @ 2015-06-16  8:17 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, nao.horiguchi, Yinghai Lu, H. Peter Anvin,
	Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo, Luck, Tony,
	Linux MM, LKML

On 2015/6/16 15:53, Vlastimil Babka wrote:

> On 06/04/2015 02:54 PM, Xishi Qiu wrote:
>> Intel Xeon processor E7 v3 product family-based platforms introduces support
>> for partial memory mirroring called as 'Address Range Mirroring'. This feature
>> allows BIOS to specify a subset of total available memory to be mirrored (and
>> optionally also specify whether to mirror the range 0-4 GB). This capability
>> allows user to make an appropriate tradeoff between non-mirrored memory range
>> and mirrored memory range thus optimizing total available memory and still
>> achieving highly reliable memory range for mission critical workloads and/or
>> kernel space.
>>
>> Tony has already send a patchset to supprot this feature at boot time.
>> https://lkml.org/lkml/2015/5/8/521
>>
>> This patchset can support the feature after boot time. It introduces mirror_info
>> to save the mirrored memory range. Then use __GFP_MIRROR to allocate mirrored 
>> pages. 
>>
>> I think add a new migratetype is btter and easier than a new zone, so I use
> 
> If the mirrored memory is in a single reasonably compact (no large holes) range
> (per NUMA node) and won't dynamically change its size, then zone might be a
> better option. For one thing, it will still allow distinguishing movable and
> unmovable allocations within the mirrored memory.
> 
> We had enough fun with MIGRATE_CMA and all kinds of checks it added to allocator
> hot paths, and even CMA is now considering moving to a separate zone.
> 

Hi, how about the problem of this case:
e.g. node 0: 0-4G(dma and dma32)
     node 1: 4G-8G(normal), 8-12G(mirror), 12-16G(normal),
so more than one normal zone in a node? or normal zone just span the mirror zone?

Thanks,
Xishi Qiu


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations
  2015-06-16  8:17   ` Xishi Qiu
@ 2015-06-16  9:46     ` Vlastimil Babka
  2015-06-18  1:23       ` Xishi Qiu
  0 siblings, 1 reply; 62+ messages in thread
From: Vlastimil Babka @ 2015-06-16  9:46 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Andrew Morton, nao.horiguchi, Yinghai Lu, H. Peter Anvin,
	Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo, Luck, Tony,
	Linux MM, LKML

On 06/16/2015 10:17 AM, Xishi Qiu wrote:
> On 2015/6/16 15:53, Vlastimil Babka wrote:
> 
>> On 06/04/2015 02:54 PM, Xishi Qiu wrote:
>>>
>>> I think add a new migratetype is btter and easier than a new zone, so I use
>> 
>> If the mirrored memory is in a single reasonably compact (no large holes) range
>> (per NUMA node) and won't dynamically change its size, then zone might be a
>> better option. For one thing, it will still allow distinguishing movable and
>> unmovable allocations within the mirrored memory.
>> 
>> We had enough fun with MIGRATE_CMA and all kinds of checks it added to allocator
>> hot paths, and even CMA is now considering moving to a separate zone.
>> 
> 
> Hi, how about the problem of this case:
> e.g. node 0: 0-4G(dma and dma32)
>      node 1: 4G-8G(normal), 8-12G(mirror), 12-16G(normal),
> so more than one normal zone in a node? or normal zone just span the mirror zone?

Normal zone can span the mirror zone just fine. However, it will result in zone
scanners such as compaction to skip over the mirror zone inefficiently. Hmm...


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations
  2015-06-16  9:46     ` Vlastimil Babka
@ 2015-06-18  1:23       ` Xishi Qiu
  2015-06-18  5:58         ` Vlastimil Babka
  0 siblings, 1 reply; 62+ messages in thread
From: Xishi Qiu @ 2015-06-18  1:23 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, nao.horiguchi, Yinghai Lu, H. Peter Anvin,
	Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo, Luck, Tony,
	Linux MM, LKML

On 2015/6/16 17:46, Vlastimil Babka wrote:

> On 06/16/2015 10:17 AM, Xishi Qiu wrote:
>> On 2015/6/16 15:53, Vlastimil Babka wrote:
>>
>>> On 06/04/2015 02:54 PM, Xishi Qiu wrote:
>>>>
>>>> I think add a new migratetype is btter and easier than a new zone, so I use
>>>
>>> If the mirrored memory is in a single reasonably compact (no large holes) range
>>> (per NUMA node) and won't dynamically change its size, then zone might be a
>>> better option. For one thing, it will still allow distinguishing movable and
>>> unmovable allocations within the mirrored memory.
>>>
>>> We had enough fun with MIGRATE_CMA and all kinds of checks it added to allocator
>>> hot paths, and even CMA is now considering moving to a separate zone.
>>>
>>
>> Hi, how about the problem of this case:
>> e.g. node 0: 0-4G(dma and dma32)
>>      node 1: 4G-8G(normal), 8-12G(mirror), 12-16G(normal),
>> so more than one normal zone in a node? or normal zone just span the mirror zone?
> 
> Normal zone can span the mirror zone just fine. However, it will result in zone
> scanners such as compaction to skip over the mirror zone inefficiently. Hmm...
> 

Hi Vlastimil,

If there are many mirror regions in one node, then it will be many holes in the
normal zone, is this fine?

Thanks,
Xishi Qiu

> 
> .
> 




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations
  2015-06-18  1:23       ` Xishi Qiu
@ 2015-06-18  5:58         ` Vlastimil Babka
  2015-06-18  9:37           ` Xishi Qiu
  0 siblings, 1 reply; 62+ messages in thread
From: Vlastimil Babka @ 2015-06-18  5:58 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Andrew Morton, nao.horiguchi, Yinghai Lu, H. Peter Anvin,
	Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo, Luck, Tony,
	Linux MM, LKML

On 18.6.2015 3:23, Xishi Qiu wrote:
> On 2015/6/16 17:46, Vlastimil Babka wrote:
> 
>> On 06/16/2015 10:17 AM, Xishi Qiu wrote:
>>> On 2015/6/16 15:53, Vlastimil Babka wrote:
>>>
>>>> On 06/04/2015 02:54 PM, Xishi Qiu wrote:
>>>>>
>>>>> I think add a new migratetype is btter and easier than a new zone, so I use
>>>>
>>>> If the mirrored memory is in a single reasonably compact (no large holes) range
>>>> (per NUMA node) and won't dynamically change its size, then zone might be a
>>>> better option. For one thing, it will still allow distinguishing movable and
>>>> unmovable allocations within the mirrored memory.
>>>>
>>>> We had enough fun with MIGRATE_CMA and all kinds of checks it added to allocator
>>>> hot paths, and even CMA is now considering moving to a separate zone.
>>>>
>>>
>>> Hi, how about the problem of this case:
>>> e.g. node 0: 0-4G(dma and dma32)
>>>      node 1: 4G-8G(normal), 8-12G(mirror), 12-16G(normal),
>>> so more than one normal zone in a node? or normal zone just span the mirror zone?
>>
>> Normal zone can span the mirror zone just fine. However, it will result in zone
>> scanners such as compaction to skip over the mirror zone inefficiently. Hmm...

On the other hand, it would skip just as inefficiently over MIGRATE_MIRROR
pageblocks within a Normal zone. Since migrating pages between MIGRATE_MIRROR
and other types pageblocks would violate what the allocations requested.

Having separate zone instead would allow compaction to run specifically on the
zone and defragment movable allocations there (i.e. userspace pages if/when
userspace requesting mirrored memory is supported).

>>
> 
> Hi Vlastimil,
> 
> If there are many mirror regions in one node, then it will be many holes in the
> normal zone, is this fine?

Yeah, it doesn't matter how many holes there are.

> Thanks,
> Xishi Qiu
> 
>>
>> .
>>
> 
> 
> 


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations
  2015-06-18  5:58         ` Vlastimil Babka
@ 2015-06-18  9:37           ` Xishi Qiu
  2015-06-18  9:55             ` Vlastimil Babka
  0 siblings, 1 reply; 62+ messages in thread
From: Xishi Qiu @ 2015-06-18  9:37 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, nao.horiguchi, Yinghai Lu, H. Peter Anvin,
	Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo, Luck, Tony,
	Linux MM, LKML

On 2015/6/18 13:58, Vlastimil Babka wrote:

> On 18.6.2015 3:23, Xishi Qiu wrote:
>> On 2015/6/16 17:46, Vlastimil Babka wrote:
>>
>>> On 06/16/2015 10:17 AM, Xishi Qiu wrote:
>>>> On 2015/6/16 15:53, Vlastimil Babka wrote:
>>>>
>>>>> On 06/04/2015 02:54 PM, Xishi Qiu wrote:
>>>>>>
>>>>>> I think add a new migratetype is btter and easier than a new zone, so I use
>>>>>
>>>>> If the mirrored memory is in a single reasonably compact (no large holes) range
>>>>> (per NUMA node) and won't dynamically change its size, then zone might be a
>>>>> better option. For one thing, it will still allow distinguishing movable and
>>>>> unmovable allocations within the mirrored memory.
>>>>>
>>>>> We had enough fun with MIGRATE_CMA and all kinds of checks it added to allocator
>>>>> hot paths, and even CMA is now considering moving to a separate zone.
>>>>>
>>>>
>>>> Hi, how about the problem of this case:
>>>> e.g. node 0: 0-4G(dma and dma32)
>>>>      node 1: 4G-8G(normal), 8-12G(mirror), 12-16G(normal),
>>>> so more than one normal zone in a node? or normal zone just span the mirror zone?
>>>
>>> Normal zone can span the mirror zone just fine. However, it will result in zone
>>> scanners such as compaction to skip over the mirror zone inefficiently. Hmm...
> 
> On the other hand, it would skip just as inefficiently over MIGRATE_MIRROR
> pageblocks within a Normal zone. Since migrating pages between MIGRATE_MIRROR
> and other types pageblocks would violate what the allocations requested.
> 
> Having separate zone instead would allow compaction to run specifically on the
> zone and defragment movable allocations there (i.e. userspace pages if/when
> userspace requesting mirrored memory is supported).
> 
>>>
>>
>> Hi Vlastimil,
>>
>> If there are many mirror regions in one node, then it will be many holes in the
>> normal zone, is this fine?
> 
> Yeah, it doesn't matter how many holes there are.

So mirror zone and normal zone will span each other, right?

e.g. node 1: 4G-8G(normal), 8-12G(mirror), 12-16G(normal), 16-24G(mirror), 24-28G(normal) ...
normal: start=4G, size=28-4=24G,
mirror: start=8G, size=24-8=16G,

I think zone is defined according to the special address range, like 16M(DMA), 4G(DMA32),
and is it appropriate to add a new mirror zone with a volatile physical address?

Thanks,
Xishi Qiu


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations
  2015-06-18  9:37           ` Xishi Qiu
@ 2015-06-18  9:55             ` Vlastimil Babka
  2015-06-18 20:33               ` Luck, Tony
  0 siblings, 1 reply; 62+ messages in thread
From: Vlastimil Babka @ 2015-06-18  9:55 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Andrew Morton, nao.horiguchi, Yinghai Lu, H. Peter Anvin,
	Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo, Luck, Tony,
	Linux MM, LKML

On 06/18/2015 11:37 AM, Xishi Qiu wrote:
> On 2015/6/18 13:58, Vlastimil Babka wrote:
>
>> On 18.6.2015 3:23, Xishi Qiu wrote:
>>> On 2015/6/16 17:46, Vlastimil Babka wrote:
>>>
>>
>> On the other hand, it would skip just as inefficiently over MIGRATE_MIRROR
>> pageblocks within a Normal zone. Since migrating pages between MIGRATE_MIRROR
>> and other types pageblocks would violate what the allocations requested.
>>
>> Having separate zone instead would allow compaction to run specifically on the
>> zone and defragment movable allocations there (i.e. userspace pages if/when
>> userspace requesting mirrored memory is supported).
>>
>>>>
>>>
>>> Hi Vlastimil,
>>>
>>> If there are many mirror regions in one node, then it will be many holes in the
>>> normal zone, is this fine?
>>
>> Yeah, it doesn't matter how many holes there are.
>
> So mirror zone and normal zone will span each other, right?
>
> e.g. node 1: 4G-8G(normal), 8-12G(mirror), 12-16G(normal), 16-24G(mirror), 24-28G(normal) ...
> normal: start=4G, size=28-4=24G,
> mirror: start=8G, size=24-8=16G,

Yes, that works. It's somewhat unfortunate wrt performance that the 
hardware does it like this though.

> I think zone is defined according to the special address range, like 16M(DMA), 4G(DMA32),

Traditionally yes. But then there is ZONE_MOVABLE, this year's LSF/MM we 
discussed (and didn't outright deny) ZONE_CMA...
I'm not saying others will favour the new zone approach though, it's 
just my opinion that it might be a better option than a new migratetype.

> and is it appropriate to add a new mirror zone with a volatile physical address?

By "volatile" you mean what, that the example above would change 
dynamically? That would be rather challenging...

> Thanks,
> Xishi Qiu
>


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations
  2015-06-18  9:55             ` Vlastimil Babka
@ 2015-06-18 20:33               ` Luck, Tony
  2015-06-19  1:36                 ` Xishi Qiu
  0 siblings, 1 reply; 62+ messages in thread
From: Luck, Tony @ 2015-06-18 20:33 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Xishi Qiu, Andrew Morton, nao.horiguchi, Yinghai Lu,
	H. Peter Anvin, Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo,
	Linux MM, LKML

On Thu, Jun 18, 2015 at 11:55:42AM +0200, Vlastimil Babka wrote:
> >>>If there are many mirror regions in one node, then it will be many holes in the
> >>>normal zone, is this fine?
> >>
> >>Yeah, it doesn't matter how many holes there are.
> >
> >So mirror zone and normal zone will span each other, right?
> >
> >e.g. node 1: 4G-8G(normal), 8-12G(mirror), 12-16G(normal), 16-24G(mirror), 24-28G(normal) ...
> >normal: start=4G, size=28-4=24G,
> >mirror: start=8G, size=24-8=16G,
> 
> Yes, that works. It's somewhat unfortunate wrt performance that the hardware
> does it like this though.

With current Xeon h/w you can have one mirrored range per memory
controller ... and there are two memory controllers on a cpu socket,
so two mirrored ranges per node.  So a map might look like:

SKT0: MC0: 0-2G Mirrored (but we may want to ignore mirror here to keep it for ZONE_DMA)
SKT0: MC0: 2G-4G No memory ... I/O mapping area
SKT0: MC0: 4G-34G Not mirrored
SKT0: MC1: 34G-40G Mirrored
SKT0: MC1: 40G-66G Not mirrored

SKT1: MC0: 66G-70G Mirror
SKT1: MC0: 70G-98G Not Mirrored
SKT1: MC1: 98G-102G Mirror
SKT1: MC1: 102G-130G Not Mirrored

... and so on.

> >I think zone is defined according to the special address range, like 16M(DMA), 4G(DMA32),
> 
> Traditionally yes. But then there is ZONE_MOVABLE, this year's LSF/MM we
> discussed (and didn't outright deny) ZONE_CMA...
> I'm not saying others will favour the new zone approach though, it's just my
> opinion that it might be a better option than a new migratetype.

If we are going to have lots of zones ... then perhaps we will
need a fast way to look at a "struct page" and decide which zone
it belongs to.  Complicated math on the address deosn't sound ideal.
If the complex zone model is just for 64-bit, are there enough bits
available in page->flags (3 bits for 8 options ... which we are close
to filling now ... 4 bits for future breathing room).

> >and is it appropriate to add a new mirror zone with a volatile physical address?
> 
> By "volatile" you mean what, that the example above would change
> dynamically? That would be rather challenging...

If we hot-add another cpu together with on die memory controllers connected
to more memory ... then some of the new memory might be mirrored.  Current
h/w doesn't allow mirrored areas to grow/shrink (though if there are a lot
of errors we may break a mirror so a whole range could lose the mirror attribute).

-Tony

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations
  2015-06-18 20:33               ` Luck, Tony
@ 2015-06-19  1:36                 ` Xishi Qiu
  2015-06-19 18:42                   ` Luck, Tony
  0 siblings, 1 reply; 62+ messages in thread
From: Xishi Qiu @ 2015-06-19  1:36 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Vlastimil Babka, Andrew Morton, nao.horiguchi, Yinghai Lu,
	H. Peter Anvin, Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo,
	Linux MM, LKML

On 2015/6/19 4:33, Luck, Tony wrote:

> On Thu, Jun 18, 2015 at 11:55:42AM +0200, Vlastimil Babka wrote:
>>>>> If there are many mirror regions in one node, then it will be many holes in the
>>>>> normal zone, is this fine?
>>>>
>>>> Yeah, it doesn't matter how many holes there are.
>>>
>>> So mirror zone and normal zone will span each other, right?
>>>
>>> e.g. node 1: 4G-8G(normal), 8-12G(mirror), 12-16G(normal), 16-24G(mirror), 24-28G(normal) ...
>>> normal: start=4G, size=28-4=24G,
>>> mirror: start=8G, size=24-8=16G,
>>
>> Yes, that works. It's somewhat unfortunate wrt performance that the hardware
>> does it like this though.
> 
> With current Xeon h/w you can have one mirrored range per memory
> controller ... and there are two memory controllers on a cpu socket,
> so two mirrored ranges per node.  So a map might look like:
> 
> SKT0: MC0: 0-2G Mirrored (but we may want to ignore mirror here to keep it for ZONE_DMA)
> SKT0: MC0: 2G-4G No memory ... I/O mapping area
> SKT0: MC0: 4G-34G Not mirrored
> SKT0: MC1: 34G-40G Mirrored
> SKT0: MC1: 40G-66G Not mirrored
> 
> SKT1: MC0: 66G-70G Mirror
> SKT1: MC0: 70G-98G Not Mirrored
> SKT1: MC1: 98G-102G Mirror
> SKT1: MC1: 102G-130G Not Mirrored
> 
> ... and so on.
> 
>>> I think zone is defined according to the special address range, like 16M(DMA), 4G(DMA32),
>>
>> Traditionally yes. But then there is ZONE_MOVABLE, this year's LSF/MM we
>> discussed (and didn't outright deny) ZONE_CMA...
>> I'm not saying others will favour the new zone approach though, it's just my
>> opinion that it might be a better option than a new migratetype.
> 
> If we are going to have lots of zones ... then perhaps we will
> need a fast way to look at a "struct page" and decide which zone
> it belongs to.  Complicated math on the address deosn't sound ideal.
> If the complex zone model is just for 64-bit, are there enough bits
> available in page->flags (3 bits for 8 options ... which we are close
> to filling now ... 4 bits for future breathing room).
> 
>>> and is it appropriate to add a new mirror zone with a volatile physical address?
>>
>> By "volatile" you mean what, that the example above would change
>> dynamically? That would be rather challenging...
> 
> If we hot-add another cpu together with on die memory controllers connected
> to more memory ... then some of the new memory might be mirrored.  Current
> h/w doesn't allow mirrored areas to grow/shrink (though if there are a lot
> of errors we may break a mirror so a whole range could lose the mirror attribute).
> 
> -Tony
> 

Hi Tony,

What's your suggestions? a new zone or a new migratetype?
Maybe add a new zone will change more mm code.

Thanks,
Xishi Qiu

> .
> 




^ permalink raw reply	[flat|nested] 62+ messages in thread

* RE: [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations
  2015-06-19  1:36                 ` Xishi Qiu
@ 2015-06-19 18:42                   ` Luck, Tony
  0 siblings, 0 replies; 62+ messages in thread
From: Luck, Tony @ 2015-06-19 18:42 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Vlastimil Babka, Andrew Morton, nao.horiguchi, Yinghai Lu,
	H. Peter Anvin, Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo,
	Linux MM, LKML

> What's your suggestions? a new zone or a new migratetype?
> Maybe add a new zone will change more mm code.

I don't understand this code well enough (yet) to make a recommendation.  I think
our primary concern may not be "how much code we change", but more "how can
we minimize the run-time impact on systems that don't have any mirrored memory.

Just putting all the heavy work behind a CONFIG option isn't sufficient ... we want
enterprise distributions to ship with the option turned on ... even though most
machines won't be using this feature.

-Tony


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 10/12] mm: add the buddy system interface
  2015-06-10  3:06       ` Kamezawa Hiroyuki
  2015-06-10 20:40         ` Luck, Tony
@ 2015-06-25  9:44         ` Xishi Qiu
  2015-06-25 23:54           ` Kamezawa Hiroyuki
  1 sibling, 1 reply; 62+ messages in thread
From: Xishi Qiu @ 2015-06-25  9:44 UTC (permalink / raw)
  To: Kamezawa Hiroyuki
  Cc: Andrew Morton, nao.horiguchi, Yinghai Lu, H. Peter Anvin,
	Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo, Luck, Tony,
	Linux MM, LKML

On 2015/6/10 11:06, Kamezawa Hiroyuki wrote:

> On 2015/06/09 19:04, Xishi Qiu wrote:
>> On 2015/6/9 15:12, Kamezawa Hiroyuki wrote:
>>
>>> On 2015/06/04 22:04, Xishi Qiu wrote:
>>>> Add the buddy system interface for address range mirroring feature.
>>>> Allocate mirrored pages in MIGRATE_MIRROR list. If there is no mirrored pages
>>>> left, use other types pages.
>>>>
>>>> Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
>>>> ---
>>>>    mm/page_alloc.c | 40 +++++++++++++++++++++++++++++++++++++++-
>>>>    1 file changed, 39 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>> index d4d2066..0fb55288 100644
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -599,6 +599,26 @@ static inline bool is_mirror_pfn(unsigned long pfn)
>>>>
>>>>        return false;
>>>>    }
>>>> +
>>>> +static inline bool change_to_mirror(gfp_t gfp_flags, int high_zoneidx)
>>>> +{
>>>> +    /*
>>>> +     * Do not alloc mirrored memory below 4G, because 0-4G is
>>>> +     * all mirrored by default, and the list is always empty.
>>>> +     */
>>>> +    if (high_zoneidx < ZONE_NORMAL)
>>>> +        return false;
>>>> +
>>>> +    /* Alloc mirrored memory for only kernel */
>>>> +    if (gfp_flags & __GFP_MIRROR)
>>>> +        return true;
>>>
>>> GFP_KERNEL itself should imply mirror, I think.
>>>
>>
>> Hi Kame,
>>
>> How about like this: #define GFP_KERNEL (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_MIRROR) ?
>>
> 
> Hm.... it cannot cover GFP_ATOMIC at el.
> 
> I guess, mirrored memory should be allocated if !__GFP_HIGHMEM or !__GFP_MOVABLE


Hi Kame,

Can we distinguish allocations form user or kernel only by GFP flags?

Thanks,
Xishi Qiu


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 10/12] mm: add the buddy system interface
  2015-06-25  9:44         ` Xishi Qiu
@ 2015-06-25 23:54           ` Kamezawa Hiroyuki
  2015-06-26  1:43             ` Xishi Qiu
  0 siblings, 1 reply; 62+ messages in thread
From: Kamezawa Hiroyuki @ 2015-06-25 23:54 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Andrew Morton, nao.horiguchi, Yinghai Lu, H. Peter Anvin,
	Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo, Luck, Tony,
	Linux MM, LKML

On 2015/06/25 18:44, Xishi Qiu wrote:
> On 2015/6/10 11:06, Kamezawa Hiroyuki wrote:
>
>> On 2015/06/09 19:04, Xishi Qiu wrote:
>>> On 2015/6/9 15:12, Kamezawa Hiroyuki wrote:
>>>
>>>> On 2015/06/04 22:04, Xishi Qiu wrote:
>>>>> Add the buddy system interface for address range mirroring feature.
>>>>> Allocate mirrored pages in MIGRATE_MIRROR list. If there is no mirrored pages
>>>>> left, use other types pages.
>>>>>
>>>>> Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
>>>>> ---
>>>>>     mm/page_alloc.c | 40 +++++++++++++++++++++++++++++++++++++++-
>>>>>     1 file changed, 39 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>> index d4d2066..0fb55288 100644
>>>>> --- a/mm/page_alloc.c
>>>>> +++ b/mm/page_alloc.c
>>>>> @@ -599,6 +599,26 @@ static inline bool is_mirror_pfn(unsigned long pfn)
>>>>>
>>>>>         return false;
>>>>>     }
>>>>> +
>>>>> +static inline bool change_to_mirror(gfp_t gfp_flags, int high_zoneidx)
>>>>> +{
>>>>> +    /*
>>>>> +     * Do not alloc mirrored memory below 4G, because 0-4G is
>>>>> +     * all mirrored by default, and the list is always empty.
>>>>> +     */
>>>>> +    if (high_zoneidx < ZONE_NORMAL)
>>>>> +        return false;
>>>>> +
>>>>> +    /* Alloc mirrored memory for only kernel */
>>>>> +    if (gfp_flags & __GFP_MIRROR)
>>>>> +        return true;
>>>>
>>>> GFP_KERNEL itself should imply mirror, I think.
>>>>
>>>
>>> Hi Kame,
>>>
>>> How about like this: #define GFP_KERNEL (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_MIRROR) ?
>>>
>>
>> Hm.... it cannot cover GFP_ATOMIC at el.
>>
>> I guess, mirrored memory should be allocated if !__GFP_HIGHMEM or !__GFP_MOVABLE
>
>
> Hi Kame,
>
> Can we distinguish allocations form user or kernel only by GFP flags?
>

Allocation from user and file caches are now *always* done with __GFP_MOVABLE.

By this, pages will be allocated from MIGRATE_MOVABLE migration type.
MOVABLE migration type means it's can
be the target for page compaction or memory-hot-remove.

Thanks,
-Kame








^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 10/12] mm: add the buddy system interface
  2015-06-25 23:54           ` Kamezawa Hiroyuki
@ 2015-06-26  1:43             ` Xishi Qiu
  2015-06-26  8:34               ` Kamezawa Hiroyuki
  0 siblings, 1 reply; 62+ messages in thread
From: Xishi Qiu @ 2015-06-26  1:43 UTC (permalink / raw)
  To: Kamezawa Hiroyuki
  Cc: Andrew Morton, nao.horiguchi, Yinghai Lu, H. Peter Anvin,
	Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo, Luck, Tony,
	Linux MM, LKML

On 2015/6/26 7:54, Kamezawa Hiroyuki wrote:

> On 2015/06/25 18:44, Xishi Qiu wrote:
>> On 2015/6/10 11:06, Kamezawa Hiroyuki wrote:
>>
>>> On 2015/06/09 19:04, Xishi Qiu wrote:
>>>> On 2015/6/9 15:12, Kamezawa Hiroyuki wrote:
>>>>
>>>>> On 2015/06/04 22:04, Xishi Qiu wrote:
>>>>>> Add the buddy system interface for address range mirroring feature.
>>>>>> Allocate mirrored pages in MIGRATE_MIRROR list. If there is no mirrored pages
>>>>>> left, use other types pages.
>>>>>>
>>>>>> Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
>>>>>> ---
>>>>>>     mm/page_alloc.c | 40 +++++++++++++++++++++++++++++++++++++++-
>>>>>>     1 file changed, 39 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>>> index d4d2066..0fb55288 100644
>>>>>> --- a/mm/page_alloc.c
>>>>>> +++ b/mm/page_alloc.c
>>>>>> @@ -599,6 +599,26 @@ static inline bool is_mirror_pfn(unsigned long pfn)
>>>>>>
>>>>>>         return false;
>>>>>>     }
>>>>>> +
>>>>>> +static inline bool change_to_mirror(gfp_t gfp_flags, int high_zoneidx)
>>>>>> +{
>>>>>> +    /*
>>>>>> +     * Do not alloc mirrored memory below 4G, because 0-4G is
>>>>>> +     * all mirrored by default, and the list is always empty.
>>>>>> +     */
>>>>>> +    if (high_zoneidx < ZONE_NORMAL)
>>>>>> +        return false;
>>>>>> +
>>>>>> +    /* Alloc mirrored memory for only kernel */
>>>>>> +    if (gfp_flags & __GFP_MIRROR)
>>>>>> +        return true;
>>>>>
>>>>> GFP_KERNEL itself should imply mirror, I think.
>>>>>
>>>>
>>>> Hi Kame,
>>>>
>>>> How about like this: #define GFP_KERNEL (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_MIRROR) ?
>>>>
>>>
>>> Hm.... it cannot cover GFP_ATOMIC at el.
>>>
>>> I guess, mirrored memory should be allocated if !__GFP_HIGHMEM or !__GFP_MOVABLE
>>
>>
>> Hi Kame,
>>
>> Can we distinguish allocations form user or kernel only by GFP flags?
>>
> 
> Allocation from user and file caches are now *always* done with __GFP_MOVABLE.
> 
> By this, pages will be allocated from MIGRATE_MOVABLE migration type.
> MOVABLE migration type means it's can
> be the target for page compaction or memory-hot-remove.
> 
> Thanks,
> -Kame
> 

So if we want all kernel memory allocated from mirror, how about change like this?
__alloc_pages_nodemask()
  gfpflags_to_migratetype()
    if (!(gfp_mask & __GFP_MOVABLE))
	return MIGRATE_MIRROR

Thanks,
Xishi Qiu

> 
> 
> 
> 
> 
> 
> 
> .
> 




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 10/12] mm: add the buddy system interface
  2015-06-26  1:43             ` Xishi Qiu
@ 2015-06-26  8:34               ` Kamezawa Hiroyuki
  2015-06-26 10:38                 ` Xishi Qiu
  0 siblings, 1 reply; 62+ messages in thread
From: Kamezawa Hiroyuki @ 2015-06-26  8:34 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Andrew Morton, nao.horiguchi, Yinghai Lu, H. Peter Anvin,
	Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo, Luck, Tony,
	Linux MM, LKML

On 2015/06/26 10:43, Xishi Qiu wrote:
> On 2015/6/26 7:54, Kamezawa Hiroyuki wrote:
>
>> On 2015/06/25 18:44, Xishi Qiu wrote:
>>> On 2015/6/10 11:06, Kamezawa Hiroyuki wrote:
>>>
>>>> On 2015/06/09 19:04, Xishi Qiu wrote:
>>>>> On 2015/6/9 15:12, Kamezawa Hiroyuki wrote:
>>>>>
>>>>>> On 2015/06/04 22:04, Xishi Qiu wrote:
>>>>>>> Add the buddy system interface for address range mirroring feature.
>>>>>>> Allocate mirrored pages in MIGRATE_MIRROR list. If there is no mirrored pages
>>>>>>> left, use other types pages.
>>>>>>>
>>>>>>> Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
>>>>>>> ---
>>>>>>>      mm/page_alloc.c | 40 +++++++++++++++++++++++++++++++++++++++-
>>>>>>>      1 file changed, 39 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>>>> index d4d2066..0fb55288 100644
>>>>>>> --- a/mm/page_alloc.c
>>>>>>> +++ b/mm/page_alloc.c
>>>>>>> @@ -599,6 +599,26 @@ static inline bool is_mirror_pfn(unsigned long pfn)
>>>>>>>
>>>>>>>          return false;
>>>>>>>      }
>>>>>>> +
>>>>>>> +static inline bool change_to_mirror(gfp_t gfp_flags, int high_zoneidx)
>>>>>>> +{
>>>>>>> +    /*
>>>>>>> +     * Do not alloc mirrored memory below 4G, because 0-4G is
>>>>>>> +     * all mirrored by default, and the list is always empty.
>>>>>>> +     */
>>>>>>> +    if (high_zoneidx < ZONE_NORMAL)
>>>>>>> +        return false;
>>>>>>> +
>>>>>>> +    /* Alloc mirrored memory for only kernel */
>>>>>>> +    if (gfp_flags & __GFP_MIRROR)
>>>>>>> +        return true;
>>>>>>
>>>>>> GFP_KERNEL itself should imply mirror, I think.
>>>>>>
>>>>>
>>>>> Hi Kame,
>>>>>
>>>>> How about like this: #define GFP_KERNEL (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_MIRROR) ?
>>>>>
>>>>
>>>> Hm.... it cannot cover GFP_ATOMIC at el.
>>>>
>>>> I guess, mirrored memory should be allocated if !__GFP_HIGHMEM or !__GFP_MOVABLE
>>>
>>>
>>> Hi Kame,
>>>
>>> Can we distinguish allocations form user or kernel only by GFP flags?
>>>
>>
>> Allocation from user and file caches are now *always* done with __GFP_MOVABLE.
>>
>> By this, pages will be allocated from MIGRATE_MOVABLE migration type.
>> MOVABLE migration type means it's can
>> be the target for page compaction or memory-hot-remove.
>>
>> Thanks,
>> -Kame
>>
>
> So if we want all kernel memory allocated from mirror, how about change like this?
> __alloc_pages_nodemask()
>    gfpflags_to_migratetype()
>      if (!(gfp_mask & __GFP_MOVABLE))
> 	return MIGRATE_MIRROR

Maybe used with jump label can reduce performance impact.
==
static inline bool memory_mirror_enabled(void)
{
         return static_key_false(&memory_mirror_enabled);
}



gfpflags_to_migratetype()
   if (memory_mirror_enabled()) { /* We want to mirror all unmovable pages */
       if (!(gfp_mask & __GFP_MOVABLE))
            return MIGRATE_MIRROR
   }
==

BTW, I think current memory compaction code scans ranges of MOVABLE migrate type.
So, if you use other migration type than MOVABLE for user pages, you may see
page fragmentation. If you want to expand this MIRROR to user pages, please check
mm/compaction.c


Thanks,
-Kame





^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 10/12] mm: add the buddy system interface
  2015-06-26  8:34               ` Kamezawa Hiroyuki
@ 2015-06-26 10:38                 ` Xishi Qiu
  2015-06-26 18:42                   ` Luck, Tony
  0 siblings, 1 reply; 62+ messages in thread
From: Xishi Qiu @ 2015-06-26 10:38 UTC (permalink / raw)
  To: Kamezawa Hiroyuki
  Cc: Andrew Morton, nao.horiguchi, Yinghai Lu, H. Peter Anvin,
	Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo, Luck, Tony,
	Linux MM, LKML

On 2015/6/26 16:34, Kamezawa Hiroyuki wrote:

> On 2015/06/26 10:43, Xishi Qiu wrote:
>> On 2015/6/26 7:54, Kamezawa Hiroyuki wrote:
>>
>>> On 2015/06/25 18:44, Xishi Qiu wrote:
>>>> On 2015/6/10 11:06, Kamezawa Hiroyuki wrote:
>>>>
>>>>> On 2015/06/09 19:04, Xishi Qiu wrote:
>>>>>> On 2015/6/9 15:12, Kamezawa Hiroyuki wrote:
>>>>>>
>>>>>>> On 2015/06/04 22:04, Xishi Qiu wrote:
>>>>>>>> Add the buddy system interface for address range mirroring feature.
>>>>>>>> Allocate mirrored pages in MIGRATE_MIRROR list. If there is no mirrored pages
>>>>>>>> left, use other types pages.
>>>>>>>>
>>>>>>>> Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
>>>>>>>> ---
>>>>>>>>      mm/page_alloc.c | 40 +++++++++++++++++++++++++++++++++++++++-
>>>>>>>>      1 file changed, 39 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>>>>> index d4d2066..0fb55288 100644
>>>>>>>> --- a/mm/page_alloc.c
>>>>>>>> +++ b/mm/page_alloc.c
>>>>>>>> @@ -599,6 +599,26 @@ static inline bool is_mirror_pfn(unsigned long pfn)
>>>>>>>>
>>>>>>>>          return false;
>>>>>>>>      }
>>>>>>>> +
>>>>>>>> +static inline bool change_to_mirror(gfp_t gfp_flags, int high_zoneidx)
>>>>>>>> +{
>>>>>>>> +    /*
>>>>>>>> +     * Do not alloc mirrored memory below 4G, because 0-4G is
>>>>>>>> +     * all mirrored by default, and the list is always empty.
>>>>>>>> +     */
>>>>>>>> +    if (high_zoneidx < ZONE_NORMAL)
>>>>>>>> +        return false;
>>>>>>>> +
>>>>>>>> +    /* Alloc mirrored memory for only kernel */
>>>>>>>> +    if (gfp_flags & __GFP_MIRROR)
>>>>>>>> +        return true;
>>>>>>>
>>>>>>> GFP_KERNEL itself should imply mirror, I think.
>>>>>>>
>>>>>>
>>>>>> Hi Kame,
>>>>>>
>>>>>> How about like this: #define GFP_KERNEL (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_MIRROR) ?
>>>>>>
>>>>>
>>>>> Hm.... it cannot cover GFP_ATOMIC at el.
>>>>>
>>>>> I guess, mirrored memory should be allocated if !__GFP_HIGHMEM or !__GFP_MOVABLE
>>>>
>>>>
>>>> Hi Kame,
>>>>
>>>> Can we distinguish allocations form user or kernel only by GFP flags?
>>>>
>>>
>>> Allocation from user and file caches are now *always* done with __GFP_MOVABLE.
>>>
>>> By this, pages will be allocated from MIGRATE_MOVABLE migration type.
>>> MOVABLE migration type means it's can
>>> be the target for page compaction or memory-hot-remove.
>>>
>>> Thanks,
>>> -Kame
>>>
>>
>> So if we want all kernel memory allocated from mirror, how about change like this?
>> __alloc_pages_nodemask()
>>    gfpflags_to_migratetype()
>>      if (!(gfp_mask & __GFP_MOVABLE))
>>     return MIGRATE_MIRROR
> 
> Maybe used with jump label can reduce performance impact.

Hi Kame,

I am not understand jump label, but I wil try.

> ==
> static inline bool memory_mirror_enabled(void)
> {
>         return static_key_false(&memory_mirror_enabled);
> }
> 
> 
> 
> gfpflags_to_migratetype()
>   if (memory_mirror_enabled()) { /* We want to mirror all unmovable pages */
>       if (!(gfp_mask & __GFP_MOVABLE))
>            return MIGRATE_MIRROR
>   }
> ==
> 
> BTW, I think current memory compaction code scans ranges of MOVABLE migrate type.
> So, if you use other migration type than MOVABLE for user pages, you may see
> page fragmentation. If you want to expand this MIRROR to user pages, please check
> mm/compaction.c
> 

As Tony said "how can we minimize the run-time impact on systems that don't have
any mirrored memory.", I think the idea "kernel only from MIRROR / user only from
MOVABLE" may be better.

Thanks,
Xishi Qiu




^ permalink raw reply	[flat|nested] 62+ messages in thread

* RE: [RFC PATCH 10/12] mm: add the buddy system interface
  2015-06-26 10:38                 ` Xishi Qiu
@ 2015-06-26 18:42                   ` Luck, Tony
  0 siblings, 0 replies; 62+ messages in thread
From: Luck, Tony @ 2015-06-26 18:42 UTC (permalink / raw)
  To: Xishi Qiu, Kamezawa Hiroyuki
  Cc: Andrew Morton, nao.horiguchi, Yinghai Lu, H. Peter Anvin,
	Thomas Gleixner, mingo, Xiexiuqi, Hanjun Guo, Linux MM, LKML

> gfpflags_to_migratetype()
>   if (memory_mirror_enabled()) { /* We want to mirror all unmovable pages */
>       if (!(gfp_mask & __GFP_MOVABLE))
>            return MIGRATE_MIRROR
>   }

I'm not sure that we can divide memory into just two buckets of "mirrored" and "movable".

My expectation is that there will be memory that is neither mirrored, nor movable.  We'd
allocate that memory to user proceses.  Uncorrected errors in that memory would result
in the death of the process (except in the case where the page is a clean copy mapped from
a disk file ... e.g. .text mapping instructions from an executable).  Linux would offline
the affected 4K page so as not to hit the problem again.

-Tony

^ permalink raw reply	[flat|nested] 62+ messages in thread

end of thread, other threads:[~2015-06-26 18:42 UTC | newest]

Thread overview: 62+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-04 12:54 [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations Xishi Qiu
2015-06-04 12:56 ` [RFC PATCH 01/12] mm: add a new config to manage the code Xishi Qiu
2015-06-08 11:52   ` Leon Romanovsky
2015-06-08 15:14     ` Luck, Tony
2015-06-08 16:36       ` Leon Romanovsky
2015-06-09  6:44   ` Kamezawa Hiroyuki
2015-06-09 10:10     ` Xishi Qiu
2015-06-10  3:07       ` Kamezawa Hiroyuki
2015-06-04 12:57 ` [RFC PATCH 02/12] mm: introduce mirror_info Xishi Qiu
2015-06-04 16:57   ` Luck, Tony
2015-06-05  1:53     ` Xishi Qiu
2015-06-09  6:48   ` Kamezawa Hiroyuki
2015-06-04 12:58 ` [RFC PATCH 03/12] mm: introduce MIGRATE_MIRROR to manage the mirrored, pages Xishi Qiu
2015-06-09  6:54   ` Kamezawa Hiroyuki
2015-06-04 12:59 ` [RFC PATCH 04/12] mm: add mirrored pages to buddy system Xishi Qiu
2015-06-04 13:00 ` [RFC PATCH 05/12] mm: introduce a new zone_stat_item NR_FREE_MIRROR_PAGES Xishi Qiu
2015-06-04 13:01 ` [RFC PATCH 06/12] mm: add free mirrored pages info Xishi Qiu
2015-06-04 13:02 ` [RFC PATCH 07/12] mm: introduce __GFP_MIRROR to allocate mirrored pages Xishi Qiu
2015-06-09  7:01   ` Kamezawa Hiroyuki
2015-06-04 13:02 ` [RFC PATCH 08/12] mm: use mirrorable to switch allocate mirrored memory Xishi Qiu
2015-06-04 17:01   ` Luck, Tony
2015-06-04 18:41   ` Dave Hansen
2015-06-05  3:13     ` Xishi Qiu
2015-06-09  7:06   ` Kamezawa Hiroyuki
2015-06-09 10:09     ` Xishi Qiu
2015-06-10  3:09       ` Kamezawa Hiroyuki
2015-06-12  8:05   ` Naoya Horiguchi
2015-06-04 13:03 ` [RFC PATCH 09/12] mm: enable allocate mirrored memory at boot time Xishi Qiu
2015-06-04 13:04 ` [RFC PATCH 10/12] mm: add the buddy system interface Xishi Qiu
2015-06-04 17:09   ` Luck, Tony
2015-06-05  3:14     ` Xishi Qiu
2015-06-09  7:12   ` Kamezawa Hiroyuki
2015-06-09 10:04     ` Xishi Qiu
2015-06-10  3:06       ` Kamezawa Hiroyuki
2015-06-10 20:40         ` Luck, Tony
2015-06-15  8:47           ` Kamezawa Hiroyuki
2015-06-15 17:20             ` Luck, Tony
2015-06-16  0:31               ` Kamezawa Hiroyuki
2015-06-25  9:44         ` Xishi Qiu
2015-06-25 23:54           ` Kamezawa Hiroyuki
2015-06-26  1:43             ` Xishi Qiu
2015-06-26  8:34               ` Kamezawa Hiroyuki
2015-06-26 10:38                 ` Xishi Qiu
2015-06-26 18:42                   ` Luck, Tony
2015-06-04 13:04 ` [RFC PATCH 11/12] mm: add the PCP interface Xishi Qiu
2015-06-04 18:44   ` Dave Hansen
2015-06-04 13:05 ` [RFC PATCH 12/12] mm: let slab/slub/slob use mirrored memory Xishi Qiu
2015-06-04 17:14   ` Luck, Tony
2015-06-12  8:42 ` [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations Naoya Horiguchi
2015-06-12  9:09   ` Xishi Qiu
2015-06-12 19:03   ` Luck, Tony
2015-06-15  0:25     ` Naoya Horiguchi
2015-06-16  7:53 ` Vlastimil Babka
2015-06-16  8:17   ` Xishi Qiu
2015-06-16  9:46     ` Vlastimil Babka
2015-06-18  1:23       ` Xishi Qiu
2015-06-18  5:58         ` Vlastimil Babka
2015-06-18  9:37           ` Xishi Qiu
2015-06-18  9:55             ` Vlastimil Babka
2015-06-18 20:33               ` Luck, Tony
2015-06-19  1:36                 ` Xishi Qiu
2015-06-19 18:42                   ` Luck, Tony

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).