linux-kernel.vger.kernel.org archive mirror
* [PATCH V4 0/5] x86: Create direct mappings for E820_RAM only
@ 2012-08-24 23:55 Jacob Shin
  2012-08-24 23:55 ` [PATCH 1/5] x86: Move enabling of PSE and PGE out of init_memory_mapping Jacob Shin
                   ` (4 more replies)
  0 siblings, 5 replies; 29+ messages in thread
From: Jacob Shin @ 2012-08-24 23:55 UTC (permalink / raw)
  To: X86-ML
  Cc: LKML, H. Peter Anvin, Yinghai Lu, Tejun Heo, Dave Young,
	Chao Wang, Vivek Goyal, Andreas Herrmann, Borislav Petkov,
	Jacob Shin

Currently, kernel direct mappings are created for all pfns in
[ 0, max_low_pfn ) and [ 4GB, max_pfn ). When the e820 map contains
memory holes, we end up mapping ranges that are not backed by physical
DRAM. This is fine for lower memory addresses, which can be marked as UC
by fixed/variable range MTRRs, but we run into trouble at high addresses
that MTRRs cannot cover.

The following patchset creates direct mappings only for E820_RAM regions
between 0 ~ max_low_pfn and 4GB ~ max_pfn, and leaves non-E820_RAM
regions and memory holes unmapped.
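
As an illustration, the mapping loop conceptually becomes the following
(a simplified sketch of patch 3/5; the real code also special-cases the
ISA range and clamps to max_low_pfn on 32 bit):

	int i;

	for (i = 0; i < e820.nr_map; i++) {
		struct e820entry *ei = &e820.map[i];

		/* skip reserved regions, ACPI tables, and holes */
		if (ei->type != E820_RAM)
			continue;

		init_memory_mapping(ei->addr, ei->addr + ei->size);
	}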

This fourth revision of the patchset attempts to resolve comments and
concerns from the following threads:

* https://lkml.org/lkml/2012/8/22/680
* https://lkml.org/lkml/2012/8/13/512
* https://lkml.org/lkml/2012/8/9/536
* https://lkml.org/lkml/2011/10/20/323

Jacob Shin (5):
  x86: Move enabling of PSE and PGE out of init_memory_mapping
  x86: find_early_table_space based on memory ranges that are being
    mapped
  x86: Only direct map addresses that are marked as E820_RAM
  x86: Fixup code testing if a pfn is direct mapped
  x86: if kernel .text .data .bss are not marked as E820_RAM, complain
    and fix

 arch/x86/include/asm/page_types.h |    9 +++
 arch/x86/kernel/cpu/amd.c         |    6 +-
 arch/x86/kernel/setup.c           |  130 ++++++++++++++++++++++++++++++++-----
 arch/x86/mm/init.c                |   74 ++++++++++-----------
 arch/x86/mm/init_64.c             |    6 +-
 arch/x86/platform/efi/efi.c       |    8 +--
 6 files changed, 167 insertions(+), 66 deletions(-)

-- 
1.7.9.5




* [PATCH 1/5] x86: Move enabling of PSE and PGE out of init_memory_mapping
  2012-08-24 23:55 [PATCH V4 0/5] x86: Create direct mappings for E820_RAM only Jacob Shin
@ 2012-08-24 23:55 ` Jacob Shin
  2012-08-25  1:25   ` Yinghai Lu
  2012-08-24 23:55 ` [PATCH 2/5] x86: find_early_table_space based on memory ranges that are being mapped Jacob Shin
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 29+ messages in thread
From: Jacob Shin @ 2012-08-24 23:55 UTC (permalink / raw)
  To: X86-ML
  Cc: LKML, H. Peter Anvin, Yinghai Lu, Tejun Heo, Dave Young,
	Chao Wang, Vivek Goyal, Andreas Herrmann, Borislav Petkov,
	Jacob Shin

Depending on the platform, init_memory_mapping() may be called multiple
times. Move it out to setup_arch() to avoid writing to cr4 on every call.
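
For reference, set_in_cr4() is a read-modify-write of cr4, roughly (a
sketch of the existing helper; mmu_cr4_features is also updated so that
secondary CPUs pick up the same bits):

	static inline void set_in_cr4(unsigned long mask)
	{
		unsigned long cr4;

		mmu_cr4_features |= mask;	/* remembered for secondary CPUs */
		cr4 = read_cr4();
		cr4 |= mask;
		write_cr4(cr4);
	}

so doing this once in setup_arch() is sufficient.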

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
---
 arch/x86/kernel/setup.c |   10 ++++++++++
 arch/x86/mm/init.c      |   10 ----------
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index f4b9b80..751e020 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -913,6 +913,16 @@ void __init setup_arch(char **cmdline_p)
 
 	init_gbpages();
 
+	/* Enable PSE if available */
+	if (cpu_has_pse)
+		set_in_cr4(X86_CR4_PSE);
+
+	/* Enable PGE if available */
+	if (cpu_has_pge) {
+		set_in_cr4(X86_CR4_PGE);
+		__supported_pte_mask |= _PAGE_GLOBAL;
+	}
+
 	/* max_pfn_mapped is updated here */
 	max_low_pfn_mapped = init_memory_mapping(0, max_low_pfn<<PAGE_SHIFT);
 	max_pfn_mapped = max_low_pfn_mapped;
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index e0e6990..2f07e09 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -149,16 +149,6 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
 	use_gbpages = direct_gbpages;
 #endif
 
-	/* Enable PSE if available */
-	if (cpu_has_pse)
-		set_in_cr4(X86_CR4_PSE);
-
-	/* Enable PGE if available */
-	if (cpu_has_pge) {
-		set_in_cr4(X86_CR4_PGE);
-		__supported_pte_mask |= _PAGE_GLOBAL;
-	}
-
 	if (use_gbpages)
 		page_size_mask |= 1 << PG_LEVEL_1G;
 	if (use_pse)
-- 
1.7.9.5




* [PATCH 2/5] x86: find_early_table_space based on memory ranges that are being mapped
  2012-08-24 23:55 [PATCH V4 0/5] x86: Create direct mappings for E820_RAM only Jacob Shin
  2012-08-24 23:55 ` [PATCH 1/5] x86: Move enabling of PSE and PGE out of init_memory_mapping Jacob Shin
@ 2012-08-24 23:55 ` Jacob Shin
  2012-10-21 21:22   ` Tom Rini
  2012-08-24 23:55 ` [PATCH 3/5] x86: Only direct map addresses that are marked as E820_RAM Jacob Shin
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 29+ messages in thread
From: Jacob Shin @ 2012-08-24 23:55 UTC (permalink / raw)
  To: X86-ML
  Cc: LKML, H. Peter Anvin, Yinghai Lu, Tejun Heo, Dave Young,
	Chao Wang, Vivek Goyal, Andreas Herrmann, Borislav Petkov,
	Jacob Shin

Current logic finds enough space for direct mapping page tables from 0
to end. Instead, we only need to find enough space to cover mr[0].start
to mr[nr_range - 1].end -- the range that is actually being mapped by
init_memory_mapping().
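
As a rough worked example of the new math (a sketch, assuming 64 bit
with 2M pages and a single mr[] entry covering [1GB, 3GB), i.e.
range = 2GB):

	puds = (2GB + 1GB - 1) >> PUD_SHIFT = 2    -> 4KB (one page of puds)
	pmds = (2GB + 2MB - 1) >> PMD_SHIFT = 1024 -> 8KB of pmds
	ptes = 0  (the whole range is covered by 2M pages)

	tables = 4KB + 8KB = 12KB

instead of sizing the tables for everything from 0 to end.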

This patch also reportedly fixes the suspend/resume issue reported in:

https://lkml.org/lkml/2012/8/11/83

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
---
 arch/x86/mm/init.c |   62 +++++++++++++++++++++++++++++-----------------------
 1 file changed, 35 insertions(+), 27 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 2f07e09..e2b21e0 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -35,40 +35,48 @@ struct map_range {
 	unsigned page_size_mask;
 };
 
-static void __init find_early_table_space(struct map_range *mr, unsigned long end,
-					  int use_pse, int use_gbpages)
+/*
+ * First calculate space needed for kernel direct mapping page tables to cover
+ * mr[0].start to mr[nr_range - 1].end, while accounting for possible 2M and 1GB
+ * pages. Then find enough contiguous space for those page tables.
+ */
+static void __init find_early_table_space(struct map_range *mr, int nr_range)
 {
-	unsigned long puds, pmds, ptes, tables, start = 0, good_end = end;
+	int i;
+	unsigned long puds = 0, pmds = 0, ptes = 0, tables;
+	unsigned long start = 0, good_end;
 	phys_addr_t base;
 
-	puds = (end + PUD_SIZE - 1) >> PUD_SHIFT;
-	tables = roundup(puds * sizeof(pud_t), PAGE_SIZE);
-
-	if (use_gbpages) {
-		unsigned long extra;
-
-		extra = end - ((end>>PUD_SHIFT) << PUD_SHIFT);
-		pmds = (extra + PMD_SIZE - 1) >> PMD_SHIFT;
-	} else
-		pmds = (end + PMD_SIZE - 1) >> PMD_SHIFT;
+	for (i = 0; i < nr_range; i++) {
+		unsigned long range, extra;
 
-	tables += roundup(pmds * sizeof(pmd_t), PAGE_SIZE);
+		range = mr[i].end - mr[i].start;
+		puds += (range + PUD_SIZE - 1) >> PUD_SHIFT;
 
-	if (use_pse) {
-		unsigned long extra;
+		if (mr[i].page_size_mask & (1 << PG_LEVEL_1G)) {
+			extra = range - ((range >> PUD_SHIFT) << PUD_SHIFT);
+			pmds += (extra + PMD_SIZE - 1) >> PMD_SHIFT;
+		} else {
+			pmds += (range + PMD_SIZE - 1) >> PMD_SHIFT;
+		}
 
-		extra = end - ((end>>PMD_SHIFT) << PMD_SHIFT);
+		if (mr[i].page_size_mask & (1 << PG_LEVEL_2M)) {
+			extra = range - ((range >> PMD_SHIFT) << PMD_SHIFT);
 #ifdef CONFIG_X86_32
-		extra += PMD_SIZE;
+			extra += PMD_SIZE;
 #endif
-		/* The first 2/4M doesn't use large pages. */
-		if (mr->start < PMD_SIZE)
-			extra += mr->end - mr->start;
-
-		ptes = (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;
-	} else
-		ptes = (end + PAGE_SIZE - 1) >> PAGE_SHIFT;
+			/* The first 2/4M doesn't use large pages. */
+			if (mr[i].start < PMD_SIZE)
+				extra += range;
+
+			ptes += (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;
+		} else {
+			ptes += (range + PAGE_SIZE - 1) >> PAGE_SHIFT;
+		}
+	}
 
+	tables = roundup(puds * sizeof(pud_t), PAGE_SIZE);
+	tables += roundup(pmds * sizeof(pmd_t), PAGE_SIZE);
 	tables += roundup(ptes * sizeof(pte_t), PAGE_SIZE);
 
 #ifdef CONFIG_X86_32
@@ -86,7 +94,7 @@ static void __init find_early_table_space(struct map_range *mr, unsigned long en
 	pgt_buf_top = pgt_buf_start + (tables >> PAGE_SHIFT);
 
 	printk(KERN_DEBUG "kernel direct mapping tables up to %#lx @ [mem %#010lx-%#010lx]\n",
-		end - 1, pgt_buf_start << PAGE_SHIFT,
+		mr[nr_range - 1].end - 1, pgt_buf_start << PAGE_SHIFT,
 		(pgt_buf_top << PAGE_SHIFT) - 1);
 }
 
@@ -257,7 +265,7 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
 	 * nodes are discovered.
 	 */
 	if (!after_bootmem)
-		find_early_table_space(&mr[0], end, use_pse, use_gbpages);
+		find_early_table_space(mr, nr_range);
 
 	for (i = 0; i < nr_range; i++)
 		ret = kernel_physical_mapping_init(mr[i].start, mr[i].end,
-- 
1.7.9.5




* [PATCH 3/5] x86: Only direct map addresses that are marked as E820_RAM
  2012-08-24 23:55 [PATCH V4 0/5] x86: Create direct mappings for E820_RAM only Jacob Shin
  2012-08-24 23:55 ` [PATCH 1/5] x86: Move enabling of PSE and PGE out of init_memory_mapping Jacob Shin
  2012-08-24 23:55 ` [PATCH 2/5] x86: find_early_table_space based on memory ranges that are being mapped Jacob Shin
@ 2012-08-24 23:55 ` Jacob Shin
  2012-08-25  0:17   ` Jacob Shin
                     ` (2 more replies)
  2012-08-24 23:55 ` [PATCH 4/5] x86: Fixup code testing if a pfn is direct mapped Jacob Shin
  2012-08-24 23:55 ` [PATCH 5/5] x86: if kernel .text .data .bss are not marked as E820_RAM, complain and fix Jacob Shin
  4 siblings, 3 replies; 29+ messages in thread
From: Jacob Shin @ 2012-08-24 23:55 UTC (permalink / raw)
  To: X86-ML
  Cc: LKML, H. Peter Anvin, Yinghai Lu, Tejun Heo, Dave Young,
	Chao Wang, Vivek Goyal, Andreas Herrmann, Borislav Petkov,
	Jacob Shin

Currently direct mappings are created for [ 0 to max_low_pfn<<PAGE_SHIFT )
and [ 4GB to max_pfn<<PAGE_SHIFT ), which may include regions that are not
backed by actual DRAM. This is fine for holes under 4GB, which fixed and
variable range MTRRs cover as UC. However, we run into trouble at higher
memory addresses that cannot be covered by MTRRs.

Our system with 1TB of RAM has an e820 that looks like this:

 BIOS-e820: [mem 0x0000000000000000-0x00000000000983ff] usable
 BIOS-e820: [mem 0x0000000000098400-0x000000000009ffff] reserved
 BIOS-e820: [mem 0x00000000000d0000-0x00000000000fffff] reserved
 BIOS-e820: [mem 0x0000000000100000-0x00000000c7ebffff] usable
 BIOS-e820: [mem 0x00000000c7ec0000-0x00000000c7ed7fff] ACPI data
 BIOS-e820: [mem 0x00000000c7ed8000-0x00000000c7ed9fff] ACPI NVS
 BIOS-e820: [mem 0x00000000c7eda000-0x00000000c7ffffff] reserved
 BIOS-e820: [mem 0x00000000fec00000-0x00000000fec0ffff] reserved
 BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
 BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved
 BIOS-e820: [mem 0x0000000100000000-0x000000e037ffffff] usable
 BIOS-e820: [mem 0x000000e038000000-0x000000fcffffffff] reserved
 BIOS-e820: [mem 0x0000010000000000-0x0000011ffeffffff] usable

and so direct mappings are created for the huge memory hole between
0x000000e038000000 and 0x0000010000000000. Even though the kernel never
explicitly accesses that region, since the page tables incorrectly mark
it as WB, our (AMD) processor ends up raising an MCE while doing memory
bookkeeping/optimizations around that area.
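
(That hole alone spans 0x0000010000000000 - 0x000000e038000000 =
0x1fc8000000 bytes, i.e. roughly 127GB of address space with no DRAM
behind it.)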

This patch iterates through the e820 map and direct maps only the ranges
that are marked as E820_RAM, and keeps track of those pfn ranges.
Depending on the alignment of E820 ranges, this may result in using
smaller page sizes (i.e. 4K instead of 2M or 1G).
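
With the e820 above, the new loop ends up calling roughly (a sketch; the
first usable entry falls entirely inside the always-mapped ISA range,
and reserved/ACPI regions are skipped):

	init_memory_mapping(0x0000000000000000, 0x00000000c7ec0000);
	init_memory_mapping(0x0000000100000000, 0x000000e038000000);
	init_memory_mapping(0x0000010000000000, 0x0000011fff000000);

leaving the hole between 0x000000e038000000 and 0x0000010000000000
unmapped.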

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
---
 arch/x86/include/asm/page_types.h |    9 +++
 arch/x86/kernel/setup.c           |  125 +++++++++++++++++++++++++++++--------
 arch/x86/mm/init.c                |    2 +
 arch/x86/mm/init_64.c             |    6 +-
 4 files changed, 112 insertions(+), 30 deletions(-)

diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index e21fdd1..409047a 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -3,6 +3,7 @@
 
 #include <linux/const.h>
 #include <linux/types.h>
+#include <asm/e820.h>
 
 /* PAGE_SHIFT determines the page size */
 #define PAGE_SHIFT	12
@@ -40,12 +41,20 @@
 #endif	/* CONFIG_X86_64 */
 
 #ifndef __ASSEMBLY__
+#include <linux/range.h>
 
 extern int devmem_is_allowed(unsigned long pagenr);
 
 extern unsigned long max_low_pfn_mapped;
 extern unsigned long max_pfn_mapped;
 
+extern struct range pfn_mapped[E820_X_MAX];
+extern int nr_pfn_mapped;
+
+extern void add_pfn_range_mapped(unsigned long start_pfn, unsigned long end_pfn);
+extern bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn);
+extern bool pfn_is_mapped(unsigned long pfn);
+
 static inline phys_addr_t get_max_mapped(void)
 {
 	return (phys_addr_t)max_pfn_mapped << PAGE_SHIFT;
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 751e020..4217fb4 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -115,13 +115,46 @@
 #include <asm/prom.h>
 
 /*
- * end_pfn only includes RAM, while max_pfn_mapped includes all e820 entries.
- * The direct mapping extends to max_pfn_mapped, so that we can directly access
- * apertures, ACPI and other tables without having to play with fixmaps.
+ * max_low_pfn_mapped: highest direct mapped pfn under 4GB
+ * max_pfn_mapped:     highest direct mapped pfn over 4GB
+ *
+ * The direct mapping only covers E820_RAM regions, so the ranges and gaps are
+ * represented by pfn_mapped
  */
 unsigned long max_low_pfn_mapped;
 unsigned long max_pfn_mapped;
 
+struct range pfn_mapped[E820_X_MAX];
+int nr_pfn_mapped;
+
+void add_pfn_range_mapped(unsigned long start_pfn, unsigned long end_pfn)
+{
+	nr_pfn_mapped = add_range_with_merge(pfn_mapped, E820_X_MAX,
+					     nr_pfn_mapped, start_pfn, end_pfn);
+
+	max_pfn_mapped = max(max_pfn_mapped, end_pfn);
+
+	if (end_pfn <= (1UL << (32 - PAGE_SHIFT)))
+		max_low_pfn_mapped = max(max_low_pfn_mapped, end_pfn);
+}
+
+bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn)
+{
+	int i;
+
+	for (i = 0; i < nr_pfn_mapped; i++)
+		if ((start_pfn >= pfn_mapped[i].start) &&
+		    (end_pfn <= pfn_mapped[i].end))
+			return true;
+
+	return false;
+}
+
+bool pfn_is_mapped(unsigned long pfn)
+{
+	return pfn_range_is_mapped(pfn, pfn + 1);
+}
+
 #ifdef CONFIG_DMI
 RESERVE_BRK(dmi_alloc, 65536);
 #endif
@@ -296,6 +329,68 @@ static void __init cleanup_highmap(void)
 }
 #endif
 
+/*
+ * Iterate through E820 memory map and create direct mappings for only E820_RAM
+ * regions. We cannot simply create direct mappings for all pfns from
+ * [0 to max_low_pfn) and [4GB to max_pfn) because of possible memory holes in
+ * high addresses that cannot be marked as UC by fixed/variable range MTRRs.
+ * Depending on the alignment of E820 ranges, this may possibly result in using
+ * smaller size (i.e. 4K instead of 2M or 1G) page tables.
+ */
+static void __init init_memory(void)
+{
+	int i;
+
+	init_gbpages();
+
+	/* Enable PSE if available */
+	if (cpu_has_pse)
+		set_in_cr4(X86_CR4_PSE);
+
+	/* Enable PGE if available */
+	if (cpu_has_pge) {
+		set_in_cr4(X86_CR4_PGE);
+		__supported_pte_mask |= _PAGE_GLOBAL;
+	}
+
+	for (i = 0; i < e820.nr_map; i++) {
+		struct e820entry *ei = &e820.map[i];
+		u64 start = ei->addr;
+		u64 end = ei->addr + ei->size;
+
+		/* we only map E820_RAM */
+		if (ei->type != E820_RAM)
+			continue;
+
+		if (end <= ISA_END_ADDRESS)
+			continue;
+
+		if (start <= ISA_END_ADDRESS)
+			start = 0;
+#ifdef CONFIG_X86_32
+		/* on 32 bit, we only map up to max_low_pfn */
+		if ((start >> PAGE_SHIFT) >= max_low_pfn)
+			continue;
+
+		if ((end >> PAGE_SHIFT) > max_low_pfn)
+			end = max_low_pfn << PAGE_SHIFT;
+#endif
+		/* the ISA range is always mapped regardless of holes */
+		if (!pfn_range_is_mapped(0, ISA_END_ADDRESS << PAGE_SHIFT) &&
+		    start != 0)
+			init_memory_mapping(0, ISA_END_ADDRESS);
+
+		init_memory_mapping(start, end);
+	}
+
+#ifdef CONFIG_X86_64
+	if (max_pfn > max_low_pfn) {
+		/* can we preseve max_low_pfn ?*/
+		max_low_pfn = max_pfn;
+	}
+#endif
+}
+
 static void __init reserve_brk(void)
 {
 	if (_brk_end > _brk_start)
@@ -911,30 +1006,8 @@ void __init setup_arch(char **cmdline_p)
 
 	setup_real_mode();
 
-	init_gbpages();
-
-	/* Enable PSE if available */
-	if (cpu_has_pse)
-		set_in_cr4(X86_CR4_PSE);
+	init_memory();
 
-	/* Enable PGE if available */
-	if (cpu_has_pge) {
-		set_in_cr4(X86_CR4_PGE);
-		__supported_pte_mask |= _PAGE_GLOBAL;
-	}
-
-	/* max_pfn_mapped is updated here */
-	max_low_pfn_mapped = init_memory_mapping(0, max_low_pfn<<PAGE_SHIFT);
-	max_pfn_mapped = max_low_pfn_mapped;
-
-#ifdef CONFIG_X86_64
-	if (max_pfn > max_low_pfn) {
-		max_pfn_mapped = init_memory_mapping(1UL<<32,
-						     max_pfn<<PAGE_SHIFT);
-		/* can we preseve max_low_pfn ?*/
-		max_low_pfn = max_pfn;
-	}
-#endif
 	memblock.current_limit = get_max_mapped();
 	dma_contiguous_reserve(0);
 
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index e2b21e0..d7b24da 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -301,6 +301,8 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
 	if (!after_bootmem)
 		early_memtest(start, end);
 
+	add_pfn_range_mapped(start >> PAGE_SHIFT, ret >> PAGE_SHIFT);
+
 	return ret >> PAGE_SHIFT;
 }
 
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 2b6b4a3..ab558eb 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -657,13 +657,11 @@ int arch_add_memory(int nid, u64 start, u64 size)
 {
 	struct pglist_data *pgdat = NODE_DATA(nid);
 	struct zone *zone = pgdat->node_zones + ZONE_NORMAL;
-	unsigned long last_mapped_pfn, start_pfn = start >> PAGE_SHIFT;
+	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 	int ret;
 
-	last_mapped_pfn = init_memory_mapping(start, start + size);
-	if (last_mapped_pfn > max_pfn_mapped)
-		max_pfn_mapped = last_mapped_pfn;
+	init_memory_mapping(start, start + size);
 
 	ret = __add_pages(nid, zone, start_pfn, nr_pages);
 	WARN_ON_ONCE(ret);
-- 
1.7.9.5




* [PATCH 4/5] x86: Fixup code testing if a pfn is direct mapped
  2012-08-24 23:55 [PATCH V4 0/5] x86: Create direct mappings for E820_RAM only Jacob Shin
                   ` (2 preceding siblings ...)
  2012-08-24 23:55 ` [PATCH 3/5] x86: Only direct map addresses that are marked as E820_RAM Jacob Shin
@ 2012-08-24 23:55 ` Jacob Shin
  2012-08-24 23:55 ` [PATCH 5/5] x86: if kernel .text .data .bss are not marked as E820_RAM, complain and fix Jacob Shin
  4 siblings, 0 replies; 29+ messages in thread
From: Jacob Shin @ 2012-08-24 23:55 UTC (permalink / raw)
  To: X86-ML
  Cc: LKML, H. Peter Anvin, Yinghai Lu, Tejun Heo, Dave Young,
	Chao Wang, Vivek Goyal, Andreas Herrmann, Borislav Petkov,
	Jacob Shin

Update code that previously assumed pfns [ 0 - max_low_pfn_mapped ) and
[ 4GB - max_pfn_mapped ) were always direct mapped, to now look up
pfn_mapped ranges instead.
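
For example, with the 1TB e820 from patch 3/5, pfn_is_mapped(0xe038000)
(the first pfn of the high hole) now returns false, whereas the old
'pfn < max_pfn_mapped' style tests would have treated it as mapped.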

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
---
 arch/x86/kernel/cpu/amd.c   |    6 +-----
 arch/x86/platform/efi/efi.c |    8 ++++----
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 9d92e19..554ccfc 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -677,11 +677,7 @@ static void __cpuinit init_amd(struct cpuinfo_x86 *c)
 		 */
 		if (!rdmsrl_safe(MSR_K8_TSEG_ADDR, &tseg)) {
 			printk(KERN_DEBUG "tseg: %010llx\n", tseg);
-			if ((tseg>>PMD_SHIFT) <
-				(max_low_pfn_mapped>>(PMD_SHIFT-PAGE_SHIFT)) ||
-				((tseg>>PMD_SHIFT) <
-				(max_pfn_mapped>>(PMD_SHIFT-PAGE_SHIFT)) &&
-				(tseg>>PMD_SHIFT) >= (1ULL<<(32 - PMD_SHIFT))))
+			if (pfn_is_mapped(tseg))
 				set_memory_4k((unsigned long)__va(tseg), 1);
 		}
 	}
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 92660eda..f1facde 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -776,7 +776,7 @@ void __init efi_enter_virtual_mode(void)
 	efi_memory_desc_t *md, *prev_md = NULL;
 	efi_status_t status;
 	unsigned long size;
-	u64 end, systab, addr, npages, end_pfn;
+	u64 end, systab, addr, npages, start_pfn, end_pfn;
 	void *p, *va, *new_memmap = NULL;
 	int count = 0;
 
@@ -827,10 +827,10 @@ void __init efi_enter_virtual_mode(void)
 		size = md->num_pages << EFI_PAGE_SHIFT;
 		end = md->phys_addr + size;
 
+		start_pfn = PFN_DOWN(md->phys_addr);
 		end_pfn = PFN_UP(end);
-		if (end_pfn <= max_low_pfn_mapped
-		    || (end_pfn > (1UL << (32 - PAGE_SHIFT))
-			&& end_pfn <= max_pfn_mapped))
+
+		if (pfn_range_is_mapped(start_pfn, end_pfn))
 			va = __va(md->phys_addr);
 		else
 			va = efi_ioremap(md->phys_addr, size, md->type);
-- 
1.7.9.5




* [PATCH 5/5] x86: if kernel .text .data .bss are not marked as E820_RAM, complain and fix
  2012-08-24 23:55 [PATCH V4 0/5] x86: Create direct mappings for E820_RAM only Jacob Shin
                   ` (3 preceding siblings ...)
  2012-08-24 23:55 ` [PATCH 4/5] x86: Fixup code testing if a pfn is direct mapped Jacob Shin
@ 2012-08-24 23:55 ` Jacob Shin
  2012-08-25  1:23   ` Yinghai Lu
  4 siblings, 1 reply; 29+ messages in thread
From: Jacob Shin @ 2012-08-24 23:55 UTC (permalink / raw)
  To: X86-ML
  Cc: LKML, H. Peter Anvin, Yinghai Lu, Tejun Heo, Dave Young,
	Chao Wang, Vivek Goyal, Andreas Herrmann, Borislav Petkov,
	Jacob Shin

There could be cases where user-supplied memmap=exactmap memory maps
do not mark the region where the kernel .text, .data, and .bss reside
as E820_RAM, as reported here:

https://lkml.org/lkml/2012/8/14/86

Handle this by complaining and adding the range back into the e820 map.
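
For example (hypothetical values), a command line like:

	memmap=exactmap memmap=640K@0 memmap=128M@768M

leaves the region where the kernel image was loaded without an E820_RAM
entry, even though the kernel is obviously running from RAM.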

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
---
 arch/x86/kernel/setup.c |   15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 4217fb4..b84aceb5 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -926,6 +926,21 @@ void __init setup_arch(char **cmdline_p)
 	insert_resource(&iomem_resource, &data_resource);
 	insert_resource(&iomem_resource, &bss_resource);
 
+	/*
+	 * Complain if .text .data and .bss are not marked as E820_RAM and
+	 * attempt to fix it by adding the range. We may have a confused BIOS,
+	 * or the user may have incorrectly supplied it via memmap=exactmap. If
+	 * we really are running on top of non-RAM, we will crash later anyway.
+	 */
+	if (!e820_all_mapped(code_resource.start, bss_resource.end, E820_RAM)) {
+		pr_warn(".text .data .bss are not marked as E820_RAM!\n");
+
+		e820_add_region(code_resource.start,
+				bss_resource.end - code_resource.start + 1,
+				E820_RAM);
+		sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map);
+	}
+
 	trim_bios_range();
 #ifdef CONFIG_X86_32
 	if (ppro_with_ram_bug()) {
-- 
1.7.9.5




* Re: [PATCH 3/5] x86: Only direct map addresses that are marked as E820_RAM
  2012-08-24 23:55 ` [PATCH 3/5] x86: Only direct map addresses that are marked as E820_RAM Jacob Shin
@ 2012-08-25  0:17   ` Jacob Shin
  2012-08-25  0:30   ` H. Peter Anvin
  2012-08-25  1:07   ` Yinghai Lu
  2 siblings, 0 replies; 29+ messages in thread
From: Jacob Shin @ 2012-08-25  0:17 UTC (permalink / raw)
  To: X86-ML
  Cc: LKML, H. Peter Anvin, Yinghai Lu, Tejun Heo, Dave Young,
	Chao Wang, Vivek Goyal, Andreas Herrmann, Borislav Petkov

On Fri, Aug 24, 2012 at 06:55:14PM -0500, Jacob Shin wrote:
> Currently direct mappings are created for [ 0 to max_low_pfn<<PAGE_SHIFT )
> and [ 4GB to max_pfn<<PAGE_SHIFT ), which may include regions that are not
> backed by actual DRAM. This is fine for holes under 4GB, which fixed and
> variable range MTRRs cover as UC. However, we run into trouble at higher
> memory addresses that cannot be covered by MTRRs.
> 
> Our system with 1TB of RAM has an e820 that looks like this:
> 
>  BIOS-e820: [mem 0x0000000000000000-0x00000000000983ff] usable
>  BIOS-e820: [mem 0x0000000000098400-0x000000000009ffff] reserved
>  BIOS-e820: [mem 0x00000000000d0000-0x00000000000fffff] reserved
>  BIOS-e820: [mem 0x0000000000100000-0x00000000c7ebffff] usable
>  BIOS-e820: [mem 0x00000000c7ec0000-0x00000000c7ed7fff] ACPI data
>  BIOS-e820: [mem 0x00000000c7ed8000-0x00000000c7ed9fff] ACPI NVS
>  BIOS-e820: [mem 0x00000000c7eda000-0x00000000c7ffffff] reserved
>  BIOS-e820: [mem 0x00000000fec00000-0x00000000fec0ffff] reserved
>  BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
>  BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved
>  BIOS-e820: [mem 0x0000000100000000-0x000000e037ffffff] usable
>  BIOS-e820: [mem 0x000000e038000000-0x000000fcffffffff] reserved
>  BIOS-e820: [mem 0x0000010000000000-0x0000011ffeffffff] usable
> 
> and so direct mappings are created for the huge memory hole between
> 0x000000e038000000 and 0x0000010000000000. Even though the kernel never
> explicitly accesses that region, since the page tables incorrectly mark
> it as WB, our (AMD) processor ends up raising an MCE while doing memory
> bookkeeping/optimizations around that area.
> 
> This patch iterates through the e820 map and direct maps only the ranges
> that are marked as E820_RAM, and keeps track of those pfn ranges.
> Depending on the alignment of E820 ranges, this may result in using
> smaller page sizes (i.e. 4K instead of 2M or 1G).
> 
> Signed-off-by: Jacob Shin <jacob.shin@amd.com>
> ---
>  arch/x86/include/asm/page_types.h |    9 +++
>  arch/x86/kernel/setup.c           |  125 +++++++++++++++++++++++++++++--------
>  arch/x86/mm/init.c                |    2 +
>  arch/x86/mm/init_64.c             |    6 +-
>  4 files changed, 112 insertions(+), 30 deletions(-)

> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 751e020..4217fb4 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -115,13 +115,46 @@
>  #include <asm/prom.h>
>  
>  /*
> - * end_pfn only includes RAM, while max_pfn_mapped includes all e820 entries.
> - * The direct mapping extends to max_pfn_mapped, so that we can directly access
> - * apertures, ACPI and other tables without having to play with fixmaps.
> + * max_low_pfn_mapped: highest direct mapped pfn under 4GB
> + * max_pfn_mapped:     highest direct mapped pfn over 4GB
> + *
> + * The direct mapping only covers E820_RAM regions, so the ranges and gaps are
> + * represented by pfn_mapped
>   */
>  unsigned long max_low_pfn_mapped;
>  unsigned long max_pfn_mapped;
>  
> +struct range pfn_mapped[E820_X_MAX];
> +int nr_pfn_mapped;
> +
> +void add_pfn_range_mapped(unsigned long start_pfn, unsigned long end_pfn)
> +{
> +	nr_pfn_mapped = add_range_with_merge(pfn_mapped, E820_X_MAX,
> +					     nr_pfn_mapped, start_pfn, end_pfn);
> +
> +	max_pfn_mapped = max(max_pfn_mapped, end_pfn);
> +
> +	if (end_pfn <= (1UL << (32 - PAGE_SHIFT)))
> +		max_low_pfn_mapped = max(max_low_pfn_mapped, end_pfn);
> +}
> +
> +bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn)
> +{
> +	int i;
> +
> +	for (i = 0; i < nr_pfn_mapped; i++)
> +		if ((start_pfn >= pfn_mapped[i].start) &&
> +		    (end_pfn <= pfn_mapped[i].end))
> +			return true;
> +
> +	return false;
> +}
> +
> +bool pfn_is_mapped(unsigned long pfn)
> +{
> +	return pfn_range_is_mapped(pfn, pfn + 1);
> +}
> +
>  #ifdef CONFIG_DMI
>  RESERVE_BRK(dmi_alloc, 65536);
>  #endif
> @@ -296,6 +329,68 @@ static void __init cleanup_highmap(void)
>  }
>  #endif
>  
> +/*
> + * Iterate through E820 memory map and create direct mappings for only E820_RAM
> + * regions. We cannot simply create direct mappings for all pfns from
> + * [0 to max_low_pfn) and [4GB to max_pfn) because of possible memory holes in
> + * high addresses that cannot be marked as UC by fixed/variable range MTRRs.
> + * Depending on the alignment of E820 ranges, this may possibly result in using
> + * smaller size (i.e. 4K instead of 2M or 1G) page tables.
> + */
> +static void __init init_memory(void)
> +{
> +	int i;
> +
> +	init_gbpages();
> +
> +	/* Enable PSE if available */
> +	if (cpu_has_pse)
> +		set_in_cr4(X86_CR4_PSE);
> +
> +	/* Enable PGE if available */
> +	if (cpu_has_pge) {
> +		set_in_cr4(X86_CR4_PGE);
> +		__supported_pte_mask |= _PAGE_GLOBAL;
> +	}
> +
> +	for (i = 0; i < e820.nr_map; i++) {
> +		struct e820entry *ei = &e820.map[i];
> +		u64 start = ei->addr;
> +		u64 end = ei->addr + ei->size;
> +
> +		/* we only map E820_RAM */
> +		if (ei->type != E820_RAM)
> +			continue;
> +
> +		if (end <= ISA_END_ADDRESS)
> +			continue;
> +
> +		if (start <= ISA_END_ADDRESS)
> +			start = 0;
> +#ifdef CONFIG_X86_32
> +		/* on 32 bit, we only map up to max_low_pfn */
> +		if ((start >> PAGE_SHIFT) >= max_low_pfn)
> +			continue;
> +
> +		if ((end >> PAGE_SHIFT) > max_low_pfn)
> +			end = max_low_pfn << PAGE_SHIFT;
> +#endif
> +		/* the ISA range is always mapped regardless of holes */
> +		if (!pfn_range_is_mapped(0, ISA_END_ADDRESS << PAGE_SHIFT) &&

Darn, there is a typo here: it should be '>>' not '<<'. Sorry about
that, I'll resend in a bit. My local testing didn't catch it because
'<<' caused the value to be 0.
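
That is, the line should read:

	if (!pfn_range_is_mapped(0, ISA_END_ADDRESS >> PAGE_SHIFT) &&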

> +		    start != 0)
> +			init_memory_mapping(0, ISA_END_ADDRESS);
> +
> +		init_memory_mapping(start, end);
> +	}
> +
> +#ifdef CONFIG_X86_64
> +	if (max_pfn > max_low_pfn) {
> +		/* can we preseve max_low_pfn ?*/
> +		max_low_pfn = max_pfn;
> +	}
> +#endif
> +}
> +
>  static void __init reserve_brk(void)
>  {
>  	if (_brk_end > _brk_start)
> @@ -911,30 +1006,8 @@ void __init setup_arch(char **cmdline_p)
>  
>  	setup_real_mode();
>  
> -	init_gbpages();
> -
> -	/* Enable PSE if available */
> -	if (cpu_has_pse)
> -		set_in_cr4(X86_CR4_PSE);
> +	init_memory();
>  
> -	/* Enable PGE if available */
> -	if (cpu_has_pge) {
> -		set_in_cr4(X86_CR4_PGE);
> -		__supported_pte_mask |= _PAGE_GLOBAL;
> -	}
> -
> -	/* max_pfn_mapped is updated here */
> -	max_low_pfn_mapped = init_memory_mapping(0, max_low_pfn<<PAGE_SHIFT);
> -	max_pfn_mapped = max_low_pfn_mapped;
> -
> -#ifdef CONFIG_X86_64
> -	if (max_pfn > max_low_pfn) {
> -		max_pfn_mapped = init_memory_mapping(1UL<<32,
> -						     max_pfn<<PAGE_SHIFT);
> -		/* can we preseve max_low_pfn ?*/
> -		max_low_pfn = max_pfn;
> -	}
> -#endif
>  	memblock.current_limit = get_max_mapped();
>  	dma_contiguous_reserve(0);
>  



* Re: [PATCH 3/5] x86: Only direct map addresses that are marked as E820_RAM
  2012-08-24 23:55 ` [PATCH 3/5] x86: Only direct map addresses that are marked as E820_RAM Jacob Shin
  2012-08-25  0:17   ` Jacob Shin
@ 2012-08-25  0:30   ` H. Peter Anvin
  2012-08-25  0:49     ` Jacob Shin
  2012-08-25  1:07   ` Yinghai Lu
  2 siblings, 1 reply; 29+ messages in thread
From: H. Peter Anvin @ 2012-08-25  0:30 UTC (permalink / raw)
  To: Jacob Shin
  Cc: X86-ML, LKML, Yinghai Lu, Tejun Heo, Dave Young, Chao Wang,
	Vivek Goyal, Andreas Herrmann, Borislav Petkov

On 08/24/2012 04:55 PM, Jacob Shin wrote:
> +
> +	for (i = 0; i < e820.nr_map; i++) {
> +		struct e820entry *ei = &e820.map[i];
> +		u64 start = ei->addr;
> +		u64 end = ei->addr + ei->size;
> +
> +		/* we only map E820_RAM */
> +		if (ei->type != E820_RAM)
> +			continue;
> +
> +		if (end <= ISA_END_ADDRESS)
> +			continue;
> +
> +		if (start <= ISA_END_ADDRESS)
> +			start = 0;
> +#ifdef CONFIG_X86_32
> +		/* on 32 bit, we only map up to max_low_pfn */
> +		if ((start >> PAGE_SHIFT) >= max_low_pfn)
> +			continue;
> +
> +		if ((end >> PAGE_SHIFT) > max_low_pfn)
> +			end = max_low_pfn << PAGE_SHIFT;
> +#endif
> +		/* the ISA range is always mapped regardless of holes */
> +		if (!pfn_range_is_mapped(0, ISA_END_ADDRESS << PAGE_SHIFT) &&
> +		    start != 0)
> +			init_memory_mapping(0, ISA_END_ADDRESS);
> +
> +		init_memory_mapping(start, end);
> +	}
> +

The ISA range mapping doesn't really make sense *inside* the loop, no? 
It seems you could do that before you enter the loop and then simply have:

+		if (end <= ISA_END_ADDRESS)
+			continue;
+
+		if (start <= ISA_END_ADDRESS)
+			start = ISA_END_ADDRESS;

... no?

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



* Re: [PATCH 3/5] x86: Only direct map addresses that are marked as E820_RAM
  2012-08-25  0:30   ` H. Peter Anvin
@ 2012-08-25  0:49     ` Jacob Shin
  2012-08-25  1:13       ` H. Peter Anvin
  0 siblings, 1 reply; 29+ messages in thread
From: Jacob Shin @ 2012-08-25  0:49 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: X86-ML, LKML, Yinghai Lu, Tejun Heo, Dave Young, Chao Wang,
	Vivek Goyal, Andreas Herrmann, Borislav Petkov

On Fri, Aug 24, 2012 at 05:30:21PM -0700, H. Peter Anvin wrote:
> On 08/24/2012 04:55 PM, Jacob Shin wrote:
> >+
> >+	for (i = 0; i < e820.nr_map; i++) {
> >+		struct e820entry *ei = &e820.map[i];
> >+		u64 start = ei->addr;
> >+		u64 end = ei->addr + ei->size;
> >+
> >+		/* we only map E820_RAM */
> >+		if (ei->type != E820_RAM)
> >+			continue;
> >+
> >+		if (end <= ISA_END_ADDRESS)
> >+			continue;
> >+
> >+		if (start <= ISA_END_ADDRESS)
> >+			start = 0;
> >+#ifdef CONFIG_X86_32
> >+		/* on 32 bit, we only map up to max_low_pfn */
> >+		if ((start >> PAGE_SHIFT) >= max_low_pfn)
> >+			continue;
> >+
> >+		if ((end >> PAGE_SHIFT) > max_low_pfn)
> >+			end = max_low_pfn << PAGE_SHIFT;
> >+#endif
> >+		/* the ISA range is always mapped regardless of holes */
> >+		if (!pfn_range_is_mapped(0, ISA_END_ADDRESS << PAGE_SHIFT) &&
> >+		    start != 0)
> >+			init_memory_mapping(0, ISA_END_ADDRESS);
> >+
> >+		init_memory_mapping(start, end);
> >+	}
> >+
> 
> The ISA range mapping doesn't really make sense *inside* the loop,
> no? It seems you could do that before you enter the loop and then
> simply have:
> 
> +		if (end <= ISA_END_ADDRESS)
> +			continue;
> +
> +		if (start <= ISA_END_ADDRESS)
> +			start = ISA_END_ADDRESS;
> 
> ... no?

Right, I think what I was attempting to do was to merge the first 1MB
with the E820_RAM region right above 1MB:

So instead of:

init_memory_mapping(0, 1MB)
init_memory_mapping(1MB, 2GB)

It would be:

init_memory_mapping(0, 2GB)

While taking care of the odd case where there is a gap right after
1MB.

But if its not worth it, I can move it out of the loop.

> 
> 	-hpa
> 
> -- 
> H. Peter Anvin, Intel Open Source Technology Center
> I work for Intel.  I don't speak on their behalf.
> 
> 



* Re: [PATCH 3/5] x86: Only direct map addresses that are marked as E820_RAM
  2012-08-24 23:55 ` [PATCH 3/5] x86: Only direct map addresses that are marked as E820_RAM Jacob Shin
  2012-08-25  0:17   ` Jacob Shin
  2012-08-25  0:30   ` H. Peter Anvin
@ 2012-08-25  1:07   ` Yinghai Lu
  2012-08-25  4:24     ` Jacob Shin
  2 siblings, 1 reply; 29+ messages in thread
From: Yinghai Lu @ 2012-08-25  1:07 UTC (permalink / raw)
  To: Jacob Shin
  Cc: X86-ML, LKML, H. Peter Anvin, Tejun Heo, Dave Young, Chao Wang,
	Vivek Goyal, Andreas Herrmann, Borislav Petkov

On Fri, Aug 24, 2012 at 4:55 PM, Jacob Shin <jacob.shin@amd.com> wrote:
> Currently direct mappings are created for [ 0 to max_low_pfn<<PAGE_SHIFT )
> and [ 4GB to max_pfn<<PAGE_SHIFT ), which may include regions that are not
> backed by actual DRAM. This is fine for holes under 4GB, which fixed and
> variable range MTRRs cover as UC. However, we run into trouble at higher
> memory addresses that cannot be covered by MTRRs.
>
> Our system with 1TB of RAM has an e820 that looks like this:
>
>  BIOS-e820: [mem 0x0000000000000000-0x00000000000983ff] usable
>  BIOS-e820: [mem 0x0000000000098400-0x000000000009ffff] reserved
>  BIOS-e820: [mem 0x00000000000d0000-0x00000000000fffff] reserved
>  BIOS-e820: [mem 0x0000000000100000-0x00000000c7ebffff] usable
>  BIOS-e820: [mem 0x00000000c7ec0000-0x00000000c7ed7fff] ACPI data
>  BIOS-e820: [mem 0x00000000c7ed8000-0x00000000c7ed9fff] ACPI NVS
>  BIOS-e820: [mem 0x00000000c7eda000-0x00000000c7ffffff] reserved
>  BIOS-e820: [mem 0x00000000fec00000-0x00000000fec0ffff] reserved
>  BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
>  BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved
>  BIOS-e820: [mem 0x0000000100000000-0x000000e037ffffff] usable
>  BIOS-e820: [mem 0x000000e038000000-0x000000fcffffffff] reserved
>  BIOS-e820: [mem 0x0000010000000000-0x0000011ffeffffff] usable
>
> and so direct mappings are created for the huge memory hole between
> 0x000000e038000000 and 0x0000010000000000. Even though the kernel never
> explicitly accesses that region, since the page tables incorrectly mark
> it as WB, our (AMD) processor ends up raising an MCE while doing memory
> bookkeeping/optimizations around that area.
>
> This patch iterates through the e820 map and direct maps only the ranges
> that are marked as E820_RAM, and keeps track of those pfn ranges.
> Depending on the alignment of E820 ranges, this may result in using
> smaller page sizes (i.e. 4K instead of 2M or 1G).
>
> Signed-off-by: Jacob Shin <jacob.shin@amd.com>
> ---
>  arch/x86/include/asm/page_types.h |    9 +++
>  arch/x86/kernel/setup.c           |  125 +++++++++++++++++++++++++++++--------
>  arch/x86/mm/init.c                |    2 +
>  arch/x86/mm/init_64.c             |    6 +-
>  4 files changed, 112 insertions(+), 30 deletions(-)
>
> diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
> index e21fdd1..409047a 100644
> --- a/arch/x86/include/asm/page_types.h
> +++ b/arch/x86/include/asm/page_types.h
> @@ -3,6 +3,7 @@
>
>  #include <linux/const.h>
>  #include <linux/types.h>
> +#include <asm/e820.h>
>
>  /* PAGE_SHIFT determines the page size */
>  #define PAGE_SHIFT     12
> @@ -40,12 +41,20 @@
>  #endif /* CONFIG_X86_64 */
>
>  #ifndef __ASSEMBLY__
> +#include <linux/range.h>
>
>  extern int devmem_is_allowed(unsigned long pagenr);
>
>  extern unsigned long max_low_pfn_mapped;
>  extern unsigned long max_pfn_mapped;
>
> +extern struct range pfn_mapped[E820_X_MAX];
> +extern int nr_pfn_mapped;
> +
> +extern void add_pfn_range_mapped(unsigned long start_pfn, unsigned long end_pfn);
> +extern bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn);
> +extern bool pfn_is_mapped(unsigned long pfn);
> +
>  static inline phys_addr_t get_max_mapped(void)
>  {
>         return (phys_addr_t)max_pfn_mapped << PAGE_SHIFT;
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 751e020..4217fb4 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -115,13 +115,46 @@
>  #include <asm/prom.h>
>
>  /*
> - * end_pfn only includes RAM, while max_pfn_mapped includes all e820 entries.
> - * The direct mapping extends to max_pfn_mapped, so that we can directly access
> - * apertures, ACPI and other tables without having to play with fixmaps.
> + * max_low_pfn_mapped: highest direct mapped pfn under 4GB
> + * max_pfn_mapped:     highest direct mapped pfn over 4GB
> + *
> + * The direct mapping only covers E820_RAM regions, so the ranges and gaps are
> + * represented by pfn_mapped
>   */
>  unsigned long max_low_pfn_mapped;
>  unsigned long max_pfn_mapped;
>
> +struct range pfn_mapped[E820_X_MAX];
> +int nr_pfn_mapped;
> +
> +void add_pfn_range_mapped(unsigned long start_pfn, unsigned long end_pfn)
> +{
> +       nr_pfn_mapped = add_range_with_merge(pfn_mapped, E820_X_MAX,
> +                                            nr_pfn_mapped, start_pfn, end_pfn);
> +
> +       max_pfn_mapped = max(max_pfn_mapped, end_pfn);
> +
> +       if (end_pfn <= (1UL << (32 - PAGE_SHIFT)))
> +               max_low_pfn_mapped = max(max_low_pfn_mapped, end_pfn);
> +}
> +
> +bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn)
> +{
> +       int i;
> +
> +       for (i = 0; i < nr_pfn_mapped; i++)
> +               if ((start_pfn >= pfn_mapped[i].start) &&
> +                   (end_pfn <= pfn_mapped[i].end))
> +                       return true;
> +
> +       return false;
> +}
> +
> +bool pfn_is_mapped(unsigned long pfn)
> +{
> +       return pfn_range_is_mapped(pfn, pfn + 1);
> +}
> +

It looks like you could avoid adding the pfn_mapped[] array.

pfn_range_is_mapped() could instead check against max_low_pfn_mapped
and max_pfn_mapped, combined with
e820_all_mapped(start, end, E820_RAM).
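
A minimal sketch of that alternative (assuming e820_all_mapped() takes
physical addresses, as it does today):

	bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn)
	{
		/* below the per-4GB mapping limits, E820_RAM implies mapped */
		if (end_pfn <= max_low_pfn_mapped ||
		    (start_pfn >= (1UL << (32 - PAGE_SHIFT)) &&
		     end_pfn <= max_pfn_mapped))
			return e820_all_mapped((u64)start_pfn << PAGE_SHIFT,
					       (u64)end_pfn << PAGE_SHIFT,
					       E820_RAM);

		return false;
	}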

Thanks

Yinghai


* Re: [PATCH 3/5] x86: Only direct map addresses that are marked as E820_RAM
  2012-08-25  0:49     ` Jacob Shin
@ 2012-08-25  1:13       ` H. Peter Anvin
  2012-08-25  4:20         ` Jacob Shin
  0 siblings, 1 reply; 29+ messages in thread
From: H. Peter Anvin @ 2012-08-25  1:13 UTC (permalink / raw)
  To: Jacob Shin
  Cc: X86-ML, LKML, Yinghai Lu, Tejun Heo, Dave Young, Chao Wang,
	Vivek Goyal, Andreas Herrmann, Borislav Petkov

On 08/24/2012 05:49 PM, Jacob Shin wrote:
> 
> Right, I think what I was attempting to do was to merge the 1MB
> with E820_RAM right above 1MB:
> 
> So instead of:
> 
> init_memory_mapping(0, 1MB)
> init_memory_mapping(1MB, 2GB)
> 
> It would be:
> 
> init_memory_mapping(0, 2GB)
> 
> While taking care of the odd case where there is a gap right after
> 1MB.
> 
> But if its not worth it, I can move it out of the loop.
> 

What is the benefit?

	-hpa



* Re: [PATCH 5/5] x86: if kernel .text .data .bss are not marked as E820_RAM, complain and fix
  2012-08-24 23:55 ` [PATCH 5/5] x86: if kernel .text .data .bss are not marked as E820_RAM, complain and fix Jacob Shin
@ 2012-08-25  1:23   ` Yinghai Lu
  2012-08-25  4:25     ` Jacob Shin
  0 siblings, 1 reply; 29+ messages in thread
From: Yinghai Lu @ 2012-08-25  1:23 UTC (permalink / raw)
  To: Jacob Shin
  Cc: X86-ML, LKML, H. Peter Anvin, Tejun Heo, Dave Young, Chao Wang,
	Vivek Goyal, Andreas Herrmann, Borislav Petkov

On Fri, Aug 24, 2012 at 4:55 PM, Jacob Shin <jacob.shin@amd.com> wrote:
> There could be cases where user-supplied memmap=exactmap memory maps
> do not mark the region where the kernel .text, .data, and .bss reside
> as E820_RAM, as reported here:
>
> https://lkml.org/lkml/2012/8/14/86
>
> Handle this by complaining and adding the range back into the e820 map.
>
> Signed-off-by: Jacob Shin <jacob.shin@amd.com>
> ---
>  arch/x86/kernel/setup.c |   15 +++++++++++++++
>  1 file changed, 15 insertions(+)
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 4217fb4..b84aceb5 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -926,6 +926,21 @@ void __init setup_arch(char **cmdline_p)
>         insert_resource(&iomem_resource, &data_resource);
>         insert_resource(&iomem_resource, &bss_resource);
>
> +       /*
> +        * Complain if .text .data and .bss are not marked as E820_RAM and
> +        * attempt to fix it by adding the range. We may have a confused BIOS,
> +        * or the user may have incorrectly supplied it via memmap=exactmap. If
> +        * we really are running on top of non-RAM, we will crash later anyway.
> +        */
> +       if (!e820_all_mapped(code_resource.start, bss_resource.end, E820_RAM)) {
> +               pr_warn(".text .data .bss are not marked as E820_RAM!\n");
> +
> +               e820_add_region(code_resource.start,
> +                               bss_resource.end - code_resource.start + 1,
> +                               E820_RAM);
> +               sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map);

This sanitize_e820_map() call could be spared; trim_bios_range() will
do that anyway.

> +       }
> +
>         trim_bios_range();
>  #ifdef CONFIG_X86_32
>         if (ppro_with_ram_bug()) {

Also, you should use the brk limit instead of bss_resource.end, i.e.
the range needs to cover the brk area as well.
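
I.e. something along these lines (a sketch; assuming _brk_end, the
variable that tracks the end of the extended brk area, is what is meant
by the brk limit here):

	if (!e820_all_mapped(code_resource.start, __pa(_brk_end) - 1,
			     E820_RAM)) {
		pr_warn(".text .data .bss .brk are not marked as E820_RAM!\n");
		e820_add_region(code_resource.start,
				__pa(_brk_end) - code_resource.start,
				E820_RAM);
	}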

Thanks

Yinghai


* Re: [PATCH 1/5] x86: Move enabling of PSE and PGE out of init_memory_mapping
  2012-08-24 23:55 ` [PATCH 1/5] x86: Move enabling of PSE and PGE out of init_memory_mapping Jacob Shin
@ 2012-08-25  1:25   ` Yinghai Lu
  2012-08-25  1:49     ` Yinghai Lu
  2012-08-25  4:13     ` Jacob Shin
  0 siblings, 2 replies; 29+ messages in thread
From: Yinghai Lu @ 2012-08-25  1:25 UTC (permalink / raw)
  To: Jacob Shin
  Cc: X86-ML, LKML, H. Peter Anvin, Tejun Heo, Dave Young, Chao Wang,
	Vivek Goyal, Andreas Herrmann, Borislav Petkov

On Fri, Aug 24, 2012 at 4:55 PM, Jacob Shin <jacob.shin@amd.com> wrote:
> Depending on the platform, init_memory_mapping() may be called multiple
> times. Move it out to setup_arch() to avoid writing to cr4 on every call.
>
> Signed-off-by: Jacob Shin <jacob.shin@amd.com>
> ---
>  arch/x86/kernel/setup.c |   10 ++++++++++
>  arch/x86/mm/init.c      |   10 ----------
>  2 files changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index f4b9b80..751e020 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -913,6 +913,16 @@ void __init setup_arch(char **cmdline_p)
>
>         init_gbpages();
>
> +       /* Enable PSE if available */
> +       if (cpu_has_pse)
> +               set_in_cr4(X86_CR4_PSE);
> +
> +       /* Enable PGE if available */
> +       if (cpu_has_pge) {
> +               set_in_cr4(X86_CR4_PGE);
> +               __supported_pte_mask |= _PAGE_GLOBAL;
> +       }
> +

Please don't put this directly in setup_arch(); factor it out into a
separate function.

Thanks

Yinghai

>         /* max_pfn_mapped is updated here */
>         max_low_pfn_mapped = init_memory_mapping(0, max_low_pfn<<PAGE_SHIFT);
>         max_pfn_mapped = max_low_pfn_mapped;
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index e0e6990..2f07e09 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -149,16 +149,6 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
>         use_gbpages = direct_gbpages;
>  #endif
>
> -       /* Enable PSE if available */
> -       if (cpu_has_pse)
> -               set_in_cr4(X86_CR4_PSE);
> -
> -       /* Enable PGE if available */
> -       if (cpu_has_pge) {
> -               set_in_cr4(X86_CR4_PGE);
> -               __supported_pte_mask |= _PAGE_GLOBAL;
> -       }
> -
>         if (use_gbpages)
>                 page_size_mask |= 1 << PG_LEVEL_1G;
>         if (use_pse)
> --
> 1.7.9.5
>
>


* Re: [PATCH 1/5] x86: Move enabling of PSE and PGE out of init_memory_mapping
  2012-08-25  1:25   ` Yinghai Lu
@ 2012-08-25  1:49     ` Yinghai Lu
  2012-08-25  2:06       ` Yinghai Lu
  2012-08-25  4:13     ` Jacob Shin
  1 sibling, 1 reply; 29+ messages in thread
From: Yinghai Lu @ 2012-08-25  1:49 UTC (permalink / raw)
  To: Jacob Shin
  Cc: X86-ML, LKML, H. Peter Anvin, Tejun Heo, Dave Young, Chao Wang,
	Vivek Goyal, Andreas Herrmann, Borislav Petkov

[-- Attachment #1: Type: text/plain, Size: 1240 bytes --]

On Fri, Aug 24, 2012 at 6:25 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, Aug 24, 2012 at 4:55 PM, Jacob Shin <jacob.shin@amd.com> wrote:
>> Depending on the platform, init_memory_mapping() may be called multiple
>> times. Move it out to setup_arch() to avoid writing to cr4 on every call.
>>
>> Signed-off-by: Jacob Shin <jacob.shin@amd.com>
>> ---
>>  arch/x86/kernel/setup.c |   10 ++++++++++
>>  arch/x86/mm/init.c      |   10 ----------
>>  2 files changed, 10 insertions(+), 10 deletions(-)
>>
>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>> index f4b9b80..751e020 100644
>> --- a/arch/x86/kernel/setup.c
>> +++ b/arch/x86/kernel/setup.c
>> @@ -913,6 +913,16 @@ void __init setup_arch(char **cmdline_p)
>>
>>         init_gbpages();
>>
>> +       /* Enable PSE if available */
>> +       if (cpu_has_pse)
>> +               set_in_cr4(X86_CR4_PSE);
>> +
>> +       /* Enable PGE if available */
>> +       if (cpu_has_pge) {
>> +               set_in_cr4(X86_CR4_PGE);
>> +               __supported_pte_mask |= _PAGE_GLOBAL;
>> +       }
>> +
>
> Please don't put this directly in setup_arch(); factor it out into a
> separate function.
>

Jacob, hpa

can you use attached one to replace the first patch?

Thanks

Yinghai

[-- Attachment #2: get_page_size_mask.patch --]
[-- Type: application/octet-stream, Size: 4702 bytes --]

Subject: [PATCH 1/3] x86, mm: Introduce global page_size_mask

Add probe_page_size_mask() to detect whether 1G or 2M pages can be used,
and store the result in page_size_mask.

Only probe on the first init_memory_mapping() call; later calls do not
need to probe again. This also means we no longer need to pass
use_pse/use_gbpages around.

Suggested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/include/asm/pgtable.h |    1 
 arch/x86/mm/init.c             |   70 +++++++++++++++++++++--------------------
 2 files changed, 37 insertions(+), 34 deletions(-)

Index: linux-2.6/arch/x86/include/asm/pgtable.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/pgtable.h
+++ linux-2.6/arch/x86/include/asm/pgtable.h
@@ -597,6 +597,7 @@ static inline int pgd_none(pgd_t pgd)
 #ifndef __ASSEMBLY__
 
 extern int direct_gbpages;
+void probe_page_size_mask(void);
 
 /* local pte updates need not use xchg for locking */
 static inline pte_t native_local_ptep_get_and_clear(pte_t *ptep)
Index: linux-2.6/arch/x86/mm/init.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init.c
+++ linux-2.6/arch/x86/mm/init.c
@@ -35,8 +35,10 @@ struct map_range {
 	unsigned page_size_mask;
 };
 
-static void __init find_early_table_space(struct map_range *mr, unsigned long end,
-					  int use_pse, int use_gbpages)
+static int page_size_mask = -1;
+
+static void __init find_early_table_space(struct map_range *mr,
+					  unsigned long end)
 {
 	unsigned long puds, pmds, ptes, tables, start = 0, good_end = end;
 	phys_addr_t base;
@@ -44,7 +46,7 @@ static void __init find_early_table_spac
 	puds = (end + PUD_SIZE - 1) >> PUD_SHIFT;
 	tables = roundup(puds * sizeof(pud_t), PAGE_SIZE);
 
-	if (use_gbpages) {
+	if (page_size_mask & (1 << PG_LEVEL_1G)) {
 		unsigned long extra;
 
 		extra = end - ((end>>PUD_SHIFT) << PUD_SHIFT);
@@ -54,7 +56,7 @@ static void __init find_early_table_spac
 
 	tables += roundup(pmds * sizeof(pmd_t), PAGE_SIZE);
 
-	if (use_pse) {
+	if (page_size_mask & (1 << PG_LEVEL_2M)) {
 		unsigned long extra;
 
 		extra = end - ((end>>PMD_SHIFT) << PMD_SHIFT);
@@ -90,6 +92,34 @@ static void __init find_early_table_spac
 		(pgt_buf_top << PAGE_SHIFT) - 1);
 }
 
+void probe_page_size_mask(void)
+{
+	if (page_size_mask != -1)
+		return;
+
+	page_size_mask = 0;
+#if !defined(CONFIG_DEBUG_PAGEALLOC) && !defined(CONFIG_KMEMCHECK)
+	/*
+	 * For CONFIG_DEBUG_PAGEALLOC, identity mapping will use small pages.
+	 * This will simplify cpa(), which otherwise needs to support splitting
+	 * large pages into small in interrupt context, etc.
+	 */
+	if (direct_gbpages)
+		page_size_mask |= 1 << PG_LEVEL_1G;
+	if (cpu_has_pse)
+		page_size_mask |= 1 << PG_LEVEL_2M;
+#endif
+
+	/* Enable PSE if available */
+	if (cpu_has_pse)
+		set_in_cr4(X86_CR4_PSE);
+
+	/* Enable PGE if available */
+	if (cpu_has_pge) {
+		set_in_cr4(X86_CR4_PGE);
+		__supported_pte_mask |= _PAGE_GLOBAL;
+	}
+}
 void __init native_pagetable_reserve(u64 start, u64 end)
 {
 	memblock_reserve(start, end - start);
@@ -125,44 +155,16 @@ static int __meminit save_mr(struct map_
 unsigned long __init_refok init_memory_mapping(unsigned long start,
 					       unsigned long end)
 {
-	unsigned long page_size_mask = 0;
 	unsigned long start_pfn, end_pfn;
 	unsigned long ret = 0;
 	unsigned long pos;
-
 	struct map_range mr[NR_RANGE_MR];
 	int nr_range, i;
-	int use_pse, use_gbpages;
 
 	printk(KERN_INFO "init_memory_mapping: [mem %#010lx-%#010lx]\n",
 	       start, end - 1);
 
-#if defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_KMEMCHECK)
-	/*
-	 * For CONFIG_DEBUG_PAGEALLOC, identity mapping will use small pages.
-	 * This will simplify cpa(), which otherwise needs to support splitting
-	 * large pages into small in interrupt context, etc.
-	 */
-	use_pse = use_gbpages = 0;
-#else
-	use_pse = cpu_has_pse;
-	use_gbpages = direct_gbpages;
-#endif
-
-	/* Enable PSE if available */
-	if (cpu_has_pse)
-		set_in_cr4(X86_CR4_PSE);
-
-	/* Enable PGE if available */
-	if (cpu_has_pge) {
-		set_in_cr4(X86_CR4_PGE);
-		__supported_pte_mask |= _PAGE_GLOBAL;
-	}
-
-	if (use_gbpages)
-		page_size_mask |= 1 << PG_LEVEL_1G;
-	if (use_pse)
-		page_size_mask |= 1 << PG_LEVEL_2M;
+	probe_page_size_mask();
 
 	memset(mr, 0, sizeof(mr));
 	nr_range = 0;
@@ -267,7 +269,7 @@ unsigned long __init_refok init_memory_m
 	 * nodes are discovered.
 	 */
 	if (!after_bootmem)
-		find_early_table_space(&mr[0], end, use_pse, use_gbpages);
+		find_early_table_space(&mr[0], end);
 
 	for (i = 0; i < nr_range; i++)
 		ret = kernel_physical_mapping_init(mr[i].start, mr[i].end,


* Re: [PATCH 1/5] x86: Move enabling of PSE and PGE out of init_memory_mapping
  2012-08-25  1:49     ` Yinghai Lu
@ 2012-08-25  2:06       ` Yinghai Lu
  2012-08-25  4:15         ` Jacob Shin
  0 siblings, 1 reply; 29+ messages in thread
From: Yinghai Lu @ 2012-08-25  2:06 UTC (permalink / raw)
  To: Jacob Shin, H. Peter Anvin, Ingo Molnar
  Cc: X86-ML, LKML, Tejun Heo, Dave Young, Chao Wang, Vivek Goyal,
	Andreas Herrmann, Borislav Petkov

[-- Attachment #1: Type: text/plain, Size: 1387 bytes --]

On Fri, Aug 24, 2012 at 6:49 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, Aug 24, 2012 at 6:25 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Fri, Aug 24, 2012 at 4:55 PM, Jacob Shin <jacob.shin@amd.com> wrote:
>>> Depending on the platform, init_memory_mapping() may be called multiple
>>> times. Move it out to setup_arch() to avoid writing to cr4 on every call.
>>>
>>> Signed-off-by: Jacob Shin <jacob.shin@amd.com>
>>> ---
>>>  arch/x86/kernel/setup.c |   10 ++++++++++
>>>  arch/x86/mm/init.c      |   10 ----------
>>>  2 files changed, 10 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>>> index f4b9b80..751e020 100644
>>> --- a/arch/x86/kernel/setup.c
>>> +++ b/arch/x86/kernel/setup.c
>>> @@ -913,6 +913,16 @@ void __init setup_arch(char **cmdline_p)
>>>
>>>         init_gbpages();
>>>
>>> +       /* Enable PSE if available */
>>> +       if (cpu_has_pse)
>>> +               set_in_cr4(X86_CR4_PSE);
>>> +
>>> +       /* Enable PGE if available */
>>> +       if (cpu_has_pge) {
>>> +               set_in_cr4(X86_CR4_PGE);
>>> +               __supported_pte_mask |= _PAGE_GLOBAL;
>>> +       }
>>> +
>>
>> Please don't put this directly in setup_arch(); factor it out into a
>> separate function.
>>
>
> Jacob, hpa
>
> can you use attached one to replace the first patch?

Please use attached two instead.

Thanks

Yinghai

[-- Attachment #2: get_page_size_mask_v3.patch --]
[-- Type: application/octet-stream, Size: 4889 bytes --]

Subject: [PATCH 1/2] x86, mm: Add page_size_mask()

Detect whether 1G or 2M pages can be used and store the result in
page_size_mask.

Only probe once.

Suggested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/include/asm/pgtable.h |    1 
 arch/x86/kernel/setup.c        |    1 
 arch/x86/mm/init.c             |   66 +++++++++++++++++++----------------------
 3 files changed, 33 insertions(+), 35 deletions(-)

Index: linux-2.6/arch/x86/include/asm/pgtable.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/pgtable.h
+++ linux-2.6/arch/x86/include/asm/pgtable.h
@@ -597,6 +597,7 @@ static inline int pgd_none(pgd_t pgd)
 #ifndef __ASSEMBLY__
 
 extern int direct_gbpages;
+void probe_page_size_mask(void);
 
 /* local pte updates need not use xchg for locking */
 static inline pte_t native_local_ptep_get_and_clear(pte_t *ptep)
Index: linux-2.6/arch/x86/mm/init.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init.c
+++ linux-2.6/arch/x86/mm/init.c
@@ -35,8 +35,10 @@ struct map_range {
 	unsigned page_size_mask;
 };
 
-static void __init find_early_table_space(struct map_range *mr, unsigned long end,
-					  int use_pse, int use_gbpages)
+static int page_size_mask;
+
+static void __init find_early_table_space(struct map_range *mr,
+					  unsigned long end)
 {
 	unsigned long puds, pmds, ptes, tables, start = 0, good_end = end;
 	phys_addr_t base;
@@ -44,7 +46,7 @@ static void __init find_early_table_spac
 	puds = (end + PUD_SIZE - 1) >> PUD_SHIFT;
 	tables = roundup(puds * sizeof(pud_t), PAGE_SIZE);
 
-	if (use_gbpages) {
+	if (page_size_mask & (1 << PG_LEVEL_1G)) {
 		unsigned long extra;
 
 		extra = end - ((end>>PUD_SHIFT) << PUD_SHIFT);
@@ -54,7 +56,7 @@ static void __init find_early_table_spac
 
 	tables += roundup(pmds * sizeof(pmd_t), PAGE_SIZE);
 
-	if (use_pse) {
+	if (page_size_mask & (1 << PG_LEVEL_2M)) {
 		unsigned long extra;
 
 		extra = end - ((end>>PMD_SHIFT) << PMD_SHIFT);
@@ -90,6 +92,30 @@ static void __init find_early_table_spac
 		(pgt_buf_top << PAGE_SHIFT) - 1);
 }
 
+void probe_page_size_mask(void)
+{
+#if !defined(CONFIG_DEBUG_PAGEALLOC) && !defined(CONFIG_KMEMCHECK)
+	/*
+	 * For CONFIG_DEBUG_PAGEALLOC, identity mapping will use small pages.
+	 * This will simplify cpa(), which otherwise needs to support splitting
+	 * large pages into small in interrupt context, etc.
+	 */
+	if (direct_gbpages)
+		page_size_mask |= 1 << PG_LEVEL_1G;
+	if (cpu_has_pse)
+		page_size_mask |= 1 << PG_LEVEL_2M;
+#endif
+
+	/* Enable PSE if available */
+	if (cpu_has_pse)
+		set_in_cr4(X86_CR4_PSE);
+
+	/* Enable PGE if available */
+	if (cpu_has_pge) {
+		set_in_cr4(X86_CR4_PGE);
+		__supported_pte_mask |= _PAGE_GLOBAL;
+	}
+}
 void __init native_pagetable_reserve(u64 start, u64 end)
 {
 	memblock_reserve(start, end - start);
@@ -125,45 +151,15 @@ static int __meminit save_mr(struct map_
 unsigned long __init_refok init_memory_mapping(unsigned long start,
 					       unsigned long end)
 {
-	unsigned long page_size_mask = 0;
 	unsigned long start_pfn, end_pfn;
 	unsigned long ret = 0;
 	unsigned long pos;
-
 	struct map_range mr[NR_RANGE_MR];
 	int nr_range, i;
-	int use_pse, use_gbpages;
 
 	printk(KERN_INFO "init_memory_mapping: [mem %#010lx-%#010lx]\n",
 	       start, end - 1);
 
-#if defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_KMEMCHECK)
-	/*
-	 * For CONFIG_DEBUG_PAGEALLOC, identity mapping will use small pages.
-	 * This will simplify cpa(), which otherwise needs to support splitting
-	 * large pages into small in interrupt context, etc.
-	 */
-	use_pse = use_gbpages = 0;
-#else
-	use_pse = cpu_has_pse;
-	use_gbpages = direct_gbpages;
-#endif
-
-	/* Enable PSE if available */
-	if (cpu_has_pse)
-		set_in_cr4(X86_CR4_PSE);
-
-	/* Enable PGE if available */
-	if (cpu_has_pge) {
-		set_in_cr4(X86_CR4_PGE);
-		__supported_pte_mask |= _PAGE_GLOBAL;
-	}
-
-	if (use_gbpages)
-		page_size_mask |= 1 << PG_LEVEL_1G;
-	if (use_pse)
-		page_size_mask |= 1 << PG_LEVEL_2M;
-
 	memset(mr, 0, sizeof(mr));
 	nr_range = 0;
 
@@ -267,7 +263,7 @@ unsigned long __init_refok init_memory_m
 	 * nodes are discovered.
 	 */
 	if (!after_bootmem)
-		find_early_table_space(&mr[0], end, use_pse, use_gbpages);
+		find_early_table_space(&mr[0], end);
 
 	for (i = 0; i < nr_range; i++)
 		ret = kernel_physical_mapping_init(mr[i].start, mr[i].end,
Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -912,6 +912,7 @@ void __init setup_arch(char **cmdline_p)
 	setup_real_mode();
 
 	init_gbpages();
+	probe_page_size_mask();
 
 	/* max_pfn_mapped is updated here */
 	max_low_pfn_mapped = init_memory_mapping(0, max_low_pfn<<PAGE_SHIFT);
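
For readers following the mask plumbing above, here is a minimal userspace
sketch of how a page_size_mask of this shape is set and consumed. The
PG_LEVEL_* numbering mirrors the kernel's enum pg_level; the probe()/main()
harness is hypothetical, for illustration only.

	#include <stdio.h>

	/* Mirrors the kernel's enum pg_level: 4K = 1, 2M = 2, 1G = 3. */
	enum pg_level { PG_LEVEL_NONE, PG_LEVEL_4K, PG_LEVEL_2M, PG_LEVEL_1G };

	static unsigned page_size_mask;

	/* Stand-in for probe_page_size_mask(): fold CPU capabilities into
	 * the mask once, instead of re-checking on every mapping call. */
	static void probe(int has_pse, int has_gbpages)
	{
		if (has_gbpages)
			page_size_mask |= 1 << PG_LEVEL_1G;
		if (has_pse)
			page_size_mask |= 1 << PG_LEVEL_2M;
	}

	int main(void)
	{
		probe(1, 0);	/* e.g. PSE available, no gbpages */
		printf("2M: %s, 1G: %s\n",
		       page_size_mask & (1 << PG_LEVEL_2M) ? "yes" : "no",
		       page_size_mask & (1 << PG_LEVEL_1G) ? "yes" : "no");
		return 0;
	}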

[-- Attachment #3: mr_cal.patch --]
[-- Type: application/octet-stream, Size: 2290 bytes --]

Subject: [PATCH 2/2] x86, mm: Split out split_mem_range

from init_memory_mapping, to make init_memory_mapping more readable.

Suggested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/mm/init.c |   42 ++++++++++++++++++++++++++----------------
 1 file changed, 26 insertions(+), 16 deletions(-)

Index: linux-2.6/arch/x86/mm/init.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init.c
+++ linux-2.6/arch/x86/mm/init.c
@@ -143,25 +143,13 @@ static int __meminit save_mr(struct map_
 	return nr_range;
 }
 
-/*
- * Setup the direct mapping of the physical memory at PAGE_OFFSET.
- * This runs before bootmem is initialized and gets pages directly from
- * the physical memory. To access them they are temporarily mapped.
- */
-unsigned long __init_refok init_memory_mapping(unsigned long start,
-					       unsigned long end)
+static int __meminit split_mem_range(struct map_range *mr, int nr_range,
+				     unsigned long start,
+				     unsigned long end)
 {
 	unsigned long start_pfn, end_pfn;
-	unsigned long ret = 0;
 	unsigned long pos;
-	struct map_range mr[NR_RANGE_MR];
-	int nr_range, i;
-
-	printk(KERN_INFO "init_memory_mapping: [mem %#010lx-%#010lx]\n",
-	       start, end - 1);
-
-	memset(mr, 0, sizeof(mr));
-	nr_range = 0;
+	int i;
 
 	/* head if not big page alignment ? */
 	start_pfn = start >> PAGE_SHIFT;
@@ -255,6 +243,28 @@ unsigned long __init_refok init_memory_m
 			(mr[i].page_size_mask & (1<<PG_LEVEL_1G))?"1G":(
 			 (mr[i].page_size_mask & (1<<PG_LEVEL_2M))?"2M":"4k"));
 
+	return nr_range;
+}
+
+/*
+ * Setup the direct mapping of the physical memory at PAGE_OFFSET.
+ * This runs before bootmem is initialized and gets pages directly from
+ * the physical memory. To access them they are temporarily mapped.
+ */
+unsigned long __init_refok init_memory_mapping(unsigned long start,
+					       unsigned long end)
+{
+	struct map_range mr[NR_RANGE_MR];
+	unsigned long ret = 0;
+	int nr_range, i;
+
+	printk(KERN_INFO "init_memory_mapping: [mem %#010lx-%#010lx]\n",
+	       start, end - 1);
+
+	memset(mr, 0, sizeof(mr));
+	nr_range = 0;
+	nr_range = split_mem_range(mr, nr_range, start, end);
+
 	/*
 	 * Find space for the kernel direct mapping tables.
 	 *

^ permalink raw reply	[flat|nested] 29+ messages in thread
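
For intuition, a minimal userspace sketch of the head/body/tail split that
split_mem_range() performs: unaligned edges get 4K pages, the 2M-aligned
middle gets large pages. This is a simplification; the real function also
handles 1G ranges, 32-bit quirks, and the NR_RANGE_MR bookkeeping.

	#include <stdio.h>

	#define PAGE_SHIFT 12
	#define PMD_SHIFT  21	/* one 2M page = 512 4K pfns */

	static void split(unsigned long start_pfn, unsigned long end_pfn)
	{
		unsigned long big = 1UL << (PMD_SHIFT - PAGE_SHIFT);
		unsigned long head = (start_pfn + big - 1) & ~(big - 1);
		unsigned long tail = end_pfn & ~(big - 1);

		if (head >= tail) {	/* too small for any 2M page */
			printf("[%#lx-%#lx) 4k\n", start_pfn, end_pfn);
			return;
		}
		if (start_pfn < head)
			printf("[%#lx-%#lx) 4k\n", start_pfn, head);
		printf("[%#lx-%#lx) 2M\n", head, tail);
		if (tail < end_pfn)
			printf("[%#lx-%#lx) 4k\n", tail, end_pfn);
	}

	int main(void)
	{
		split(0x100, 0x80000);	/* pfns for 1MB .. 2GB */
		return 0;
	}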

* Re: [PATCH 1/5] x86: Move enabling of PSE and PGE out of init_memory_mapping
  2012-08-25  1:25   ` Yinghai Lu
  2012-08-25  1:49     ` Yinghai Lu
@ 2012-08-25  4:13     ` Jacob Shin
  1 sibling, 0 replies; 29+ messages in thread
From: Jacob Shin @ 2012-08-25  4:13 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: X86-ML, LKML, H. Peter Anvin, Tejun Heo, Dave Young, Chao Wang,
	Vivek Goyal, Andreas Herrmann, Borislav Petkov

On Fri, Aug 24, 2012 at 06:25:38PM -0700, Yinghai Lu wrote:
> On Fri, Aug 24, 2012 at 4:55 PM, Jacob Shin <jacob.shin@amd.com> wrote:
> > Depending on the platform, init_memory_mapping() may be called multiple
> > times. Move it out to setup_arch() to avoid writing to cr4 on every call.
> >
> > Signed-off-by: Jacob Shin <jacob.shin@amd.com>
> > ---
> >  arch/x86/kernel/setup.c |   10 ++++++++++
> >  arch/x86/mm/init.c      |   10 ----------
> >  2 files changed, 10 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > index f4b9b80..751e020 100644
> > --- a/arch/x86/kernel/setup.c
> > +++ b/arch/x86/kernel/setup.c
> > @@ -913,6 +913,16 @@ void __init setup_arch(char **cmdline_p)
> >
> >         init_gbpages();
> >
> > +       /* Enable PSE if available */
> > +       if (cpu_has_pse)
> > +               set_in_cr4(X86_CR4_PSE);
> > +
> > +       /* Enable PGE if available */
> > +       if (cpu_has_pge) {
> > +               set_in_cr4(X86_CR4_PGE);
> > +               __supported_pte_mask |= _PAGE_GLOBAL;
> > +       }
> > +
> 
> please don't put it directly in setup_arch().
> 
> and another function.

It actually gets moved out to another function in patch 3/5

> 
> Thanks
> 
> Yinghai
> 
> >         /* max_pfn_mapped is updated here */
> >         max_low_pfn_mapped = init_memory_mapping(0, max_low_pfn<<PAGE_SHIFT);
> >         max_pfn_mapped = max_low_pfn_mapped;
> > diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> > index e0e6990..2f07e09 100644
> > --- a/arch/x86/mm/init.c
> > +++ b/arch/x86/mm/init.c
> > @@ -149,16 +149,6 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
> >         use_gbpages = direct_gbpages;
> >  #endif
> >
> > -       /* Enable PSE if available */
> > -       if (cpu_has_pse)
> > -               set_in_cr4(X86_CR4_PSE);
> > -
> > -       /* Enable PGE if available */
> > -       if (cpu_has_pge) {
> > -               set_in_cr4(X86_CR4_PGE);
> > -               __supported_pte_mask |= _PAGE_GLOBAL;
> > -       }
> > -
> >         if (use_gbpages)
> >                 page_size_mask |= 1 << PG_LEVEL_1G;
> >         if (use_pse)
> > --
> > 1.7.9.5
> >
> >
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/5] x86: Move enabling of PSE and PGE out of init_memory_mapping
  2012-08-25  2:06       ` Yinghai Lu
@ 2012-08-25  4:15         ` Jacob Shin
  0 siblings, 0 replies; 29+ messages in thread
From: Jacob Shin @ 2012-08-25  4:15 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Ingo Molnar, X86-ML, LKML, Tejun Heo, Dave Young,
	Chao Wang, Vivek Goyal, Andreas Herrmann, Borislav Petkov

On Fri, Aug 24, 2012 at 07:06:42PM -0700, Yinghai Lu wrote:
> On Fri, Aug 24, 2012 at 6:49 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> > On Fri, Aug 24, 2012 at 6:25 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> >> On Fri, Aug 24, 2012 at 4:55 PM, Jacob Shin <jacob.shin@amd.com> wrote:
> >>> Depending on the platform, init_memory_mapping() may be called multiple
> >>> times. Move it out to setup_arch() to avoid writing to cr4 on every call.
> >>>
> >>> Signed-off-by: Jacob Shin <jacob.shin@amd.com>
> >>> ---
> >>>  arch/x86/kernel/setup.c |   10 ++++++++++
> >>>  arch/x86/mm/init.c      |   10 ----------
> >>>  2 files changed, 10 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> >>> index f4b9b80..751e020 100644
> >>> --- a/arch/x86/kernel/setup.c
> >>> +++ b/arch/x86/kernel/setup.c
> >>> @@ -913,6 +913,16 @@ void __init setup_arch(char **cmdline_p)
> >>>
> >>>         init_gbpages();
> >>>
> >>> +       /* Enable PSE if available */
> >>> +       if (cpu_has_pse)
> >>> +               set_in_cr4(X86_CR4_PSE);
> >>> +
> >>> +       /* Enable PGE if available */
> >>> +       if (cpu_has_pge) {
> >>> +               set_in_cr4(X86_CR4_PGE);
> >>> +               __supported_pte_mask |= _PAGE_GLOBAL;
> >>> +       }
> >>> +
> >>
> >> please don't put it directly in setup_arch().
> >>
> >> and another function.
> >>
> >
> > Jacob, hpa
> >
> > can you use attached one to replace the first patch?
> 
> Please use attached two instead.

Hmm .. okay I'll test with these two patches applied on Monday ..

> 
> Thanks
> 
> Yinghai





^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/5] x86: Only direct map addresses that are marked as E820_RAM
  2012-08-25  1:13       ` H. Peter Anvin
@ 2012-08-25  4:20         ` Jacob Shin
  2012-08-25  4:21           ` H. Peter Anvin
  0 siblings, 1 reply; 29+ messages in thread
From: Jacob Shin @ 2012-08-25  4:20 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: X86-ML, LKML, Yinghai Lu, Tejun Heo, Dave Young, Chao Wang,
	Vivek Goyal, Andreas Herrmann, Borislav Petkov

On Fri, Aug 24, 2012 at 06:13:02PM -0700, H. Peter Anvin wrote:
> On 08/24/2012 05:49 PM, Jacob Shin wrote:
> > 
> > Right, I think what I was attempting to do was to merge the 1MB
> > with E820_RAM right above 1MB:
> > 
> > So instead of:
> > 
> > init_memory_mapping(0, 1MB)
> > init_memory_mapping(1MB, 2GB)
> > 
> > It would be:
> > 
> > init_memory_mapping(0, 2GB)
> > 
> > While taking care of the odd case where there is a gap right after
> > 1MB.
> > 
> > But if its not worth it, I can move it out of the loop.
> > 
> 
> What is the benefit?

So that in the case where we have E820_RAM right above 1MB, we don't
call init_memory_mapping twice, first on 0 ~ 1MB and then 1MB ~ something

we only call it once. 0 ~ something.

I'll get it out of the loop if you don't think it's a good idea.

> 
> 	-hpa
> 
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/5] x86: Only direct map addresses that are marked as E820_RAM
  2012-08-25  4:20         ` Jacob Shin
@ 2012-08-25  4:21           ` H. Peter Anvin
  2012-08-27 19:17             ` Jacob Shin
  0 siblings, 1 reply; 29+ messages in thread
From: H. Peter Anvin @ 2012-08-25  4:21 UTC (permalink / raw)
  To: Jacob Shin
  Cc: X86-ML, LKML, Yinghai Lu, Tejun Heo, Dave Young, Chao Wang,
	Vivek Goyal, Andreas Herrmann, Borislav Petkov

On 08/24/2012 09:20 PM, Jacob Shin wrote:
>>
>> What is the benefit?
>
> So that in the case where we have E820_RAM right above 1MB, we don't
> call init_memory_mapping twice, first on 0 ~ 1MB and then 1MB ~ something
>
> we only call it once. 0 ~ something.
>

So what is the benefit?

	-hpa


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/5] x86: Only direct map addresses that are marked as E820_RAM
  2012-08-25  1:07   ` Yinghai Lu
@ 2012-08-25  4:24     ` Jacob Shin
  2012-08-25  4:54       ` Yinghai Lu
  0 siblings, 1 reply; 29+ messages in thread
From: Jacob Shin @ 2012-08-25  4:24 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: X86-ML, LKML, H. Peter Anvin, Tejun Heo, Dave Young, Chao Wang,
	Vivek Goyal, Andreas Herrmann, Borislav Petkov

On Fri, Aug 24, 2012 at 06:07:01PM -0700, Yinghai Lu wrote:
> On Fri, Aug 24, 2012 at 4:55 PM, Jacob Shin <jacob.shin@amd.com> wrote:
> > Currently direct mappings are created for [ 0 to max_low_pfn<<PAGE_SHIFT )
> > and [ 4GB to max_pfn<<PAGE_SHIFT ), which may include regions that are not
> > backed by actual DRAM. This is fine for holes under 4GB which are covered
> > by fixed and variable range MTRRs to be UC. However, we run into trouble
> > on higher memory addresses which cannot be covered by MTRRs.
> >
> > Our system with 1TB of RAM has an e820 that looks like this:
> >
> >  BIOS-e820: [mem 0x0000000000000000-0x00000000000983ff] usable
> >  BIOS-e820: [mem 0x0000000000098400-0x000000000009ffff] reserved
> >  BIOS-e820: [mem 0x00000000000d0000-0x00000000000fffff] reserved
> >  BIOS-e820: [mem 0x0000000000100000-0x00000000c7ebffff] usable
> >  BIOS-e820: [mem 0x00000000c7ec0000-0x00000000c7ed7fff] ACPI data
> >  BIOS-e820: [mem 0x00000000c7ed8000-0x00000000c7ed9fff] ACPI NVS
> >  BIOS-e820: [mem 0x00000000c7eda000-0x00000000c7ffffff] reserved
> >  BIOS-e820: [mem 0x00000000fec00000-0x00000000fec0ffff] reserved
> >  BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
> >  BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved
> >  BIOS-e820: [mem 0x0000000100000000-0x000000e037ffffff] usable
> >  BIOS-e820: [mem 0x000000e038000000-0x000000fcffffffff] reserved
> >  BIOS-e820: [mem 0x0000010000000000-0x0000011ffeffffff] usable
> >
> > and so direct mappings are created for huge memory hole between
> > 0x000000e038000000 to 0x0000010000000000. Even though the kernel never
> > generates memory accesses in that region, since the page tables mark
> > them incorrectly as being WB, our (AMD) processor ends up causing a MCE
> > while doing some memory bookkeeping/optimizations around that area.
> >
> > This patch iterates through e820 and only direct maps ranges that are
> > marked as E820_RAM, and keeps track of those pfn ranges. Depending on
> > the alignment of E820 ranges, this may possibly result in using smaller
> > size (i.e. 4K instead of 2M or 1G) page tables.
> >
> > Signed-off-by: Jacob Shin <jacob.shin@amd.com>
> > ---
> >  arch/x86/include/asm/page_types.h |    9 +++
> >  arch/x86/kernel/setup.c           |  125 +++++++++++++++++++++++++++++--------
> >  arch/x86/mm/init.c                |    2 +
> >  arch/x86/mm/init_64.c             |    6 +-
> >  4 files changed, 112 insertions(+), 30 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
> > index e21fdd1..409047a 100644
> > --- a/arch/x86/include/asm/page_types.h
> > +++ b/arch/x86/include/asm/page_types.h
> > @@ -3,6 +3,7 @@
> >
> >  #include <linux/const.h>
> >  #include <linux/types.h>
> > +#include <asm/e820.h>
> >
> >  /* PAGE_SHIFT determines the page size */
> >  #define PAGE_SHIFT     12
> > @@ -40,12 +41,20 @@
> >  #endif /* CONFIG_X86_64 */
> >
> >  #ifndef __ASSEMBLY__
> > +#include <linux/range.h>
> >
> >  extern int devmem_is_allowed(unsigned long pagenr);
> >
> >  extern unsigned long max_low_pfn_mapped;
> >  extern unsigned long max_pfn_mapped;
> >
> > +extern struct range pfn_mapped[E820_X_MAX];
> > +extern int nr_pfn_mapped;
> > +
> > +extern void add_pfn_range_mapped(unsigned long start_pfn, unsigned long end_pfn);
> > +extern bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn);
> > +extern bool pfn_is_mapped(unsigned long pfn);
> > +
> >  static inline phys_addr_t get_max_mapped(void)
> >  {
> >         return (phys_addr_t)max_pfn_mapped << PAGE_SHIFT;
> > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > index 751e020..4217fb4 100644
> > --- a/arch/x86/kernel/setup.c
> > +++ b/arch/x86/kernel/setup.c
> > @@ -115,13 +115,46 @@
> >  #include <asm/prom.h>
> >
> >  /*
> > - * end_pfn only includes RAM, while max_pfn_mapped includes all e820 entries.
> > - * The direct mapping extends to max_pfn_mapped, so that we can directly access
> > - * apertures, ACPI and other tables without having to play with fixmaps.
> > + * max_low_pfn_mapped: highest direct mapped pfn under 4GB
> > + * max_pfn_mapped:     highest direct mapped pfn over 4GB
> > + *
> > + * The direct mapping only covers E820_RAM regions, so the ranges and gaps are
> > + * represented by pfn_mapped
> >   */
> >  unsigned long max_low_pfn_mapped;
> >  unsigned long max_pfn_mapped;
> >
> > +struct range pfn_mapped[E820_X_MAX];
> > +int nr_pfn_mapped;
> > +
> > +void add_pfn_range_mapped(unsigned long start_pfn, unsigned long end_pfn)
> > +{
> > +       nr_pfn_mapped = add_range_with_merge(pfn_mapped, E820_X_MAX,
> > +                                            nr_pfn_mapped, start_pfn, end_pfn);
> > +
> > +       max_pfn_mapped = max(max_pfn_mapped, end_pfn);
> > +
> > +       if (end_pfn <= (1UL << (32 - PAGE_SHIFT)))
> > +               max_low_pfn_mapped = max(max_low_pfn_mapped, end_pfn);
> > +}
> > +
> > +bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn)
> > +{
> > +       int i;
> > +
> > +       for (i = 0; i < nr_pfn_mapped; i++)
> > +               if ((start_pfn >= pfn_mapped[i].start) &&
> > +                   (end_pfn <= pfn_mapped[i].end))
> > +                       return true;
> > +
> > +       return false;
> > +}
> > +
> > +bool pfn_is_mapped(unsigned long pfn)
> > +{
> > +       return pfn_range_is_mapped(pfn, pfn + 1);
> > +}
> > +
> 
> looks like you could avoid adding the pfn_mapped[] array.
> 
> pfn_range_is_mapped() should be
> check max_low_pfn_mapped, max_pfn_mapped with
> e820_all_mapped(start, end, E820_RAM).

Hmm .. I guess that could work .. but what about EFI code that keys off of
EFI memory map? Does the EFI code update e820 and mark as E820_RAM whatever
ranges that it calls init_memory_mapping on (via efi_ioremap?)

I'll try your suggestion on Monday.

> 
> Thanks
> 
> Yinghai
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread
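
For concreteness, here is a rough sketch of the array-free alternative
Yinghai suggests above. The watermark handling is a guess at the intent
(it is exactly what the rest of the thread goes on to question), and
e820_all_mapped() is the existing helper from arch/x86/kernel/e820.c.

	/* Sketch only, not merged code: treat a pfn range as mapped iff
	 * it is all E820_RAM and below the relevant watermark. */
	bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn)
	{
		unsigned long limit = start_pfn < (1UL << (32 - PAGE_SHIFT)) ?
				      max_low_pfn_mapped : max_pfn_mapped;

		return end_pfn <= limit &&
		       e820_all_mapped((u64)start_pfn << PAGE_SHIFT,
				       (u64)end_pfn << PAGE_SHIFT, E820_RAM);
	}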

* Re: [PATCH 5/5] x86: if kernel .text .data .bss are not marked as E820_RAM, complain and fix
  2012-08-25  1:23   ` Yinghai Lu
@ 2012-08-25  4:25     ` Jacob Shin
  0 siblings, 0 replies; 29+ messages in thread
From: Jacob Shin @ 2012-08-25  4:25 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: X86-ML, LKML, H. Peter Anvin, Tejun Heo, Dave Young, Chao Wang,
	Vivek Goyal, Andreas Herrmann, Borislav Petkov

On Fri, Aug 24, 2012 at 06:23:48PM -0700, Yinghai Lu wrote:
> On Fri, Aug 24, 2012 at 4:55 PM, Jacob Shin <jacob.shin@amd.com> wrote:
> > There could be cases where user supplied memmap=exactmap memory
> > mappings do not mark the region where the kernel .text .data and
> > .bss reside as E820_RAM as reported here:
> >
> > https://lkml.org/lkml/2012/8/14/86
> >
> > Handle it by complaining, and adding the range back into the e820.
> >
> > Signed-off-by: Jacob Shin <jacob.shin@amd.com>
> > ---
> >  arch/x86/kernel/setup.c |   15 +++++++++++++++
> >  1 file changed, 15 insertions(+)
> >
> > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > index 4217fb4..b84aceb5 100644
> > --- a/arch/x86/kernel/setup.c
> > +++ b/arch/x86/kernel/setup.c
> > @@ -926,6 +926,21 @@ void __init setup_arch(char **cmdline_p)
> >         insert_resource(&iomem_resource, &data_resource);
> >         insert_resource(&iomem_resource, &bss_resource);
> >
> > +       /*
> > +        * Complain if .text .data and .bss are not marked as E820_RAM and
> > +        * attempt to fix it by adding the range. We may have a confused BIOS,
> > +        * or the user may have incorrectly supplied it via memmap=exactmap. If
> > +        * we really are running on top of non-RAM, we will crash later anyway.
> > +        */
> > +       if (!e820_all_mapped(code_resource.start, bss_resource.end, E820_RAM)) {
> > +               pr_warn(".text .data .bss are not marked as E820_RAM!\n");
> > +
> > +               e820_add_region(code_resource.start,
> > +                               bss_resource.end - code_resource.start + 1,
> > +                               E820_RAM);
> > +               sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map);
> 
>            this sanitize_e820_map could be spared. trim_bios_range will
> always do that.

Ah. okay

> 
> > +       }
> > +
> >         trim_bios_range();
> >  #ifdef CONFIG_X86_32
> >         if (ppro_with_ram_bug()) {
> 
> also should use brk_limit instead of bss_resource.end, i.e. we need to
> keep the map for the brk area.

Okay.. will fix on Monday

> 
> Thanks
> 
> Yinghai
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/5] x86: Only direct map addresses that are marked as E820_RAM
  2012-08-25  4:24     ` Jacob Shin
@ 2012-08-25  4:54       ` Yinghai Lu
  2012-08-27 18:49         ` Jacob Shin
  2012-08-28 16:06         ` Jacob Shin
  0 siblings, 2 replies; 29+ messages in thread
From: Yinghai Lu @ 2012-08-25  4:54 UTC (permalink / raw)
  To: Jacob Shin
  Cc: X86-ML, LKML, H. Peter Anvin, Tejun Heo, Dave Young, Chao Wang,
	Vivek Goyal, Andreas Herrmann, Borislav Petkov

On Fri, Aug 24, 2012 at 9:24 PM, Jacob Shin <jacob.shin@amd.com> wrote:
> On Fri, Aug 24, 2012 at 06:07:01PM -0700, Yinghai Lu wrote:
>> On Fri, Aug 24, 2012 at 4:55 PM, Jacob Shin <jacob.shin@amd.com> wrote:
>> > Currently direct mappings are created for [ 0 to max_low_pfn<<PAGE_SHIFT )
>> > and [ 4GB to max_pfn<<PAGE_SHIFT ), which may include regions that are not
>> > backed by actual DRAM. This is fine for holes under 4GB which are covered
>> > by fixed and variable range MTRRs to be UC. However, we run into trouble
>> > on higher memory addresses which cannot be covered by MTRRs.
>> >
>> > Our system with 1TB of RAM has an e820 that looks like this:
>> >
>> >  BIOS-e820: [mem 0x0000000000000000-0x00000000000983ff] usable
>> >  BIOS-e820: [mem 0x0000000000098400-0x000000000009ffff] reserved
>> >  BIOS-e820: [mem 0x00000000000d0000-0x00000000000fffff] reserved
>> >  BIOS-e820: [mem 0x0000000000100000-0x00000000c7ebffff] usable
>> >  BIOS-e820: [mem 0x00000000c7ec0000-0x00000000c7ed7fff] ACPI data
>> >  BIOS-e820: [mem 0x00000000c7ed8000-0x00000000c7ed9fff] ACPI NVS
>> >  BIOS-e820: [mem 0x00000000c7eda000-0x00000000c7ffffff] reserved
>> >  BIOS-e820: [mem 0x00000000fec00000-0x00000000fec0ffff] reserved
>> >  BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
>> >  BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved
>> >  BIOS-e820: [mem 0x0000000100000000-0x000000e037ffffff] usable
>> >  BIOS-e820: [mem 0x000000e038000000-0x000000fcffffffff] reserved
>> >  BIOS-e820: [mem 0x0000010000000000-0x0000011ffeffffff] usable
>> >
>> > and so direct mappings are created for huge memory hole between
>> > 0x000000e038000000 to 0x0000010000000000. Even though the kernel never
>> > generates memory accesses in that region, since the page tables mark
>> > them incorrectly as being WB, our (AMD) processor ends up causing a MCE
>> > while doing some memory bookkeeping/optimizations around that area.
>> >
>> > This patch iterates through e820 and only direct maps ranges that are
>> > marked as E820_RAM, and keeps track of those pfn ranges. Depending on
>> > the alignment of E820 ranges, this may possibly result in using smaller
>> > size (i.e. 4K instead of 2M or 1G) page tables.
>> >
>> > Signed-off-by: Jacob Shin <jacob.shin@amd.com>
>> > ---
>> >  arch/x86/include/asm/page_types.h |    9 +++
>> >  arch/x86/kernel/setup.c           |  125 +++++++++++++++++++++++++++++--------
>> >  arch/x86/mm/init.c                |    2 +
>> >  arch/x86/mm/init_64.c             |    6 +-
>> >  4 files changed, 112 insertions(+), 30 deletions(-)
>> >
>> > diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
>> > index e21fdd1..409047a 100644
>> > --- a/arch/x86/include/asm/page_types.h
>> > +++ b/arch/x86/include/asm/page_types.h
>> > @@ -3,6 +3,7 @@
>> >
>> >  #include <linux/const.h>
>> >  #include <linux/types.h>
>> > +#include <asm/e820.h>
>> >
>> >  /* PAGE_SHIFT determines the page size */
>> >  #define PAGE_SHIFT     12
>> > @@ -40,12 +41,20 @@
>> >  #endif /* CONFIG_X86_64 */
>> >
>> >  #ifndef __ASSEMBLY__
>> > +#include <linux/range.h>
>> >
>> >  extern int devmem_is_allowed(unsigned long pagenr);
>> >
>> >  extern unsigned long max_low_pfn_mapped;
>> >  extern unsigned long max_pfn_mapped;
>> >
>> > +extern struct range pfn_mapped[E820_X_MAX];
>> > +extern int nr_pfn_mapped;
>> > +
>> > +extern void add_pfn_range_mapped(unsigned long start_pfn, unsigned long end_pfn);
>> > +extern bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn);
>> > +extern bool pfn_is_mapped(unsigned long pfn);
>> > +
>> >  static inline phys_addr_t get_max_mapped(void)
>> >  {
>> >         return (phys_addr_t)max_pfn_mapped << PAGE_SHIFT;
>> > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>> > index 751e020..4217fb4 100644
>> > --- a/arch/x86/kernel/setup.c
>> > +++ b/arch/x86/kernel/setup.c
>> > @@ -115,13 +115,46 @@
>> >  #include <asm/prom.h>
>> >
>> >  /*
>> > - * end_pfn only includes RAM, while max_pfn_mapped includes all e820 entries.
>> > - * The direct mapping extends to max_pfn_mapped, so that we can directly access
>> > - * apertures, ACPI and other tables without having to play with fixmaps.
>> > + * max_low_pfn_mapped: highest direct mapped pfn under 4GB
>> > + * max_pfn_mapped:     highest direct mapped pfn over 4GB
>> > + *
>> > + * The direct mapping only covers E820_RAM regions, so the ranges and gaps are
>> > + * represented by pfn_mapped
>> >   */
>> >  unsigned long max_low_pfn_mapped;
>> >  unsigned long max_pfn_mapped;
>> >
>> > +struct range pfn_mapped[E820_X_MAX];
>> > +int nr_pfn_mapped;
>> > +
>> > +void add_pfn_range_mapped(unsigned long start_pfn, unsigned long end_pfn)
>> > +{
>> > +       nr_pfn_mapped = add_range_with_merge(pfn_mapped, E820_X_MAX,
>> > +                                            nr_pfn_mapped, start_pfn, end_pfn);
>> > +
>> > +       max_pfn_mapped = max(max_pfn_mapped, end_pfn);
>> > +
>> > +       if (end_pfn <= (1UL << (32 - PAGE_SHIFT)))
>> > +               max_low_pfn_mapped = max(max_low_pfn_mapped, end_pfn);
>> > +}
>> > +
>> > +bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn)
>> > +{
>> > +       int i;
>> > +
>> > +       for (i = 0; i < nr_pfn_mapped; i++)
>> > +               if ((start_pfn >= pfn_mapped[i].start) &&
>> > +                   (end_pfn <= pfn_mapped[i].end))
>> > +                       return true;
>> > +
>> > +       return false;
>> > +}
>> > +
>> > +bool pfn_is_mapped(unsigned long pfn)
>> > +{
>> > +       return pfn_range_is_mapped(pfn, pfn + 1);
>> > +}
>> > +
>>
>> looks like you could avoid adding the pfn_mapped[] array.
>>
>> pfn_range_is_mapped() should be
>> check max_low_pfn_mapped, max_pfn_mapped with
>> e820_all_mapped(start, end, E820_RAM).
>
> Hmm .. I guess that could work .. but what about EFI code that keys off of
> EFI memory map? Does the EFI code update e820 and mark as E820_RAM whatever
> ranges that it calls init_memory_mapping on (via efi_ioremap?)

they are converted to e820 memmap before init_memory_mapping is called.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/5] x86: Only direct map addresses that are marked as E820_RAM
  2012-08-25  4:54       ` Yinghai Lu
@ 2012-08-27 18:49         ` Jacob Shin
  2012-08-27 20:16           ` H. Peter Anvin
  2012-08-28 16:06         ` Jacob Shin
  1 sibling, 1 reply; 29+ messages in thread
From: Jacob Shin @ 2012-08-27 18:49 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: X86-ML, LKML, H. Peter Anvin, Tejun Heo, Dave Young, Chao Wang,
	Vivek Goyal, Andreas Herrmann, Borislav Petkov

On Fri, Aug 24, 2012 at 09:54:04PM -0700, Yinghai Lu wrote:
> On Fri, Aug 24, 2012 at 9:24 PM, Jacob Shin <jacob.shin@amd.com> wrote:
> > On Fri, Aug 24, 2012 at 06:07:01PM -0700, Yinghai Lu wrote:
> >> On Fri, Aug 24, 2012 at 4:55 PM, Jacob Shin <jacob.shin@amd.com> wrote:
> >> > Currently direct mappings are created for [ 0 to max_low_pfn<<PAGE_SHIFT )
> >> > and [ 4GB to max_pfn<<PAGE_SHIFT ), which may include regions that are not
> >> > backed by actual DRAM. This is fine for holes under 4GB which are covered
> >> > by fixed and variable range MTRRs to be UC. However, we run into trouble
> >> > on higher memory addresses which cannot be covered by MTRRs.
> >> >
> >> > Our system with 1TB of RAM has an e820 that looks like this:
> >> >
> >> >  BIOS-e820: [mem 0x0000000000000000-0x00000000000983ff] usable
> >> >  BIOS-e820: [mem 0x0000000000098400-0x000000000009ffff] reserved
> >> >  BIOS-e820: [mem 0x00000000000d0000-0x00000000000fffff] reserved
> >> >  BIOS-e820: [mem 0x0000000000100000-0x00000000c7ebffff] usable
> >> >  BIOS-e820: [mem 0x00000000c7ec0000-0x00000000c7ed7fff] ACPI data
> >> >  BIOS-e820: [mem 0x00000000c7ed8000-0x00000000c7ed9fff] ACPI NVS
> >> >  BIOS-e820: [mem 0x00000000c7eda000-0x00000000c7ffffff] reserved
> >> >  BIOS-e820: [mem 0x00000000fec00000-0x00000000fec0ffff] reserved
> >> >  BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
> >> >  BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved
> >> >  BIOS-e820: [mem 0x0000000100000000-0x000000e037ffffff] usable
> >> >  BIOS-e820: [mem 0x000000e038000000-0x000000fcffffffff] reserved
> >> >  BIOS-e820: [mem 0x0000010000000000-0x0000011ffeffffff] usable
> >> >
> >> > and so direct mappings are created for huge memory hole between
> >> > 0x000000e038000000 to 0x0000010000000000. Even though the kernel never
> >> > generates memory accesses in that region, since the page tables mark
> >> > them incorrectly as being WB, our (AMD) processor ends up causing a MCE
> >> > while doing some memory bookkeeping/optimizations around that area.
> >> >
> >> > This patch iterates through e820 and only direct maps ranges that are
> >> > marked as E820_RAM, and keeps track of those pfn ranges. Depending on
> >> > the alignment of E820 ranges, this may possibly result in using smaller
> >> > size (i.e. 4K instead of 2M or 1G) page tables.
> >> >
> >> > Signed-off-by: Jacob Shin <jacob.shin@amd.com>
> >> > ---
> >> >  arch/x86/include/asm/page_types.h |    9 +++
> >> >  arch/x86/kernel/setup.c           |  125 +++++++++++++++++++++++++++++--------
> >> >  arch/x86/mm/init.c                |    2 +
> >> >  arch/x86/mm/init_64.c             |    6 +-
> >> >  4 files changed, 112 insertions(+), 30 deletions(-)
> >> >
> >> > diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
> >> > index e21fdd1..409047a 100644
> >> > --- a/arch/x86/include/asm/page_types.h
> >> > +++ b/arch/x86/include/asm/page_types.h
> >> > @@ -3,6 +3,7 @@
> >> >
> >> >  #include <linux/const.h>
> >> >  #include <linux/types.h>
> >> > +#include <asm/e820.h>
> >> >
> >> >  /* PAGE_SHIFT determines the page size */
> >> >  #define PAGE_SHIFT     12
> >> > @@ -40,12 +41,20 @@
> >> >  #endif /* CONFIG_X86_64 */
> >> >
> >> >  #ifndef __ASSEMBLY__
> >> > +#include <linux/range.h>
> >> >
> >> >  extern int devmem_is_allowed(unsigned long pagenr);
> >> >
> >> >  extern unsigned long max_low_pfn_mapped;
> >> >  extern unsigned long max_pfn_mapped;
> >> >
> >> > +extern struct range pfn_mapped[E820_X_MAX];
> >> > +extern int nr_pfn_mapped;
> >> > +
> >> > +extern void add_pfn_range_mapped(unsigned long start_pfn, unsigned long end_pfn);
> >> > +extern bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn);
> >> > +extern bool pfn_is_mapped(unsigned long pfn);
> >> > +
> >> >  static inline phys_addr_t get_max_mapped(void)
> >> >  {
> >> >         return (phys_addr_t)max_pfn_mapped << PAGE_SHIFT;
> >> > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> >> > index 751e020..4217fb4 100644
> >> > --- a/arch/x86/kernel/setup.c
> >> > +++ b/arch/x86/kernel/setup.c
> >> > @@ -115,13 +115,46 @@
> >> >  #include <asm/prom.h>
> >> >
> >> >  /*
> >> > - * end_pfn only includes RAM, while max_pfn_mapped includes all e820 entries.
> >> > - * The direct mapping extends to max_pfn_mapped, so that we can directly access
> >> > - * apertures, ACPI and other tables without having to play with fixmaps.
> >> > + * max_low_pfn_mapped: highest direct mapped pfn under 4GB
> >> > + * max_pfn_mapped:     highest direct mapped pfn over 4GB
> >> > + *
> >> > + * The direct mapping only covers E820_RAM regions, so the ranges and gaps are
> >> > + * represented by pfn_mapped
> >> >   */
> >> >  unsigned long max_low_pfn_mapped;
> >> >  unsigned long max_pfn_mapped;
> >> >
> >> > +struct range pfn_mapped[E820_X_MAX];
> >> > +int nr_pfn_mapped;
> >> > +
> >> > +void add_pfn_range_mapped(unsigned long start_pfn, unsigned long end_pfn)
> >> > +{
> >> > +       nr_pfn_mapped = add_range_with_merge(pfn_mapped, E820_X_MAX,
> >> > +                                            nr_pfn_mapped, start_pfn, end_pfn);
> >> > +
> >> > +       max_pfn_mapped = max(max_pfn_mapped, end_pfn);
> >> > +
> >> > +       if (end_pfn <= (1UL << (32 - PAGE_SHIFT)))
> >> > +               max_low_pfn_mapped = max(max_low_pfn_mapped, end_pfn);
> >> > +}
> >> > +
> >> > +bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn)
> >> > +{
> >> > +       int i;
> >> > +
> >> > +       for (i = 0; i < nr_pfn_mapped; i++)
> >> > +               if ((start_pfn >= pfn_mapped[i].start) &&
> >> > +                   (end_pfn <= pfn_mapped[i].end))
> >> > +                       return true;
> >> > +
> >> > +       return false;
> >> > +}
> >> > +
> >> > +bool pfn_is_mapped(unsigned long pfn)
> >> > +{
> >> > +       return pfn_range_is_mapped(pfn, pfn + 1);
> >> > +}
> >> > +
> >>
> >> looks like you could avoid adding the pfn_mapped[] array.
> >>
> >> pfn_range_is_mapped() should be
> >> check max_low_pfn_mapped, max_pfn_mapped with
> >> e820_all_mapped(start, end, E820_RAM).
> >
> > Hmm .. I guess that could work .. but what about EFI code that keys off of
> > EFI memory map? Does the EFI code update e820 and mark as E820_RAM whatever
> > ranges that it calls init_memory_mapping on (via efi_ioremap?)
> 
> they are converted to e820 memmap before init_memory_mapping is called.

Yinghai, another question: what about hotplug? Are we guaranteed that
we will always be adding memory above max_pfn_mapped? And will hotplug
also update e820 to mark the range as E820_RAM?

Thanks!

-Jacob

> 
> Thanks
> 
> Yinghai
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/5] x86: Only direct map addresses that are marked as E820_RAM
  2012-08-25  4:21           ` H. Peter Anvin
@ 2012-08-27 19:17             ` Jacob Shin
  2012-08-27 20:15               ` H. Peter Anvin
  0 siblings, 1 reply; 29+ messages in thread
From: Jacob Shin @ 2012-08-27 19:17 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: X86-ML, LKML, Yinghai Lu, Tejun Heo, Dave Young, Chao Wang,
	Vivek Goyal, Andreas Herrmann, Borislav Petkov

On Fri, Aug 24, 2012 at 09:21:18PM -0700, H. Peter Anvin wrote:
> On 08/24/2012 09:20 PM, Jacob Shin wrote:
> >>
> >>What is the benefit?
> >
> >So that in the case where we have E820_RAM right above 1MB, we don't
> >call init_memory_mapping twice, first on 0 ~ 1MB and then 1MB ~ something
> >
> >we only call it once. 0 ~ something.
> >
> 
> So what is the benefit?

if there is E820_RAM right above the ISA region, then we get to initialize
0 ~ max_low_pfn in one big chunk, which on some memory configurations
results in more 2M or 1G mappings, and so less space used for page tables.

I'm also worried about the case where that first call to init_memory_mapping
for 0 ~ 1MB results in max_pfn_mapped = 1MB, and the next call to
init_memory_mapping covers a large enough area that we don't have enough
space under 1MB for all the page tables needed (maybe only 4K page tables
are supported or something).

-Jacob

> 
> 	-hpa
> 
> 
> -- 
> H. Peter Anvin, Intel Open Source Technology Center
> I work for Intel.  I don't speak on their behalf.
> 
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread
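
To put rough numbers on the page-table-space argument: back of the
envelope, for x86_64 with 8-byte entries, ignoring the PUD/PGD levels.
The figures below are illustrative arithmetic, not measurements.

	#include <stdio.h>

	int main(void)
	{
		unsigned long long size = 2ULL << 30;	/* map 2GB */

		/* 4K pages: one 8-byte PTE per 4K page. */
		unsigned long long pte_kb = size / 4096 * 8 >> 10;
		/* 2M pages: one 8-byte PMD entry per 2M page, no PTE level. */
		unsigned long long pmd_kb = size / (2ULL << 20) * 8 >> 10;

		printf("4K mappings: %lluKB of tables\n", pte_kb);  /* 4096KB */
		printf("2M mappings: %lluKB of tables\n", pmd_kb);  /* 8KB */
		return 0;
	}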

* Re: [PATCH 3/5] x86: Only direct map addresses that are marked as E820_RAM
  2012-08-27 19:17             ` Jacob Shin
@ 2012-08-27 20:15               ` H. Peter Anvin
  0 siblings, 0 replies; 29+ messages in thread
From: H. Peter Anvin @ 2012-08-27 20:15 UTC (permalink / raw)
  To: Jacob Shin
  Cc: X86-ML, LKML, Yinghai Lu, Tejun Heo, Dave Young, Chao Wang,
	Vivek Goyal, Andreas Herrmann, Borislav Petkov

On 08/27/2012 12:17 PM, Jacob Shin wrote:
> 
> if there is E820_RAM right above the ISA region, then we get to initialize
> 0 ~ max_low_pfn in one big chunk, which on some memory configurations
> results in more 2M or 1G mappings, and so less space used for page tables.
> 

We need to be able to coalesce small page tables to large, anyway; there
are plenty of machines in the field that do small chunks.  I'm not too
worried about the legacy region being in 4K pages; it will be broken
into 4K pages anyway by the TLB.

Another thing is that we may want to map from the top down (on i386 at
least top of lowmem down); we don't want to fill low memory with page
tables because of devices with restricted DMA masks.

> I'm also worried about the case where that first call to init_memory_mapping
> for 0 ~ 1MB results in max_pfn_mapped = 1MB, and the next call to
> init_memory_mapping covers a large enough area that we don't have enough
> space under 1MB for all the page tables needed (maybe only 4K page tables
> are supported or something).

This is serious... I'm worried that this might be a more general
problem.  In that case we probably need to handle the case where we have
filled up all the "free" memory with page tables for the next chunk;
however, in that case the answer is pretty simple: we can then allow the
memory already mapped to become page tables for the new chunk.

	-hpa

^ permalink raw reply	[flat|nested] 29+ messages in thread
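
The top-down ordering hpa mentions can be pictured with a schematic walk
over the e820 map, highest range first, so page tables for each chunk can
be taken from already-mapped memory above it. Purely illustrative; this is
not the loop that was merged, and it assumes a sanitized (sorted) e820 map.

	/* Schematic: map E820_RAM ranges from highest to lowest, keeping
	 * low memory free for devices with restricted DMA masks. */
	static void __init map_ram_top_down(void)
	{
		int i;

		for (i = e820.nr_map - 1; i >= 0; i--) {
			struct e820entry *ei = &e820.map[i];

			if (ei->type != E820_RAM)
				continue;
			init_memory_mapping(ei->addr, ei->addr + ei->size);
		}
	}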

* Re: [PATCH 3/5] x86: Only direct map addresses that are marked as E820_RAM
  2012-08-27 18:49         ` Jacob Shin
@ 2012-08-27 20:16           ` H. Peter Anvin
  0 siblings, 0 replies; 29+ messages in thread
From: H. Peter Anvin @ 2012-08-27 20:16 UTC (permalink / raw)
  To: Jacob Shin
  Cc: Yinghai Lu, X86-ML, LKML, Tejun Heo, Dave Young, Chao Wang,
	Vivek Goyal, Andreas Herrmann, Borislav Petkov

On 08/27/2012 11:49 AM, Jacob Shin wrote:
> 
> Yinghai, another question, what about hotplug? Are we guaranteed that
> we will always be adding memory above max_pfn_mapped?

I don't think we can rely on this...

> And will hotplug also update e820 to mark the range as E820_RAM?

It really should.

	-hpa


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/5] x86: Only direct map addresses that are marked as E820_RAM
  2012-08-25  4:54       ` Yinghai Lu
  2012-08-27 18:49         ` Jacob Shin
@ 2012-08-28 16:06         ` Jacob Shin
  2012-08-28 16:11           ` H. Peter Anvin
  1 sibling, 1 reply; 29+ messages in thread
From: Jacob Shin @ 2012-08-28 16:06 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: X86-ML, LKML, H. Peter Anvin, Tejun Heo, Dave Young, Chao Wang,
	Vivek Goyal, Andreas Herrmann, Borislav Petkov

On Fri, Aug 24, 2012 at 09:54:04PM -0700, Yinghai Lu wrote:
> On Fri, Aug 24, 2012 at 9:24 PM, Jacob Shin <jacob.shin@amd.com> wrote:
> > On Fri, Aug 24, 2012 at 06:07:01PM -0700, Yinghai Lu wrote:
> >> On Fri, Aug 24, 2012 at 4:55 PM, Jacob Shin <jacob.shin@amd.com> wrote:
> >>
>> >> looks like you could avoid adding the pfn_mapped[] array.
> >>
> >> pfn_range_is_mapped() should be
> >> check max_low_pfn_mapped, max_pfn_mapped with
> >> e820_all_mapped(start, end, E820_RAM).
> >
> > Hmm .. I guess that could work .. but what about EFI code that keys off of
> > EFI memory map? Does the EFI code update e820 and mark as E820_RAM whatever
> > ranges that it calls init_memory_mapping on (via efi_ioremap?)
> 
> they are converted to e820 memmap before init_memory_mapping is called.

Yinghai, looking into this further on my EFI enabled machine, there is a
memory range where:

- e820 marks it as E820_RESERVED
- EFI memory map marks it as EFI_RUNTIME_SERVICES_DATA

During EFI init, the range is added (redundantly) to e820 as E820_RESERVED,
but during efi_enter_virtual_mode, direct mappings are created for the
range with a call to efi_ioremap.

Another such region is EFI_RUNTIME_SERVICES_CODE, which is marked as
ACPI NVS.

So these are not E820_RAM, but they are direct mapped by the EFI code
path .. what do we do here? I think we should just stick with keeping
the pfn_mapped[] array .. no?

-Jacob

> 
> Thanks
> 
> Yinghai
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread
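
A sketch of the kind of walk that turns up the mismatches Jacob describes:
scan the EFI memory map for runtime descriptors that e820 does not mark as
RAM. Field names follow the kernel's efi_memory_desc_t; the function itself
and its message are hypothetical, not part of this series.

	/* Hypothetical debug helper: report EFI runtime regions that are
	 * not E820_RAM, i.e. mapped via efi_ioremap() rather than the
	 * normal direct-mapping path. */
	static void __init report_runtime_non_ram(void)
	{
		void *p;

		for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
			efi_memory_desc_t *md = p;
			u64 start = md->phys_addr;
			u64 end = start + (md->num_pages << EFI_PAGE_SHIFT);

			if (!(md->attribute & EFI_MEMORY_RUNTIME))
				continue;
			if (!e820_all_mapped(start, end, E820_RAM))
				printk(KERN_INFO "efi: runtime region "
				       "%#llx-%#llx is not E820_RAM\n",
				       start, end - 1);
		}
	}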

* Re: [PATCH 3/5] x86: Only direct map addresses that are marked as E820_RAM
  2012-08-28 16:06         ` Jacob Shin
@ 2012-08-28 16:11           ` H. Peter Anvin
  0 siblings, 0 replies; 29+ messages in thread
From: H. Peter Anvin @ 2012-08-28 16:11 UTC (permalink / raw)
  To: Jacob Shin, Yinghai Lu
  Cc: X86-ML, LKML, Tejun Heo, Dave Young, Chao Wang, Vivek Goyal,
	Andreas Herrmann, Borislav Petkov

The EFI runtime code we need to map, but we shouldn't use as RAM.  I suspect this is also true for the ACPI areas.

Jacob Shin <jacob.shin@amd.com> wrote:

>On Fri, Aug 24, 2012 at 09:54:04PM -0700, Yinghai Lu wrote:
>> On Fri, Aug 24, 2012 at 9:24 PM, Jacob Shin <jacob.shin@amd.com> wrote:
>> > On Fri, Aug 24, 2012 at 06:07:01PM -0700, Yinghai Lu wrote:
>> >> On Fri, Aug 24, 2012 at 4:55 PM, Jacob Shin <jacob.shin@amd.com> wrote:
>> >>
>> >> looks like you could avoid adding the pfn_mapped[] array.
>> >>
>> >> pfn_range_is_mapped() should be
>> >> check max_low_pfn_mapped, max_pfn_mapped with
>> >> e820_all_mapped(start, end, E820_RAM).
>> >
>> > Hmm .. I guess that could work .. but what about EFI code that keys off of
>> > EFI memory map? Does the EFI code update e820 and mark as E820_RAM whatever
>> > ranges that it calls init_memory_mapping on (via efi_ioremap?)
>> 
>> they are converted to e820 memmap before init_memory_mapping is called.
>
>Yinghai, looking into this further on my EFI enabled machine, there is a
>memory range where:
>
>- e820 marks it as E820_RESERVED
>- EFI memory map marks it as EFI_RUNTIME_SERVICES_DATA
>
>During EFI init, the range is added (redundantly) to e820 as E820_RESERVED,
>but during efi_enter_virtual_mode, direct mappings are created for the
>range with a call to efi_ioremap.
>
>Another such region is EFI_RUNTIME_SERVICES_CODE, which is marked as
>ACPI NVS.
>
>So these are not E820_RAM, but they are direct mapped by the EFI code
>path .. what do we do here? I think we should just stick with keeping
>the pfn_mapped[] array .. no?
>
>-Jacob
>
>> 
>> Thanks
>> 
>> Yinghai
>> 

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/5] x86: find_early_table_space based on memory ranges that are being mapped
  2012-08-24 23:55 ` [PATCH 2/5] x86: find_early_table_space based on memory ranges that are being mapped Jacob Shin
@ 2012-10-21 21:22   ` Tom Rini
  0 siblings, 0 replies; 29+ messages in thread
From: Tom Rini @ 2012-10-21 21:22 UTC (permalink / raw)
  To: Jacob Shin; +Cc: X86-ML, linux-kernel

On Fri, Aug 24, 2012 at 06:55:13PM -0500, Jacob Shin wrote:

> Current logic finds enough space for direct mapping page tables from 0
> to end. Instead, we only need to find enough space to cover mr[0].start
> to mr[nr_range].end -- the range that is actually being mapped by
> init_memory_mapping()
> 
> This patch also reportedly fixes suspend/resume issue reported in:
> 
> https://lkml.org/lkml/2012/8/11/83
> 
> Signed-off-by: Jacob Shin <jacob.shin@amd.com>

This fixes a bug I see that was introduced in 1bbbbe7

Tested-by: Tom Rini <trini@ti.com>

-- 
Tom

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2012-10-21 21:22 UTC | newest]

Thread overview: 29+ messages
2012-08-24 23:55 [PATCH V4 0/5] x86: Create direct mappings for E820_RAM only Jacob Shin
2012-08-24 23:55 ` [PATCH 1/5] x86: Move enabling of PSE and PGE out of init_memory_mapping Jacob Shin
2012-08-25  1:25   ` Yinghai Lu
2012-08-25  1:49     ` Yinghai Lu
2012-08-25  2:06       ` Yinghai Lu
2012-08-25  4:15         ` Jacob Shin
2012-08-25  4:13     ` Jacob Shin
2012-08-24 23:55 ` [PATCH 2/5] x86: find_early_table_space based on memory ranges that are being mapped Jacob Shin
2012-10-21 21:22   ` Tom Rini
2012-08-24 23:55 ` [PATCH 3/5] x86: Only direct map addresses that are marked as E820_RAM Jacob Shin
2012-08-25  0:17   ` Jacob Shin
2012-08-25  0:30   ` H. Peter Anvin
2012-08-25  0:49     ` Jacob Shin
2012-08-25  1:13       ` H. Peter Anvin
2012-08-25  4:20         ` Jacob Shin
2012-08-25  4:21           ` H. Peter Anvin
2012-08-27 19:17             ` Jacob Shin
2012-08-27 20:15               ` H. Peter Anvin
2012-08-25  1:07   ` Yinghai Lu
2012-08-25  4:24     ` Jacob Shin
2012-08-25  4:54       ` Yinghai Lu
2012-08-27 18:49         ` Jacob Shin
2012-08-27 20:16           ` H. Peter Anvin
2012-08-28 16:06         ` Jacob Shin
2012-08-28 16:11           ` H. Peter Anvin
2012-08-24 23:55 ` [PATCH 4/5] x86: Fixup code testing if a pfn is direct mapped Jacob Shin
2012-08-24 23:55 ` [PATCH 5/5] x86: if kernel .text .data .bss are not marked as E820_RAM, complain and fix Jacob Shin
2012-08-25  1:23   ` Yinghai Lu
2012-08-25  4:25     ` Jacob Shin
