linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V3 0/4] x86: Create direct mappings for E820_RAM only
@ 2012-08-14 22:39 Jacob Shin
  2012-08-14 22:39 ` [PATCH 1/4] x86: Move enabling of PSE and PGE out of init_memory_mapping Jacob Shin
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Jacob Shin @ 2012-08-14 22:39 UTC (permalink / raw)
  To: X86-ML
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Andreas Herrmann, Tejun Heo,
	Borislav Petkov, Dave Young, Chao Wang, Vivek Goyal, Jacob Shin

Currently kernel direct mappings are created for all pfns between
[ 0 to max_low_pfn ) and [ 4GB to max_pfn ). When we introduce memory
holes, we end up mapping memory ranges that are not backed by physical
DRAM. This is fine for lower memory addresses which can be marked as UC
by fixed/variable range MTRRs, however we run in to trouble with high
addresses.

The following patchset creates direct mappings only for E820_RAM regions
between 0 ~ max_low_pfn and 4GB ~ max_pfn. And leaves non-E820_RAM and
memory holes unmapped.

This third revision of the patchset attempts to resolve comments and
concerns from the following threads:

* https://lkml.org/lkml/2012/8/13/512
* https://lkml.org/lkml/2012/8/9/536
* https://lkml.org/lkml/2011/10/20/323

Jacob Shin (4):
  x86: Move enabling of PSE and PGE out of init_memory_mapping
  x86: find_early_table_space based on memory ranges that are being
    mapped
  x86: Only direct map addresses that are marked as E820_RAM
  x86: Fixup code testing if a pfn is direct mapped

 arch/x86/include/asm/page_types.h |    9 +++
 arch/x86/kernel/cpu/amd.c         |    6 +-
 arch/x86/kernel/setup.c           |  121 +++++++++++++++++++++++++++++++------
 arch/x86/mm/init.c                |   74 +++++++++++------------
 arch/x86/mm/init_64.c             |    6 +-
 arch/x86/platform/efi/efi.c       |    8 +--
 6 files changed, 155 insertions(+), 69 deletions(-)

-- 
1.7.9.5



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/4] x86: Move enabling of PSE and PGE out of init_memory_mapping
  2012-08-14 22:39 [PATCH V3 0/4] x86: Create direct mappings for E820_RAM only Jacob Shin
@ 2012-08-14 22:39 ` Jacob Shin
  2012-08-14 22:39 ` [PATCH 2/4] x86: find_early_table_space based on memory ranges that are being mapped Jacob Shin
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Jacob Shin @ 2012-08-14 22:39 UTC (permalink / raw)
  To: X86-ML
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Andreas Herrmann, Tejun Heo,
	Borislav Petkov, Dave Young, Chao Wang, Vivek Goyal, Jacob Shin

Depending on the platform, init_memory_mapping() may be called multiple
times. Move it out to setup_arch() to avoid writing to cr4 on every call.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
---
 arch/x86/kernel/setup.c |   10 ++++++++++
 arch/x86/mm/init.c      |   10 ----------
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index f4b9b80..751e020 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -913,6 +913,16 @@ void __init setup_arch(char **cmdline_p)
 
 	init_gbpages();
 
+	/* Enable PSE if available */
+	if (cpu_has_pse)
+		set_in_cr4(X86_CR4_PSE);
+
+	/* Enable PGE if available */
+	if (cpu_has_pge) {
+		set_in_cr4(X86_CR4_PGE);
+		__supported_pte_mask |= _PAGE_GLOBAL;
+	}
+
 	/* max_pfn_mapped is updated here */
 	max_low_pfn_mapped = init_memory_mapping(0, max_low_pfn<<PAGE_SHIFT);
 	max_pfn_mapped = max_low_pfn_mapped;
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index e0e6990..2f07e09 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -149,16 +149,6 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
 	use_gbpages = direct_gbpages;
 #endif
 
-	/* Enable PSE if available */
-	if (cpu_has_pse)
-		set_in_cr4(X86_CR4_PSE);
-
-	/* Enable PGE if available */
-	if (cpu_has_pge) {
-		set_in_cr4(X86_CR4_PGE);
-		__supported_pte_mask |= _PAGE_GLOBAL;
-	}
-
 	if (use_gbpages)
 		page_size_mask |= 1 << PG_LEVEL_1G;
 	if (use_pse)
-- 
1.7.9.5



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/4] x86: find_early_table_space based on memory ranges that are being mapped
  2012-08-14 22:39 [PATCH V3 0/4] x86: Create direct mappings for E820_RAM only Jacob Shin
  2012-08-14 22:39 ` [PATCH 1/4] x86: Move enabling of PSE and PGE out of init_memory_mapping Jacob Shin
@ 2012-08-14 22:39 ` Jacob Shin
  2012-08-14 22:39 ` [PATCH 3/4] x86: Only direct map addresses that are marked as E820_RAM Jacob Shin
  2012-08-14 22:39 ` [PATCH 4/4] x86: Fixup code testing if a pfn is direct mapped Jacob Shin
  3 siblings, 0 replies; 13+ messages in thread
From: Jacob Shin @ 2012-08-14 22:39 UTC (permalink / raw)
  To: X86-ML
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Andreas Herrmann, Tejun Heo,
	Borislav Petkov, Dave Young, Chao Wang, Vivek Goyal, Jacob Shin

Current logic finds enough space for direct mapping page tables from 0
to end. Instead, we only need to find enough space to cover mr[0].start
to mr[nr_range].end -- the range that is actually being mapped by
init_memory_mapping()

This patch also reportedly fixes suspend/resume issue reported in:

https://lkml.org/lkml/2012/8/11/83

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
---
 arch/x86/mm/init.c |   62 +++++++++++++++++++++++++++++-----------------------
 1 file changed, 35 insertions(+), 27 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 2f07e09..e2b21e0 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -35,40 +35,48 @@ struct map_range {
 	unsigned page_size_mask;
 };
 
-static void __init find_early_table_space(struct map_range *mr, unsigned long end,
-					  int use_pse, int use_gbpages)
+/*
+ * First calculate space needed for kernel direct mapping page tables to cover
+ * mr[0].start to mr[nr_range - 1].end, while accounting for possible 2M and 1GB
+ * pages. Then find enough contiguous space for those page tables.
+ */
+static void __init find_early_table_space(struct map_range *mr, int nr_range)
 {
-	unsigned long puds, pmds, ptes, tables, start = 0, good_end = end;
+	int i;
+	unsigned long puds = 0, pmds = 0, ptes = 0, tables;
+	unsigned long start = 0, good_end;
 	phys_addr_t base;
 
-	puds = (end + PUD_SIZE - 1) >> PUD_SHIFT;
-	tables = roundup(puds * sizeof(pud_t), PAGE_SIZE);
-
-	if (use_gbpages) {
-		unsigned long extra;
-
-		extra = end - ((end>>PUD_SHIFT) << PUD_SHIFT);
-		pmds = (extra + PMD_SIZE - 1) >> PMD_SHIFT;
-	} else
-		pmds = (end + PMD_SIZE - 1) >> PMD_SHIFT;
+	for (i = 0; i < nr_range; i++) {
+		unsigned long range, extra;
 
-	tables += roundup(pmds * sizeof(pmd_t), PAGE_SIZE);
+		range = mr[i].end - mr[i].start;
+		puds += (range + PUD_SIZE - 1) >> PUD_SHIFT;
 
-	if (use_pse) {
-		unsigned long extra;
+		if (mr[i].page_size_mask & (1 << PG_LEVEL_1G)) {
+			extra = range - ((range >> PUD_SHIFT) << PUD_SHIFT);
+			pmds += (extra + PMD_SIZE - 1) >> PMD_SHIFT;
+		} else {
+			pmds += (range + PMD_SIZE - 1) >> PMD_SHIFT;
+		}
 
-		extra = end - ((end>>PMD_SHIFT) << PMD_SHIFT);
+		if (mr[i].page_size_mask & (1 << PG_LEVEL_2M)) {
+			extra = range - ((range >> PMD_SHIFT) << PMD_SHIFT);
 #ifdef CONFIG_X86_32
-		extra += PMD_SIZE;
+			extra += PMD_SIZE;
 #endif
-		/* The first 2/4M doesn't use large pages. */
-		if (mr->start < PMD_SIZE)
-			extra += mr->end - mr->start;
-
-		ptes = (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;
-	} else
-		ptes = (end + PAGE_SIZE - 1) >> PAGE_SHIFT;
+			/* The first 2/4M doesn't use large pages. */
+			if (mr[i].start < PMD_SIZE)
+				extra += range;
+
+			ptes += (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;
+		} else {
+			ptes += (range + PAGE_SIZE - 1) >> PAGE_SHIFT;
+		}
+	}
 
+	tables = roundup(puds * sizeof(pud_t), PAGE_SIZE);
+	tables += roundup(pmds * sizeof(pmd_t), PAGE_SIZE);
 	tables += roundup(ptes * sizeof(pte_t), PAGE_SIZE);
 
 #ifdef CONFIG_X86_32
@@ -86,7 +94,7 @@ static void __init find_early_table_space(struct map_range *mr, unsigned long en
 	pgt_buf_top = pgt_buf_start + (tables >> PAGE_SHIFT);
 
 	printk(KERN_DEBUG "kernel direct mapping tables up to %#lx @ [mem %#010lx-%#010lx]\n",
-		end - 1, pgt_buf_start << PAGE_SHIFT,
+		mr[nr_range - 1].end - 1, pgt_buf_start << PAGE_SHIFT,
 		(pgt_buf_top << PAGE_SHIFT) - 1);
 }
 
@@ -257,7 +265,7 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
 	 * nodes are discovered.
 	 */
 	if (!after_bootmem)
-		find_early_table_space(&mr[0], end, use_pse, use_gbpages);
+		find_early_table_space(mr, nr_range);
 
 	for (i = 0; i < nr_range; i++)
 		ret = kernel_physical_mapping_init(mr[i].start, mr[i].end,
-- 
1.7.9.5



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/4] x86: Only direct map addresses that are marked as E820_RAM
  2012-08-14 22:39 [PATCH V3 0/4] x86: Create direct mappings for E820_RAM only Jacob Shin
  2012-08-14 22:39 ` [PATCH 1/4] x86: Move enabling of PSE and PGE out of init_memory_mapping Jacob Shin
  2012-08-14 22:39 ` [PATCH 2/4] x86: find_early_table_space based on memory ranges that are being mapped Jacob Shin
@ 2012-08-14 22:39 ` Jacob Shin
  2012-08-22 23:30   ` H. Peter Anvin
  2012-08-14 22:39 ` [PATCH 4/4] x86: Fixup code testing if a pfn is direct mapped Jacob Shin
  3 siblings, 1 reply; 13+ messages in thread
From: Jacob Shin @ 2012-08-14 22:39 UTC (permalink / raw)
  To: X86-ML
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Andreas Herrmann, Tejun Heo,
	Borislav Petkov, Dave Young, Chao Wang, Vivek Goyal, Jacob Shin

Currently direct mappings are created for [ 0 to max_low_pfn<<PAGE_SHIFT )
and [ 4GB to max_pfn<<PAGE_SHIFT ), which may include regions that are not
backed by actual DRAM. This is fine for holes under 4GB which are covered
by fixed and variable range MTRRs to be UC. However, we run into trouble
on higher memory addresses which cannot be covered by MTRRs.

Our system with 1TB of RAM has an e820 that looks like this:

 BIOS-e820: [mem 0x0000000000000000-0x00000000000983ff] usable
 BIOS-e820: [mem 0x0000000000098400-0x000000000009ffff] reserved
 BIOS-e820: [mem 0x00000000000d0000-0x00000000000fffff] reserved
 BIOS-e820: [mem 0x0000000000100000-0x00000000c7ebffff] usable
 BIOS-e820: [mem 0x00000000c7ec0000-0x00000000c7ed7fff] ACPI data
 BIOS-e820: [mem 0x00000000c7ed8000-0x00000000c7ed9fff] ACPI NVS
 BIOS-e820: [mem 0x00000000c7eda000-0x00000000c7ffffff] reserved
 BIOS-e820: [mem 0x00000000fec00000-0x00000000fec0ffff] reserved
 BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
 BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved
 BIOS-e820: [mem 0x0000000100000000-0x000000e037ffffff] usable
 BIOS-e820: [mem 0x000000e038000000-0x000000fcffffffff] reserved
 BIOS-e820: [mem 0x0000010000000000-0x0000011ffeffffff] usable

and so direct mappings are created for huge memory hole between
0x000000e038000000 to 0x0000010000000000. Even though the kernel never
generates memory accesses in that region, since the page tables mark
them incorrectly as being WB, our (AMD) processor ends up causing a MCE
while doing some memory bookkeeping/optimizations around that area.

This patch iterates through e820 and only direct maps ranges that are
marked as E820_RAM, and keeps track of those pfn ranges. Depending on
the alignment of E820 ranges, this may possibly result in using smaller
size (i.e. 4K instead of 2M or 1G) page tables.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
---
 arch/x86/include/asm/page_types.h |    9 +++
 arch/x86/kernel/setup.c           |  131 +++++++++++++++++++++++++++++--------
 arch/x86/mm/init.c                |    2 +
 arch/x86/mm/init_64.c             |    6 +-
 4 files changed, 115 insertions(+), 33 deletions(-)

diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index e21fdd1..409047a 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -3,6 +3,7 @@
 
 #include <linux/const.h>
 #include <linux/types.h>
+#include <asm/e820.h>
 
 /* PAGE_SHIFT determines the page size */
 #define PAGE_SHIFT	12
@@ -40,12 +41,20 @@
 #endif	/* CONFIG_X86_64 */
 
 #ifndef __ASSEMBLY__
+#include <linux/range.h>
 
 extern int devmem_is_allowed(unsigned long pagenr);
 
 extern unsigned long max_low_pfn_mapped;
 extern unsigned long max_pfn_mapped;
 
+extern struct range pfn_mapped[E820_X_MAX];
+extern int nr_pfn_mapped;
+
+extern void add_pfn_range_mapped(unsigned long start_pfn, unsigned long end_pfn);
+extern bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn);
+extern bool pfn_is_mapped(unsigned long pfn);
+
 static inline phys_addr_t get_max_mapped(void)
 {
 	return (phys_addr_t)max_pfn_mapped << PAGE_SHIFT;
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 751e020..cba080e 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -115,13 +115,46 @@
 #include <asm/prom.h>
 
 /*
- * end_pfn only includes RAM, while max_pfn_mapped includes all e820 entries.
- * The direct mapping extends to max_pfn_mapped, so that we can directly access
- * apertures, ACPI and other tables without having to play with fixmaps.
+ * max_low_pfn_mapped: highest direct mapped pfn under 4GB
+ * max_pfn_mapped:     highest direct mapped pfn over 4GB
+ *
+ * The direct mapping only covers E820_RAM regions, so the ranges and gaps are
+ * represented by pfn_mapped
  */
 unsigned long max_low_pfn_mapped;
 unsigned long max_pfn_mapped;
 
+struct range pfn_mapped[E820_X_MAX];
+int nr_pfn_mapped;
+
+void add_pfn_range_mapped(unsigned long start_pfn, unsigned long end_pfn)
+{
+	nr_pfn_mapped = add_range_with_merge(pfn_mapped, E820_X_MAX,
+					     nr_pfn_mapped, start_pfn, end_pfn);
+
+	max_pfn_mapped = max(max_pfn_mapped, end_pfn);
+
+	if (end_pfn <= (1UL << (32 - PAGE_SHIFT)))
+		max_low_pfn_mapped = max(max_low_pfn_mapped, end_pfn);
+}
+
+bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn)
+{
+	int i;
+
+	for (i = 0; i < nr_pfn_mapped; i++)
+		if ((start_pfn >= pfn_mapped[i].start) &&
+		    (end_pfn <= pfn_mapped[i].end))
+			return true;
+
+	return false;
+}
+
+bool pfn_is_mapped(unsigned long pfn)
+{
+	return pfn_range_is_mapped(pfn, pfn + 1);
+}
+
 #ifdef CONFIG_DMI
 RESERVE_BRK(dmi_alloc, 65536);
 #endif
@@ -296,6 +329,71 @@ static void __init cleanup_highmap(void)
 }
 #endif
 
+/*
+ * Iterate through E820 memory map and create direct mappings for only E820_RAM
+ * regions. We cannot simply create direct mappings for all pfns from
+ * [0 to max_low_pfn) and [4GB to max_pfn) because of possible memory holes in
+ * high addresses that cannot be marked as UC by fixed/variable range MTRRs.
+ * Depending on the alignment of E820 ranges, this may possibly result in using
+ * smaller size (i.e. 4K instead of 2M or 1G) page tables.
+ */
+static void __init init_memory(void)
+{
+	int i;
+	unsigned long init_pfn = max_pfn_mapped;
+
+	/* add initial memory already mapped by early boot process */
+	add_pfn_range_mapped(0, init_pfn);
+
+	pr_info("initial memory mapped: [mem 0x00000000-%#010lx]\n",
+			(init_pfn<<PAGE_SHIFT) - 1);
+
+	init_gbpages();
+
+	/* Enable PSE if available */
+	if (cpu_has_pse)
+		set_in_cr4(X86_CR4_PSE);
+
+	/* Enable PGE if available */
+	if (cpu_has_pge) {
+		set_in_cr4(X86_CR4_PGE);
+		__supported_pte_mask |= _PAGE_GLOBAL;
+	}
+
+	for (i = 0; i < e820.nr_map; i++) {
+		struct e820entry *ei = &e820.map[i];
+		u64 start = ei->addr;
+		u64 end = ei->addr + ei->size;
+
+		/* we only map E820_RAM */
+		if (ei->type != E820_RAM)
+			continue;
+
+		/* skip over initial memory mapped */
+		if ((end >> PAGE_SHIFT) <= init_pfn)
+			continue;
+
+		if ((start >> PAGE_SHIFT) < init_pfn)
+			start = init_pfn << PAGE_SHIFT;
+#ifdef CONFIG_X86_32
+		/* on 32 bit, we only map up to max_low_pfn */
+		if ((start >> PAGE_SHIFT) >= max_low_pfn)
+			continue;
+
+		if ((end >> PAGE_SHIFT) > max_low_pfn)
+			end = max_low_pfn << PAGE_SHIFT;
+#endif
+		init_memory_mapping(start, end);
+	}
+
+#ifdef CONFIG_X86_64
+	if (max_pfn > max_low_pfn) {
+		/* can we preseve max_low_pfn ?*/
+		max_low_pfn = max_pfn;
+	}
+#endif
+}
+
 static void __init reserve_brk(void)
 {
 	if (_brk_end > _brk_start)
@@ -906,35 +1004,10 @@ void __init setup_arch(char **cmdline_p)
 	setup_bios_corruption_check();
 #endif
 
-	printk(KERN_DEBUG "initial memory mapped: [mem 0x00000000-%#010lx]\n",
-			(max_pfn_mapped<<PAGE_SHIFT) - 1);
-
 	setup_real_mode();
 
-	init_gbpages();
+	init_memory();
 
-	/* Enable PSE if available */
-	if (cpu_has_pse)
-		set_in_cr4(X86_CR4_PSE);
-
-	/* Enable PGE if available */
-	if (cpu_has_pge) {
-		set_in_cr4(X86_CR4_PGE);
-		__supported_pte_mask |= _PAGE_GLOBAL;
-	}
-
-	/* max_pfn_mapped is updated here */
-	max_low_pfn_mapped = init_memory_mapping(0, max_low_pfn<<PAGE_SHIFT);
-	max_pfn_mapped = max_low_pfn_mapped;
-
-#ifdef CONFIG_X86_64
-	if (max_pfn > max_low_pfn) {
-		max_pfn_mapped = init_memory_mapping(1UL<<32,
-						     max_pfn<<PAGE_SHIFT);
-		/* can we preseve max_low_pfn ?*/
-		max_low_pfn = max_pfn;
-	}
-#endif
 	memblock.current_limit = get_max_mapped();
 	dma_contiguous_reserve(0);
 
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index e2b21e0..d7b24da 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -301,6 +301,8 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
 	if (!after_bootmem)
 		early_memtest(start, end);
 
+	add_pfn_range_mapped(start >> PAGE_SHIFT, ret >> PAGE_SHIFT);
+
 	return ret >> PAGE_SHIFT;
 }
 
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 2b6b4a3..ab558eb 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -657,13 +657,11 @@ int arch_add_memory(int nid, u64 start, u64 size)
 {
 	struct pglist_data *pgdat = NODE_DATA(nid);
 	struct zone *zone = pgdat->node_zones + ZONE_NORMAL;
-	unsigned long last_mapped_pfn, start_pfn = start >> PAGE_SHIFT;
+	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 	int ret;
 
-	last_mapped_pfn = init_memory_mapping(start, start + size);
-	if (last_mapped_pfn > max_pfn_mapped)
-		max_pfn_mapped = last_mapped_pfn;
+	init_memory_mapping(start, start + size);
 
 	ret = __add_pages(nid, zone, start_pfn, nr_pages);
 	WARN_ON_ONCE(ret);
-- 
1.7.9.5



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/4] x86: Fixup code testing if a pfn is direct mapped
  2012-08-14 22:39 [PATCH V3 0/4] x86: Create direct mappings for E820_RAM only Jacob Shin
                   ` (2 preceding siblings ...)
  2012-08-14 22:39 ` [PATCH 3/4] x86: Only direct map addresses that are marked as E820_RAM Jacob Shin
@ 2012-08-14 22:39 ` Jacob Shin
  3 siblings, 0 replies; 13+ messages in thread
From: Jacob Shin @ 2012-08-14 22:39 UTC (permalink / raw)
  To: X86-ML
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Andreas Herrmann, Tejun Heo,
	Borislav Petkov, Dave Young, Chao Wang, Vivek Goyal, Jacob Shin

Update code that previously assumed pfns [ 0 - max_low_pfn_mapped ) and
[ 4GB - max_pfn_mapped ) were always direct mapped, to now look up
pfn_mapped ranges instead.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
---
 arch/x86/kernel/cpu/amd.c   |    6 +-----
 arch/x86/platform/efi/efi.c |    8 ++++----
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 9d92e19..554ccfc 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -677,11 +677,7 @@ static void __cpuinit init_amd(struct cpuinfo_x86 *c)
 		 */
 		if (!rdmsrl_safe(MSR_K8_TSEG_ADDR, &tseg)) {
 			printk(KERN_DEBUG "tseg: %010llx\n", tseg);
-			if ((tseg>>PMD_SHIFT) <
-				(max_low_pfn_mapped>>(PMD_SHIFT-PAGE_SHIFT)) ||
-				((tseg>>PMD_SHIFT) <
-				(max_pfn_mapped>>(PMD_SHIFT-PAGE_SHIFT)) &&
-				(tseg>>PMD_SHIFT) >= (1ULL<<(32 - PMD_SHIFT))))
+			if (pfn_is_mapped(tseg))
 				set_memory_4k((unsigned long)__va(tseg), 1);
 		}
 	}
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 2dc29f5..4810ab3 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -754,7 +754,7 @@ void __init efi_enter_virtual_mode(void)
 	efi_memory_desc_t *md, *prev_md = NULL;
 	efi_status_t status;
 	unsigned long size;
-	u64 end, systab, addr, npages, end_pfn;
+	u64 end, systab, addr, npages, start_pfn, end_pfn;
 	void *p, *va, *new_memmap = NULL;
 	int count = 0;
 
@@ -805,10 +805,10 @@ void __init efi_enter_virtual_mode(void)
 		size = md->num_pages << EFI_PAGE_SHIFT;
 		end = md->phys_addr + size;
 
+		start_pfn = PFN_DOWN(md->phys_addr);
 		end_pfn = PFN_UP(end);
-		if (end_pfn <= max_low_pfn_mapped
-		    || (end_pfn > (1UL << (32 - PAGE_SHIFT))
-			&& end_pfn <= max_pfn_mapped))
+
+		if (pfn_range_is_mapped(start_pfn, end_pfn))
 			va = __va(md->phys_addr);
 		else
 			va = efi_ioremap(md->phys_addr, size, md->type);
-- 
1.7.9.5



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/4] x86: Only direct map addresses that are marked as E820_RAM
  2012-08-14 22:39 ` [PATCH 3/4] x86: Only direct map addresses that are marked as E820_RAM Jacob Shin
@ 2012-08-22 23:30   ` H. Peter Anvin
  2012-08-23 14:50     ` Jacob Shin
  0 siblings, 1 reply; 13+ messages in thread
From: H. Peter Anvin @ 2012-08-22 23:30 UTC (permalink / raw)
  To: Jacob Shin
  Cc: X86-ML, LKML, Yinghai Lu, Andreas Herrmann, Tejun Heo,
	Borislav Petkov, Dave Young, Chao Wang, Vivek Goyal

On 08/14/2012 03:39 PM, Jacob Shin wrote:
> Currently direct mappings are created for [ 0 to max_low_pfn<<PAGE_SHIFT )
> and [ 4GB to max_pfn<<PAGE_SHIFT ), which may include regions that are not
> backed by actual DRAM. This is fine for holes under 4GB which are covered
> by fixed and variable range MTRRs to be UC. However, we run into trouble
> on higher memory addresses which cannot be covered by MTRRs.
> 
> Our system with 1TB of RAM has an e820 that looks like this:
> 
>  BIOS-e820: [mem 0x0000000000000000-0x00000000000983ff] usable
>  BIOS-e820: [mem 0x0000000000098400-0x000000000009ffff] reserved
>  BIOS-e820: [mem 0x00000000000d0000-0x00000000000fffff] reserved
>  BIOS-e820: [mem 0x0000000000100000-0x00000000c7ebffff] usable
>  BIOS-e820: [mem 0x00000000c7ec0000-0x00000000c7ed7fff] ACPI data
>  BIOS-e820: [mem 0x00000000c7ed8000-0x00000000c7ed9fff] ACPI NVS
>  BIOS-e820: [mem 0x00000000c7eda000-0x00000000c7ffffff] reserved
>  BIOS-e820: [mem 0x00000000fec00000-0x00000000fec0ffff] reserved
>  BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
>  BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved
>  BIOS-e820: [mem 0x0000000100000000-0x000000e037ffffff] usable
>  BIOS-e820: [mem 0x000000e038000000-0x000000fcffffffff] reserved
>  BIOS-e820: [mem 0x0000010000000000-0x0000011ffeffffff] usable
> 
> and so direct mappings are created for huge memory hole between
> 0x000000e038000000 to 0x0000010000000000. Even though the kernel never
> generates memory accesses in that region, since the page tables mark
> them incorrectly as being WB, our (AMD) processor ends up causing a MCE
> while doing some memory bookkeeping/optimizations around that area.
> 
> This patch iterates through e820 and only direct maps ranges that are
> marked as E820_RAM, and keeps track of those pfn ranges. Depending on
> the alignment of E820 ranges, this may possibly result in using smaller
> size (i.e. 4K instead of 2M or 1G) page tables.
> 
> Signed-off-by: Jacob Shin <jacob.shin@amd.com>

I have one concern with this, which is that it leaves in place mapping
below the initial max_pfn_mapped.  Although that neatly resolves the
legacy area (0-1 MiB) issues, it really isn't right above the 1 MiB
point.  Any way I could get you to seek out and unmap any such ranges?
We have already seen some Dell machines which put memory holes in low
RAM, and perhaps there are still some machines out there with an I/O
hole at 15 MiB.

	-hpa


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/4] x86: Only direct map addresses that are marked as E820_RAM
  2012-08-22 23:30   ` H. Peter Anvin
@ 2012-08-23 14:50     ` Jacob Shin
  2012-08-23 15:39       ` H. Peter Anvin
  0 siblings, 1 reply; 13+ messages in thread
From: Jacob Shin @ 2012-08-23 14:50 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: X86-ML, LKML, Yinghai Lu, Andreas Herrmann, Tejun Heo,
	Borislav Petkov, Dave Young, Chao Wang, Vivek Goyal

On Wed, Aug 22, 2012 at 04:30:49PM -0700, H. Peter Anvin wrote:
> On 08/14/2012 03:39 PM, Jacob Shin wrote:
> > Currently direct mappings are created for [ 0 to max_low_pfn<<PAGE_SHIFT )
> > and [ 4GB to max_pfn<<PAGE_SHIFT ), which may include regions that are not
> > backed by actual DRAM. This is fine for holes under 4GB which are covered
> > by fixed and variable range MTRRs to be UC. However, we run into trouble
> > on higher memory addresses which cannot be covered by MTRRs.
> > 
> > Our system with 1TB of RAM has an e820 that looks like this:
> > 
> >  BIOS-e820: [mem 0x0000000000000000-0x00000000000983ff] usable
> >  BIOS-e820: [mem 0x0000000000098400-0x000000000009ffff] reserved
> >  BIOS-e820: [mem 0x00000000000d0000-0x00000000000fffff] reserved
> >  BIOS-e820: [mem 0x0000000000100000-0x00000000c7ebffff] usable
> >  BIOS-e820: [mem 0x00000000c7ec0000-0x00000000c7ed7fff] ACPI data
> >  BIOS-e820: [mem 0x00000000c7ed8000-0x00000000c7ed9fff] ACPI NVS
> >  BIOS-e820: [mem 0x00000000c7eda000-0x00000000c7ffffff] reserved
> >  BIOS-e820: [mem 0x00000000fec00000-0x00000000fec0ffff] reserved
> >  BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
> >  BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved
> >  BIOS-e820: [mem 0x0000000100000000-0x000000e037ffffff] usable
> >  BIOS-e820: [mem 0x000000e038000000-0x000000fcffffffff] reserved
> >  BIOS-e820: [mem 0x0000010000000000-0x0000011ffeffffff] usable
> > 
> > and so direct mappings are created for huge memory hole between
> > 0x000000e038000000 to 0x0000010000000000. Even though the kernel never
> > generates memory accesses in that region, since the page tables mark
> > them incorrectly as being WB, our (AMD) processor ends up causing a MCE
> > while doing some memory bookkeeping/optimizations around that area.
> > 
> > This patch iterates through e820 and only direct maps ranges that are
> > marked as E820_RAM, and keeps track of those pfn ranges. Depending on
> > the alignment of E820 ranges, this may possibly result in using smaller
> > size (i.e. 4K instead of 2M or 1G) page tables.
> > 
> > Signed-off-by: Jacob Shin <jacob.shin@amd.com>
> 
> I have one concern with this, which is that it leaves in place mapping
> below the initial max_pfn_mapped.  Although that neatly resolves the
> legacy area (0-1 MiB) issues, it really isn't right above the 1 MiB
> point.  Any way I could get you to seek out and unmap any such ranges?
> We have already seen some Dell machines which put memory holes in low
> RAM, and perhaps there are still some machines out there with an I/O
> hole at 15 MiB.

So I believe in V2 of the patchset this was done, however, Dave Young
from redhat reported that it broke their KVM guest with a user supplied
memory map that looked like this:

>> [    0.000000] e820: user-defined physical RAM map:
>> [    0.000000] user: [mem 0x0000000000010000-0x000000000009dbff] usable
>> [    0.000000] user: [mem 0x0000000024000000-0x0000000033f6bfff] usable

And looking into that scenario, the early boot code seems to allocates
space for fixmap right under initial max_pfn_mapped, which is no longer
direct mapped with my patch, and that seems to cause problems for later
APIC code that initializes APIC base address into the fixmap area.

So I guess to address your concern, we need to go back to V2 and try to
resolve the fixmap problem with user supplied memory map that reserves
memory below initial max_pfn_mapped ?

> 
> 	-hpa
> 
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/4] x86: Only direct map addresses that are marked as E820_RAM
  2012-08-23 14:50     ` Jacob Shin
@ 2012-08-23 15:39       ` H. Peter Anvin
  2012-08-23 20:12         ` Konrad Rzeszutek Wilk
  2012-08-23 22:35         ` Jacob Shin
  0 siblings, 2 replies; 13+ messages in thread
From: H. Peter Anvin @ 2012-08-23 15:39 UTC (permalink / raw)
  To: Jacob Shin
  Cc: X86-ML, LKML, Yinghai Lu, Andreas Herrmann, Tejun Heo,
	Borislav Petkov, Dave Young, Chao Wang, Vivek Goyal

On 08/23/2012 07:50 AM, Jacob Shin wrote:
>>
>> I have one concern with this, which is that it leaves in place mapping
>> below the initial max_pfn_mapped.  Although that neatly resolves the
>> legacy area (0-1 MiB) issues, it really isn't right above the 1 MiB
>> point.  Any way I could get you to seek out and unmap any such ranges?
>> We have already seen some Dell machines which put memory holes in low
>> RAM, and perhaps there are still some machines out there with an I/O
>> hole at 15 MiB.
>
> So I believe in V2 of the patchset this was done, however, Dave Young
> from redhat reported that it broke their KVM guest with a user supplied
> memory map that looked like this:
>
>>> [    0.000000] e820: user-defined physical RAM map:
>>> [    0.000000] user: [mem 0x0000000000010000-0x000000000009dbff] usable
>>> [    0.000000] user: [mem 0x0000000024000000-0x0000000033f6bfff] usable
>
> And looking into that scenario, the early boot code seems to allocates
> space for fixmap right under initial max_pfn_mapped, which is no longer
> direct mapped with my patch, and that seems to cause problems for later
> APIC code that initializes APIC base address into the fixmap area.
>
> So I guess to address your concern, we need to go back to V2 and try to
> resolve the fixmap problem with user supplied memory map that reserves
> memory below initial max_pfn_mapped ?
>

Okay... I think I need to grok that a bit better.  For memory 
allocations, we probably should just use brk allocations, for virtual 
space allocations it is called the fixmap for a reason (even though the 
Xen people managed to break that on 32 bits, sigh!)

I guess I need to go back and look at David's bug report...

	-hpa


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/4] x86: Only direct map addresses that are marked as E820_RAM
  2012-08-23 15:39       ` H. Peter Anvin
@ 2012-08-23 20:12         ` Konrad Rzeszutek Wilk
  2012-08-23 21:23           ` H. Peter Anvin
  2012-08-23 22:35         ` Jacob Shin
  1 sibling, 1 reply; 13+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-08-23 20:12 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jacob Shin, X86-ML, LKML, Yinghai Lu, Andreas Herrmann,
	Tejun Heo, Borislav Petkov, Dave Young, Chao Wang, Vivek Goyal

On Thu, Aug 23, 2012 at 08:39:10AM -0700, H. Peter Anvin wrote:
> On 08/23/2012 07:50 AM, Jacob Shin wrote:
> >>
> >>I have one concern with this, which is that it leaves in place mapping
> >>below the initial max_pfn_mapped.  Although that neatly resolves the
> >>legacy area (0-1 MiB) issues, it really isn't right above the 1 MiB
> >>point.  Any way I could get you to seek out and unmap any such ranges?
> >>We have already seen some Dell machines which put memory holes in low
> >>RAM, and perhaps there are still some machines out there with an I/O
> >>hole at 15 MiB.
> >
> >So I believe in V2 of the patchset this was done, however, Dave Young
> >from redhat reported that it broke their KVM guest with a user supplied
> >memory map that looked like this:
> >
> >>>[    0.000000] e820: user-defined physical RAM map:
> >>>[    0.000000] user: [mem 0x0000000000010000-0x000000000009dbff] usable
> >>>[    0.000000] user: [mem 0x0000000024000000-0x0000000033f6bfff] usable
> >
> >And looking into that scenario, the early boot code seems to allocates
> >space for fixmap right under initial max_pfn_mapped, which is no longer
> >direct mapped with my patch, and that seems to cause problems for later
> >APIC code that initializes APIC base address into the fixmap area.
> >
> >So I guess to address your concern, we need to go back to V2 and try to
> >resolve the fixmap problem with user supplied memory map that reserves
> >memory below initial max_pfn_mapped ?
> >
> 
> Okay... I think I need to grok that a bit better.  For memory
> allocations, we probably should just use brk allocations, for
> virtual space allocations it is called the fixmap for a reason (even
> though the Xen people managed to break that on 32 bits, sigh!)

Can you rope me in on that? Was that added ooh years ago ?
> 
> I guess I need to go back and look at David's bug report...
> 
> 	-hpa
> 
> 
> -- 
> H. Peter Anvin, Intel Open Source Technology Center
> I work for Intel.  I don't speak on their behalf.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/4] x86: Only direct map addresses that are marked as E820_RAM
  2012-08-23 20:12         ` Konrad Rzeszutek Wilk
@ 2012-08-23 21:23           ` H. Peter Anvin
  0 siblings, 0 replies; 13+ messages in thread
From: H. Peter Anvin @ 2012-08-23 21:23 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Jacob Shin, X86-ML, LKML, Yinghai Lu, Andreas Herrmann,
	Tejun Heo, Borislav Petkov, Dave Young, Chao Wang, Vivek Goyal

On 08/23/2012 01:12 PM, Konrad Rzeszutek Wilk wrote:
>>
>> Okay... I think I need to grok that a bit better.  For memory
>> allocations, we probably should just use brk allocations, for
>> virtual space allocations it is called the fixmap for a reason (even
>> though the Xen people managed to break that on 32 bits, sigh!)
> 
> Can you rope me in on that? Was that added ooh years ago ?

Yeah, it dates back to the original paravirt work.

The issue is that Xen likes to live at the top of the virtual address
space, which is where the fixmap lives.  Suddenly the fixmap is no
longer at a fixed address, which was sort of the whole idea.

The problem is that on i386 there are very few unmovable boundaries left
in the virtual address space.  The only other one left, really, is at
the very beginning of kernel space.

At this point people's appetite for large changes on 32 bits is limited,
of course.

	-hpa



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/4] x86: Only direct map addresses that are marked as E820_RAM
  2012-08-23 15:39       ` H. Peter Anvin
  2012-08-23 20:12         ` Konrad Rzeszutek Wilk
@ 2012-08-23 22:35         ` Jacob Shin
  2012-08-23 23:46           ` Yinghai Lu
  2012-08-23 23:53           ` H. Peter Anvin
  1 sibling, 2 replies; 13+ messages in thread
From: Jacob Shin @ 2012-08-23 22:35 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: X86-ML, LKML, Yinghai Lu, Andreas Herrmann, Tejun Heo,
	Borislav Petkov, Dave Young, Chao Wang, Vivek Goyal

On Thu, Aug 23, 2012 at 08:39:10AM -0700, H. Peter Anvin wrote:
> On 08/23/2012 07:50 AM, Jacob Shin wrote:
> >>
> >>I have one concern with this, which is that it leaves in place mapping
> >>below the initial max_pfn_mapped.  Although that neatly resolves the
> >>legacy area (0-1 MiB) issues, it really isn't right above the 1 MiB
> >>point.  Any way I could get you to seek out and unmap any such ranges?
> >>We have already seen some Dell machines which put memory holes in low
> >>RAM, and perhaps there are still some machines out there with an I/O
> >>hole at 15 MiB.
> >
> >So I believe in V2 of the patchset this was done, however, Dave Young
> >from redhat reported that it broke their KVM guest with a user supplied
> >memory map that looked like this:
> >
> >>>[    0.000000] e820: user-defined physical RAM map:
> >>>[    0.000000] user: [mem 0x0000000000010000-0x000000000009dbff] usable
> >>>[    0.000000] user: [mem 0x0000000024000000-0x0000000033f6bfff] usable
> >

I looked into this a bit more, and I think what's happening is that this
user defined memory map leaves out the region where the kernel is loaded on
to during the boot process. The kernel and the direct mapped page tables up
to initial max_pfn_mapped reside somwhere under 512M (KERNEL_IMAGE_SIZE),
I guess it depends on how big your uncompressed kernel is.

And at the first attempt to set_fixmap_nocache(FIX_APIC_BASE, address) in
arch/x86/apic/apic.c: register_lapic_address runs into badness because the
memory region where the initial page tables live is no longer mapped
because of the above user supplied memory map.

So I guess there is a disconnect between really early code that seems to
rely on the boot loader as to where in physical memory it resides and its
initial page tables live, and the later memory initialization code where
it looks at the E820 (and here user can interject their own memory map
using the command line arguments)

Not really sure how to handle this case .. any advice?


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/4] x86: Only direct map addresses that are marked as E820_RAM
  2012-08-23 22:35         ` Jacob Shin
@ 2012-08-23 23:46           ` Yinghai Lu
  2012-08-23 23:53           ` H. Peter Anvin
  1 sibling, 0 replies; 13+ messages in thread
From: Yinghai Lu @ 2012-08-23 23:46 UTC (permalink / raw)
  To: Jacob Shin
  Cc: H. Peter Anvin, X86-ML, LKML, Andreas Herrmann, Tejun Heo,
	Borislav Petkov, Dave Young, Chao Wang, Vivek Goyal

On Thu, Aug 23, 2012 at 3:35 PM, Jacob Shin <jacob.shin@amd.com> wrote:
>> >So I believe in V2 of the patchset this was done, however, Dave Young
>> >from redhat reported that it broke their KVM guest with a user supplied
>> >memory map that looked like this:
>> >
>> >>>[    0.000000] e820: user-defined physical RAM map:
>> >>>[    0.000000] user: [mem 0x0000000000010000-0x000000000009dbff] usable
>> >>>[    0.000000] user: [mem 0x0000000024000000-0x0000000033f6bfff] usable
>> >
>
> I looked into this a bit more, and I think what's happening is that this
> user defined memory map leaves out the region where the kernel is loaded on
> to during the boot process. The kernel and the direct mapped page tables up
> to initial max_pfn_mapped reside somwhere under 512M (KERNEL_IMAGE_SIZE),
> I guess it depends on how big your uncompressed kernel is.
>
> And at the first attempt to set_fixmap_nocache(FIX_APIC_BASE, address) in
> arch/x86/apic/apic.c: register_lapic_address runs into badness because the
> memory region where the initial page tables live is no longer mapped
> because of the above user supplied memory map.
>
> So I guess there is a disconnect between really early code that seems to
> rely on the boot loader as to where in physical memory it resides and its
> initial page tables live, and the later memory initialization code where
> it looks at the E820 (and here user can interject their own memory map
> using the command line arguments)
>
> Not really sure how to handle this case .. any advice?

Maybe you can just put back kernel range to E820 as RAM?

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index f4b9b80..e1a1f28 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -830,6 +830,9 @@ void __init setup_arch(char **cmdline_p)
        insert_resource(&iomem_resource, &code_resource);
        insert_resource(&iomem_resource, &data_resource);
        insert_resource(&iomem_resource, &bss_resource);
+       e820_add_range(code_resource.start, code_resource.end -
code_resource.start + 1, E820_RAM);
+       e820_add_range(data_resource.start, data_resource.end -
data_resource.start + 1, E820_RAM);
+       e820_add_range(bss_resource.start, bss_resource.end -
bss_resource.start + 1, E820_RAM);

        trim_bios_range();
 #ifdef CONFIG_X86_32

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/4] x86: Only direct map addresses that are marked as E820_RAM
  2012-08-23 22:35         ` Jacob Shin
  2012-08-23 23:46           ` Yinghai Lu
@ 2012-08-23 23:53           ` H. Peter Anvin
  1 sibling, 0 replies; 13+ messages in thread
From: H. Peter Anvin @ 2012-08-23 23:53 UTC (permalink / raw)
  To: Jacob Shin
  Cc: X86-ML, LKML, Yinghai Lu, Andreas Herrmann, Tejun Heo,
	Borislav Petkov, Dave Young, Chao Wang, Vivek Goyal

On 08/23/2012 03:35 PM, Jacob Shin wrote:
> 
> I looked into this a bit more, and I think what's happening is that this
> user defined memory map leaves out the region where the kernel is loaded on
> to during the boot process. The kernel and the direct mapped page tables up
> to initial max_pfn_mapped reside somwhere under 512M (KERNEL_IMAGE_SIZE),
> I guess it depends on how big your uncompressed kernel is.
> 
> And at the first attempt to set_fixmap_nocache(FIX_APIC_BASE, address) in
> arch/x86/apic/apic.c: register_lapic_address runs into badness because the
> memory region where the initial page tables live is no longer mapped
> because of the above user supplied memory map.
> 
> So I guess there is a disconnect between really early code that seems to
> rely on the boot loader as to where in physical memory it resides and its
> initial page tables live, and the later memory initialization code where
> it looks at the E820 (and here user can interject their own memory map
> using the command line arguments)
> 
> Not really sure how to handle this case .. any advice?
> 

We have two options: one scream really loud and die, assuming the
bootloader actually loaded us on top of non-memory and we're going to
die anyway; or scream really loud but try to continue (i.e. override the
memory type).  I would suggest doing the latter in the near term, and
shift to the former a bit further down the line.

	-hpa

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2012-08-23 23:54 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-14 22:39 [PATCH V3 0/4] x86: Create direct mappings for E820_RAM only Jacob Shin
2012-08-14 22:39 ` [PATCH 1/4] x86: Move enabling of PSE and PGE out of init_memory_mapping Jacob Shin
2012-08-14 22:39 ` [PATCH 2/4] x86: find_early_table_space based on memory ranges that are being mapped Jacob Shin
2012-08-14 22:39 ` [PATCH 3/4] x86: Only direct map addresses that are marked as E820_RAM Jacob Shin
2012-08-22 23:30   ` H. Peter Anvin
2012-08-23 14:50     ` Jacob Shin
2012-08-23 15:39       ` H. Peter Anvin
2012-08-23 20:12         ` Konrad Rzeszutek Wilk
2012-08-23 21:23           ` H. Peter Anvin
2012-08-23 22:35         ` Jacob Shin
2012-08-23 23:46           ` Yinghai Lu
2012-08-23 23:53           ` H. Peter Anvin
2012-08-14 22:39 ` [PATCH 4/4] x86: Fixup code testing if a pfn is direct mapped Jacob Shin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).