All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/1] x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping.
@ 2011-10-20 21:15 Jacob Shin
  2011-10-20 21:28 ` Andi Kleen
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Jacob Shin @ 2011-10-20 21:15 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Yinghai Lu, Andreas Herrmann, x86, linux-kernel, Jacob Shin

On systems with very large memory (1 TB in our case), BIOS may report a
reserved region or a hole in the E820 map, even above the 4 GB range. Exclude
these from the direct mapping.

Cc: stable@kernel.org   # > 2.6.32
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
---
 arch/x86/kernel/setup.c |   17 +++++++++++++++--
 1 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index afaf384..af19a61 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -937,8 +937,21 @@ void __init setup_arch(char **cmdline_p)
 
 #ifdef CONFIG_X86_64
 	if (max_pfn > max_low_pfn) {
-		max_pfn_mapped = init_memory_mapping(1UL<<32,
-						     max_pfn<<PAGE_SHIFT);
+		int i;
+		for (i = 0; i < e820.nr_map; i++) {
+			struct e820entry *ei = &e820.map[i];
+
+			if (ei->addr + ei->size <= 1UL << 32)
+				continue;
+
+			if (ei->type == E820_RESERVED)
+				continue;
+
+			max_pfn_mapped = init_memory_mapping(
+				ei->addr < 1UL << 32 ? 1UL << 32 : ei->addr,
+				ei->addr + ei->size);
+		}
+
 		/* can we preseve max_low_pfn ?*/
 		max_low_pfn = max_pfn;
 	}
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping.
  2011-10-20 21:15 [PATCH 1/1] x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping Jacob Shin
@ 2011-10-20 21:28 ` Andi Kleen
  2011-10-20 21:30   ` H. Peter Anvin
  2011-10-20 22:20 ` H. Peter Anvin
  2012-10-17 19:17 ` [tip:x86/urgent] " tip-bot for Jacob Shin
  2 siblings, 1 reply; 18+ messages in thread
From: Andi Kleen @ 2011-10-20 21:28 UTC (permalink / raw)
  To: Jacob Shin
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Yinghai Lu,
	Andreas Herrmann, x86, linux-kernel

Jacob Shin <jacob.shin@amd.com> writes:

> On systems with very large memory (1 TB in our case), BIOS may report a
> reserved region or a hole in the E820 map, even above the 4 GB range. Exclude
> these from the direct mapping.

This doesn't make much sense.  Holes above 4GB are completely legal.

If you need to workaround a specific broken BIOS you would need a quirk
only matching that system, with a suitable "BIOS is broken" message.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping.
  2011-10-20 21:28 ` Andi Kleen
@ 2011-10-20 21:30   ` H. Peter Anvin
  2011-10-20 21:32     ` H. Peter Anvin
  2011-10-20 22:10     ` Jacob Shin
  0 siblings, 2 replies; 18+ messages in thread
From: H. Peter Anvin @ 2011-10-20 21:30 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Jacob Shin, Thomas Gleixner, Ingo Molnar, Yinghai Lu,
	Andreas Herrmann, x86, linux-kernel

On 10/20/2011 02:28 PM, Andi Kleen wrote:
> Jacob Shin <jacob.shin@amd.com> writes:
> 
>> On systems with very large memory (1 TB in our case), BIOS may report a
>> reserved region or a hole in the E820 map, even above the 4 GB range. Exclude
>> these from the direct mapping.
> 
> This doesn't make much sense.  Holes above 4GB are completely legal.
> 
> If you need to workaround a specific broken BIOS you would need a quirk
> only matching that system, with a suitable "BIOS is broken" message.
> 

The problem is that apparently right now we map those unconditionally
into the 1:1 map and mark them cacheable in PAT, which we *don't* for
the < 4 GiB map.

This thus makes the behavior match < 4 GiB, which is the correct
behavior; this should be made clear in the patch description.

	-hpa

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping.
  2011-10-20 21:30   ` H. Peter Anvin
@ 2011-10-20 21:32     ` H. Peter Anvin
  2011-10-20 22:10     ` Jacob Shin
  1 sibling, 0 replies; 18+ messages in thread
From: H. Peter Anvin @ 2011-10-20 21:32 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Jacob Shin, Thomas Gleixner, Ingo Molnar, Yinghai Lu,
	Andreas Herrmann, x86, linux-kernel

On 10/20/2011 02:30 PM, H. Peter Anvin wrote:
> On 10/20/2011 02:28 PM, Andi Kleen wrote:
>> Jacob Shin <jacob.shin@amd.com> writes:
>>
>>> On systems with very large memory (1 TB in our case), BIOS may report a
>>> reserved region or a hole in the E820 map, even above the 4 GB range. Exclude
>>> these from the direct mapping.
>>
>> This doesn't make much sense.  Holes above 4GB are completely legal.
>>
>> If you need to workaround a specific broken BIOS you would need a quirk
>> only matching that system, with a suitable "BIOS is broken" message.
>>
> 
> The problem is that apparently right now we map those unconditionally
> into the 1:1 map and mark them cacheable in PAT, which we *don't* for
> the < 4 GiB map.
> 
> This thus makes the behavior match < 4 GiB, which is the correct
> behavior; this should be made clear in the patch description.
> 

Specifically, it's fine for them to be mapped; it's not fine for them to
be mapped cacheable.

	-hpa


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping.
  2011-10-20 21:30   ` H. Peter Anvin
  2011-10-20 21:32     ` H. Peter Anvin
@ 2011-10-20 22:10     ` Jacob Shin
  2011-10-20 22:43       ` H. Peter Anvin
  1 sibling, 1 reply; 18+ messages in thread
From: Jacob Shin @ 2011-10-20 22:10 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andi Kleen, Thomas Gleixner, Ingo Molnar, Yinghai Lu, Herrmann3,
	Andreas, x86, linux-kernel

On Thu, 2011-10-20 at 16:30 -0500, H. Peter Anvin wrote:
> On 10/20/2011 02:28 PM, Andi Kleen wrote:
> > Jacob Shin <jacob.shin@amd.com> writes:
> > 
> >> On systems with very large memory (1 TB in our case), BIOS may report a
> >> reserved region or a hole in the E820 map, even above the 4 GB range. Exclude
> >> these from the direct mapping.
> > 
> > This doesn't make much sense.  Holes above 4GB are completely legal.
> > 
> > If you need to workaround a specific broken BIOS you would need a quirk
> > only matching that system, with a suitable "BIOS is broken" message.
> > 
> 
> The problem is that apparently right now we map those unconditionally
> into the 1:1 map and mark them cacheable in PAT, which we *don't* for
> the < 4 GiB map.
> 
> This thus makes the behavior match < 4 GiB, which is the correct
> behavior; this should be made clear in the patch description.

Will something like:

"On systems with very large memory (1 TB in our case), BIOS may report a
reserved region or a hole in the E820 map, even above the 4 GB range.
Avoid mapping them unconditionally into kernel 1:1 direct mapping as
cacheable memory, as we also already do for the MMIO hole under 4 GB."

Work?

Otherwise, does the patch look acceptable?

Thanks!

-Jacob Shin
Advanced Micro Devices, Inc.

> 
> 	-hpa
> 




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping.
  2011-10-20 21:15 [PATCH 1/1] x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping Jacob Shin
  2011-10-20 21:28 ` Andi Kleen
@ 2011-10-20 22:20 ` H. Peter Anvin
  2011-10-20 22:26   ` Jacob Shin
  2012-10-17 19:17 ` [tip:x86/urgent] " tip-bot for Jacob Shin
  2 siblings, 1 reply; 18+ messages in thread
From: H. Peter Anvin @ 2011-10-20 22:20 UTC (permalink / raw)
  To: Jacob Shin
  Cc: Thomas Gleixner, Ingo Molnar, Yinghai Lu, Andreas Herrmann, x86,
	linux-kernel

On 10/20/2011 02:15 PM, Jacob Shin wrote:
> On systems with very large memory (1 TB in our case), BIOS may report a
> reserved region or a hole in the E820 map, even above the 4 GB range. Exclude
> these from the direct mapping.

> +			if (ei->type == E820_RESERVED)
> +				continue;

This should probably be ei->type != E820_RAM or something similar.  I
haven't looked yet, what does the < 4 GiB code do?

	-hpa


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping.
  2011-10-20 22:20 ` H. Peter Anvin
@ 2011-10-20 22:26   ` Jacob Shin
  2011-12-14 22:42     ` H. Peter Anvin
  0 siblings, 1 reply; 18+ messages in thread
From: Jacob Shin @ 2011-10-20 22:26 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Yinghai Lu, Herrmann3, Andreas,
	x86, linux-kernel

On Thu, 2011-10-20 at 17:20 -0500, H. Peter Anvin wrote:
> On 10/20/2011 02:15 PM, Jacob Shin wrote:
> > On systems with very large memory (1 TB in our case), BIOS may report a
> > reserved region or a hole in the E820 map, even above the 4 GB range. Exclude
> > these from the direct mapping.
> 
> > +			if (ei->type == E820_RESERVED)
> > +				continue;
> 
> This should probably be ei->type != E820_RAM or something similar.  I
> haven't looked yet, what does the < 4 GiB code do?

Hm, okay, it calls e820_end_of_low_ram_pfn() which effectively is !=
E820_RAM.

I'll fix this, test, then resend.

Thanks!

-Jacob Shin
Advanced Micro Devices, Inc.

> 
> 	-hpa
> 
> 




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping.
  2011-10-20 22:10     ` Jacob Shin
@ 2011-10-20 22:43       ` H. Peter Anvin
  0 siblings, 0 replies; 18+ messages in thread
From: H. Peter Anvin @ 2011-10-20 22:43 UTC (permalink / raw)
  To: Jacob Shin
  Cc: Andi Kleen, Thomas Gleixner, Ingo Molnar, Yinghai Lu, Herrmann3,
	Andreas, x86, linux-kernel

On 10/20/2011 03:10 PM, Jacob Shin wrote:
> On Thu, 2011-10-20 at 16:30 -0500, H. Peter Anvin wrote:
>> On 10/20/2011 02:28 PM, Andi Kleen wrote:
>>> Jacob Shin <jacob.shin@amd.com> writes:
>>>
>>>> On systems with very large memory (1 TB in our case), BIOS may report a
>>>> reserved region or a hole in the E820 map, even above the 4 GB range. Exclude
>>>> these from the direct mapping.
>>>
>>> This doesn't make much sense.  Holes above 4GB are completely legal.
>>>
>>> If you need to workaround a specific broken BIOS you would need a quirk
>>> only matching that system, with a suitable "BIOS is broken" message.
>>>
>>
>> The problem is that apparently right now we map those unconditionally
>> into the 1:1 map and mark them cacheable in PAT, which we *don't* for
>> the < 4 GiB map.
>>
>> This thus makes the behavior match < 4 GiB, which is the correct
>> behavior; this should be made clear in the patch description.
> 
> Will something like:
> 
> "On systems with very large memory (1 TB in our case), BIOS may report a
> reserved region or a hole in the E820 map, even above the 4 GB range.
> Avoid mapping them unconditionally into kernel 1:1 direct mapping as
> cacheable memory, as we also already do for the MMIO hole under 4 GB."
> 
> Work?
> 
> Otherwise, does the patch look acceptable?
> 

Drop the first half, and stop talking about "the MMIO hole" or anything
else with a definite article.

We're either doing this correctly for all holes or we have a bug.  If we
have a bug we should fix it in a general way, and I'm not convinced that
your patch is general enough.

I'm very bandwidth-constrained between kernel.org remediation and about
to leave for kernel summit, so I'm not sure how much detail I can look
into it right now.

	-hpa


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping.
  2011-10-20 22:26   ` Jacob Shin
@ 2011-12-14 22:42     ` H. Peter Anvin
  2011-12-14 23:14       ` Jacob Shin
  0 siblings, 1 reply; 18+ messages in thread
From: H. Peter Anvin @ 2011-12-14 22:42 UTC (permalink / raw)
  To: Jacob Shin
  Cc: Thomas Gleixner, Ingo Molnar, Yinghai Lu, Herrmann3, Andreas,
	x86, linux-kernel

On 10/20/2011 03:26 PM, Jacob Shin wrote:
> On Thu, 2011-10-20 at 17:20 -0500, H. Peter Anvin wrote:
>> On 10/20/2011 02:15 PM, Jacob Shin wrote:
>>> On systems with very large memory (1 TB in our case), BIOS may report a
>>> reserved region or a hole in the E820 map, even above the 4 GB range. Exclude
>>> these from the direct mapping.
>>
>>> +			if (ei->type == E820_RESERVED)
>>> +				continue;
>>
>> This should probably be ei->type != E820_RAM or something similar.  I
>> haven't looked yet, what does the < 4 GiB code do?
> 
> Hm, okay, it calls e820_end_of_low_ram_pfn() which effectively is !=
> E820_RAM.
> 
> I'll fix this, test, then resend.
> 

I never got any kind of updated patch, did I?

	-hpa


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping.
  2011-12-14 22:42     ` H. Peter Anvin
@ 2011-12-14 23:14       ` Jacob Shin
  2011-12-16 16:20         ` Jacob Shin
  0 siblings, 1 reply; 18+ messages in thread
From: Jacob Shin @ 2011-12-14 23:14 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Yinghai Lu, Herrmann3, Andreas,
	x86, linux-kernel

On Wed, Dec 14, 2011 at 02:42:50PM -0800, H. Peter Anvin wrote:
> On 10/20/2011 03:26 PM, Jacob Shin wrote:
> > On Thu, 2011-10-20 at 17:20 -0500, H. Peter Anvin wrote:
> >> On 10/20/2011 02:15 PM, Jacob Shin wrote:
> >>> On systems with very large memory (1 TB in our case), BIOS may report a
> >>> reserved region or a hole in the E820 map, even above the 4 GB range. Exclude
> >>> these from the direct mapping.
> >>
> >>> +			if (ei->type == E820_RESERVED)
> >>> +				continue;
> >>
> >> This should probably be ei->type != E820_RAM or something similar.  I
> >> haven't looked yet, what does the < 4 GiB code do?
> > 
> > Hm, okay, it calls e820_end_of_low_ram_pfn() which effectively is !=
> > E820_RAM.
> > 
> > I'll fix this, test, then resend.
> > 
> 
> I never got any kind of updated patch, did I?

No, I never sent one out, because it would have still only covered > 4GB, and
in later emails, you said that you wanted a general one that covered all x86.

I'll give it another shot at the generic patch, making a special case for the
< 1MB ISA region.

Thanks!

> 
> 	-hpa
> 
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping.
  2011-12-14 23:14       ` Jacob Shin
@ 2011-12-16 16:20         ` Jacob Shin
  2011-12-16 17:42           ` Yinghai Lu
  0 siblings, 1 reply; 18+ messages in thread
From: Jacob Shin @ 2011-12-16 16:20 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Yinghai Lu, Herrmann3, Andreas,
	x86, linux-kernel, Joerg Roedel

On Wed, Dec 14, 2011 at 05:14:25PM -0600, Jacob Shin wrote:
> On Wed, Dec 14, 2011 at 02:42:50PM -0800, H. Peter Anvin wrote:
> > On 10/20/2011 03:26 PM, Jacob Shin wrote:
> > > On Thu, 2011-10-20 at 17:20 -0500, H. Peter Anvin wrote:
> > >> On 10/20/2011 02:15 PM, Jacob Shin wrote:
> > >>> On systems with very large memory (1 TB in our case), BIOS may report a
> > >>> reserved region or a hole in the E820 map, even above the 4 GB range. Exclude
> > >>> these from the direct mapping.
> > >>
> > >>> +			if (ei->type == E820_RESERVED)
> > >>> +				continue;
> > >>
> > >> This should probably be ei->type != E820_RAM or something similar.  I
> > >> haven't looked yet, what does the < 4 GiB code do?
> > > 
> > > Hm, okay, it calls e820_end_of_low_ram_pfn() which effectively is !=
> > > E820_RAM.
> > > 
> > > I'll fix this, test, then resend.
> > > 
> > 
> > I never got any kind of updated patch, did I?
> 
> No, I never sent one out, because it would have still only covered > 4GB, and
> in later emails, you said that you wanted a general one that covered all x86.
> 
> I'll give it another shot at the generic patch, making a special case for the
> < 1MB ISA region.
> 

Here is the new patch, thanks!

>From dad99fe54eb26d4022a48f1f9b88c21f77809282 Mon Sep 17 00:00:00 2001
From: Jacob Shin <jacob.shin@amd.com>
Date: Thu, 15 Dec 2011 10:56:14 -0500
Subject: [PATCH] x86: Only include address ranges marked as E820_RAM in kernel direct mapping

Currently, 0 ~ max_low_pfn is first mapped, then 4GB ~ max_pfn is
mapped. On some systems that have large memory holes that occur
within those two regions, we end up with PATs that mark pages that
are not backed by actual DRAM -- as cacheable.

This patch first maps 0 ~ 1MB ISA region, then iterates over the
E820 to map useable E820_RAM ranges.

Cc: stable@kernel.org   # > 2.6.32
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Reviewed-by: Andreas Herrmann <Andreas.Herrmann3@amd.com>
---
 arch/x86/kernel/setup.c |   29 ++++++++++++++++++++++++++---
 1 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index cf0ef98..eae6b41 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -691,6 +691,8 @@ early_param("reservelow", parse_reservelow);
 
 void __init setup_arch(char **cmdline_p)
 {
+	int i;
+
 #ifdef CONFIG_X86_32
 	memcpy(&boot_cpu_data, &new_cpu_data, sizeof(new_cpu_data));
 	visws_early_detect();
@@ -932,13 +934,34 @@ void __init setup_arch(char **cmdline_p)
 	init_gbpages();
 
 	/* max_pfn_mapped is updated here */
-	max_low_pfn_mapped = init_memory_mapping(0, max_low_pfn<<PAGE_SHIFT);
+	max_low_pfn_mapped = init_memory_mapping(0, 0x100000);
 	max_pfn_mapped = max_low_pfn_mapped;
 
+	for (i = 0; i < e820.nr_map; i++) {
+		struct e820entry *ei = &e820.map[i];
+		u64 start = ei->addr;
+		u64 end = ei->addr + ei->size;
+
+		if (ei->type != E820_RAM)
+			continue;
+
+		if (start < 0x100000)
+			continue;
+#ifdef CONFIG_X86_32
+		if ((start >> PAGE_SHIFT) >= max_low_pfn)
+			continue;
+
+		if ((end >> PAGE_SHIFT) > max_low_pfn)
+			end = max_low_pfn << PAGE_SHIFT;
+#endif
+		max_pfn_mapped = init_memory_mapping(start, end);
+
+		if ((end >> PAGE_SHIFT) == max_low_pfn)
+			max_low_pfn_mapped = max_pfn_mapped;
+	}
+
 #ifdef CONFIG_X86_64
 	if (max_pfn > max_low_pfn) {
-		max_pfn_mapped = init_memory_mapping(1UL<<32,
-						     max_pfn<<PAGE_SHIFT);
 		/* can we preseve max_low_pfn ?*/
 		max_low_pfn = max_pfn;
 	}
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping.
  2011-12-16 16:20         ` Jacob Shin
@ 2011-12-16 17:42           ` Yinghai Lu
  2011-12-16 17:54             ` H. Peter Anvin
  0 siblings, 1 reply; 18+ messages in thread
From: Yinghai Lu @ 2011-12-16 17:42 UTC (permalink / raw)
  To: Jacob Shin
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Herrmann3, Andreas,
	x86, linux-kernel, Joerg Roedel

On 12/16/2011 08:20 AM, Jacob Shin wrote:

> On Wed, Dec 14, 2011 at 05:14:25PM -0600, Jacob Shin wrote:
>> On Wed, Dec 14, 2011 at 02:42:50PM -0800, H. Peter Anvin wrote:
>>> On 10/20/2011 03:26 PM, Jacob Shin wrote:
>>>> On Thu, 2011-10-20 at 17:20 -0500, H. Peter Anvin wrote:
>>>>> On 10/20/2011 02:15 PM, Jacob Shin wrote:
>>>>>> On systems with very large memory (1 TB in our case), BIOS may report a
>>>>>> reserved region or a hole in the E820 map, even above the 4 GB range. Exclude
>>>>>> these from the direct mapping.
>>>>>
>>>>>> +			if (ei->type == E820_RESERVED)
>>>>>> +				continue;
>>>>>
>>>>> This should probably be ei->type != E820_RAM or something similar.  I
>>>>> haven't looked yet, what does the < 4 GiB code do?
>>>>
>>>> Hm, okay, it calls e820_end_of_low_ram_pfn() which effectively is !=
>>>> E820_RAM.
>>>>
>>>> I'll fix this, test, then resend.
>>>>
>>>
>>> I never got any kind of updated patch, did I?
>>
>> No, I never sent one out, because it would have still only covered > 4GB, and
>> in later emails, you said that you wanted a general one that covered all x86.
>>
>> I'll give it another shot at the generic patch, making a special case for the
>> < 1MB ISA region.
>>
> 
> Here is the new patch, thanks!
> 
> From dad99fe54eb26d4022a48f1f9b88c21f77809282 Mon Sep 17 00:00:00 2001
> From: Jacob Shin <jacob.shin@amd.com>
> Date: Thu, 15 Dec 2011 10:56:14 -0500
> Subject: [PATCH] x86: Only include address ranges marked as E820_RAM in kernel direct mapping
> 
> Currently, 0 ~ max_low_pfn is first mapped, then 4GB ~ max_pfn is
> mapped. On some systems that have large memory holes that occur
> within those two regions, we end up with PATs that mark pages that
> are not backed by actual DRAM -- as cacheable.
> 
> This patch first maps 0 ~ 1MB ISA region, then iterates over the
> E820 to map useable E820_RAM ranges.
> 
> Cc: stable@kernel.org   # > 2.6.32
> Signed-off-by: Jacob Shin <jacob.shin@amd.com>
> Reviewed-by: Andreas Herrmann <Andreas.Herrmann3@amd.com>
> ---
>  arch/x86/kernel/setup.c |   29 ++++++++++++++++++++++++++---
>  1 files changed, 26 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index cf0ef98..eae6b41 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -691,6 +691,8 @@ early_param("reservelow", parse_reservelow);
>  
>  void __init setup_arch(char **cmdline_p)
>  {
> +	int i;
> +
>  #ifdef CONFIG_X86_32
>  	memcpy(&boot_cpu_data, &new_cpu_data, sizeof(new_cpu_data));
>  	visws_early_detect();
> @@ -932,13 +934,34 @@ void __init setup_arch(char **cmdline_p)
>  	init_gbpages();
>  
>  	/* max_pfn_mapped is updated here */
> -	max_low_pfn_mapped = init_memory_mapping(0, max_low_pfn<<PAGE_SHIFT);
> +	max_low_pfn_mapped = init_memory_mapping(0, 0x100000);
>  	max_pfn_mapped = max_low_pfn_mapped;
>  
> +	for (i = 0; i < e820.nr_map; i++) {
> +		struct e820entry *ei = &e820.map[i];
> +		u64 start = ei->addr;
> +		u64 end = ei->addr + ei->size;
> +
> +		if (ei->type != E820_RAM)
> +			continue;
> +
> +		if (start < 0x100000)
> +			continue;
> +#ifdef CONFIG_X86_32
> +		if ((start >> PAGE_SHIFT) >= max_low_pfn)
> +			continue;
> +
> +		if ((end >> PAGE_SHIFT) > max_low_pfn)
> +			end = max_low_pfn << PAGE_SHIFT;
> +#endif
> +		max_pfn_mapped = init_memory_mapping(start, end);
> +
> +		if ((end >> PAGE_SHIFT) == max_low_pfn)
> +			max_low_pfn_mapped = max_pfn_mapped;
> +	}
> +
>  #ifdef CONFIG_X86_64
>  	if (max_pfn > max_low_pfn) {
> -		max_pfn_mapped = init_memory_mapping(1UL<<32,
> -						     max_pfn<<PAGE_SHIFT);
>  		/* can we preseve max_low_pfn ?*/
>  		max_low_pfn = max_pfn;
>  	}


no, you change the meaning max_low_pfn_mapped and max_pfn_mapped for x86_64 at least.

before your patch:
max_low_pfn_mapped is the mapped pfn beblow 4g.
max_pfn_mapped: is mapped pfn.

after your patch, those two variables does not mean the memory [0, max_low_pfn_mapped) and [4g<<12, max_pfn_mapped)
are really mapped.

so in arch/x86/platform/efi/efi.c

                if (end_pfn <= max_low_pfn_mapped
                    || (end_pfn > (1UL << (32 - PAGE_SHIFT))
                        && end_pfn <= max_pfn_mapped))
                        va = __va(md->phys_addr);
                else
                        va = efi_ioremap(md->phys_addr, size, md->type);


and others will have problem.

to solve your problem:
1. unmap the HT range ?
2. or introduce mapped_pfn_range array?

Thanks

Yinghai Lu



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping.
  2011-12-16 17:42           ` Yinghai Lu
@ 2011-12-16 17:54             ` H. Peter Anvin
  2011-12-16 18:29               ` Yinghai Lu
  0 siblings, 1 reply; 18+ messages in thread
From: H. Peter Anvin @ 2011-12-16 17:54 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Jacob Shin, Thomas Gleixner, Ingo Molnar, Herrmann3, Andreas,
	x86, linux-kernel, Joerg Roedel

On 12/16/2011 09:42 AM, Yinghai Lu wrote:
>
> no, you change the meaning max_low_pfn_mapped and max_pfn_mapped for x86_64 at least.
>
> before your patch:
> max_low_pfn_mapped is the mapped pfn beblow 4g.
> max_pfn_mapped: is mapped pfn.
>
> after your patch, those two variables does not mean the memory [0, max_low_pfn_mapped) and [4g<<12, max_pfn_mapped)
> are really mapped.
>

And that's exactly the problem.  It is BROKEN -- as in fundamentally 
dangerous -- for these mappings to exist.  It is because the model is 
too inflexible.

> so in arch/x86/platform/efi/efi.c
>
>                  if (end_pfn<= max_low_pfn_mapped
>                      || (end_pfn>  (1UL<<  (32 - PAGE_SHIFT))
>                          &&  end_pfn<= max_pfn_mapped))
>                          va = __va(md->phys_addr);
>                  else
>                          va = efi_ioremap(md->phys_addr, size, md->type);
>
>
> and others will have problem.
>
> to solve your problem:
> 1. unmap the HT range ?
> 2. or introduce mapped_pfn_range array?

1 is fundamentally a braindead hack that solves one case without solving 
the overall problem.

For 2 - why can't we simply make the invariant that E820_RAM is mapped 
and nothing else, with the sole exceptions being the 1 MiB (fixed MTRR)?

For things like efi.c we should make sure to have interfaces instead of 
open-code this kind of stuff.

	-hpa


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping.
  2011-12-16 17:54             ` H. Peter Anvin
@ 2011-12-16 18:29               ` Yinghai Lu
  2011-12-16 18:32                 ` H. Peter Anvin
  0 siblings, 1 reply; 18+ messages in thread
From: Yinghai Lu @ 2011-12-16 18:29 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jacob Shin, Thomas Gleixner, Ingo Molnar, Herrmann3, Andreas,
	x86, linux-kernel, Joerg Roedel

On Fri, Dec 16, 2011 at 9:54 AM, H. Peter Anvin <hpa@zytor.com> wrote:

>
> For 2 - why can't we simply make the invariant that E820_RAM is mapped and
> nothing else, with the sole exceptions being the 1 MiB (fixed MTRR)?

Yes, we could do that. but need to track mapped area in simple way.
like later memory hotadd or remove to update those tracking.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping.
  2011-12-16 18:29               ` Yinghai Lu
@ 2011-12-16 18:32                 ` H. Peter Anvin
  2011-12-16 20:59                   ` Yinghai Lu
  0 siblings, 1 reply; 18+ messages in thread
From: H. Peter Anvin @ 2011-12-16 18:32 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Jacob Shin, Thomas Gleixner, Ingo Molnar, Herrmann3, Andreas,
	x86, linux-kernel, Joerg Roedel

On 12/16/2011 10:29 AM, Yinghai Lu wrote:
> On Fri, Dec 16, 2011 at 9:54 AM, H. Peter Anvin<hpa@zytor.com>  wrote:
>
>>
>> For 2 - why can't we simply make the invariant that E820_RAM is mapped and
>> nothing else, with the sole exceptions being the 1 MiB (fixed MTRR)?
>
> Yes, we could do that. but need to track mapped area in simple way.
> like later memory hotadd or remove to update those tracking.
>

Well, there are two options for memory hotplug: either we always leave 
address space that can be used for memory hotplug mapped at all times, 
or we need to track it anyway.  Either way we need to know where there 
regions are.  Am I correct that right now we always map memory 
hotpluggable regions, being below the top pfn?

	-hpa


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping.
  2011-12-16 18:32                 ` H. Peter Anvin
@ 2011-12-16 20:59                   ` Yinghai Lu
  2011-12-17  0:57                     ` H. Peter Anvin
  0 siblings, 1 reply; 18+ messages in thread
From: Yinghai Lu @ 2011-12-16 20:59 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jacob Shin, Thomas Gleixner, Ingo Molnar, Herrmann3, Andreas,
	x86, linux-kernel, Joerg Roedel

On Fri, Dec 16, 2011 at 10:32 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>
> Well, there are two options for memory hotplug: either we always leave
> address space that can be used for memory hotplug mapped at all times, or we
> need to track it anyway.  Either way we need to know where there regions
> are.  Am I correct that right now we always map memory hotpluggable regions,
> being below the top pfn?

in arch/x86/mm/init_64.c::arch_add_memory()
it will call init_memory_mapping to map new added memory.
and will update max_pfn_mapped, max-pfn, max_low_pfn.

Thanks

Yinghai Lu

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping.
  2011-12-16 20:59                   ` Yinghai Lu
@ 2011-12-17  0:57                     ` H. Peter Anvin
  0 siblings, 0 replies; 18+ messages in thread
From: H. Peter Anvin @ 2011-12-17  0:57 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Jacob Shin, Thomas Gleixner, Ingo Molnar, Herrmann3, Andreas,
	x86, linux-kernel, Joerg Roedel

On 12/16/2011 12:59 PM, Yinghai Lu wrote:
> On Fri, Dec 16, 2011 at 10:32 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>>
>> Well, there are two options for memory hotplug: either we always leave
>> address space that can be used for memory hotplug mapped at all times, or we
>> need to track it anyway.  Either way we need to know where there regions
>> are.  Am I correct that right now we always map memory hotpluggable regions,
>> being below the top pfn?
> 
> in arch/x86/mm/init_64.c::arch_add_memory()
> it will call init_memory_mapping to map new added memory.
> and will update max_pfn_mapped, max-pfn, max_low_pfn.
> 

OK, so that would seem to match the invariant that all the memory blocks
which are RAM have mappings, and the ones that don't don't have them, or
is that violated elsewhere?  Either way, the same notion applies -- we
should either leverage an existing range map or have a new one, but
there is nothing magic about holes above or below the 4 GiB boundary;
the only "magic" is for 32 bits the HIGHMEM boundary and < 1 MiB
(because of the fixed MTRRs).

	-hpa


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [tip:x86/urgent] x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping .
  2011-10-20 21:15 [PATCH 1/1] x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping Jacob Shin
  2011-10-20 21:28 ` Andi Kleen
  2011-10-20 22:20 ` H. Peter Anvin
@ 2012-10-17 19:17 ` tip-bot for Jacob Shin
  2 siblings, 0 replies; 18+ messages in thread
From: tip-bot for Jacob Shin @ 2012-10-17 19:17 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, jacob.shin, tglx, hpa

Commit-ID:  1bbbbe779aabe1f0768c2bf8f8c0a5583679b54a
Gitweb:     http://git.kernel.org/tip/1bbbbe779aabe1f0768c2bf8f8c0a5583679b54a
Author:     Jacob Shin <jacob.shin@amd.com>
AuthorDate: Thu, 20 Oct 2011 16:15:26 -0500
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Wed, 17 Oct 2012 10:59:39 -0700

x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping.

On systems with very large memory (1 TB in our case), BIOS may report a
reserved region or a hole in the E820 map, even above the 4 GB range. Exclude
these from the direct mapping.

[ hpa: this should be done not just for > 4 GB but for everything above the legacy
  region (1 MB), at the very least.  That, however, turns out to require significant
  restructuring.  That work is well underway, but is not suitable for rc/stable. ]

Cc: stable@kernel.org   # > 2.6.32
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/1319145326-13902-1-git-send-email-jacob.shin@amd.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/kernel/setup.c |   17 +++++++++++++++--
 1 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index f4b9b80..198e774 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -919,8 +919,21 @@ void __init setup_arch(char **cmdline_p)
 
 #ifdef CONFIG_X86_64
 	if (max_pfn > max_low_pfn) {
-		max_pfn_mapped = init_memory_mapping(1UL<<32,
-						     max_pfn<<PAGE_SHIFT);
+		int i;
+		for (i = 0; i < e820.nr_map; i++) {
+			struct e820entry *ei = &e820.map[i];
+
+			if (ei->addr + ei->size <= 1UL << 32)
+				continue;
+
+			if (ei->type == E820_RESERVED)
+				continue;
+
+			max_pfn_mapped = init_memory_mapping(
+				ei->addr < 1UL << 32 ? 1UL << 32 : ei->addr,
+				ei->addr + ei->size);
+		}
+
 		/* can we preseve max_low_pfn ?*/
 		max_low_pfn = max_pfn;
 	}

^ permalink raw reply related	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2012-10-17 19:17 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-10-20 21:15 [PATCH 1/1] x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping Jacob Shin
2011-10-20 21:28 ` Andi Kleen
2011-10-20 21:30   ` H. Peter Anvin
2011-10-20 21:32     ` H. Peter Anvin
2011-10-20 22:10     ` Jacob Shin
2011-10-20 22:43       ` H. Peter Anvin
2011-10-20 22:20 ` H. Peter Anvin
2011-10-20 22:26   ` Jacob Shin
2011-12-14 22:42     ` H. Peter Anvin
2011-12-14 23:14       ` Jacob Shin
2011-12-16 16:20         ` Jacob Shin
2011-12-16 17:42           ` Yinghai Lu
2011-12-16 17:54             ` H. Peter Anvin
2011-12-16 18:29               ` Yinghai Lu
2011-12-16 18:32                 ` H. Peter Anvin
2011-12-16 20:59                   ` Yinghai Lu
2011-12-17  0:57                     ` H. Peter Anvin
2012-10-17 19:17 ` [tip:x86/urgent] " tip-bot for Jacob Shin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.