x86/cpu/amd: Remove dead code for TSEG region remapping

Message ID 20201127171324.1846019-1-nivedita@alum.mit.edu
State Accepted
Commit 262bd5724afdefd4c48a260d6100e78cc43ee06b

Commit Message

Arvind Sankar Nov. 27, 2020, 5:13 p.m. UTC
Commit
  26bfa5f89486 ("x86, amd: Cleanup init_amd")
moved the code that remaps the TSEG region using 4k pages from
init_amd() to bsp_init_amd().

However, bsp_init_amd() is executed well before the direct mapping is
actually created:

  setup_arch()
    -> early_cpu_init()
      -> early_identify_cpu()
        -> this_cpu->c_bsp_init()
	  -> bsp_init_amd()
    ...
    -> init_mem_mapping()

So the change effectively disabled the 4k remapping, because
pfn_range_is_mapped() is always false at this point.

It has been over six years since the commit, and no-one seems to have
noticed this, so just remove the code. The original code was also
incomplete, since it doesn't check how large the TSEG address range
actually is, so it might remap only part of it in any case.

Hygon has copied the incorrect version, so the code has never run on it
since the cpu support was added two years ago. Remove it from there as
well.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
---
 arch/x86/kernel/cpu/amd.c   | 21 ---------------------
 arch/x86/kernel/cpu/hygon.c | 20 --------------------
 2 files changed, 41 deletions(-)
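
The ordering problem described in the commit message can be modelled in user space. What follows is a minimal sketch, not kernel code: `mapped_ranges`, `model_init_mem_mapping()`, `model_pfn_range_is_mapped()` and `model_bsp_init_would_split()` are hypothetical stand-ins for the kernel's direct-map bookkeeping, and the containment test is an assumption about how the lookup behaves.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12
#define MAX_RANGES 8

/* Hypothetical stand-in for the kernel's table of direct-mapped PFN ranges. */
struct pfn_range {
	uint64_t start;	/* first PFN in the range */
	uint64_t end;	/* one past the last PFN */
};

static struct pfn_range mapped_ranges[MAX_RANGES];
static size_t nr_mapped_ranges;	/* stays 0 until the direct map is "created" */

/* Models pfn_range_is_mapped(): true only if [start, end) lies inside one range. */
static bool model_pfn_range_is_mapped(uint64_t start, uint64_t end)
{
	for (size_t i = 0; i < nr_mapped_ranges; i++) {
		if (start >= mapped_ranges[i].start && end <= mapped_ranges[i].end)
			return true;
	}
	return false;
}

/* Models init_mem_mapping(): record one direct-mapped physical range. */
static void model_init_mem_mapping(uint64_t phys_start, uint64_t phys_end)
{
	if (nr_mapped_ranges == MAX_RANGES)
		return;
	mapped_ranges[nr_mapped_ranges].start = phys_start >> MODEL_PAGE_SHIFT;
	mapped_ranges[nr_mapped_ranges].end = phys_end >> MODEL_PAGE_SHIFT;
	nr_mapped_ranges++;
}

/* Models the removed check: would the 4k split be attempted for this TSEG base? */
static bool model_bsp_init_would_split(uint64_t tseg)
{
	uint64_t pfn = tseg >> MODEL_PAGE_SHIFT;

	return model_pfn_range_is_mapped(pfn, pfn + 1);
}
```

Calling `model_bsp_init_would_split()` before any call to `model_init_mem_mapping()` returns false for every address, which mirrors why the check in bsp_init_amd() could never succeed at that point in boot.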

Comments

Borislav Petkov Nov. 27, 2020, 5:27 p.m. UTC | #1
On Fri, Nov 27, 2020 at 12:13:24PM -0500, Arvind Sankar wrote:
> Commit
>   26bfa5f89486 ("x86, amd: Cleanup init_amd")
> moved the code that remaps the TSEG region using 4k pages from
> init_amd() to bsp_init_amd().
> 
> However, bsp_init_amd() is executed well before the direct mapping is
> actually created:
> 
>   setup_arch()
>     -> early_cpu_init()
>       -> early_identify_cpu()
>         -> this_cpu->c_bsp_init()
> 	  -> bsp_init_amd()
>     ...
>     -> init_mem_mapping()
> 
> So the change effectively disabled the 4k remapping, because
> pfn_range_is_mapped() is always false at this point.
> 
> It has been over six years since the commit, and no-one seems to have
> noticed this, so just remove the code. The original code was also
> incomplete, since it doesn't check how large the TSEG address range
> actually is, so it might remap only part of it in any case.

Yah, and the patch which added this:

6c62aa4a3c12 ("x86: make amd.c have 64bit support code")

does not say what for (I'm not surprised, frankly).

So if AMD folks on Cc don't have any need for actually fixing this
properly, yap, we can zap it.

Thx.
Tom Lendacky Dec. 2, 2020, 5:58 p.m. UTC | #2
On 11/27/20 11:27 AM, Borislav Petkov wrote:
> On Fri, Nov 27, 2020 at 12:13:24PM -0500, Arvind Sankar wrote:
>> Commit
>>    26bfa5f89486 ("x86, amd: Cleanup init_amd")
>> moved the code that remaps the TSEG region using 4k pages from
>> init_amd() to bsp_init_amd().
>>
>> However, bsp_init_amd() is executed well before the direct mapping is
>> actually created:
>>
>>    setup_arch()
>>      -> early_cpu_init()
>>        -> early_identify_cpu()
>>          -> this_cpu->c_bsp_init()
>> 	  -> bsp_init_amd()
>>      ...
>>      -> init_mem_mapping()
>>
>> So the change effectively disabled the 4k remapping, because
>> pfn_range_is_mapped() is always false at this point.
>>
>> It has been over six years since the commit, and no-one seems to have
>> noticed this, so just remove the code. The original code was also
>> incomplete, since it doesn't check how large the TSEG address range
>> actually is, so it might remap only part of it in any case.
> 
> Yah, and the patch which added this:
> 
> 6c62aa4a3c12 ("x86: make amd.c have 64bit support code")
> 
> does not say what for (I'm not surprised, frankly).
> 
> So if AMD folks on Cc don't have any need for actually fixing this
> properly, yap, we can zap it.

I believe this is geared towards performance. If the TSEG base address is 
not 2MB aligned, then hardware has to break down a 2MB TLB entry if the OS 
references the memory within the 2MB page that is before the TSEG base 
address. This can occur whenever the 2MB TLB entry is re-installed because 
of TLB flushes, etc.

I would hope that newer BIOSes are 2MB aligning the TSEG base address, but 
if not, then this can help.

So moving it back wouldn't be a bad thing. It should probably only do the 
set_memory_4k() if the TSEG base address is not 2MB aligned, which I think 
is covered by the pfn_range_is_mapped() call?

Thanks,
Tom

> 
> Thx.
>
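
The alignment condition Tom describes can be written down as a small check. This is only an illustrative sketch: `SZ_2M_MODEL`, `tseg_is_2m_aligned()` and `bytes_below_tseg_in_2m_page()` are hypothetical names, not kernel helpers.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_2M_MODEL (2ULL * 1024 * 1024)

/* True if the TSEG base sits exactly on a 2MB boundary. */
static bool tseg_is_2m_aligned(uint64_t tseg_base)
{
	return (tseg_base & (SZ_2M_MODEL - 1)) == 0;
}

/*
 * Number of bytes that share a 2MB page with the TSEG base and lie below it.
 * This is the memory whose 2MB TLB entry the hardware would have to break
 * down; it is 0 when the base is 2MB aligned.
 */
static uint64_t bytes_below_tseg_in_2m_page(uint64_t tseg_base)
{
	return tseg_base & (SZ_2M_MODEL - 1);
}
```

A non-zero result from `bytes_below_tseg_in_2m_page()` corresponds to the case where a 2MB page straddles the TSEG base.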
Borislav Petkov Dec. 2, 2020, 6:10 p.m. UTC | #3
On Wed, Dec 02, 2020 at 11:58:15AM -0600, Tom Lendacky wrote:
> I believe this is geared towards performance. If the TSEG base address is
> not 2MB aligned, then hardware has to break down a 2MB TLB entry if the OS
> references the memory within the 2MB page that is before the TSEG base
> address. This can occur whenever the 2MB TLB entry is re-installed because
> of TLB flushes, etc.

And if this gets reinstated properly, then that explanation belongs over
it because nothing else explains what that thing did. So thanks for
digging it out.
Arvind Sankar Dec. 2, 2020, 10:32 p.m. UTC | #4
On Wed, Dec 02, 2020 at 11:58:15AM -0600, Tom Lendacky wrote:
> On 11/27/20 11:27 AM, Borislav Petkov wrote:
> > On Fri, Nov 27, 2020 at 12:13:24PM -0500, Arvind Sankar wrote:
> >> Commit
> >>    26bfa5f89486 ("x86, amd: Cleanup init_amd")
> >> moved the code that remaps the TSEG region using 4k pages from
> >> init_amd() to bsp_init_amd().
> >>
> >> However, bsp_init_amd() is executed well before the direct mapping is
> >> actually created:
> >>
> >>    setup_arch()
> >>      -> early_cpu_init()
> >>        -> early_identify_cpu()
> >>          -> this_cpu->c_bsp_init()
> >> 	  -> bsp_init_amd()
> >>      ...
> >>      -> init_mem_mapping()
> >>
> >> So the change effectively disabled the 4k remapping, because
> >> pfn_range_is_mapped() is always false at this point.
> >>
> >> It has been over six years since the commit, and no-one seems to have
> >> noticed this, so just remove the code. The original code was also
> >> incomplete, since it doesn't check how large the TSEG address range
> >> actually is, so it might remap only part of it in any case.
> > 
> > Yah, and the patch which added this:
> > 
> > 6c62aa4a3c12 ("x86: make amd.c have 64bit support code")
> > 
> > does not say what for (I'm not surprised, frankly).
> > 
> > So if AMD folks on Cc don't have any need for actually fixing this
> > properly, yap, we can zap it.
> 
> I believe this is geared towards performance. If the TSEG base address is 
> not 2MB aligned, then hardware has to break down a 2MB TLB entry if the OS 
> references the memory within the 2MB page that is before the TSEG base 
> address. This can occur whenever the 2MB TLB entry is re-installed because 
> of TLB flushes, etc.
> 
> I would hope that newer BIOSes are 2MB aligning the TSEG base address, but 
> if not, then this can help.
> 
> So moving it back wouldn't be a bad thing. It should probably only do the 
> set_memory_4k() if the TSEG base address is not 2MB aligned, which I think 
> is covered by the pfn_range_is_mapped() call?
> 

The pfn_range_is_mapped() call just checks whether it is mapped at all
in the direct mapping. Is the TSEG range supposed to be marked as
non-RAM in the E820 map? AFAICS, the only case when a direct mapping is
created for non-RAM is for the 0-1Mb real-mode range, and that will
always use 4k pages. Above that anything not marked as RAM will create
an unmapped hole in the direct map, so in this case the memory just
below the TSEG base would already use smaller pages if needed.

If it's possible that the E820 mapping says this range is RAM, then
should we also break up the direct map just after the end of the TSEG
range for the same reason?

Thanks.
Borislav Petkov Dec. 3, 2020, 8:48 a.m. UTC | #5
On Wed, Dec 02, 2020 at 05:32:32PM -0500, Arvind Sankar wrote:
> The pfn_range_is_mapped() call just checks whether it is mapped at all
> in the direct mapping. Is the TSEG range supposed to be marked as
> non-RAM in the E820 map? AFAICS, the only case when a direct mapping is
> created for non-RAM is for the 0-1Mb real-mode range, and that will
> always use 4k pages. Above that anything not marked as RAM will create
> an unmapped hole in the direct map, so in this case the memory just
> below the TSEG base would already use smaller pages if needed.
> 
> If it's possible that the E820 mapping says this range is RAM, then
> should we also break up the direct map just after the end of the TSEG
> range for the same reason?

So I have a machine where TSEG is not 2M aligned and somewhere in the 1G
range:

[    1.135094] tseg: 003bf00000

It is not in the E820 map either:

[    0.019784] init_memory_mapping: [mem 0x00000000-0x000fffff]
[    0.020014] init_memory_mapping: [mem 0x3bc00000-0x3bdfffff]
[    0.020166] init_memory_mapping: [mem 0x20000000-0x3bbfffff]
[    0.020327] init_memory_mapping: [mem 0x00100000-0x1fffffff]
[    0.020677] init_memory_mapping: [mem 0x3be00000-0x3be8ffff]

That doesn't mean that there can't be some configuration where it ends
up being mapped.

So looking at what the code does, it kinda makes sense: you want the 2M
range between 0x3be00000 and 0x3c000000 to be split into 4K mappings,
*if* it is mapped.

I need to find a box where it is mapped *and* not 2M aligned, though,
for testing. Which appears kinda hard to do as all the new ones are
aligned.

The above is from a K8 box which should already be dead, as a matter of
fact.
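
The arithmetic behind the numbers above can be checked with a short user-space sketch. The range table is copied from the init_memory_mapping lines in this message; `round_down_2m()` and `addr_is_mapped()` are hypothetical helper names, and treating the printed end addresses as inclusive is an assumption based on how the log lines read.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_2M_MODEL 0x200000ULL

struct phys_range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* inclusive, as printed in the log */
};

/* Ranges copied from the init_memory_mapping lines quoted above. */
static const struct phys_range mapped[] = {
	{ 0x00000000ULL, 0x000fffffULL },
	{ 0x3bc00000ULL, 0x3bdfffffULL },
	{ 0x20000000ULL, 0x3bbfffffULL },
	{ 0x00100000ULL, 0x1fffffffULL },
	{ 0x3be00000ULL, 0x3be8ffffULL },
};

/* Start of the 2MB-aligned range that contains addr. */
static uint64_t round_down_2m(uint64_t addr)
{
	return addr & ~(SZ_2M_MODEL - 1);
}

/* True if addr falls inside one of the logged direct-map ranges. */
static bool addr_is_mapped(uint64_t addr)
{
	for (size_t i = 0; i < sizeof(mapped) / sizeof(mapped[0]); i++) {
		if (addr >= mapped[i].start && addr <= mapped[i].end)
			return true;
	}
	return false;
}
```

For the logged TSEG base 0x3bf00000, `round_down_2m()` gives 0x3be00000, so the enclosing 2M range ends at 0x3c000000. With this table, `addr_is_mapped()` is false for the TSEG base itself, while the start of the same 2M range is mapped.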
Arvind Sankar Dec. 3, 2020, 4:14 p.m. UTC | #6
On Thu, Dec 03, 2020 at 09:48:57AM +0100, Borislav Petkov wrote:
> On Wed, Dec 02, 2020 at 05:32:32PM -0500, Arvind Sankar wrote:
> > The pfn_range_is_mapped() call just checks whether it is mapped at all
> > in the direct mapping. Is the TSEG range supposed to be marked as
> > non-RAM in the E820 map? AFAICS, the only case when a direct mapping is
> > created for non-RAM is for the 0-1Mb real-mode range, and that will
> > always use 4k pages. Above that anything not marked as RAM will create
> > an unmapped hole in the direct map, so in this case the memory just
> > below the TSEG base would already use smaller pages if needed.
> > 
> > If it's possible that the E820 mapping says this range is RAM, then
> > should we also break up the direct map just after the end of the TSEG
> > range for the same reason?
> 
> So I have a machine where TSEG is not 2M aligned and somewhere in the 1G
> range:
> 
> [    1.135094] tseg: 003bf00000
> 
> It is not in the E820 map either:
> 
> [    0.019784] init_memory_mapping: [mem 0x00000000-0x000fffff]
> [    0.020014] init_memory_mapping: [mem 0x3bc00000-0x3bdfffff]
> [    0.020166] init_memory_mapping: [mem 0x20000000-0x3bbfffff]
> [    0.020327] init_memory_mapping: [mem 0x00100000-0x1fffffff]
> [    0.020677] init_memory_mapping: [mem 0x3be00000-0x3be8ffff]
> 
> That doesn't mean that there can't be some configuration where it ends
> up being mapped.
> 
> So looking at what the code does, it kinda makes sense: you want the 2M
> range between 0x3be00000 and 0x3c000000 to be split into 4K mappings,
> *if* it is mapped.
> 
> I need to find a box where it is mapped *and* not 2M aligned, though,
> for testing. Which appears kinda hard to do as all the new ones are
> aligned.

Do any of them have it mapped at all, regardless of the alignment? There
seems to be nothing else in the kernel that ever looks at the TSEG MSR,
so I would guess that it has to be non-RAM in the E820 map, otherwise
nothing would prevent the kernel from allocating and using that space.

I found the actual original commit, which does have a description of the
reasoning. It's
  8346ea17aa20 ("x86: split large page mapping for AMD TSEG")

It looks like at the time, the direct mapping didn't really look at the
E820 map in any detail, and was always set up with at least 2Mb pages,
or Gb pages if they were available, from 0 to max_pfn_mapped. So the
direct mapping would have covered even holes that weren't in the E820
map.

Commit
  66520ebc2df3 ("x86, mm: Only direct map addresses that are marked as E820_RAM")
changed the direct map setup to avoid mapping holes, because it
apparently became more serious than performance issues: this commit
mentions MCE's getting triggered because of the overmapping.
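
To make the difference between the two mapping policies concrete, here is a simplified user-space model. It is a sketch under stated assumptions: the firmware map in `fw_map` is made up, `mapped_old_policy()` and `mapped_ram_only_policy()` are hypothetical names, and the special handling of the 0-1Mb range is ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum region_type { REGION_RAM, REGION_RESERVED };

struct region {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
	enum region_type type;
};

/* Made-up firmware map with an unlisted hole at [0x3be90000, 0x40000000). */
static const struct region fw_map[] = {
	{ 0x00000000ULL, 0x0009f000ULL, REGION_RAM },
	{ 0x00100000ULL, 0x3be90000ULL, REGION_RAM },
	{ 0x40000000ULL, 0x80000000ULL, REGION_RAM },
};

#define NR_REGIONS (sizeof(fw_map) / sizeof(fw_map[0]))

/* Old behaviour as described above: map everything from 0 up to the top, holes included. */
static bool mapped_old_policy(uint64_t addr)
{
	uint64_t max_end = 0;

	for (size_t i = 0; i < NR_REGIONS; i++) {
		if (fw_map[i].end > max_end)
			max_end = fw_map[i].end;
	}
	return addr < max_end;
}

/* Behaviour after 66520ebc2df3 as described above: map only ranges marked as RAM. */
static bool mapped_ram_only_policy(uint64_t addr)
{
	for (size_t i = 0; i < NR_REGIONS; i++) {
		if (fw_map[i].type == REGION_RAM &&
		    addr >= fw_map[i].start && addr < fw_map[i].end)
			return true;
	}
	return false;
}
```

In this model, an address inside the hole (for example a TSEG base of 0x3bf00000) is covered under the old policy but not under the RAM-only policy, which is the situation in which the 4k split has nothing to act on.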

> 
> The above is from a K8 box which should already be dead, as a matter of
> fact.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
Borislav Petkov Dec. 3, 2020, 4:44 p.m. UTC | #7
On Thu, Dec 03, 2020 at 11:14:06AM -0500, Arvind Sankar wrote:
> Do any of them have it mapped at all, regardless of the alignment? There
> seems to be nothing else in the kernel that ever looks at the TSEG MSR,
> so I would guess that it has to be non-RAM in the E820 map, otherwise
> nothing would prevent the kernel from allocating and using that space.

Ha, that's a very good question. If all those BIOSes from K8 onwards
would put the TSEG in a non-RAM area and after

  66520ebc2df3 ("x86, mm: Only direct map addresses that are marked as E820_RAM")

(great investigative work, btw, thanks for that!) then we can simply say
that that splitting is not needed anymore.

Maybe Tom can ask BIOS people whether they always did that - that being
to put the TSEG into a non-RAM area. I can boot my debug patch on my
boxes here but that doesn't mean a whole lot...

Thx.

Patch

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 1f71c7616917..f8ca66f3d861 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -23,7 +23,6 @@ 
 
 #ifdef CONFIG_X86_64
 # include <asm/mmconfig.h>
-# include <asm/set_memory.h>
 #endif
 
 #include "cpu.h"
@@ -509,26 +508,6 @@  static void early_init_amd_mc(struct cpuinfo_x86 *c)
 
 static void bsp_init_amd(struct cpuinfo_x86 *c)
 {
-
-#ifdef CONFIG_X86_64
-	if (c->x86 >= 0xf) {
-		unsigned long long tseg;
-
-		/*
-		 * Split up direct mapping around the TSEG SMM area.
-		 * Don't do it for gbpages because there seems very little
-		 * benefit in doing so.
-		 */
-		if (!rdmsrl_safe(MSR_K8_TSEG_ADDR, &tseg)) {
-			unsigned long pfn = tseg >> PAGE_SHIFT;
-
-			pr_debug("tseg: %010llx\n", tseg);
-			if (pfn_range_is_mapped(pfn, pfn + 1))
-				set_memory_4k((unsigned long)__va(tseg), 1);
-		}
-	}
-#endif
-
 	if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) {
 
 		if (c->x86 > 0x10 ||
diff --git a/arch/x86/kernel/cpu/hygon.c b/arch/x86/kernel/cpu/hygon.c
index dc0840aae26c..ae59115d18f9 100644
--- a/arch/x86/kernel/cpu/hygon.c
+++ b/arch/x86/kernel/cpu/hygon.c
@@ -14,9 +14,6 @@ 
 #include <asm/cacheinfo.h>
 #include <asm/spec-ctrl.h>
 #include <asm/delay.h>
-#ifdef CONFIG_X86_64
-# include <asm/set_memory.h>
-#endif
 
 #include "cpu.h"
 
@@ -203,23 +200,6 @@  static void early_init_hygon_mc(struct cpuinfo_x86 *c)
 
 static void bsp_init_hygon(struct cpuinfo_x86 *c)
 {
-#ifdef CONFIG_X86_64
-	unsigned long long tseg;
-
-	/*
-	 * Split up direct mapping around the TSEG SMM area.
-	 * Don't do it for gbpages because there seems very little
-	 * benefit in doing so.
-	 */
-	if (!rdmsrl_safe(MSR_K8_TSEG_ADDR, &tseg)) {
-		unsigned long pfn = tseg >> PAGE_SHIFT;
-
-		pr_debug("tseg: %010llx\n", tseg);
-		if (pfn_range_is_mapped(pfn, pfn + 1))
-			set_memory_4k((unsigned long)__va(tseg), 1);
-	}
-#endif
-
 	if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) {
 		u64 val;