Message ID | 20201127171324.1846019-1-nivedita@alum.mit.edu |
---|---|
State | Accepted |
Commit | 262bd5724afdefd4c48a260d6100e78cc43ee06b |
On Fri, Nov 27, 2020 at 12:13:24PM -0500, Arvind Sankar wrote:
> Commit
>   26bfa5f89486 ("x86, amd: Cleanup init_amd")
> moved the code that remaps the TSEG region using 4k pages from
> init_amd() to bsp_init_amd().
>
> However, bsp_init_amd() is executed well before the direct mapping is
> actually created:
>
>   setup_arch()
>     -> early_cpu_init()
>       -> early_identify_cpu()
>         -> this_cpu->c_bsp_init()
>           -> bsp_init_amd()
>     ...
>     -> init_mem_mapping()
>
> So the change effectively disabled the 4k remapping, because
> pfn_range_is_mapped() is always false at this point.
>
> It has been over six years since the commit, and no-one seems to have
> noticed this, so just remove the code. The original code was also
> incomplete, since it doesn't check how large the TSEG address range
> actually is, so it might remap only part of it in any case.

Yah, and the patch which added this:

  6c62aa4a3c12 ("x86: make amd.c have 64bit support code")

does not say what for (I'm not surprised, frankly).

So if AMD folks on Cc don't have any need for actually fixing this
properly, yap, we can zap it.

Thx.

On 11/27/20 11:27 AM, Borislav Petkov wrote:
> On Fri, Nov 27, 2020 at 12:13:24PM -0500, Arvind Sankar wrote:
>> Commit
>>   26bfa5f89486 ("x86, amd: Cleanup init_amd")
>> moved the code that remaps the TSEG region using 4k pages from
>> init_amd() to bsp_init_amd().
>>
>> However, bsp_init_amd() is executed well before the direct mapping is
>> actually created:
>>
>>   setup_arch()
>>     -> early_cpu_init()
>>       -> early_identify_cpu()
>>         -> this_cpu->c_bsp_init()
>>           -> bsp_init_amd()
>>     ...
>>     -> init_mem_mapping()
>>
>> So the change effectively disabled the 4k remapping, because
>> pfn_range_is_mapped() is always false at this point.
>>
>> It has been over six years since the commit, and no-one seems to have
>> noticed this, so just remove the code. The original code was also
>> incomplete, since it doesn't check how large the TSEG address range
>> actually is, so it might remap only part of it in any case.
>
> Yah, and the patch which added this:
>
> 6c62aa4a3c12 ("x86: make amd.c have 64bit support code")
>
> does not say what for (I'm not surprised, frankly).
>
> So if AMD folks on Cc don't have any need for actually fixing this
> properly, yap, we can zap it.

I believe this is geared towards performance. If the TSEG base address is
not 2MB aligned, then hardware has to break down a 2MB TLB entry if the OS
references the memory within the 2MB page that is before the TSEG base
address. This can occur whenever the 2MB TLB entry is re-installed because
of TLB flushes, etc.

I would hope that newer BIOSes are 2MB aligning the TSEG base address, but
if not, then this can help.

So moving it back wouldn't be a bad thing. It should probably only do the
set_memory_4k() if the TSEG base address is not 2MB aligned, which I think
is covered by the pfn_range_is_mapped() call?

Thanks,
Tom

>
> Thx.
>

On Wed, Dec 02, 2020 at 11:58:15AM -0600, Tom Lendacky wrote:
> I believe this is geared towards performance. If the TSEG base address is
> not 2MB aligned, then hardware has to break down a 2MB TLB entry if the OS
> references the memory within the 2MB page that is before the TSEG base
> address. This can occur whenever the 2MB TLB entry is re-installed because
> of TLB flushes, etc.

And if this gets reinstated properly, then that explanation belongs over
it because nothing else explains what that thing did.

So thanks for digging it out.

On Wed, Dec 02, 2020 at 11:58:15AM -0600, Tom Lendacky wrote:
> On 11/27/20 11:27 AM, Borislav Petkov wrote:
> > On Fri, Nov 27, 2020 at 12:13:24PM -0500, Arvind Sankar wrote:
> >> Commit
> >>   26bfa5f89486 ("x86, amd: Cleanup init_amd")
> >> moved the code that remaps the TSEG region using 4k pages from
> >> init_amd() to bsp_init_amd().
> >>
> >> However, bsp_init_amd() is executed well before the direct mapping is
> >> actually created:
> >>
> >>   setup_arch()
> >>     -> early_cpu_init()
> >>       -> early_identify_cpu()
> >>         -> this_cpu->c_bsp_init()
> >>           -> bsp_init_amd()
> >>     ...
> >>     -> init_mem_mapping()
> >>
> >> So the change effectively disabled the 4k remapping, because
> >> pfn_range_is_mapped() is always false at this point.
> >>
> >> It has been over six years since the commit, and no-one seems to have
> >> noticed this, so just remove the code. The original code was also
> >> incomplete, since it doesn't check how large the TSEG address range
> >> actually is, so it might remap only part of it in any case.
> >
> > Yah, and the patch which added this:
> >
> > 6c62aa4a3c12 ("x86: make amd.c have 64bit support code")
> >
> > does not say what for (I'm not surprised, frankly).
> >
> > So if AMD folks on Cc don't have any need for actually fixing this
> > properly, yap, we can zap it.
>
> I believe this is geared towards performance. If the TSEG base address is
> not 2MB aligned, then hardware has to break down a 2MB TLB entry if the OS
> references the memory within the 2MB page that is before the TSEG base
> address. This can occur whenever the 2MB TLB entry is re-installed because
> of TLB flushes, etc.
>
> I would hope that newer BIOSes are 2MB aligning the TSEG base address, but
> if not, then this can help.
>
> So moving it back wouldn't be a bad thing. It should probably only do the
> set_memory_4k() if the TSEG base address is not 2MB aligned, which I think
> is covered by the pfn_range_is_mapped() call?
>

The pfn_range_is_mapped() call just checks whether it is mapped at all
in the direct mapping. Is the TSEG range supposed to be marked as
non-RAM in the E820 map? AFAICS, the only case when a direct mapping is
created for non-RAM is for the 0-1Mb real-mode range, and that will
always use 4k pages. Above that anything not marked as RAM will create
an unmapped hole in the direct map, so in this case the memory just
below the TSEG base would already use smaller pages if needed.

If it's possible that the E820 mapping says this range is RAM, then
should we also break up the direct map just after the end of the TSEG
range for the same reason?

Thanks.

On Wed, Dec 02, 2020 at 05:32:32PM -0500, Arvind Sankar wrote:
> The pfn_range_is_mapped() call just checks whether it is mapped at all
> in the direct mapping. Is the TSEG range supposed to be marked as
> non-RAM in the E820 map? AFAICS, the only case when a direct mapping is
> created for non-RAM is for the 0-1Mb real-mode range, and that will
> always use 4k pages. Above that anything not marked as RAM will create
> an unmapped hole in the direct map, so in this case the memory just
> below the TSEG base would already use smaller pages if needed.
>
> If it's possible that the E820 mapping says this range is RAM, then
> should we also break up the direct map just after the end of the TSEG
> range for the same reason?

So I have a machine where TSEG is not 2M aligned and somewhere in the 1G
range:

[ 1.135094] tseg: 003bf00000

It is not in the E820 map either:

[ 0.019784] init_memory_mapping: [mem 0x00000000-0x000fffff]
[ 0.020014] init_memory_mapping: [mem 0x3bc00000-0x3bdfffff]
[ 0.020166] init_memory_mapping: [mem 0x20000000-0x3bbfffff]
[ 0.020327] init_memory_mapping: [mem 0x00100000-0x1fffffff]
[ 0.020677] init_memory_mapping: [mem 0x3be00000-0x3be8ffff]

That doesn't mean that it can't happen that there might be some
configuration where it ends up being mapped.

So looking at what the code does, it kinda makes sense: you want the 2M
range between 0x3be00000 and 0x3c000000 to be split into 4K mappings,
*if* it is mapped.

I need to find a box where it is mapped *and* not 2M aligned, though,
for testing. Which appears kinda hard to do as all the new ones are
aligned.

The above is from a K8 box which should already be dead, as a matter of
fact.

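To make the range arithmetic in the message above concrete, here is a small user-space C sketch. It is not kernel code, and the helper names (is_2m_aligned(), large_page_start(), large_page_end()) are made up for illustration. It only shows how the 2M page that contains a given TSEG base can be computed, assuming 2 MiB large pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Size of one x86-64 large page (2 MiB). */
#define SZ_2M (UINT64_C(2) << 20)

/* The mask arithmetic below only works for power-of-two sizes. */
static_assert((SZ_2M & (SZ_2M - 1)) == 0, "large-page size must be a power of two");

/* Hypothetical helper: true if addr sits on a 2 MiB boundary. */
static bool is_2m_aligned(uint64_t addr)
{
	return (addr & (SZ_2M - 1)) == 0;
}

/* Hypothetical helper: start of the 2 MiB page that contains addr. */
static uint64_t large_page_start(uint64_t addr)
{
	return addr & ~(SZ_2M - 1);
}

/* Hypothetical helper: first byte after the 2 MiB page that contains addr. */
static uint64_t large_page_end(uint64_t addr)
{
	return large_page_start(addr) + SZ_2M;
}
```

With the tseg value from the log (0x3bf00000), is_2m_aligned() is false and the containing large page runs from 0x3be00000 up to 0x3c000000, which is the range named in the message.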
On Thu, Dec 03, 2020 at 09:48:57AM +0100, Borislav Petkov wrote:
> On Wed, Dec 02, 2020 at 05:32:32PM -0500, Arvind Sankar wrote:
> > The pfn_range_is_mapped() call just checks whether it is mapped at all
> > in the direct mapping. Is the TSEG range supposed to be marked as
> > non-RAM in the E820 map? AFAICS, the only case when a direct mapping is
> > created for non-RAM is for the 0-1Mb real-mode range, and that will
> > always use 4k pages. Above that anything not marked as RAM will create
> > an unmapped hole in the direct map, so in this case the memory just
> > below the TSEG base would already use smaller pages if needed.
> >
> > If it's possible that the E820 mapping says this range is RAM, then
> > should we also break up the direct map just after the end of the TSEG
> > range for the same reason?
>
> So I have a machine where TSEG is not 2M aligned and somewhere in the 1G
> range:
>
> [ 1.135094] tseg: 003bf00000
>
> It is not in the E820 map either:
>
> [ 0.019784] init_memory_mapping: [mem 0x00000000-0x000fffff]
> [ 0.020014] init_memory_mapping: [mem 0x3bc00000-0x3bdfffff]
> [ 0.020166] init_memory_mapping: [mem 0x20000000-0x3bbfffff]
> [ 0.020327] init_memory_mapping: [mem 0x00100000-0x1fffffff]
> [ 0.020677] init_memory_mapping: [mem 0x3be00000-0x3be8ffff]
>
> That doesn't mean that it can't happen that there might be some
> configuration where it ends up being mapped.
>
> So looking at what the code does, it kinda makes sense: you want the 2M
> range between 0x3be00000 and 0x3c000000 to be split into 4K mappings,
> *if* it is mapped.
>
> I need to find a box where it is mapped *and* not 2M aligned, though,
> for testing. Which appears kinda hard to do as all the new ones are
> aligned.

Do any of them have it mapped at all, regardless of the alignment? There
seems to be nothing else in the kernel that ever looks at the TSEG MSR,
so I would guess that it has to be non-RAM in the E820 map, otherwise
nothing would prevent the kernel from allocating and using that space.

I found the actual original commit, which does have a description of the
reasoning. It's

  8346ea17aa20 ("x86: split large page mapping for AMD TSEG")

It looks like at the time, the direct mapping didn't really look at the
E820 map in any detail, and was always set up with at least 2Mb pages,
or Gb pages if they were available, from 0 to max_pfn_mapped. So the
direct mapping would have covered even holes that weren't in the E820
map.

Commit
  66520ebc2df3 ("x86, mm: Only direct map addresses that are marked as E820_RAM")
changed the direct map setup to avoid mapping holes, because it
apparently became more serious than performance issues: this commit
mentions MCEs getting triggered because of the overmapping.

>
> The above is from a K8 box which should already be dead, as a matter of
> fact.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette

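As a user-space illustration of the "unmapped hole" point, the sketch below hard-codes the five init_memory_mapping ranges from the boot log quoted above and checks whether an address falls inside any of them. This is only a toy lookup, not how the kernel tracks its direct map; struct mem_range and addr_is_direct_mapped() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One direct-mapped physical range; both bounds are inclusive, as in the log. */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/* Ranges copied from the init_memory_mapping lines in the quoted boot log. */
static const struct mem_range direct_map[] = {
	{ 0x00000000, 0x000fffff },
	{ 0x00100000, 0x1fffffff },
	{ 0x20000000, 0x3bbfffff },
	{ 0x3bc00000, 0x3bdfffff },
	{ 0x3be00000, 0x3be8ffff },
};

/* Hypothetical helper: true if addr lies inside one of the logged ranges. */
static bool addr_is_direct_mapped(uint64_t addr)
{
	for (size_t i = 0; i < sizeof(direct_map) / sizeof(direct_map[0]); i++) {
		/* Sanity check on the table itself. */
		assert(direct_map[i].start <= direct_map[i].end);
		if (addr >= direct_map[i].start && addr <= direct_map[i].end)
			return true;
	}
	return false;
}
```

On that machine the logged ranges stop at 0x3be8ffff, so the tseg base 0x3bf00000 is not covered by any of them: addr_is_direct_mapped(0x3bf00000) returns false.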
On Thu, Dec 03, 2020 at 11:14:06AM -0500, Arvind Sankar wrote:
> Do any of them have it mapped at all, regardless of the alignment? There
> seems to be nothing else in the kernel that ever looks at the TSEG MSR,
> so I would guess that it has to be non-RAM in the E820 map, otherwise
> nothing would prevent the kernel from allocating and using that space.

Ha, that's a very good question.

If all those BIOSes from K8 onwards would put the TSEG in a non-RAM area
and after

  66520ebc2df3 ("x86, mm: Only direct map addresses that are marked as E820_RAM")

(great investigative work, btw, thanks for that!)

then we can simply say that that splitting is not needed anymore.

Maybe Tom can ask BIOS people whether they always did that - that being
to put the TSEG into a non-RAM area.

I can boot my debug patch on my boxes here but that doesn't mean a whole
lot...

Thx.

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 1f71c7616917..f8ca66f3d861 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -23,7 +23,6 @@
 
 #ifdef CONFIG_X86_64
 # include <asm/mmconfig.h>
-# include <asm/set_memory.h>
 #endif
 
 #include "cpu.h"
@@ -509,26 +508,6 @@ static void early_init_amd_mc(struct cpuinfo_x86 *c)
 
 static void bsp_init_amd(struct cpuinfo_x86 *c)
 {
-
-#ifdef CONFIG_X86_64
-	if (c->x86 >= 0xf) {
-		unsigned long long tseg;
-
-		/*
-		 * Split up direct mapping around the TSEG SMM area.
-		 * Don't do it for gbpages because there seems very little
-		 * benefit in doing so.
-		 */
-		if (!rdmsrl_safe(MSR_K8_TSEG_ADDR, &tseg)) {
-			unsigned long pfn = tseg >> PAGE_SHIFT;
-
-			pr_debug("tseg: %010llx\n", tseg);
-			if (pfn_range_is_mapped(pfn, pfn + 1))
-				set_memory_4k((unsigned long)__va(tseg), 1);
-		}
-	}
-#endif
-
 	if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) {
 
 		if (c->x86 > 0x10 ||
diff --git a/arch/x86/kernel/cpu/hygon.c b/arch/x86/kernel/cpu/hygon.c
index dc0840aae26c..ae59115d18f9 100644
--- a/arch/x86/kernel/cpu/hygon.c
+++ b/arch/x86/kernel/cpu/hygon.c
@@ -14,9 +14,6 @@
 #include <asm/cacheinfo.h>
 #include <asm/spec-ctrl.h>
 #include <asm/delay.h>
-#ifdef CONFIG_X86_64
-# include <asm/set_memory.h>
-#endif
 
 #include "cpu.h"
 
@@ -203,23 +200,6 @@ static void early_init_hygon_mc(struct cpuinfo_x86 *c)
 
 static void bsp_init_hygon(struct cpuinfo_x86 *c)
 {
-#ifdef CONFIG_X86_64
-	unsigned long long tseg;
-
-	/*
-	 * Split up direct mapping around the TSEG SMM area.
-	 * Don't do it for gbpages because there seems very little
-	 * benefit in doing so.
-	 */
-	if (!rdmsrl_safe(MSR_K8_TSEG_ADDR, &tseg)) {
-		unsigned long pfn = tseg >> PAGE_SHIFT;
-
-		pr_debug("tseg: %010llx\n", tseg);
-		if (pfn_range_is_mapped(pfn, pfn + 1))
-			set_memory_4k((unsigned long)__va(tseg), 1);
-	}
-#endif
-
 	if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) {
 		u64 val;
 

Commit

  26bfa5f89486 ("x86, amd: Cleanup init_amd")

moved the code that remaps the TSEG region using 4k pages from
init_amd() to bsp_init_amd().

However, bsp_init_amd() is executed well before the direct mapping is
actually created:

  setup_arch()
    -> early_cpu_init()
      -> early_identify_cpu()
        -> this_cpu->c_bsp_init()
          -> bsp_init_amd()
    ...
    -> init_mem_mapping()

So the change effectively disabled the 4k remapping, because
pfn_range_is_mapped() is always false at this point.

It has been over six years since the commit, and no-one seems to have
noticed this, so just remove the code. The original code was also
incomplete, since it doesn't check how large the TSEG address range
actually is, so it might remap only part of it in any case.

Hygon has copied the incorrect version, so the code has never run on it
since the cpu support was added two years ago. Remove it from there as
well.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
---
 arch/x86/kernel/cpu/amd.c   | 21 ---------------------
 arch/x86/kernel/cpu/hygon.c | 20 --------------------
 2 files changed, 41 deletions(-)
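
The ordering problem described in the commit message can be modelled in a few lines of user-space C. This is a toy sketch under stated assumptions, not the kernel implementation: all toy_* names are hypothetical, the table of mapped ranges is a simplified stand-in for the kernel's bookkeeping, and the pfn values in the usage note are made up. It only shows why a range check against an empty table fails for every input when it runs before the step that fills the table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a table of direct-mapped pfn ranges, [start, end). */
struct pfn_range {
	unsigned long start;
	unsigned long end;
};

#define MAX_RANGES 8
static struct pfn_range mapped[MAX_RANGES];
static size_t nr_mapped; /* stays 0 until toy_init_mem_mapping() runs */

/* Models the check: true only if [start, end) lies inside a recorded range. */
static bool toy_pfn_range_is_mapped(unsigned long start, unsigned long end)
{
	for (size_t i = 0; i < nr_mapped; i++)
		if (start >= mapped[i].start && end <= mapped[i].end)
			return true;
	return false;
}

/* Models the later step that creates the direct mapping and records it. */
static void toy_init_mem_mapping(unsigned long start, unsigned long end)
{
	assert(nr_mapped < MAX_RANGES);
	mapped[nr_mapped++] = (struct pfn_range){ start, end };
}

/*
 * Models the removed check in bsp_init_amd(): returns whether the
 * set_memory_4k() call would have been reached for this TSEG pfn.
 */
static bool toy_bsp_init_would_split(unsigned long tseg_pfn)
{
	return toy_pfn_range_is_mapped(tseg_pfn, tseg_pfn + 1);
}
```

Calling toy_bsp_init_would_split(0x3bf00) before any toy_init_mem_mapping() call returns false whatever the pfn is, because the table is still empty. Only after something like toy_init_mem_mapping(0x3bc00, 0x3c000) (a made-up range that covers the pfn) does the same call return true.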