xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH 2/2] x86/mtrr: Refactor PAT initialization code
       [not found]             ` <1457745396.6393.257.camel@hpe.com>
@ 2016-03-15  0:15               ` Luis R. Rodriguez
  2016-03-15 23:48                 ` Toshi Kani
  0 siblings, 1 reply; 23+ messages in thread
From: Luis R. Rodriguez @ 2016-03-15  0:15 UTC (permalink / raw)
  To: Toshi Kani, Juergen Gross, Boris Ostrovsky, Matt Fleming,
	Olof Johansson, Paul Stewart
  Cc: Luis R. Rodriguez, Borislav Petkov, Ingo Molnar, H. Peter Anvin,
	Thomas Gleixner, Paul Gortmaker, X86 ML, linux-kernel, xen-devel

On Fri, Mar 11, 2016 at 06:16:36PM -0700, Toshi Kani wrote:
> On Fri, 2016-03-11 at 15:34 -0800, Luis R. Rodriguez wrote:
> > On Fri, Mar 11, 2016 at 3:56 PM, Toshi Kani <toshi.kani@hpe.com> wrote:
> > > On Fri, 2016-03-11 at 23:17 +0100, Luis R. Rodriguez wrote:
> > > > On Fri, Mar 11, 2016 at 11:57:12AM -0700, Toshi Kani wrote:
> > > > > On Fri, 2016-03-11 at 10:24 +0100, Borislav Petkov wrote:
> > > > > > On Thu, Mar 10, 2016 at 09:45:46PM -0700, Toshi Kani wrote:
> > > > > > > MTRR manages PAT initialization as it implements a rendezvous
> > > > > > > handler that initializes PAT as part of MTRR initialization.
>  :
> > > > > 
> > > > > No, it does not fix it. The problem in this particular case, i.e.
> > > > > MTRR disabled by its MSR, is that mtrr_bp_init() calls pat_init()
> > > > > (as PAT enabled) and initializes PAT on BSP. After APs are
> > > > > launched, we need the MTRR's rendezvous handler to initialize PAT
> > > > > on APs to be consistent with BSP. However, MTRR rendezvous handler
> > > > > is no-op since MTRR is disabled.
> > > > 
> > > > This seems like a hack on enabling PAT through MTRR code, can we have
> > > > a PAT rendezvous handler on its own, or provide a generic rendezvous
> > > > handler that lets you deal with whatever interfaces need setup. Then
> > > > conflicts can just be negotiated early.
> > > 
> > > The MTRR code can be enhanced so that the rendezvous handler can handle
> > > MTRR and PAT state independently.  I noted this case as (*) in the
> > > table of this patch description.  This is a separate item, however.
> > > 
> > > MTRR calling PAT was not a hack (as I suppose we did not have VMs at
> > > that time), although this can surely be improved.  As Intel SDM state
> > > below, both MTRR and PAT require the same procedure, and the PAT
> > > initialization sequence is defined in the MTRR section.
> > > 
> > > ===
> > > 11.12.4 Programming the PAT
> > >  :
> > > The operating system is responsible for insuring that changes to a PAT
> > > entry occur in a manner that maintains the consistency of the processor
> > > caches and translation lookaside buffers (TLB). This is accomplished by
> > > following the procedure as specified in Section 11.11.8, “MTRR
> > > Considerations in MP Systems,” for changing the value of an MTRR in a
> > > multiple processor system. It requires a specific sequence of
> > > operations that includes flushing the processors caches and TLBs.
> > > ===
> > > 
> > > > What I'm after is seeing if we can ultimately disable MTRR on kernel
> > > > code but still have PAT enabled. I realize you've mentioned BIOS code
> > > > may use some MTRR setup code but this is only true for some systems.
> > > > I know for a fact Xen cannot use MTRR, it seems qemu32 does not
> > > > enable
> > > > it either. So why not have the ability to skip through its set up ?
> > > 
> > > MTRR support has two meanings:
> > >  1) The kernel keeps the MTRR setup by BIOS.
> > >  2) The kernel modifies the MTRR setup.
> > > 
> > > I am in a position that we need 1) but 2).
> > 
> > I take it you meant "but not 2)" ? 
> 
> Yes. :)

OK -- we are in agreement but we know 1) is only needed for a portion of
systems: Xen and qemu32 systems fly with no MTRR set up, and as such it
would be incorrect to run MTRR code on such systems. To these systems
MTRR functionality code should be dead, since PAT currently depends on
MTRR PAT should also be dead but as the report you're fixing shows
it wasn't. That's an issue for qemu that uses the regular x86 init path
but not for Xen. Its different for Xen as the hypervisor is the one that
set up the MSR_IA32_CR_PAT for each CPU. The *only* thing Xen does is:

void xen_start_kernel(void)
{
	...
        rdmsrl(MSR_IA32_CR_PAT, pat);                                           
        pat_init_cache_modes(pat); 
	...
}

Fortunately we only have to call pat_init_cache_modes() once, its
not per-CPU. Xen has shown then that you *can* live with PAT without
any of the complex MTRR setup / code. Please add to your table the
Xen case as well then as its important to consider. If you make it
a strong requirement to have MTRR enabled to enable PAT you'd
be disabling PAT on Xen guest boots.

As-is then your this patch which calls pat_disable() on mtrr_bp_init() for the
case where MTRR is disabled would essentially break PAT on Xen guests, so this
cannot be done. It is no longer true that if MTRR is disabled you can force
disable PAT. To do what you want you want to do we have to consider Xen.

I don't think its a good idea to keep PAT initialization meshed together
with MTRR and making it a strong requirement on enabling PAT. The MTRR
code is extremely complex. I'd like instead to encourage for us to
consider for this situation to let PAT become a first class citizen,
if MTRR is disabled but you've enabled PAT you should be able to use
it, just as Xen does. There are more reasons to enable such setup than
not to. Long term I'm advocating to see if we can get an ACPI legacy
MTRR flag that can tell us if the BIOS has MTRR ripped out, then we
can at run time also take advantage of ignoring PAT completely as well.

Note, if you insisted you didn't want to disable PAT on Xen, you could
in theory check for the subarch -- but note that the subarch is unused
yet on Xen, even though it was added to the x86 boot protocol years ago.
I have a slew of patches to make use of it to help put paravirt_enabled()
in the grave, but based on discussions with Ingo, we don't want to spread
use of the subarch in random x86 code paths, we want to compartamentalize
that. If you still want to follow your approach of just force-disabling
PAT on MTRR code if MTRR was disabled you'd have to use semantics to
figure out if the boot path came from Xen, to be more specific for Xen
PV guest types only... The current agreed approach to avoid directly
using subarch is to categorize differences between what some guests
need and bare metal under an x86 platform quirk and legacy set of components.

On the x86 init path we'd call something check for the subarch and based
on that set a series of x86 legacy features / quirks that need to be
disabled / enabled. We could add MTRR as one. I'm unifying some of this
with a bit of what goes into the ACPI IA-PC boot architecture, see
section 5.2.9.3 IA-PC Boot Architecture Flags [0]. In particular the
paravirt_enabled() series I'm working on happens to also dabble into
the no CMOS RTC case for Xen, generalizing this knocks a bit of birds
with one stone. I think we can do the same with MTRR but also be
proactive and see if we can get ACPI_FADT_NO_MTRR added as well for
a future ACPI spec to enable BIOS manufacturers to rip MTRR out.

[0] http://www.acpi.info/DOWNLOADS/ACPIspec50.pdf 

/* Masks for FADT IA-PC Boot Architecture Flags (boot_flags) [Vx]=Introduced in this FADT revision */
#define ACPI_FADT_LEGACY_DEVICES    (1)         /* 00: [V2] System has LPC or ISA bus devices */
#define ACPI_FADT_8042              (1<<1)      /* 01: [V3] System has an 8042 controller on port 60/64 */
#define ACPI_FADT_NO_VGA            (1<<2)      /* 02: [V4] It is not safe to probe for VGA hardware */
#define ACPI_FADT_NO_MSI            (1<<3)      /* 03: [V4] Message Signaled Interrupts (MSI) must not be enabled */
#define ACPI_FADT_NO_ASPM           (1<<4)      /* 04: [V4] PCIe ASPM control must not be enabled */
#define ACPI_FADT_NO_CMOS_RTC       (1<<5)      /* 05: [V5] No CMOS real-time clock present */

> > There *are folks however who do
> > more as I noted earlier. Perhaps not now, but in the future I'd
> > encourage folks to rip MTRR out of their own BIOS, and enable a new
> > ACPI legacy flag to say "MTRR required". That'd eventually can help
> > bury MTRR for good while remaining backward compatible.
> 
> Well, BIOS using MTRR is better than BIOS setting page tables in the SMI
> handler.

Can some BIOSes be developed without MTRR? For instance I suspect Google might
be able to easily pull of ripping MTRR out of their BIOS if they didn't do it
already for the ChromeOS devices. If possible not only should it help with removing
complexity on the BIOS but not even having to think about that code *ever* running
on the kernel at all should be nice.

> The kernel can be ignorant of the MTRR setup as long as it does
> not modify it.

Sure, we're already there. The kernel no longer modifies the
MTRR setup unless of course you boot without PAT enabled. I think
we need to move beyond that to ACPI if we can to let regular
Linux boots never have to deal with MTRR at all. The code is
complex and nasty why not put let folks put a nail on the coffin for good?

> > I can read the above description to also say:
> > 
> > "Hey you need to implement PAT with the same skeleton code as MTRR"
> 
> No, I did not say that.  MTRR's rendezvous handler can be generalized to
> work with both MTRR and PAT.  We do not need two separate handlers.  In
> fact, it needs to be a single handler so that both can be initialized
> together.

I'm not sure if that's really needed. Doesn't PAT just require setting
the wrmsrl(MSR_IA32_CR_PAT, pat) for each AP?

> > If we do that, we can pave the way to deprecate MTRR as legacy for
> > good first on Linux.
> 
> I do not think such change will deprecate MTRR.

Not even for shiny new BIOSes? Post ACPI 5?

> It just means that Linux can enable PAT on virtual CPUs with PAT & !MTRR capability.
> 
> > > In fact, the kernel disabling MTRRs is the same as 2).
> > > 
> > > > I'll also note Xen managed to enable PAT only without enabling MTRR,
> > > > this was done through pat_init_cache_modes() -- not sure if this can
> > > > be leveraged for qemu32...
> > > 
> > > I am interested to know how Xen managed this.  Is this done by the Xen
> > > hypervisor initializes guest's PAT on behalf of the guest kernel?
> > 
> > Yup. And the cache read thingy was reading back its own setup, which
> > was different than what Linux used by default IIRC. Juergen can
> > elaborate more.
> 
> Yeah, I'd like to make sure that my changes won't break it.

I checked through code inspection and indeed, it seems it would break
Xen's PAT setup.

For the record: the issue here was code that should not run ran, that
is dead code ran. I'm working towards a generic solution for this.

  Luis

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/2] x86/mm/pat: Change pat_disable() to emulate PAT table
       [not found]           ` <20160312115544.GA23410@pd.tnic>
@ 2016-03-15  0:29             ` Luis R. Rodriguez
       [not found]             ` <20160315002921.GG25147@wotan.suse.de>
  1 sibling, 0 replies; 23+ messages in thread
From: Luis R. Rodriguez @ 2016-03-15  0:29 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: jgross, Toshi Kani, boris.ostrovsky, x86, linux-kernel,
	paul.gortmaker, hpa, xen-devel, tglx, mingo


I like this approach more as it stuff more PAT setup on its own type
of calls, but:

On Sat, Mar 12, 2016 at 12:55:44PM +0100, Borislav Petkov wrote:
> diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
> index 10f8d4796240..5c442b4bd52a 100644
> --- a/arch/x86/kernel/cpu/mtrr/main.c
> +++ b/arch/x86/kernel/cpu/mtrr/main.c
> @@ -759,8 +761,11 @@ void __init mtrr_bp_init(void)
>  		}
>  	}
>  
> -	if (!mtrr_enabled())
> +	if (!__mtrr_enabled) {
>  		pr_info("MTRR: Disabled\n");
> +		pat_disable("PAT disabled by MTRR");
> +		pat_setup();
> +	}
>  }

This hunk would break PAT on Xen.

  Luis

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/2] x86/mm/pat: Change pat_disable() to emulate PAT table
       [not found]             ` <20160315002921.GG25147@wotan.suse.de>
@ 2016-03-15  3:11               ` Toshi Kani
       [not found]               ` <1458011476.6393.327.camel@hpe.com>
  1 sibling, 0 replies; 23+ messages in thread
From: Toshi Kani @ 2016-03-15  3:11 UTC (permalink / raw)
  To: Luis R. Rodriguez, Borislav Petkov
  Cc: jgross, boris.ostrovsky, x86, linux-kernel, paul.gortmaker, hpa,
	xen-devel, tglx, mingo

[-- Attachment #1: Type: text/plain, Size: 977 bytes --]

On Tue, 2016-03-15 at 01:29 +0100, Luis R. Rodriguez wrote:
> I like this approach more as it stuff more PAT setup on its own type
> of calls, but:
> 
> On Sat, Mar 12, 2016 at 12:55:44PM +0100, Borislav Petkov wrote:
> > diff --git a/arch/x86/kernel/cpu/mtrr/main.c
> > b/arch/x86/kernel/cpu/mtrr/main.c
> > index 10f8d4796240..5c442b4bd52a 100644
> > --- a/arch/x86/kernel/cpu/mtrr/main.c
> > +++ b/arch/x86/kernel/cpu/mtrr/main.c
> > @@ -759,8 +761,11 @@ void __init mtrr_bp_init(void)
> >  		}
> >  	}
> >  
> > -	if (!mtrr_enabled())
> > +	if (!__mtrr_enabled) {
> >  		pr_info("MTRR: Disabled\n");
> > +		pat_disable("PAT disabled by MTRR");
> > +		pat_setup();
> > +	}
> >  }
> 
> This hunk would break PAT on Xen.

Can you try the attached patches?  They apply on top of my original patch-
set.  With this change, PAT code generally supports Xen, and the PAT init
code in Xen is now removed.  If they look OK, I will reorganize the patch
series.

Thanks,
-Toshi

[-- Attachment #2: 03-pat-emu.patch --]
[-- Type: text/x-patch, Size: 3382 bytes --]

From: Toshi Kani <toshi.kani@hpe.com>

Add support of PAT emulation that matches with the PAT MSR.

---
 arch/x86/mm/pat.c |   73 +++++++++++++++++++++++++++++++----------------------
 1 file changed, 43 insertions(+), 30 deletions(-)

diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 1ff8aa9..565a478 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -40,7 +40,7 @@
 static bool boot_cpu_done;
 
 static int __read_mostly __pat_enabled = IS_ENABLED(CONFIG_X86_PAT);
-static void pat_disable_init(void);
+static void pat_emu_init(void);
 
 void pat_disable(const char *reason)
 {
@@ -52,7 +52,7 @@ void pat_disable(const char *reason)
 	__pat_enabled = 0;
 	pr_info("x86/PAT: %s\n", reason);
 
-	pat_disable_init();
+	pat_emu_init();
 }
 
 static int __init nopat(char *str)
@@ -239,40 +239,53 @@ static void pat_ap_init(u64 pat)
 	wrmsrl(MSR_IA32_CR_PAT, pat);
 }
 
-static void pat_disable_init(void)
+static void pat_emu_init(void)
 {
-	u64 pat;
-	static int disable_init_done;
+	u64 pat = 0;
+	static int emu_init_done;
 
-	if (disable_init_done)
+	if (emu_init_done)
 		return;
 
-	/*
-	 * No PAT. Emulate the PAT table that corresponds to the two
-	 * cache bits, PWT (Write Through) and PCD (Cache Disable). This
-	 * setup is the same as the BIOS default setup when the system
-	 * has PAT but the "nopat" boot option has been specified. This
-	 * emulated PAT table is used when MSR_IA32_CR_PAT returns 0.
-	 *
-	 * PTE encoding:
-	 *
-	 *       PCD
-	 *       |PWT  PAT
-	 *       ||    slot
-	 *       00    0    WB : _PAGE_CACHE_MODE_WB
-	 *       01    1    WT : _PAGE_CACHE_MODE_WT
-	 *       10    2    UC-: _PAGE_CACHE_MODE_UC_MINUS
-	 *       11    3    UC : _PAGE_CACHE_MODE_UC
-	 *
-	 * NOTE: When WC or WP is used, it is redirected to UC- per
-	 * the default setup in __cachemode2pte_tbl[].
-	 */
-	pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
-	      PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
+	if (cpu_has_pat) {
+		/*
+		 * CPU supports PAT. Initialize the PAT table to match with
+		 * the PAT MSR value. This setup is used by "nopat" boot
+		 * option, or by virtual machine environments which do not
+		 * support MTRRs but support PAT.
+		 *
+		 * If the MSR returns 0, it is considered invalid and emulate
+		 * as No PAT.
+		 */
+		rdmsrl(MSR_IA32_CR_PAT, pat);
+	}
+
+	if (!pat) {
+		/*
+		 * No PAT. Emulate the PAT table that corresponds to the two
+		 * cache bits, PWT (Write Through) and PCD (Cache Disable).
+		 * This setup is also the same as the BIOS default setup.
+		 *
+		 * PTE encoding:
+		 *
+		 *       PCD
+		 *       |PWT  PAT
+		 *       ||    slot
+		 *       00    0    WB : _PAGE_CACHE_MODE_WB
+		 *       01    1    WT : _PAGE_CACHE_MODE_WT
+		 *       10    2    UC-: _PAGE_CACHE_MODE_UC_MINUS
+		 *       11    3    UC : _PAGE_CACHE_MODE_UC
+		 *
+		 * NOTE: When WC or WP is used, it is redirected to UC- per
+		 * the default setup in __cachemode2pte_tbl[].
+		 */
+		pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
+		      PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
+	}
 
 	pat_init_cache_modes(pat);
 
-	disable_init_done = 1;
+	emu_init_done = 1;
 }
 
 void pat_init(void)
@@ -281,7 +294,7 @@ void pat_init(void)
 	struct cpuinfo_x86 *c = &boot_cpu_data;
 
 	if (!pat_enabled()) {
-		pat_disable_init();
+		pat_emu_init();
 		return;
 	}
 

[-- Attachment #3: 04-xen-mtrr.patch --]
[-- Type: text/x-patch, Size: 2197 bytes --]

From: Toshi Kani <toshi.kani@hpe.com>

Delete the PAT table initialization code from Xen as this case
is now supported by PAT.

---
 arch/x86/include/asm/mtrr.h     |    2 +-
 arch/x86/kernel/cpu/mtrr/main.c |    4 ++--
 arch/x86/xen/enlighten.c        |    9 ---------
 3 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index a965e74..77420c3 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -86,7 +86,7 @@ static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi)
 }
 static inline void mtrr_bp_init(void)
 {
-	pat_disable("PAT disabled by MTRR");
+	pat_disable("PAT emulation");
 }
 
 #define mtrr_ap_init() do {} while (0)
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index d9e91f1..e10fba4 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -767,10 +767,10 @@ void __init mtrr_bp_init(void)
 
 		/*
 		 * PAT initialization relies on MTRR's rendezvous handler.
-		 * Disable PAT until the handler can initialize both features
+		 * Emulate PAT until the handler can initialize both features
 		 * independently.
 		 */
-		pat_disable("PAT disabled by MTRR");
+		pat_disable("PAT emulation");
 	}
 }
 
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index d09e4c9..c883f11 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -74,7 +74,6 @@
 #include <asm/mach_traps.h>
 #include <asm/mwait.h>
 #include <asm/pci_x86.h>
-#include <asm/pat.h>
 #include <asm/cpu.h>
 
 #ifdef CONFIG_ACPI
@@ -1510,7 +1509,6 @@ asmlinkage __visible void __init xen_start_kernel(void)
 {
 	struct physdev_set_iopl set_iopl;
 	unsigned long initrd_start = 0;
-	u64 pat;
 	int rc;
 
 	if (!xen_start_info)
@@ -1617,13 +1615,6 @@ asmlinkage __visible void __init xen_start_kernel(void)
 				   xen_start_info->nr_pages);
 	xen_reserve_special_pages();
 
-	/*
-	 * Modify the cache mode translation tables to match Xen's PAT
-	 * configuration.
-	 */
-	rdmsrl(MSR_IA32_CR_PAT, pat);
-	pat_init_cache_modes(pat);
-
 	/* keep using Xen gdt for now; no urgent need to change it */
 
 #ifdef CONFIG_X86_32

[-- Attachment #4: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/2] x86/mm/pat: Change pat_disable() to emulate PAT table
       [not found]               ` <1458011476.6393.327.camel@hpe.com>
@ 2016-03-15 11:01                 ` Borislav Petkov
       [not found]                 ` <20160315110148.GC4559@pd.tnic>
  2016-03-15 21:31                 ` Luis R. Rodriguez
  2 siblings, 0 replies; 23+ messages in thread
From: Borislav Petkov @ 2016-03-15 11:01 UTC (permalink / raw)
  To: Toshi Kani
  Cc: jgross, boris.ostrovsky, x86, linux-kernel, paul.gortmaker,
	Luis R. Rodriguez, hpa, xen-devel, tglx, mingo

On Mon, Mar 14, 2016 at 09:11:16PM -0600, Toshi Kani wrote:
> -	pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
> -	      PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
> +	if (cpu_has_pat) {

Please use on init paths boot_cpu_has(X86_FEATURE_PAT) and on fast paths
static_cpu_has(X86_FEATURE_PAT). No more of that cpu_has_XXX ugliness.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/2] x86/mm/pat: Change pat_disable() to emulate PAT table
       [not found]                 ` <20160315110148.GC4559@pd.tnic>
@ 2016-03-15 15:43                   ` Toshi Kani
       [not found]                   ` <1458056595.6393.332.camel@hpe.com>
  1 sibling, 0 replies; 23+ messages in thread
From: Toshi Kani @ 2016-03-15 15:43 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: jgross, boris.ostrovsky, x86, linux-kernel, paul.gortmaker,
	Luis R. Rodriguez, hpa, xen-devel, tglx, mingo

On Tue, 2016-03-15 at 11:01 +0000, Borislav Petkov wrote:
> On Mon, Mar 14, 2016 at 09:11:16PM -0600, Toshi Kani wrote:
> > -	pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC)
> > |
> > -	      PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
> > +	if (cpu_has_pat) {
> 
> Please use on init paths boot_cpu_has(X86_FEATURE_PAT) and on fast paths
> static_cpu_has(X86_FEATURE_PAT). No more of that cpu_has_XXX ugliness.

'cpu_has_pat' is defined as 'boot_cpu_has(X86_FEATURE_PAT)'.  Do you mean
it should explicitly use 'boot_cpu_has(X86_FEATURE_PAT)'?

Thanks,
-Toshi

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/2] x86/mm/pat: Change pat_disable() to emulate PAT table
       [not found]                   ` <1458056595.6393.332.camel@hpe.com>
@ 2016-03-15 15:47                     ` Borislav Petkov
       [not found]                     ` <20160315154731.GD4559@pd.tnic>
  1 sibling, 0 replies; 23+ messages in thread
From: Borislav Petkov @ 2016-03-15 15:47 UTC (permalink / raw)
  To: Toshi Kani
  Cc: jgross, boris.ostrovsky, x86, linux-kernel, paul.gortmaker,
	Luis R. Rodriguez, hpa, xen-devel, tglx, mingo

On Tue, Mar 15, 2016 at 09:43:15AM -0600, Toshi Kani wrote:
> > Please use on init paths boot_cpu_has(X86_FEATURE_PAT) and on fast paths
> > static_cpu_has(X86_FEATURE_PAT). No more of that cpu_has_XXX ugliness.
> 
> 'cpu_has_pat' is defined as 'boot_cpu_has(X86_FEATURE_PAT)'.  Do you mean
> it should explicitly use 'boot_cpu_has(X86_FEATURE_PAT)'?

No, read what I said.

We use boot_cpu_has(<feature_bit>) on slow paths (i.e., init, bootup,
etc), where speed is not that important. static_cpu_has(<feature_bit>)
is an optimized version which should be used in hot paths.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/2] x86/mm/pat: Change pat_disable() to emulate PAT table
       [not found]                       ` <1458061883.6393.359.camel@hpe.com>
@ 2016-03-15 16:33                         ` Borislav Petkov
  0 siblings, 0 replies; 23+ messages in thread
From: Borislav Petkov @ 2016-03-15 16:33 UTC (permalink / raw)
  To: Toshi Kani
  Cc: jgross, boris.ostrovsky, x86, linux-kernel, paul.gortmaker,
	Luis R. Rodriguez, hpa, xen-devel, tglx, mingo

On Tue, Mar 15, 2016 at 11:11:23AM -0600, Toshi Kani wrote:
> While cpu_has_pat is the same as boot_cpu_has(X86_FEATURE_PAT), cpu_has_XXX
> should not be used.  So, this code needs to be changed to use
> boot_cpu_has(X86_FEATURE_PAT) directly.
> 
> Is this right?

Yes.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/2] x86/mm/pat: Change pat_disable() to emulate PAT table
       [not found]                     ` <20160315154731.GD4559@pd.tnic>
       [not found]                       ` <1458061883.6393.359.camel@hpe.com>
@ 2016-03-15 17:11                       ` Toshi Kani
  1 sibling, 0 replies; 23+ messages in thread
From: Toshi Kani @ 2016-03-15 17:11 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: jgross, boris.ostrovsky, x86, linux-kernel, paul.gortmaker,
	Luis R. Rodriguez, hpa, xen-devel, tglx, mingo

On Tue, 2016-03-15 at 16:47 +0100, Borislav Petkov wrote:
> On Tue, Mar 15, 2016 at 09:43:15AM -0600, Toshi Kani wrote:
> > > Please use on init paths boot_cpu_has(X86_FEATURE_PAT) and on fast
> > > paths static_cpu_has(X86_FEATURE_PAT). No more of that cpu_has_XXX
> > > ugliness.
> > 
> > 'cpu_has_pat' is defined as 'boot_cpu_has(X86_FEATURE_PAT)'.  Do you
> > mean it should explicitly use 'boot_cpu_has(X86_FEATURE_PAT)'?
> 
> No, read what I said.
> 
> We use boot_cpu_has(<feature_bit>) on slow paths (i.e., init, bootup,
> etc), where speed is not that important. static_cpu_has(<feature_bit>)
> is an optimized version which should be used in hot paths.

Yes, I understand that part.  Let me rephrase my question.

This PAT code is on init paths and speed is not that important.  So, it
needs to use 'boot_cpu_has()' here.  'cpu_has_pat' is defined
as boot_cpu_has(X86_FEATURE_PAT), and hence it uses boot_cpu_has() already.
 
While cpu_has_pat is the same as boot_cpu_has(X86_FEATURE_PAT), cpu_has_XXX
should not be used.  So, this code needs to be changed to use
boot_cpu_has(X86_FEATURE_PAT) directly.

Is this right?

Thanks,
-Toshi

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/2] x86/mm/pat: Change pat_disable() to emulate PAT table
       [not found]               ` <1458011476.6393.327.camel@hpe.com>
  2016-03-15 11:01                 ` Borislav Petkov
       [not found]                 ` <20160315110148.GC4559@pd.tnic>
@ 2016-03-15 21:31                 ` Luis R. Rodriguez
  2 siblings, 0 replies; 23+ messages in thread
From: Luis R. Rodriguez @ 2016-03-15 21:31 UTC (permalink / raw)
  To: Toshi Kani
  Cc: jgross, boris.ostrovsky, x86, linux-kernel, paul.gortmaker,
	Luis R. Rodriguez, Borislav Petkov, hpa, xen-devel, tglx, mingo

On Mon, Mar 14, 2016 at 09:11:16PM -0600, Toshi Kani wrote:
> On Tue, 2016-03-15 at 01:29 +0100, Luis R. Rodriguez wrote:
> > I like this approach more as it stuff more PAT setup on its own type
> > of calls, but:
> > 
> > On Sat, Mar 12, 2016 at 12:55:44PM +0100, Borislav Petkov wrote:
> > > diff --git a/arch/x86/kernel/cpu/mtrr/main.c
> > > b/arch/x86/kernel/cpu/mtrr/main.c
> > > index 10f8d4796240..5c442b4bd52a 100644
> > > --- a/arch/x86/kernel/cpu/mtrr/main.c
> > > +++ b/arch/x86/kernel/cpu/mtrr/main.c
> > > @@ -759,8 +761,11 @@ void __init mtrr_bp_init(void)
> > >  		}
> > >  	}
> > >  
> > > -	if (!mtrr_enabled())
> > > +	if (!__mtrr_enabled) {
> > >  		pr_info("MTRR: Disabled\n");
> > > +		pat_disable("PAT disabled by MTRR");
> > > +		pat_setup();
> > > +	}
> > >  }
> > 
> > This hunk would break PAT on Xen.
> 
> Can you try the attached patches?  They apply on top of my original patch-
> set.  With this change, PAT code generally supports Xen, and the PAT init
> code in Xen is now removed.  If they look OK, I will reorganize the patch
> series.

I don't have time to test this at this time but on a cursory review this should
in theory work, a few nitpicks:

> 
> Thanks,
> -Toshi

> From: Toshi Kani <toshi.kani@hpe.com>
> 
> Add support of PAT emulation that matches with the PAT MSR.
> 
> ---
>  arch/x86/mm/pat.c |   73 +++++++++++++++++++++++++++++++----------------------
>  1 file changed, 43 insertions(+), 30 deletions(-)
> 
> diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
> index 1ff8aa9..565a478 100644
> --- a/arch/x86/mm/pat.c
> +++ b/arch/x86/mm/pat.c
> @@ -40,7 +40,7 @@
>  static bool boot_cpu_done;
>  
>  static int __read_mostly __pat_enabled = IS_ENABLED(CONFIG_X86_PAT);
> -static void pat_disable_init(void);
> +static void pat_emu_init(void);
>  
>  void pat_disable(const char *reason)
>  {
> @@ -52,7 +52,7 @@ void pat_disable(const char *reason)
>  	__pat_enabled = 0;
>  	pr_info("x86/PAT: %s\n", reason);
>  
> -	pat_disable_init();
> +	pat_emu_init();
>  }
>  
>  static int __init nopat(char *str)
> @@ -239,40 +239,53 @@ static void pat_ap_init(u64 pat)
>  	wrmsrl(MSR_IA32_CR_PAT, pat);
>  }
>  
> -static void pat_disable_init(void)
> +static void pat_emu_init(void)
>  {
> -	u64 pat;
> -	static int disable_init_done;
> +	u64 pat = 0;
> +	static int emu_init_done;
>  
> -	if (disable_init_done)
> +	if (emu_init_done)
>  		return;
>  
> -	/*
> -	 * No PAT. Emulate the PAT table that corresponds to the two
> -	 * cache bits, PWT (Write Through) and PCD (Cache Disable). This
> -	 * setup is the same as the BIOS default setup when the system
> -	 * has PAT but the "nopat" boot option has been specified. This
> -	 * emulated PAT table is used when MSR_IA32_CR_PAT returns 0.
> -	 *
> -	 * PTE encoding:
> -	 *
> -	 *       PCD
> -	 *       |PWT  PAT
> -	 *       ||    slot
> -	 *       00    0    WB : _PAGE_CACHE_MODE_WB
> -	 *       01    1    WT : _PAGE_CACHE_MODE_WT
> -	 *       10    2    UC-: _PAGE_CACHE_MODE_UC_MINUS
> -	 *       11    3    UC : _PAGE_CACHE_MODE_UC
> -	 *
> -	 * NOTE: When WC or WP is used, it is redirected to UC- per
> -	 * the default setup in __cachemode2pte_tbl[].
> -	 */
> -	pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
> -	      PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
> +	if (cpu_has_pat) {

First, this is not emulation, to be clear.

> +		/*
> +		 * CPU supports PAT. Initialize the PAT table to match with
> +		 * the PAT MSR value. This setup is used by "nopat" boot

Did you mean "nomtrr" option ? If not why would "nopat" land you here and if
"nopat" was used and you ended up here why would we want to go ahead and
read the MSR to keep the PAT set up ?

> +		 * option, or by virtual machine environments which do not
> +		 * support MTRRs but support PAT.

This might end up supporting PAT for some other virtual environments ;)

> +		 *
> +		 * If the MSR returns 0, it is considered invalid and emulate
> +		 * as No PAT.
> +		 */
> +		rdmsrl(MSR_IA32_CR_PAT, pat);
> +	}
> +
> +	if (!pat) {
> +		/*
> +		 * No PAT. Emulate the PAT table that corresponds to the two
> +		 * cache bits, PWT (Write Through) and PCD (Cache Disable).
> +		 * This setup is also the same as the BIOS default setup.
> +		 *
> +		 * PTE encoding:
> +		 *
> +		 *       PCD
> +		 *       |PWT  PAT
> +		 *       ||    slot
> +		 *       00    0    WB : _PAGE_CACHE_MODE_WB
> +		 *       01    1    WT : _PAGE_CACHE_MODE_WT
> +		 *       10    2    UC-: _PAGE_CACHE_MODE_UC_MINUS
> +		 *       11    3    UC : _PAGE_CACHE_MODE_UC
> +		 *
> +		 * NOTE: When WC or WP is used, it is redirected to UC- per
> +		 * the default setup in __cachemode2pte_tbl[].
> +		 */
> +		pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
> +		      PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
> +	}

a) pat_emu_init() -- this should probably be renamed to something that includes
   not just emulation as for virtual environments that's not true and using
   emulation is very misleading for Xen.

b) Will this mean that then you can support bare metal with no MTRR but with PAT
   enabled too?

c) Why not just split this up to enable PAT init to be a first class citizen ?

  Luis

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/2] x86/mtrr: Refactor PAT initialization code
  2016-03-15 23:48                 ` Toshi Kani
@ 2016-03-15 23:29                   ` Luis R. Rodriguez
  2016-03-17 21:56                     ` Toshi Kani
  0 siblings, 1 reply; 23+ messages in thread
From: Luis R. Rodriguez @ 2016-03-15 23:29 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Luis R. Rodriguez, Juergen Gross, Boris Ostrovsky, Matt Fleming,
	Olof Johansson, Paul Stewart, Borislav Petkov, Ingo Molnar,
	H. Peter Anvin, Thomas Gleixner, Paul Gortmaker, X86 ML,
	linux-kernel, xen-devel

On Tue, Mar 15, 2016 at 05:48:44PM -0600, Toshi Kani wrote:
> On Tue, 2016-03-15 at 01:15 +0100, Luis R. Rodriguez wrote:
> > On Fri, Mar 11, 2016 at 06:16:36PM -0700, Toshi Kani wrote:
> > > On Fri, 2016-03-11 at 15:34 -0800, Luis R. Rodriguez wrote:
> > > > On Fri, Mar 11, 2016 at 3:56 PM, Toshi Kani <toshi.kani@hpe.com>
> > > > wrote:
> > > > > On Fri, 2016-03-11 at 23:17 +0100, Luis R. Rodriguez wrote:
> > > > > > On Fri, Mar 11, 2016 at 11:57:12AM -0700, Toshi Kani wrote:
>  :
> > > > > > What I'm after is seeing if we can ultimately disable MTRR on
> > > > > > kernel code but still have PAT enabled. I realize you've
> > > > > > mentioned BIOS code may use some MTRR setup code but this is only
> > > > > > true for some systems. I know for a fact Xen cannot use MTRR, it
> > > > > > seems qemu32 does not enable it either. So why not have the
> > > > > > ability to skip through its set up ?
> > > > > 
> > > > > MTRR support has two meanings:
> > > > >  1) The kernel keeps the MTRR setup by BIOS.
> > > > >  2) The kernel modifies the MTRR setup.
> > > > > 
> > > > > I am in a position that we need 1) but 2).
> > > > 
> > > > I take it you meant "but not 2)" ? 
> > > 
> > > Yes. :)
> > 
> > OK -- we are in agreement but we know 1) is only needed for a portion of
> > systems: Xen and qemu32 systems fly with no MTRR set up, and as such it
> > would be incorrect to run MTRR code on such systems. To these systems
> > MTRR functionality code should be dead, since PAT currently depends on
> > MTRR PAT should also be dead but as the report you're fixing shows
> > it wasn't. That's an issue for qemu that uses the regular x86 init path
> > but not for Xen. Its different for Xen as the hypervisor is the one that
> > set up the MSR_IA32_CR_PAT for each CPU. The *only* thing Xen does is:
> 
> MTRR code does not have to be dead for the virtual machines with no MTRR
> support.  The code just needs to handle the disabled case properly.  I
> consider this is part of 1) that the kernel keeps the MTRR state as
> disabled.

For Xen it was possible to use PAT without any of the MTRR code running,
I don't see why its needed then and why other virtual machines that
only need PAT need it.

> > void xen_start_kernel(void)
> > {
> > 	...
> >         rdmsrl(MSR_IA32_CR_PAT, pat);                                    
> >        
> >         pat_init_cache_modes(pat); 
> > 	...
> > }
> > 
> > Fortunately we only have to call pat_init_cache_modes() once, its
> > not per-CPU. Xen has shown then that you *can* live with PAT without
> > any of the complex MTRR setup / code. Please add to your table the
> > Xen case as well then as its important to consider. If you make it
> > a strong requirement to have MTRR enabled to enable PAT you'd
> > be disabling PAT on Xen guest boots.
> 
> Yes, I understand that the original fix will break Xen.  Thanks for
> pointing this out!  In the next version (which is the same approach as the
> two additional patches I sent you yesterday), I am going to integrate the
> Xen's use-case into the framework.  xen_start_kernel() will no longer need
> to call pat_init_cache_modes() as a result.  So, please review the next
> version, and let me know if there is any issue.

Sure.

> > As-is then your this patch which calls pat_disable() on mtrr_bp_init()
> > for the case where MTRR is disabled would essentially break PAT on Xen
> > guests, so this cannot be done. It is no longer true that if MTRR is
> > disabled you can force disable PAT. To do what you want you want to do we
> > have to consider Xen.
> > 
> > I don't think its a good idea to keep PAT initialization meshed together
> > with MTRR and making it a strong requirement on enabling PAT. The MTRR
> > code is extremely complex. I'd like instead to encourage for us to
> > consider for this situation to let PAT become a first class citizen,
> > if MTRR is disabled but you've enabled PAT you should be able to use
> > it, just as Xen does. There are more reasons to enable such setup than
> > not to. Long term I'm advocating to see if we can get an ACPI legacy
> > MTRR flag that can tell us if the BIOS has MTRR ripped out, then we
> > can at run time also take advantage of ignoring PAT completely as well.
> > 
> > Note, if you insisted you didn't want to disable PAT on Xen, you could
> > in theory check for the subarch -- but note that the subarch is unused
> > yet on Xen, even though it was added to the x86 boot protocol years ago.
> > I have a slew of patches to make use of it to help put paravirt_enabled()
> > in the grave, but based on discussions with Ingo, we don't want to spread
> > use of the subarch in random x86 code paths, we want to compartamentalize
> > that. If you still want to follow your approach of just force-disabling
> > PAT on MTRR code if MTRR was disabled you'd have to use semantics to
> > figure out if the boot path came from Xen, to be more specific for Xen
> > PV guest types only... The current agreed approach to avoid directly
> > using subarch is to categorize differences between what some guests
> > need and bare metal under an x86 platform quirk and legacy set of
> > components.
> >
> > On the x86 init path we'd call something check for the subarch and based
> > on that set a series of x86 legacy features / quirks that need to be
> > disabled / enabled. We could add MTRR as one. I'm unifying some of this
> > with a bit of what goes into the ACPI IA-PC boot architecture, see
> > section 5.2.9.3 IA-PC Boot Architecture Flags [0]. In particular the
> > paravirt_enabled() series I'm working on happens to also dabble into
> > the no CMOS RTC case for Xen, generalizing this knocks a bit of birds
> > with one stone. I think we can do the same with MTRR but also be
> > proactive and see if we can get ACPI_FADT_NO_MTRR added as well for
> > a future ACPI spec to enable BIOS manufacturers to rip MTRR out.
> 
> We do not need subarch for this case since presence of the MTRR feature can
> be tested with CPUID and MSR.

But in this case its about two different cases:

No MTRR - but you need to emulate PAT
No MTRR - but you can just read what the hypervisor already set up

So it a given MTRR is disabled for both, what we need then is semantics
to distinguish between qemu32 and Xen PV. Curious to see what you use,
in your current new patch it was not clear what you did.

> > [0] http://www.acpi.info/DOWNLOADS/ACPIspec50.pdf 
> > 
> > /* Masks for FADT IA-PC Boot Architecture Flags (boot_flags)
> > [Vx]=Introduced in this FADT revision */
> > #define ACPI_FADT_LEGACY_DEVICES    (1)         /* 00: [V2] System has
> > LPC or ISA bus devices */
> > #define ACPI_FADT_8042              (1<<1)      /* 01: [V3] System has an
> > 8042 controller on port 60/64 */
> > #define ACPI_FADT_NO_VGA            (1<<2)      /* 02: [V4] It is not
> > safe to probe for VGA hardware */
> > #define ACPI_FADT_NO_MSI            (1<<3)      /* 03: [V4] Message
> > Signaled Interrupts (MSI) must not be enabled */
> > #define ACPI_FADT_NO_ASPM           (1<<4)      /* 04: [V4] PCIe ASPM
> > control must not be enabled */
> > #define ACPI_FADT_NO_CMOS_RTC       (1<<5)      /* 05: [V5] No CMOS real-
> > time clock present */
> > 
> > > > There *are folks however who do
> > > > more as I noted earlier. Perhaps not now, but in the future I'd
> > > > encourage folks to rip MTRR out of their own BIOS, and enable a new
> > > > ACPI legacy flag to say "MTRR required". That'd eventually can help
> > > > bury MTRR for good while remaining backward compatible.
> > > 
> > > Well, BIOS using MTRR is better than BIOS setting page tables in the
> > > SMI handler.
> > 
> > Can some BIOSes be developed without MTRR? For instance I suspect Google
> > might be able to easily pull of ripping MTRR out of their BIOS if they
> > didn't do it already for the ChromeOS devices. If possible not only
> > should it help with removing complexity on the BIOS but not even having
> > to think about that code *ever* running on the kernel at all should be
> > nice.
> 
> AFAIK, MTRR is the only way to specify UC attribute in physical mode on
> x86.  On ia64, one can simply set the UC bit to a physical address to
> specify UC attribute.  I wish we had something similar.

Whoa, you mean on the BIOS? This is BIG deal if true. Why haven't you just
said this a long time ago?

On x86 Linux code we now have ioremap_uc() that can't use MTRR behind the
scenes, why would something like this on the BIOS not be possible? That
ultimately uses set_pte_at(). What limitations are there on the BIOS
that prevent us from just using strong UC for PAT on the BIOS?

> > > The kernel can be ignorant of the MTRR setup as long as it does
> > > not modify it.
> > 
> > Sure, we're already there. The kernel no longer modifies the
> > MTRR setup unless of course you boot without PAT enabled. I think
> > we need to move beyond that to ACPI if we can to let regular
> > Linux boots never have to deal with MTRR at all. The code is
> > complex and nasty why not put let folks put a nail on the coffin for
> > good?
> 
> I think we are good as long as we do not modify it.  The complexity comes
> with the modification.

Ew. I look at that code and cannot comprehend why we'd ever want to keep
it always running.

> > > > I can read the above description to also say:
> > > > 
> > > > "Hey you need to implement PAT with the same skeleton code as MTRR"
> > > 
> > > No, I did not say that.  MTRR's rendezvous handler can be generalized
> > > to work with both MTRR and PAT.  We do not need two separate handlers.
> > >  In fact, it needs to be a single handler so that both can be
> > > initialized together.
> > 
> > I'm not sure if that's really needed. Doesn't PAT just require setting
> > the wrmsrl(MSR_IA32_CR_PAT, pat) for each AP?
> 
> No, it requires the same procedure as MTRR.

MTRR has a bunch of junk that is outside of the scope of the generic
procedure which I'd hope we can skip.

> > > > If we do that, we can pave the way to deprecate MTRR as legacy for
> > > > good first on Linux.
> > > 
> > > I do not think such change will deprecate MTRR.
> > 
> > Not even for shiny new BIOSes? Post ACPI 5?
> 
> Nope.

Hrm...

> > >  It just means that Linux can enable PAT on virtual CPUs with PAT &
> > > !MTRR capability.
> > > 
> > > > > In fact, the kernel disabling MTRRs is the same as 2).
> > > > > 
> > > > > > I'll also note Xen managed to enable PAT only without enabling
> > > > > > MTRR, this was done through pat_init_cache_modes() -- not sure if
> > > > > > this can be leveraged for qemu32...
> > > > > 
> > > > > I am interested to know how Xen managed this.  Is this done by the
> > > > > Xen hypervisor initializes guest's PAT on behalf of the guest
> > > > > kernel?
> > > > 
> > > > Yup. And the cache read thingy was reading back its own setup, which
> > > > was different than what Linux used by default IIRC. Juergen can
> > > > elaborate more.
> > > 
> > > Yeah, I'd like to make sure that my changes won't break it.
> > 
> > I checked through code inspection and indeed, it seems it would break
> > Xen's PAT setup.
> > 
> > For the record: the issue here was code that should not run ran, that
> > is dead code ran. I'm working towards a generic solution for this.
> 
> Please review the next version.

Sure..

  Luis

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/2] x86/mtrr: Refactor PAT initialization code
  2016-03-15  0:15               ` [PATCH 2/2] x86/mtrr: Refactor PAT initialization code Luis R. Rodriguez
@ 2016-03-15 23:48                 ` Toshi Kani
  2016-03-15 23:29                   ` Luis R. Rodriguez
  0 siblings, 1 reply; 23+ messages in thread
From: Toshi Kani @ 2016-03-15 23:48 UTC (permalink / raw)
  To: Luis R. Rodriguez, Juergen Gross, Boris Ostrovsky, Matt Fleming,
	Olof Johansson, Paul Stewart
  Cc: Borislav Petkov, Ingo Molnar, H. Peter Anvin, Thomas Gleixner,
	Paul Gortmaker, X86 ML, linux-kernel, xen-devel

On Tue, 2016-03-15 at 01:15 +0100, Luis R. Rodriguez wrote:
> On Fri, Mar 11, 2016 at 06:16:36PM -0700, Toshi Kani wrote:
> > On Fri, 2016-03-11 at 15:34 -0800, Luis R. Rodriguez wrote:
> > > On Fri, Mar 11, 2016 at 3:56 PM, Toshi Kani <toshi.kani@hpe.com>
> > > wrote:
> > > > On Fri, 2016-03-11 at 23:17 +0100, Luis R. Rodriguez wrote:
> > > > > On Fri, Mar 11, 2016 at 11:57:12AM -0700, Toshi Kani wrote:
 :
> > > > > What I'm after is seeing if we can ultimately disable MTRR on
> > > > > kernel code but still have PAT enabled. I realize you've
> > > > > mentioned BIOS code may use some MTRR setup code but this is only
> > > > > true for some systems. I know for a fact Xen cannot use MTRR, it
> > > > > seems qemu32 does not enable it either. So why not have the
> > > > > ability to skip through its set up ?
> > > > 
> > > > MTRR support has two meanings:
> > > >  1) The kernel keeps the MTRR setup by BIOS.
> > > >  2) The kernel modifies the MTRR setup.
> > > > 
> > > > I am in a position that we need 1) but 2).
> > > 
> > > I take it you meant "but not 2)" ? 
> > 
> > Yes. :)
> 
> OK -- we are in agreement but we know 1) is only needed for a portion of
> systems: Xen and qemu32 systems fly with no MTRR set up, and as such it
> would be incorrect to run MTRR code on such systems. To these systems
> MTRR functionality code should be dead, since PAT currently depends on
> MTRR PAT should also be dead but as the report you're fixing shows
> it wasn't. That's an issue for qemu that uses the regular x86 init path
> but not for Xen. Its different for Xen as the hypervisor is the one that
> set up the MSR_IA32_CR_PAT for each CPU. The *only* thing Xen does is:

MTRR code does not have to be dead for the virtual machines with no MTRR
support.  The code just needs to handle the disabled case properly.  I
consider this is part of 1) that the kernel keeps the MTRR state as
disabled.

> void xen_start_kernel(void)
> {
> 	...
>         rdmsrl(MSR_IA32_CR_PAT, pat);                                    
>        
>         pat_init_cache_modes(pat); 
> 	...
> }
> 
> Fortunately we only have to call pat_init_cache_modes() once, its
> not per-CPU. Xen has shown then that you *can* live with PAT without
> any of the complex MTRR setup / code. Please add to your table the
> Xen case as well then as its important to consider. If you make it
> a strong requirement to have MTRR enabled to enable PAT you'd
> be disabling PAT on Xen guest boots.

Yes, I understand that the original fix will break Xen.  Thanks for
pointing this out!  In the next version (which is the same approach as the
two additional patches I sent you yesterday), I am going to integrate the
Xen's use-case into the framework.  xen_start_kernel() will no longer need
to call pat_init_cache_modes() as a result.  So, please review the next
version, and let me know if there is any issue.

> As-is then your this patch which calls pat_disable() on mtrr_bp_init()
> for the case where MTRR is disabled would essentially break PAT on Xen
> guests, so this cannot be done. It is no longer true that if MTRR is
> disabled you can force disable PAT. To do what you want you want to do we
> have to consider Xen.
> 
> I don't think its a good idea to keep PAT initialization meshed together
> with MTRR and making it a strong requirement on enabling PAT. The MTRR
> code is extremely complex. I'd like instead to encourage for us to
> consider for this situation to let PAT become a first class citizen,
> if MTRR is disabled but you've enabled PAT you should be able to use
> it, just as Xen does. There are more reasons to enable such setup than
> not to. Long term I'm advocating to see if we can get an ACPI legacy
> MTRR flag that can tell us if the BIOS has MTRR ripped out, then we
> can at run time also take advantage of ignoring PAT completely as well.
> 
> Note, if you insisted you didn't want to disable PAT on Xen, you could
> in theory check for the subarch -- but note that the subarch is unused
> yet on Xen, even though it was added to the x86 boot protocol years ago.
> I have a slew of patches to make use of it to help put paravirt_enabled()
> in the grave, but based on discussions with Ingo, we don't want to spread
> use of the subarch in random x86 code paths, we want to compartamentalize
> that. If you still want to follow your approach of just force-disabling
> PAT on MTRR code if MTRR was disabled you'd have to use semantics to
> figure out if the boot path came from Xen, to be more specific for Xen
> PV guest types only... The current agreed approach to avoid directly
> using subarch is to categorize differences between what some guests
> need and bare metal under an x86 platform quirk and legacy set of
> components.
>
> On the x86 init path we'd call something check for the subarch and based
> on that set a series of x86 legacy features / quirks that need to be
> disabled / enabled. We could add MTRR as one. I'm unifying some of this
> with a bit of what goes into the ACPI IA-PC boot architecture, see
> section 5.2.9.3 IA-PC Boot Architecture Flags [0]. In particular the
> paravirt_enabled() series I'm working on happens to also dabble into
> the no CMOS RTC case for Xen, generalizing this knocks a bit of birds
> with one stone. I think we can do the same with MTRR but also be
> proactive and see if we can get ACPI_FADT_NO_MTRR added as well for
> a future ACPI spec to enable BIOS manufacturers to rip MTRR out.

We do not need subarch for this case since presence of the MTRR feature can
be tested with CPUID and MSR.

> [0] http://www.acpi.info/DOWNLOADS/ACPIspec50.pdf 
> 
> /* Masks for FADT IA-PC Boot Architecture Flags (boot_flags)
> [Vx]=Introduced in this FADT revision */
> #define ACPI_FADT_LEGACY_DEVICES    (1)         /* 00: [V2] System has
> LPC or ISA bus devices */
> #define ACPI_FADT_8042              (1<<1)      /* 01: [V3] System has an
> 8042 controller on port 60/64 */
> #define ACPI_FADT_NO_VGA            (1<<2)      /* 02: [V4] It is not
> safe to probe for VGA hardware */
> #define ACPI_FADT_NO_MSI            (1<<3)      /* 03: [V4] Message
> Signaled Interrupts (MSI) must not be enabled */
> #define ACPI_FADT_NO_ASPM           (1<<4)      /* 04: [V4] PCIe ASPM
> control must not be enabled */
> #define ACPI_FADT_NO_CMOS_RTC       (1<<5)      /* 05: [V5] No CMOS real-
> time clock present */
> 
> > > There *are folks however who do
> > > more as I noted earlier. Perhaps not now, but in the future I'd
> > > encourage folks to rip MTRR out of their own BIOS, and enable a new
> > > ACPI legacy flag to say "MTRR required". That'd eventually can help
> > > bury MTRR for good while remaining backward compatible.
> > 
> > Well, BIOS using MTRR is better than BIOS setting page tables in the
> > SMI handler.
> 
> Can some BIOSes be developed without MTRR? For instance I suspect Google
> might be able to easily pull of ripping MTRR out of their BIOS if they
> didn't do it already for the ChromeOS devices. If possible not only
> should it help with removing complexity on the BIOS but not even having
> to think about that code *ever* running on the kernel at all should be
> nice.

AFAIK, MTRR is the only way to specify UC attribute in physical mode on
x86.  On ia64, one can simply set the UC bit to a physical address to
specify UC attribute.  I wish we had something similar.

> > The kernel can be ignorant of the MTRR setup as long as it does
> > not modify it.
> 
> Sure, we're already there. The kernel no longer modifies the
> MTRR setup unless of course you boot without PAT enabled. I think
> we need to move beyond that to ACPI if we can to let regular
> Linux boots never have to deal with MTRR at all. The code is
> complex and nasty why not put let folks put a nail on the coffin for
> good?

I think we are good as long as we do not modify it.  The complexity comes
with the modification.

> > > I can read the above description to also say:
> > > 
> > > "Hey you need to implement PAT with the same skeleton code as MTRR"
> > 
> > No, I did not say that.  MTRR's rendezvous handler can be generalized
> > to work with both MTRR and PAT.  We do not need two separate handlers.
> >  In fact, it needs to be a single handler so that both can be
> > initialized together.
> 
> I'm not sure if that's really needed. Doesn't PAT just require setting
> the wrmsrl(MSR_IA32_CR_PAT, pat) for each AP?

No, it requires the same procedure as MTRR.

> > > If we do that, we can pave the way to deprecate MTRR as legacy for
> > > good first on Linux.
> > 
> > I do not think such change will deprecate MTRR.
> 
> Not even for shiny new BIOSes? Post ACPI 5?

Nope.

> >  It just means that Linux can enable PAT on virtual CPUs with PAT &
> > !MTRR capability.
> > 
> > > > In fact, the kernel disabling MTRRs is the same as 2).
> > > > 
> > > > > I'll also note Xen managed to enable PAT only without enabling
> > > > > MTRR, this was done through pat_init_cache_modes() -- not sure if
> > > > > this can be leveraged for qemu32...
> > > > 
> > > > I am interested to know how Xen managed this.  Is this done by the
> > > > Xen hypervisor initializes guest's PAT on behalf of the guest
> > > > kernel?
> > > 
> > > Yup. And the cache read thingy was reading back its own setup, which
> > > was different than what Linux used by default IIRC. Juergen can
> > > elaborate more.
> > 
> > Yeah, I'd like to make sure that my changes won't break it.
> 
> I checked through code inspection and indeed, it seems it would break
> Xen's PAT setup.
> 
> For the record: the issue here was code that should not run ran, that
> is dead code ran. I'm working towards a generic solution for this.

Please review the next version.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/2] x86/mtrr: Refactor PAT initialization code
  2016-03-15 23:29                   ` Luis R. Rodriguez
@ 2016-03-17 21:56                     ` Toshi Kani
  2016-03-18  0:06                       ` Luis R. Rodriguez
  2016-04-09  2:04                       ` Luis R. Rodriguez
  0 siblings, 2 replies; 23+ messages in thread
From: Toshi Kani @ 2016-03-17 21:56 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Juergen Gross, Boris Ostrovsky, Matt Fleming, Olof Johansson,
	Paul Stewart, Borislav Petkov, Ingo Molnar, H. Peter Anvin,
	Thomas Gleixner, Paul Gortmaker, X86 ML, linux-kernel, xen-devel

On Wed, 2016-03-16 at 00:29 +0100, Luis R. Rodriguez wrote:
> On Tue, Mar 15, 2016 at 05:48:44PM -0600, Toshi Kani wrote:
> > On Tue, 2016-03-15 at 01:15 +0100, Luis R. Rodriguez wrote:
> > > On Fri, Mar 11, 2016 at 06:16:36PM -0700, Toshi Kani wrote:
> > > > On Fri, 2016-03-11 at 15:34 -0800, Luis R. Rodriguez wrote:
> > > > > On Fri, Mar 11, 2016 at 3:56 PM, Toshi Kani <toshi.kani@hpe.com>
> > > > > wrote:
> > > > > > On Fri, 2016-03-11 at 23:17 +0100, Luis R. Rodriguez wrote:
> > > > > > > On Fri, Mar 11, 2016 at 11:57:12AM -0700, Toshi Kani wrote:
> >  :
 :
> > MTRR code does not have to be dead for the virtual machines with no
> > MTRR support.  The code just needs to handle the disabled case
> > properly.  I consider this is part of 1) that the kernel keeps the MTRR
> > state as disabled.
> 
> For Xen it was possible to use PAT without any of the MTRR code running,
> I don't see why its needed then and why other virtual machines that
> only need PAT need it.

Virtual BIOS does not need MTRRs since it does not manage the platform.

> >
 :
> > We do not need subarch for this case since presence of the MTRR feature
> > can be tested with CPUID and MSR.
> 
> But in this case its about two different cases:
> 
> No MTRR - but you need to emulate PAT
> No MTRR - but you can just read what the hypervisor already set up
> 
> So it a given MTRR is disabled for both, what we need then is semantics
> to distinguish between qemu32 and Xen PV. Curious to see what you use,
> in your current new patch it was not clear what you did.

X86_FEATURE_PAT tells us if CPU supports PAT. So, the kernel can
distinguish the two cases above without knowing qemu32 or Xen.


> > > 
 :
> > AFAIK, MTRR is the only way to specify UC attribute in physical mode on
> > x86.  On ia64, one can simply set the UC bit to a physical address to
> > specify UC attribute.  I wish we had something similar.
> 
> Whoa, you mean on the BIOS? This is BIG deal if true. Why haven't you
> just said this a long time ago?

Yes, BIOS.  I think I've told you before when I mentioned that BIOS might
need to manage fan speed.  Virt BIOS does not need to do such thing.

> On x86 Linux code we now have ioremap_uc() that can't use MTRR behind the
> scenes, why would something like this on the BIOS not be possible? That
> ultimately uses set_pte_at(). What limitations are there on the BIOS
> that prevent us from just using strong UC for PAT on the BIOS?

Because it requires to run in virtual mode with page tables.

> > > > The kernel can be ignorant of the MTRR setup as long as it does
> > > > not modify it.
> > > 
> > > Sure, we're already there. The kernel no longer modifies the
> > > MTRR setup unless of course you boot without PAT enabled. I think
> > > we need to move beyond that to ACPI if we can to let regular
> > > Linux boots never have to deal with MTRR at all. The code is
> > > complex and nasty why not put let folks put a nail on the coffin for
> > > good?
> > 
> > I think we are good as long as we do not modify it.  The complexity
> > comes with the modification.
> 
> Ew. I look at that code and cannot comprehend why we'd ever want to keep
> it always running.

The code is basically no-op on Xen.

> > > > > I can read the above description to also say:
> > > > > 
> > > > > "Hey you need to implement PAT with the same skeleton code as
> > > > > MTRR"
> > > > 
> > > > No, I did not say that.  MTRR's rendezvous handler can be
> > > > generalized to work with both MTRR and PAT.  We do not need two
> > > > separate handlers.  In fact, it needs to be a single handler so
> > > > that both can be initialized together.
> > > 
> > > I'm not sure if that's really needed. Doesn't PAT just require
> > > setting the wrmsrl(MSR_IA32_CR_PAT, pat) for each AP?
> > 
> > No, it requires the same procedure as MTRR.
> 
> MTRR has a bunch of junk that is outside of the scope of the generic
> procedure which I'd hope we can skip.

We can skip the part that modifies MTRR setup. I think that is the part you
think is a junk.

> > > > > 
 :
> > > >  It just means that Linux can enable PAT on virtual CPUs with PAT &
> > > > !MTRR capability.
> > > > 
> > > > > > In fact, the kernel disabling MTRRs is the same as 2).
> > > > > > 
> > > > > > > I'll also note Xen managed to enable PAT only without
> > > > > > > enabling MTRR, this was done through pat_init_cache_modes()
> > > > > > > -- not sure if this can be leveraged for qemu32...
> > > > > > 
> > > > > > I am interested to know how Xen managed this.  Is this done by
> > > > > > the Xen hypervisor initializes guest's PAT on behalf of the
> > > > > > guest kernel?
> > > > > 
> > > > > Yup. And the cache read thingy was reading back its own setup,
> > > > > which was different than what Linux used by default IIRC. Juergen
> > > > > can elaborate more.
> > > > 
> > > > Yeah, I'd like to make sure that my changes won't break it.
> > > 
> > > I checked through code inspection and indeed, it seems it would break
> > > Xen's PAT setup.
> > > 
> > > For the record: the issue here was code that should not run ran, that
> > > is dead code ran. I'm working towards a generic solution for this.
> > 
> > Please review the next version.
> 
> Sure..

Thanks!
-Toshi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/2] x86/mtrr: Refactor PAT initialization code
  2016-03-17 21:56                     ` Toshi Kani
@ 2016-03-18  0:06                       ` Luis R. Rodriguez
  2016-03-18 21:35                         ` Toshi Kani
  2016-04-09  2:04                       ` Luis R. Rodriguez
  1 sibling, 1 reply; 23+ messages in thread
From: Luis R. Rodriguez @ 2016-03-18  0:06 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Boris Ostrovsky, xen-devel, Thomas Gleixner, Matt Fleming,
	Paul Gortmaker, Borislav Petkov, X86 ML, Paul Stewart,
	Ingo Molnar, Olof Johansson, linux-kernel, H. Peter Anvin,
	Juergen Gross

On Mar 17, 2016 2:04 PM, "Toshi Kani" <toshi.kani@hpe.com> wrote:
>
> On Wed, 2016-03-16 at 00:29 +0100, Luis R. Rodriguez wrote:
> > On Tue, Mar 15, 2016 at 05:48:44PM -0600, Toshi Kani wrote:
> > > On Tue, 2016-03-15 at 01:15 +0100, Luis R. Rodriguez wrote:
> > > > On Fri, Mar 11, 2016 at 06:16:36PM -0700, Toshi Kani wrote:
> > > > > On Fri, 2016-03-11 at 15:34 -0800, Luis R. Rodriguez wrote:
> > > > > > On Fri, Mar 11, 2016 at 3:56 PM, Toshi Kani <toshi.kani@hpe.com>
> > > > > > wrote:
> > > > > > > On Fri, 2016-03-11 at 23:17 +0100, Luis R. Rodriguez wrote:
> > > > > > > > On Fri, Mar 11, 2016 at 11:57:12AM -0700, Toshi Kani wrote:
> > >  :
>  :
> > > MTRR code does not have to be dead for the virtual machines with no
> > > MTRR support.  The code just needs to handle the disabled case
> > > properly.  I consider this is part of 1) that the kernel keeps the MTRR
> > > state as disabled.
> >
> > For Xen it was possible to use PAT without any of the MTRR code running,
> > I don't see why its needed then and why other virtual machines that
> > only need PAT need it.
>
> Virtual BIOS does not need MTRRs since it does not manage the platform.

Unless if in dom0 and if some of this purposely wants to be punted
there to leverage existing kernel code. On the Xen thread I'm asking
about the implications of that and how/if Xen should be doing. We can
follow up on this there as its Xen specific.

> > > We do not need subarch for this case since presence of the MTRR feature
> > > can be tested with CPUID and MSR.
> >
> > But in this case its about two different cases:
> >
> > No MTRR - but you need to emulate PAT
> > No MTRR - but you can just read what the hypervisor already set up
> >
> > So it a given MTRR is disabled for both, what we need then is semantics
> > to distinguish between qemu32 and Xen PV. Curious to see what you use,
> > in your current new patch it was not clear what you did.
>
> X86_FEATURE_PAT tells us if CPU supports PAT. So, the kernel can
> distinguish the two cases above without knowing qemu32 or Xen.

Ok indeed, neat.

> > > AFAIK, MTRR is the only way to specify UC attribute in physical mode on
> > > x86.  On ia64, one can simply set the UC bit to a physical address to
> > > specify UC attribute.  I wish we had something similar.
> >
> > Whoa, you mean on the BIOS? This is BIG deal if true. Why haven't you
> > just said this a long time ago?
>
> Yes, BIOS.  I think I've told you before when I mentioned that BIOS might
> need to manage fan speed.  Virt BIOS does not need to do such thing.

You had mentioned befor BIOS uses MTRR, not that *the BIOS can only
MTRR for UC*. These are two very different things, hence my surprise.

> > On x86 Linux code we now have ioremap_uc() that can't use MTRR behind the
> > scenes, why would something like this on the BIOS not be possible? That
> > ultimately uses set_pte_at(). What limitations are there on the BIOS
> > that prevent us from just using strong UC for PAT on the BIOS?
>
> Because it requires to run in virtual mode with page tables.

Ah... interesting... is UC really needed, what is the default? If the
default is used would there be an issue ? Can such work be deferred to
a later time ? It seems like a high burden to require on large piece
of legacy architecture to just blow a fan.

> > > > > The kernel can be ignorant of the MTRR setup as long as it does
> > > > > not modify it.
> > > >
> > > > Sure, we're already there. The kernel no longer modifies the
> > > > MTRR setup unless of course you boot without PAT enabled. I think
> > > > we need to move beyond that to ACPI if we can to let regular
> > > > Linux boots never have to deal with MTRR at all. The code is
> > > > complex and nasty why not put let folks put a nail on the coffin for
> > > > good?
> > >
> > > I think we are good as long as we do not modify it.  The complexity
> > > comes with the modification.
> >
> > Ew. I look at that code and cannot comprehend why we'd ever want to keep
> > it always running.
>
> The code is basically no-op on Xen.

Right, but if we can strive towards similar goals with more up to date
BIOSes as an option on bare metal why not.

> > > > > > I can read the above description to also say:
> > > > > >
> > > > > > "Hey you need to implement PAT with the same skeleton code as
> > > > > > MTRR"
> > > > >
> > > > > No, I did not say that.  MTRR's rendezvous handler can be
> > > > > generalized to work with both MTRR and PAT.  We do not need two
> > > > > separate handlers.  In fact, it needs to be a single handler so
> > > > > that both can be initialized together.
> > > >
> > > > I'm not sure if that's really needed. Doesn't PAT just require
> > > > setting the wrmsrl(MSR_IA32_CR_PAT, pat) for each AP?
> > >
> > > No, it requires the same procedure as MTRR.
> >
> > MTRR has a bunch of junk that is outside of the scope of the generic
> > procedure which I'd hope we can skip.
>
> We can skip the part that modifies MTRR setup. I think that is the part you
> think is a junk.

Sure, but the more we can avoid any of that code the better. For
example consider the cleanup patch to increase the chunk size, do we
even need the cleanup anymore ?

  Luis

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/2] x86/mtrr: Refactor PAT initialization code
  2016-03-18  0:06                       ` Luis R. Rodriguez
@ 2016-03-18 21:35                         ` Toshi Kani
  2016-03-29 17:14                           ` Luis R. Rodriguez
  0 siblings, 1 reply; 23+ messages in thread
From: Toshi Kani @ 2016-03-18 21:35 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Boris Ostrovsky, xen-devel, Thomas Gleixner, Matt Fleming,
	Paul Gortmaker, Borislav Petkov, X86 ML, Paul Stewart,
	Ingo Molnar, Olof Johansson, linux-kernel, H. Peter Anvin,
	Juergen Gross

On Thu, 2016-03-17 at 17:06 -0700, Luis R. Rodriguez wrote:
> On Mar 17, 2016 2:04 PM, "Toshi Kani" <toshi.kani@hpe.com> wrote:
> > 
> > On Wed, 2016-03-16 at 00:29 +0100, Luis R. Rodriguez wrote:
> > > On Tue, Mar 15, 2016 at 05:48:44PM -0600, Toshi Kani wrote:
> > > > On Tue, 2016-03-15 at 01:15 +0100, Luis R. Rodriguez wrote:
> > > > > On Fri, Mar 11, 2016 at 06:16:36PM -0700, Toshi Kani wrote:
> > > > > > 
> >  :
> > > > MTRR code does not have to be dead for the virtual machines with no
> > > > MTRR support.  The code just needs to handle the disabled case
> > > > properly.  I consider this is part of 1) that the kernel keeps the
> > > > MTRR state as disabled.
> > > 
> > > For Xen it was possible to use PAT without any of the MTRR code
> > > running, I don't see why its needed then and why other virtual
> > > machines that only need PAT need it.
> > 
> > Virtual BIOS does not need MTRRs since it does not manage the platform.
> 
> Unless if in dom0 and if some of this purposely wants to be punted
> there to leverage existing kernel code. On the Xen thread I'm asking
> about the implications of that and how/if Xen should be doing. We can
> follow up on this there as its Xen specific.

I do not see any issue for Xen, but sure, we can discuss about Xen in a
separate thread.


 :
> > > On x86 Linux code we now have ioremap_uc() that can't use MTRR behind
> > > the scenes, why would something like this on the BIOS not be
> > > possible? That ultimately uses set_pte_at(). What limitations are
> > > there on the BIOS that prevent us from just using strong UC for PAT
> > > on the BIOS?
> > 
> > Because it requires to run in virtual mode with page tables.
> 
> Ah... interesting... is UC really needed, what is the default? If the
> default is used would there be an issue ? Can such work be deferred to
> a later time ? It seems like a high burden to require on large piece
> of legacy architecture to just blow a fan.

The default cache attribute (i.e. ranges not covered by MTRRs) is specified
by the MTRR default type MSR.

 :
> > > > > > 
> > > MTRR has a bunch of junk that is outside of the scope of the generic
> > > procedure which I'd hope we can skip.
> > 
> > We can skip the part that modifies MTRR setup. I think that is the part
> > you think is a junk.
> 
> Sure, but the more we can avoid any of that code the better. For
> example consider the cleanup patch to increase the chunk size, do we
> even need the cleanup anymore ?

No, I do not think we need it now.  I think this cleanup was needed to
allocate more free slots to MTRRs.  We do not need to care about the number
of free slots as long as the kernel does not insert any new entry to MTRRs.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/2] x86/mtrr: Refactor PAT initialization code
  2016-03-18 21:35                         ` Toshi Kani
@ 2016-03-29 17:14                           ` Luis R. Rodriguez
  2016-03-29 21:46                             ` Toshi Kani
  0 siblings, 1 reply; 23+ messages in thread
From: Luis R. Rodriguez @ 2016-03-29 17:14 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Boris Ostrovsky, xen-devel, Thomas Gleixner, Matt Fleming,
	Paul Gortmaker, Borislav Petkov, X86 ML, Paul Stewart,
	Ingo Molnar, Olof Johansson, linux-kernel, H. Peter Anvin,
	Juergen Gross, Jan Beulich, Stuart Hayes, Yinghai Lu,
	Prarit Bhargava

On Fri, Mar 18, 2016 at 2:35 PM, Toshi Kani <toshi.kani@hpe.com> wrote:
> On Thu, 2016-03-17 at 17:06 -0700, Luis R. Rodriguez wrote:
>> On Mar 17, 2016 2:04 PM, "Toshi Kani" <toshi.kani@hpe.com> wrote:
>> >
>> > On Wed, 2016-03-16 at 00:29 +0100, Luis R. Rodriguez wrote:
>> > > On Tue, Mar 15, 2016 at 05:48:44PM -0600, Toshi Kani wrote:
>> > > > On Tue, 2016-03-15 at 01:15 +0100, Luis R. Rodriguez wrote:
>> > > > > On Fri, Mar 11, 2016 at 06:16:36PM -0700, Toshi Kani wrote:
>> > > > > >
>> >  :
>> > > > MTRR code does not have to be dead for the virtual machines with no
>> > > > MTRR support.  The code just needs to handle the disabled case
>> > > > properly.  I consider this is part of 1) that the kernel keeps the
>> > > > MTRR state as disabled.
>> > >
>> > > For Xen it was possible to use PAT without any of the MTRR code
>> > > running, I don't see why its needed then and why other virtual
>> > > machines that only need PAT need it.
>> >
>> > Virtual BIOS does not need MTRRs since it does not manage the platform.
>>
>> Unless if in dom0 and if some of this purposely wants to be punted
>> there to leverage existing kernel code. On the Xen thread I'm asking
>> about the implications of that and how/if Xen should be doing. We can
>> follow up on this there as its Xen specific.
>
> I do not see any issue for Xen, but sure, we can discuss about Xen in a
> separate thread.

While on point -- I'll just wanted to clarify that a while ago you had
hinted we needed to have Xen return a valid type with
mtrr_type_lookup(), I then also explained how Xen disables MTRR [0],
it was however unclear if you still believe mtrr_type_lookup() is
needed on the guest side. Jan had pointed out that the Xen Hypervisor
implements the XENPF_read_memtype hypercall. On the recent thread I
posted [1] I got into my review of the prospects of implementing
support for using this hypercall on Linux xen guests and issues and
concerns with it. Please feel free to follow up there and we can take
up the other items below here as they relate to bare metal.

[0] http://lkml.kernel.org/r/20150903235429.GZ8051@wotan.suse.de
[1] http://lkml.kernel.org/r/CAB=NE6UTp0T=rbOcAg88iPsfBJneY7O5-3c11VfFgAPiepoTNg@mail.gmail.com

>> > > On x86 Linux code we now have ioremap_uc() that can't use MTRR behind
>> > > the scenes, why would something like this on the BIOS not be
>> > > possible? That ultimately uses set_pte_at(). What limitations are
>> > > there on the BIOS that prevent us from just using strong UC for PAT
>> > > on the BIOS?
>> >
>> > Because it requires to run in virtual mode with page tables.
>>
>> Ah... interesting... is UC really needed, what is the default? If the
>> default is used would there be an issue ? Can such work be deferred to
>> a later time ? It seems like a high burden to require on large piece
>> of legacy architecture to just blow a fan.
>
> The default cache attribute (i.e. ranges not covered by MTRRs) is specified
> by the MTRR default type MSR.

Do we really need UC for the fan? What is the default for PAT? Can't
the same be used so that we way by default all ranges match what is
also the default by PAT? Would that really break fan control ? If we
have a match should't we be able to not have to worry about MTRRs at
all in-kernel even on bare metal?

Another option, which I've alluded to on the Xen thread is skipping
over the MTRR space from the e820 map. Is that not possible ? This
could be last resort... but which I'm hinting more for the Xen side of
things if we *really* need get_mtrr() on the Xen guest side of
things...

>> > > > > >
>> > > MTRR has a bunch of junk that is outside of the scope of the generic
>> > > procedure which I'd hope we can skip.
>> >
>> > We can skip the part that modifies MTRR setup. I think that is the part
>> > you think is a junk.
>>
>> Sure, but the more we can avoid any of that code the better. For
>> example consider the cleanup patch to increase the chunk size, do we
>> even need the cleanup anymore ?
>
> No, I do not think we need it now.  I think this cleanup was needed to
> allocate more free slots to MTRRs.  We do not need to care about the number
> of free slots as long as the kernel does not insert any new entry to MTRRs.

Beautiful, thanks.

 Luis

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/2] x86/mtrr: Refactor PAT initialization code
  2016-03-29 17:14                           ` Luis R. Rodriguez
@ 2016-03-29 21:46                             ` Toshi Kani
  2016-03-29 22:12                               ` Luis R. Rodriguez
  0 siblings, 1 reply; 23+ messages in thread
From: Toshi Kani @ 2016-03-29 21:46 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Boris Ostrovsky, xen-devel, Thomas Gleixner, Matt Fleming,
	Paul Gortmaker, Borislav Petkov, X86 ML, Paul Stewart,
	Ingo Molnar, Olof Johansson, linux-kernel, H. Peter Anvin,
	Juergen Gross, Jan Beulich, Stuart Hayes, Yinghai Lu,
	Prarit Bhargava

On Tue, 2016-03-29 at 10:14 -0700, Luis R. Rodriguez wrote:
> On Fri, Mar 18, 2016 at 2:35 PM, Toshi Kani <toshi.kani@hpe.com> wrote:
> > On Thu, 2016-03-17 at 17:06 -0700, Luis R. Rodriguez wrote:
> > > On Mar 17, 2016 2:04 PM, "Toshi Kani" <toshi.kani@hpe.com> wrote:
> > > > 
 :
> > 
> > I do not see any issue for Xen, but sure, we can discuss about Xen in a
> > separate thread.
> 
> While on point -- I'll just wanted to clarify that a while ago you had
> hinted we needed to have Xen return a valid type with
> mtrr_type_lookup(), I then also explained how Xen disables MTRR [0],
> it was however unclear if you still believe mtrr_type_lookup() is
> needed on the guest side. Jan had pointed out that the Xen Hypervisor
> implements the XENPF_read_memtype hypercall. On the recent thread I
> posted [1] I got into my review of the prospects of implementing
> support for using this hypercall on Linux xen guests and issues and
> concerns with it. Please feel free to follow up there and we can take
> up the other items below here as they relate to bare metal.

What I said in the email [0] was that "When MTRRs are enabled, the kernel
needs to check through mtrr_type_lookup()".

Since Xen guests have MTRRs disabled, this statement does not apply. It
returns MTRR_TYPE_INVALID when disabled, and this is fine for the guests.

> [0] http://lkml.kernel.org/r/20150903235429.GZ8051@wotan.suse.de
> [1] http://lkml.kernel.org/r/CAB=NE6UTp0T=rbOcAg88iPsfBJneY7O5-3c11VfFgAP
> iepoTNg@mail.gmail.com
> 
> > > > > On x86 Linux code we now have ioremap_uc() that can't use MTRR
> > > > > behind the scenes, why would something like this on the BIOS not
> > > > > be possible? That ultimately uses set_pte_at(). What limitations
> > > > > are there on the BIOS that prevent us from just using strong UC
> > > > > for PAT on the BIOS?
> > > > 
> > > > Because it requires to run in virtual mode with page tables.
> > > 
> > > Ah... interesting... is UC really needed, what is the default? If the
> > > default is used would there be an issue ? Can such work be deferred
> > > to a later time ? It seems like a high burden to require on large
> > > piece of legacy architecture to just blow a fan.
> > 
> > The default cache attribute (i.e. ranges not covered by MTRRs) is
> > specified by the MTRR default type MSR.
> 
> Do we really need UC for the fan? 

When you say "we", are you referring Xen guests?  Xen guests do not need to
control the fan, so they do not need UC set in MTRRs.

In general, yes, MMIO registers need UC when they need to be accessed.

> What is the default for PAT? 

There is no such thing as the default for PAT. 

> Can't
> the same be used so that we way by default all ranges match what is
> also the default by PAT? Would that really break fan control ? If we
> have a match should't we be able to not have to worry about MTRRs at
> all in-kernel even on bare metal?

We do not need to know about BIOS impl, such as fan control, etc.  The
point is that if BIOS sets MTRRs, then the kernel keeps their setup.  If
(virtual) BIOS does not enable MTRRs, the kernel keeps them disabled.  We
just need not to mess with the setup.

> Another option, which I've alluded to on the Xen thread is skipping
> over the MTRR space from the e820 map. Is that not possible ? This
> could be last resort... but which I'm hinting more for the Xen side of
> things if we *really* need get_mtrr() on the Xen guest side of
> things...

There is no MTRR space in the e820 map since they are MSRs.  Since Xen
guests disable MTRRs, I do not think you have any issue here...

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/2] x86/mtrr: Refactor PAT initialization code
  2016-03-29 21:46                             ` Toshi Kani
@ 2016-03-29 22:12                               ` Luis R. Rodriguez
  2016-03-30  0:16                                 ` Toshi Kani
  0 siblings, 1 reply; 23+ messages in thread
From: Luis R. Rodriguez @ 2016-03-29 22:12 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Boris Ostrovsky, xen-devel, Thomas Gleixner, Matt Fleming,
	Paul Gortmaker, Borislav Petkov, X86 ML, Paul Stewart,
	Ingo Molnar, Olof Johansson, linux-kernel, H. Peter Anvin,
	Juergen Gross, Jan Beulich, Stuart Hayes, Yinghai Lu,
	Prarit Bhargava

On Tue, Mar 29, 2016 at 2:46 PM, Toshi Kani <toshi.kani@hpe.com> wrote:
> On Tue, 2016-03-29 at 10:14 -0700, Luis R. Rodriguez wrote:
>> On Fri, Mar 18, 2016 at 2:35 PM, Toshi Kani <toshi.kani@hpe.com> wrote:
>> > On Thu, 2016-03-17 at 17:06 -0700, Luis R. Rodriguez wrote:
>> > > On Mar 17, 2016 2:04 PM, "Toshi Kani" <toshi.kani@hpe.com> wrote:
>> > > >
>  :
>> >
>> > I do not see any issue for Xen, but sure, we can discuss about Xen in a
>> > separate thread.
>>
>> While on point -- I'll just wanted to clarify that a while ago you had
>> hinted we needed to have Xen return a valid type with
>> mtrr_type_lookup(), I then also explained how Xen disables MTRR [0],
>> it was however unclear if you still believe mtrr_type_lookup() is
>> needed on the guest side. Jan had pointed out that the Xen Hypervisor
>> implements the XENPF_read_memtype hypercall. On the recent thread I
>> posted [1] I got into my review of the prospects of implementing
>> support for using this hypercall on Linux xen guests and issues and
>> concerns with it. Please feel free to follow up there and we can take
>> up the other items below here as they relate to bare metal.
>
> What I said in the email [0] was that "When MTRRs are enabled, the kernel
> needs to check through mtrr_type_lookup()".
>
> Since Xen guests have MTRRs disabled, this statement does not apply. It
> returns MTRR_TYPE_INVALID when disabled, and this is fine for the guests.

Ah, I see thanks for the clarification!

>> [0] http://lkml.kernel.org/r/20150903235429.GZ8051@wotan.suse.de
>> [1] http://lkml.kernel.org/r/CAB=NE6UTp0T=rbOcAg88iPsfBJneY7O5-3c11VfFgAP
>> iepoTNg@mail.gmail.com
>>
>> > > > > On x86 Linux code we now have ioremap_uc() that can't use MTRR
>> > > > > behind the scenes, why would something like this on the BIOS not
>> > > > > be possible? That ultimately uses set_pte_at(). What limitations
>> > > > > are there on the BIOS that prevent us from just using strong UC
>> > > > > for PAT on the BIOS?
>> > > >
>> > > > Because it requires to run in virtual mode with page tables.
>> > >
>> > > Ah... interesting... is UC really needed, what is the default? If the
>> > > default is used would there be an issue ? Can such work be deferred
>> > > to a later time ? It seems like a high burden to require on large
>> > > piece of legacy architecture to just blow a fan.
>> >
>> > The default cache attribute (i.e. ranges not covered by MTRRs) is
>> > specified by the MTRR default type MSR.
>>
>> Do we really need UC for the fan?
>
> When you say "we", are you referring Xen guests?  Xen guests do not need to
> control the fan, so they do not need UC set in MTRRs.
>
> In general, yes, MMIO registers need UC when they need to be accessed.

Curious, what does a BIOS do for fan control when MTRRs are disabled?

Also what if a BIOS just set MSR_MTRRdefType to uncachable only ?
Wouldn't that help simplify the BIOS when systems are known as not
wanting to deal with reading MTRRs on the kernel front, even if its
just to read the setup ?

I'm trying to determine exactly why a BIOS cannot simply enable use an
alternative for what it needs for fan control and let the kernel live
without any MTRR code at run time as an option. Although the
documentation says that the same "procedure" is needed for PAT setup,
I see it possible to split the skeleton of the code and have each
peace of code live separately and compartmentalized, they'd just have
respective calls on the skeleton of the procedure.

>> What is the default for PAT?
>
> There is no such thing as the default for PAT.
>
>> Can't
>> the same be used so that we way by default all ranges match what is
>> also the default by PAT? Would that really break fan control ? If we
>> have a match should't we be able to not have to worry about MTRRs at
>> all in-kernel even on bare metal?
>
> We do not need to know about BIOS impl, such as fan control, etc.  The
> point is that if BIOS sets MTRRs, then the kernel keeps their setup.

Right, if the kernel no longer uses it directly it seems like an
aweful lot of code to keep updating simply for a BIOS requirement, I'm
trying to see if we can have the option to live without this
requirement.

> If (virtual) BIOS does not enable MTRRs, the kernel keeps them disabled.  We
> just need not to mess with the setup.

Sure, thanks! I'm trying to see if we can have a similar option on bare metal.

>> Another option, which I've alluded to on the Xen thread is skipping
>> over the MTRR space from the e820 map. Is that not possible ? This
>> could be last resort... but which I'm hinting more for the Xen side of
>> things if we *really* need get_mtrr() on the Xen guest side of
>> things...
>
> There is no MTRR space in the e820 map since they are MSRs.  Since Xen
> guests disable MTRRs, I do not think you have any issue here...

Xen seems to clip the e820 map given to a guest in certain MTRR
conditions, see init_e820(), this calls
machine_specific_memory_setup() which later clips MTRR if
mtrr_top_of_ram(). This is an Intel check that trims the e820 map if
MTRRs were found to be enabled and the default MTRR is not write-back.
If returns the address of the first non write-back variable MTRR, it
uses clip_to_limit() to limit the exposed memory [0], notice how
clip_to_limit() is also used to generally limit exposed memory through
the opt_mem boot parameter as well. Its not exactly clear why that's
done, but this looks very similar to the Linux MTRR cleanup -- see
x86_get_mtrr_mem_range().

[0] http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/e820.c

  Luis

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/2] x86/mtrr: Refactor PAT initialization code
  2016-03-30  0:16                                 ` Toshi Kani
@ 2016-03-29 23:43                                   ` Luis R. Rodriguez
  2016-03-30  1:07                                     ` Toshi Kani
  0 siblings, 1 reply; 23+ messages in thread
From: Luis R. Rodriguez @ 2016-03-29 23:43 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Boris Ostrovsky, xen-devel, Thomas Gleixner, Matt Fleming,
	Paul Gortmaker, Borislav Petkov, X86 ML, Paul Stewart,
	Ingo Molnar, Olof Johansson, linux-kernel, H. Peter Anvin,
	Juergen Gross, Jan Beulich, Stuart Hayes, Yinghai Lu,
	Prarit Bhargava

On Tue, Mar 29, 2016 at 5:16 PM, Toshi Kani <toshi.kani@hpe.com> wrote:
> On Tue, 2016-03-29 at 15:12 -0700, Luis R. Rodriguez wrote:
>> On Tue, Mar 29, 2016 at 2:46 PM, Toshi Kani <toshi.kani@hpe.com> wrote:
>> > On Tue, 2016-03-29 at 10:14 -0700, Luis R. Rodriguez wrote:
>> > > On Fri, Mar 18, 2016 at 2:35 PM, Toshi Kani <toshi.kani@hpe.com>
>> > > wrote:
>  :
>> > >
>> > > Do we really need UC for the fan?
>> >
>> > When you say "we", are you referring Xen guests?  Xen guests do not
>> > need to control the fan, so they do not need UC set in MTRRs.
>> >
>> > In general, yes, MMIO registers need UC when they need to be accessed.
>>
>> Curious, what does a BIOS do for fan control when MTRRs are disabled?
>
> You mean, when the kernel modified the MTRR setup and disabled them.

Nope, but the below is good to know!

I meant to ask about the case where the option the lets a user go in a
muck with BIOS settings to disable MTRR e xists and the user disables
MTRR. What would happen for fan control in such situations? I'd
imagine such cases allow for a system to exist with proper fan
control, and allow the kernel to boot without having to deal with the
pesky MTRRs at all, while PAT lives on, no?

> BIOS
> would assume the original setup and still access the registers.  This may
> lead to undefined behavior and may result in a system crash.
>
>> Also what if a BIOS just set MSR_MTRRdefType to uncachable only ?
>
> Many BIOSes actually set the default type to UC.

Thanks, I asked as I saw my BIOS uses write-back by default. Good to
know there are different strategies.

> MTRRs then cover regular memory with WB.

When you say regular memory you mean everything else we see as RAM? I
was under the impression we'd only need MTRR for a special range of
memory, and its up to implementation how they are used. If you can use
MTRR to change the cache attribute for regular RAM and if this is
actually a requirement if the default MTRR is UC then one way or
another a BIOS seems to always require MTRR, either for UC setting for
fan control or WB for regular RAM, is that right?

>> Wouldn't that help simplify the BIOS when systems are known as not
>> wanting to deal with reading MTRRs on the kernel front, even if its
>> just to read the setup ?
>
> Nope.
>
>> I'm trying to determine exactly why a BIOS cannot simply enable use an
>> alternative for what it needs for fan control and let the kernel live
>> without any MTRR code at run time as an option. Although the
>> documentation says that the same "procedure" is needed for PAT setup,
>> I see it possible to split the skeleton of the code and have each
>> peace of code live separately and compartmentalized, they'd just have
>> respective calls on the skeleton of the procedure.
>
> I agree that the MTRR rendezvous handler can be improved for PAT, but I do
> not see a compelling reason to make such change now.  With my fix, I think
> the code works reasonably for Xen.

Agreed, don't think its needed now, my questions are for future optimizations.

>> > > What is the default for PAT?
>> >
>> > There is no such thing as the default for PAT.
>> >
>> > > Can't
>> > > the same be used so that we way by default all ranges match what is
>> > > also the default by PAT? Would that really break fan control ? If we
>> > > have a match should't we be able to not have to worry about MTRRs at
>> > > all in-kernel even on bare metal?
>> >
>> > We do not need to know about BIOS impl, such as fan control, etc.  The
>> > point is that if BIOS sets MTRRs, then the kernel keeps their setup.
>>
>> Right, if the kernel no longer uses it directly it seems like an
>> aweful lot of code to keep updating simply for a BIOS requirement, I'm
>> trying to see if we can have the option to live without this
>> requirement.
>
> Please be aware of the hibernation case. I think this procedure involves
> setting MTRRs back to the original setup.

Eek, right, so best just disable them if we can.

>> > If (virtual) BIOS does not enable MTRRs, the kernel keeps them
>> > disabled.  We just need not to mess with the setup.
>>
>> Sure, thanks! I'm trying to see if we can have a similar option on bare
>> metal.
>>
>> > > Another option, which I've alluded to on the Xen thread is skipping
>> > > over the MTRR space from the e820 map. Is that not possible ? This
>> > > could be last resort... but which I'm hinting more for the Xen side
>> > > of things if we *really* need get_mtrr() on the Xen guest side of
>> > > things...
>> >
>> > There is no MTRR space in the e820 map since they are MSRs.  Since Xen
>> > guests disable MTRRs, I do not think you have any issue here...
>>
>> Xen seems to clip the e820 map given to a guest in certain MTRR
>> conditions, see init_e820(), this calls
>> machine_specific_memory_setup() which later clips MTRR if
>> mtrr_top_of_ram(). This is an Intel check that trims the e820 map if
>> MTRRs were found to be enabled and the default MTRR is not write-back.
>> If returns the address of the first non write-back variable MTRR, it
>> uses clip_to_limit() to limit the exposed memory [0], notice how
>> clip_to_limit() is also used to generally limit exposed memory through
>> the opt_mem boot parameter as well. Its not exactly clear why that's
>> done, but this looks very similar to the Linux MTRR cleanup -- see
>> x86_get_mtrr_mem_range().
>>
>> [0] http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/e820.c
>
> It looks to me that the code makes sure all E820_RAM ranges in the e820
> table are covered by WB entries of MTRRs.  If not, it trims the e820 table.
>
> I suppose it tries to react on a case when someone modified MTRRs and
> resulted in mismatch with the e820 table.  I'd think you do not need this
> code as long as you do not modify the MTRR setup.

Great thanks for that -- another optimization possible.

  Luis

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/2] x86/mtrr: Refactor PAT initialization code
  2016-03-29 22:12                               ` Luis R. Rodriguez
@ 2016-03-30  0:16                                 ` Toshi Kani
  2016-03-29 23:43                                   ` Luis R. Rodriguez
  0 siblings, 1 reply; 23+ messages in thread
From: Toshi Kani @ 2016-03-30  0:16 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Boris Ostrovsky, xen-devel, Thomas Gleixner, Matt Fleming,
	Paul Gortmaker, Borislav Petkov, X86 ML, Paul Stewart,
	Ingo Molnar, Olof Johansson, linux-kernel, H. Peter Anvin,
	Juergen Gross, Jan Beulich, Stuart Hayes, Yinghai Lu,
	Prarit Bhargava

On Tue, 2016-03-29 at 15:12 -0700, Luis R. Rodriguez wrote:
> On Tue, Mar 29, 2016 at 2:46 PM, Toshi Kani <toshi.kani@hpe.com> wrote:
> > On Tue, 2016-03-29 at 10:14 -0700, Luis R. Rodriguez wrote:
> > > On Fri, Mar 18, 2016 at 2:35 PM, Toshi Kani <toshi.kani@hpe.com>
> > > wrote:
 :
> > > 
> > > Do we really need UC for the fan?
> > 
> > When you say "we", are you referring Xen guests?  Xen guests do not
> > need to control the fan, so they do not need UC set in MTRRs.
> > 
> > In general, yes, MMIO registers need UC when they need to be accessed.
> 
> Curious, what does a BIOS do for fan control when MTRRs are disabled?

You mean, when the kernel modified the MTRR setup and disabled them.  BIOS
would assume the original setup and still access the registers.  This may
lead to undefined behavior and may result in a system crash.

> Also what if a BIOS just set MSR_MTRRdefType to uncachable only ?

Many BIOSes actually set the default type to UC.  MTRRs then cover regular
memory with WB.

> Wouldn't that help simplify the BIOS when systems are known as not
> wanting to deal with reading MTRRs on the kernel front, even if its
> just to read the setup ?

Nope.

> I'm trying to determine exactly why a BIOS cannot simply enable use an
> alternative for what it needs for fan control and let the kernel live
> without any MTRR code at run time as an option. Although the
> documentation says that the same "procedure" is needed for PAT setup,
> I see it possible to split the skeleton of the code and have each
> peace of code live separately and compartmentalized, they'd just have
> respective calls on the skeleton of the procedure.

I agree that the MTRR rendezvous handler can be improved for PAT, but I do
not see a compelling reason to make such change now.  With my fix, I think
the code works reasonably for Xen.

> > > What is the default for PAT?
> > 
> > There is no such thing as the default for PAT.
> > 
> > > Can't
> > > the same be used so that we way by default all ranges match what is
> > > also the default by PAT? Would that really break fan control ? If we
> > > have a match should't we be able to not have to worry about MTRRs at
> > > all in-kernel even on bare metal?
> > 
> > We do not need to know about BIOS impl, such as fan control, etc.  The
> > point is that if BIOS sets MTRRs, then the kernel keeps their setup.
> 
> Right, if the kernel no longer uses it directly it seems like an
> aweful lot of code to keep updating simply for a BIOS requirement, I'm
> trying to see if we can have the option to live without this
> requirement.

Please be aware of the hibernation case. I think this procedure involves
setting MTRRs back to the original setup.

> > If (virtual) BIOS does not enable MTRRs, the kernel keeps them
> > disabled.  We just need not to mess with the setup.
> 
> Sure, thanks! I'm trying to see if we can have a similar option on bare
> metal.
> 
> > > Another option, which I've alluded to on the Xen thread is skipping
> > > over the MTRR space from the e820 map. Is that not possible ? This
> > > could be last resort... but which I'm hinting more for the Xen side
> > > of things if we *really* need get_mtrr() on the Xen guest side of
> > > things...
> > 
> > There is no MTRR space in the e820 map since they are MSRs.  Since Xen
> > guests disable MTRRs, I do not think you have any issue here...
> 
> Xen seems to clip the e820 map given to a guest in certain MTRR
> conditions, see init_e820(), this calls
> machine_specific_memory_setup() which later clips MTRR if
> mtrr_top_of_ram(). This is an Intel check that trims the e820 map if
> MTRRs were found to be enabled and the default MTRR is not write-back.
> If returns the address of the first non write-back variable MTRR, it
> uses clip_to_limit() to limit the exposed memory [0], notice how
> clip_to_limit() is also used to generally limit exposed memory through
> the opt_mem boot parameter as well. Its not exactly clear why that's
> done, but this looks very similar to the Linux MTRR cleanup -- see
> x86_get_mtrr_mem_range().
> 
> [0] http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/e820.c

It looks to me that the code makes sure all E820_RAM ranges in the e820
table are covered by WB entries of MTRRs.  If not, it trims the e820 table.

I suppose it tries to react on a case when someone modified MTRRs and
resulted in mismatch with the e820 table.  I'd think you do not need this
code as long as you do not modify the MTRR setup.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/2] x86/mtrr: Refactor PAT initialization code
  2016-03-30  1:07                                     ` Toshi Kani
@ 2016-03-30  0:34                                       ` Luis R. Rodriguez
  0 siblings, 0 replies; 23+ messages in thread
From: Luis R. Rodriguez @ 2016-03-30  0:34 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Boris Ostrovsky, xen-devel, Thomas Gleixner, Matt Fleming,
	Paul Gortmaker, Borislav Petkov, X86 ML, Paul Stewart,
	Ingo Molnar, Olof Johansson, linux-kernel, H. Peter Anvin,
	Juergen Gross, Jan Beulich, Stuart Hayes, Yinghai Lu,
	Prarit Bhargava

On Tue, Mar 29, 2016 at 6:07 PM, Toshi Kani <toshi.kani@hpe.com> wrote:
> On Tue, 2016-03-29 at 16:43 -0700, Luis R. Rodriguez wrote:
>> I meant to ask about the case where the option the lets a user go in a
>> muck with BIOS settings to disable MTRR e xists and the user disables
>> MTRR. What would happen for fan control in such situations? I'd
>> imagine such cases allow for a system to exist with proper fan
>> control, and allow the kernel to boot without having to deal with the
>> pesky MTRRs at all, while PAT lives on, no?
>
> You mean user disables MTRRs from BIOS setup menu?

Yup!

> I am not a BIOS guy,
> but I do not think it offers such option when the code depends on it...

Darn, I'm pretty sure I've seen such option before... can't seem to
find such a toggle now.

>> When you say regular memory you mean everything else we see as RAM? I
>> was under the impression we'd only need MTRR for a special range of
>> memory, and its up to implementation how they are used. If you can use
>> MTRR to change the cache attribute for regular RAM and if this is
>> actually a requirement if the default MTRR is UC then one way or
>> another a BIOS seems to always require MTRR, either for UC setting for
>> fan control or WB for regular RAM, is that right?
>
> Right, in one way or the other, MTRRs set WB to RAM and UC to MMIO.  PAT is
> overwritten by MTRRs, so RAM must be set to WB.

I see... thanks....

  Luis

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/2] x86/mtrr: Refactor PAT initialization code
  2016-03-29 23:43                                   ` Luis R. Rodriguez
@ 2016-03-30  1:07                                     ` Toshi Kani
  2016-03-30  0:34                                       ` Luis R. Rodriguez
  0 siblings, 1 reply; 23+ messages in thread
From: Toshi Kani @ 2016-03-30  1:07 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Boris Ostrovsky, xen-devel, Thomas Gleixner, Matt Fleming,
	Paul Gortmaker, Borislav Petkov, X86 ML, Paul Stewart,
	Ingo Molnar, Olof Johansson, linux-kernel, H. Peter Anvin,
	Juergen Gross, Jan Beulich, Stuart Hayes, Yinghai Lu,
	Prarit Bhargava

On Tue, 2016-03-29 at 16:43 -0700, Luis R. Rodriguez wrote:
> On Tue, Mar 29, 2016 at 5:16 PM, Toshi Kani <toshi.kani@hpe.com> wrote:
> > On Tue, 2016-03-29 at 15:12 -0700, Luis R. Rodriguez wrote:
> > > On Tue, Mar 29, 2016 at 2:46 PM, Toshi Kani <toshi.kani@hpe.com>
> > > wrote:
> > > > On Tue, 2016-03-29 at 10:14 -0700, Luis R. Rodriguez wrote:
> > > > > On Fri, Mar 18, 2016 at 2:35 PM, Toshi Kani <toshi.kani@hpe.com>
> > > > > wrote:
> >  :
> > > > > 
> > > > > Do we really need UC for the fan?
> > > > 
> > > > When you say "we", are you referring Xen guests?  Xen guests do not
> > > > need to control the fan, so they do not need UC set in MTRRs.
> > > > 
> > > > In general, yes, MMIO registers need UC when they need to be
> > > > accessed.
> > > 
> > > Curious, what does a BIOS do for fan control when MTRRs are disabled?
> > 
> > You mean, when the kernel modified the MTRR setup and disabled them.
> 
> Nope, but the below is good to know!
> 
> I meant to ask about the case where the option the lets a user go in a
> muck with BIOS settings to disable MTRR e xists and the user disables
> MTRR. What would happen for fan control in such situations? I'd
> imagine such cases allow for a system to exist with proper fan
> control, and allow the kernel to boot without having to deal with the
> pesky MTRRs at all, while PAT lives on, no?

You mean user disables MTRRs from BIOS setup menu?  I am not a BIOS guy,
but I do not think it offers such option when the code depends on it...

> > BIOS would assume the original setup and still access the
> > registers.  This may lead to undefined behavior and may result in a
> > system crash.
> > 
> > > Also what if a BIOS just set MSR_MTRRdefType to uncachable only ?
> > 
> > Many BIOSes actually set the default type to UC.
> 
> Thanks, I asked as I saw my BIOS uses write-back by default. Good to
> know there are different strategies.
> 
> > MTRRs then cover regular memory with WB.
> 
> When you say regular memory you mean everything else we see as RAM? I
> was under the impression we'd only need MTRR for a special range of
> memory, and its up to implementation how they are used. If you can use
> MTRR to change the cache attribute for regular RAM and if this is
> actually a requirement if the default MTRR is UC then one way or
> another a BIOS seems to always require MTRR, either for UC setting for
> fan control or WB for regular RAM, is that right?

Right, in one way or the other, MTRRs set WB to RAM and UC to MMIO.  PAT is
overwritten by MTRRs, so RAM must be set to WB.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/2] x86/mtrr: Refactor PAT initialization code
  2016-03-17 21:56                     ` Toshi Kani
  2016-03-18  0:06                       ` Luis R. Rodriguez
@ 2016-04-09  2:04                       ` Luis R. Rodriguez
  2016-04-11 14:30                         ` Toshi Kani
  1 sibling, 1 reply; 23+ messages in thread
From: Luis R. Rodriguez @ 2016-04-09  2:04 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Luis R. Rodriguez, Juergen Gross, Boris Ostrovsky, Matt Fleming,
	Olof Johansson, Paul Stewart, Borislav Petkov, Ingo Molnar,
	H. Peter Anvin, Thomas Gleixner, Paul Gortmaker, Paul McKenney,
	X86 ML, linux-kernel, xen-devel

On Thu, Mar 17, 2016 at 03:56:47PM -0600, Toshi Kani wrote:
> On Wed, 2016-03-16 at 00:29 +0100, Luis R. Rodriguez wrote:
> > On x86 Linux code we now have ioremap_uc() that can't use MTRR behind the
> > scenes, why would something like this on the BIOS not be possible? That
> > ultimately uses set_pte_at(). What limitations are there on the BIOS
> > that prevent us from just using strong UC for PAT on the BIOS?
> 
> Because it requires to run in virtual mode with page tables.

I see now. Specifically, BIOSes run in real mode, and PAT uses
paging. Paging requires bit 31 on CR0 set (PG), and PG has no
effect if the PE flag (Protection Enable) bit 0 on CR0 is clear.
If PE is clear we have real mode, which is what the BIOS uses.

Stupid question then:

are there no use case for a BIOS to enter PE, even if just
limited to set paging attributes for instance. For the simple
sake of burying MTRR this seems worthy, but I wonder if there
are other paging needs a BIOS might find use for.

  Luis

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/2] x86/mtrr: Refactor PAT initialization code
  2016-04-09  2:04                       ` Luis R. Rodriguez
@ 2016-04-11 14:30                         ` Toshi Kani
  0 siblings, 0 replies; 23+ messages in thread
From: Toshi Kani @ 2016-04-11 14:30 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Juergen Gross, Boris Ostrovsky, Matt Fleming, Olof Johansson,
	Paul Stewart, Borislav Petkov, Ingo Molnar, H. Peter Anvin,
	Thomas Gleixner, Paul Gortmaker, Paul McKenney, X86 ML,
	linux-kernel, xen-devel

On Sat, 2016-04-09 at 04:04 +0200, Luis R. Rodriguez wrote:
> On Thu, Mar 17, 2016 at 03:56:47PM -0600, Toshi Kani wrote:
> > 
> > On Wed, 2016-03-16 at 00:29 +0100, Luis R. Rodriguez wrote:
> > > 
> > > On x86 Linux code we now have ioremap_uc() that can't use MTRR behind
> > > the scenes, why would something like this on the BIOS not be
> > > possible? That ultimately uses set_pte_at(). What limitations are
> > > there on the BIOS that prevent us from just using strong UC for PAT
> > > on the BIOS?
>>
> > Because it requires to run in virtual mode with page tables.
>
> I see now. Specifically, BIOSes run in real mode, and PAT uses
> paging. Paging requires bit 31 on CR0 set (PG), and PG has no
> effect if the PE flag (Protection Enable) bit 0 on CR0 is clear.
> If PE is clear we have real mode, which is what the BIOS uses.
> 
> Stupid question then:
> 
> are there no use case for a BIOS to enter PE, even if just
> limited to set paging attributes for instance. For the simple
> sake of burying MTRR this seems worthy, but I wonder if there
> are other paging needs a BIOS might find use for.

The OS calls EFI in virtual mode.  BIOS SMI handler, however, is called in
physical mode.  Since it has the highest priority, it needs to finish as
soon as possible.  So, it typically does not make additional steps to
switch to virtual mode.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2016-04-11 14:30 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1457671546-13486-1-git-send-email-toshi.kani@hpe.com>
     [not found] ` <1457671546-13486-3-git-send-email-toshi.kani@hpe.com>
     [not found]   ` <20160311092400.GB4347@pd.tnic>
     [not found]     ` <1457722632.6393.130.camel@hpe.com>
     [not found]       ` <20160311221747.GC25147@wotan.suse.de>
     [not found]         ` <1457740571.6393.236.camel@hpe.com>
     [not found]           ` <CAB=NE6UE7pKz8bmpRrFmAW=ySY0JnGoRY=MYn2zfOH4g_H81hw@mail.gmail.com>
     [not found]             ` <1457745396.6393.257.camel@hpe.com>
2016-03-15  0:15               ` [PATCH 2/2] x86/mtrr: Refactor PAT initialization code Luis R. Rodriguez
2016-03-15 23:48                 ` Toshi Kani
2016-03-15 23:29                   ` Luis R. Rodriguez
2016-03-17 21:56                     ` Toshi Kani
2016-03-18  0:06                       ` Luis R. Rodriguez
2016-03-18 21:35                         ` Toshi Kani
2016-03-29 17:14                           ` Luis R. Rodriguez
2016-03-29 21:46                             ` Toshi Kani
2016-03-29 22:12                               ` Luis R. Rodriguez
2016-03-30  0:16                                 ` Toshi Kani
2016-03-29 23:43                                   ` Luis R. Rodriguez
2016-03-30  1:07                                     ` Toshi Kani
2016-03-30  0:34                                       ` Luis R. Rodriguez
2016-04-09  2:04                       ` Luis R. Rodriguez
2016-04-11 14:30                         ` Toshi Kani
     [not found] ` <1457671546-13486-2-git-send-email-toshi.kani@hpe.com>
     [not found]   ` <20160311091229.GA4347@pd.tnic>
     [not found]     ` <1457713660.6393.55.camel@hpe.com>
     [not found]       ` <20160311155439.GF4312@pd.tnic>
     [not found]         ` <1457724504.6393.151.camel@hpe.com>
     [not found]           ` <20160312115544.GA23410@pd.tnic>
2016-03-15  0:29             ` [PATCH 1/2] x86/mm/pat: Change pat_disable() to emulate PAT table Luis R. Rodriguez
     [not found]             ` <20160315002921.GG25147@wotan.suse.de>
2016-03-15  3:11               ` Toshi Kani
     [not found]               ` <1458011476.6393.327.camel@hpe.com>
2016-03-15 11:01                 ` Borislav Petkov
     [not found]                 ` <20160315110148.GC4559@pd.tnic>
2016-03-15 15:43                   ` Toshi Kani
     [not found]                   ` <1458056595.6393.332.camel@hpe.com>
2016-03-15 15:47                     ` Borislav Petkov
     [not found]                     ` <20160315154731.GD4559@pd.tnic>
     [not found]                       ` <1458061883.6393.359.camel@hpe.com>
2016-03-15 16:33                         ` Borislav Petkov
2016-03-15 17:11                       ` Toshi Kani
2016-03-15 21:31                 ` Luis R. Rodriguez

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).