From mboxrd@z Thu Jan 1 00:00:00 1970 From: Toshi Kani Subject: Re: [PATCH 2/2] x86/mtrr: Refactor PAT initialization code Date: Tue, 15 Mar 2016 17:48:44 -0600 Message-ID: <1458085724.6393.425.camel@hpe.com> References: <1457671546-13486-1-git-send-email-toshi.kani@hpe.com> <1457671546-13486-3-git-send-email-toshi.kani@hpe.com> <20160311092400.GB4347@pd.tnic> <1457722632.6393.130.camel@hpe.com> <20160311221747.GC25147@wotan.suse.de> <1457740571.6393.236.camel@hpe.com> <1457745396.6393.257.camel@hpe.com> <20160315001501.GF25147@wotan.suse.de> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Return-path: In-Reply-To: <20160315001501.GF25147@wotan.suse.de> Sender: linux-kernel-owner@vger.kernel.org To: "Luis R. Rodriguez" , Juergen Gross , Boris Ostrovsky , Matt Fleming , Olof Johansson , Paul Stewart Cc: Borislav Petkov , Ingo Molnar , "H. Peter Anvin" , Thomas Gleixner , Paul Gortmaker , X86 ML , "linux-kernel@vger.kernel.org" , xen-devel@lists.xensource.com List-Id: xen-devel@lists.xenproject.org On Tue, 2016-03-15 at 01:15 +0100, Luis R. Rodriguez wrote: > On Fri, Mar 11, 2016 at 06:16:36PM -0700, Toshi Kani wrote: > > On Fri, 2016-03-11 at 15:34 -0800, Luis R. Rodriguez wrote: > > > On Fri, Mar 11, 2016 at 3:56 PM, Toshi Kani > > > wrote: > > > > On Fri, 2016-03-11 at 23:17 +0100, Luis R. Rodriguez wrote: > > > > > On Fri, Mar 11, 2016 at 11:57:12AM -0700, Toshi Kani wrote: =C2=A0: > > > > > What I'm after is seeing if we can ultimately disable MTRR on > > > > > kernel code but still have PAT enabled. I realize you've > > > > > mentioned BIOS code may use some MTRR setup code but this is = only > > > > > true for some systems. I know for a fact Xen cannot use MTRR,= it > > > > > seems qemu32 does not enable it either. So why not have the > > > > > ability to skip through its set up ? > > > >=20 > > > > MTRR support has two meanings: > > > > =C2=A01) The kernel keeps the MTRR setup by BIOS. > > > > =C2=A02) The kernel modifies the MTRR setup. > > > >=20 > > > > I am in a position that we need 1) but 2). > > >=20 > > > I take it you meant "but not 2)" ?=C2=A0 > >=20 > > Yes. :) >=20 > OK -- we are in agreement but we know 1) is only needed for a portion= of > systems: Xen and qemu32 systems fly with no MTRR set up, and as such = it > would be incorrect to run MTRR code on such systems. To these systems > MTRR functionality code should be dead, since PAT currently depends o= n > MTRR PAT should also be dead but as the report you're fixing shows > it wasn't. That's an issue for qemu that uses the regular x86 init pa= th > but not for Xen. Its different for Xen as the hypervisor is the one t= hat > set up the MSR_IA32_CR_PAT for each CPU. The *only* thing Xen does is= : MTRR code does not have to be dead for the virtual machines with no MTR= R support. =C2=A0The code just needs to handle the disabled case properly= =2E =C2=A0I consider this is part of 1) that the kernel keeps the MTRR state as disabled. > void xen_start_kernel(void) > { > ... > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0rdmsrl(MSR_IA32_CR_PA= T,=C2=A0pat);=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0 > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0pat_init_cache_modes(= pat);=C2=A0 > ... > } >=20 > Fortunately we only have to call pat_init_cache_modes() once, its > not per-CPU. Xen has shown then that you *can* live with PAT without > any of the complex MTRR setup / code. Please add to your table the > Xen case as well then as its important to consider. If you make it > a strong requirement to have MTRR enabled to enable PAT you'd > be disabling PAT on Xen guest boots. Yes, I understand that the original fix will break Xen. =C2=A0Thanks fo= r pointing this out! =C2=A0In the next version (which is the same approac= h as the two additional patches I sent you yesterday), I am going to integrate t= he Xen's use-case into the framework. =C2=A0xen_start_kernel() will no lon= ger need to call=C2=A0pat_init_cache_modes() as a result. =C2=A0So, please revie= w the next version, and let me know if there is any issue. > As-is then your this patch which calls pat_disable() on mtrr_bp_init(= ) > for the case where MTRR is disabled would essentially break PAT on Xe= n > guests, so this cannot be done. It is no longer true that if MTRR is > disabled you can force disable PAT. To do what you want you want to d= o we > have to consider Xen. >=20 > I don't think its a good idea to keep PAT initialization meshed toget= her > with MTRR and making it a strong requirement on enabling PAT. The MTR= R > code is extremely complex. I'd like instead to encourage for us to > consider for this situation to let PAT become a first class citizen, > if MTRR is disabled but you've enabled PAT you should be able to use > it, just as Xen does. There are more reasons to enable such setup tha= n > not to. Long term I'm advocating to see if we can get an ACPI legacy > MTRR flag that can tell us if the BIOS has MTRR ripped out, then we > can at run time also take advantage of ignoring PAT completely as wel= l. >=20 > Note, if you insisted you didn't want to disable PAT on Xen, you coul= d > in theory check for the subarch -- but note that the subarch is unuse= d > yet on Xen, even though it was added to the x86 boot protocol years a= go. > I have a slew of patches to make use of it to help put paravirt_enabl= ed() > in the grave, but based on discussions with Ingo, we don't want to sp= read > use of the subarch in random x86 code paths, we want to compartamenta= lize > that. If you still want to follow your approach of just force-disabli= ng > PAT on MTRR code if MTRR was disabled you'd have to use semantics to > figure out if the boot path came from Xen, to be more specific for Xe= n > PV guest types only... The current agreed approach to avoid directly > using subarch is to categorize differences between what some guests > need and bare metal under an x86 platform quirk and legacy set of > components. > > On the x86 init path we'd call something check for the subarch and ba= sed > on that set a series of x86 legacy features / quirks that need to be > disabled / enabled. We could add MTRR as one. I'm unifying some of th= is > with a bit of what goes into the ACPI IA-PC boot architecture, see > section 5.2.9.3 IA-PC Boot Architecture Flags [0]. In particular the > paravirt_enabled() series I'm working on happens to also dabble into > the no CMOS RTC case for Xen, generalizing this knocks a bit of birds > with one stone. I think we can do the same with MTRR but also be > proactive and see if we can get ACPI_FADT_NO_MTRR added as well for > a future ACPI spec to enable BIOS manufacturers to rip MTRR out. We do not need subarch for this case since presence of the MTRR feature= can be tested with CPUID and MSR. > [0] http://www.acpi.info/DOWNLOADS/ACPIspec50.pdf=C2=A0 >=20 > /* Masks for FADT IA-PC Boot Architecture Flags (boot_flags) > [Vx]=3DIntroduced in this FADT revision */ > #define ACPI_FADT_LEGACY_DEVICES=C2=A0=C2=A0=C2=A0=C2=A0(1)=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/* 00: [V2] System has > LPC or ISA bus devices */ > #define ACPI_FADT_8042=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0(1<<1)=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0/* 01: [V3] System has an > 8042 controller on port 60/64 */ > #define ACPI_FADT_NO_VGA=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0(1<<2)=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/*= 02: [V4] It is not > safe to probe for VGA hardware */ > #define ACPI_FADT_NO_MSI=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0(1<<3)=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/*= 03: [V4] Message > Signaled Interrupts (MSI) must not be enabled */ > #define ACPI_FADT_NO_ASPM=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0(1<<4)=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/* 04: [= V4] PCIe ASPM > control must not be enabled */ > #define ACPI_FADT_NO_CMOS_RTC=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= (1<<5)=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/* 05: [V5] No CMOS real- > time clock present */ >=20 > > > There *are folks however who do > > > more as I noted earlier. Perhaps not now, but in the future I'd > > > encourage folks to rip MTRR out of their own BIOS, and enable a n= ew > > > ACPI legacy flag to say "MTRR required". That'd eventually can he= lp > > > bury MTRR for good while remaining backward compatible. > >=20 > > Well, BIOS using MTRR is better than BIOS setting page tables in th= e > > SMI handler. >=20 > Can some BIOSes be developed without MTRR? For instance I suspect Goo= gle > might be able to easily pull of ripping MTRR out of their BIOS if the= y > didn't do it already for the ChromeOS devices. If possible not only > should it help with removing complexity on the BIOS but not even havi= ng > to think about that code *ever* running on the kernel at all should b= e > nice. AFAIK, MTRR is the only way to specify UC attribute in physical mode on x86. =C2=A0On ia64, one can simply set the UC bit to a physical address= to specify UC attribute. =C2=A0I wish we had something similar. > > The kernel can be ignorant of the MTRR setup as long as it does > > not modify it. >=20 > Sure, we're already there. The kernel no longer modifies the > MTRR setup unless of course you boot without PAT enabled. I think > we need to move beyond that to ACPI if we can to let regular > Linux boots never have to deal with MTRR at all. The code is > complex and nasty why not put let folks put a nail on the coffin for > good? I think we are good as long as we do not modify it. =C2=A0The complexit= y comes with the modification. > > > I can read the above description to also say: > > >=20 > > > "Hey you need to implement PAT with the same skeleton code as MTR= R" > >=20 > > No, I did not say that. =C2=A0MTRR's rendezvous handler can be gene= ralized > > to work with both MTRR and PAT. =C2=A0We do not need two separate h= andlers. > > =C2=A0In fact, it needs to be a single handler so that both can be > > initialized together. >=20 > I'm not sure if that's really needed. Doesn't PAT just require settin= g > the wrmsrl(MSR_IA32_CR_PAT, pat) for each AP? No, it requires the same procedure as MTRR. > > > If we do that, we can pave the way to deprecate MTRR as legacy fo= r > > > good first on Linux. > >=20 > > I do not think such change will deprecate MTRR. >=20 > Not even for shiny new BIOSes? Post ACPI 5? Nope. > > =C2=A0It just means that Linux can enable PAT on virtual CPUs with = PAT & > > !MTRR capability. > >=20 > > > > In fact, the kernel disabling MTRRs is the same as 2). > > > >=20 > > > > > I'll also note Xen managed to enable PAT only without enablin= g > > > > > MTRR, this was done through pat_init_cache_modes() -- not sur= e if > > > > > this can be leveraged for qemu32... > > > >=20 > > > > I am interested to know how Xen managed this.=C2=A0=C2=A0Is thi= s done by the > > > > Xen hypervisor initializes guest's PAT on behalf of the guest > > > > kernel? > > >=20 > > > Yup. And the cache read thingy was reading back its own setup, wh= ich > > > was different than what Linux used by default IIRC. Juergen can > > > elaborate more. > >=20 > > Yeah, I'd like to make sure that my changes won't break it. >=20 > I checked through code inspection and indeed, it seems it would break > Xen's PAT setup. >=20 > For the record: the issue here was code that should not run ran, that > is dead code ran. I'm working towards a generic solution for this. Please review the next version. Thanks, -Toshi