From mboxrd@z Thu Jan  1 00:00:00 1970
From: Toshi Kani <toshi.kani@hpe.com>
Subject: Re: [PATCH 2/2] x86/mtrr: Refactor PAT initialization code
Date: Tue, 15 Mar 2016 17:48:44 -0600
Message-ID: <1458085724.6393.425.camel@hpe.com>
References: <1457671546-13486-1-git-send-email-toshi.kani@hpe.com>
	 <1457671546-13486-3-git-send-email-toshi.kani@hpe.com>
	 <20160311092400.GB4347@pd.tnic> <1457722632.6393.130.camel@hpe.com>
	 <20160311221747.GC25147@wotan.suse.de> <1457740571.6393.236.camel@hpe.com>
	 <CAB=NE6UE7pKz8bmpRrFmAW=ySY0JnGoRY=MYn2zfOH4g_H81hw@mail.gmail.com>
	 <1457745396.6393.257.camel@hpe.com> <20160315001501.GF25147@wotan.suse.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
Return-path: <linux-kernel-owner@vger.kernel.org>
In-Reply-To: <20160315001501.GF25147@wotan.suse.de>
Sender: linux-kernel-owner@vger.kernel.org
To: "Luis R. Rodriguez" <mcgrof@kernel.org>, Juergen Gross <jgross@suse.com>, Boris Ostrovsky <boris.ostrovsky@oracle.com>, Matt Fleming <matt@codeblueprint.co.uk>, Olof Johansson <olof@lixom.net>, Paul Stewart <pstew@chromium.org>
Cc: Borislav Petkov <bp@alien8.de>, Ingo Molnar <mingo@kernel.org>, "H. Peter Anvin" <hpa@zytor.com>, Thomas Gleixner <tglx@linutronix.de>, Paul Gortmaker <paul.gortmaker@windriver.com>, X86 ML <x86@kernel.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

On Tue, 2016-03-15 at 01:15 +0100, Luis R. Rodriguez wrote:
> On Fri, Mar 11, 2016 at 06:16:36PM -0700, Toshi Kani wrote:
> > On Fri, 2016-03-11 at 15:34 -0800, Luis R. Rodriguez wrote:
> > > On Fri, Mar 11, 2016 at 3:56 PM, Toshi Kani <toshi.kani@hpe.com>
> > > wrote:
> > > > On Fri, 2016-03-11 at 23:17 +0100, Luis R. Rodriguez wrote:
> > > > > On Fri, Mar 11, 2016 at 11:57:12AM -0700, Toshi Kani wrote:
=C2=A0:
> > > > > What I'm after is seeing if we can ultimately disable MTRR on
> > > > > kernel code but still have PAT enabled. I realize you've
> > > > > mentioned BIOS code may use some MTRR setup code but this is =
only
> > > > > true for some systems. I know for a fact Xen cannot use MTRR,=
 it
> > > > > seems qemu32 does not enable it either. So why not have the
> > > > > ability to skip through its set up ?
> > > >=20
> > > > MTRR support has two meanings:
> > > > =C2=A01) The kernel keeps the MTRR setup by BIOS.
> > > > =C2=A02) The kernel modifies the MTRR setup.
> > > >=20
> > > > I am in a position that we need 1) but 2).
> > >=20
> > > I take it you meant "but not 2)" ?=C2=A0
> >=20
> > Yes. :)
>=20
> OK -- we are in agreement but we know 1) is only needed for a portion=
 of
> systems: Xen and qemu32 systems fly with no MTRR set up, and as such =
it
> would be incorrect to run MTRR code on such systems. To these systems
> MTRR functionality code should be dead, since PAT currently depends o=
n
> MTRR PAT should also be dead but as the report you're fixing shows
> it wasn't. That's an issue for qemu that uses the regular x86 init pa=
th
> but not for Xen. Its different for Xen as the hypervisor is the one t=
hat
> set up the MSR_IA32_CR_PAT for each CPU. The *only* thing Xen does is=
:

MTRR code does not have to be dead for the virtual machines with no MTR=
R
support. =C2=A0The code just needs to handle the disabled case properly=
=2E =C2=A0I
consider this is part of 1) that the kernel keeps the MTRR state as
disabled.

> void xen_start_kernel(void)
> {
> 	...
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0rdmsrl(MSR_IA32_CR_PA=
T,=C2=A0pat);=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0pat_init_cache_modes(=
pat);=C2=A0
> 	...
> }
>=20
> Fortunately we only have to call pat_init_cache_modes() once, its
> not per-CPU. Xen has shown then that you *can* live with PAT without
> any of the complex MTRR setup / code. Please add to your table the
> Xen case as well then as its important to consider. If you make it
> a strong requirement to have MTRR enabled to enable PAT you'd
> be disabling PAT on Xen guest boots.

Yes, I understand that the original fix will break Xen. =C2=A0Thanks fo=
r
pointing this out! =C2=A0In the next version (which is the same approac=
h as the
two additional patches I sent you yesterday), I am going to integrate t=
he
Xen's use-case into the framework. =C2=A0xen_start_kernel() will no lon=
ger need
to call=C2=A0pat_init_cache_modes() as a result. =C2=A0So, please revie=
w the next
version, and let me know if there is any issue.

> As-is then your this patch which calls pat_disable() on mtrr_bp_init(=
)
> for the case where MTRR is disabled would essentially break PAT on Xe=
n
> guests, so this cannot be done. It is no longer true that if MTRR is
> disabled you can force disable PAT. To do what you want you want to d=
o we
> have to consider Xen.
>=20
> I don't think its a good idea to keep PAT initialization meshed toget=
her
> with MTRR and making it a strong requirement on enabling PAT. The MTR=
R
> code is extremely complex. I'd like instead to encourage for us to
> consider for this situation to let PAT become a first class citizen,
> if MTRR is disabled but you've enabled PAT you should be able to use
> it, just as Xen does. There are more reasons to enable such setup tha=
n
> not to. Long term I'm advocating to see if we can get an ACPI legacy
> MTRR flag that can tell us if the BIOS has MTRR ripped out, then we
> can at run time also take advantage of ignoring PAT completely as wel=
l.
>=20
> Note, if you insisted you didn't want to disable PAT on Xen, you coul=
d
> in theory check for the subarch -- but note that the subarch is unuse=
d
> yet on Xen, even though it was added to the x86 boot protocol years a=
go.
> I have a slew of patches to make use of it to help put paravirt_enabl=
ed()
> in the grave, but based on discussions with Ingo, we don't want to sp=
read
> use of the subarch in random x86 code paths, we want to compartamenta=
lize
> that. If you still want to follow your approach of just force-disabli=
ng
> PAT on MTRR code if MTRR was disabled you'd have to use semantics to
> figure out if the boot path came from Xen, to be more specific for Xe=
n
> PV guest types only... The current agreed approach to avoid directly
> using subarch is to categorize differences between what some guests
> need and bare metal under an x86 platform quirk and legacy set of
> components.
>
> On the x86 init path we'd call something check for the subarch and ba=
sed
> on that set a series of x86 legacy features / quirks that need to be
> disabled / enabled. We could add MTRR as one. I'm unifying some of th=
is
> with a bit of what goes into the ACPI IA-PC boot architecture, see
> section 5.2.9.3 IA-PC Boot Architecture Flags [0]. In particular the
> paravirt_enabled() series I'm working on happens to also dabble into
> the no CMOS RTC case for Xen, generalizing this knocks a bit of birds
> with one stone. I think we can do the same with MTRR but also be
> proactive and see if we can get ACPI_FADT_NO_MTRR added as well for
> a future ACPI spec to enable BIOS manufacturers to rip MTRR out.

We do not need subarch for this case since presence of the MTRR feature=
 can
be tested with CPUID and MSR.

> [0] http://www.acpi.info/DOWNLOADS/ACPIspec50.pdf=C2=A0
>=20
> /* Masks for FADT IA-PC Boot Architecture Flags (boot_flags)
> [Vx]=3DIntroduced in this FADT revision */
> #define ACPI_FADT_LEGACY_DEVICES=C2=A0=C2=A0=C2=A0=C2=A0(1)=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/* 00: [V2] System has
> LPC or ISA bus devices */
> #define ACPI_FADT_8042=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0(1<<1)=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0/* 01: [V3] System has an
> 8042 controller on port 60/64 */
> #define ACPI_FADT_NO_VGA=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0(1<<2)=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/*=
 02: [V4] It is not
> safe to probe for VGA hardware */
> #define ACPI_FADT_NO_MSI=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0(1<<3)=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/*=
 03: [V4] Message
> Signaled Interrupts (MSI) must not be enabled */
> #define ACPI_FADT_NO_ASPM=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0(1<<4)=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/* 04: [=
V4] PCIe ASPM
> control must not be enabled */
> #define ACPI_FADT_NO_CMOS_RTC=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
(1<<5)=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/* 05: [V5] No CMOS real-
> time clock present */
>=20
> > > There *are folks however who do
> > > more as I noted earlier. Perhaps not now, but in the future I'd
> > > encourage folks to rip MTRR out of their own BIOS, and enable a n=
ew
> > > ACPI legacy flag to say "MTRR required". That'd eventually can he=
lp
> > > bury MTRR for good while remaining backward compatible.
> >=20
> > Well, BIOS using MTRR is better than BIOS setting page tables in th=
e
> > SMI handler.
>=20
> Can some BIOSes be developed without MTRR? For instance I suspect Goo=
gle
> might be able to easily pull of ripping MTRR out of their BIOS if the=
y
> didn't do it already for the ChromeOS devices. If possible not only
> should it help with removing complexity on the BIOS but not even havi=
ng
> to think about that code *ever* running on the kernel at all should b=
e
> nice.

AFAIK, MTRR is the only way to specify UC attribute in physical mode on
x86. =C2=A0On ia64, one can simply set the UC bit to a physical address=
 to
specify UC attribute. =C2=A0I wish we had something similar.

> > The kernel can be ignorant of the MTRR setup as long as it does
> > not modify it.
>=20
> Sure, we're already there. The kernel no longer modifies the
> MTRR setup unless of course you boot without PAT enabled. I think
> we need to move beyond that to ACPI if we can to let regular
> Linux boots never have to deal with MTRR at all. The code is
> complex and nasty why not put let folks put a nail on the coffin for
> good?

I think we are good as long as we do not modify it. =C2=A0The complexit=
y comes
with the modification.

> > > I can read the above description to also say:
> > >=20
> > > "Hey you need to implement PAT with the same skeleton code as MTR=
R"
> >=20
> > No, I did not say that. =C2=A0MTRR's rendezvous handler can be gene=
ralized
> > to work with both MTRR and PAT. =C2=A0We do not need two separate h=
andlers.
> > =C2=A0In fact, it needs to be a single handler so that both can be
> > initialized together.
>=20
> I'm not sure if that's really needed. Doesn't PAT just require settin=
g
> the wrmsrl(MSR_IA32_CR_PAT, pat) for each AP?

No, it requires the same procedure as MTRR.

> > > If we do that, we can pave the way to deprecate MTRR as legacy fo=
r
> > > good first on Linux.
> >=20
> > I do not think such change will deprecate MTRR.
>=20
> Not even for shiny new BIOSes? Post ACPI 5?

Nope.

> > =C2=A0It just means that Linux can enable PAT on virtual CPUs with =
PAT &
> > !MTRR capability.
> >=20
> > > > In fact, the kernel disabling MTRRs is the same as 2).
> > > >=20
> > > > > I'll also note Xen managed to enable PAT only without enablin=
g
> > > > > MTRR, this was done through pat_init_cache_modes() -- not sur=
e if
> > > > > this can be leveraged for qemu32...
> > > >=20
> > > > I am interested to know how Xen managed this.=C2=A0=C2=A0Is thi=
s done by the
> > > > Xen hypervisor initializes guest's PAT on behalf of the guest
> > > > kernel?
> > >=20
> > > Yup. And the cache read thingy was reading back its own setup, wh=
ich
> > > was different than what Linux used by default IIRC. Juergen can
> > > elaborate more.
> >=20
> > Yeah, I'd like to make sure that my changes won't break it.
>=20
> I checked through code inspection and indeed, it seems it would break
> Xen's PAT setup.
>=20
> For the record: the issue here was code that should not run ran, that
> is dead code ran. I'm working towards a generic solution for this.

Please review the next version.

Thanks,
-Toshi