From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754299AbcCOAPJ (ORCPT <rfc822;w@1wt.eu>);
	Mon, 14 Mar 2016 20:15:09 -0400
Received: from mx2.suse.de ([195.135.220.15]:57100 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750853AbcCOAPF (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Mon, 14 Mar 2016 20:15:05 -0400
Date: Tue, 15 Mar 2016 01:15:01 +0100
From: "Luis R. Rodriguez" <mcgrof@kernel.org>
To: Toshi Kani <toshi.kani@hpe.com>, Juergen Gross <jgross@suse.com>,
        Boris Ostrovsky <boris.ostrovsky@oracle.com>,
        Matt Fleming <matt@codeblueprint.co.uk>,
        Olof Johansson <olof@lixom.net>, Paul Stewart <pstew@chromium.org>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>, Borislav Petkov <bp@alien8.de>,
        Ingo Molnar <mingo@kernel.org>, "H. Peter Anvin" <hpa@zytor.com>,
        Thomas Gleixner <tglx@linutronix.de>,
        Paul Gortmaker <paul.gortmaker@windriver.com>, X86 ML <x86@kernel.org>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        xen-devel@lists.xensource.com
Subject: Re: [PATCH 2/2] x86/mtrr: Refactor PAT initialization code
Message-ID: <20160315001501.GF25147@wotan.suse.de>
References: <1457671546-13486-1-git-send-email-toshi.kani@hpe.com>
 <1457671546-13486-3-git-send-email-toshi.kani@hpe.com>
 <20160311092400.GB4347@pd.tnic>
 <1457722632.6393.130.camel@hpe.com>
 <20160311221747.GC25147@wotan.suse.de>
 <1457740571.6393.236.camel@hpe.com>
 <CAB=NE6UE7pKz8bmpRrFmAW=ySY0JnGoRY=MYn2zfOH4g_H81hw@mail.gmail.com>
 <1457745396.6393.257.camel@hpe.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <1457745396.6393.257.camel@hpe.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Mar 11, 2016 at 06:16:36PM -0700, Toshi Kani wrote:
> On Fri, 2016-03-11 at 15:34 -0800, Luis R. Rodriguez wrote:
> > On Fri, Mar 11, 2016 at 3:56 PM, Toshi Kani <toshi.kani@hpe.com> wrote:
> > > On Fri, 2016-03-11 at 23:17 +0100, Luis R. Rodriguez wrote:
> > > > On Fri, Mar 11, 2016 at 11:57:12AM -0700, Toshi Kani wrote:
> > > > > On Fri, 2016-03-11 at 10:24 +0100, Borislav Petkov wrote:
> > > > > > On Thu, Mar 10, 2016 at 09:45:46PM -0700, Toshi Kani wrote:
> > > > > > > MTRR manages PAT initialization as it implements a rendezvous
> > > > > > > handler that initializes PAT as part of MTRR initialization.
>  :
> > > > > 
> > > > > No, it does not fix it. The problem in this particular case, i.e.
> > > > > MTRR disabled by its MSR, is that mtrr_bp_init() calls pat_init()
> > > > > (as PAT enabled) and initializes PAT on BSP. After APs are
> > > > > launched, we need the MTRR's rendezvous handler to initialize PAT
> > > > > on APs to be consistent with BSP. However, MTRR rendezvous handler
> > > > > is no-op since MTRR is disabled.
> > > > 
> > > > This seems like a hack on enabling PAT through MTRR code, can we have
> > > > a PAT rendezvous handler on its own, or provide a generic rendezvous
> > > > handler that lets you deal with whatever interfaces need setup. Then
> > > > conflicts can just be negotiated early.
> > > 
> > > The MTRR code can be enhanced so that the rendezvous handler can handle
> > > MTRR and PAT state independently.  I noted this case as (*) in the
> > > table of this patch description.  This is a separate item, however.
> > > 
> > > MTRR calling PAT was not a hack (as I suppose we did not have VMs at
> > > that time), although this can surely be improved.  As Intel SDM state
> > > below, both MTRR and PAT require the same procedure, and the PAT
> > > initialization sequence is defined in the MTRR section.
> > > 
> > > ===
> > > 11.12.4 Programming the PAT
> > >  :
> > > The operating system is responsible for insuring that changes to a PAT
> > > entry occur in a manner that maintains the consistency of the processor
> > > caches and translation lookaside buffers (TLB). This is accomplished by
> > > following the procedure as specified in Section 11.11.8, “MTRR
> > > Considerations in MP Systems,” for changing the value of an MTRR in a
> > > multiple processor system. It requires a specific sequence of
> > > operations that includes flushing the processors caches and TLBs.
> > > ===
> > > 
> > > > What I'm after is seeing if we can ultimately disable MTRR on kernel
> > > > code but still have PAT enabled. I realize you've mentioned BIOS code
> > > > may use some MTRR setup code but this is only true for some systems.
> > > > I know for a fact Xen cannot use MTRR, it seems qemu32 does not
> > > > enable
> > > > it either. So why not have the ability to skip through its set up ?
> > > 
> > > MTRR support has two meanings:
> > >  1) The kernel keeps the MTRR setup by BIOS.
> > >  2) The kernel modifies the MTRR setup.
> > > 
> > > I am in a position that we need 1) but 2).
> > 
> > I take it you meant "but not 2)" ? 
> 
> Yes. :)

OK -- we are in agreement but we know 1) is only needed for a portion of
systems: Xen and qemu32 systems fly with no MTRR set up, and as such it
would be incorrect to run MTRR code on such systems. To these systems
MTRR functionality code should be dead, since PAT currently depends on
MTRR PAT should also be dead but as the report you're fixing shows
it wasn't. That's an issue for qemu that uses the regular x86 init path
but not for Xen. Its different for Xen as the hypervisor is the one that
set up the MSR_IA32_CR_PAT for each CPU. The *only* thing Xen does is:

void xen_start_kernel(void)
{
	...
        rdmsrl(MSR_IA32_CR_PAT, pat);                                           
        pat_init_cache_modes(pat); 
	...
}

Fortunately we only have to call pat_init_cache_modes() once, its
not per-CPU. Xen has shown then that you *can* live with PAT without
any of the complex MTRR setup / code. Please add to your table the
Xen case as well then as its important to consider. If you make it
a strong requirement to have MTRR enabled to enable PAT you'd
be disabling PAT on Xen guest boots.

As-is then your this patch which calls pat_disable() on mtrr_bp_init() for the
case where MTRR is disabled would essentially break PAT on Xen guests, so this
cannot be done. It is no longer true that if MTRR is disabled you can force
disable PAT. To do what you want you want to do we have to consider Xen.

I don't think its a good idea to keep PAT initialization meshed together
with MTRR and making it a strong requirement on enabling PAT. The MTRR
code is extremely complex. I'd like instead to encourage for us to
consider for this situation to let PAT become a first class citizen,
if MTRR is disabled but you've enabled PAT you should be able to use
it, just as Xen does. There are more reasons to enable such setup than
not to. Long term I'm advocating to see if we can get an ACPI legacy
MTRR flag that can tell us if the BIOS has MTRR ripped out, then we
can at run time also take advantage of ignoring PAT completely as well.

Note, if you insisted you didn't want to disable PAT on Xen, you could
in theory check for the subarch -- but note that the subarch is unused
yet on Xen, even though it was added to the x86 boot protocol years ago.
I have a slew of patches to make use of it to help put paravirt_enabled()
in the grave, but based on discussions with Ingo, we don't want to spread
use of the subarch in random x86 code paths, we want to compartamentalize
that. If you still want to follow your approach of just force-disabling
PAT on MTRR code if MTRR was disabled you'd have to use semantics to
figure out if the boot path came from Xen, to be more specific for Xen
PV guest types only... The current agreed approach to avoid directly
using subarch is to categorize differences between what some guests
need and bare metal under an x86 platform quirk and legacy set of components.

On the x86 init path we'd call something check for the subarch and based
on that set a series of x86 legacy features / quirks that need to be
disabled / enabled. We could add MTRR as one. I'm unifying some of this
with a bit of what goes into the ACPI IA-PC boot architecture, see
section 5.2.9.3 IA-PC Boot Architecture Flags [0]. In particular the
paravirt_enabled() series I'm working on happens to also dabble into
the no CMOS RTC case for Xen, generalizing this knocks a bit of birds
with one stone. I think we can do the same with MTRR but also be
proactive and see if we can get ACPI_FADT_NO_MTRR added as well for
a future ACPI spec to enable BIOS manufacturers to rip MTRR out.

[0] http://www.acpi.info/DOWNLOADS/ACPIspec50.pdf 

/* Masks for FADT IA-PC Boot Architecture Flags (boot_flags) [Vx]=Introduced in this FADT revision */
#define ACPI_FADT_LEGACY_DEVICES    (1)         /* 00: [V2] System has LPC or ISA bus devices */
#define ACPI_FADT_8042              (1<<1)      /* 01: [V3] System has an 8042 controller on port 60/64 */
#define ACPI_FADT_NO_VGA            (1<<2)      /* 02: [V4] It is not safe to probe for VGA hardware */
#define ACPI_FADT_NO_MSI            (1<<3)      /* 03: [V4] Message Signaled Interrupts (MSI) must not be enabled */
#define ACPI_FADT_NO_ASPM           (1<<4)      /* 04: [V4] PCIe ASPM control must not be enabled */
#define ACPI_FADT_NO_CMOS_RTC       (1<<5)      /* 05: [V5] No CMOS real-time clock present */

> > There *are folks however who do
> > more as I noted earlier. Perhaps not now, but in the future I'd
> > encourage folks to rip MTRR out of their own BIOS, and enable a new
> > ACPI legacy flag to say "MTRR required". That'd eventually can help
> > bury MTRR for good while remaining backward compatible.
> 
> Well, BIOS using MTRR is better than BIOS setting page tables in the SMI
> handler.

Can some BIOSes be developed without MTRR? For instance I suspect Google might
be able to easily pull of ripping MTRR out of their BIOS if they didn't do it
already for the ChromeOS devices. If possible not only should it help with removing
complexity on the BIOS but not even having to think about that code *ever* running
on the kernel at all should be nice.

> The kernel can be ignorant of the MTRR setup as long as it does
> not modify it.

Sure, we're already there. The kernel no longer modifies the
MTRR setup unless of course you boot without PAT enabled. I think
we need to move beyond that to ACPI if we can to let regular
Linux boots never have to deal with MTRR at all. The code is
complex and nasty why not put let folks put a nail on the coffin for good?

> > I can read the above description to also say:
> > 
> > "Hey you need to implement PAT with the same skeleton code as MTRR"
> 
> No, I did not say that.  MTRR's rendezvous handler can be generalized to
> work with both MTRR and PAT.  We do not need two separate handlers.  In
> fact, it needs to be a single handler so that both can be initialized
> together.

I'm not sure if that's really needed. Doesn't PAT just require setting
the wrmsrl(MSR_IA32_CR_PAT, pat) for each AP?

> > If we do that, we can pave the way to deprecate MTRR as legacy for
> > good first on Linux.
> 
> I do not think such change will deprecate MTRR.

Not even for shiny new BIOSes? Post ACPI 5?

> It just means that Linux can enable PAT on virtual CPUs with PAT & !MTRR capability.
> 
> > > In fact, the kernel disabling MTRRs is the same as 2).
> > > 
> > > > I'll also note Xen managed to enable PAT only without enabling MTRR,
> > > > this was done through pat_init_cache_modes() -- not sure if this can
> > > > be leveraged for qemu32...
> > > 
> > > I am interested to know how Xen managed this.  Is this done by the Xen
> > > hypervisor initializes guest's PAT on behalf of the guest kernel?
> > 
> > Yup. And the cache read thingy was reading back its own setup, which
> > was different than what Linux used by default IIRC. Juergen can
> > elaborate more.
> 
> Yeah, I'd like to make sure that my changes won't break it.

I checked through code inspection and indeed, it seems it would break
Xen's PAT setup.

For the record: the issue here was code that should not run ran, that
is dead code ran. I'm working towards a generic solution for this.

  Luis