On 11.07.22 16:18, Chuck Zmudzinski wrote:
> On 7/5/22 12:14 PM, Borislav Petkov wrote:
>> On Tue, Jul 05, 2022 at 05:56:36PM +0200, Jan Beulich wrote:
>>> Re-using pat_disabled like you do in your suggestion below won't
>>> work, because mtrr_bp_init() calls pat_disable() when MTRRs
>>> appear to be disabled (from the kernel's view). The goal is to
>>> honor "nopat" without honoring any other calls to pat_disable().
>>
>> Actually, the current goal is to adjust Xen dom0 because:
>>
>> 1. it uses the PAT code
>>
>> 2. but then it does something special and hides the MTRRs
>>
>> which is not something real hardware does.
>>
>> So this one-off thing should be prominent, visible and not get in the
>> way.
> 
> Hello,
> 
> I have spent a fair amount of time this past weekend
> investigating this regression and what might have caused
> it and I also have done several tests on my Xen workstation
> that exhibits the regression to verify my understanding
> of the problem and also raise other problematic points.
> 
> I think, in addition to immediately fixing the regression by
> fixing the now five-year-old commit that is the root cause
> of the recently exposed regression, as discussed in my
> earlier message which proposes a simple patch to fix that
> five-year-old broken commit,
> 
> https://lore.kernel.org/lkml/63583497-152f-5880-4c8f-d47e2a81f5a6@netscape.net/
> 
> there needs to be a decision about how best to deal with
> this very aptly described "one-off Xen thing" in the future.
> 
> One problem in x86/mm/pat/memtype.c is the fact that there
> exist two functions, pat_init(), and init_cache_modes(), that
> do more or less the same thing, so that when one of those
> functions needs to be updated, the other also needs to
> be updated. In the past, when changes to the pat_enable
> and pat_disable functions and variables were made, all
> too often, the change was only applied to pat_init() and not
> to init_cache_modes() and the one-off Xen case was simply
> not addressed and dealt with properly.
> 
> So I propose the following:
> 
> 1) Identify those things that always need to be done in
> either pat_init() or init_cache_modes() and try to combine
> those things into a single function so that changes will
> be applied for both cases. We sort of have that now with
> __init_cache_modes(), but it is not really good enough to
> prevent the regressions we sometimes get in Xen PV
> domains such as the current regression we have with
> Xen Dom0 and certain particular Intel graphics devices.
> 
> 2) If it is not possible to do what is proposed in 1), at least
> re-factor the code to make it very clear that whenever
> either pat_init() needs to be adjusted or init_cache_modes()
> needs to be adjusted, care must be taken to ensure that
> all cases are properly dealt with, including the
> one-off Xen case of enabling PAT with MTRR disabled,
> Currently, AIUI, all one really needs to know is that
> init_cached_modes() handles two special cases:
> First, when PAT is disabled for any reason, including when
> the "nopat" boot option is set, and second, the one-off
> Xen case of MTRR being disabled but PAT being enabled.
> 
> I am trying to do number 2 with a patch series I am
> working on. It is up to the experts to decide if it
> is possible or even desirable to improve the code so
> that any changes to the caching configuration are more
> likely to be properly implemented for all supported platforms
> by successfully combining the two functions pat_init() and
> init_cache_modes() into a single function. The new function
> would probably need to be renamed and include bits from
> mtrr_bp_enabled, etc. for it to function properly.
> 
> If anyone wants to argue that it is not desirable to try
> combine pat_init() and init_cache_modes() into a single
> function, I think the burden of proof rests on that
> person to demonstrate why it is good and clean
> coding practice to continue to have them separate
> and independent from each other in the code when
> they both essentially do the same thing in the end, which
> is call __init_cache_modes() and determine the PAT
> state, and also probably the MTRR state.

I think the proper solution would be to make PAT and MTRR independent
from each other with some additional restructuring on the PAT side:

Today PAT can't be used without MTRR being available, unless MTRR is at
least configured via CONFIG_MTRR and the system is running as Xen PV
guest. In this case PAT is automatically available via the hypervisor,
but the PAT MSR can't be modified by the kernel and MTRR is disabled.

As an additional complexity the availability of PAT can't be queried
via pat_enabled() in the Xen PV case, as the lack of MTRR will set PAT
to be disabled. This leads to some drivers believing that not all cache
modes are available, resulting in failures or degraded functionality
(the current regression).

The same applies to a kernel built with no MTRR support: it won't
allow to use the PAT MSR, even if there is no technical reason for
that, other than setting up PAT on all cpus the same way (which is a
requirement of the processor's cache management) is relying on some
MTRR specific code.

All of that should be fixable by:

- moving the function needed by PAT from MTRR specific code one level
   up
- adding a PAT indirection layer supporting the 3 cases "no or disabled
   PAT", "PAT under kernel control", and "PAT under Xen control"
- removing the dependency of PAT on MTRR

I'd be up for trying this, but right now I haven't got the time to do
it. I might be able to start working on that in September.

Meanwhile I think either Jan's patch or the simple one of Chuck
should be applied.


Juergen