On 12.07.22 15:22, Chuck Zmudzinski wrote: > On 7/12/2022 2:04 AM, Jan Beulich wrote: >> On 11.07.2022 19:41, Chuck Zmudzinski wrote: >>> Moreover... (please move to the bottom of the code snippet >>> for more information about my tests in the Xen PV environment...) >>> >>> void init_cache_modes(void) >>> { >>>     u64 pat = 0; >>> >>>     if (pat_cm_initialized) >>>         return; >>> >>>     if (boot_cpu_has(X86_FEATURE_PAT)) { >>>         /* >>>          * CPU supports PAT. Set PAT table to be consistent with >>>          * PAT MSR. This case supports "nopat" boot option, and >>>          * virtual machine environments which support PAT without >>>          * MTRRs. In specific, Xen has unique setup to PAT MSR. >>>          * >>>          * If PAT MSR returns 0, it is considered invalid and emulates >>>          * as No PAT. >>>          */ >>>         rdmsrl(MSR_IA32_CR_PAT, pat); >>>     } >>> >>>     if (!pat) { >>>         /* >>>          * No PAT. Emulate the PAT table that corresponds to the two >>>          * cache bits, PWT (Write Through) and PCD (Cache Disable). >>>          * This setup is also the same as the BIOS default setup. >>>          * >>>          * PTE encoding: >>>          * >>>          *       PCD >>>          *       |PWT  PAT >>>          *       ||    slot >>>          *       00    0    WB : _PAGE_CACHE_MODE_WB >>>          *       01    1    WT : _PAGE_CACHE_MODE_WT >>>          *       10    2    UC-: _PAGE_CACHE_MODE_UC_MINUS >>>          *       11    3    UC : _PAGE_CACHE_MODE_UC >>>          * >>>          * NOTE: When WC or WP is used, it is redirected to UC- per >>>          * the default setup in __cachemode2pte_tbl[]. >>>          */ >>>         pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) | >>>               PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC); >>>     } >>> >>>     else if (!pat_bp_enabled) { >>>         /* >>>          * In some environments, specifically Xen PV, PAT >>>          * initialization is skipped because MTRRs are >>>          * disabled even though PAT is available. In such >>>          * environments, set PAT to initialized and enabled to >>>          * correctly indicate to callers of pat_enabled() that >>>          * PAT is available and prevent PAT from being disabled. >>>          */ >>>         pat_bp_enabled = true; >>>         pr_info("x86/PAT: PAT enabled by init_cache_modes\n"); >>>     } >>> >>>     __init_cache_modes(pat); >>> } >>> >>> This function, patched with the extra 'else if' block, fixes the >>> regression on my Xen worksatation, and the pr_info message >>> "x86/PAT: PAT enabled by init_cache_modes" appears in the logs >>> when running this patched kernel in my Xen Dom0. This means >>> that in the Xen PV environment on my Xen Dom0 workstation, >>> rdmsrl(MSR_IA32_CR_PAT, pat) successfully tested for the presence >>> of PAT on the virtual CPU that Xen exposed to the Linux kernel on my >>> Xen Dom0 workstation. At least that is what I think my tests prove. >>> >>> So why is this not a valid way to test for the existence of >>> PAT in the Xen PV environment? Are the existing comments >>> in init_cache_modes() about supporting both the case when >>> the "nopat" boot option is set and the specific case of Xen and >>> MTRR disabled wrong? My testing confirms those comments are >>> correct. >> >> At the very least this ignores the possible "nopat" an admin may >> have passed to the kernel. > > I realize that. The patch I proposed here only fixes the regression. It > would be easy to also modify the patch to also observe the 'nopat" > setting. I think your patch had a force_pat_disable local variable that > is set if pat is disabled by the administrator with "nopat." With that > variable available, modifying the patch so in init_cache_modes we have: > >      if (!pat || force_pat_disable) { >          /* >           * No PAT. Emulate the PAT table that corresponds to the two > > Instead of: > >      if (!pat) { >          /* >           * No PAT. Emulate the PAT table that corresponds to the two > > would cause the kernel to respect the "nopat" setting by the administrator > in the Xen PV Dom0 environment. Chuck, could you please send out a proper patch with your initial fix (setting pat_bp_enabled) and the fix above? I've chatted with Boris Petkov on IRC and he is fine with that. > I agree this needs to be fixed up, because currently the code is very > confusing and the current variable names and function names do not > always accurately describe what they actually do in the code. That is > why I am working on a patch to do some re-factoring, which only consists > of function and variable name changes and comment changes to fix > the places where the comments in the code are misleading or incomplete. Boris and I agreed to pursue my approach further by removing the dependency between PAT and MTRR and to make this whole mess more clear. I will start to work on this as soon as possible, which will probably be some time in September. > I think perhaps the most misnamed variable here is the  local > variable pat_disabled in memtypes.c and the most misnamed function is the > pat_disable function in memtypes.c. They should be named pat_init_disabled > and pat_init_disable, respectively, because they do not really disable > PAT in > the code but only prevent execution of the pat_init function. That > leaves open > the possibility for PAT to be enabled by init_cache_modes, which actually > occurs in the current code in the Xen PV Dom0 environment, but the current > code neglects to set pat_bp_enabled to true in that case. So we need a patch > to fix that in order to fix the regression. In principle I agree, but you should be aware of my refactoring plans. Juergen