All of lore.kernel.org
 help / color / mirror / Atom feed
From: Juergen Gross <jgross@suse.com>
To: Borislav Petkov <bp@alien8.de>
Cc: linux-kernel@vger.kernel.org, x86@kernel.org,
	linux-pm@vger.kernel.org, Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Pavel Machek <pavel@ucw.cz>, Andy Lutomirski <luto@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH v5 00/16] x86: make PAT and MTRR independent from each other
Date: Tue, 8 Nov 2022 08:30:19 +0100	[thread overview]
Message-ID: <03e5624d-9e45-c51b-9ace-616a223ff092@suse.com> (raw)
In-Reply-To: <Y2lbmYRWQvXn0nPe@zn.tnic>


[-- Attachment #1.1.1: Type: text/plain, Size: 9430 bytes --]

On 07.11.22 20:25, Borislav Petkov wrote:
> On Thu, Nov 03, 2022 at 05:15:52PM +0100, Borislav Petkov wrote:
>> Lemme try to find a smaller box which shows that too - that one is a
>> pain to bisect on.
> 
> Ok, couldn't find a smaller one (or maybe it had to be a big one to
> tickle this out).
> 
> So I think it is the parallel setup thing:
> 
> x86/mtrr: Do MTRR/PAT setup on all secondary CPUs in parallel
> 
> Note that before it, it would do the configuration sequentially on each
> CPU:
> 
> [    0.759239] MTRR: prepare_set: CPU83, MSR_MTRRdefType: 0x0, read: (0xc00:0)
> [    0.759239] MTRR: set_mtrr_state: CPU83, mtrr_deftype_lo: 0xc00, mtrr_state.def_type: 0, mtrr_state.enabled: 3
> [    0.760794] MTRR: post_set: CPU83, MSR_MTRRdefType will write: (0xc00:0)
> [    0.761151] MTRR: prepare_set: CPU70, MSR_MTRRdefType: 0x0, read: (0xc00:0)
> [    0.761151] MTRR: set_mtrr_state: CPU70, mtrr_deftype_lo: 0xc00, mtrr_state.def_type: 0, mtrr_state.enabled: 3
> [    0.761151] MTRR: post_set: CPU70, MSR_MTRRdefType will write: (0xc00:0)
> ...
> 
> and so on.
> 
> Now, it would do it all in parallel:
> 
> [    0.762006] MTRR: mtrr_disable: CPU70, MSR_MTRRdefType: 0x0, read: (0xc00:0)
> [    0.761916] MTRR: mtrr_disable: CPU18, MSR_MTRRdefType: 0x0, read: (0xc00:0)
> [    0.761808] MTRR: mtrr_disable: CPU82, MSR_MTRRdefType: 0x0, read: (0xc00:0)
> [    0.762593] MTRR: mtrr_disable: CPU6, MSR_MTRRdefType: 0x0, read: (0x0:0)
> 								      ^^^^^^
> 
> Note that last thing. That comes from (with debug output added):
> 
> void mtrr_disable(struct cache_state *state)
> {
>          unsigned int cpu = smp_processor_id();
>          u64 msrval;
> 
>          /* Save MTRR state */
>          rdmsr(MSR_MTRRdefType, state->mtrr_deftype_lo, state->mtrr_deftype_hi);
> 
>          /* Disable MTRRs, and set the default type to uncached */
>          mtrr_wrmsr(MSR_MTRRdefType, state->mtrr_deftype_lo & ~0xcff,
>                     state->mtrr_deftype_hi);
> 
>          rdmsrl(MSR_MTRRdefType, msrval);
> 
>          pr_info("%s: CPU%d, MSR_MTRRdefType: 0x%llx, read: (0x%x:%x)\n",
>                  __func__, cpu, msrval, state->mtrr_deftype_lo, state->mtrr_deftype_hi);
> }
> 
> The "read: (0x0:0)" basically says that
> 
> 	state->mtrr_deftype_lo, state->mtrr_deftype_hi
> 
> are both 0 already. BUT(!), they should NOT be. The low piece is 0xc00 on most
> cores except a handful and it means that MTRRs and Fixed Range are
> enabled. In total, they're these cores here:
> 
> [    0.762593] MTRR: mtrr_disable: CPU6, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.762247] MTRR: mtrr_disable: CPU26, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.762685] MTRR: mtrr_disable: CPU68, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.762725] MTRR: mtrr_disable: CPU17, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.762685] MTRR: mtrr_disable: CPU69, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.762800] MTRR: mtrr_disable: CPU1, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.762734] MTRR: mtrr_disable: CPU13, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.762720] MTRR: mtrr_disable: CPU24, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.762696] MTRR: mtrr_disable: CPU66, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.762716] MTRR: mtrr_disable: CPU48, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.762693] MTRR: mtrr_disable: CPU57, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.762519] MTRR: mtrr_disable: CPU87, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.762532] MTRR: mtrr_disable: CPU58, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.762755] MTRR: mtrr_disable: CPU32, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.762693] MTRR: mtrr_disable: CPU52, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.762861] MTRR: mtrr_disable: CPU0, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.762724] MTRR: mtrr_disable: CPU21, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.762640] MTRR: mtrr_disable: CPU15, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.762615] MTRR: mtrr_disable: CPU50, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.762741] MTRR: mtrr_disable: CPU40, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.762738] MTRR: mtrr_disable: CPU37, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.762716] MTRR: mtrr_disable: CPU25, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.762512] MTRR: mtrr_disable: CPU59, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.762721] MTRR: mtrr_disable: CPU45, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.762682] MTRR: mtrr_disable: CPU56, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.762583] MTRR: mtrr_disable: CPU124, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.762751] MTRR: mtrr_disable: CPU12, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.762741] MTRR: mtrr_disable: CPU9, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.762575] MTRR: mtrr_disable: CPU51, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.762632] MTRR: mtrr_disable: CPU100, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.762688] MTRR: mtrr_disable: CPU61, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.762610] MTRR: mtrr_disable: CPU105, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.762721] MTRR: mtrr_disable: CPU20, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.762583] MTRR: mtrr_disable: CPU47, MSR_MTRRdefType: 0x0, read: (0x0:0)
> 
> Now, if I add MFENCEs around those RDMSRs:
> 
> void mtrr_disable(struct cache_state *state)
> {
>          unsigned int cpu = smp_processor_id();
>          u64 msrval;
> 
>          /* Save MTRR state */
>          rdmsr(MSR_MTRRdefType, state->mtrr_deftype_lo, state->mtrr_deftype_hi);
> 
>          __mb();
> 
>          /* Disable MTRRs, and set the default type to uncached */
>          mtrr_wrmsr(MSR_MTRRdefType, state->mtrr_deftype_lo & ~0xcff,
>                     state->mtrr_deftype_hi);
> 
>          __mb();
> 
>          rdmsrl(MSR_MTRRdefType, msrval);
> 
>          pr_info("%s: CPU%d, MSR_MTRRdefType: 0x%llx, read: (0x%x:%x)\n",
>                  __func__, cpu, msrval, state->mtrr_deftype_lo, state->mtrr_deftype_hi);
> 
>          __mb();
> }
> 
> the amount of cores becomes less:

Probably not because of the fencing, but because of the different timing.

> 
> [    0.765260] MTRR: mtrr_disable: CPU6, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.765462] MTRR: mtrr_disable: CPU5, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.765242] MTRR: mtrr_disable: CPU22, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.765522] MTRR: mtrr_disable: CPU0, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.765474] MTRR: mtrr_disable: CPU1, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.765207] MTRR: mtrr_disable: CPU54, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.765225] MTRR: mtrr_disable: CPU8, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.765282] MTRR: mtrr_disable: CPU88, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.765150] MTRR: mtrr_disable: CPU119, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.765370] MTRR: mtrr_disable: CPU49, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.765395] MTRR: mtrr_disable: CPU16, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.765348] MTRR: mtrr_disable: CPU52, MSR_MTRRdefType: 0x0, read: (0x0:0)
> [    0.765270] MTRR: mtrr_disable: CPU58, MSR_MTRRdefType: 0x0, read: (0x0:0)
> 
> which basically hints at some speculative fun where we end up reading
> the MSR *after* the write to it has already happened. After this thing:
> 
>          /* Disable MTRRs, and set the default type to uncached */
>          mtrr_wrmsr(MSR_MTRRdefType, state->mtrr_deftype_lo & ~0xcff,
>                     state->mtrr_deftype_hi);
> 
> and thus when we read it, we already read the disabled state. But this
> is only a conjecture because I still have no clear idea how TF would
> that even happen?!?

Yeah, and why doesn't it happen when we handle only one cpu at a time?

There might be some interaction between the cpus ...

> 
> Needless to say, this fixes it, ofc:
> 
> diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
> index 3805a6d32d37..4a685898caf3 100644
> --- a/arch/x86/kernel/cpu/cacheinfo.c
> +++ b/arch/x86/kernel/cpu/cacheinfo.c
> @@ -1116,12 +1116,14 @@ void cache_enable(struct cache_state *state)
>                  __write_cr4(state->cr4);
>   }
>   
> +static DEFINE_RAW_SPINLOCK(set_atomicity_lock);
> +
>   static void cache_cpu_init(void)
>   {
>          unsigned long flags;
>          struct cache_state state = { };
>   
> -       local_irq_save(flags);
> +       raw_spin_lock_irqsave(&set_atomicity_lock, flags);
>          cache_disable(&state);
>   
>          if (memory_caching_control & CACHE_MTRR)
> @@ -1131,7 +1133,7 @@ static void cache_cpu_init(void)
>                  pat_cpu_init();
>   
>          cache_enable(&state);
> -       local_irq_restore(flags);
> +       raw_spin_unlock_irqrestore(&set_atomicity_lock, flags);
>   }
>   
>   static bool cache_aps_delayed_init = true;
> 
> ---
> 
> and frankly, considering how we have bigger fish to fry, I'd say we do
> it the old way and leave that can'o'worms half-opened.

I agree to keep this patch out of the series for now.

> 
> Unless you wanna continue poking at it. I can give you access to that
> box at work...

Yes, please. I suspect there are some additional requirements for updating
MTRR in parallel, or this is "just" a cpu bug.


Juergen


[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3149 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

      reply	other threads:[~2022-11-08  7:30 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-02  7:46 [PATCH v5 00/16] x86: make PAT and MTRR independent from each other Juergen Gross
2022-11-02  7:46 ` [PATCH v5 01/16] x86/mtrr: add comment for set_mtrr_state() serialization Juergen Gross
2022-11-02  7:46 ` [PATCH v5 02/16] x86/mtrr: remove unused cyrix_set_all() function Juergen Gross
2022-11-02  7:47 ` [PATCH v5 03/16] x86/mtrr: replace use_intel() with a local flag Juergen Gross
2022-11-10 12:21   ` [tip: x86/cpu] x86/mtrr: Replace " tip-bot2 for Juergen Gross
2022-11-02  7:47 ` [PATCH v5 04/16] x86/mtrr: rename prepare_set() and post_set() Juergen Gross
2022-11-10 12:21   ` [tip: x86/cpu] x86/mtrr: Rename " tip-bot2 for Juergen Gross
2022-11-02  7:47 ` [PATCH v5 05/16] x86/mtrr: split MTRR specific handling from cache dis/enabling Juergen Gross
2022-11-10 12:21   ` [tip: x86/cpu] x86/mtrr: Split MTRR-specific " tip-bot2 for Juergen Gross
2022-11-02  7:47 ` [PATCH v5 06/16] x86: move some code out of arch/x86/kernel/cpu/mtrr Juergen Gross
2022-11-10 12:21   ` [tip: x86/cpu] x86/mtrr: Move cache control code to cacheinfo.c tip-bot2 for Juergen Gross
2022-11-02  7:47 ` [PATCH v5 07/16] x86/mtrr: Disentangle MTRR init from PAT init Juergen Gross
2022-11-10 12:21   ` [tip: x86/cpu] " tip-bot2 for Juergen Gross
2022-11-02  7:47 ` [PATCH v5 08/16] x86/mtrr: remove set_all callback from struct mtrr_ops Juergen Gross
2022-11-10 12:21   ` [tip: x86/cpu] x86/mtrr: Remove " tip-bot2 for Juergen Gross
2022-11-02  7:47 ` [PATCH v5 09/16] x86/mtrr: simplify mtrr_bp_init() Juergen Gross
2022-11-10 12:21   ` [tip: x86/cpu] x86/mtrr: Simplify mtrr_bp_init() tip-bot2 for Juergen Gross
2022-11-02  7:47 ` [PATCH v5 10/16] x86/mtrr: get rid of __mtrr_enabled bool Juergen Gross
2022-11-10 12:21   ` [tip: x86/cpu] x86/mtrr: Get " tip-bot2 for Juergen Gross
2022-11-02  7:47 ` [PATCH v5 11/16] x86/mtrr: let cache_aps_delayed_init replace mtrr_aps_delayed_init Juergen Gross
2022-11-10 12:21   ` [tip: x86/cpu] x86/mtrr: Let " tip-bot2 for Juergen Gross
2022-11-02  7:47 ` [PATCH v5 12/16] x86/mtrr: add a stop_machine() handler calling only cache_cpu_init() Juergen Gross
2022-11-10 12:21   ` [tip: x86/cpu] x86/mtrr: Add " tip-bot2 for Juergen Gross
2022-11-02  7:47 ` [PATCH v5 13/16] x86: decouple PAT and MTRR handling Juergen Gross
2022-11-10 12:21   ` [tip: x86/cpu] x86: Decouple " tip-bot2 for Juergen Gross
2022-12-01 16:26   ` [PATCH v5 13/16] x86: decouple " Kirill A. Shutemov
2022-12-01 16:33     ` Juergen Gross
2022-12-01 23:57       ` Kirill A. Shutemov
2022-12-02  5:56         ` Juergen Gross
2022-12-02 13:27           ` Kirill A. Shutemov
2022-12-02 13:39             ` Juergen Gross
2022-12-02 14:33               ` Kirill A. Shutemov
2022-12-02 14:56                 ` Juergen Gross
2022-12-05  7:40                   ` Juergen Gross
2022-12-05 12:21                     ` Kirill A. Shutemov
2022-12-02 13:55           ` Borislav Petkov
2022-11-02  7:47 ` [PATCH v5 14/16] x86: switch cache_ap_init() to hotplug callback Juergen Gross
2022-11-10 12:21   ` [tip: x86/cpu] x86/cacheinfo: Switch " tip-bot2 for Juergen Gross
2022-11-02  7:47 ` [PATCH v5 15/16] x86: do MTRR/PAT setup on all secondary CPUs in parallel Juergen Gross
2022-11-02  7:47 ` [PATCH v5 16/16] x86/mtrr: simplify mtrr_ops initialization Juergen Gross
2022-11-10 12:21   ` [tip: x86/cpu] x86/mtrr: Simplify " tip-bot2 for Juergen Gross
2022-11-02 18:04 ` [PATCH v5 00/16] x86: make PAT and MTRR independent from each other Borislav Petkov
2022-11-03  8:40   ` Juergen Gross
2022-11-03 16:15     ` Borislav Petkov
2022-11-07 19:25       ` Borislav Petkov
2022-11-08  7:30         ` Juergen Gross [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=03e5624d-9e45-c51b-9ace-616a223ff092@suse.com \
    --to=jgross@suse.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=pavel@ucw.cz \
    --cc=peterz@infradead.org \
    --cc=rafael@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.