From: Jan Beulich <jbeulich@suse.com>
To: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Cc: "Andrew Cooper" <andrew.cooper3@citrix.com>,
	"Wei Liu" <wl@xen.org>, "Roger Pau Monné" <roger.pau@citrix.com>
Subject: [PATCH] x86: serializing of non-serializing MSR writes
Date: Wed, 28 Feb 2024 15:48:10 +0100
Message-ID: <58656398-2d64-48b8-9ddc-c6836847a586@suse.com>

Linux commit 25a068b8e9a4e ("x86/apic: Add extra serialization for non-
serializing MSRs") explains why an MFENCE+LFENCE pair is generally
needed ahead of ICR writes in x2APIC mode, and also why, at least in
theory, the same is needed ahead of TSC_DEADLINE writes. A comment of
our own in send_IPI_mask_x2apic_phys() further explains a condition
under which the LFENCE can be avoided.
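
For illustration, here is a minimal sketch of the hazard (simplified,
not actual Xen code; MSR_X2APIC_ICR_EXAMPLE, per_cpu_mailbox, vector,
and dest are placeholders):

    /* Data the IPI recipient must observe, written in program order. */
    per_cpu_mailbox[cpu] = work;

    /*
     * The WRMSR to the ICR is not serializing in x2APIC mode.  Without
     * the MFENCE (making the store globally visible) plus the LFENCE
     * (keeping the WRMSR from executing early), the IPI could outrun
     * the store, and the handler could observe stale data.
     */
    asm volatile ( "mfence; lfence" ::: "memory" );
    wrmsrl(MSR_X2APIC_ICR_EXAMPLE, APIC_DM_FIXED | vector | dest);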

Linux commit 04c3024560d3 ("x86/barrier: Do not serialize MSR accesses
on AMD") further explains that this barrier isn't needed on AMD or
Hygon, and that it in fact hampers performance in a measurable way.

Introduce a similarly named helper function, but with a parameter
allowing callers to specify whether a memory access will follow, thus
permitting the LFENCE to be omitted.
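
Usage then looks as follows (a sketch only; MSR_EXAMPLE_ICR, msg, and
deadline are placeholders, while the real call sites appear in the diff
below):

    /* A dependent memory read follows the fence: LFENCE can be elided. */
    weak_wrmsr_fence(true);
    wrmsrl(MSR_EXAMPLE_ICR, msg | cpu_physical_id(cpu));

    /* No suitable memory access in between: LFENCE is emitted as well. */
    weak_wrmsr_fence(false);
    wrmsrl(MSR_IA32_TSC_DEADLINE, deadline);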

An instance in apic_wait_icr_idle() is put there to be on the safe side.
The one place where a fence was clearly missing is send_IPI_shortcut(),
which is also used in x2APIC mode when called from send_IPI_mask().

Function comment shamelessly borrowed (but adapted) from Linux.

Fixes: 5500d265a2a8 ("x86/smp: use APIC ALLBUT destination shorthand when possible")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
I question the need for a fence ahead of writing TSC_DEADLINE: The Linux
commit message talks about the LVT being accessed via MMIO in xAPIC
mode, but that should not be relevant here: It's all the local CPU, so
there ought not to be any visibility concerns (much like no fence is
needed ahead of the ICR write for a self-IPI). If that fence isn't
needed, we could further use alternatives patching to remove the fence
also from apic_wait_icr_idle() when in xAPIC mode. (And only then would
I agree to have APIC in the feature identifier, like Linux has it.)

A number of apic_write() calls may better be turned into
apic_mem_write() ones, in particular e.g. those in
send_IPI_mask_{flat,phys}(). That way it would be quite a bit easier to
spot paths taken only in xAPIC mode.

The INIT-INIT-SIPI sequence for AP startup doesn't use any barrier, nor
does it in Linux afaics. I can't explain the lack thereof, though.

--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -1309,6 +1309,12 @@ int reprogram_timer(s_time_t timeout)
 
     if ( tdt_enabled )
     {
+        /*
+         * WRMSR to TSC_DEADLINE is not serializing.  We could pass @timeout
+         * here, but the passed value is preferably compile-time-constant.
+         */
+        weak_wrmsr_fence(false);
+
         wrmsrl(MSR_IA32_TSC_DEADLINE, timeout ? stime2tsc(timeout) : 0);
         return 1;
     }
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -675,8 +675,12 @@ void amd_log_freq(const struct cpuinfo_x
 
 void cf_check early_init_amd(struct cpuinfo_x86 *c)
 {
-	if (c == &boot_cpu_data)
+	if (c == &boot_cpu_data) {
+		/* No fencing needed ahead of certain MSR writes. */
+		setup_force_cpu_cap(X86_FEATURE_NO_WRMSR_FENCE);
+
 		amd_init_levelling();
+	}
 
 	ctxt_switch_levelling(NULL);
 }
--- a/xen/arch/x86/genapic/x2apic.c
+++ b/xen/arch/x86/genapic/x2apic.c
@@ -97,15 +97,15 @@ static void cf_check send_IPI_mask_x2api
 
     /*
      * Ensure that any synchronisation data written in program order by this
-     * CPU is seen by notified remote CPUs. The WRMSR contained within
-     * apic_icr_write() can otherwise be executed early.
+     * CPU is seen by notified remote CPUs. The WRMSR contained in the loop
+     * below can otherwise be executed early.
      * 
-     * The reason smp_mb() is sufficient here is subtle: the register arguments
+     * The reason MFENCE is sufficient here is subtle: the register arguments
      * to WRMSR must depend on a memory read executed after the barrier. This
      * is guaranteed by cpu_physical_id(), which reads from a global array (and
      * so cannot be hoisted above the barrier even by a clever compiler).
      */
-    smp_mb();
+    weak_wrmsr_fence(true);
 
     local_irq_save(flags);
 
@@ -130,7 +130,7 @@ static void cf_check send_IPI_mask_x2api
     const cpumask_t *cluster_cpus;
     unsigned long flags;
 
-    smp_mb(); /* See above for an explanation. */
+    weak_wrmsr_fence(true); /* See above for an explanation. */
 
     local_irq_save(flags);
 
--- a/xen/arch/x86/include/asm/cpufeatures.h
+++ b/xen/arch/x86/include/asm/cpufeatures.h
@@ -24,7 +24,7 @@ XEN_CPUFEATURE(APERFMPERF,        X86_SY
 XEN_CPUFEATURE(MFENCE_RDTSC,      X86_SYNTH( 9)) /* MFENCE synchronizes RDTSC */
 XEN_CPUFEATURE(XEN_SMEP,          X86_SYNTH(10)) /* SMEP gets used by Xen itself */
 XEN_CPUFEATURE(XEN_SMAP,          X86_SYNTH(11)) /* SMAP gets used by Xen itself */
-/* Bit 12 unused. */
+XEN_CPUFEATURE(NO_WRMSR_FENCE,    X86_SYNTH(12)) /* No MFENCE{,+LFENCE} ahead of certain WRMSR. */
 XEN_CPUFEATURE(IND_THUNK_LFENCE,  X86_SYNTH(13)) /* Use IND_THUNK_LFENCE */
 XEN_CPUFEATURE(IND_THUNK_JMP,     X86_SYNTH(14)) /* Use IND_THUNK_JMP */
 XEN_CPUFEATURE(SC_NO_BRANCH_HARDEN, X86_SYNTH(15)) /* (Disable) Conditional branch hardening */
--- a/xen/arch/x86/include/asm/msr.h
+++ b/xen/arch/x86/include/asm/msr.h
@@ -97,6 +97,25 @@ static inline void msr_split(struct cpu_
     regs->rax = (uint32_t)val;
 }
 
+/*
+ * Make previous memory operations globally visible before a WRMSR.  Most
+ * WRMSRs are fully serializing instructions themselves and do not require this
+ * barrier.  This may only be required for the TSC_DEADLINE and x2APIC MSRs.
+ *
+ * MFENCE makes writes visible, but only affects load/store instructions.
+ * WRMSR is unfortunately not a load/store instruction and is unaffected by
+ * MFENCE.  The LFENCE ensures that the WRMSR is not reordered, but callers
+ * can ask for it to be omitted when they have a suitable memory access
+ * between the invocation of this function and the WRMSR in question.
+ */
+static inline void weak_wrmsr_fence(bool have_mem_access)
+{
+    alternative("mfence", "", X86_FEATURE_NO_WRMSR_FENCE);
+
+    if ( !have_mem_access )
+        alternative("lfence", "", X86_FEATURE_NO_WRMSR_FENCE);
+}
+
 static inline uint64_t rdtsc(void)
 {
     uint32_t low, high;
--- a/xen/arch/x86/smp.c
+++ b/xen/arch/x86/smp.c
@@ -39,7 +39,10 @@ static unsigned int prepare_ICR2(unsigne
 void apic_wait_icr_idle(void)
 {
     if ( x2apic_enabled )
+    {
+        weak_wrmsr_fence(false);
         return;
+    }
 
     while ( apic_read(APIC_ICR) & APIC_ICR_BUSY )
         cpu_relax();

