All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Hansen <dave.hansen@linux.intel.com>
To: linux-kernel@vger.kernel.org
Cc: Dave Hansen <dave.hansen@linux.intel.com>,
	jan.kiszka@siemens.com, x86@kernel.org, peterz@infradead.org
Subject: [RFC][PATCH 2/2] x86: add extra serialization for non-serializing MSRs
Date: Thu, 05 Mar 2020 09:47:08 -0800	[thread overview]
Message-ID: <20200305174708.F77040DD@viggo.jf.intel.com> (raw)
In-Reply-To: <20200305174706.0D6B8EE4@viggo.jf.intel.com>


Jan Kiszka reported that the x2apic_wrmsr_fence() function uses a
plain "mfence" while the Intel SDM (10.12.3 MSR Access in x2APIC
Mode) calls for "mfence;lfence".

Short summary: we have special MSRs that have weaker ordering
than all the rest.  Add fencing consistent with current SDM
recommendatrions.

This is not known to cause any issues in practice, only in
theory.

Longer story below:

The reason the kernel uses a different semantic is that the SDM
changed (roughly in late 2017).  The SDM changed because folks at
Intel were auditing all of the recommended fences in the SDM and
realized that the x2apic fences were insufficient.

Why was the pain "mfence" judged insufficient?

WRMSR itself is normally a serializing instruction.  No fences
are needed because because the instruction itself serializes
everything.

But, there are explicit exceptions for this serializing behavior
written into the WRMSR instruction documentation for two classes
of MSRs: IA32_TSC_DEADLINE and the X2APIC MSRs.

Back to x2apic: WRMSR is *not* serializing in this specific case.
But why is MFENCE insufficient?  MFENCE makes writes visible, but
only affects load/store instructions.  WRMSR is unfortunately not
a load/store instruction and is unaffected by MEFNCE.  This means
that a non-serializing WRMSR could be reordered by the CPU to
execute before the writes made visible by the MFENCE have even
occurred in the first place.

This mean that an x2apic IPI could theoretically be triggered
before there is any (visible) data to process.

Does this affect anything in practice?  I honestly don't know.
It seems quite possible that by the time an interrupt gets to
consume the (not yet) MFENCE'd data, it has become visible,
mostly by accident.

To be safe, add the SDM-recommended fences for all x2apic WRMSRs.

This also leaves open the question of the _other_ weakly-ordered
WRMSR: MSR_IA32_TSC_DEADLINE.  While it has the same ordering
architecture as the x2APIC MSRs, it seems substantially less
likely to be a problem in practice.  While writes to the
in-memory Local Vector Table (LVT) might theoretically be
reordered with respect to a weakly-ordered WRMSR like
TSC_DEADLINE, the SDM has this to say:

	In x2APIC mode, the WRMSR instruction is used to write to
	the LVT entry. The processor ensures the ordering of this
	write and any subsequent WRMSR to the deadline; no
	fencing is required.

But, that might still leave xAPIC exposed.  The safest thing to
do for now is to add the extra, recommended LFENCE.

Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: x86@kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>

---

 b/arch/x86/include/asm/apic.h           |   10 ----------
 b/arch/x86/include/asm/barrier.h        |   18 ++++++++++++++++++
 b/arch/x86/kernel/apic/apic.c           |    4 ++++
 b/arch/x86/kernel/apic/x2apic_cluster.c |    6 ++++--
 b/arch/x86/kernel/apic/x2apic_phys.c    |    9 ++++++---
 b/tools/arch/x86/include/asm/barrier.h  |    1 +
 6 files changed, 33 insertions(+), 15 deletions(-)

diff -puN arch/x86/include/asm/apic.h~x2apic-wrmsr-serialization arch/x86/include/asm/apic.h
--- a/arch/x86/include/asm/apic.h~x2apic-wrmsr-serialization	2020-03-05 09:42:38.876901038 -0800
+++ b/arch/x86/include/asm/apic.h	2020-03-05 09:42:38.891901038 -0800
@@ -195,16 +195,6 @@ static inline bool apic_needs_pit(void)
 #endif /* !CONFIG_X86_LOCAL_APIC */
 
 #ifdef CONFIG_X86_X2APIC
-/*
- * Make previous memory operations globally visible before
- * sending the IPI through x2apic wrmsr. We need a serializing instruction or
- * mfence for this.
- */
-static inline void x2apic_wrmsr_fence(void)
-{
-	asm volatile("mfence" : : : "memory");
-}
-
 static inline void native_apic_msr_write(u32 reg, u32 v)
 {
 	if (reg == APIC_DFR || reg == APIC_ID || reg == APIC_LDR ||
diff -puN arch/x86/include/asm/barrier.h~x2apic-wrmsr-serialization arch/x86/include/asm/barrier.h
--- a/arch/x86/include/asm/barrier.h~x2apic-wrmsr-serialization	2020-03-05 09:42:38.878901038 -0800
+++ b/arch/x86/include/asm/barrier.h	2020-03-05 09:42:38.893901038 -0800
@@ -84,4 +84,22 @@ do {									\
 
 #include <asm-generic/barrier.h>
 
+/*
+ * Make previous memory operations globally visible before
+ * a WRMSR.
+ *
+ * MFENCE makes writes visible, but only affects load/store
+ * instructions.  WRMSR is unfortunately not a load/store
+ * instruction and is unaffected by MEFNCE.  The LFENCE ensures
+ * that the WRMSR is not reordered.
+ *
+ * Most WRMSRs are full serializing instructions themselves and
+ * do not require this barrier.  This is only required for the
+ * IA32_TSC_DEADLINE and X2APIC MSRs.
+ */
+static inline void weak_wrmsr_fence(void)
+{
+	asm volatile("mfence; lfence" : : : "memory");
+}
+
 #endif /* _ASM_X86_BARRIER_H */
diff -puN arch/x86/kernel/apic/apic.c~x2apic-wrmsr-serialization arch/x86/kernel/apic/apic.c
--- a/arch/x86/kernel/apic/apic.c~x2apic-wrmsr-serialization	2020-03-05 09:42:38.880901038 -0800
+++ b/arch/x86/kernel/apic/apic.c	2020-03-05 09:42:38.892901038 -0800
@@ -42,6 +42,7 @@
 #include <asm/x86_init.h>
 #include <asm/pgalloc.h>
 #include <linux/atomic.h>
+#include <asm/barrier.h>
 #include <asm/mpspec.h>
 #include <asm/i8259.h>
 #include <asm/proto.h>
@@ -474,6 +475,9 @@ static int lapic_next_deadline(unsigned
 {
 	u64 tsc;
 
+	/* This MSR is special and need a special fence: */
+	weak_wrmsr_fence();
+
 	tsc = rdtsc();
 	wrmsrl(MSR_IA32_TSC_DEADLINE, tsc + (((u64) delta) * TSC_DIVISOR));
 	return 0;
diff -puN arch/x86/kernel/apic/x2apic_cluster.c~x2apic-wrmsr-serialization arch/x86/kernel/apic/x2apic_cluster.c
--- a/arch/x86/kernel/apic/x2apic_cluster.c~x2apic-wrmsr-serialization	2020-03-05 09:42:38.882901038 -0800
+++ b/arch/x86/kernel/apic/x2apic_cluster.c	2020-03-05 09:42:38.892901038 -0800
@@ -29,7 +29,8 @@ static void x2apic_send_IPI(int cpu, int
 {
 	u32 dest = per_cpu(x86_cpu_to_logical_apicid, cpu);
 
-	x2apic_wrmsr_fence();
+	/* x2apic MSRs are special and need a special fence: */
+	weak_wrmsr_fence();
 	__x2apic_send_IPI_dest(dest, vector, APIC_DEST_LOGICAL);
 }
 
@@ -41,7 +42,8 @@ __x2apic_send_IPI_mask(const struct cpum
 	unsigned long flags;
 	u32 dest;
 
-	x2apic_wrmsr_fence();
+	/* x2apic MSRs are special and need a special fence: */
+	weak_wrmsr_fence();
 	local_irq_save(flags);
 
 	tmpmsk = this_cpu_cpumask_var_ptr(ipi_mask);
diff -puN arch/x86/kernel/apic/x2apic_phys.c~x2apic-wrmsr-serialization arch/x86/kernel/apic/x2apic_phys.c
--- a/arch/x86/kernel/apic/x2apic_phys.c~x2apic-wrmsr-serialization	2020-03-05 09:42:38.885901038 -0800
+++ b/arch/x86/kernel/apic/x2apic_phys.c	2020-03-05 09:42:38.892901038 -0800
@@ -37,7 +37,8 @@ static void x2apic_send_IPI(int cpu, int
 {
 	u32 dest = per_cpu(x86_cpu_to_apicid, cpu);
 
-	x2apic_wrmsr_fence();
+	/* x2apic MSRs are special and need a special fence: */
+	weak_wrmsr_fence();
 	__x2apic_send_IPI_dest(dest, vector, APIC_DEST_PHYSICAL);
 }
 
@@ -48,7 +49,8 @@ __x2apic_send_IPI_mask(const struct cpum
 	unsigned long this_cpu;
 	unsigned long flags;
 
-	x2apic_wrmsr_fence();
+	/* x2apic MSRs are special and need a special fence: */
+	weak_wrmsr_fence();
 
 	local_irq_save(flags);
 
@@ -116,7 +118,8 @@ void __x2apic_send_IPI_shorthand(int vec
 {
 	unsigned long cfg = __prepare_ICR(which, vector, 0);
 
-	x2apic_wrmsr_fence();
+	/* x2apic MSRs are special and need a special fence: */
+	weak_wrmsr_fence();
 	native_x2apic_icr_write(cfg, 0);
 }
 
diff -puN tools/arch/x86/include/asm/barrier.h~x2apic-wrmsr-serialization tools/arch/x86/include/asm/barrier.h
--- a/tools/arch/x86/include/asm/barrier.h~x2apic-wrmsr-serialization	2020-03-05 09:42:38.887901038 -0800
+++ b/tools/arch/x86/include/asm/barrier.h	2020-03-05 09:42:38.892901038 -0800
@@ -43,4 +43,5 @@ do {						\
 	___p1;					\
 })
 #endif /* defined(__x86_64__) */
+
 #endif /* _TOOLS_LINUX_ASM_X86_BARRIER_H */
_

  reply	other threads:[~2020-03-05 17:50 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-05 17:47 [RFC][PATCH 1/2] x86: remove duplicate TSC DEADLINE MSR definitions Dave Hansen
2020-03-05 17:47 ` Dave Hansen [this message]
2021-02-04 16:37   ` [RFC][PATCH 2/2] x86: add extra serialization for non-serializing MSRs Dave Hansen
2021-02-04 19:37   ` [tip: x86/urgent] x86/apic: Add " tip-bot2 for Dave Hansen
2021-02-04 23:37   ` [RFC][PATCH 2/2] x86: add " Andrew Cooper
2021-02-05  0:11     ` Andy Lutomirski
2021-02-05 10:02       ` Peter Zijlstra
2021-02-05 12:08         ` Andrew Cooper
2021-02-05 12:21         ` Peter Zijlstra
2021-09-01 13:07   ` x86/apic: Add " Corey Minyard
2020-03-09 23:50 ` [RFC][PATCH 1/2] x86: remove duplicate TSC DEADLINE MSR definitions Sean Christopherson
2021-02-05  9:31 ` Borislav Petkov
2021-02-12 21:37   ` Arnaldo Carvalho de Melo
2021-03-07  1:29 ` Dave Hansen
2021-03-08 11:46 ` [tip: x86/cleanups] x86: Remove " tip-bot2 for Dave Hansen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200305174708.F77040DD@viggo.jf.intel.com \
    --to=dave.hansen@linux.intel.com \
    --cc=jan.kiszka@siemens.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.