linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Jan Kiszka <jan.kiszka@siemens.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Borislav Petkov <bp@suse.de>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: [PATCH 4.4 35/38] x86/apic: Add extra serialization for non-serializing MSRs
Date: Mon,  8 Feb 2021 16:00:57 +0100	[thread overview]
Message-ID: <20210208145806.631981840@linuxfoundation.org> (raw)
In-Reply-To: <20210208145805.279815326@linuxfoundation.org>

From: Dave Hansen <dave.hansen@linux.intel.com>

commit 25a068b8e9a4eb193d755d58efcb3c98928636e0 upstream.

Jan Kiszka reported that the x2apic_wrmsr_fence() function uses a plain
MFENCE while the Intel SDM (10.12.3 MSR Access in x2APIC Mode) calls for
MFENCE; LFENCE.

Short summary: we have special MSRs that have weaker ordering than all
the rest. Add fencing consistent with current SDM recommendations.

This is not known to cause any issues in practice, only in theory.

Longer story below:

The reason the kernel uses a different semantic is that the SDM changed
(roughly in late 2017). The SDM changed because folks at Intel were
auditing all of the recommended fences in the SDM and realized that the
x2apic fences were insufficient.

Why was the pain MFENCE judged insufficient?

WRMSR itself is normally a serializing instruction. No fences are needed
because the instruction itself serializes everything.

But, there are explicit exceptions for this serializing behavior written
into the WRMSR instruction documentation for two classes of MSRs:
IA32_TSC_DEADLINE and the X2APIC MSRs.

Back to x2apic: WRMSR is *not* serializing in this specific case.
But why is MFENCE insufficient? MFENCE makes writes visible, but
only affects load/store instructions. WRMSR is unfortunately not a
load/store instruction and is unaffected by MFENCE. This means that a
non-serializing WRMSR could be reordered by the CPU to execute before
the writes made visible by the MFENCE have even occurred in the first
place.

This means that an x2apic IPI could theoretically be triggered before
there is any (visible) data to process.

Does this affect anything in practice? I honestly don't know. It seems
quite possible that by the time an interrupt gets to consume the (not
yet) MFENCE'd data, it has become visible, mostly by accident.

To be safe, add the SDM-recommended fences for all x2apic WRMSRs.

This also leaves open the question of the _other_ weakly-ordered WRMSR:
MSR_IA32_TSC_DEADLINE. While it has the same ordering architecture as
the x2APIC MSRs, it seems substantially less likely to be a problem in
practice. While writes to the in-memory Local Vector Table (LVT) might
theoretically be reordered with respect to a weakly-ordered WRMSR like
TSC_DEADLINE, the SDM has this to say:

  In x2APIC mode, the WRMSR instruction is used to write to the LVT
  entry. The processor ensures the ordering of this write and any
  subsequent WRMSR to the deadline; no fencing is required.

But, that might still leave xAPIC exposed. The safest thing to do for
now is to add the extra, recommended LFENCE.

 [ bp: Massage commit message, fix typos, drop accidentally added
   newline to tools/arch/x86/include/asm/barrier.h. ]

Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200305174708.F77040DD@viggo.jf.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/apic.h           |   10 ----------
 arch/x86/include/asm/barrier.h        |   18 ++++++++++++++++++
 arch/x86/kernel/apic/apic.c           |    4 ++++
 arch/x86/kernel/apic/x2apic_cluster.c |    3 ++-
 arch/x86/kernel/apic/x2apic_phys.c    |    3 ++-
 5 files changed, 26 insertions(+), 12 deletions(-)

--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -168,16 +168,6 @@ static inline void disable_local_APIC(vo
 #endif /* !CONFIG_X86_LOCAL_APIC */
 
 #ifdef CONFIG_X86_X2APIC
-/*
- * Make previous memory operations globally visible before
- * sending the IPI through x2apic wrmsr. We need a serializing instruction or
- * mfence for this.
- */
-static inline void x2apic_wrmsr_fence(void)
-{
-	asm volatile("mfence" : : : "memory");
-}
-
 static inline void native_apic_msr_write(u32 reg, u32 v)
 {
 	if (reg == APIC_DFR || reg == APIC_ID || reg == APIC_LDR ||
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -119,4 +119,22 @@ do {									\
 #define smp_mb__before_atomic()	do { } while (0)
 #define smp_mb__after_atomic()	do { } while (0)
 
+/*
+ * Make previous memory operations globally visible before
+ * a WRMSR.
+ *
+ * MFENCE makes writes visible, but only affects load/store
+ * instructions.  WRMSR is unfortunately not a load/store
+ * instruction and is unaffected by MFENCE.  The LFENCE ensures
+ * that the WRMSR is not reordered.
+ *
+ * Most WRMSRs are full serializing instructions themselves and
+ * do not require this barrier.  This is only required for the
+ * IA32_TSC_DEADLINE and X2APIC MSRs.
+ */
+static inline void weak_wrmsr_fence(void)
+{
+	asm volatile("mfence; lfence" : : : "memory");
+}
+
 #endif /* _ASM_X86_BARRIER_H */
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -41,6 +41,7 @@
 #include <asm/x86_init.h>
 #include <asm/pgalloc.h>
 #include <linux/atomic.h>
+#include <asm/barrier.h>
 #include <asm/mpspec.h>
 #include <asm/i8259.h>
 #include <asm/proto.h>
@@ -464,6 +465,9 @@ static int lapic_next_deadline(unsigned
 {
 	u64 tsc;
 
+	/* This MSR is special and need a special fence: */
+	weak_wrmsr_fence();
+
 	tsc = rdtsc();
 	wrmsrl(MSR_IA32_TSC_DEADLINE, tsc + (((u64) delta) * TSC_DIVISOR));
 	return 0;
--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -32,7 +32,8 @@ __x2apic_send_IPI_mask(const struct cpum
 	unsigned long flags;
 	u32 dest;
 
-	x2apic_wrmsr_fence();
+	/* x2apic MSRs are special and need a special fence: */
+	weak_wrmsr_fence();
 
 	local_irq_save(flags);
 
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -43,7 +43,8 @@ __x2apic_send_IPI_mask(const struct cpum
 	unsigned long this_cpu;
 	unsigned long flags;
 
-	x2apic_wrmsr_fence();
+	/* x2apic MSRs are special and need a special fence: */
+	weak_wrmsr_fence();
 
 	local_irq_save(flags);
 



  parent reply	other threads:[~2021-02-08 15:29 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-08 15:00 [PATCH 4.4 00/38] 4.4.257-rc1 review Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.4 01/38] net_sched: reject silly cell_log in qdisc_get_rtab() Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.4 02/38] futex,rt_mutex: Provide futex specific rt_mutex API Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.4 03/38] futex: Remove rt_mutex_deadlock_account_*() Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.4 04/38] futex: Rework inconsistent rt_mutex/futex_q state Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.4 05/38] futex: Avoid violating the 10th rule of futex Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.4 06/38] futex: Replace pointless printk in fixup_owner() Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.4 07/38] futex: Provide and use pi_state_update_owner() Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.4 08/38] rtmutex: Remove unused argument from rt_mutex_proxy_unlock() Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.4 09/38] futex: Use pi_state_update_owner() in put_pi_state() Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.4 10/38] futex: Simplify fixup_pi_state_owner() Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.4 11/38] futex: Handle faults correctly for PI futexes Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.4 12/38] usb: udc: core: Use lock when write to soft_connect Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.4 13/38] scsi: libfc: Avoid invoking response handler twice if ep is already completed Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.4 14/38] scsi: ibmvfc: Set default timeout to avoid crash during migration Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.4 15/38] stable: clamp SUBLEVEL in 4.4 and 4.9 Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.4 16/38] USB: serial: cp210x: add pid/vid for WSDA-200-USB Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.4 17/38] USB: serial: cp210x: add new VID/PID for supporting Teraoka AD2000 Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.4 18/38] USB: serial: option: Adding support for Cinterion MV31 Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.4 19/38] Input: i8042 - unbreak Pegatron C15B Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.4 20/38] net: lapb: Copy the skb before sending a packet Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.4 21/38] ELF/MIPS build fix Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.4 22/38] elfcore: fix building with clang Greg Kroah-Hartman
2021-02-09 12:52   ` Pavel Machek
2021-02-09 13:01     ` Greg Kroah-Hartman
2021-02-09 19:10       ` Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.4 23/38] USB: gadget: legacy: fix an error code in eth_bind() Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.4 24/38] USB: usblp: dont call usb_set_interface if theres a single alt Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.4 25/38] usb: dwc2: Fix endpoint direction check in ep_from_windex Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.4 26/38] mac80211: fix station rate table updates on assoc Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.4 27/38] kretprobe: Avoid re-registration of the same kretprobe earlier Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.4 28/38] cifs: report error instead of invalid when revalidating a dentry fails Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.4 29/38] mmc: core: Limit retries when analyse of SDIO tuples fails Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.4 30/38] ARM: footbridge: fix dc21285 PCI configuration accessors Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.4 31/38] mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.4 32/38] mm: hugetlb: fix a race between isolating and freeing page Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.4 33/38] mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.4 34/38] x86/build: Disable CET instrumentation in the kernel Greg Kroah-Hartman
2021-02-08 15:00 ` Greg Kroah-Hartman [this message]
2021-02-08 15:00 ` [PATCH 4.4 36/38] Input: xpad - sync supported devices with fork on GitHub Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.4 37/38] ACPI: thermal: Do not call acpi_thermal_check() directly Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.4 38/38] ALSA: hda/realtek - Fix typo of pincfg for Dell quirk Greg Kroah-Hartman
2021-02-08 18:36 ` [PATCH 4.4 00/38] 4.4.257-rc1 review Pavel Machek
2021-02-08 20:43 ` Shuah Khan
2021-02-09 18:09 ` Guenter Roeck
2021-02-10  5:04 ` Naresh Kamboju

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210208145806.631981840@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=bp@suse.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=jan.kiszka@siemens.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).