linux-kernel.vger.kernel.org archive mirror
* [patch V2 00/25] x86/apic: Support for IPI shorthands
@ 2019-07-04 15:51 Thomas Gleixner
  2019-07-04 15:51 ` [patch V2 01/25] x86/kgdb: Use NMI_VECTOR not APIC_DM_NMI Thomas Gleixner
                   ` (24 more replies)
  0 siblings, 25 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-07-04 15:51 UTC
  To: LKML; +Cc: x86, Nadav Amit, Ricardo Neri, Stephane Eranian, Feng Tang

The recent discussion about using the HPET as an NMI watchdog made me look
into IPI shorthand support. Nadav also wanted to look into shorthands to
speed up certain TLB operations.

The support for IPI shorthands is rather limited right now and was
basically rendered useless by making it depend on CONFIG_HOTPLUG_CPU=n.

The reason is that shorthands are broadcast to all CPUs. If not all
present CPUs have been brought up, this can end up with an effect similar
to the dreaded MCE broadcast.

But this can be handled in a smarter way than just preventing shorthands
when CPU hotplug is enabled. The kernel already deals with the MCE
broadcast issue for the 'nosmt' case: it brings up all present CPUs and
then shuts down the SMT siblings right away after they have done their
basic initialization and set CR4.MCE.

The core CPU hotplug code keeps track of that information already, so it
can be used to decide whether IPI shorthands can be used safely or not.

If all present CPUs have been brought up at least once, it's safe to switch
to IPI shorthand mode. The switch over is done with a static key and can be
prevented completely with the existing (so far 32bit only) command line
option.
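
As an illustration, the switch over decision boils down to something like
the sketch below. The static key and function names are placeholders
invented for this cover letter, not the final implementation;
cpus_booted_once_mask is introduced in patch 11:

	/* Sketch only: names are placeholders */
	static DEFINE_STATIC_KEY_FALSE(apic_use_ipi_shorthand);

	static void apic_update_ipi_shorthand(void)
	{
		/* Safe only once every present CPU has booted at least once */
		if (cpumask_subset(cpu_present_mask, &cpus_booted_once_mask))
			static_branch_enable(&apic_use_ipi_shorthand);
		else
			static_branch_disable(&apic_use_ipi_shorthand);
	}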

As an offlined CPU still receives IPIs, the offline code is changed to soft
disable the local APIC so the offline CPU will not be bothered by shorthand
based IPIs. In the soft disabled state the APIC still handles NMI, INIT and
SIPI, so onlining will work as before.
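
Concretely, soft disabling is nothing more than clearing the enable bit in
the spurious interrupt vector register, as the patches below do:

	u32 value = apic_read(APIC_SPIV);

	value &= ~APIC_SPIV_APIC_ENABLED;
	apic_write(APIC_SPIV, value);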

To support NMI based shorthand IPIs, the NMI handler gets a new check right
at the beginning of the handler code which lets the handler ignore the NMI
on an offline CPU instead of calling through the whole spaghetti maze of
NMI handling.
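
The check, as implemented in the hotplug patch of this series, is a cheap
per-CPU flag test right at the top of do_nmi():

	if (IS_ENABLED(CONFIG_SMP) && this_cpu_read(cpu_ignore_nmi))
		return;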

Soft disabling the local APIC on the offlined CPU unearthed a KVM APIC
emulation issue which is only relevant for CPU0 hotplug testing. The fix is
in the KVM tree already, but there is no need to have this dependency here.
(0-day folks are aware of it).

The APIC setup function also has a few minor issues which are addressed in
this series as well.

Part of the series is also a consolidation of the APIC code, which was
necessary to avoid spreading all the shorthand implementation details into
header files etc.

It survived testing on a range of different machines, including the NMI
shorthand IPIs. Aside from the KVM APIC issue, which is only relevant in
combination with CPU0 hotplug testing, there are no known side effects.

Changes vs. V1 (https://lkml.kernel.org/r/20190703105431.096822793@linutronix.de):

	- Fix an 11 year old bug in kgdb

	- Move the shorthand decision logic into the callers (Nadav)

	- Make native_send_call_func_ipi() smarter (Nadav)

	- Consolidate more duplicated code 

The series is also available from git:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.x86/ipi

Thanks,

	tglx

8<------------
 a/arch/x86/include/asm/apic_flat_64.h |    8 -
 a/arch/x86/include/asm/ipi.h          |  109 ---------------------
 a/arch/x86/kernel/apic/x2apic.h       |    9 -
 arch/x86/include/asm/apic.h           |   11 +-
 arch/x86/include/asm/bugs.h           |    2 
 arch/x86/include/asm/processor.h      |    2 
 arch/x86/include/asm/smp.h            |    1 
 arch/x86/kernel/apic/apic.c           |  157 +++++++++++++++++++------------
 arch/x86/kernel/apic/apic_flat_64.c   |   66 ++-----------
 arch/x86/kernel/apic/apic_noop.c      |   18 ---
 arch/x86/kernel/apic/apic_numachip.c  |    8 -
 arch/x86/kernel/apic/bigsmp_32.c      |    9 -
 arch/x86/kernel/apic/ipi.c            |  170 +++++++++++++++++++++++++---------
 arch/x86/kernel/apic/probe_32.c       |   41 --------
 arch/x86/kernel/apic/probe_64.c       |   21 ----
 arch/x86/kernel/apic/x2apic_cluster.c |   20 +---
 arch/x86/kernel/apic/x2apic_phys.c    |   25 ++---
 arch/x86/kernel/apic/x2apic_uv_x.c    |   30 +-----
 arch/x86/kernel/cpu/bugs.c            |    2 
 arch/x86/kernel/cpu/common.c          |   11 ++
 arch/x86/kernel/kgdb.c                |    2 
 arch/x86/kernel/nmi.c                 |    3 
 arch/x86/kernel/reboot.c              |    7 -
 arch/x86/kernel/smp.c                 |   44 --------
 arch/x86/kernel/smpboot.c             |   13 ++
 b/arch/x86/kernel/apic/local.h        |   68 +++++++++++++
 include/linux/bitmap.h                |   23 ++++
 include/linux/cpumask.h               |   16 +++
 kernel/cpu.c                          |   11 +-
 lib/bitmap.c                          |   20 ++++
 30 files changed, 450 insertions(+), 477 deletions(-)





* [patch V2 01/25] x86/kgdb: Use NMI_VECTOR not APIC_DM_NMI
  2019-07-04 15:51 [patch V2 00/25] x86/apic: Support for IPI shorthands Thomas Gleixner
@ 2019-07-04 15:51 ` Thomas Gleixner
  2019-07-05 21:43   ` Thomas Gleixner
  2019-07-04 15:51 ` [patch V2 02/25] x86/apic: Invoke perf_events_lapic_init() after enabling APIC Thomas Gleixner
                   ` (23 subsequent siblings)
  24 siblings, 1 reply; 42+ messages in thread
From: Thomas Gleixner @ 2019-07-04 15:51 UTC
  To: LKML; +Cc: x86, Nadav Amit, Ricardo Neri, Stephane Eranian, Feng Tang

apic->send_IPI_allbutself() takes a vector number as argument.

APIC_DM_NMI is clearly not a vector number. It's defined as 0x400, which is
outside the vector space.

Use NMI_VECTOR instead, as that is what the function expects.
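
For reference, the ICR preparation helper (which this series later moves
around) shows the distinction: NMI_VECTOR (0x02) is a real vector number
which gets translated to the NMI delivery mode, while APIC_DM_NMI (0x400)
is that delivery mode bit itself:

	static inline unsigned int __prepare_ICR(unsigned int shortcut, int vector,
						 unsigned int dest)
	{
		unsigned int icr = shortcut | dest;

		switch (vector) {
		default:
			icr |= APIC_DM_FIXED | vector;
			break;
		case NMI_VECTOR:
			icr |= APIC_DM_NMI;
			break;
		}
		return icr;
	}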

Fixes: 82da3ff89dc2 ("x86: kgdb support")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: New patch
---
 arch/x86/kernel/kgdb.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -424,7 +424,7 @@ static void kgdb_disable_hw_debug(struct
  */
 void kgdb_roundup_cpus(void)
 {
-	apic->send_IPI_allbutself(APIC_DM_NMI);
+	apic->send_IPI_allbutself(NMI_VECTOR);
 }
 #endif
 




* [patch V2 02/25] x86/apic: Invoke perf_events_lapic_init() after enabling APIC
  2019-07-04 15:51 [patch V2 00/25] x86/apic: Support for IPI shorthands Thomas Gleixner
  2019-07-04 15:51 ` [patch V2 01/25] x86/kgdb: Use NMI_VECTOR not APIC_DM_NMI Thomas Gleixner
@ 2019-07-04 15:51 ` Thomas Gleixner
  2019-07-04 15:51 ` [patch V2 03/25] x86/apic: Soft disable APIC before initializing it Thomas Gleixner
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-07-04 15:51 UTC
  To: LKML; +Cc: x86, Nadav Amit, Ricardo Neri, Stephane Eranian, Feng Tang

If the APIC is soft disabled then unmasking an LVT entry does not work:
the write is ignored. perf_events_lapic_init() tries to do exactly that.

Move the invocation after the point where the APIC has been enabled.
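
For context, perf_events_lapic_init() boils down to a single LVT write,
which is exactly the kind of write that is ignored while the APIC is soft
disabled. A sketch of the helper (from arch/x86/events/core.c, quoted from
memory, so treat the details as an approximation):

	void perf_events_lapic_init(void)
	{
		if (!x86_pmu.apic || !x86_pmu_initialized())
			return;

		/* Always use NMI for the PMU */
		apic_write(APIC_LVTPC, APIC_DM_NMI);
	}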

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/apic/apic.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1488,7 +1488,6 @@ static void setup_local_APIC(void)
 	int logical_apicid, ldr_apicid;
 #endif
 
-
 	if (disable_apic) {
 		disable_ioapic_support();
 		return;
@@ -1503,8 +1502,6 @@ static void setup_local_APIC(void)
 		apic_write(APIC_ESR, 0);
 	}
 #endif
-	perf_events_lapic_init();
-
 	/*
 	 * Double-check whether this APIC is really registered.
 	 * This is meaningless in clustered apic mode, so we skip it.
@@ -1585,6 +1582,8 @@ static void setup_local_APIC(void)
 	value |= SPURIOUS_APIC_VECTOR;
 	apic_write(APIC_SPIV, value);
 
+	perf_events_lapic_init();
+
 	/*
 	 * Set up LVT0, LVT1:
 	 *




* [patch V2 03/25] x86/apic: Soft disable APIC before initializing it
  2019-07-04 15:51 [patch V2 00/25] x86/apic: Support for IPI shorthands Thomas Gleixner
  2019-07-04 15:51 ` [patch V2 01/25] x86/kgdb: Use NMI_VECTOR not APIC_DM_NMI Thomas Gleixner
  2019-07-04 15:51 ` [patch V2 02/25] x86/apic: Invoke perf_events_lapic_init() after enabling APIC Thomas Gleixner
@ 2019-07-04 15:51 ` Thomas Gleixner
  2019-07-04 15:51 ` [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust Thomas Gleixner
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-07-04 15:51 UTC
  To: LKML; +Cc: x86, Nadav Amit, Ricardo Neri, Stephane Eranian, Feng Tang

If the APIC was already enabled on entry to setup_local_APIC(), then soft
disabling it via the SPIV register makes a lot of sense.

That masks all LVT entries and brings it into a well defined state.

Otherwise previously enabled LVTs which are not touched in the setup
function stay unmasked and might surprise the just booting kernel.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/apic/apic.c |    8 ++++++++
 1 file changed, 8 insertions(+)

--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1493,6 +1493,14 @@ static void setup_local_APIC(void)
 		return;
 	}
 
+	/*
+	 * If this comes from kexec/kcrash the APIC might be enabled in
+	 * SPIV. Soft disable it before doing further initialization.
+	 */
+	value = apic_read(APIC_SPIV);
+	value &= ~APIC_SPIV_APIC_ENABLED;
+	apic_write(APIC_SPIV, value);
+
 #ifdef CONFIG_X86_32
 	/* Pound the ESR really hard over the head with a big hammer - mbligh */
 	if (lapic_is_integrated() && apic->disable_esr) {




* [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust
  2019-07-04 15:51 [patch V2 00/25] x86/apic: Support for IPI shorthands Thomas Gleixner
                   ` (2 preceding siblings ...)
  2019-07-04 15:51 ` [patch V2 03/25] x86/apic: Soft disable APIC before initializing it Thomas Gleixner
@ 2019-07-04 15:51 ` Thomas Gleixner
  2019-07-05 15:47   ` Andrew Cooper
  2019-07-04 15:51 ` [patch V2 05/25] x86/apic: Move IPI inlines into ipi.c Thomas Gleixner
                   ` (20 subsequent siblings)
  24 siblings, 1 reply; 42+ messages in thread
From: Thomas Gleixner @ 2019-07-04 15:51 UTC
  To: LKML; +Cc: x86, Nadav Amit, Ricardo Neri, Stephane Eranian, Feng Tang

In the course of developing shorthand based IPI support, issues were
observed with the function which tries to clear potentially pending ISR
bits in the local APIC.

  1) 0-day testing triggered the WARN_ON() in apic_pending_intr_clear().

     This warning is emitted when the function fails to clear pending ISR
     bits or observes pending IRR bits which are not delivered to the CPU
     after the stale ISR bit(s) are ACK'ed.

     Unfortunately the function only emits a WARN_ON() and fails to dump
     the IRR/ISR content. That's useless for debugging.

     Feng added spot-on debug printks which revealed that the stale IRR
     bit belonged to the APIC timer interrupt vector, but adding ad hoc
     debug code does not help with sporadic failures in the field.

     Rework the loop so the full IRR/ISR contents are saved and on failure
     dumped.

  2) The loop termination logic is interesting at best.

     If the machine has no TSC or cpu_khz is not known yet, it tries 1
     million times to ack stale IRR/ISR bits. What?

     With TSC it uses the TSC to calculate the loop termination. It takes a
     timestamp at entry and terminates the loop when:

     	  (rdtsc() - start_timestamp) >= (cpu_khz << 10)

     That's cpu_khz * 1024 TSC cycles, i.e. roughly one second.

     Both methods are problematic. The APIC has 256 vectors, which means
     that in theory a maximum of 256 IRR/ISR bits can be set. In practice
     this is impossible, as the first 32 vectors are reserved and not
     affected, and the chance that more than a few bits are set is close
     to zero.

     With the pure loop based approach the 1 million retries are complete
     overkill.

     With TSC this can terminate too early in a guest which is running on a
     heavily loaded host, even with only a couple of IRR/ISR bits set. The
     reason is that after acknowledging the highest priority ISR bit,
     pending IRRs must get serviced first before the next round of
     acknowledgment can take place, as the APIC (real and virtualized) does
     not honour EOI without a preceding interrupt on the CPU. And every
     APIC read/write takes a VMEXIT if the APIC is virtualized. While
     trying to reproduce the issue which 0-day reported, it was observed
     that the guest was scheduled out long enough under heavy load that the
     loop terminated after only 8 iterations.

     Make the loop terminate after 512 iterations. That's plenty in any
     case and does not take endless time to complete.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/apic/apic.c |  111 +++++++++++++++++++++++++-------------------
 1 file changed, 65 insertions(+), 46 deletions(-)

--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1424,54 +1424,72 @@ static void lapic_setup_esr(void)
 			oldvalue, value);
 }
 
+#define APIC_IR_REGS		APIC_ISR_NR
+#define APIC_IR_BITS		(APIC_IR_REGS * 32)
+#define APIC_IR_MAPSIZE		(APIC_IR_BITS / BITS_PER_LONG)
+
+union apic_ir {
+	unsigned long	map[APIC_IR_MAPSIZE];
+	u32		regs[APIC_IR_REGS];
+};
+
+static bool apic_check_and_ack(union apic_ir *irr, union apic_ir *isr)
+{
+	int i, bit;
+
+	/* Read the IRRs */
+	for (i = 0; i < APIC_IR_REGS; i++)
+		irr->regs[i] = apic_read(APIC_IRR + i * 0x10);
+
+	/* Read the ISRs */
+	for (i = 0; i < APIC_IR_REGS; i++)
+		isr->regs[i] = apic_read(APIC_ISR + i * 0x10);
+
+	/*
+	 * If the ISR map is not empty, ACK the APIC and run another round
+	 * to verify whether a pending IRR has been unblocked and turned
+	 * into an ISR bit.
+	 */
+	if (!bitmap_empty(isr->map, APIC_IR_BITS)) {
+		/*
+		 * There can be multiple ISR bits set when a high priority
+		 * interrupt preempted a lower priority one. Issue an ACK
+		 * per set bit.
+		 */
+		for_each_set_bit(bit, isr->map, APIC_IR_BITS)
+			ack_APIC_irq();
+		return true;
+	}
+
+	return !bitmap_empty(irr->map, APIC_IR_BITS);
+}
+
+/*
+ * After a crash, we no longer service the interrupts and a pending
+ * interrupt from previous kernel might still have ISR bit set.
+ *
+ * Most probably by now the CPU has serviced that pending interrupt and it
+ * might not have done the ack_APIC_irq() because it thought the interrupt
+ * came from i8259 as ExtInt. The LAPIC did not get an EOI so it does not
+ * clear the ISR bit and the CPU thinks it has already serviced the
+ * interrupt. Hence a vector might get locked. It was noticed for the
+ * timer irq (vector 0x31). Issue an extra EOI to clear the ISR bit.
+ *
+ * If there are pending IRR bits they turn into ISR bits after a higher
+ * priority ISR bit has been acked.
+ */
 static void apic_pending_intr_clear(void)
 {
-	long long max_loops = cpu_khz ? cpu_khz : 1000000;
-	unsigned long long tsc = 0, ntsc;
-	unsigned int queued;
-	unsigned long value;
-	int i, j, acked = 0;
-
-	if (boot_cpu_has(X86_FEATURE_TSC))
-		tsc = rdtsc();
-	/*
-	 * After a crash, we no longer service the interrupts and a pending
-	 * interrupt from previous kernel might still have ISR bit set.
-	 *
-	 * Most probably by now CPU has serviced that pending interrupt and
-	 * it might not have done the ack_APIC_irq() because it thought,
-	 * interrupt came from i8259 as ExtInt. LAPIC did not get EOI so it
-	 * does not clear the ISR bit and cpu thinks it has already serivced
-	 * the interrupt. Hence a vector might get locked. It was noticed
-	 * for timer irq (vector 0x31). Issue an extra EOI to clear ISR.
-	 */
-	do {
-		queued = 0;
-		for (i = APIC_ISR_NR - 1; i >= 0; i--)
-			queued |= apic_read(APIC_IRR + i*0x10);
-
-		for (i = APIC_ISR_NR - 1; i >= 0; i--) {
-			value = apic_read(APIC_ISR + i*0x10);
-			for_each_set_bit(j, &value, 32) {
-				ack_APIC_irq();
-				acked++;
-			}
-		}
-		if (acked > 256) {
-			pr_err("LAPIC pending interrupts after %d EOI\n", acked);
-			break;
-		}
-		if (queued) {
-			if (boot_cpu_has(X86_FEATURE_TSC) && cpu_khz) {
-				ntsc = rdtsc();
-				max_loops = (long long)cpu_khz << 10;
-				max_loops -= ntsc - tsc;
-			} else {
-				max_loops--;
-			}
-		}
-	} while (queued && max_loops > 0);
-	WARN_ON(max_loops <= 0);
+	union apic_ir irr, isr;
+	unsigned int i;
+
+	/* 512 loops are way oversized and give the APIC a chance to obey. */
+	for (i = 0; i < 512; i++) {
+		if (!apic_check_and_ack(&irr, &isr))
+			return;
+	}
+	/* Dump the IRR/ISR content if that failed */
+	pr_warn("APIC: Stale IRR: %256pb ISR: %256pb\n", irr.map, isr.map);
 }
 
 /**
@@ -1544,6 +1562,7 @@ static void setup_local_APIC(void)
 	value &= ~APIC_TPRI_MASK;
 	apic_write(APIC_TASKPRI, value);
 
+	/* Clear any potentially stale ISR/IRR bits */
 	apic_pending_intr_clear();
 
 	/*




* [patch V2 05/25] x86/apic: Move IPI inlines into ipi.c
  2019-07-04 15:51 [patch V2 00/25] x86/apic: Support for IPI shorthands Thomas Gleixner
                   ` (3 preceding siblings ...)
  2019-07-04 15:51 ` [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust Thomas Gleixner
@ 2019-07-04 15:51 ` Thomas Gleixner
  2019-07-04 15:51 ` [patch V2 06/25] x86/apic: Cleanup the include maze Thomas Gleixner
                   ` (19 subsequent siblings)
  24 siblings, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-07-04 15:51 UTC
  To: LKML; +Cc: x86, Nadav Amit, Ricardo Neri, Stephane Eranian, Feng Tang

No point in having them in a header file.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/ipi.h |   19 -------------------
 arch/x86/kernel/apic/ipi.c |   16 +++++++++++++---
 2 files changed, 13 insertions(+), 22 deletions(-)

--- a/arch/x86/include/asm/ipi.h
+++ b/arch/x86/include/asm/ipi.h
@@ -71,27 +71,8 @@ extern void default_send_IPI_mask_sequen
 extern void default_send_IPI_mask_allbutself_phys(const struct cpumask *mask,
 							 int vector);
 
-/* Avoid include hell */
-#define NMI_VECTOR 0x02
-
 extern int no_broadcast;
 
-static inline void __default_local_send_IPI_allbutself(int vector)
-{
-	if (no_broadcast || vector == NMI_VECTOR)
-		apic->send_IPI_mask_allbutself(cpu_online_mask, vector);
-	else
-		__default_send_IPI_shortcut(APIC_DEST_ALLBUT, vector, apic->dest_logical);
-}
-
-static inline void __default_local_send_IPI_all(int vector)
-{
-	if (no_broadcast || vector == NMI_VECTOR)
-		apic->send_IPI_mask(cpu_online_mask, vector);
-	else
-		__default_send_IPI_shortcut(APIC_DEST_ALLINC, vector, apic->dest_logical);
-}
-
 #ifdef CONFIG_X86_32
 extern void default_send_IPI_mask_sequence_logical(const struct cpumask *mask,
 							 int vector);
--- a/arch/x86/kernel/apic/ipi.c
+++ b/arch/x86/kernel/apic/ipi.c
@@ -198,15 +198,25 @@ void default_send_IPI_allbutself(int vec
 	 * if there are no other CPUs in the system then we get an APIC send
 	 * error if we try to broadcast, thus avoid sending IPIs in this case.
 	 */
-	if (!(num_online_cpus() > 1))
+	if (num_online_cpus() < 2)
 		return;
 
-	__default_local_send_IPI_allbutself(vector);
+	if (no_broadcast || vector == NMI_VECTOR) {
+		apic->send_IPI_mask_allbutself(cpu_online_mask, vector);
+	} else {
+		__default_send_IPI_shortcut(APIC_DEST_ALLBUT, vector,
+					    apic->dest_logical);
+	}
 }
 
 void default_send_IPI_all(int vector)
 {
-	__default_local_send_IPI_all(vector);
+	if (no_broadcast || vector == NMI_VECTOR) {
+		apic->send_IPI_mask(cpu_online_mask, vector);
+	} else {
+		__default_send_IPI_shortcut(APIC_DEST_ALLINC, vector,
+					    apic->dest_logical);
+	}
 }
 
 void default_send_IPI_self(int vector)




* [patch V2 06/25] x86/apic: Cleanup the include maze
  2019-07-04 15:51 [patch V2 00/25] x86/apic: Support for IPI shorthands Thomas Gleixner
                   ` (4 preceding siblings ...)
  2019-07-04 15:51 ` [patch V2 05/25] x86/apic: Move IPI inlines into ipi.c Thomas Gleixner
@ 2019-07-04 15:51 ` Thomas Gleixner
  2019-07-04 15:51 ` [patch V2 07/25] x86/apic: Move ipi header into apic directory Thomas Gleixner
                   ` (18 subsequent siblings)
  24 siblings, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-07-04 15:51 UTC
  To: LKML; +Cc: x86, Nadav Amit, Ricardo Neri, Stephane Eranian, Feng Tang

All of these APIC files include the world and some more. Remove the
unneeded cruft.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/apic/apic_flat_64.c   |   15 ++++-----------
 arch/x86/kernel/apic/apic_noop.c      |   18 +-----------------
 arch/x86/kernel/apic/apic_numachip.c  |    6 +++---
 arch/x86/kernel/apic/ipi.c            |   15 +--------------
 arch/x86/kernel/apic/probe_32.c       |   18 ++----------------
 arch/x86/kernel/apic/probe_64.c       |   11 -----------
 arch/x86/kernel/apic/x2apic_cluster.c |   14 ++++++--------
 arch/x86/kernel/apic/x2apic_phys.c    |    9 +++------
 arch/x86/kernel/apic/x2apic_uv_x.c    |   28 ++++------------------------
 9 files changed, 24 insertions(+), 110 deletions(-)

--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -8,21 +8,14 @@
  * Martin Bligh, Andi Kleen, James Bottomley, John Stultz, and
  * James Cleverdon.
  */
-#include <linux/acpi.h>
-#include <linux/errno.h>
-#include <linux/threads.h>
 #include <linux/cpumask.h>
-#include <linux/string.h>
-#include <linux/kernel.h>
-#include <linux/ctype.h>
-#include <linux/hardirq.h>
 #include <linux/export.h>
+#include <linux/acpi.h>
 
-#include <asm/smp.h>
-#include <asm/ipi.h>
-#include <asm/apic.h>
-#include <asm/apic_flat_64.h>
 #include <asm/jailhouse_para.h>
+#include <asm/apic_flat_64.h>
+#include <asm/apic.h>
+#include <asm/ipi.h>
 
 static struct apic apic_physflat;
 static struct apic apic_flat;
--- a/arch/x86/kernel/apic/apic_noop.c
+++ b/arch/x86/kernel/apic/apic_noop.c
@@ -9,25 +9,9 @@
  * to not uglify the caller's code and allow to call (some) apic routines
  * like self-ipi, etc...
  */
-
-#include <linux/threads.h>
 #include <linux/cpumask.h>
-#include <linux/string.h>
-#include <linux/kernel.h>
-#include <linux/ctype.h>
-#include <linux/errno.h>
-#include <asm/fixmap.h>
-#include <asm/mpspec.h>
-#include <asm/apicdef.h>
-#include <asm/apic.h>
-#include <asm/setup.h>
 
-#include <linux/smp.h>
-#include <asm/ipi.h>
-
-#include <linux/interrupt.h>
-#include <asm/acpi.h>
-#include <asm/e820/api.h>
+#include <asm/apic.h>
 
 static void noop_init_apic_ldr(void) { }
 static void noop_send_IPI(int cpu, int vector) { }
--- a/arch/x86/kernel/apic/apic_numachip.c
+++ b/arch/x86/kernel/apic/apic_numachip.c
@@ -10,15 +10,15 @@
  * Send feedback to <support@numascale.com>
  *
  */
-
+#include <linux/types.h>
 #include <linux/init.h>
 
 #include <asm/numachip/numachip.h>
 #include <asm/numachip/numachip_csr.h>
-#include <asm/ipi.h>
+
 #include <asm/apic_flat_64.h>
 #include <asm/pgtable.h>
-#include <asm/pci_x86.h>
+#include <asm/ipi.h>
 
 u8 numachip_system __read_mostly;
 static const struct apic apic_numachip1;
--- a/arch/x86/kernel/apic/ipi.c
+++ b/arch/x86/kernel/apic/ipi.c
@@ -1,21 +1,8 @@
 // SPDX-License-Identifier: GPL-2.0
-#include <linux/cpumask.h>
-#include <linux/interrupt.h>
 
-#include <linux/mm.h>
-#include <linux/delay.h>
-#include <linux/spinlock.h>
-#include <linux/kernel_stat.h>
-#include <linux/mc146818rtc.h>
-#include <linux/cache.h>
-#include <linux/cpu.h>
+#include <linux/cpumask.h>
 
-#include <asm/smp.h>
-#include <asm/mtrr.h>
-#include <asm/tlbflush.h>
-#include <asm/mmu_context.h>
 #include <asm/apic.h>
-#include <asm/proto.h>
 #include <asm/ipi.h>
 
 void __default_send_IPI_shortcut(unsigned int shortcut, int vector, unsigned int dest)
--- a/arch/x86/kernel/apic/probe_32.c
+++ b/arch/x86/kernel/apic/probe_32.c
@@ -6,26 +6,12 @@
  *
  * Generic x86 APIC driver probe layer.
  */
-#include <linux/threads.h>
-#include <linux/cpumask.h>
 #include <linux/export.h>
-#include <linux/string.h>
-#include <linux/kernel.h>
-#include <linux/ctype.h>
-#include <linux/init.h>
 #include <linux/errno.h>
-#include <asm/fixmap.h>
-#include <asm/mpspec.h>
-#include <asm/apicdef.h>
-#include <asm/apic.h>
-#include <asm/setup.h>
-
-#include <linux/smp.h>
-#include <asm/ipi.h>
 
-#include <linux/interrupt.h>
+#include <asm/apic.h>
 #include <asm/acpi.h>
-#include <asm/e820/api.h>
+#include <asm/ipi.h>
 
 #ifdef CONFIG_HOTPLUG_CPU
 #define DEFAULT_SEND_IPI	(1)
--- a/arch/x86/kernel/apic/probe_64.c
+++ b/arch/x86/kernel/apic/probe_64.c
@@ -8,19 +8,8 @@
  * Martin Bligh, Andi Kleen, James Bottomley, John Stultz, and
  * James Cleverdon.
  */
-#include <linux/threads.h>
-#include <linux/cpumask.h>
-#include <linux/string.h>
-#include <linux/init.h>
-#include <linux/kernel.h>
-#include <linux/ctype.h>
-#include <linux/hardirq.h>
-#include <linux/dmar.h>
-
-#include <asm/smp.h>
 #include <asm/apic.h>
 #include <asm/ipi.h>
-#include <asm/setup.h>
 
 /*
  * Check the APIC IDs in bios_cpu_apicid and choose the APIC mode.
--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -1,14 +1,12 @@
 // SPDX-License-Identifier: GPL-2.0
-#include <linux/threads.h>
+
+#include <linux/cpuhotplug.h>
 #include <linux/cpumask.h>
-#include <linux/string.h>
-#include <linux/kernel.h>
-#include <linux/ctype.h>
-#include <linux/dmar.h>
-#include <linux/irq.h>
-#include <linux/cpu.h>
+#include <linux/slab.h>
+#include <linux/mm.h>
+
+#include <asm/apic.h>
 
-#include <asm/smp.h>
 #include "x2apic.h"
 
 struct cluster_mask {
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -1,13 +1,10 @@
 // SPDX-License-Identifier: GPL-2.0
-#include <linux/threads.h>
+
 #include <linux/cpumask.h>
-#include <linux/string.h>
-#include <linux/kernel.h>
-#include <linux/ctype.h>
-#include <linux/dmar.h>
+#include <linux/acpi.h>
 
-#include <asm/smp.h>
 #include <asm/ipi.h>
+
 #include "x2apic.h"
 
 int x2apic_phys;
--- a/arch/x86/kernel/apic/x2apic_uv_x.c
+++ b/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -7,40 +7,20 @@
  *
  * Copyright (C) 2007-2014 Silicon Graphics, Inc. All rights reserved.
  */
+#include <linux/crash_dump.h>
+#include <linux/cpuhotplug.h>
 #include <linux/cpumask.h>
-#include <linux/hardirq.h>
 #include <linux/proc_fs.h>
-#include <linux/threads.h>
-#include <linux/kernel.h>
+#include <linux/memory.h>
 #include <linux/export.h>
-#include <linux/string.h>
-#include <linux/ctype.h>
-#include <linux/sched.h>
-#include <linux/timer.h>
-#include <linux/slab.h>
-#include <linux/cpu.h>
-#include <linux/init.h>
-#include <linux/io.h>
 #include <linux/pci.h>
-#include <linux/kdebug.h>
-#include <linux/delay.h>
-#include <linux/crash_dump.h>
-#include <linux/reboot.h>
-#include <linux/memory.h>
-#include <linux/numa.h>
 
+#include <asm/e820/api.h>
 #include <asm/uv/uv_mmrs.h>
 #include <asm/uv/uv_hub.h>
-#include <asm/current.h>
-#include <asm/pgtable.h>
 #include <asm/uv/bios.h>
 #include <asm/uv/uv.h>
 #include <asm/apic.h>
-#include <asm/e820/api.h>
-#include <asm/ipi.h>
-#include <asm/smp.h>
-#include <asm/x86_init.h>
-#include <asm/nmi.h>
 
 DEFINE_PER_CPU(int, x2apic_extra_bits);
 




* [patch V2 07/25] x86/apic: Move ipi header into apic directory
  2019-07-04 15:51 [patch V2 00/25] x86/apic: Support for IPI shorthands Thomas Gleixner
                   ` (5 preceding siblings ...)
  2019-07-04 15:51 ` [patch V2 06/25] x86/apic: Cleanup the include maze Thomas Gleixner
@ 2019-07-04 15:51 ` Thomas Gleixner
  2019-07-04 15:51 ` [patch V2 08/25] x86/apic: Move apic_flat_64 " Thomas Gleixner
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-07-04 15:51 UTC
  To: LKML; +Cc: x86, Nadav Amit, Ricardo Neri, Stephane Eranian, Feng Tang

Only used locally.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/ipi.h           |   90 -----------------------------------
 arch/x86/kernel/apic/apic_flat_64.c  |    3 -
 arch/x86/kernel/apic/apic_numachip.c |    3 -
 arch/x86/kernel/apic/bigsmp_32.c     |    9 ---
 arch/x86/kernel/apic/ipi.c           |    3 -
 arch/x86/kernel/apic/ipi.h           |   90 +++++++++++++++++++++++++++++++++++
 arch/x86/kernel/apic/probe_32.c      |    3 -
 arch/x86/kernel/apic/probe_64.c      |    3 -
 arch/x86/kernel/apic/x2apic_phys.c   |    3 -
 9 files changed, 103 insertions(+), 104 deletions(-)

--- a/arch/x86/include/asm/ipi.h
+++ /dev/null
@@ -1,90 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-#ifndef _ASM_X86_IPI_H
-#define _ASM_X86_IPI_H
-
-#ifdef CONFIG_X86_LOCAL_APIC
-
-/*
- * Copyright 2004 James Cleverdon, IBM.
- *
- * Generic APIC InterProcessor Interrupt code.
- *
- * Moved to include file by James Cleverdon from
- * arch/x86-64/kernel/smp.c
- *
- * Copyrights from kernel/smp.c:
- *
- * (c) 1995 Alan Cox, Building #3 <alan@redhat.com>
- * (c) 1998-99, 2000 Ingo Molnar <mingo@redhat.com>
- * (c) 2002,2003 Andi Kleen, SuSE Labs.
- */
-
-#include <asm/hw_irq.h>
-#include <asm/apic.h>
-#include <asm/smp.h>
-
-/*
- * the following functions deal with sending IPIs between CPUs.
- *
- * We use 'broadcast', CPU->CPU IPIs and self-IPIs too.
- */
-
-static inline unsigned int __prepare_ICR(unsigned int shortcut, int vector,
-					 unsigned int dest)
-{
-	unsigned int icr = shortcut | dest;
-
-	switch (vector) {
-	default:
-		icr |= APIC_DM_FIXED | vector;
-		break;
-	case NMI_VECTOR:
-		icr |= APIC_DM_NMI;
-		break;
-	}
-	return icr;
-}
-
-static inline int __prepare_ICR2(unsigned int mask)
-{
-	return SET_APIC_DEST_FIELD(mask);
-}
-
-static inline void __xapic_wait_icr_idle(void)
-{
-	while (native_apic_mem_read(APIC_ICR) & APIC_ICR_BUSY)
-		cpu_relax();
-}
-
-void __default_send_IPI_shortcut(unsigned int shortcut, int vector, unsigned int dest);
-
-/*
- * This is used to send an IPI with no shorthand notation (the destination is
- * specified in bits 56 to 63 of the ICR).
- */
-void __default_send_IPI_dest_field(unsigned int mask, int vector, unsigned int dest);
-
-extern void default_send_IPI_single(int cpu, int vector);
-extern void default_send_IPI_single_phys(int cpu, int vector);
-extern void default_send_IPI_mask_sequence_phys(const struct cpumask *mask,
-						 int vector);
-extern void default_send_IPI_mask_allbutself_phys(const struct cpumask *mask,
-							 int vector);
-
-extern int no_broadcast;
-
-#ifdef CONFIG_X86_32
-extern void default_send_IPI_mask_sequence_logical(const struct cpumask *mask,
-							 int vector);
-extern void default_send_IPI_mask_allbutself_logical(const struct cpumask *mask,
-							 int vector);
-extern void default_send_IPI_mask_logical(const struct cpumask *mask,
-						 int vector);
-extern void default_send_IPI_allbutself(int vector);
-extern void default_send_IPI_all(int vector);
-extern void default_send_IPI_self(int vector);
-#endif
-
-#endif
-
-#endif /* _ASM_X86_IPI_H */
--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -15,7 +15,8 @@
 #include <asm/jailhouse_para.h>
 #include <asm/apic_flat_64.h>
 #include <asm/apic.h>
-#include <asm/ipi.h>
+
+#include "ipi.h"
 
 static struct apic apic_physflat;
 static struct apic apic_flat;
--- a/arch/x86/kernel/apic/apic_numachip.c
+++ b/arch/x86/kernel/apic/apic_numachip.c
@@ -18,7 +18,8 @@
 
 #include <asm/apic_flat_64.h>
 #include <asm/pgtable.h>
-#include <asm/ipi.h>
+
+#include "ipi.h"
 
 u8 numachip_system __read_mostly;
 static const struct apic apic_numachip1;
--- a/arch/x86/kernel/apic/bigsmp_32.c
+++ b/arch/x86/kernel/apic/bigsmp_32.c
@@ -4,18 +4,13 @@
  *
  * Drives the local APIC in "clustered mode".
  */
-#include <linux/threads.h>
 #include <linux/cpumask.h>
-#include <linux/kernel.h>
-#include <linux/init.h>
 #include <linux/dmi.h>
 #include <linux/smp.h>
 
-#include <asm/apicdef.h>
-#include <asm/fixmap.h>
-#include <asm/mpspec.h>
 #include <asm/apic.h>
-#include <asm/ipi.h>
+
+#include "ipi.h"
 
 static unsigned bigsmp_get_apic_id(unsigned long x)
 {
--- a/arch/x86/kernel/apic/ipi.c
+++ b/arch/x86/kernel/apic/ipi.c
@@ -3,7 +3,8 @@
 #include <linux/cpumask.h>
 
 #include <asm/apic.h>
-#include <asm/ipi.h>
+
+#include "ipi.h"
 
 void __default_send_IPI_shortcut(unsigned int shortcut, int vector, unsigned int dest)
 {
--- /dev/null
+++ b/arch/x86/kernel/apic/ipi.h
@@ -0,0 +1,90 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _ASM_X86_IPI_H
+#define _ASM_X86_IPI_H
+
+#ifdef CONFIG_X86_LOCAL_APIC
+
+/*
+ * Copyright 2004 James Cleverdon, IBM.
+ *
+ * Generic APIC InterProcessor Interrupt code.
+ *
+ * Moved to include file by James Cleverdon from
+ * arch/x86-64/kernel/smp.c
+ *
+ * Copyrights from kernel/smp.c:
+ *
+ * (c) 1995 Alan Cox, Building #3 <alan@redhat.com>
+ * (c) 1998-99, 2000 Ingo Molnar <mingo@redhat.com>
+ * (c) 2002,2003 Andi Kleen, SuSE Labs.
+ */
+
+#include <asm/hw_irq.h>
+#include <asm/apic.h>
+#include <asm/smp.h>
+
+/*
+ * the following functions deal with sending IPIs between CPUs.
+ *
+ * We use 'broadcast', CPU->CPU IPIs and self-IPIs too.
+ */
+
+static inline unsigned int __prepare_ICR(unsigned int shortcut, int vector,
+					 unsigned int dest)
+{
+	unsigned int icr = shortcut | dest;
+
+	switch (vector) {
+	default:
+		icr |= APIC_DM_FIXED | vector;
+		break;
+	case NMI_VECTOR:
+		icr |= APIC_DM_NMI;
+		break;
+	}
+	return icr;
+}
+
+static inline int __prepare_ICR2(unsigned int mask)
+{
+	return SET_APIC_DEST_FIELD(mask);
+}
+
+static inline void __xapic_wait_icr_idle(void)
+{
+	while (native_apic_mem_read(APIC_ICR) & APIC_ICR_BUSY)
+		cpu_relax();
+}
+
+void __default_send_IPI_shortcut(unsigned int shortcut, int vector, unsigned int dest);
+
+/*
+ * This is used to send an IPI with no shorthand notation (the destination is
+ * specified in bits 56 to 63 of the ICR).
+ */
+void __default_send_IPI_dest_field(unsigned int mask, int vector, unsigned int dest);
+
+extern void default_send_IPI_single(int cpu, int vector);
+extern void default_send_IPI_single_phys(int cpu, int vector);
+extern void default_send_IPI_mask_sequence_phys(const struct cpumask *mask,
+						 int vector);
+extern void default_send_IPI_mask_allbutself_phys(const struct cpumask *mask,
+							 int vector);
+
+extern int no_broadcast;
+
+#ifdef CONFIG_X86_32
+extern void default_send_IPI_mask_sequence_logical(const struct cpumask *mask,
+							 int vector);
+extern void default_send_IPI_mask_allbutself_logical(const struct cpumask *mask,
+							 int vector);
+extern void default_send_IPI_mask_logical(const struct cpumask *mask,
+						 int vector);
+extern void default_send_IPI_allbutself(int vector);
+extern void default_send_IPI_all(int vector);
+extern void default_send_IPI_self(int vector);
+#endif
+
+#endif
+
+#endif /* _ASM_X86_IPI_H */
--- a/arch/x86/kernel/apic/probe_32.c
+++ b/arch/x86/kernel/apic/probe_32.c
@@ -11,7 +11,8 @@
 
 #include <asm/apic.h>
 #include <asm/acpi.h>
-#include <asm/ipi.h>
+
+#include "ipi.h"
 
 #ifdef CONFIG_HOTPLUG_CPU
 #define DEFAULT_SEND_IPI	(1)
--- a/arch/x86/kernel/apic/probe_64.c
+++ b/arch/x86/kernel/apic/probe_64.c
@@ -9,7 +9,8 @@
  * James Cleverdon.
  */
 #include <asm/apic.h>
-#include <asm/ipi.h>
+
+#include "ipi.h"
 
 /*
  * Check the APIC IDs in bios_cpu_apicid and choose the APIC mode.
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -3,9 +3,8 @@
 #include <linux/cpumask.h>
 #include <linux/acpi.h>
 
-#include <asm/ipi.h>
-
 #include "x2apic.h"
+#include "ipi.h"
 
 int x2apic_phys;
 




* [patch V2 08/25] x86/apic: Move apic_flat_64 header into apic directory
  2019-07-04 15:51 [patch V2 00/25] x86/apic: Support for IPI shorthands Thomas Gleixner
                   ` (6 preceding siblings ...)
  2019-07-04 15:51 ` [patch V2 07/25] x86/apic: Move ipi header into apic directory Thomas Gleixner
@ 2019-07-04 15:51 ` Thomas Gleixner
  2019-07-04 15:51 ` [patch V2 09/25] x86/apic: Consolidate the apic local headers Thomas Gleixner
                   ` (16 subsequent siblings)
  24 siblings, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-07-04 15:51 UTC
  To: LKML; +Cc: x86, Nadav Amit, Ricardo Neri, Stephane Eranian, Feng Tang

Only used locally.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/apic_flat_64.h  |    8 --------
 arch/x86/kernel/apic/apic_flat_64.c  |    2 +-
 arch/x86/kernel/apic/apic_flat_64.h  |    8 ++++++++
 arch/x86/kernel/apic/apic_numachip.c |    2 +-
 4 files changed, 10 insertions(+), 10 deletions(-)

--- a/arch/x86/include/asm/apic_flat_64.h
+++ /dev/null
@@ -1,8 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ASM_X86_APIC_FLAT_64_H
-#define _ASM_X86_APIC_FLAT_64_H
-
-extern void flat_init_apic_ldr(void);
-
-#endif
-
--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -13,9 +13,9 @@
 #include <linux/acpi.h>
 
 #include <asm/jailhouse_para.h>
-#include <asm/apic_flat_64.h>
 #include <asm/apic.h>
 
+#include "apic_flat_64.h"
 #include "ipi.h"
 
 static struct apic apic_physflat;
--- /dev/null
+++ b/arch/x86/kernel/apic/apic_flat_64.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_APIC_FLAT_64_H
+#define _ASM_X86_APIC_FLAT_64_H
+
+extern void flat_init_apic_ldr(void);
+
+#endif
+
--- a/arch/x86/kernel/apic/apic_numachip.c
+++ b/arch/x86/kernel/apic/apic_numachip.c
@@ -16,9 +16,9 @@
 #include <asm/numachip/numachip.h>
 #include <asm/numachip/numachip_csr.h>
 
-#include <asm/apic_flat_64.h>
 #include <asm/pgtable.h>
 
+#include "apic_flat_64.h"
 #include "ipi.h"
 
 u8 numachip_system __read_mostly;




* [patch V2 09/25] x86/apic: Consolidate the apic local headers
  2019-07-04 15:51 [patch V2 00/25] x86/apic: Support for IPI shorthands Thomas Gleixner
                   ` (7 preceding siblings ...)
  2019-07-04 15:51 ` [patch V2 08/25] x86/apic: Move apic_flat_64 " Thomas Gleixner
@ 2019-07-04 15:51 ` Thomas Gleixner
  2019-07-04 15:51 ` [patch V2 10/25] x86/apic/uv: Make x2apic_extra_bits static Thomas Gleixner
                   ` (15 subsequent siblings)
  24 siblings, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-07-04 15:51 UTC
  To: LKML; +Cc: x86, Nadav Amit, Ricardo Neri, Stephane Eranian, Feng Tang

Now there are three small local headers. Some contain functions which are
only used in one source file.

Move all of the shared inlines and declarations into a single local header,
and move the inlines which are only used in one source file into that
source file.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/apic/apic_flat_64.c   |    3 -
 arch/x86/kernel/apic/apic_flat_64.h   |    8 ---
 arch/x86/kernel/apic/apic_numachip.c  |    3 -
 arch/x86/kernel/apic/bigsmp_32.c      |    2 
 arch/x86/kernel/apic/ipi.c            |   14 ++++-
 arch/x86/kernel/apic/ipi.h            |   90 ----------------------------------
 arch/x86/kernel/apic/local.h          |   63 +++++++++++++++++++++++
 arch/x86/kernel/apic/probe_32.c       |    3 -
 arch/x86/kernel/apic/probe_64.c       |    2 
 arch/x86/kernel/apic/x2apic.h         |    9 ---
 arch/x86/kernel/apic/x2apic_cluster.c |    2 
 arch/x86/kernel/apic/x2apic_phys.c    |    3 -
 12 files changed, 83 insertions(+), 119 deletions(-)

--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -15,8 +15,7 @@
 #include <asm/jailhouse_para.h>
 #include <asm/apic.h>
 
-#include "apic_flat_64.h"
-#include "ipi.h"
+#include "local.h"
 
 static struct apic apic_physflat;
 static struct apic apic_flat;
--- a/arch/x86/kernel/apic/apic_flat_64.h
+++ /dev/null
@@ -1,8 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ASM_X86_APIC_FLAT_64_H
-#define _ASM_X86_APIC_FLAT_64_H
-
-extern void flat_init_apic_ldr(void);
-
-#endif
-
--- a/arch/x86/kernel/apic/apic_numachip.c
+++ b/arch/x86/kernel/apic/apic_numachip.c
@@ -18,8 +18,7 @@
 
 #include <asm/pgtable.h>
 
-#include "apic_flat_64.h"
-#include "ipi.h"
+#include "local.h"
 
 u8 numachip_system __read_mostly;
 static const struct apic apic_numachip1;
--- a/arch/x86/kernel/apic/bigsmp_32.c
+++ b/arch/x86/kernel/apic/bigsmp_32.c
@@ -10,7 +10,7 @@
 
 #include <asm/apic.h>
 
-#include "ipi.h"
+#include "local.h"
 
 static unsigned bigsmp_get_apic_id(unsigned long x)
 {
--- a/arch/x86/kernel/apic/ipi.c
+++ b/arch/x86/kernel/apic/ipi.c
@@ -1,10 +1,20 @@
 // SPDX-License-Identifier: GPL-2.0
 
 #include <linux/cpumask.h>
+#include <linux/smp.h>
 
-#include <asm/apic.h>
+#include "local.h"
 
-#include "ipi.h"
+static inline int __prepare_ICR2(unsigned int mask)
+{
+	return SET_APIC_DEST_FIELD(mask);
+}
+
+static inline void __xapic_wait_icr_idle(void)
+{
+	while (native_apic_mem_read(APIC_ICR) & APIC_ICR_BUSY)
+		cpu_relax();
+}
 
 void __default_send_IPI_shortcut(unsigned int shortcut, int vector, unsigned int dest)
 {
--- a/arch/x86/kernel/apic/ipi.h
+++ /dev/null
@@ -1,90 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-#ifndef _ASM_X86_IPI_H
-#define _ASM_X86_IPI_H
-
-#ifdef CONFIG_X86_LOCAL_APIC
-
-/*
- * Copyright 2004 James Cleverdon, IBM.
- *
- * Generic APIC InterProcessor Interrupt code.
- *
- * Moved to include file by James Cleverdon from
- * arch/x86-64/kernel/smp.c
- *
- * Copyrights from kernel/smp.c:
- *
- * (c) 1995 Alan Cox, Building #3 <alan@redhat.com>
- * (c) 1998-99, 2000 Ingo Molnar <mingo@redhat.com>
- * (c) 2002,2003 Andi Kleen, SuSE Labs.
- */
-
-#include <asm/hw_irq.h>
-#include <asm/apic.h>
-#include <asm/smp.h>
-
-/*
- * the following functions deal with sending IPIs between CPUs.
- *
- * We use 'broadcast', CPU->CPU IPIs and self-IPIs too.
- */
-
-static inline unsigned int __prepare_ICR(unsigned int shortcut, int vector,
-					 unsigned int dest)
-{
-	unsigned int icr = shortcut | dest;
-
-	switch (vector) {
-	default:
-		icr |= APIC_DM_FIXED | vector;
-		break;
-	case NMI_VECTOR:
-		icr |= APIC_DM_NMI;
-		break;
-	}
-	return icr;
-}
-
-static inline int __prepare_ICR2(unsigned int mask)
-{
-	return SET_APIC_DEST_FIELD(mask);
-}
-
-static inline void __xapic_wait_icr_idle(void)
-{
-	while (native_apic_mem_read(APIC_ICR) & APIC_ICR_BUSY)
-		cpu_relax();
-}
-
-void __default_send_IPI_shortcut(unsigned int shortcut, int vector, unsigned int dest);
-
-/*
- * This is used to send an IPI with no shorthand notation (the destination is
- * specified in bits 56 to 63 of the ICR).
- */
-void __default_send_IPI_dest_field(unsigned int mask, int vector, unsigned int dest);
-
-extern void default_send_IPI_single(int cpu, int vector);
-extern void default_send_IPI_single_phys(int cpu, int vector);
-extern void default_send_IPI_mask_sequence_phys(const struct cpumask *mask,
-						 int vector);
-extern void default_send_IPI_mask_allbutself_phys(const struct cpumask *mask,
-							 int vector);
-
-extern int no_broadcast;
-
-#ifdef CONFIG_X86_32
-extern void default_send_IPI_mask_sequence_logical(const struct cpumask *mask,
-							 int vector);
-extern void default_send_IPI_mask_allbutself_logical(const struct cpumask *mask,
-							 int vector);
-extern void default_send_IPI_mask_logical(const struct cpumask *mask,
-						 int vector);
-extern void default_send_IPI_allbutself(int vector);
-extern void default_send_IPI_all(int vector);
-extern void default_send_IPI_self(int vector);
-#endif
-
-#endif
-
-#endif /* _ASM_X86_IPI_H */
--- /dev/null
+++ b/arch/x86/kernel/apic/local.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Historical copyright notices:
+ *
+ * Copyright 2004 James Cleverdon, IBM.
+ * (c) 1995 Alan Cox, Building #3 <alan@redhat.com>
+ * (c) 1998-99, 2000 Ingo Molnar <mingo@redhat.com>
+ * (c) 2002,2003 Andi Kleen, SuSE Labs.
+ */
+#include <asm/apic.h>
+
+/* APIC flat 64 */
+void flat_init_apic_ldr(void);
+
+/* X2APIC */
+int x2apic_apic_id_valid(u32 apicid);
+int x2apic_apic_id_registered(void);
+void __x2apic_send_IPI_dest(unsigned int apicid, int vector, unsigned int dest);
+unsigned int x2apic_get_apic_id(unsigned long id);
+u32 x2apic_set_apic_id(unsigned int id);
+int x2apic_phys_pkg_id(int initial_apicid, int index_msb);
+void x2apic_send_IPI_self(int vector);
+
+/* IPI */
+static inline unsigned int __prepare_ICR(unsigned int shortcut, int vector,
+					 unsigned int dest)
+{
+	unsigned int icr = shortcut | dest;
+
+	switch (vector) {
+	default:
+		icr |= APIC_DM_FIXED | vector;
+		break;
+	case NMI_VECTOR:
+		icr |= APIC_DM_NMI;
+		break;
+	}
+	return icr;
+}
+
+void __default_send_IPI_shortcut(unsigned int shortcut, int vector, unsigned int dest);
+
+/*
+ * This is used to send an IPI with no shorthand notation (the destination is
+ * specified in bits 56 to 63 of the ICR).
+ */
+void __default_send_IPI_dest_field(unsigned int mask, int vector, unsigned int dest);
+
+void default_send_IPI_single(int cpu, int vector);
+void default_send_IPI_single_phys(int cpu, int vector);
+void default_send_IPI_mask_sequence_phys(const struct cpumask *mask, int vector);
+void default_send_IPI_mask_allbutself_phys(const struct cpumask *mask, int vector);
+
+extern int no_broadcast;
+
+#ifdef CONFIG_X86_32
+void default_send_IPI_mask_sequence_logical(const struct cpumask *mask, int vector);
+void default_send_IPI_mask_allbutself_logical(const struct cpumask *mask, int vector);
+void default_send_IPI_mask_logical(const struct cpumask *mask, int vector);
+void default_send_IPI_allbutself(int vector);
+void default_send_IPI_all(int vector);
+void default_send_IPI_self(int vector);
+#endif
--- a/arch/x86/kernel/apic/probe_32.c
+++ b/arch/x86/kernel/apic/probe_32.c
@@ -8,11 +8,12 @@
  */
 #include <linux/export.h>
 #include <linux/errno.h>
+#include <linux/smp.h>
 
 #include <asm/apic.h>
 #include <asm/acpi.h>
 
-#include "ipi.h"
+#include "local.h"
 
 #ifdef CONFIG_HOTPLUG_CPU
 #define DEFAULT_SEND_IPI	(1)
--- a/arch/x86/kernel/apic/probe_64.c
+++ b/arch/x86/kernel/apic/probe_64.c
@@ -10,7 +10,7 @@
  */
 #include <asm/apic.h>
 
-#include "ipi.h"
+#include "local.h"
 
 /*
  * Check the APIC IDs in bios_cpu_apicid and choose the APIC mode.
--- a/arch/x86/kernel/apic/x2apic.h
+++ /dev/null
@@ -1,9 +0,0 @@
-/* Common bits for X2APIC cluster/physical modes. */
-
-int x2apic_apic_id_valid(u32 apicid);
-int x2apic_apic_id_registered(void);
-void __x2apic_send_IPI_dest(unsigned int apicid, int vector, unsigned int dest);
-unsigned int x2apic_get_apic_id(unsigned long id);
-u32 x2apic_set_apic_id(unsigned int id);
-int x2apic_phys_pkg_id(int initial_apicid, int index_msb);
-void x2apic_send_IPI_self(int vector);
--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -7,7 +7,7 @@
 
 #include <asm/apic.h>
 
-#include "x2apic.h"
+#include "local.h"
 
 struct cluster_mask {
 	unsigned int	clusterid;
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -3,8 +3,7 @@
 #include <linux/cpumask.h>
 #include <linux/acpi.h>
 
-#include "x2apic.h"
-#include "ipi.h"
+#include "local.h"
 
 int x2apic_phys;
 




* [patch V2 10/25] x86/apic/uv: Make x2apic_extra_bits static
  2019-07-04 15:51 [patch V2 00/25] x86/apic: Support for IPI shorthands Thomas Gleixner
                   ` (8 preceding siblings ...)
  2019-07-04 15:51 ` [patch V2 09/25] x86/apic: Consolidate the apic local headers Thomas Gleixner
@ 2019-07-04 15:51 ` Thomas Gleixner
  2019-07-04 15:51 ` [patch V2 11/25] smp/hotplug: Track booted once CPUs in a cpumask Thomas Gleixner
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-07-04 15:51 UTC
  To: LKML; +Cc: x86, Nadav Amit, Ricardo Neri, Stephane Eranian, Feng Tang

Not used outside of the UV apic source.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: New patch
---
 arch/x86/include/asm/apic.h        |    2 --
 arch/x86/kernel/apic/x2apic_uv_x.c |    2 +-
 2 files changed, 1 insertion(+), 3 deletions(-)

--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -466,8 +466,6 @@ static inline unsigned default_get_apic_
 
 #ifdef CONFIG_X86_64
 extern void apic_send_IPI_self(int vector);
-
-DECLARE_PER_CPU(int, x2apic_extra_bits);
 #endif
 
 extern void generic_bigsmp_probe(void);
--- a/arch/x86/kernel/apic/x2apic_uv_x.c
+++ b/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -22,7 +22,7 @@
 #include <asm/uv/uv.h>
 #include <asm/apic.h>
 
-DEFINE_PER_CPU(int, x2apic_extra_bits);
+static DEFINE_PER_CPU(int, x2apic_extra_bits);
 
 static enum uv_system_type	uv_system_type;
 static bool			uv_hubless_system;




* [patch V2 11/25] smp/hotplug: Track booted once CPUs in a cpumask
  2019-07-04 15:51 [patch V2 00/25] x86/apic: Support for IPI shorthands Thomas Gleixner
                   ` (9 preceding siblings ...)
  2019-07-04 15:51 ` [patch V2 10/25] x86/apic/uv: Make x2apic_extra_bits static Thomas Gleixner
@ 2019-07-04 15:51 ` Thomas Gleixner
  2019-07-04 15:51 ` [patch V2 12/25] x86/cpu: Move arch_smt_update() to a neutral place Thomas Gleixner
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-07-04 15:51 UTC
  To: LKML; +Cc: x86, Nadav Amit, Ricardo Neri, Stephane Eranian, Feng Tang

The 'booted once' information, which is required to deal correctly with the
MCE broadcast issue on X86, is stored in the per-CPU hotplug state, which is
perfectly fine for the intended purpose.

X86 also needs that information for supporting NMI broadcasting via
shorthands, but retrieving it from per-CPU data is cumbersome.

Move it to a cpumask so the information can be checked against
cpu_present_mask quickly, as sketched below.

No functional change intended.
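
As a sketch of the intended use (the helper name is made up here for
illustration; the real user follows in later patches):

	static bool all_present_cpus_booted_once(void)
	{
		/* One bitmap operation instead of walking per-CPU hotplug state */
		return cpumask_subset(cpu_present_mask, &cpus_booted_once_mask);
	}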

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/cpumask.h |    2 ++
 kernel/cpu.c            |   11 +++++++----
 2 files changed, 9 insertions(+), 4 deletions(-)

--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -115,6 +115,8 @@ extern struct cpumask __cpu_active_mask;
 #define cpu_active(cpu)		((cpu) == 0)
 #endif
 
+extern cpumask_t cpus_booted_once_mask;
+
 static inline void cpu_max_bits_warn(unsigned int cpu, unsigned int bits)
 {
 #ifdef CONFIG_DEBUG_PER_CPU_MAPS
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -62,7 +62,6 @@ struct cpuhp_cpu_state {
 	bool			rollback;
 	bool			single;
 	bool			bringup;
-	bool			booted_once;
 	struct hlist_node	*node;
 	struct hlist_node	*last;
 	enum cpuhp_state	cb_state;
@@ -76,6 +75,10 @@ static DEFINE_PER_CPU(struct cpuhp_cpu_s
 	.fail = CPUHP_INVALID,
 };
 
+#ifdef CONFIG_SMP
+cpumask_t cpus_booted_once_mask;
+#endif
+
 #if defined(CONFIG_LOCKDEP) && defined(CONFIG_SMP)
 static struct lockdep_map cpuhp_state_up_map =
 	STATIC_LOCKDEP_MAP_INIT("cpuhp_state-up", &cpuhp_state_up_map);
@@ -433,7 +436,7 @@ static inline bool cpu_smt_allowed(unsig
 	 * CPU. Otherwise, a broadacasted MCE observing CR4.MCE=0b on any
 	 * core will shutdown the machine.
 	 */
-	return !per_cpu(cpuhp_state, cpu).booted_once;
+	return !cpumask_test_cpu(cpu, &cpus_booted_once_mask);
 }
 #else
 static inline bool cpu_smt_allowed(unsigned int cpu) { return true; }
@@ -1066,7 +1069,7 @@ void notify_cpu_starting(unsigned int cp
 	int ret;
 
 	rcu_cpu_starting(cpu);	/* Enables RCU usage on this CPU. */
-	st->booted_once = true;
+	cpumask_set_cpu(cpu, &cpus_booted_once_mask);
 	while (st->state < target) {
 		st->state++;
 		ret = cpuhp_invoke_callback(cpu, st->state, true, NULL, NULL);
@@ -2327,7 +2330,7 @@ void __init boot_cpu_init(void)
 void __init boot_cpu_hotplug_init(void)
 {
 #ifdef CONFIG_SMP
-	this_cpu_write(cpuhp_state.booted_once, true);
+	cpumask_set_cpu(smp_processor_id(), &cpus_booted_once_mask);
 #endif
 	this_cpu_write(cpuhp_state.state, CPUHP_ONLINE);
 }




* [patch V2 12/25] x86/cpu: Move arch_smt_update() to a neutral place
  2019-07-04 15:51 [patch V2 00/25] x86/apic: Support for IPI shorthands Thomas Gleixner
                   ` (10 preceding siblings ...)
  2019-07-04 15:51 ` [patch V2 11/25] smp/hotplug: Track booted once CPUs in a cpumask Thomas Gleixner
@ 2019-07-04 15:51 ` Thomas Gleixner
  2019-07-04 15:51 ` [patch V2 13/25] x86/hotplug: Silence APIC and NMI when CPU is dead Thomas Gleixner
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-07-04 15:51 UTC
  To: LKML; +Cc: x86, Nadav Amit, Ricardo Neri, Stephane Eranian, Feng Tang

arch_smt_update() will be used to control IPI/NMI broadcasting via the
shorthand mechanism. Keeping it in the bugs file and calling the APIC
function from there would be possible, but not really intuitive.

Move it to a neutral place and invoke the bugs function from there.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/bugs.h  |    2 ++
 arch/x86/kernel/cpu/bugs.c   |    2 +-
 arch/x86/kernel/cpu/common.c |    9 +++++++++
 3 files changed, 12 insertions(+), 1 deletion(-)

--- a/arch/x86/include/asm/bugs.h
+++ b/arch/x86/include/asm/bugs.h
@@ -18,4 +18,6 @@ int ppro_with_ram_bug(void);
 static inline int ppro_with_ram_bug(void) { return 0; }
 #endif
 
+extern void cpu_bugs_smt_update(void);
+
 #endif /* _ASM_X86_BUGS_H */
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -700,7 +700,7 @@ static void update_mds_branch_idle(void)
 
 #define MDS_MSG_SMT "MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.\n"
 
-void arch_smt_update(void)
+void cpu_bugs_smt_update(void)
 {
 	/* Enhanced IBRS implies STIBP. No update required. */
 	if (spectre_v2_enabled == SPECTRE_V2_IBRS_ENHANCED)
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1880,3 +1880,12 @@ void microcode_check(void)
 	pr_warn("x86/CPU: CPU features have changed after loading microcode, but might not take effect.\n");
 	pr_warn("x86/CPU: Please consider either early loading through initrd/built-in or a potential BIOS update.\n");
 }
+
+/*
+ * Invoked from core CPU hotplug code after hotplug operations
+ */
+void arch_smt_update(void)
+{
+	/* Handle the speculative execution misfeatures */
+	cpu_bugs_smt_update();
+}




* [patch V2 13/25] x86/hotplug: Silence APIC and NMI when CPU is dead
  2019-07-04 15:51 [patch V2 00/25] x86/apic: Support for IPI shorthands Thomas Gleixner
                   ` (11 preceding siblings ...)
  2019-07-04 15:51 ` [patch V2 12/25] x86/cpu: Move arch_smt_update() to a neutral place Thomas Gleixner
@ 2019-07-04 15:51 ` Thomas Gleixner
  2019-07-04 15:51 ` [patch V2 14/25] x86/apic: Remove dest argument from __default_send_IPI_shortcut() Thomas Gleixner
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-07-04 15:51 UTC
  To: LKML; +Cc: x86, Nadav Amit, Ricardo Neri, Stephane Eranian, Feng Tang

In order to support IPI/NMI broadcasting via the shorthand mechanism, the
side effects of shorthands need to be mitigated:

 Shorthand IPIs and NMIs hit all CPUs, including unplugged CPUs.

Neither of those can be handled on unplugged CPUs for obvious reasons.

It would be trivial to just fully disable the APIC via the enable bit in
MSR_IA32_APICBASE. But that's not possible, because clearing that bit on
systems based on the 3 wire APIC bus would require a hardware reset to
bring it back, as the APIC would lose track of bus arbitration. On systems
with FSB delivery, APICBASE could be disabled, but it would have to be
guaranteed that no interrupt is sent to the APIC while in that state, and
it's not clear from the SDM whether it still responds to INIT/SIPI
messages.

Therefore stay on the safe side and switch the APIC into soft disabled mode
so it won't deliver any regular vector to the CPU.

NMIs are still propagated to the 'dead' CPUs. To mitigate that, add a
per-CPU variable which tells the NMI handler to ignore NMIs. Note that this
cannot use the stop/restart_nmi() magic which is used in the alternatives
code: a dead CPU cannot invoke nmi_enter() or anything else, due to RCU and
other reasons.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/apic.h      |    1 +
 arch/x86/include/asm/processor.h |    2 ++
 arch/x86/kernel/apic/apic.c      |   35 ++++++++++++++++++++++++-----------
 arch/x86/kernel/nmi.c            |    3 +++
 arch/x86/kernel/smpboot.c        |   13 ++++++++++++-
 5 files changed, 42 insertions(+), 12 deletions(-)

--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -136,6 +136,7 @@ extern int lapic_get_maxlvt(void);
 extern void clear_local_APIC(void);
 extern void disconnect_bsp_APIC(int virt_wire_setup);
 extern void disable_local_APIC(void);
+extern void apic_soft_disable(void);
 extern void lapic_shutdown(void);
 extern void sync_Arb_IDs(void);
 extern void init_bsp_APIC(void);
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -425,6 +425,8 @@ DECLARE_PER_CPU_ALIGNED(struct stack_can
 DECLARE_PER_CPU(struct irq_stack *, softirq_stack_ptr);
 #endif	/* X86_64 */
 
+DECLARE_PER_CPU(bool, cpu_ignore_nmi);
+
 extern unsigned int fpu_kernel_xstate_size;
 extern unsigned int fpu_user_xstate_size;
 
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1155,25 +1155,38 @@ void clear_local_APIC(void)
 }
 
 /**
- * disable_local_APIC - clear and disable the local APIC
+ * apic_soft_disable - Clears and software disables the local APIC on hotplug
+ *
+ * Contrary to disable_local_APIC() this does not touch the enable bit in
+ * MSR_IA32_APICBASE. Clearing that bit on systems based on the 3 wire APIC
+ * bus would require a hardware reset as the APIC would lose track of bus
+ * arbitration. On systems with FSB delivery APICBASE could be disabled,
+ * but it has to be guaranteed that no interrupt is sent to the APIC while
+ * in that state and it's not clear from the SDM whether it still responds
+ * to INIT/SIPI messages. Stay on the safe side and use software disable.
  */
-void disable_local_APIC(void)
+void apic_soft_disable(void)
 {
-	unsigned int value;
-
-	/* APIC hasn't been mapped yet */
-	if (!x2apic_mode && !apic_phys)
-		return;
+	u32 value;
 
 	clear_local_APIC();
 
-	/*
-	 * Disable APIC (implies clearing of registers
-	 * for 82489DX!).
-	 */
+	/* Soft disable APIC (implies clearing of registers for 82489DX!). */
 	value = apic_read(APIC_SPIV);
 	value &= ~APIC_SPIV_APIC_ENABLED;
 	apic_write(APIC_SPIV, value);
+}
+
+/**
+ * disable_local_APIC - clear and disable the local APIC
+ */
+void disable_local_APIC(void)
+{
+	/* APIC hasn't been mapped yet */
+	if (!x2apic_mode && !apic_phys)
+		return;
+
+	apic_soft_disable();
 
 #ifdef CONFIG_X86_32
 	/*
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -512,6 +512,9 @@ NOKPROBE_SYMBOL(is_debug_stack);
 dotraplinkage notrace void
 do_nmi(struct pt_regs *regs, long error_code)
 {
+	if (IS_ENABLED(CONFIG_SMP) && this_cpu_read(cpu_ignore_nmi))
+		return;
+
 	if (this_cpu_read(nmi_state) != NMI_NOT_RUNNING) {
 		this_cpu_write(nmi_state, NMI_LATCHED);
 		return;
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -81,6 +81,9 @@
 #include <asm/spec-ctrl.h>
 #include <asm/hw_irq.h>
 
+/* Flag for the NMI path telling it to ignore the NMI */
+DEFINE_PER_CPU(bool, cpu_ignore_nmi);
+
 /* representing HT siblings of each logical CPU */
 DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
 EXPORT_PER_CPU_SYMBOL(cpu_sibling_map);
@@ -252,6 +255,8 @@ static void notrace start_secondary(void
 	unlock_vector_lock();
 	cpu_set_state_online(smp_processor_id());
 	x86_platform.nmi_init();
+	/* Reenable NMI handling */
+	this_cpu_write(cpu_ignore_nmi, false);
 
 	/* enable local interrupts */
 	local_irq_enable();
@@ -1524,6 +1529,7 @@ void cpu_disable_common(void)
 	unlock_vector_lock();
 	fixup_irqs();
 	lapic_offline();
+	this_cpu_write(cpu_ignore_nmi, true);
 }
 
 int native_cpu_disable(void)
@@ -1534,7 +1540,12 @@ int native_cpu_disable(void)
 	if (ret)
 		return ret;
 
-	clear_local_APIC();
+	/*
+	 * Disable the local APIC. Otherwise IPI broadcasts will reach
+	 * it. It still responds normally to INIT, NMI, SMI, and SIPI
+	 * messages.
+	 */
+	apic_soft_disable();
 	cpu_disable_common();
 
 	return 0;



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch V2 14/25] x86/apic: Remove dest argument from __default_send_IPI_shortcut()
  2019-07-04 15:51 [patch V2 00/25] x86/apic: Support for IPI shorthands Thomas Gleixner
                   ` (12 preceding siblings ...)
  2019-07-04 15:51 ` [patch V2 13/25] x86/hotplug: Silence APIC and NMI when CPU is dead Thomas Gleixner
@ 2019-07-04 15:51 ` Thomas Gleixner
  2019-07-04 15:52 ` [patch V2 15/25] x86/apic: Add NMI_VECTOR wait to IPI shorthand Thomas Gleixner
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-07-04 15:51 UTC (permalink / raw)
  To: LKML; +Cc: x86, Nadav Amit, Ricardo Neri, Stephane Eranian, Feng Tang

The SDM states:

  "The destination shorthand field of the ICR allows the delivery mode to be
   by-passed in favor of broadcasting the IPI to all the processors on the
   system bus and/or back to itself (see Section 10.6.1, Interrupt Command
   Register (ICR)). Three destination shorthands are supported: self, all
   excluding self, and all including self. The destination mode is ignored
   when a destination shorthand is used."

So there is no point to supply the destination mode to the shorthand
delivery function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/apic/apic_flat_64.c |    6 ++----
 arch/x86/kernel/apic/ipi.c          |   15 +++++++--------
 arch/x86/kernel/apic/local.h        |    2 +-
 arch/x86/kernel/apic/probe_64.c     |    2 +-
 4 files changed, 11 insertions(+), 14 deletions(-)

--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -90,8 +90,7 @@ static void flat_send_IPI_allbutself(int
 			_flat_send_IPI_mask(mask, vector);
 		}
 	} else if (num_online_cpus() > 1) {
-		__default_send_IPI_shortcut(APIC_DEST_ALLBUT,
-					    vector, apic->dest_logical);
+		__default_send_IPI_shortcut(APIC_DEST_ALLBUT, vector);
 	}
 }
 
@@ -100,8 +99,7 @@ static void flat_send_IPI_all(int vector
 	if (vector == NMI_VECTOR) {
 		flat_send_IPI_mask(cpu_online_mask, vector);
 	} else {
-		__default_send_IPI_shortcut(APIC_DEST_ALLINC,
-					    vector, apic->dest_logical);
+		__default_send_IPI_shortcut(APIC_DEST_ALLINC, vector);
 	}
 }
 
--- a/arch/x86/kernel/apic/ipi.c
+++ b/arch/x86/kernel/apic/ipi.c
@@ -16,7 +16,7 @@ static inline void __xapic_wait_icr_idle
 		cpu_relax();
 }
 
-void __default_send_IPI_shortcut(unsigned int shortcut, int vector, unsigned int dest)
+void __default_send_IPI_shortcut(unsigned int shortcut, int vector)
 {
 	/*
 	 * Subtle. In the case of the 'never do double writes' workaround
@@ -33,9 +33,10 @@ void __default_send_IPI_shortcut(unsigne
 	__xapic_wait_icr_idle();
 
 	/*
-	 * No need to touch the target chip field
+	 * No need to touch the target chip field. Also the destination
+	 * mode is ignored when a shorthand is used.
 	 */
-	cfg = __prepare_ICR(shortcut, vector, dest);
+	cfg = __prepare_ICR(shortcut, vector, 0);
 
 	/*
 	 * Send the IPI. The write to APIC_ICR fires this off.
@@ -202,8 +203,7 @@ void default_send_IPI_allbutself(int vec
 	if (no_broadcast || vector == NMI_VECTOR) {
 		apic->send_IPI_mask_allbutself(cpu_online_mask, vector);
 	} else {
-		__default_send_IPI_shortcut(APIC_DEST_ALLBUT, vector,
-					    apic->dest_logical);
+		__default_send_IPI_shortcut(APIC_DEST_ALLBUT, vector);
 	}
 }
 
@@ -212,14 +212,13 @@ void default_send_IPI_all(int vector)
 	if (no_broadcast || vector == NMI_VECTOR) {
 		apic->send_IPI_mask(cpu_online_mask, vector);
 	} else {
-		__default_send_IPI_shortcut(APIC_DEST_ALLINC, vector,
-					    apic->dest_logical);
+		__default_send_IPI_shortcut(APIC_DEST_ALLINC, vector);
 	}
 }
 
 void default_send_IPI_self(int vector)
 {
-	__default_send_IPI_shortcut(APIC_DEST_SELF, vector, apic->dest_logical);
+	__default_send_IPI_shortcut(APIC_DEST_SELF, vector);
 }
 
 /* must come after the send_IPI functions above for inlining */
--- a/arch/x86/kernel/apic/local.h
+++ b/arch/x86/kernel/apic/local.h
@@ -38,7 +38,7 @@ static inline unsigned int __prepare_ICR
 	return icr;
 }
 
-void __default_send_IPI_shortcut(unsigned int shortcut, int vector, unsigned int dest);
+void __default_send_IPI_shortcut(unsigned int shortcut, int vector);
 
 /*
  * This is used to send an IPI with no shorthand notation (the destination is
--- a/arch/x86/kernel/apic/probe_64.c
+++ b/arch/x86/kernel/apic/probe_64.c
@@ -40,7 +40,7 @@ void __init default_setup_apic_routing(v
 
 void apic_send_IPI_self(int vector)
 {
-	__default_send_IPI_shortcut(APIC_DEST_SELF, vector, APIC_DEST_PHYSICAL);
+	__default_send_IPI_shortcut(APIC_DEST_SELF, vector);
 }
 
 int __init default_acpi_madt_oem_check(char *oem_id, char *oem_table_id)



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch V2 15/25] x86/apic: Add NMI_VECTOR wait to IPI shorthand
  2019-07-04 15:51 [patch V2 00/25] x86/apic: Support for IPI shorthands Thomas Gleixner
                   ` (13 preceding siblings ...)
  2019-07-04 15:51 ` [patch V2 14/25] x86/apic: Remove dest argument from __default_send_IPI_shortcut() Thomas Gleixner
@ 2019-07-04 15:52 ` Thomas Gleixner
  2019-07-04 15:52 ` [patch V2 16/25] x86/apic: Move no_ipi_broadcast() out of 32bit Thomas Gleixner
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-07-04 15:52 UTC (permalink / raw)
  To: LKML; +Cc: x86, Nadav Amit, Ricardo Neri, Stephane Eranian, Feng Tang

To support NMI shorthand broadcasts, add the safe wait for ICR idle for NMI
vector delivery.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/apic/ipi.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

--- a/arch/x86/kernel/apic/ipi.c
+++ b/arch/x86/kernel/apic/ipi.c
@@ -30,7 +30,10 @@ void __default_send_IPI_shortcut(unsigne
 	/*
 	 * Wait for idle.
 	 */
-	__xapic_wait_icr_idle();
+	if (unlikely(vector == NMI_VECTOR))
+		safe_apic_wait_icr_idle();
+	else
+		__xapic_wait_icr_idle();
 
 	/*
 	 * No need to touch the target chip field. Also the destination



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch V2 16/25] x86/apic: Move no_ipi_broadcast() out of 32bit
  2019-07-04 15:51 [patch V2 00/25] x86/apic: Support for IPI shorthands Thomas Gleixner
                   ` (14 preceding siblings ...)
  2019-07-04 15:52 ` [patch V2 15/25] x86/apic: Add NMI_VECTOR wait to IPI shorthand Thomas Gleixner
@ 2019-07-04 15:52 ` Thomas Gleixner
  2019-07-04 15:52 ` [patch V2 17/25] x86/apic: Add static key to control IPI shorthands Thomas Gleixner
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-07-04 15:52 UTC (permalink / raw)
  To: LKML; +Cc: x86, Nadav Amit, Ricardo Neri, Stephane Eranian, Feng Tang

For the upcoming shorthand support for all APIC incarnations, the command
line option needs to be available for 64bit as well.

While at it, rename the control variable, make it static and mark it
__ro_after_init.
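
For reference, a sketch of disabling shorthands on any build with the
existing command line option (the integer is parsed by get_option();
non-zero means shorthands off):

	no_ipi_broadcast=1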

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/apic/ipi.c      |   29 +++++++++++++++++++++++++++--
 arch/x86/kernel/apic/local.h    |    2 --
 arch/x86/kernel/apic/probe_32.c |   25 -------------------------
 3 files changed, 27 insertions(+), 29 deletions(-)

--- a/arch/x86/kernel/apic/ipi.c
+++ b/arch/x86/kernel/apic/ipi.c
@@ -5,6 +5,31 @@
 
 #include "local.h"
 
+#ifdef CONFIG_SMP
+#ifdef CONFIG_HOTPLUG_CPU
+#define DEFAULT_SEND_IPI	(1)
+#else
+#define DEFAULT_SEND_IPI	(0)
+#endif
+
+static int apic_ipi_shorthand_off __ro_after_init = DEFAULT_SEND_IPI;
+
+static __init int apic_ipi_shorthand(char *str)
+{
+	get_option(&str, &apic_ipi_shorthand_off);
+	return 1;
+}
+__setup("no_ipi_broadcast=", apic_ipi_shorthand);
+
+static int __init print_ipi_mode(void)
+{
+	pr_info("IPI shorthand broadcast: %s\n",
+		apic_ipi_shorthand_off ? "disabled" : "enabled");
+	return 0;
+}
+late_initcall(print_ipi_mode);
+#endif
+
 static inline int __prepare_ICR2(unsigned int mask)
 {
 	return SET_APIC_DEST_FIELD(mask);
@@ -203,7 +228,7 @@ void default_send_IPI_allbutself(int vec
 	if (num_online_cpus() < 2)
 		return;
 
-	if (no_broadcast || vector == NMI_VECTOR) {
+	if (apic_ipi_shorthand_off || vector == NMI_VECTOR) {
 		apic->send_IPI_mask_allbutself(cpu_online_mask, vector);
 	} else {
 		__default_send_IPI_shortcut(APIC_DEST_ALLBUT, vector);
@@ -212,7 +237,7 @@ void default_send_IPI_allbutself(int vec
 
 void default_send_IPI_all(int vector)
 {
-	if (no_broadcast || vector == NMI_VECTOR) {
+	if (apic_ipi_shorthand_off || vector == NMI_VECTOR) {
 		apic->send_IPI_mask(cpu_online_mask, vector);
 	} else {
 		__default_send_IPI_shortcut(APIC_DEST_ALLINC, vector);
--- a/arch/x86/kernel/apic/local.h
+++ b/arch/x86/kernel/apic/local.h
@@ -51,8 +51,6 @@ void default_send_IPI_single_phys(int cp
 void default_send_IPI_mask_sequence_phys(const struct cpumask *mask, int vector);
 void default_send_IPI_mask_allbutself_phys(const struct cpumask *mask, int vector);
 
-extern int no_broadcast;
-
 #ifdef CONFIG_X86_32
 void default_send_IPI_mask_sequence_logical(const struct cpumask *mask, int vector);
 void default_send_IPI_mask_allbutself_logical(const struct cpumask *mask, int vector);
--- a/arch/x86/kernel/apic/probe_32.c
+++ b/arch/x86/kernel/apic/probe_32.c
@@ -15,31 +15,6 @@
 
 #include "local.h"
 
-#ifdef CONFIG_HOTPLUG_CPU
-#define DEFAULT_SEND_IPI	(1)
-#else
-#define DEFAULT_SEND_IPI	(0)
-#endif
-
-int no_broadcast = DEFAULT_SEND_IPI;
-
-static __init int no_ipi_broadcast(char *str)
-{
-	get_option(&str, &no_broadcast);
-	pr_info("Using %s mode\n",
-		no_broadcast ? "No IPI Broadcast" : "IPI Broadcast");
-	return 1;
-}
-__setup("no_ipi_broadcast=", no_ipi_broadcast);
-
-static int __init print_ipi_mode(void)
-{
-	pr_info("Using IPI %s mode\n",
-		no_broadcast ? "No-Shortcut" : "Shortcut");
-	return 0;
-}
-late_initcall(print_ipi_mode);
-
 static int default_x86_32_early_logical_apicid(int cpu)
 {
 	return 1 << cpu;



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch V2 17/25] x86/apic: Add static key to control IPI shorthands
  2019-07-04 15:51 [patch V2 00/25] x86/apic: Support for IPI shorthands Thomas Gleixner
                   ` (15 preceding siblings ...)
  2019-07-04 15:52 ` [patch V2 16/25] x86/apic: Move no_ipi_broadcast() out of 32bit Thomas Gleixner
@ 2019-07-04 15:52 ` Thomas Gleixner
  2019-07-04 15:52 ` [patch V2 18/25] x86/apic: Provide and use helper for send_IPI_allbutself() Thomas Gleixner
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-07-04 15:52 UTC (permalink / raw)
  To: LKML; +Cc: x86, Nadav Amit, Ricardo Neri, Stephane Eranian, Feng Tang

The IPI shorthand functionality delivers IPI/NMI broadcasts to all CPUs in
the system. This can have side effects similar to MCE broadcasting when
CPUs are waiting in the BIOS or are offlined.

The kernel already tracks whether offlined CPUs have been brought up at
least once, which ensures that CR4.MCE is set on them and MCE broadcasts
can't brick the machine.

Utilize that information and compare it to the cpu_present_mask. If all
present CPUs have been brought up at least once then the broadcast side
effect is mitigated by disabling regular interrupt/IPI delivery in the APIC
itself and by the cpu_ignore_nmi check at the beginning of the NMI handler.

Use a static key to switch between broadcasting via shorthands or sending
the IPI/NMI one by one.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/apic.h  |    2 ++
 arch/x86/kernel/apic/ipi.c   |   24 +++++++++++++++++++++++-
 arch/x86/kernel/apic/local.h |    6 ++++++
 arch/x86/kernel/cpu/common.c |    2 ++
 4 files changed, 33 insertions(+), 1 deletion(-)

--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -506,8 +506,10 @@ extern int default_check_phys_apicid_pre
 
 #ifdef CONFIG_SMP
 bool apic_id_is_primary_thread(unsigned int id);
+void apic_smt_update(void);
 #else
 static inline bool apic_id_is_primary_thread(unsigned int id) { return false; }
+static inline void apic_smt_update(void) { }
 #endif
 
 extern void irq_enter(void);
--- a/arch/x86/kernel/apic/ipi.c
+++ b/arch/x86/kernel/apic/ipi.c
@@ -5,6 +5,8 @@
 
 #include "local.h"
 
+DEFINE_STATIC_KEY_FALSE(apic_use_ipi_shorthand);
+
 #ifdef CONFIG_SMP
 #ifdef CONFIG_HOTPLUG_CPU
 #define DEFAULT_SEND_IPI	(1)
@@ -28,7 +30,27 @@ static int __init print_ipi_mode(void)
 	return 0;
 }
 late_initcall(print_ipi_mode);
-#endif
+
+void apic_smt_update(void)
+{
+	/*
+	 * Do not switch to broadcast mode if:
+	 * - Disabled on the command line
+	 * - Only a single CPU is online
+	 * - Not all present CPUs have been at least booted once
+	 *
+	 * The latter is important as the local APIC might be in some
+	 * random state and a broadcast might cause havoc. That's
+	 * especially true for NMI broadcasting.
+	 */
+	if (apic_ipi_shorthand_off || num_online_cpus() == 1 ||
+	    !cpumask_equal(cpu_present_mask, &cpus_booted_once_mask)) {
+		static_branch_disable(&apic_use_ipi_shorthand);
+	} else {
+		static_branch_enable(&apic_use_ipi_shorthand);
+	}
+}
+#endif /* CONFIG_SMP */
 
 static inline int __prepare_ICR2(unsigned int mask)
 {
--- a/arch/x86/kernel/apic/local.h
+++ b/arch/x86/kernel/apic/local.h
@@ -7,6 +7,9 @@
  * (c) 1998-99, 2000 Ingo Molnar <mingo@redhat.com>
  * (c) 2002,2003 Andi Kleen, SuSE Labs.
  */
+
+#include <linux/jump_label.h>
+
 #include <asm/apic.h>
 
 /* APIC flat 64 */
@@ -22,6 +25,9 @@ int x2apic_phys_pkg_id(int initial_apici
 void x2apic_send_IPI_self(int vector);
 
 /* IPI */
+
+DECLARE_STATIC_KEY_FALSE(apic_use_ipi_shorthand);
+
 static inline unsigned int __prepare_ICR(unsigned int shortcut, int vector,
 					 unsigned int dest)
 {
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1888,4 +1888,6 @@ void arch_smt_update(void)
 {
 	/* Handle the speculative execution misfeatures */
 	cpu_bugs_smt_update();
+	/* Check whether IPI broadcasting can be enabled */
+	apic_smt_update();
 }



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch V2 18/25] x86/apic: Provide and use helper for send_IPI_allbutself()
  2019-07-04 15:51 [patch V2 00/25] x86/apic: Support for IPI shorthands Thomas Gleixner
                   ` (16 preceding siblings ...)
  2019-07-04 15:52 ` [patch V2 17/25] x86/apic: Add static key to control IPI shorthands Thomas Gleixner
@ 2019-07-04 15:52 ` Thomas Gleixner
  2019-07-04 15:52 ` [patch V2 19/25] cpumask: Implement cpumask_or_equal() Thomas Gleixner
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-07-04 15:52 UTC (permalink / raw)
  To: LKML; +Cc: x86, Nadav Amit, Ricardo Neri, Stephane Eranian, Feng Tang

To support IPI shorthands, wrap invocations of apic->send_IPI_allbutself()
in a helper function, so the static key controlling the shorthand mode is
only in one place.

Fixup all callers.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: New patch
---
 arch/x86/include/asm/apic.h |    2 ++
 arch/x86/kernel/apic/ipi.c  |   12 ++++++++++++
 arch/x86/kernel/kgdb.c      |    2 +-
 arch/x86/kernel/reboot.c    |    7 +------
 arch/x86/kernel/smp.c       |    4 ++--
 5 files changed, 18 insertions(+), 9 deletions(-)

--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -177,6 +177,8 @@ extern void lapic_assign_legacy_vector(u
 extern void lapic_online(void);
 extern void lapic_offline(void);
 
+extern void apic_send_IPI_allbutself(unsigned int vector);
+
 #else /* !CONFIG_X86_LOCAL_APIC */
 static inline void lapic_shutdown(void) { }
 #define local_apic_timer_c2_ok		1
--- a/arch/x86/kernel/apic/ipi.c
+++ b/arch/x86/kernel/apic/ipi.c
@@ -50,6 +50,18 @@ void apic_smt_update(void)
 		static_branch_enable(&apic_use_ipi_shorthand);
 	}
 }
+
+void apic_send_IPI_allbutself(unsigned int vector)
+{
+	if (num_online_cpus() < 2)
+		return;
+
+	if (static_branch_likely(&apic_use_ipi_shorthand))
+		apic->send_IPI_allbutself(vector);
+	else
+		apic->send_IPI_mask_allbutself(cpu_online_mask, vector);
+}
+
 #endif /* CONFIG_SMP */
 
 static inline int __prepare_ICR2(unsigned int mask)
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -424,7 +424,7 @@ static void kgdb_disable_hw_debug(struct
  */
 void kgdb_roundup_cpus(void)
 {
-	apic->send_IPI_allbutself(VECTOR_NMI);
+	apic_send_IPI_allbutself(VECTOR_NMI);
 }
 #endif
 
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -828,11 +828,6 @@ static int crash_nmi_callback(unsigned i
 	return NMI_HANDLED;
 }
 
-static void smp_send_nmi_allbutself(void)
-{
-	apic->send_IPI_allbutself(NMI_VECTOR);
-}
-
 /*
  * Halt all other CPUs, calling the specified function on each of them
  *
@@ -861,7 +856,7 @@ void nmi_shootdown_cpus(nmi_shootdown_cb
 	 */
 	wmb();
 
-	smp_send_nmi_allbutself();
+	apic_send_IPI_allbutself(NMI_VECTOR);
 
 	/* Kick CPUs looping in NMI context. */
 	WRITE_ONCE(crash_ipi_issued, 1);
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -209,7 +209,7 @@ static void native_stop_other_cpus(int w
 		/* sync above data before sending IRQ */
 		wmb();
 
-		apic->send_IPI_allbutself(REBOOT_VECTOR);
+		apic_send_IPI_allbutself(REBOOT_VECTOR);
 
 		/*
 		 * Don't wait longer than a second if the caller
@@ -233,7 +233,7 @@ static void native_stop_other_cpus(int w
 
 		pr_emerg("Shutting down cpus with NMI\n");
 
-		apic->send_IPI_allbutself(NMI_VECTOR);
+		apic_send_IPI_allbutself(NMI_VECTOR);
 
 		/*
 		 * Don't wait longer than a 10 ms if the caller



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch V2 19/25] cpumask: Implement cpumask_or_equal()
  2019-07-04 15:51 [patch V2 00/25] x86/apic: Support for IPI shorthands Thomas Gleixner
                   ` (17 preceding siblings ...)
  2019-07-04 15:52 ` [patch V2 18/25] x86/apic: Provide and use helper for send_IPI_allbutself() Thomas Gleixner
@ 2019-07-04 15:52 ` Thomas Gleixner
  2019-07-04 15:52 ` [patch V2 20/25] x86/smp: Move smp_function_call implementations into IPI code Thomas Gleixner
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-07-04 15:52 UTC (permalink / raw)
  To: LKML; +Cc: x86, Nadav Amit, Ricardo Neri, Stephane Eranian, Feng Tang

The IPI code of x86 needs to evaluate whether the target cpumask is equal
to the cpu_online_mask or equal except for the calling CPU.

To replace the current implementation, which requires a temporary cpumask
and therefore potentially allocations, add a new function which compares a
cpumask against the result of or'ing two other cpumasks together.

This allows the required decision to be made in one go, and the calling
code can then check whether the calling CPU is set in the target mask with
cpumask_test_cpu().
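
As a minimal sketch of the intended calling pattern (@mask is the target
mask; broadcast_all()/broadcast_allbutself() are placeholders -- the real
usage shows up in the native_send_call_func_ipi() patch later in this
series):

	unsigned int cpu = smp_processor_id();

	if (cpumask_or_equal(mask, cpumask_of(cpu), cpu_online_mask)) {
		/* mask covers all online CPUs, except possibly @cpu */
		if (cpumask_test_cpu(cpu, mask))
			broadcast_all();	/* placeholder: all including self */
		else
			broadcast_allbutself();	/* placeholder: all excluding self */
	}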

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: New patch
---
 include/linux/bitmap.h  |   23 +++++++++++++++++++++++
 include/linux/cpumask.h |   14 ++++++++++++++
 lib/bitmap.c            |   20 ++++++++++++++++++++
 3 files changed, 57 insertions(+)

--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -120,6 +120,10 @@ extern int __bitmap_empty(const unsigned
 extern int __bitmap_full(const unsigned long *bitmap, unsigned int nbits);
 extern int __bitmap_equal(const unsigned long *bitmap1,
 			  const unsigned long *bitmap2, unsigned int nbits);
+extern bool __pure __bitmap_or_equal(const unsigned long *src1,
+				     const unsigned long *src2,
+				     const unsigned long *src3,
+				     unsigned int nbits);
 extern void __bitmap_complement(unsigned long *dst, const unsigned long *src,
 			unsigned int nbits);
 extern void __bitmap_shift_right(unsigned long *dst, const unsigned long *src,
@@ -321,6 +325,25 @@ static inline int bitmap_equal(const uns
 	return __bitmap_equal(src1, src2, nbits);
 }
 
+/**
+ * bitmap_or_equal - Check whether the or of two bitmaps is equal to a third
+ * @src1:	Pointer to bitmap 1
+ * @src2:	Pointer to bitmap 2, will be or'ed with bitmap 1
+ * @src3:	Pointer to bitmap 3. Compare to the result of *@src1 | *@src2
+ *
+ * Returns: True if (*@src1 | *@src2) == *@src3, false otherwise
+ */
+static inline bool bitmap_or_equal(const unsigned long *src1,
+				   const unsigned long *src2,
+				   const unsigned long *src3,
+				   unsigned int nbits)
+{
+	if (!small_const_nbits(nbits))
+		return __bitmap_or_equal(src1, src2, src3, nbits);
+
+	return !(((*src1 | *src2) ^ *src3) & BITMAP_LAST_WORD_MASK(nbits));
+}
+
 static inline int bitmap_intersects(const unsigned long *src1,
 			const unsigned long *src2, unsigned int nbits)
 {
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -476,6 +476,20 @@ static inline bool cpumask_equal(const s
 }
 
 /**
+ * cpumask_or_equal - *src1p | *src2p == *src3p
+ * @src1p: the first input
+ * @src2p: the second input
+ * @src3p: the third input
+ */
+static inline bool cpumask_or_equal(const struct cpumask *src1p,
+				    const struct cpumask *src2p,
+				    const struct cpumask *src3p)
+{
+	return bitmap_or_equal(cpumask_bits(src1p), cpumask_bits(src2p),
+			       cpumask_bits(src3p), nr_cpumask_bits);
+}
+
+/**
  * cpumask_intersects - (*src1p & *src2p) != 0
  * @src1p: the first input
  * @src2p: the second input
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -59,6 +59,26 @@ int __bitmap_equal(const unsigned long *
 }
 EXPORT_SYMBOL(__bitmap_equal);
 
+bool __bitmap_or_equal(const unsigned long *bitmap1,
+		       const unsigned long *bitmap2,
+		       const unsigned long *bitmap3,
+		       unsigned int bits)
+{
+	unsigned int k, lim = bits / BITS_PER_LONG;
+	unsigned long tmp;
+
+	for (k = 0; k < lim; ++k) {
+		if ((bitmap1[k] | bitmap2[k]) != bitmap3[k])
+			return false;
+	}
+
+	if (!(bits % BITS_PER_LONG))
+		return true;
+
+	tmp = (bitmap1[k] | bitmap2[k]) ^ bitmap3[k];
+	return (tmp & BITMAP_LAST_WORD_MASK(bits)) == 0;
+}
+
 void __bitmap_complement(unsigned long *dst, const unsigned long *src, unsigned int bits)
 {
 	unsigned int k, lim = BITS_TO_LONGS(bits);



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch V2 20/25] x86/smp: Move smp_function_call implementations into IPI code
  2019-07-04 15:51 [patch V2 00/25] x86/apic: Support for IPI shorthands Thomas Gleixner
                   ` (18 preceding siblings ...)
  2019-07-04 15:52 ` [patch V2 19/25] cpumask: Implement cpumask_or_equal() Thomas Gleixner
@ 2019-07-04 15:52 ` Thomas Gleixner
  2019-07-04 15:52 ` [patch V2 21/25] x86/smp: Enhance native_send_call_func_ipi() Thomas Gleixner
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-07-04 15:52 UTC (permalink / raw)
  To: LKML; +Cc: x86, Nadav Amit, Ricardo Neri, Stephane Eranian, Feng Tang

Move it where it belongs. That allows keeping all the shorthand logic in
one place.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: New patch
---
 arch/x86/include/asm/smp.h |    1 +
 arch/x86/kernel/apic/ipi.c |   40 ++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/smp.c      |   40 ----------------------------------------
 3 files changed, 41 insertions(+), 40 deletions(-)

--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -142,6 +142,7 @@ void play_dead_common(void);
 void wbinvd_on_cpu(int cpu);
 int wbinvd_on_all_cpus(void);
 
+void native_smp_send_reschedule(int cpu);
 void native_send_call_func_ipi(const struct cpumask *mask);
 void native_send_call_func_single_ipi(int cpu);
 void x86_idle_thread_init(unsigned int cpu, struct task_struct *idle);
--- a/arch/x86/kernel/apic/ipi.c
+++ b/arch/x86/kernel/apic/ipi.c
@@ -62,6 +62,46 @@ void apic_send_IPI_allbutself(unsigned i
 		apic->send_IPI_mask_allbutself(cpu_online_mask, vector);
 }
 
+/*
+ * Send a 'reschedule' IPI to another CPU. It goes straight through and
+ * wastes no time serializing anything. Worst case is that we lose a
+ * reschedule ...
+ */
+void native_smp_send_reschedule(int cpu)
+{
+	if (unlikely(cpu_is_offline(cpu))) {
+		WARN(1, "sched: Unexpected reschedule of offline CPU#%d!\n", cpu);
+		return;
+	}
+	apic->send_IPI(cpu, RESCHEDULE_VECTOR);
+}
+
+void native_send_call_func_single_ipi(int cpu)
+{
+	apic->send_IPI(cpu, CALL_FUNCTION_SINGLE_VECTOR);
+}
+
+void native_send_call_func_ipi(const struct cpumask *mask)
+{
+	cpumask_var_t allbutself;
+
+	if (!alloc_cpumask_var(&allbutself, GFP_ATOMIC)) {
+		apic->send_IPI_mask(mask, CALL_FUNCTION_VECTOR);
+		return;
+	}
+
+	cpumask_copy(allbutself, cpu_online_mask);
+	cpumask_clear_cpu(smp_processor_id(), allbutself);
+
+	if (cpumask_equal(mask, allbutself) &&
+	    cpumask_equal(cpu_online_mask, cpu_callout_mask))
+		apic->send_IPI_allbutself(CALL_FUNCTION_VECTOR);
+	else
+		apic->send_IPI_mask(mask, CALL_FUNCTION_VECTOR);
+
+	free_cpumask_var(allbutself);
+}
+
 #endif /* CONFIG_SMP */
 
 static inline int __prepare_ICR2(unsigned int mask)
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -115,46 +115,6 @@
 static atomic_t stopping_cpu = ATOMIC_INIT(-1);
 static bool smp_no_nmi_ipi = false;
 
-/*
- * this function sends a 'reschedule' IPI to another CPU.
- * it goes straight through and wastes no time serializing
- * anything. Worst case is that we lose a reschedule ...
- */
-static void native_smp_send_reschedule(int cpu)
-{
-	if (unlikely(cpu_is_offline(cpu))) {
-		WARN(1, "sched: Unexpected reschedule of offline CPU#%d!\n", cpu);
-		return;
-	}
-	apic->send_IPI(cpu, RESCHEDULE_VECTOR);
-}
-
-void native_send_call_func_single_ipi(int cpu)
-{
-	apic->send_IPI(cpu, CALL_FUNCTION_SINGLE_VECTOR);
-}
-
-void native_send_call_func_ipi(const struct cpumask *mask)
-{
-	cpumask_var_t allbutself;
-
-	if (!alloc_cpumask_var(&allbutself, GFP_ATOMIC)) {
-		apic->send_IPI_mask(mask, CALL_FUNCTION_VECTOR);
-		return;
-	}
-
-	cpumask_copy(allbutself, cpu_online_mask);
-	cpumask_clear_cpu(smp_processor_id(), allbutself);
-
-	if (cpumask_equal(mask, allbutself) &&
-	    cpumask_equal(cpu_online_mask, cpu_callout_mask))
-		apic->send_IPI_allbutself(CALL_FUNCTION_VECTOR);
-	else
-		apic->send_IPI_mask(mask, CALL_FUNCTION_VECTOR);
-
-	free_cpumask_var(allbutself);
-}
-
 static int smp_stop_nmi_callback(unsigned int val, struct pt_regs *regs)
 {
 	/* We are registered on stopping cpu too, avoid spurious NMI */



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch V2 21/25] x86/smp: Enhance native_send_call_func_ipi()
  2019-07-04 15:51 [patch V2 00/25] x86/apic: Support for IPI shorthands Thomas Gleixner
                   ` (19 preceding siblings ...)
  2019-07-04 15:52 ` [patch V2 20/25] x86/smp: Move smp_function_call implementations into IPI code Thomas Gleixner
@ 2019-07-04 15:52 ` Thomas Gleixner
  2019-07-05  1:26   ` Nadav Amit
  2019-07-04 15:52 ` [patch V2 22/25] x86/apic: Remove the shorthand decision logic Thomas Gleixner
                   ` (3 subsequent siblings)
  24 siblings, 1 reply; 42+ messages in thread
From: Thomas Gleixner @ 2019-07-04 15:52 UTC (permalink / raw)
  To: LKML; +Cc: x86, Nadav Amit, Ricardo Neri, Stephane Eranian, Feng Tang

Nadav noticed that the cpumask allocations in native_send_call_func_ipi()
are noticeable in microbenchmarks.

Use the new cpumask_or_equal() function to simplify the decision whether
the supplied target CPU mask is either equal to cpu_online_mask or equal to
cpu_online_mask except for the CPU on which the function is invoked.

cpumask_or_equal() or's the target mask and the cpumask of the current CPU
together and compares it to cpu_online_mask.

If the result is false, use the mask-based IPI function, otherwise check
whether the current CPU is set in the target mask and invoke either the
send_IPI_all() or the send_IPI_allbutself() APIC callback.

Make the shorthand decision also depend on the static key which enables
shorthand mode. That allows removing the extra cpumask comparison with
cpu_callout_mask.

Reported-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: New patch
---
 arch/x86/kernel/apic/ipi.c |   24 +++++++++++-------------
 1 file changed, 11 insertions(+), 13 deletions(-)

--- a/arch/x86/kernel/apic/ipi.c
+++ b/arch/x86/kernel/apic/ipi.c
@@ -83,23 +83,21 @@ void native_send_call_func_single_ipi(in
 
 void native_send_call_func_ipi(const struct cpumask *mask)
 {
-	cpumask_var_t allbutself;
+	if (static_branch_likely(&apic_use_ipi_shorthand)) {
+		unsigned int cpu = smp_processor_id();
 
-	if (!alloc_cpumask_var(&allbutself, GFP_ATOMIC)) {
-		apic->send_IPI_mask(mask, CALL_FUNCTION_VECTOR);
+		if (!cpumask_or_equal(mask, cpumask_of(cpu), cpu_online_mask))
+			goto sendmask;
+
+		if (cpumask_test_cpu(cpu, mask))
+			apic->send_IPI_all(CALL_FUNCTION_VECTOR);
+		else if (num_online_cpus() > 1)
+			apic->send_IPI_allbutself(CALL_FUNCTION_VECTOR);
 		return;
 	}
 
-	cpumask_copy(allbutself, cpu_online_mask);
-	cpumask_clear_cpu(smp_processor_id(), allbutself);
-
-	if (cpumask_equal(mask, allbutself) &&
-	    cpumask_equal(cpu_online_mask, cpu_callout_mask))
-		apic->send_IPI_allbutself(CALL_FUNCTION_VECTOR);
-	else
-		apic->send_IPI_mask(mask, CALL_FUNCTION_VECTOR);
-
-	free_cpumask_var(allbutself);
+sendmask:
+	apic->send_IPI_mask(mask, CALL_FUNCTION_VECTOR);
 }
 
 #endif /* CONFIG_SMP */



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch V2 22/25] x86/apic: Remove the shorthand decision logic
  2019-07-04 15:51 [patch V2 00/25] x86/apic: Support for IPI shorthands Thomas Gleixner
                   ` (20 preceding siblings ...)
  2019-07-04 15:52 ` [patch V2 21/25] x86/smp: Enhance native_send_call_func_ipi() Thomas Gleixner
@ 2019-07-04 15:52 ` Thomas Gleixner
  2019-07-04 15:52 ` [patch V2 23/25] x86/apic: Share common IPI helpers Thomas Gleixner
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-07-04 15:52 UTC (permalink / raw)
  To: LKML; +Cc: x86, Nadav Amit, Ricardo Neri, Stephane Eranian, Feng Tang

All callers of apic->send_IPI_all() and apic->send_IPI_allbutself() already
contain the decision logic for shorthand invocation and invoke
send_IPI_mask() if the prerequisites are not satisfied.

Remove the now redundant decision logic in the 32bit implementation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Remove the decision logic now that it is already done in the callers
---
 arch/x86/kernel/apic/ipi.c |   27 +++------------------------
 1 file changed, 3 insertions(+), 24 deletions(-)

--- a/arch/x86/kernel/apic/ipi.c
+++ b/arch/x86/kernel/apic/ipi.c
@@ -8,13 +8,7 @@
 DEFINE_STATIC_KEY_FALSE(apic_use_ipi_shorthand);
 
 #ifdef CONFIG_SMP
-#ifdef CONFIG_HOTPLUG_CPU
-#define DEFAULT_SEND_IPI	(1)
-#else
-#define DEFAULT_SEND_IPI	(0)
-#endif
-
-static int apic_ipi_shorthand_off __ro_after_init = DEFAULT_SEND_IPI;
+static int apic_ipi_shorthand_off __ro_after_init;
 
 static __init int apic_ipi_shorthand(char *str)
 {
@@ -293,27 +287,12 @@ void default_send_IPI_mask_logical(const
 
 void default_send_IPI_allbutself(int vector)
 {
-	/*
-	 * if there are no other CPUs in the system then we get an APIC send
-	 * error if we try to broadcast, thus avoid sending IPIs in this case.
-	 */
-	if (num_online_cpus() < 2)
-		return;
-
-	if (apic_ipi_shorthand_off || vector == NMI_VECTOR) {
-		apic->send_IPI_mask_allbutself(cpu_online_mask, vector);
-	} else {
-		__default_send_IPI_shortcut(APIC_DEST_ALLBUT, vector);
-	}
+	__default_send_IPI_shortcut(APIC_DEST_ALLBUT, vector);
 }
 
 void default_send_IPI_all(int vector)
 {
-	if (apic_ipi_shorthand_off || vector == NMI_VECTOR) {
-		apic->send_IPI_mask(cpu_online_mask, vector);
-	} else {
-		__default_send_IPI_shortcut(APIC_DEST_ALLINC, vector);
-	}
+	__default_send_IPI_shortcut(APIC_DEST_ALLINC, vector);
 }
 
 void default_send_IPI_self(int vector)



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch V2 23/25] x86/apic: Share common IPI helpers
  2019-07-04 15:51 [patch V2 00/25] x86/apic: Support for IPI shorthands Thomas Gleixner
                   ` (21 preceding siblings ...)
  2019-07-04 15:52 ` [patch V2 22/25] x86/apic: Remove the shorthand decision logic Thomas Gleixner
@ 2019-07-04 15:52 ` Thomas Gleixner
  2019-07-04 15:52 ` [patch V2 24/25] x86/apic/flat64: Remove the IPI shorthand decision logic Thomas Gleixner
  2019-07-04 15:52 ` [patch V2 25/25] x86/apic/x2apic: Implement IPI shorthands support Thomas Gleixner
  24 siblings, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-07-04 15:52 UTC (permalink / raw)
  To: LKML; +Cc: x86, Nadav Amit, Ricardo Neri, Stephane Eranian, Feng Tang

The 64bit implementations need the same wrappers around
__default_send_IPI_shortcut() as 32bit.

Move them out of the 32bit section.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: New patch
---
 arch/x86/kernel/apic/ipi.c   |   30 +++++++++++++++---------------
 arch/x86/kernel/apic/local.h |    6 +++---
 2 files changed, 18 insertions(+), 18 deletions(-)

--- a/arch/x86/kernel/apic/ipi.c
+++ b/arch/x86/kernel/apic/ipi.c
@@ -226,6 +226,21 @@ void default_send_IPI_single(int cpu, in
 	apic->send_IPI_mask(cpumask_of(cpu), vector);
 }
 
+void default_send_IPI_allbutself(int vector)
+{
+	__default_send_IPI_shortcut(APIC_DEST_ALLBUT, vector);
+}
+
+void default_send_IPI_all(int vector)
+{
+	__default_send_IPI_shortcut(APIC_DEST_ALLINC, vector);
+}
+
+void default_send_IPI_self(int vector)
+{
+	__default_send_IPI_shortcut(APIC_DEST_SELF, vector);
+}
+
 #ifdef CONFIG_X86_32
 
 void default_send_IPI_mask_sequence_logical(const struct cpumask *mask,
@@ -285,21 +300,6 @@ void default_send_IPI_mask_logical(const
 	local_irq_restore(flags);
 }
 
-void default_send_IPI_allbutself(int vector)
-{
-	__default_send_IPI_shortcut(APIC_DEST_ALLBUT, vector);
-}
-
-void default_send_IPI_all(int vector)
-{
-	__default_send_IPI_shortcut(APIC_DEST_ALLINC, vector);
-}
-
-void default_send_IPI_self(int vector)
-{
-	__default_send_IPI_shortcut(APIC_DEST_SELF, vector);
-}
-
 /* must come after the send_IPI functions above for inlining */
 static int convert_apicid_to_cpu(int apic_id)
 {
--- a/arch/x86/kernel/apic/local.h
+++ b/arch/x86/kernel/apic/local.h
@@ -56,12 +56,12 @@ void default_send_IPI_single(int cpu, in
 void default_send_IPI_single_phys(int cpu, int vector);
 void default_send_IPI_mask_sequence_phys(const struct cpumask *mask, int vector);
 void default_send_IPI_mask_allbutself_phys(const struct cpumask *mask, int vector);
+void default_send_IPI_allbutself(int vector);
+void default_send_IPI_all(int vector);
+void default_send_IPI_self(int vector);
 
 #ifdef CONFIG_X86_32
 void default_send_IPI_mask_sequence_logical(const struct cpumask *mask, int vector);
 void default_send_IPI_mask_allbutself_logical(const struct cpumask *mask, int vector);
 void default_send_IPI_mask_logical(const struct cpumask *mask, int vector);
-void default_send_IPI_allbutself(int vector);
-void default_send_IPI_all(int vector);
-void default_send_IPI_self(int vector);
 #endif



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch V2 24/25] x86/apic/flat64: Remove the IPI shorthand decision logic
  2019-07-04 15:51 [patch V2 00/25] x86/apic: Support for IPI shorthands Thomas Gleixner
                   ` (22 preceding siblings ...)
  2019-07-04 15:52 ` [patch V2 23/25] x86/apic: Share common IPI helpers Thomas Gleixner
@ 2019-07-04 15:52 ` Thomas Gleixner
  2019-07-04 15:52 ` [patch V2 25/25] x86/apic/x2apic: Implement IPI shorthands support Thomas Gleixner
  24 siblings, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-07-04 15:52 UTC (permalink / raw)
  To: LKML; +Cc: x86, Nadav Amit, Ricardo Neri, Stephane Eranian, Feng Tang

All callers of apic->send_IPI_all() and apic->send_IPI_allbutself() already
contain the decision logic for shorthand invocation and invoke
send_IPI_mask() if the prerequisites are not satisfied.

Remove the now redundant decision logic in the APIC code and the duplicate
helper in probe_64.c.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Remove the decision logic now that it is already done in the callers
    Drop the duplicate helper
---
 arch/x86/include/asm/apic.h         |    4 --
 arch/x86/kernel/apic/apic_flat_64.c |   49 ++++--------------------------------
 arch/x86/kernel/apic/probe_64.c     |    7 -----
 3 files changed, 6 insertions(+), 54 deletions(-)

--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -467,10 +467,6 @@ static inline unsigned default_get_apic_
 #define TRAMPOLINE_PHYS_LOW		0x467
 #define TRAMPOLINE_PHYS_HIGH		0x469
 
-#ifdef CONFIG_X86_64
-extern void apic_send_IPI_self(int vector);
-#endif
-
 extern void generic_bigsmp_probe(void);
 
 #ifdef CONFIG_X86_LOCAL_APIC
--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -76,33 +76,6 @@ flat_send_IPI_mask_allbutself(const stru
 	_flat_send_IPI_mask(mask, vector);
 }
 
-static void flat_send_IPI_allbutself(int vector)
-{
-	int cpu = smp_processor_id();
-
-	if (IS_ENABLED(CONFIG_HOTPLUG_CPU) || vector == NMI_VECTOR) {
-		if (!cpumask_equal(cpu_online_mask, cpumask_of(cpu))) {
-			unsigned long mask = cpumask_bits(cpu_online_mask)[0];
-
-			if (cpu < BITS_PER_LONG)
-				clear_bit(cpu, &mask);
-
-			_flat_send_IPI_mask(mask, vector);
-		}
-	} else if (num_online_cpus() > 1) {
-		__default_send_IPI_shortcut(APIC_DEST_ALLBUT, vector);
-	}
-}
-
-static void flat_send_IPI_all(int vector)
-{
-	if (vector == NMI_VECTOR) {
-		flat_send_IPI_mask(cpu_online_mask, vector);
-	} else {
-		__default_send_IPI_shortcut(APIC_DEST_ALLINC, vector);
-	}
-}
-
 static unsigned int flat_get_apic_id(unsigned long x)
 {
 	return (x >> 24) & 0xFF;
@@ -164,9 +137,9 @@ static struct apic apic_flat __ro_after_
 	.send_IPI			= default_send_IPI_single,
 	.send_IPI_mask			= flat_send_IPI_mask,
 	.send_IPI_mask_allbutself	= flat_send_IPI_mask_allbutself,
-	.send_IPI_allbutself		= flat_send_IPI_allbutself,
-	.send_IPI_all			= flat_send_IPI_all,
-	.send_IPI_self			= apic_send_IPI_self,
+	.send_IPI_allbutself		= default_send_IPI_allbutself,
+	.send_IPI_all			= default_send_IPI_all,
+	.send_IPI_self			= default_send_IPI_self,
 
 	.inquire_remote_apic		= default_inquire_remote_apic,
 
@@ -216,16 +189,6 @@ static void physflat_init_apic_ldr(void)
 	 */
 }
 
-static void physflat_send_IPI_allbutself(int vector)
-{
-	default_send_IPI_mask_allbutself_phys(cpu_online_mask, vector);
-}
-
-static void physflat_send_IPI_all(int vector)
-{
-	default_send_IPI_mask_sequence_phys(cpu_online_mask, vector);
-}
-
 static int physflat_probe(void)
 {
 	if (apic == &apic_physflat || num_possible_cpus() > 8 ||
@@ -267,9 +230,9 @@ static struct apic apic_physflat __ro_af
 	.send_IPI			= default_send_IPI_single_phys,
 	.send_IPI_mask			= default_send_IPI_mask_sequence_phys,
 	.send_IPI_mask_allbutself	= default_send_IPI_mask_allbutself_phys,
-	.send_IPI_allbutself		= physflat_send_IPI_allbutself,
-	.send_IPI_all			= physflat_send_IPI_all,
-	.send_IPI_self			= apic_send_IPI_self,
+	.send_IPI_allbutself		= default_send_IPI_allbutself,
+	.send_IPI_all			= default_send_IPI_all,
+	.send_IPI_self			= default_send_IPI_self,
 
 	.inquire_remote_apic		= default_inquire_remote_apic,
 
--- a/arch/x86/kernel/apic/probe_64.c
+++ b/arch/x86/kernel/apic/probe_64.c
@@ -36,13 +36,6 @@ void __init default_setup_apic_routing(v
 		x86_platform.apic_post_init();
 }
 
-/* Same for both flat and physical. */
-
-void apic_send_IPI_self(int vector)
-{
-	__default_send_IPI_shortcut(APIC_DEST_SELF, vector);
-}
-
 int __init default_acpi_madt_oem_check(char *oem_id, char *oem_table_id)
 {
 	struct apic **drv;



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch V2 25/25] x86/apic/x2apic: Implement IPI shorthands support
  2019-07-04 15:51 [patch V2 00/25] x86/apic: Support for IPI shorthands Thomas Gleixner
                   ` (23 preceding siblings ...)
  2019-07-04 15:52 ` [patch V2 24/25] x86/apic/flat64: Remove the IPI shorthand decision logic Thomas Gleixner
@ 2019-07-04 15:52 ` Thomas Gleixner
  24 siblings, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-07-04 15:52 UTC (permalink / raw)
  To: LKML; +Cc: x86, Nadav Amit, Ricardo Neri, Stephane Eranian, Feng Tang

All callers of apic->send_IPI_all() and apic->send_IPI_allbutself() already
contain the decision logic for shorthand invocation and invoke
send_IPI_mask() if the prerequisites are not satisfied.

Implement shorthand support for x2apic.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Remove the decision logic now that it is already done in the callers
---
 arch/x86/kernel/apic/local.h          |    1 +
 arch/x86/kernel/apic/x2apic_cluster.c |    4 ++--
 arch/x86/kernel/apic/x2apic_phys.c    |   12 ++++++++++--
 3 files changed, 13 insertions(+), 4 deletions(-)

--- a/arch/x86/kernel/apic/local.h
+++ b/arch/x86/kernel/apic/local.h
@@ -23,6 +23,7 @@ unsigned int x2apic_get_apic_id(unsigned
 u32 x2apic_set_apic_id(unsigned int id);
 int x2apic_phys_pkg_id(int initial_apicid, int index_msb);
 void x2apic_send_IPI_self(int vector);
+void __x2apic_send_IPI_shorthand(int vector, u32 which);
 
 /* IPI */
 
--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -82,12 +82,12 @@ x2apic_send_IPI_mask_allbutself(const st
 
 static void x2apic_send_IPI_allbutself(int vector)
 {
-	__x2apic_send_IPI_mask(cpu_online_mask, vector, APIC_DEST_ALLBUT);
+	__x2apic_send_IPI_shorthand(vector, APIC_DEST_ALLBUT);
 }
 
 static void x2apic_send_IPI_all(int vector)
 {
-	__x2apic_send_IPI_mask(cpu_online_mask, vector, APIC_DEST_ALLINC);
+	__x2apic_send_IPI_shorthand(vector, APIC_DEST_ALLINC);
 }
 
 static u32 x2apic_calc_apicid(unsigned int cpu)
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -75,12 +75,12 @@ static void
 
 static void x2apic_send_IPI_allbutself(int vector)
 {
-	__x2apic_send_IPI_mask(cpu_online_mask, vector, APIC_DEST_ALLBUT);
+	__x2apic_send_IPI_shorthand(vector, APIC_DEST_ALLBUT);
 }
 
 static void x2apic_send_IPI_all(int vector)
 {
-	__x2apic_send_IPI_mask(cpu_online_mask, vector, APIC_DEST_ALLINC);
+	__x2apic_send_IPI_shorthand(vector, APIC_DEST_ALLINC);
 }
 
 static void init_x2apic_ldr(void)
@@ -112,6 +112,14 @@ void __x2apic_send_IPI_dest(unsigned int
 	native_x2apic_icr_write(cfg, apicid);
 }
 
+void __x2apic_send_IPI_shorthand(int vector, u32 which)
+{
+	unsigned long cfg = __prepare_ICR(which, vector, 0);
+
+	x2apic_wrmsr_fence();
+	native_x2apic_icr_write(cfg, 0);
+}
+
 unsigned int x2apic_get_apic_id(unsigned long id)
 {
 	return id;



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch V2 21/25] x86/smp: Enhance native_send_call_func_ipi()
  2019-07-04 15:52 ` [patch V2 21/25] x86/smp: Enhance native_send_call_func_ipi() Thomas Gleixner
@ 2019-07-05  1:26   ` Nadav Amit
  0 siblings, 0 replies; 42+ messages in thread
From: Nadav Amit @ 2019-07-05  1:26 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, the arch/x86 maintainers, Ricardo Neri, Stephane Eranian,
	Feng Tang

> On Jul 4, 2019, at 8:52 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> Nadav noticed that the cpumask allocations in native_send_call_func_ipi()
> are noticeable in microbenchmarks.
> 
> Use the new cpumask_or_equal() function to simplify the decision whether
> the supplied target CPU mask is either equal to cpu_online_mask or equal to
> cpu_online_mask except for the CPU on which the function is invoked.
> 
> cpumask_or_equal() or's the target mask and the cpumask of the current CPU
> together and compares it to cpu_online_mask.
> 
> If the result is false, use the mask-based IPI function, otherwise check
> whether the current CPU is set in the target mask and invoke either the
> send_IPI_all() or the send_IPI_allbutself() APIC callback.
> 
> Make the shorthand decision also depend on the static key which enables
> shorthand mode. That allows removing the extra cpumask comparison with
> cpu_callout_mask.
> 
> Reported-by: Nadav Amit <namit@vmware.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> V2: New patch
> ---
> arch/x86/kernel/apic/ipi.c |   24 +++++++++++-------------
> 1 file changed, 11 insertions(+), 13 deletions(-)
> 
> --- a/arch/x86/kernel/apic/ipi.c
> +++ b/arch/x86/kernel/apic/ipi.c
> @@ -83,23 +83,21 @@ void native_send_call_func_single_ipi(in
> 
> void native_send_call_func_ipi(const struct cpumask *mask)
> {
> -	cpumask_var_t allbutself;
> +	if (static_branch_likely(&apic_use_ipi_shorthand)) {
> +		unsigned int cpu = smp_processor_id();
> 
> -	if (!alloc_cpumask_var(&allbutself, GFP_ATOMIC)) {
> -		apic->send_IPI_mask(mask, CALL_FUNCTION_VECTOR);
> +		if (!cpumask_or_equal(mask, cpumask_of(cpu), cpu_online_mask))
> +			goto sendmask;
> +
> +		if (cpumask_test_cpu(cpu, mask))
> +			apic->send_IPI_all(CALL_FUNCTION_VECTOR);
> +		else if (num_online_cpus() > 1)
> +			apic->send_IPI_allbutself(CALL_FUNCTION_VECTOR);
> 		return;
> 	}
> 
> -	cpumask_copy(allbutself, cpu_online_mask);
> -	cpumask_clear_cpu(smp_processor_id(), allbutself);
> -
> -	if (cpumask_equal(mask, allbutself) &&
> -	    cpumask_equal(cpu_online_mask, cpu_callout_mask))
> -		apic->send_IPI_allbutself(CALL_FUNCTION_VECTOR);
> -	else
> -		apic->send_IPI_mask(mask, CALL_FUNCTION_VECTOR);
> -
> -	free_cpumask_var(allbutself);
> +sendmask:
> +	apic->send_IPI_mask(mask, CALL_FUNCTION_VECTOR);
> }
> 
> #endif /* CONFIG_SMP */

It does look better and simpler than my solution.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust
  2019-07-04 15:51 ` [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust Thomas Gleixner
@ 2019-07-05 15:47   ` Andrew Cooper
  2019-07-05 19:06     ` Andy Lutomirski
                       ` (2 more replies)
  0 siblings, 3 replies; 42+ messages in thread
From: Andrew Cooper @ 2019-07-05 15:47 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Nadav Amit, Ricardo Neri, Stephane Eranian, Feng Tang,
	Andy Lutomirski

On 04/07/2019 16:51, Thomas Gleixner wrote:
>   2) The loop termination logic is interesting at best.
>
>      If the machine has no TSC or cpu_khz is not known yet it tries 1
>      million times to ack stale IRR/ISR bits. What?
>
>      With TSC it uses the TSC to calculate the loop termination. It takes a
>      timestamp at entry and terminates the loop when:
>
>      	  (rdtsc() - start_timestamp) >= (cpu_khz << 10)
>
>      That's roughly one second.
>
>      Both methods are problematic. The APIC has 256 vectors, which means
>      that in theory max. 256 IRR/ISR bits can be set. In practice this is
>      impossible as the first 32 vectors are reserved and not affected and
>      the chance that more than a few bits are set is close to zero.

[Disclaimer.  I talked to Thomas in private first, and he asked me to
post this publicly as the CVE is almost a decade old already.]

I'm afraid that this isn't quite true.

In terms of IDT vectors, the first 32 are reserved for exceptions, but
only the first 16 are reserved in the LAPIC.  Vectors 16-31 are fair
game for incoming IPIs (SDM Vol3, 10.5.2 Valid Interrupt Vectors).

In practice, this makes Linux vulnerable to CVE-2011-1898 / XSA-3, which
I'm disappointed to see wasn't shared with other software vendors at the
time.

Because TPR is 0, an incoming IPI can trigger #AC, #CP, #VC or #SX
without an error code on the stack, which results in a corrupt pt_regs
in the exception handler, and a stack underflow on the way back out,
most likely with a fault on IRET.

These can be addressed by setting TPR to 0x10, which will inhibit
delivery of any errant IPIs in this range, but some extra sanity logic
may not go amiss.  An error code on a 64bit stack can be spotted with
`testb $8, %spl` due to %rsp being aligned before pushing the exception
frame.
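
For illustration, the TPR part is a single register write; a hedged
sketch using the kernel's existing apic_write() accessor (where in the
APIC setup path it belongs is a separate question):

	/* TPR = 0x10: inhibit delivery of vectors 0x10 - 0x1f */
	apic_write(APIC_TASKPRI, 0x10);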

Another interesting problem is an IPI with vector 0x80.  A cunning
attacker can use this to simulate system calls from unsuspecting
positions in userspace, or for interrupting kernel context.  At the very
least the int0x80 path does an unconditional swapgs, so will try to run
with the user gs, and I expect things will explode quickly from there.

One option here is to look at ISR and complain if it is found to be set.

Another option, which I've only just remembered, is that AMD hardware
has the Interrupt Enable Register in its extended APIC space, which may
or may not be good enough to prohibit delivery of 0x80.  There isn't
enough information in the APM to be clear, but the name suggests it is
worth experimenting with.

~Andrew

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust
  2019-07-05 15:47   ` Andrew Cooper
@ 2019-07-05 19:06     ` Andy Lutomirski
  2019-07-05 20:17       ` Andrew Cooper
  2019-07-05 20:36       ` Thomas Gleixner
  2019-07-05 19:19     ` Nadav Amit
  2019-07-05 20:25     ` Thomas Gleixner
  2 siblings, 2 replies; 42+ messages in thread
From: Andy Lutomirski @ 2019-07-05 19:06 UTC (permalink / raw)
  To: Andrew Cooper, Josh Poimboeuf, Peter Zijlstra
  Cc: Thomas Gleixner, LKML, X86 ML, Nadav Amit, Ricardo Neri,
	Stephane Eranian, Feng Tang, Andy Lutomirski

On Fri, Jul 5, 2019 at 8:47 AM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> On 04/07/2019 16:51, Thomas Gleixner wrote:
> >   2) The loop termination logic is interesting at best.
> >
> >      If the machine has no TSC or cpu_khz is not known yet it tries 1
> >      million times to ack stale IRR/ISR bits. What?
> >
> >      With TSC it uses the TSC to calculate the loop termination. It takes a
> >      timestamp at entry and terminates the loop when:
> >
> >         (rdtsc() - start_timestamp) >= (cpu_khz << 10)
> >
> >      That's roughly one second.
> >
> >      Both methods are problematic. The APIC has 256 vectors, which means
> >      that in theory max. 256 IRR/ISR bits can be set. In practice this is
> >      impossible as the first 32 vectors are reserved and not affected and
> >      the chance that more than a few bits are set is close to zero.
>
> [Disclaimer.  I talked to Thomas in private first, and he asked me to
> post this publicly as the CVE is almost a decade old already.]
>
> I'm afraid that this isn't quite true.
>
> In terms of IDT vectors, the first 32 are reserved for exceptions, but
> only the first 16 are reserved in the LAPIC.  Vectors 16-31 are fair
> game for incoming IPIs (SDM Vol3, 10.5.2 Valid Interrupt Vectors).
>
> In practice, this makes Linux vulnerable to CVE-2011-1898 / XSA-3, which
> I'm disappointed to see wasn't shared with other software vendors at the
> time.
>
> Because TPR is 0, an incoming IPI can trigger #AC, #CP, #VC or #SX
> without an error code on the stack, which results in a corrupt pt_regs
> in the exception handler, and a stack underflow on the way back out,
> most likely with a fault on IRET.
>
> These can be addressed by setting TPR to 0x10, which will inhibit
> delivery of any errant IPIs in this range, but some extra sanity logic
> may not go amiss.  An error code on a 64bit stack can be spotted with
> `testb $8, %spl` due to %rsp being aligned before pushing the exception
> frame.

Several years ago, I remember having a discussion with someone (Jan
Beulich, maybe?) about how to efficiently make the entry code figure
out the error code situation automatically.  I suspect it was on IRC
and I can't find the logs.  I'm thinking that maybe we should just
make Linux's idtentry code do something like this.

If nothing else, we could make idtentry do:

testl $8, %esp   /* shorter than testb IIRC */
jz 1f  /* or jnz -- too lazy to figure it out */
pushq $-1
1:

instead of the current hardcoded push.  The cost of a mispredicted
branch here will be smallish compared to the absurdly large cost of
the entry itself.  But I thought I had something more clever than
this.  This sequence works, but it still feels like it should be
possible to do better:

.macro PUSH_ERROR_IF_NEEDED
    /*
     * Before the IRET frame is pushed, RSP is aligned to a 16-byte
     * boundary.  After SS .. RIP and the error code are pushed, RSP is
     * once again aligned.  Pushing -1 will put -1 in the error code slot
     * (regs->orig_ax) if there was no error code.
    */

    pushq    $-1                /* orig_ax = -1, maybe */
    /* now RSP points to orig_ax (aligned) or di (misaligned) */
    pushq    $0
    /* now RSP points to di (misaligned) or si (aligned) */
    orq    $8, %rsp
    /* now RSP points to di */
    addq    $8, %rsp
    /* now RSP points to orig_ax, and we're in good shape */
.endm

Is there a better sequence for this?
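
As a sanity check of the slot arithmetic, here is a throwaway
user-space C model (illustrative only, not entry code; RSP is modelled
as an 8-byte slot index into an array standing in for the stack, with
slot SLOTS sitting on a 16-byte boundary):

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define SLOTS 64

static uint64_t stack[SLOTS];

/* Run the macro's four instructions on the modelled stack. */
static int run_macro(int sp)
{
    stack[--sp] = (uint64_t)-1; /* pushq $-1 */
    stack[--sp] = 0;            /* pushq $0 */
    sp |= 1;                    /* orq $8, %rsp: addr bit 3 == index bit 0 */
    sp += 1;                    /* addq $8, %rsp */
    return sp;
}

int main(void)
{
    int sp;

    /* No error code: hardware pushes SS..RIP, i.e. 5 qwords. */
    sp = run_macro(SLOTS - 5);
    assert(stack[sp] == (uint64_t)-1);  /* -1 landed in orig_ax */

    /* Error code: hardware pushes SS..RIP plus the code, 6 qwords. */
    stack[SLOTS - 6] = 0xdead;          /* the error code */
    sp = run_macro(SLOTS - 6);
    assert(stack[sp] == 0xdead);        /* error code preserved */

    puts("RSP ends at orig_ax in both cases");
    return 0;
}

Both cases converge on the orig_ax slot, which is the property the
macro relies on.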

A silly downside here is that the ORC annotations need to know the
offset to the IRET frame.  Josh, how ugly would it be to teach the
unwinder that UNWIND_HINT_IRET_REGS actually means "hey, maybe I'm 8
bytes off -- please realign RSP when doing your calculation"?

FWIW, the entry code is rather silly here in that it actually only
uses the orig_ax slot as a temporary dumping ground for the error code
and then it replaces it with -1 later on.  I don't remember whether
anything still cares about the -1.  Once upon a time, there was some
code that assumed that -1 meant "not in a syscall" and anything else
meant "in a syscall", but, if so, I suspect we should just kill that
code regardless.


>
> Another interesting problem is an IPI whose vector is 0x80.  A cunning
> attacker can use this to simulate system calls from unsuspecting
> positions in userspace, or for interrupting kernel context.  At the very
> least the int0x80 path does an unconditional swapgs, so will try to run
> with the user gs, and I expect things will explode quickly from there.

At least SMAP helps here on non-FSGSBASE systems.  With FSGSBASE, I
suppose we could harden this by adding a special check to int $0x80 to
validate GSBASE.

>
> One option here is to look at ISR and complain if it is found to be set.

Barring some real hackery, we're toast long before we get far enough to do that.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust
  2019-07-05 15:47   ` Andrew Cooper
  2019-07-05 19:06     ` Andy Lutomirski
@ 2019-07-05 19:19     ` Nadav Amit
  2019-07-05 20:47       ` Andrew Cooper
  2019-07-05 20:25     ` Thomas Gleixner
  2 siblings, 1 reply; 42+ messages in thread
From: Nadav Amit @ 2019-07-05 19:19 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Thomas Gleixner, LKML, x86, Ricardo Neri, Stephane Eranian,
	Feng Tang, Andy Lutomirski

> On Jul 5, 2019, at 8:47 AM, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> 
> On 04/07/2019 16:51, Thomas Gleixner wrote:
>>  2) The loop termination logic is interesting at best.
>> 
>>     If the machine has no TSC or cpu_khz is not known yet it tries 1
>>     million times to ack stale IRR/ISR bits. What?
>> 
>>     With TSC it uses the TSC to calculate the loop termination. It takes a
>>     timestamp at entry and terminates the loop when:
>> 
>>     	  (rdtsc() - start_timestamp) >= (cpu_khz << 10)
>> 
>>     That's roughly one second.
>> 
>>     Both methods are problematic. The APIC has 256 vectors, which means
>>     that in theory max. 256 IRR/ISR bits can be set. In practice this is
>>     impossible as the first 32 vectors are reserved and not affected and
>>     the chance that more than a few bits are set is close to zero.
> 
> [Disclaimer.  I talked to Thomas in private first, and he asked me to
> post this publicly as the CVE is almost a decade old already.]
> 
> I'm afraid that this isn't quite true.
> 
> In terms of IDT vectors, the first 32 are reserved for exceptions, but
> only the first 16 are reserved in the LAPIC.  Vectors 16-31 are fair
> game for incoming IPIs (SDM Vol3, 10.5.2 Valid Interrupt Vectors).
> 
> In practice, this makes Linux vulnerable to CVE-2011-1898 / XSA-3, which
> I'm disappointed to see wasn't shared with other software vendors at the
> time.

IIRC (and from skimming the CVE again) the basic problem in Xen was that
when devices are assigned, MSIs can be used to generate IRQs with arbitrary
vectors. The mitigation was to require interrupt remapping to be enabled in
the IOMMU when the IOMMU is used for DMA remapping (i.e., device assignment).

Are you concerned about this case, additional concrete ones, or is it about
security hardening? (or am I missing something?)
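
For anyone following along, the reason assigned-device MSIs are so
dangerous without remapping: in the compatibility (non-remapped) format,
the MSI address/data registers programmed into the device encode the
destination APIC and the vector directly, so whoever controls the device
picks the vector. A minimal sketch of the layout (per the SDM; the
concrete values below are invented):

#include <stdint.h>
#include <stdio.h>

/* Compatibility-format MSI address: 0xFEExxxxx, dest APIC ID in bits 19:12. */
static uint32_t msi_addr(uint8_t dest_apic_id)
{
    return 0xfee00000u | ((uint32_t)dest_apic_id << 12);
}

/* MSI data: vector in bits 7:0, delivery mode in bits 10:8 (0 = fixed). */
static uint32_t msi_data(uint8_t vector, uint8_t delivery_mode)
{
    return vector | ((uint32_t)(delivery_mode & 0x7) << 8);
}

int main(void)
{
    /* Nothing stops the vector being 0x80 or one in the 16-31 range
     * unless the IOMMU remaps the message. */
    printf("addr=%#x data=%#x\n", msi_addr(0), msi_data(0x80, 0));
    return 0;
}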

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust
  2019-07-05 19:06     ` Andy Lutomirski
@ 2019-07-05 20:17       ` Andrew Cooper
  2019-07-05 20:36       ` Thomas Gleixner
  1 sibling, 0 replies; 42+ messages in thread
From: Andrew Cooper @ 2019-07-05 20:17 UTC (permalink / raw)
  To: Andy Lutomirski, Josh Poimboeuf, Peter Zijlstra
  Cc: Thomas Gleixner, LKML, X86 ML, Nadav Amit, Ricardo Neri,
	Stephane Eranian, Feng Tang, Andrew Cooper

On 05/07/2019 20:06, Andy Lutomirski wrote:
> On Fri, Jul 5, 2019 at 8:47 AM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 04/07/2019 16:51, Thomas Gleixner wrote:
>>>   2) The loop termination logic is interesting at best.
>>>
>>>      If the machine has no TSC or cpu_khz is not known yet it tries 1
>>>      million times to ack stale IRR/ISR bits. What?
>>>
>>>      With TSC it uses the TSC to calculate the loop termination. It takes a
>>>      timestamp at entry and terminates the loop when:
>>>
>>>         (rdtsc() - start_timestamp) >= (cpu_khz << 10)
>>>
>>>      That's roughly one second.
>>>
>>>      Both methods are problematic. The APIC has 256 vectors, which means
>>>      that in theory max. 256 IRR/ISR bits can be set. In practice this is
>>>      impossible as the first 32 vectors are reserved and not affected and
>>>      the chance that more than a few bits are set is close to zero.
>> [Disclaimer.  I talked to Thomas in private first, and he asked me to
>> post this publicly as the CVE is almost a decade old already.]
>>
>> I'm afraid that this isn't quite true.
>>
>> In terms of IDT vectors, the first 32 are reserved for exceptions, but
>> only the first 16 are reserved in the LAPIC.  Vectors 16-31 are fair
>> game for incoming IPIs (SDM Vol3, 10.5.2 Valid Interrupt Vectors).
>>
>> In practice, this makes Linux vulnerable to CVE-2011-1898 / XSA-3, which
>> I'm disappointed to see wasn't shared with other software vendors at the
>> time.
>>
>> Because TPR is 0, an incoming IPI can trigger #AC, #CP, #VC or #SX
>> without an error code on the stack, which results in a corrupt pt_regs
>> in the exception handler, and a stack underflow on the way back out,
>> most likely with a fault on IRET.
>>
>> These can be addressed by setting TPR to 0x10, which will inhibit
>> delivery of any errant IPIs in this range, but some extra sanity logic
>> may not go amiss.  An error code on a 64bit stack can be spotted with
>> `testb $8, %spl` due to %rsp being aligned before pushing the exception
>> frame.
> Several years ago, I remember having a discussion with someone (Jan
> Beulich, maybe?) about how to efficiently make the entry code figure
> out the error code situation automatically.  I suspect it was on IRC
> and I can't find the logs.

It was on IRC, but I don't remember exactly when, either.

> I'm thinking that maybe we should just
> make Linux's idtentry code do something like this.
>
> If nothing else, we could make idtentry do:
>
> testl $8, %esp   /* shorter than testb IIRC */

Sadly not.  test (unlike cmp and the basic mutative opcodes) doesn't
have a sign-extendable imm8 encoding.  The two options are:

f7 c4 08 00 00 00        test   $0x8,%esp
40 f6 c4 08              test   $0x8,%spl

> jz 1f  /* or jnz -- too lazy to figure it out */
> pushq $-1
> 1:

It is jz, and Xen does use this sequence for reserved/unimplemented
vectors, but we expect those codepaths never to be executed.

>
> instead of the current hardcoded push.  The cost of a mispredicted
> branch here will be smallish compared to the absurdly large cost of
> the entry itself.  But I thought I had something more clever than
> this.  This sequence works, but it still feels like it should be
> possible to do better:
>
> .macro PUSH_ERROR_IF_NEEDED
>     /*
>      * Before the IRET frame is pushed, RSP is aligned to a 16-byte
>      * boundary.  After SS .. RIP and the error code are pushed, RSP is
>      * once again aligned.  Pushing -1 will put -1 in the error code slot
>      * (regs->orig_ax) if there was no error code.
>     */
>
>     pushq    $-1                /* orig_ax = -1, maybe */
>     /* now RSP points to orig_ax (aligned) or di (misaligned) */
>     pushq    $0
>     /* now RSP points to di (misaligned) or si (aligned) */
>     orq    $8, %rsp
>     /* now RSP points to di */
>     addq    $8, %rsp
>     /* now RSP points to orig_ax, and we're in good shape */
> .endm
>
> Is there a better sequence for this?

The only aspect I can think of is whether mixing the push/pops with
explicit updates to %rsp is better or worse than a very well
predicted branch, given that various frontends have special tracking to
reduce instruction dependencies on %rsp.  I'll have to defer to the CPU
microarchitects as to which of the two options is the lesser evil.

That said, both Intel and AMD's Optimisation guides have stack alignment
suggestions which mix push/sub/and on function prolog, so I expect this
is as optimised as it can reasonably be in the pipelines.

>> Another interesting problem is an IPI whose vector is 0x80.  A cunning
>> attacker can use this to simulate system calls from unsuspecting
>> positions in userspace, or for interrupting kernel context.  At the very
>> least the int0x80 path does an unconditional swapgs, so will try to run
>> with the user gs, and I expect things will explode quickly from there.
> At least SMAP helps here on non-FSGSBASE systems.  With FSGSBASE, I
> suppose we could harden this by adding a special check to int $0x80 to
> validate GSBASE.
>
>> One option here is to look at ISR and complain if it is found to be set.
> Barring some real hackery, we're toast long before we get far enough to do that.

Even if the path moves to be like a regular idtentry?  How much more
expensive is that in reality?  Ultimately, it is that which needs to be
weighed against any extra wanted robustness.

~Andrew

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust
  2019-07-05 15:47   ` Andrew Cooper
  2019-07-05 19:06     ` Andy Lutomirski
  2019-07-05 19:19     ` Nadav Amit
@ 2019-07-05 20:25     ` Thomas Gleixner
  2019-07-05 20:37       ` Andy Lutomirski
  2019-07-05 20:49       ` Paolo Bonzini
  2 siblings, 2 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-07-05 20:25 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: LKML, x86, Nadav Amit, Ricardo Neri, Stephane Eranian, Feng Tang,
	Andy Lutomirski, Paolo Bonzini

Andrew,

On Fri, 5 Jul 2019, Andrew Cooper wrote:

> On 04/07/2019 16:51, Thomas Gleixner wrote:
> >   2) The loop termination logic is interesting at best.
> >
> >      If the machine has no TSC or cpu_khz is not known yet it tries 1
> >      million times to ack stale IRR/ISR bits. What?
> >
> >      With TSC it uses the TSC to calculate the loop termination. It takes a
> >      timestamp at entry and terminates the loop when:
> >
> >      	  (rdtsc() - start_timestamp) >= (cpu_khz << 10)
> >
> >      That's roughly one second.
> >
> >      Both methods are problematic. The APIC has 256 vectors, which means
> >      that in theory max. 256 IRR/ISR bits can be set. In practice this is
> >      impossible as the first 32 vectors are reserved and not affected and
> >      the chance that more than a few bits are set is close to zero.
> 
> [Disclaimer.  I talked to Thomas in private first, and he asked me to
> post this publicly as the CVE is almost a decade old already.]

thanks for bringing this up!

> I'm afraid that this isn't quite true.
> 
> In terms of IDT vectors, the first 32 are reserved for exceptions, but
> only the first 16 are reserved in the LAPIC.  Vectors 16-31 are fair
> game for incoming IPIs (SDM Vol3, 10.5.2 Valid Interrupt Vectors).

Indeed.

> In practice, this makes Linux vulnerable to CVE-2011-1898 / XSA-3, which
> I'm disappointed to see wasn't shared with other software vendors at the
> time.

No comment.

> Because TPR is 0, an incoming IPI can trigger #AC, #CP, #VC or #SX
> without an error code on the stack, which results in a corrupt pt_regs
> in the exception handler, and a stack underflow on the way back out,
> most likely with a fault on IRET.
> 
> These can be addressed by setting TPR to 0x10, which will inhibit

Right, that's easy and obvious.

> delivery of any errant IPIs in this range, but some extra sanity logic
> may not go amiss.  An error code on a 64bit stack can be spotted with
> `testb $8, %spl` due to %rsp being aligned before pushing the exception
> frame.

The question is what we do with that information :)

> Another interesting problem is an IPI whose vector is 0x80.  A cunning
> attacker can use this to simulate system calls from unsuspecting
> positions in userspace, or for interrupting kernel context.  At the very
> least the int0x80 path does an unconditional swapgs, so will try to run
> with the user gs, and I expect things will explode quickly from there.

Cute.

> One option here is to look at ISR and complain if it is found to be set.

That's sloooow, but could at least provide an option to do so.

> Another option, which I've only just remembered, is that AMD hardware
> has the Interrupt Enable Register in its extended APIC space, which may
> or may not be good enough to prohibit delivery of 0x80.  There isn't
> enough information in the APM to be clear, but the name suggests it is
> worth experimenting with.

I doubt it. Clearing a bit in the IER takes the interrupt out of the
priority decision logic. That's an SVM feature so that interrupts delivered
directly to guests cannot block other interrupts if they are not
serviced. It's grossly misnamed and won't help with the int 0x80 issue.

The more interesting question is whether this is all relevant. If I
understood the issue correctly then this is mitigated by proper interrupt
remapping.

Is there any serious usage of virtualization w/o interrupt remapping left
or have the machines which are not capable been retired already?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust
  2019-07-05 19:06     ` Andy Lutomirski
  2019-07-05 20:17       ` Andrew Cooper
@ 2019-07-05 20:36       ` Thomas Gleixner
  2019-07-05 20:39         ` Andy Lutomirski
  1 sibling, 1 reply; 42+ messages in thread
From: Thomas Gleixner @ 2019-07-05 20:36 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andrew Cooper, Josh Poimboeuf, Peter Zijlstra, LKML, X86 ML,
	Nadav Amit, Ricardo Neri, Stephane Eranian, Feng Tang

On Fri, 5 Jul 2019, Andy Lutomirski wrote:
> On Fri, Jul 5, 2019 at 8:47 AM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> > Because TPR is 0, an incoming IPI can trigger #AC, #CP, #VC or #SX
> > without an error code on the stack, which results in a corrupt pt_regs
> > in the exception handler, and a stack underflow on the way back out,
> > most likely with a fault on IRET.
> >
> > These can be addressed by setting TPR to 0x10, which will inhibit
> > delivery of any errant IPIs in this range, but some extra sanity logic
> > may not go amiss.  An error code on a 64bit stack can be spotted with
> > `testb $8, %spl` due to %rsp being aligned before pushing the exception
> > frame.
> 
> Several years ago, I remember having a discussion with someone (Jan
> Beulich, maybe?) about how to efficiently make the entry code figure
> out the error code situation automatically.  I suspect it was on IRC
> and I can't find the logs.  I'm thinking that maybe we should just
> make Linux's idtentry code do something like this.
> 
> If nothing else, we could make idtentry do:
> 
> testl $8, %esp   /* shorter than testb IIRC */
> jz 1f  /* or jnz -- too lazy to figure it out */
> pushq $-1
> 1:

Errm, no. We should not silently paper over it. If we detect that this came
in with a wrong stack frame, i.e. not from a CPU originated exception, then
we truly should yell loudly. Also in that case you want to check the APIC:ISR
and issue an EOI to clear it.

> > Another interesting problem is an IPI whose vector is 0x80.  A cunning
> > attacker can use this to simulate system calls from unsuspecting
> > positions in userspace, or for interrupting kernel context.  At the very
> > least the int0x80 path does an unconditional swapgs, so will try to run
> > with the user gs, and I expect things will explode quickly from there.
> 
> At least SMAP helps here on non-FSGSBASE systems.  With FSGSBASE, I

How does it help? It still crashes the kernel.

> suppose we could harden this by adding a special check to int $0x80 to
> validate GSBASE.

> > One option here is to look at ISR and complain if it is found to be set.
> 
> Barring some real hackery, we're toast long before we get far enough to
> do that.

No. We can map the APIC into the user space visible page tables for PTI
without compromising the PTI isolation and it can be read very early on
before SWAPGS. All you need is a register to clobber, nothing more. If the ISR
is set, then go into an error path, yell loudly, issue an EOI and return.
The only issue I can see is: It's slow :)
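
A rough sketch of the lookup such a check needs (xAPIC register layout
per the SDM; 'apic_base' is a placeholder for whichever mapping the
entry code would use, i.e. an assumption, not the actual implementation):

#include <stdbool.h>
#include <stdint.h>

#define APIC_ISR_BASE 0x100 /* 8 x 32-bit ISR registers, 16 bytes apart */

static inline bool apic_isr_test(volatile uint8_t *apic_base, uint8_t vector)
{
    volatile uint32_t *reg = (volatile uint32_t *)
        (apic_base + APIC_ISR_BASE + (vector / 32) * 0x10);

    return *reg & (1u << (vector % 32));
}

A software-originated "int $0x80" leaves the bit clear, while an
APIC-delivered vector 0x80 sets it until EOI, which is what lets the
entry code tell where the thing came from.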

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust
  2019-07-05 20:25     ` Thomas Gleixner
@ 2019-07-05 20:37       ` Andy Lutomirski
  2019-07-05 20:49       ` Paolo Bonzini
  1 sibling, 0 replies; 42+ messages in thread
From: Andy Lutomirski @ 2019-07-05 20:37 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andrew Cooper, LKML, X86 ML, Nadav Amit, Ricardo Neri,
	Stephane Eranian, Feng Tang, Andy Lutomirski, Paolo Bonzini

On Fri, Jul 5, 2019 at 1:25 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Andrew,
>
> >
> > These can be addressed by setting TPR to 0x10, which will inhibit
>
> Right, that's easy and obvious.
>

This boots:

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 177aa8ef2afa..5257c40bde6c 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1531,11 +1531,14 @@ static void setup_local_APIC(void)
 #endif

        /*
-        * Set Task Priority to 'accept all'. We never change this
-        * later on.
+        * Set Task Priority to 'accept all except vectors 0-31'.  An APIC
+        * vector in the 16-31 range can be delivered otherwise, but we'll
+        * think it's an exception and terrible things will happen.
+        * We never change this later on.
         */
        value = apic_read(APIC_TASKPRI);
        value &= ~APIC_TPRI_MASK;
+       value |= 0x10;
        apic_write(APIC_TASKPRI, value);

        apic_pending_intr_clear();
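
To illustrate why 0x10 is sufficient, in a simplified model (TPR bits
7:4 form the task-priority class, and a fixed interrupt is only
delivered when its own class, vector >> 4, is strictly greater; the
in-service vectors also feed the real processor priority, which is
ignored here):

#include <stdbool.h>
#include <stdio.h>

static bool tpr_allows(unsigned int tpr, unsigned int vector)
{
    return (vector >> 4) > (tpr >> 4);
}

int main(void)
{
    printf("vector 18, TPR 0x10: %d\n", tpr_allows(0x10, 18)); /* 0: inhibited */
    printf("vector 32, TPR 0x10: %d\n", tpr_allows(0x10, 32)); /* 1: delivered */
    return 0;
}

Vectors 0-15 are illegal at the APIC anyway, so blocking class 1 covers
exactly the problematic 16-31 range while leaving all device vectors
(32 and up) unaffected.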

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust
  2019-07-05 20:36       ` Thomas Gleixner
@ 2019-07-05 20:39         ` Andy Lutomirski
  2019-07-07  8:27           ` Thomas Gleixner
  0 siblings, 1 reply; 42+ messages in thread
From: Andy Lutomirski @ 2019-07-05 20:39 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, Andrew Cooper, Josh Poimboeuf, Peter Zijlstra,
	LKML, X86 ML, Nadav Amit, Ricardo Neri, Stephane Eranian,
	Feng Tang

On Fri, Jul 5, 2019 at 1:36 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Fri, 5 Jul 2019, Andy Lutomirski wrote:
> > On Fri, Jul 5, 2019 at 8:47 AM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> > > Because TPR is 0, an incoming IPI can trigger #AC, #CP, #VC or #SX
> > > without an error code on the stack, which results in a corrupt pt_regs
> > > in the exception handler, and a stack underflow on the way back out,
> > > most likely with a fault on IRET.
> > >
> > > These can be addressed by setting TPR to 0x10, which will inhibit
> > > delivery of any errant IPIs in this range, but some extra sanity logic
> > > may not go amiss.  An error code on a 64bit stack can be spotted with
> > > `testb $8, %spl` due to %rsp being aligned before pushing the exception
> > > frame.
> >
> > Several years ago, I remember having a discussion with someone (Jan
> > Beulich, maybe?) about how to efficiently make the entry code figure
> > out the error code situation automatically.  I suspect it was on IRC
> > and I can't find the logs.  I'm thinking that maybe we should just
> > make Linux's idtentry code do something like this.
> >
> > If nothing else, we could make idtentry do:
> >
> > testl $8, %esp   /* shorter than testb IIRC */
> > jz 1f  /* or jnz -- too lazy to figure it out */
> > pushq $-1
> > 1:
>
> Errm, no. We should not silently paper over it. If we detect that this came
> in with a wrong stack frame, i.e. not from a CPU originated exception, then
> we truly should yell loudly. Also in that case you want to check the APIC:ISR
> and issue an EOI to clear it.

It gives us the option to replace idtentry with something
table-driven.  I don't think I love it, but it's not an awful idea.



>
> > > Another interesting problem is an IPI whose vector is 0x80.  A cunning
> > > attacker can use this to simulate system calls from unsuspecting
> > > positions in userspace, or for interrupting kernel context.  At the very
> > > least the int0x80 path does an unconditional swapgs, so will try to run
> > > with the user gs, and I expect things will explode quickly from there.
> >
> > At least SMAP helps here on non-FSGSBASE systems.  With FSGSBASE, I
>
> How does it help? It still crashes the kernel.
>
> > suppose we could harden this by adding a special check to int $0x80 to
> > validate GSBASE.
>
> > > One option here is to look at ISR and complain if it is found to be set.
> >
> > Barring some real hackery, we're toast long before we get far enough to
> > do that.
>
> No. We can map the APIC into the user space visible page tables for PTI
> without compromising the PTI isolation and it can be read very early on
> before SWAPGS. All you need is a register to clobber, nothing more. If the ISR
> is set, then go into an error path, yell loudly, issue an EOI and return.
> The only issue I can see is: It's slow :)
>
>

I think this will be really extremely slow.  If we can restrict this
to x2apic machines, then maybe it's not so awful.

FWIW, if we just patch up the GS thing, then we are still vulnerable:
the bad guy can arrange for a privileged process to have register
state corresponding to a dangerous syscall and then send an int $0x80
via the APIC.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust
  2019-07-05 19:19     ` Nadav Amit
@ 2019-07-05 20:47       ` Andrew Cooper
  0 siblings, 0 replies; 42+ messages in thread
From: Andrew Cooper @ 2019-07-05 20:47 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Thomas Gleixner, LKML, x86, Ricardo Neri, Stephane Eranian,
	Feng Tang, Andy Lutomirski, Andrew Cooper

On 05/07/2019 20:19, Nadav Amit wrote:
>> On Jul 5, 2019, at 8:47 AM, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>
>> On 04/07/2019 16:51, Thomas Gleixner wrote:
>>>  2) The loop termination logic is interesting at best.
>>>
>>>     If the machine has no TSC or cpu_khz is not known yet it tries 1
>>>     million times to ack stale IRR/ISR bits. What?
>>>
>>>     With TSC it uses the TSC to calculate the loop termination. It takes a
>>>     timestamp at entry and terminates the loop when:
>>>
>>>     	  (rdtsc() - start_timestamp) >= (cpu_khz << 10)
>>>
>>>     That's roughly one second.
>>>
>>>     Both methods are problematic. The APIC has 256 vectors, which means
>>>     that in theory max. 256 IRR/ISR bits can be set. In practice this is
>>>     impossible as the first 32 vectors are reserved and not affected and
>>>     the chance that more than a few bits are set is close to zero.
>> [Disclaimer.  I talked to Thomas in private first, and he asked me to
>> post this publicly as the CVE is almost a decade old already.]
>>
>> I'm afraid that this isn't quite true.
>>
>> In terms of IDT vectors, the first 32 are reserved for exceptions, but
>> only the first 16 are reserved in the LAPIC.  Vectors 16-31 are fair
>> game for incoming IPIs (SDM Vol3, 10.5.2 Valid Interrupt Vectors).
>>
>> In practice, this makes Linux vulnerable to CVE-2011-1898 / XSA-3, which
>> I'm disappointed to see wasn't shared with other software vendors at the
>> time.
> IIRC (and from skimming the CVE again) the basic problem in Xen was that
> when devices are assigned, MSIs can be used to generate IRQs with arbitrary
> vectors. The mitigation was to require interrupt remapping to be enabled in
> the IOMMU when the IOMMU is used for DMA remapping (i.e., device assignment).
>
> Are you concerned about this case, additional concrete ones, or is it about
> security hardening? (or am I missing something?)

The phrase "impossible as the first 32 vectors are reserved" stuck out,
because it's not true.  That generally means that any logic derived from
it is also false. :)

In practice, I was thinking more about robustness against buggy
conditions.  Setting TPR to 1 at start of day is very easy.  Some of the
other protections, less so.

When it comes to virtualisation, security is an illusion when a guest
kernel has a real piece of hardware in its hands.  Anyone who is under
the misapprehension otherwise should try talking to an IOMMU hardware
engineer and see the reaction on their face.  IOMMUs were designed to
isolate devices when all controlling software was of the same privilege
level.  They don't magically make the system safe against a hostile
guest device driver, which in the most basic case, can still mount a DoS
attempt with deliberately bad DMA.

~Andrew

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust
  2019-07-05 20:25     ` Thomas Gleixner
  2019-07-05 20:37       ` Andy Lutomirski
@ 2019-07-05 20:49       ` Paolo Bonzini
  2019-07-05 21:16         ` Andrew Cooper
  2019-07-07  8:37         ` Thomas Gleixner
  1 sibling, 2 replies; 42+ messages in thread
From: Paolo Bonzini @ 2019-07-05 20:49 UTC (permalink / raw)
  To: Thomas Gleixner, Andrew Cooper
  Cc: LKML, x86, Nadav Amit, Ricardo Neri, Stephane Eranian, Feng Tang,
	Andy Lutomirski, Alex Williamson

On 05/07/19 22:25, Thomas Gleixner wrote:
> In practice, this makes Linux vulnerable to CVE-2011-1898 / XSA-3, which
> I'm disappointed to see wasn't shared with other software vendors at the
> time.

Oh, that brings back memories.  At the time I was working on Xen, so I
remember that CVE.  IIRC there was some mitigation but the fix was
basically to print a very scary error message if you used VT-d without
interrupt remapping.  Maybe force the user to add something on the Xen
command line too?

> The more interesting question is whether this is all relevant. If I
> understood the issue correctly then this is mitigated by proper interrupt
> remapping.

Yes, and for Linux we're good I think.  VFIO by default refuses to use
the IOMMU if interrupt remapping is absent or disabled, and KVM's own
(pre-VFIO) IOMMU support was removed a couple years ago.  I guess the
secure boot lockdown patches should outlaw VFIO's
allow_unsafe_interrupts option, but that's it.

> Is there any serious usage of virtualization w/o interrupt remapping left
> or have the machines which are not capable been retired already?

I think they were already starting to disappear in 2011, as I don't
remember much worry about customers that were using systems without it.

Paolo

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust
  2019-07-05 20:49       ` Paolo Bonzini
@ 2019-07-05 21:16         ` Andrew Cooper
  2019-07-07  8:37         ` Thomas Gleixner
  1 sibling, 0 replies; 42+ messages in thread
From: Andrew Cooper @ 2019-07-05 21:16 UTC (permalink / raw)
  To: Paolo Bonzini, Thomas Gleixner
  Cc: LKML, x86, Nadav Amit, Ricardo Neri, Stephane Eranian, Feng Tang,
	Andy Lutomirski, Alex Williamson, Andrew Cooper

On 05/07/2019 21:49, Paolo Bonzini wrote:
> On 05/07/19 22:25, Thomas Gleixner wrote:
>> In practice, this makes Linux vulnerable to CVE-2011-1898 / XSA-3, which
>> I'm disappointed to see wasn't shared with other software vendors at the
>> time.
> Oh, that brings back memories.  At the time I was working on Xen, so I
> remember that CVE.  IIRC there was some mitigation but the fix was
> basically to print a very scary error message if you used VT-d without
> interrupt remapping.  Maybe force the user to add something on the Xen
> command line too?

It was before my time.  I have no public comment on how the other
aspects of it were handled.

>> Is there any serious usage of virtualization w/o interrupt remapping left
>> or have the machines which are not capable been retired already?
> I think they were already starting to disappear in 2011, as I don't
> remember much worry about customers that were using systems without it.

ISTR Nehalem/Westmere era systems were the first to support interrupt
remapping, but were totally crippled with errata to the point of needing
to turn a prerequisite feature (Queued Invalidation) off.  I believe
later systems have it working to a first approximation.

As to the original question, whether people should be using such systems
is a different question to whether they actually are.

~Andrew

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch V2 01/25] x86/kgbd: Use NMI_VECTOR not APIC_DM_NMI
  2019-07-04 15:51 ` [patch V2 01/25] x86/kgbd: Use NMI_VECTOR not APIC_DM_NMI Thomas Gleixner
@ 2019-07-05 21:43   ` Thomas Gleixner
  0 siblings, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-07-05 21:43 UTC (permalink / raw)
  To: LKML; +Cc: x86, Nadav Amit, Ricardo Neri, Stephane Eranian, Feng Tang

On Thu, 4 Jul 2019, Thomas Gleixner wrote:

> apic->send_IPI_allbutself() takes a vector number as argument.
> 
> APIC_DM_NMI is clearly not a vector number. It's defined to 0x400 which is
> outside the vector space.
> 
> Use NMI_VECTOR instead as that's what it is intended to be.
> 
> Fixes: 82da3ff89dc2 ("x86: kgdb support")
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> V2: New patch
> ---
>  arch/x86/kernel/kgdb.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- a/arch/x86/kernel/kgdb.c
> +++ b/arch/x86/kernel/kgdb.c
> @@ -424,7 +424,7 @@ static void kgdb_disable_hw_debug(struct
>   */
>  void kgdb_roundup_cpus(void)
>  {
> -	apic->send_IPI_allbutself(APIC_DM_NMI);
> +	apic->send_IPI_allbutself(VECTOR_NMI);

The changelog got it right, but this here needs to be NMI_VECTOR. While I
didn't, 0-day was able to find and turn on the config option ...

/blush


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust
  2019-07-05 20:39         ` Andy Lutomirski
@ 2019-07-07  8:27           ` Thomas Gleixner
  0 siblings, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-07-07  8:27 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andrew Cooper, Josh Poimboeuf, Peter Zijlstra, LKML, X86 ML,
	Nadav Amit, Ricardo Neri, Stephane Eranian, Feng Tang

On Fri, 5 Jul 2019, Andy Lutomirski wrote:
> On Fri, Jul 5, 2019 at 1:36 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> > No. We can map the APIC into the user space visible page tables for PTI
> > without compromising the PTI isolation and it can be read very early on
> > before SWAPGS. All you need is a register to clobber, nothing more. If the ISR
> > is set, then go into an error path, yell loudly, issue an EOI and return.
> > The only issue I can see is: It's slow :)
> >
> I think this will be really extremely slow.  If we can restrict this
> to x2apic machines, then maybe it's not so awful.

x2apic machines have working iommu/interrupt remapping.

> FWIW, if we just patch up the GS thing, then we are still vulnerable:
> the bad guy can arrange for a privileged process to have register
> state corresponding to a dangerous syscall and then send an int $0x80
> via the APIC.

Right, that's why you want to read the APIC:ISR to check where that thing
came from.

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust
  2019-07-05 20:49       ` Paolo Bonzini
  2019-07-05 21:16         ` Andrew Cooper
@ 2019-07-07  8:37         ` Thomas Gleixner
  2019-07-09 14:43           ` Thomas Gleixner
  1 sibling, 1 reply; 42+ messages in thread
From: Thomas Gleixner @ 2019-07-07  8:37 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Andrew Cooper, LKML, x86, Nadav Amit, Ricardo Neri,
	Stephane Eranian, Feng Tang, Andy Lutomirski, Alex Williamson

On Fri, 5 Jul 2019, Paolo Bonzini wrote:
> On 05/07/19 22:25, Thomas Gleixner wrote:
> > The more interesting question is whether this is all relevant. If I
> > understood the issue correctly then this is mitigated by proper interrupt
> > remapping.
> 
> Yes, and for Linux we're good I think.  VFIO by default refuses to use
> the IOMMU if interrupt remapping is absent or disabled, and KVM's own

Confused. If it does not use the IOMMU, what does it do? Hand the device in
as-is and let the guest fiddle with it unconstrained, or does it actually
refuse to pass through?

> (pre-VFIO) IOMMU support was removed a couple years ago.  I guess the
> secure boot lockdown patches should outlaw VFIO's
> allow_unsafe_interrupts option, but that's it.

I'm not worried too much about command line options. The important thing is
the default behaviour. If an admin decides to do something stupid, so be
it.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust
  2019-07-07  8:37         ` Thomas Gleixner
@ 2019-07-09 14:43           ` Thomas Gleixner
  0 siblings, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-07-09 14:43 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Andrew Cooper, LKML, x86, Nadav Amit, Ricardo Neri,
	Stephane Eranian, Feng Tang, Andy Lutomirski, Alex Williamson

On Sun, 7 Jul 2019, Thomas Gleixner wrote:

> On Fri, 5 Jul 2019, Paolo Bonzini wrote:
> > On 05/07/19 22:25, Thomas Gleixner wrote:
> > > The more interesting question is whether this is all relevant. If I
> > > understood the issue correctly then this is mitigated by proper interrupt
> > > remapping.
> > 
> > Yes, and for Linux we're good I think.  VFIO by default refuses to use
> > the IOMMU if interrupt remapping is absent or disabled, and KVM's own
> 
> Confused. If it does not use the IOMMU, what does it do? Hand the device in
> as-is and let the guest fiddle with it unconstrained, or does it actually
> refuse to pass through?

Read through it and it refuses to attach unless the allow_unsafe_interrupts
option is set, but again we can't protect against wilful ignorance.
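
Paraphrased, the policy boils down to this (a sketch, not the verbatim
vfio_iommu_type1 source):

#include <stdbool.h>

/* Attaching a group is refused when interrupts cannot be remapped,
 * unless the admin opted out via the allow_unsafe_interrupts module
 * parameter. */
static inline bool vfio_attach_allowed(bool irq_remapping_capable,
                                       bool allow_unsafe_interrupts)
{
    return irq_remapping_capable || allow_unsafe_interrupts;
}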

So the default prevents abuse on systems without IOMMU and irq remapping;
there is not much to worry about AFAICT.

Setting TPR to 1 and fixing the comments/changelogs still makes sense
independently.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2019-07-09 14:43 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-04 15:51 [patch V2 00/25] x86/apic: Support for IPI shorthands Thomas Gleixner
2019-07-04 15:51 ` [patch V2 01/25] x86/kgbd: Use NMI_VECTOR not APIC_DM_NMI Thomas Gleixner
2019-07-05 21:43   ` Thomas Gleixner
2019-07-04 15:51 ` [patch V2 02/25] x86/apic: Invoke perf_events_lapic_init() after enabling APIC Thomas Gleixner
2019-07-04 15:51 ` [patch V2 03/25] x86/apic: Soft disable APIC before initializing it Thomas Gleixner
2019-07-04 15:51 ` [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust Thomas Gleixner
2019-07-05 15:47   ` Andrew Cooper
2019-07-05 19:06     ` Andy Lutomirski
2019-07-05 20:17       ` Andrew Cooper
2019-07-05 20:36       ` Thomas Gleixner
2019-07-05 20:39         ` Andy Lutomirski
2019-07-07  8:27           ` Thomas Gleixner
2019-07-05 19:19     ` Nadav Amit
2019-07-05 20:47       ` Andrew Cooper
2019-07-05 20:25     ` Thomas Gleixner
2019-07-05 20:37       ` Andy Lutomirski
2019-07-05 20:49       ` Paolo Bonzini
2019-07-05 21:16         ` Andrew Cooper
2019-07-07  8:37         ` Thomas Gleixner
2019-07-09 14:43           ` Thomas Gleixner
2019-07-04 15:51 ` [patch V2 05/25] x86/apic: Move IPI inlines into ipi.c Thomas Gleixner
2019-07-04 15:51 ` [patch V2 06/25] x86/apic: Cleanup the include maze Thomas Gleixner
2019-07-04 15:51 ` [patch V2 07/25] x86/apic: Move ipi header into apic directory Thomas Gleixner
2019-07-04 15:51 ` [patch V2 08/25] x86/apic: Move apic_flat_64 " Thomas Gleixner
2019-07-04 15:51 ` [patch V2 09/25] x86/apic: Consolidate the apic local headers Thomas Gleixner
2019-07-04 15:51 ` [patch V2 10/25] x86/apic/uv: Make x2apic_extra_bits static Thomas Gleixner
2019-07-04 15:51 ` [patch V2 11/25] smp/hotplug: Track booted once CPUs in a cpumask Thomas Gleixner
2019-07-04 15:51 ` [patch V2 12/25] x86/cpu: Move arch_smt_update() to a neutral place Thomas Gleixner
2019-07-04 15:51 ` [patch V2 13/25] x86/hotplug: Silence APIC and NMI when CPU is dead Thomas Gleixner
2019-07-04 15:51 ` [patch V2 14/25] x86/apic: Remove dest argument from __default_send_IPI_shortcut() Thomas Gleixner
2019-07-04 15:52 ` [patch V2 15/25] x86/apic: Add NMI_VECTOR wait to IPI shorthand Thomas Gleixner
2019-07-04 15:52 ` [patch V2 16/25] x86/apic: Move no_ipi_broadcast() out of 32bit Thomas Gleixner
2019-07-04 15:52 ` [patch V2 17/25] x86/apic: Add static key to Control IPI shorthands Thomas Gleixner
2019-07-04 15:52 ` [patch V2 18/25] x86/apic: Provide and use helper for send_IPI_allbutself() Thomas Gleixner
2019-07-04 15:52 ` [patch V2 19/25] cpumask: Implement cpumask_or_equal() Thomas Gleixner
2019-07-04 15:52 ` [patch V2 20/25] x86/smp: Move smp_function_call implementations into IPI code Thomas Gleixner
2019-07-04 15:52 ` [patch V2 21/25] x86/smp: Enhance native_send_call_func_ipi() Thomas Gleixner
2019-07-05  1:26   ` Nadav Amit
2019-07-04 15:52 ` [patch V2 22/25] x86/apic: Remove the shorthand decision logic Thomas Gleixner
2019-07-04 15:52 ` [patch V2 23/25] x86/apic: Share common IPI helpers Thomas Gleixner
2019-07-04 15:52 ` [patch V2 24/25] x86/apic/flat64: Remove the IPI shorthand decision logic Thomas Gleixner
2019-07-04 15:52 ` [patch V2 25/25] x86/apic/x2apic: Implement IPI shorthands support Thomas Gleixner
