From: Thomas Gleixner <tglx@linutronix.de>
To: LKML <linux-kernel@vger.kernel.org>
Cc: x86@kernel.org, Nadav Amit <namit@vmware.com>,
	Ricardo Neri <ricardo.neri-calderon@linux.intel.com>,
	Stephane Eranian <eranian@google.com>,
	Feng Tang <feng.tang@intel.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>
Subject: [patch V3 04/25] x86/apic: Make apic_pending_intr_clear() more robust
Date: Mon, 22 Jul 2019 20:47:09 +0200
Message-ID: <20190722105219.158847694@linutronix.de>
In-Reply-To: 20190722104705.550071814@linutronix.de

While developing shorthand based IPI support, issues with the function
which tries to clear possibly pending ISR bits in the local APIC were
observed.

  1) 0-day testing triggered the WARN_ON() in apic_pending_intr_clear().

     This warning is emitted when the function fails to clear pending ISR
     bits or observes pending IRR bits which are not delivered to the CPU
     after the stale ISR bit(s) are ACK'ed.

     Unfortunately the function only emits a WARN_ON() and fails to dump
     the IRR/ISR content. That's useless for debugging.

     Feng added spot on debug printk's which revealed that the stale IRR
     bit belonged to the APIC timer interrupt vector, but adding ad hoc
     debug code does not help with sporadic failures in the field.

     Rework the loop so the full IRR/ISR contents are saved and on failure
     dumped.

  2) The loop termination logic is interesting at best.

     If the machine has no TSC or cpu_khz is not known yet it tries 1
     million times to ack stale IRR/ISR bits. What?

     With TSC it uses the TSC to calculate the loop termination. It takes a
     timestamp at entry and terminates the loop when:

     	  (rdtsc() - start_timestamp) >= (cpu_khz << 10)

     That's roughly one second.

     Both methods are problematic. The APIC has 256 vectors, which means
     that in theory max. 256 IRR/ISR bits can be set. In practice this is
     impossible and the chance that more than a few bits are set is close
     to zero.

     With the pure loop based approach the 1 million retries are complete
     overkill.

     With TSC this can terminate too early in a guest which is running on a
     heavily loaded host even with only a couple of IRR/ISR bits set. The
     reason is that after acknowledging the highest priority ISR bit,
     pending IRRs must get serviced first before the next round of
     acknowledge can take place as the APIC (real and virtualized) does not
     honour EOI without a preceding interrupt on the CPU. And every APIC
     read/write takes a VMEXIT if the APIC is virtualized. While trying to
     reproduce the issue 0-day reported it was observed that the guest was
     scheduled out long enough under heavy load that it terminated after 8
     iterations.

     Make the loop terminate after 512 iterations. That's plenty enough
     in any case and does not take endless time to complete.
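
The two termination strategies discussed in 2) can be compared with a small
user-space sketch. This is not kernel code: old_budget_cycles(),
old_budget_ms(), old_loop_expired() and new_loop_expired() are hypothetical
helper names, and the TSC values are passed in as plain integers instead of
being read with rdtsc(). It assumes that the TSC ticks at cpu_khz * 1000
cycles per second.

```c
#include <stdbool.h>
#include <stdint.h>

/* Iteration bound used by the new loop (value taken from the patch). */
#define MAX_ACK_LOOPS 512u

/* Cycle budget of the old loop: cpu_khz << 10, i.e. cpu_khz * 1024. */
static uint64_t old_budget_cycles(uint64_t cpu_khz)
{
	return cpu_khz << 10;
}

/*
 * The same budget in milliseconds. cpu_khz is the number of TSC cycles
 * per millisecond, so the result is 1024 ms for any non-zero cpu_khz.
 */
static uint64_t old_budget_ms(uint64_t cpu_khz)
{
	return old_budget_cycles(cpu_khz) / cpu_khz;
}

/* Old scheme: expires on elapsed TSC time, however few iterations ran. */
static bool old_loop_expired(uint64_t start_tsc, uint64_t now_tsc,
			     uint64_t cpu_khz)
{
	return (now_tsc - start_tsc) >= old_budget_cycles(cpu_khz);
}

/* New scheme: expires on the iteration count only. */
static bool new_loop_expired(unsigned int iterations)
{
	return iterations >= MAX_ACK_LOOPS;
}
```

With cpu_khz = 2000000 (a 2 GHz TSC) the old budget is 2048000000 cycles,
i.e. about 1.024 seconds. A guest which is scheduled out for longer than
that between two iterations satisfies old_loop_expired() after a handful
of iterations, while new_loop_expired() depends only on the number of
iterations performed.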

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V3: Removed the misleading vector 0-31 info from the changelog (Andrew)
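
For readers not familiar with the IRR/ISR register layout which union
apic_ir in the patch below relies on, here is a user-space sketch of the
vector to register mapping. The helper names (ir_reg_offset(), ir_reg_bit(),
ir_test_vector(), ir_count_set()) and the IR_* / *_BASE macros are made up
for illustration. The offsets are the xAPIC MMIO offsets which the kernel
names APIC_ISR and APIC_IRR. The union itself additionally depends on x86
being little endian, so that bit N of the unsigned long map corresponds to
vector N.

```c
#include <stdbool.h>
#include <stdint.h>

#define IR_REGS		8	/* eight 32-bit registers cover 256 vectors */
#define IR_STRIDE	0x10	/* consecutive registers are 0x10 apart */
#define ISR_BASE	0x100	/* xAPIC ISR base offset (APIC_ISR) */
#define IRR_BASE	0x200	/* xAPIC IRR base offset (APIC_IRR) */

/* Offset of the register which holds the bit for @vector. */
static unsigned int ir_reg_offset(unsigned int base, unsigned int vector)
{
	return base + (vector / 32) * IR_STRIDE;
}

/* Bit position of @vector inside its 32-bit register. */
static unsigned int ir_reg_bit(unsigned int vector)
{
	return vector % 32;
}

/* Test @vector in a snapshot of the eight registers. */
static bool ir_test_vector(const uint32_t regs[IR_REGS], unsigned int vector)
{
	return (regs[vector / 32] >> ir_reg_bit(vector)) & 1u;
}

/* Number of set bits in the snapshot, i.e. the number of EOIs to issue. */
static unsigned int ir_count_set(const uint32_t regs[IR_REGS])
{
	unsigned int i, count = 0;

	for (i = 0; i < IR_REGS; i++) {
		uint32_t v = regs[i];

		for (; v; v &= v - 1)	/* clear the lowest set bit */
			count++;
	}
	return count;
}
```

Example: the timer vector 0x31 mentioned in the code comment lives in the
second register (ISR offset 0x110, IRR offset 0x210) at bit 17.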
---
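
A note on reading the new failure dump. Assuming the %*pb bitmap format
prints the map as comma separated 32-bit hex chunks with the highest chunk
first (Documentation/core-api/printk-formats.rst has the authoritative
description), the output can be approximated in user space as sketched
below. ir_format() and ir_highest_vector() are hypothetical helpers, not
kernel functions, and the exact kernel output may differ in detail.

```c
#include <stdint.h>
#include <stdio.h>

#define IR_REGS 8

/*
 * Format a 256-bit snapshot as comma separated 32-bit hex words, most
 * significant word first. Returns the number of characters written
 * (excluding the terminating NUL), or -1 if @buf is too small.
 */
static int ir_format(char *buf, size_t len, const uint32_t regs[IR_REGS])
{
	size_t pos = 0;
	int i;

	for (i = IR_REGS - 1; i >= 0; i--) {
		int n = snprintf(buf + pos, len - pos, "%08x%s",
				 (unsigned int)regs[i], i ? "," : "");

		if (n < 0 || (size_t)n >= len - pos)
			return -1;
		pos += (size_t)n;
	}
	return (int)pos;
}

/* Return the highest set vector in the snapshot, or -1 if none is set. */
static int ir_highest_vector(const uint32_t regs[IR_REGS])
{
	int vec;

	for (vec = IR_REGS * 32 - 1; vec >= 0; vec--) {
		if ((regs[vec / 32] >> (vec % 32)) & 1u)
			return vec;
	}
	return -1;
}
```

Under these assumptions a stale timer vector 0x31 shows up as 00020000 in
the second chunk from the right, with all other chunks zero.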
 arch/x86/kernel/apic/apic.c |  111 +++++++++++++++++++++++++-------------------
 1 file changed, 65 insertions(+), 46 deletions(-)

--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1453,54 +1453,72 @@ static void lapic_setup_esr(void)
 			oldvalue, value);
 }
 
+#define APIC_IR_REGS		APIC_ISR_NR
+#define APIC_IR_BITS		(APIC_IR_REGS * 32)
+#define APIC_IR_MAPSIZE		(APIC_IR_BITS / BITS_PER_LONG)
+
+union apic_ir {
+	unsigned long	map[APIC_IR_MAPSIZE];
+	u32		regs[APIC_IR_REGS];
+};
+
+static bool apic_check_and_ack(union apic_ir *irr, union apic_ir *isr)
+{
+	int i, bit;
+
+	/* Read the IRRs */
+	for (i = 0; i < APIC_IR_REGS; i++)
+		irr->regs[i] = apic_read(APIC_IRR + i * 0x10);
+
+	/* Read the ISRs */
+	for (i = 0; i < APIC_IR_REGS; i++)
+		isr->regs[i] = apic_read(APIC_ISR + i * 0x10);
+
+	/*
+	 * If the ISR map is not empty, ACK the APIC and run another round
+	 * to verify whether a pending IRR has been unblocked and turned
+	 * into an ISR.
+	 */
+	if (!bitmap_empty(isr->map, APIC_IR_BITS)) {
+		/*
+		 * There can be multiple ISR bits set when a high priority
+		 * interrupt preempted a lower priority one. Issue an ACK
+		 * per set bit.
+		 */
+		for_each_set_bit(bit, isr->map, APIC_IR_BITS)
+			ack_APIC_irq();
+		return true;
+	}
+
+	return !bitmap_empty(irr->map, APIC_IR_BITS);
+}
+
+/*
+ * After a crash, we no longer service the interrupts and a pending
+ * interrupt from previous kernel might still have ISR bit set.
+ *
+ * Most probably by now the CPU has serviced that pending interrupt and it
+ * might not have done the ack_APIC_irq() because it thought, interrupt
+ * came from i8259 as ExtInt. LAPIC did not get EOI so it does not clear
+ * the ISR bit and cpu thinks it has already serviced the interrupt. Hence
+ * a vector might get locked. It was noticed for timer irq (vector
+ * 0x31). Issue an extra EOI to clear ISR.
+ *
+ * If there are pending IRR bits they turn into ISR bits after a higher
+ * priority ISR bit has been acked.
+ */
 static void apic_pending_intr_clear(void)
 {
-	long long max_loops = cpu_khz ? cpu_khz : 1000000;
-	unsigned long long tsc = 0, ntsc;
-	unsigned int queued;
-	unsigned long value;
-	int i, j, acked = 0;
-
-	if (boot_cpu_has(X86_FEATURE_TSC))
-		tsc = rdtsc();
-	/*
-	 * After a crash, we no longer service the interrupts and a pending
-	 * interrupt from previous kernel might still have ISR bit set.
-	 *
-	 * Most probably by now CPU has serviced that pending interrupt and
-	 * it might not have done the ack_APIC_irq() because it thought,
-	 * interrupt came from i8259 as ExtInt. LAPIC did not get EOI so it
-	 * does not clear the ISR bit and cpu thinks it has already serivced
-	 * the interrupt. Hence a vector might get locked. It was noticed
-	 * for timer irq (vector 0x31). Issue an extra EOI to clear ISR.
-	 */
-	do {
-		queued = 0;
-		for (i = APIC_ISR_NR - 1; i >= 0; i--)
-			queued |= apic_read(APIC_IRR + i*0x10);
-
-		for (i = APIC_ISR_NR - 1; i >= 0; i--) {
-			value = apic_read(APIC_ISR + i*0x10);
-			for_each_set_bit(j, &value, 32) {
-				ack_APIC_irq();
-				acked++;
-			}
-		}
-		if (acked > 256) {
-			pr_err("LAPIC pending interrupts after %d EOI\n", acked);
-			break;
-		}
-		if (queued) {
-			if (boot_cpu_has(X86_FEATURE_TSC) && cpu_khz) {
-				ntsc = rdtsc();
-				max_loops = (long long)cpu_khz << 10;
-				max_loops -= ntsc - tsc;
-			} else {
-				max_loops--;
-			}
-		}
-	} while (queued && max_loops > 0);
-	WARN_ON(max_loops <= 0);
+	union apic_ir irr, isr;
+	unsigned int i;
+
+	/* 512 loops are way oversized and give the APIC a chance to obey. */
+	for (i = 0; i < 512; i++) {
+		if (!apic_check_and_ack(&irr, &isr))
+			return;
+	}
+	/* Dump the IRR/ISR content if that failed */
+	pr_warn("APIC: Stale IRR: %256pb ISR: %256pb\n", irr.map, isr.map);
 }
 
 /**
@@ -1576,6 +1594,7 @@ static void setup_local_APIC(void)
 	value |= 0x10;
 	apic_write(APIC_TASKPRI, value);
 
+	/* Clear eventually stale ISR/IRR bits */
 	apic_pending_intr_clear();
 
 	/*


