All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ashok Raj <ashok.raj@intel.com>
To: linux-kernel@vger.kernel.org, Thomas Gleixner <tglx@linutronix.de>
Cc: Ashok Raj <ashok.raj@intel.com>,
	Sukumar Ghorai <sukumar.ghorai@intel.com>,
	Srikanth Nandamuri <srikanth.nandamuri@intel.com>,
	Evan Green <evgreen@chromium.org>,
	Mathias Nyman <mathias.nyman@linux.intel.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	x86@kernel.org, stable@vger.kernel.org
Subject: [PATCH v3] x86/hotplug: Silence APIC only after all irq's are migrated
Date: Wed, 26 Aug 2020 21:12:10 -0700	[thread overview]
Message-ID: <1598501530-45821-1-git-send-email-ashok.raj@intel.com> (raw)

There is a race when taking a CPU offline. Current code looks like this:

native_cpu_disable()
{
	...
	apic_soft_disable();
	/*
	 * Any existing set bits for pending interrupt to
	 * this CPU are preserved and will be sent via IPI
	 * to another CPU by fixup_irqs().
	 */
	cpu_disable_common();
	{
		....
		/*
		 * Race window happens here. Once local APIC has been
		 * disabled any new interrupts from the device to
		 * the old CPU are lost
		 */
		fixup_irqs(); // Too late to capture anything in IRR.
		...
	}
}

The fix is to disable the APIC *after* cpu_disable_common().

Testing was done with a USB NIC that provided a source of frequent
interrupts. A script migrated interrupts to a specific CPU and
then took that CPU offline.

Fixes: 60dcaad5736f ("x86/hotplug: Silence APIC and NMI when CPU is dead")
Link: https://lore.kernel.org/lkml/875zdarr4h.fsf@nanos.tec.linutronix.de/
Reported-by: Evan Green <evgreen@chromium.org>
Tested-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Tested-by: Evan Green <evgreen@chromium.org>
Reviewed-by: Evan Green <evgreen@chromium.org>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
v3:
    Fix commit message and comments per Thomas.
    https://lore.kernel.org/lkml/87mu2iw86q.fsf@nanos.tec.linutronix.de/

v2:
- Typos and fixes suggested by Randy Dunlap

To: linux-kernel@vger.kernel.org
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Sukumar Ghorai <sukumar.ghorai@intel.com>
Cc: Srikanth Nandamuri <srikanth.nandamuri@intel.com>
Cc: Evan Green <evgreen@chromium.org>
Cc: Mathias Nyman <mathias.nyman@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: x86@kernel.org
Cc: stable@vger.kernel.org
---
 arch/x86/kernel/smpboot.c | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 27aa04a95702..f5ef689dd62a 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1594,14 +1594,28 @@ int native_cpu_disable(void)
 	if (ret)
 		return ret;
 
-	/*
-	 * Disable the local APIC. Otherwise IPI broadcasts will reach
-	 * it. It still responds normally to INIT, NMI, SMI, and SIPI
-	 * messages.
-	 */
-	apic_soft_disable();
 	cpu_disable_common();
 
+        /*
+         * Disable the local APIC. Otherwise IPI broadcasts will reach
+         * it. It still responds normally to INIT, NMI, SMI, and SIPI
+         * messages.
+         *
+         * Disabling the APIC must happen after cpu_disable_common()
+         * which invokes fixup_irqs().
+         *
+         * Disabling the APIC preserves already set bits in IRR, but
+         * an interrupt arriving after disabling the local APIC does not
+         * set the corresponding IRR bit.
+         *
+         * fixup_irqs() scans IRR for set bits so it can raise a not
+         * yet handled interrupt on the new destination CPU via an IPI
+         * but obviously it can't do so for IRR bits which are not set.
+         * IOW, interrupts arriving after disabling the local APIC will
+         * be lost.
+         */
+	apic_soft_disable();
+
 	return 0;
 }
 
-- 
2.7.4


             reply	other threads:[~2020-08-27  4:12 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-27  4:12 Ashok Raj [this message]
2020-08-27  7:32 ` [tip: x86/urgent] x86/hotplug: Silence APIC only after all interrupts are migrated tip-bot2 for Ashok Raj

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1598501530-45821-1-git-send-email-ashok.raj@intel.com \
    --to=ashok.raj@intel.com \
    --cc=bhelgaas@google.com \
    --cc=evgreen@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathias.nyman@linux.intel.com \
    --cc=srikanth.nandamuri@intel.com \
    --cc=stable@vger.kernel.org \
    --cc=sukumar.ghorai@intel.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.