All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Nicholas Piggin <npiggin@gmail.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Sasha Levin <sashal@kernel.org>,
	linuxppc-dev@lists.ozlabs.org
Subject: [PATCH AUTOSEL 5.10 14/31] powerpc/64: irq replay remove decrementer overflow check
Date: Wed, 30 Dec 2020 08:02:56 -0500	[thread overview]
Message-ID: <20201230130314.3636961-14-sashal@kernel.org> (raw)
In-Reply-To: <20201230130314.3636961-1-sashal@kernel.org>

From: Nicholas Piggin <npiggin@gmail.com>

[ Upstream commit 59d512e4374b2d8a6ad341475dc94c4a4bdec7d3 ]

This is way to catch some cases of decrementer overflow, when the
decrementer has underflowed an odd number of times, while MSR[EE] was
disabled.

With a typical small decrementer, a timer that fires when MSR[EE] is
disabled will be "lost" if MSR[EE] remains disabled for between 4.3 and
8.6 seconds after the timer expires. In any case, the decrementer
interrupt would be taken at 8.6 seconds and the timer would be found at
that point.

So this check is for catching extreme latency events, and it prevents
those latencies from being a further few seconds long.  It's not obvious
this is a good tradeoff. This is already a watchdog magnitude event and
that situation is not improved a significantly with this check. For
large decrementers, it's useless.

Therefore remove this check, which avoids a mftb when enabling hard
disabled interrupts (e.g., when enabling after coming from hardware
interrupt handlers). Perhaps more importantly, it also removes the
clunky MSR[EE] vs PACA_IRQ_HARD_DIS incoherency in soft-interrupt replay
which simplifies the code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201107014336.2337337-1-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/kernel/irq.c             | 53 ++-------------------------
 arch/powerpc/kernel/time.c            |  9 ++---
 arch/powerpc/platforms/powernv/opal.c |  2 +-
 3 files changed, 8 insertions(+), 56 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 7d0f7682d01df..6b1eca53e36cc 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -102,14 +102,6 @@ static inline notrace unsigned long get_irq_happened(void)
 	return happened;
 }
 
-static inline notrace int decrementer_check_overflow(void)
-{
-	u64 now = get_tb();
-	u64 *next_tb = this_cpu_ptr(&decrementers_next_tb);
- 
-	return now >= *next_tb;
-}
-
 #ifdef CONFIG_PPC_BOOK3E
 
 /* This is called whenever we are re-enabling interrupts
@@ -142,35 +134,6 @@ notrace unsigned int __check_irq_replay(void)
 	trace_hardirqs_on();
 	trace_hardirqs_off();
 
-	/*
-	 * We are always hard disabled here, but PACA_IRQ_HARD_DIS may
-	 * not be set, which means interrupts have only just been hard
-	 * disabled as part of the local_irq_restore or interrupt return
-	 * code. In that case, skip the decrementr check becaus it's
-	 * expensive to read the TB.
-	 *
-	 * HARD_DIS then gets cleared here, but it's reconciled later.
-	 * Either local_irq_disable will replay the interrupt and that
-	 * will reconcile state like other hard interrupts. Or interrupt
-	 * retur will replay the interrupt and in that case it sets
-	 * PACA_IRQ_HARD_DIS by hand (see comments in entry_64.S).
-	 */
-	if (happened & PACA_IRQ_HARD_DIS) {
-		local_paca->irq_happened &= ~PACA_IRQ_HARD_DIS;
-
-		/*
-		 * We may have missed a decrementer interrupt if hard disabled.
-		 * Check the decrementer register in case we had a rollover
-		 * while hard disabled.
-		 */
-		if (!(happened & PACA_IRQ_DEC)) {
-			if (decrementer_check_overflow()) {
-				local_paca->irq_happened |= PACA_IRQ_DEC;
-				happened |= PACA_IRQ_DEC;
-			}
-		}
-	}
-
 	if (happened & PACA_IRQ_DEC) {
 		local_paca->irq_happened &= ~PACA_IRQ_DEC;
 		return 0x900;
@@ -186,6 +149,9 @@ notrace unsigned int __check_irq_replay(void)
 		return 0x280;
 	}
 
+	if (happened & PACA_IRQ_HARD_DIS)
+		local_paca->irq_happened &= ~PACA_IRQ_HARD_DIS;
+
 	/* There should be nothing left ! */
 	BUG_ON(local_paca->irq_happened != 0);
 
@@ -229,18 +195,6 @@ void replay_soft_interrupts(void)
 	if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
 		WARN_ON_ONCE(mfmsr() & MSR_EE);
 
-	if (happened & PACA_IRQ_HARD_DIS) {
-		/*
-		 * We may have missed a decrementer interrupt if hard disabled.
-		 * Check the decrementer register in case we had a rollover
-		 * while hard disabled.
-		 */
-		if (!(happened & PACA_IRQ_DEC)) {
-			if (decrementer_check_overflow())
-				happened |= PACA_IRQ_DEC;
-		}
-	}
-
 	/*
 	 * Force the delivery of pending soft-disabled interrupts on PS3.
 	 * Any HV call will have this side effect.
@@ -345,6 +299,7 @@ notrace void arch_local_irq_restore(unsigned long mask)
 		if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
 			WARN_ON_ONCE(!(mfmsr() & MSR_EE));
 		__hard_irq_disable();
+		local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
 	} else {
 		/*
 		 * We should already be hard disabled here. We had bugs
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 74efe46f55327..7d372ff3504b2 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -552,14 +552,11 @@ void timer_interrupt(struct pt_regs *regs)
 	struct pt_regs *old_regs;
 	u64 now;
 
-	/* Some implementations of hotplug will get timer interrupts while
-	 * offline, just ignore these and we also need to set
-	 * decrementers_next_tb as MAX to make sure __check_irq_replay
-	 * don't replay timer interrupt when return, otherwise we'll trap
-	 * here infinitely :(
+	/*
+	 * Some implementations of hotplug will get timer interrupts while
+	 * offline, just ignore these.
 	 */
 	if (unlikely(!cpu_online(smp_processor_id()))) {
-		*next_tb = ~(u64)0;
 		set_dec(decrementer_max);
 		return;
 	}
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index d95954ad4c0af..c61c3b62c8c62 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -731,7 +731,7 @@ int opal_hmi_exception_early2(struct pt_regs *regs)
 	return 1;
 }
 
-/* HMI exception handler called in virtual mode during check_irq_replay. */
+/* HMI exception handler called in virtual mode when irqs are next enabled. */
 int opal_handle_hmi_exception(struct pt_regs *regs)
 {
 	/*
-- 
2.27.0


WARNING: multiple messages have this Message-ID (diff)
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Sasha Levin <sashal@kernel.org>,
	linuxppc-dev@lists.ozlabs.org,
	Nicholas Piggin <npiggin@gmail.com>
Subject: [PATCH AUTOSEL 5.10 14/31] powerpc/64: irq replay remove decrementer overflow check
Date: Wed, 30 Dec 2020 08:02:56 -0500	[thread overview]
Message-ID: <20201230130314.3636961-14-sashal@kernel.org> (raw)
In-Reply-To: <20201230130314.3636961-1-sashal@kernel.org>

From: Nicholas Piggin <npiggin@gmail.com>

[ Upstream commit 59d512e4374b2d8a6ad341475dc94c4a4bdec7d3 ]

This is way to catch some cases of decrementer overflow, when the
decrementer has underflowed an odd number of times, while MSR[EE] was
disabled.

With a typical small decrementer, a timer that fires when MSR[EE] is
disabled will be "lost" if MSR[EE] remains disabled for between 4.3 and
8.6 seconds after the timer expires. In any case, the decrementer
interrupt would be taken at 8.6 seconds and the timer would be found at
that point.

So this check is for catching extreme latency events, and it prevents
those latencies from being a further few seconds long.  It's not obvious
this is a good tradeoff. This is already a watchdog magnitude event and
that situation is not improved a significantly with this check. For
large decrementers, it's useless.

Therefore remove this check, which avoids a mftb when enabling hard
disabled interrupts (e.g., when enabling after coming from hardware
interrupt handlers). Perhaps more importantly, it also removes the
clunky MSR[EE] vs PACA_IRQ_HARD_DIS incoherency in soft-interrupt replay
which simplifies the code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201107014336.2337337-1-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/kernel/irq.c             | 53 ++-------------------------
 arch/powerpc/kernel/time.c            |  9 ++---
 arch/powerpc/platforms/powernv/opal.c |  2 +-
 3 files changed, 8 insertions(+), 56 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 7d0f7682d01df..6b1eca53e36cc 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -102,14 +102,6 @@ static inline notrace unsigned long get_irq_happened(void)
 	return happened;
 }
 
-static inline notrace int decrementer_check_overflow(void)
-{
-	u64 now = get_tb();
-	u64 *next_tb = this_cpu_ptr(&decrementers_next_tb);
- 
-	return now >= *next_tb;
-}
-
 #ifdef CONFIG_PPC_BOOK3E
 
 /* This is called whenever we are re-enabling interrupts
@@ -142,35 +134,6 @@ notrace unsigned int __check_irq_replay(void)
 	trace_hardirqs_on();
 	trace_hardirqs_off();
 
-	/*
-	 * We are always hard disabled here, but PACA_IRQ_HARD_DIS may
-	 * not be set, which means interrupts have only just been hard
-	 * disabled as part of the local_irq_restore or interrupt return
-	 * code. In that case, skip the decrementr check becaus it's
-	 * expensive to read the TB.
-	 *
-	 * HARD_DIS then gets cleared here, but it's reconciled later.
-	 * Either local_irq_disable will replay the interrupt and that
-	 * will reconcile state like other hard interrupts. Or interrupt
-	 * retur will replay the interrupt and in that case it sets
-	 * PACA_IRQ_HARD_DIS by hand (see comments in entry_64.S).
-	 */
-	if (happened & PACA_IRQ_HARD_DIS) {
-		local_paca->irq_happened &= ~PACA_IRQ_HARD_DIS;
-
-		/*
-		 * We may have missed a decrementer interrupt if hard disabled.
-		 * Check the decrementer register in case we had a rollover
-		 * while hard disabled.
-		 */
-		if (!(happened & PACA_IRQ_DEC)) {
-			if (decrementer_check_overflow()) {
-				local_paca->irq_happened |= PACA_IRQ_DEC;
-				happened |= PACA_IRQ_DEC;
-			}
-		}
-	}
-
 	if (happened & PACA_IRQ_DEC) {
 		local_paca->irq_happened &= ~PACA_IRQ_DEC;
 		return 0x900;
@@ -186,6 +149,9 @@ notrace unsigned int __check_irq_replay(void)
 		return 0x280;
 	}
 
+	if (happened & PACA_IRQ_HARD_DIS)
+		local_paca->irq_happened &= ~PACA_IRQ_HARD_DIS;
+
 	/* There should be nothing left ! */
 	BUG_ON(local_paca->irq_happened != 0);
 
@@ -229,18 +195,6 @@ void replay_soft_interrupts(void)
 	if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
 		WARN_ON_ONCE(mfmsr() & MSR_EE);
 
-	if (happened & PACA_IRQ_HARD_DIS) {
-		/*
-		 * We may have missed a decrementer interrupt if hard disabled.
-		 * Check the decrementer register in case we had a rollover
-		 * while hard disabled.
-		 */
-		if (!(happened & PACA_IRQ_DEC)) {
-			if (decrementer_check_overflow())
-				happened |= PACA_IRQ_DEC;
-		}
-	}
-
 	/*
 	 * Force the delivery of pending soft-disabled interrupts on PS3.
 	 * Any HV call will have this side effect.
@@ -345,6 +299,7 @@ notrace void arch_local_irq_restore(unsigned long mask)
 		if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
 			WARN_ON_ONCE(!(mfmsr() & MSR_EE));
 		__hard_irq_disable();
+		local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
 	} else {
 		/*
 		 * We should already be hard disabled here. We had bugs
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 74efe46f55327..7d372ff3504b2 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -552,14 +552,11 @@ void timer_interrupt(struct pt_regs *regs)
 	struct pt_regs *old_regs;
 	u64 now;
 
-	/* Some implementations of hotplug will get timer interrupts while
-	 * offline, just ignore these and we also need to set
-	 * decrementers_next_tb as MAX to make sure __check_irq_replay
-	 * don't replay timer interrupt when return, otherwise we'll trap
-	 * here infinitely :(
+	/*
+	 * Some implementations of hotplug will get timer interrupts while
+	 * offline, just ignore these.
 	 */
 	if (unlikely(!cpu_online(smp_processor_id()))) {
-		*next_tb = ~(u64)0;
 		set_dec(decrementer_max);
 		return;
 	}
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index d95954ad4c0af..c61c3b62c8c62 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -731,7 +731,7 @@ int opal_hmi_exception_early2(struct pt_regs *regs)
 	return 1;
 }
 
-/* HMI exception handler called in virtual mode during check_irq_replay. */
+/* HMI exception handler called in virtual mode when irqs are next enabled. */
 int opal_handle_hmi_exception(struct pt_regs *regs)
 {
 	/*
-- 
2.27.0


  parent reply	other threads:[~2020-12-30 13:11 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-30 13:02 [PATCH AUTOSEL 5.10 01/31] ARM: 9014/2: Replace string mem* functions for KASan Sasha Levin
2020-12-30 13:02 ` Sasha Levin
2020-12-30 13:02 ` [PATCH AUTOSEL 5.10 02/31] rtc: sun6i: Fix memleak in sun6i_rtc_clk_init Sasha Levin
2020-12-30 13:02   ` Sasha Levin
2020-12-30 13:02 ` [PATCH AUTOSEL 5.10 03/31] module: set MODULE_STATE_GOING state when a module fails to load Sasha Levin
2020-12-30 13:02 ` [PATCH AUTOSEL 5.10 04/31] quota: Don't overflow quota file offsets Sasha Levin
2020-12-30 13:02 ` [PATCH AUTOSEL 5.10 05/31] rtc: pl031: fix resource leak in pl031_probe Sasha Levin
2020-12-30 13:02   ` Sasha Levin
2020-12-30 13:02 ` [PATCH AUTOSEL 5.10 06/31] powerpc: sysdev: add missing iounmap() on error in mpic_msgr_probe() Sasha Levin
2020-12-30 13:02   ` Sasha Levin
2020-12-30 13:02 ` [PATCH AUTOSEL 5.10 07/31] i3c master: fix missing destroy_workqueue() on error in i3c_master_register Sasha Levin
2020-12-30 13:02   ` Sasha Levin
2020-12-30 13:02 ` [PATCH AUTOSEL 5.10 08/31] reiserfs: add check for an invalid ih_entry_count Sasha Levin
2020-12-30 13:02 ` [PATCH AUTOSEL 5.10 09/31] NFSv4: Fix a pNFS layout related use-after-free race when freeing the inode Sasha Levin
2020-12-30 13:02 ` [PATCH AUTOSEL 5.10 10/31] f2fs: Handle casefolding with Encryption Sasha Levin
2020-12-30 13:02   ` [f2fs-dev] " Sasha Levin
2020-12-30 18:01   ` Eric Biggers
2020-12-30 18:01     ` Eric Biggers
2021-01-04 14:20     ` Sasha Levin
2021-01-04 14:20       ` Sasha Levin
2020-12-30 13:02 ` [PATCH AUTOSEL 5.10 11/31] f2fs: avoid race condition for shrinker count Sasha Levin
2020-12-30 13:02   ` Sasha Levin
2020-12-30 13:02   ` Sasha Levin
2020-12-30 13:02   ` [f2fs-dev] " Sasha Levin
2020-12-30 13:02 ` [PATCH AUTOSEL 5.10 12/31] f2fs: fix race of pending_pages in decompression Sasha Levin
2020-12-30 13:02   ` [f2fs-dev] " Sasha Levin
2020-12-30 13:02 ` [PATCH AUTOSEL 5.10 13/31] module: delay kobject uevent until after module init call Sasha Levin
2020-12-30 13:02 ` Sasha Levin [this message]
2020-12-30 13:02   ` [PATCH AUTOSEL 5.10 14/31] powerpc/64: irq replay remove decrementer overflow check Sasha Levin
2020-12-30 13:02 ` [PATCH AUTOSEL 5.10 15/31] f2fs: fix shift-out-of-bounds in sanity_check_raw_super() Sasha Levin
2020-12-30 13:02   ` [f2fs-dev] " Sasha Levin
2020-12-30 13:02 ` [PATCH AUTOSEL 5.10 16/31] fs/namespace.c: WARN if mnt_count has become negative Sasha Levin
2020-12-30 13:02 ` [PATCH AUTOSEL 5.10 17/31] watchdog: rti-wdt: fix reference leak in rti_wdt_probe Sasha Levin
2020-12-30 13:03 ` [PATCH AUTOSEL 5.10 18/31] um: random: Register random as hwrng-core device Sasha Levin
2020-12-30 13:03   ` Sasha Levin
2020-12-30 13:03 ` [PATCH AUTOSEL 5.10 19/31] um: ubd: Submit all data segments atomically Sasha Levin
2020-12-30 13:03   ` Sasha Levin
2020-12-30 13:03 ` [PATCH AUTOSEL 5.10 20/31] um: allocate a guard page to helper threads Sasha Levin
2020-12-30 13:03   ` Sasha Levin
2020-12-30 14:48   ` Johannes Berg
2020-12-30 14:48     ` Johannes Berg
2021-01-04 14:21     ` Sasha Levin
2021-01-04 14:21       ` Sasha Levin
2020-12-30 13:03 ` [PATCH AUTOSEL 5.10 21/31] NFSv4.2: Don't error when exiting early on a READ_PLUS buffer overflow Sasha Levin
2020-12-30 13:03 ` [PATCH AUTOSEL 5.10 22/31] ceph: fix inode refcount leak when ceph_fill_inode on non-I_NEW inode fails Sasha Levin
2020-12-30 13:03 ` [PATCH AUTOSEL 5.10 23/31] drm/amd/display: updated wm table for Renoir Sasha Levin
2020-12-30 13:03   ` Sasha Levin
2020-12-30 13:03   ` Sasha Levin
2020-12-30 13:03 ` [PATCH AUTOSEL 5.10 24/31] tick/sched: Remove bogus boot "safety" check Sasha Levin
2020-12-30 13:03 ` [PATCH AUTOSEL 5.10 25/31] s390: always clear kernel stack backchain before calling functions Sasha Levin
2020-12-30 13:03 ` [PATCH AUTOSEL 5.10 26/31] io_uring: remove racy overflow list fast checks Sasha Levin
2020-12-30 13:03 ` [PATCH AUTOSEL 5.10 27/31] ext4: check for invalid block size early when mounting a file system Sasha Levin
2020-12-30 13:03 ` [PATCH AUTOSEL 5.10 28/31] ALSA: pcm: Clear the full allocated memory at hw_params Sasha Levin
2020-12-30 13:03   ` Sasha Levin
2020-12-30 13:03 ` [PATCH AUTOSEL 5.10 29/31] dm verity: skip verity work if I/O error when system is shutting down Sasha Levin
2020-12-30 13:03   ` [dm-devel] " Sasha Levin
2020-12-30 13:03 ` [PATCH AUTOSEL 5.10 30/31] ext4: avoid s_mb_prefetch to be zero in individual scenarios Sasha Levin
2020-12-30 13:03 ` [PATCH AUTOSEL 5.10 31/31] device-dax: Fix range release Sasha Levin
2020-12-30 13:03   ` Sasha Levin
2020-12-30 14:18 ` [PATCH AUTOSEL 5.10 01/31] ARM: 9014/2: Replace string mem* functions for KASan Ahmad Fatoum
2020-12-30 14:18   ` Ahmad Fatoum
2021-01-04 14:29   ` Sasha Levin
2021-01-04 14:29     ` Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201230130314.3636961-14-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mpe@ellerman.id.au \
    --cc=npiggin@gmail.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.