All of lore.kernel.org
 help / color / mirror / Atom feed
From: Joe Korty <joe.korty@concurrent-rt.com>
To: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH v2] arm64: arch_timer: XGene-1: TVAL register math error breaks timer expiry calculation.
Date: Mon, 24 Oct 2022 12:54:22 -0400	[thread overview]
Message-ID: <20221024165422.GA51107@zipoli.concurrent-rt.com> (raw)

arm64: arch_timer: XGene-1: TVAL register math error breaks timer expiry calculation.

The TVAL register is 32 bit signed.  Thus only the lower 31 bits are
available to specify when an interrupt is to occur at some time in the
near future.  Attempting to specify a larger interval with TVAL results
in a negative time delta which means the timer fires immediately upon
being programmed, rather than firing at that expected future time.

The solution is for linux to declare that TVAL is a 31 bit register rather
than give its true size of 32 bits.  This prevents linux from programming
TVAL with a too-large value.  Note that, prior to 5.16, this little trick
was the standard way to handle TVAL in linux, so there is nothing new
happening here on that front.

Test procedure: for some reason, the lockup watchdog is sensitive to
this bug.  When we turn the watchdog off, then run a little 'hello world'
program in each of the CPUs, there will often be one that 1) hangs up
forever, 2) hangs up what seems like forever, but actully contines after
a few minutes.  In either case, the program cannot be freed by a ^C.
This test sequence requires CONFIG_SOFTLOCKUP_DETECTOR, and probably
requires that one of the NO_HZ Kconfig options be specified.

The sequence is, for an 8 cpu Mustang XGene-1:

   echo 0 >/proc/sys/kernel/watchdog
   for i in {0..7}; do taskset -c $i echo hi there $i; done

Note that though the hangup usually happens, it does not
always happen.

Some comments on the v1 version of this patch by Marc Zyngier:

  XGene implements CVAL (a 64bit comparator) in terms of TVAL (a countdown
  register) instead of the other way around. TVAL being a 32bit register,
  the width of the counter should equally be 32.  However, TVAL is a
  *signed* value, and keeps counting down in the negative range once the
  timer fires.

  It means that any TVAL value with bit 31 set will fire immediately,
  as it cannot be distinguished from an already expired timer. Reducing
  the timer range back to a paltry 31 bits papers over the issue.

  Another problem cannot be fixed though, which is that the timer interrupt
  *must* be handled within the negative countdown period, or the interrupt
  will be lost (TVAL will rollover to a positive value, indicative of a
  new timer deadline).

 [ v2: Expanded CC list - jak ]
 [ v2: Revamped changelog - jak ] 
 [ v2: streamlined inlined comments - jak ] 

Cc: stable@vger.kernel.org # 5.16+
Fixes: 012f18850452 ("clocksource/drivers/arm_arch_timer: Work around broken CVAL implementations")
Signed-off-by: Joe Korty <joe.korty@concurrent-rt.com>
---
base-commit: v6.0
Index: b/drivers/clocksource/arm_arch_timer.c
===================================================================
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -804,6 +804,9 @@ static u64 __arch_timer_check_delta(void
 		/*
 		 * XGene-1 implements CVAL in terms of TVAL, meaning
 		 * that the maximum timer range is 32bit. Shame on them.
+		 *
+		 * Note that TVAL is signed, thus has only 31 of its
+		 * 32 bits to express magnitude.
 		 */
 		MIDR_ALL_VERSIONS(MIDR_CPU_MODEL(ARM_CPU_IMP_APM,
 						 APM_CPU_PART_POTENZA)),
@@ -811,8 +814,8 @@ static u64 __arch_timer_check_delta(void
 	};
 
 	if (is_midr_in_range_list(read_cpuid_id(), broken_cval_midrs)) {
-		pr_warn_once("Broken CNTx_CVAL_EL1, limiting width to 32bits");
-		return CLOCKSOURCE_MASK(32);
+		pr_warn_once("Broken CNTx_CVAL_EL1, using 32 bit TVAL instead.\n");
+		return CLOCKSOURCE_MASK(31);
 	}
 #endif
 	return CLOCKSOURCE_MASK(arch_counter_get_width());

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

WARNING: multiple messages have this Message-ID (diff)
From: Joe Korty <joe.korty@concurrent-rt.com>
To: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH v2] arm64: arch_timer: XGene-1: TVAL register math error breaks timer expiry calculation.
Date: Mon, 24 Oct 2022 12:54:22 -0400	[thread overview]
Message-ID: <20221024165422.GA51107@zipoli.concurrent-rt.com> (raw)

arm64: arch_timer: XGene-1: TVAL register math error breaks timer expiry calculation.

The TVAL register is 32 bit signed.  Thus only the lower 31 bits are
available to specify when an interrupt is to occur at some time in the
near future.  Attempting to specify a larger interval with TVAL results
in a negative time delta which means the timer fires immediately upon
being programmed, rather than firing at that expected future time.

The solution is for linux to declare that TVAL is a 31 bit register rather
than give its true size of 32 bits.  This prevents linux from programming
TVAL with a too-large value.  Note that, prior to 5.16, this little trick
was the standard way to handle TVAL in linux, so there is nothing new
happening here on that front.

Test procedure: for some reason, the lockup watchdog is sensitive to
this bug.  When we turn the watchdog off, then run a little 'hello world'
program in each of the CPUs, there will often be one that 1) hangs up
forever, 2) hangs up what seems like forever, but actully contines after
a few minutes.  In either case, the program cannot be freed by a ^C.
This test sequence requires CONFIG_SOFTLOCKUP_DETECTOR, and probably
requires that one of the NO_HZ Kconfig options be specified.

The sequence is, for an 8 cpu Mustang XGene-1:

   echo 0 >/proc/sys/kernel/watchdog
   for i in {0..7}; do taskset -c $i echo hi there $i; done

Note that though the hangup usually happens, it does not
always happen.

Some comments on the v1 version of this patch by Marc Zyngier:

  XGene implements CVAL (a 64bit comparator) in terms of TVAL (a countdown
  register) instead of the other way around. TVAL being a 32bit register,
  the width of the counter should equally be 32.  However, TVAL is a
  *signed* value, and keeps counting down in the negative range once the
  timer fires.

  It means that any TVAL value with bit 31 set will fire immediately,
  as it cannot be distinguished from an already expired timer. Reducing
  the timer range back to a paltry 31 bits papers over the issue.

  Another problem cannot be fixed though, which is that the timer interrupt
  *must* be handled within the negative countdown period, or the interrupt
  will be lost (TVAL will rollover to a positive value, indicative of a
  new timer deadline).

 [ v2: Expanded CC list - jak ]
 [ v2: Revamped changelog - jak ] 
 [ v2: streamlined inlined comments - jak ] 

Cc: stable@vger.kernel.org # 5.16+
Fixes: 012f18850452 ("clocksource/drivers/arm_arch_timer: Work around broken CVAL implementations")
Signed-off-by: Joe Korty <joe.korty@concurrent-rt.com>
---
base-commit: v6.0
Index: b/drivers/clocksource/arm_arch_timer.c
===================================================================
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -804,6 +804,9 @@ static u64 __arch_timer_check_delta(void
 		/*
 		 * XGene-1 implements CVAL in terms of TVAL, meaning
 		 * that the maximum timer range is 32bit. Shame on them.
+		 *
+		 * Note that TVAL is signed, thus has only 31 of its
+		 * 32 bits to express magnitude.
 		 */
 		MIDR_ALL_VERSIONS(MIDR_CPU_MODEL(ARM_CPU_IMP_APM,
 						 APM_CPU_PART_POTENZA)),
@@ -811,8 +814,8 @@ static u64 __arch_timer_check_delta(void
 	};
 
 	if (is_midr_in_range_list(read_cpuid_id(), broken_cval_midrs)) {
-		pr_warn_once("Broken CNTx_CVAL_EL1, limiting width to 32bits");
-		return CLOCKSOURCE_MASK(32);
+		pr_warn_once("Broken CNTx_CVAL_EL1, using 32 bit TVAL instead.\n");
+		return CLOCKSOURCE_MASK(31);
 	}
 #endif
 	return CLOCKSOURCE_MASK(arch_counter_get_width());

             reply	other threads:[~2022-10-24 16:55 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-24 16:54 Joe Korty [this message]
2022-10-24 16:54 ` [PATCH v2] arm64: arch_timer: XGene-1: TVAL register math error breaks timer expiry calculation Joe Korty
2022-10-25 12:31 ` Marc Zyngier
2022-10-25 12:31   ` Marc Zyngier
2022-11-03 15:05   ` Marc Zyngier
2022-11-03 15:05     ` Marc Zyngier
2022-12-02 12:36     ` Daniel Lezcano
2022-12-02 12:36       ` Daniel Lezcano
2022-12-02 14:04       ` Thomas Gleixner
2022-12-02 14:04         ` Thomas Gleixner
2022-11-21 15:08 ` [tip: timers/urgent] clocksource/drivers/arm_arch_timer: Fix XGene-1 TVAL register math error tip-bot2 for Joe Korty
2022-12-09 15:47 ` [tip: timers/core] " tip-bot2 for Joe Korty

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221024165422.GA51107@zipoli.concurrent-rt.com \
    --to=joe.korty@concurrent-rt.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=maz@kernel.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.