linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] Revert "clocksource/drivers/riscv: Events are stopped during CPU suspend"
@ 2022-11-22 12:16 Conor Dooley
  2022-11-23  5:49 ` Samuel Holland
  2022-12-01 11:13 ` [tip: timers/urgent] " tip-bot2 for Conor Dooley
  0 siblings, 2 replies; 4+ messages in thread
From: Conor Dooley @ 2022-11-22 12:16 UTC (permalink / raw)
  To: daniel.lezcano, tglx
  Cc: Conor Dooley, Samuel Holland, Anup Patel, Palmer Dabbelt,
	Palmer Dabbelt, aou, atishp, dmitriy, paul.walmsley,
	linux-kernel, linux-riscv

This reverts commit 232ccac1bd9b5bfe73895f527c08623e7fa0752d.

On the subject of suspend, the RISC-V SBI spec states:
> Request the SBI implementation to put the calling hart in a platform
> specific suspend (or low power) state specified by the suspend_type
> parameter. The hart will automatically come out of suspended state and
> resume normal execution when it receives an interrupt or platform
> specific hardware event.

This does not cover whether any given events actually reach the hart or
not, just what the hart will do if it receives an event. On PolarFire
SoC, and potentially other SiFive based implementations, events from the
RISC-V timer do reach a hart during suspend. This is not the case for
the implementation on the Allwinner D1 - there timer events are not
received during suspend.

To fix this, the x86 C3STOP feature was enabled for the timer driver -
but this has broken both RCU stall detection and timers generally on
PolarFire SoC (and potentially other SiFive based implementations).

If an AXI read to the PCIe controller on PolarFire SoC times out, the
system will stall, however, with this patch applied, the system just
locks up without RCU stalling:
	io scheduler mq-deadline registered
	io scheduler kyber registered
	microchip-pcie 2000000000.pcie: host bridge /soc/pcie@2000000000 ranges:
	microchip-pcie 2000000000.pcie:      MEM 0x2008000000..0x2087ffffff -> 0x0008000000
	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
	microchip-pcie 2000000000.pcie: axi read request error
	microchip-pcie 2000000000.pcie: axi read timeout
	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
	Freeing initrd memory: 7332K

Similarly issues were reported with clock_nanosleep() - with a test app
that sleeps each cpu for 6, 5, 4, 3 ms respectively, HZ=250 & the blamed
commit in place, the sleep times are rounded up to the next jiffy:

== CPU: 1 ==      == CPU: 2 ==      == CPU: 3 ==      == CPU: 4 ==
Mean: 7.974992    Mean: 7.976534    Mean: 7.962591    Mean: 3.952179
Std Dev: 0.154374 Std Dev: 0.156082 Std Dev: 0.171018 Std Dev: 0.076193
Hi: 9.472000      Hi: 10.495000     Hi: 8.864000      Hi: 4.736000
Lo: 6.087000      Lo: 6.380000      Lo: 4.872000      Lo: 3.403000
Samples: 521      Samples: 521      Samples: 521      Samples: 521

Fortunately, the D1 has a second timer, which is "currently used in
preference to the RISC-V/SBI timer driver" so a revert here does not
hurt operation of D1 in it's current form.

Ultimately, a DeviceTree property (or node) will be added to encode the
behaviour of the timers, but until then revert the addition of
CLOCK_EVT_FEAT_C3STOP.

Link: https://lore.kernel.org/linux-riscv/YzYTNQRxLr7Q9JR0@spud/
Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98/
Link: https://lore.kernel.org/linux-riscv/bf6d3b1f-f703-4a25-833e-972a44a04114@sholland.org/
Fixes: 232ccac1bd9b ("clocksource/drivers/riscv: Events are stopped during CPU suspend")
CC: Samuel Holland <samuel@sholland.org>
CC: Anup Patel <anup@brainfault.org>
CC: Palmer Dabbelt <palmer@dabbelt.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
---

For v2, I have re-worked the commit message to (hopefully) add improved
context.

CC: aou@eecs.berkeley.edu
CC: atishp@atishpatra.org
CC: daniel.lezcano@linaro.org
CC: dmitriy@oss-tech.org
CC: paul.walmsley@sifive.com
CC: tglx@linutronix.de
CC: linux-kernel@vger.kernel.org
CC: linux-riscv@lists.infradead.org
---
 drivers/clocksource/timer-riscv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clocksource/timer-riscv.c b/drivers/clocksource/timer-riscv.c
index 969a552da8d2..a0d66fabf073 100644
--- a/drivers/clocksource/timer-riscv.c
+++ b/drivers/clocksource/timer-riscv.c
@@ -51,7 +51,7 @@ static int riscv_clock_next_event(unsigned long delta,
 static unsigned int riscv_clock_event_irq;
 static DEFINE_PER_CPU(struct clock_event_device, riscv_clock_event) = {
 	.name			= "riscv_timer_clockevent",
-	.features		= CLOCK_EVT_FEAT_ONESHOT | CLOCK_EVT_FEAT_C3STOP,
+	.features		= CLOCK_EVT_FEAT_ONESHOT,
 	.rating			= 100,
 	.set_next_event		= riscv_clock_next_event,
 };
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] Revert "clocksource/drivers/riscv: Events are stopped during CPU suspend"
  2022-11-22 12:16 [PATCH v2] Revert "clocksource/drivers/riscv: Events are stopped during CPU suspend" Conor Dooley
@ 2022-11-23  5:49 ` Samuel Holland
  2022-11-23  9:04   ` Conor Dooley
  2022-12-01 11:13 ` [tip: timers/urgent] " tip-bot2 for Conor Dooley
  1 sibling, 1 reply; 4+ messages in thread
From: Samuel Holland @ 2022-11-23  5:49 UTC (permalink / raw)
  To: Conor Dooley, daniel.lezcano, tglx
  Cc: Anup Patel, Palmer Dabbelt, Palmer Dabbelt, aou, atishp, dmitriy,
	paul.walmsley, linux-kernel, linux-riscv

On 11/22/22 06:16, Conor Dooley wrote:
> This reverts commit 232ccac1bd9b5bfe73895f527c08623e7fa0752d.
> 
> On the subject of suspend, the RISC-V SBI spec states:
>> Request the SBI implementation to put the calling hart in a platform
>> specific suspend (or low power) state specified by the suspend_type
>> parameter. The hart will automatically come out of suspended state and
>> resume normal execution when it receives an interrupt or platform
>> specific hardware event.
> 
> This does not cover whether any given events actually reach the hart or
> not, just what the hart will do if it receives an event. On PolarFire
> SoC, and potentially other SiFive based implementations, events from the
> RISC-V timer do reach a hart during suspend. This is not the case for
> the implementation on the Allwinner D1 - there timer events are not
> received during suspend.
> 
> To fix this, the x86 C3STOP feature was enabled for the timer driver -

C3STOP isn't inherently x86-specific.

> but this has broken both RCU stall detection and timers generally on
> PolarFire SoC (and potentially other SiFive based implementations).
> 
> If an AXI read to the PCIe controller on PolarFire SoC times out, the
> system will stall, however, with this patch applied, the system just
> locks up without RCU stalling:
> 	io scheduler mq-deadline registered
> 	io scheduler kyber registered
> 	microchip-pcie 2000000000.pcie: host bridge /soc/pcie@2000000000 ranges:
> 	microchip-pcie 2000000000.pcie:      MEM 0x2008000000..0x2087ffffff -> 0x0008000000
> 	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> 	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> 	microchip-pcie 2000000000.pcie: axi read request error
> 	microchip-pcie 2000000000.pcie: axi read timeout
> 	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> 	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> 	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> 	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> 	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> 	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> 	Freeing initrd memory: 7332K
> 
> Similarly issues were reported with clock_nanosleep() - with a test app
> that sleeps each cpu for 6, 5, 4, 3 ms respectively, HZ=250 & the blamed
> commit in place, the sleep times are rounded up to the next jiffy:
> 
> == CPU: 1 ==      == CPU: 2 ==      == CPU: 3 ==      == CPU: 4 ==
> Mean: 7.974992    Mean: 7.976534    Mean: 7.962591    Mean: 3.952179
> Std Dev: 0.154374 Std Dev: 0.156082 Std Dev: 0.171018 Std Dev: 0.076193
> Hi: 9.472000      Hi: 10.495000     Hi: 8.864000      Hi: 4.736000
> Lo: 6.087000      Lo: 6.380000      Lo: 4.872000      Lo: 3.403000
> Samples: 521      Samples: 521      Samples: 521      Samples: 521
> 
> Fortunately, the D1 has a second timer, which is "currently used in
> preference to the RISC-V/SBI timer driver" so a revert here does not
> hurt operation of D1 in it's current form.

typo: its

> Ultimately, a DeviceTree property (or node) will be added to encode the
> behaviour of the timers, but until then revert the addition of
> CLOCK_EVT_FEAT_C3STOP.
> 
> Link: https://lore.kernel.org/linux-riscv/YzYTNQRxLr7Q9JR0@spud/
> Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98/
> Link: https://lore.kernel.org/linux-riscv/bf6d3b1f-f703-4a25-833e-972a44a04114@sholland.org/
> Fixes: 232ccac1bd9b ("clocksource/drivers/riscv: Events are stopped during CPU suspend")
> CC: Samuel Holland <samuel@sholland.org>
> CC: Anup Patel <anup@brainfault.org>
> CC: Palmer Dabbelt <palmer@dabbelt.com>
> Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
> Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
> Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
> ---
> 
> For v2, I have re-worked the commit message to (hopefully) add improved
> context.
> 
> CC: aou@eecs.berkeley.edu
> CC: atishp@atishpatra.org
> CC: daniel.lezcano@linaro.org
> CC: dmitriy@oss-tech.org
> CC: paul.walmsley@sifive.com
> CC: tglx@linutronix.de
> CC: linux-kernel@vger.kernel.org
> CC: linux-riscv@lists.infradead.org
> ---
>  drivers/clocksource/timer-riscv.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/clocksource/timer-riscv.c b/drivers/clocksource/timer-riscv.c
> index 969a552da8d2..a0d66fabf073 100644
> --- a/drivers/clocksource/timer-riscv.c
> +++ b/drivers/clocksource/timer-riscv.c
> @@ -51,7 +51,7 @@ static int riscv_clock_next_event(unsigned long delta,
>  static unsigned int riscv_clock_event_irq;
>  static DEFINE_PER_CPU(struct clock_event_device, riscv_clock_event) = {
>  	.name			= "riscv_timer_clockevent",
> -	.features		= CLOCK_EVT_FEAT_ONESHOT | CLOCK_EVT_FEAT_C3STOP,
> +	.features		= CLOCK_EVT_FEAT_ONESHOT,
>  	.rating			= 100,
>  	.set_next_event		= riscv_clock_next_event,
>  };

Acked-by: Samuel Holland <samuel@sholland.org>


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] Revert "clocksource/drivers/riscv: Events are stopped during CPU suspend"
  2022-11-23  5:49 ` Samuel Holland
@ 2022-11-23  9:04   ` Conor Dooley
  0 siblings, 0 replies; 4+ messages in thread
From: Conor Dooley @ 2022-11-23  9:04 UTC (permalink / raw)
  To: Samuel Holland
  Cc: daniel.lezcano, tglx, Anup Patel, Palmer Dabbelt, Palmer Dabbelt,
	aou, atishp, dmitriy, paul.walmsley, linux-kernel, linux-riscv

Hey Samuel,

On Tue, Nov 22, 2022 at 11:49:49PM -0600, Samuel Holland wrote:
> On 11/22/22 06:16, Conor Dooley wrote:
> > This reverts commit 232ccac1bd9b5bfe73895f527c08623e7fa0752d.

> > To fix this, the x86 C3STOP feature was enabled for the timer driver -
> 
> C3STOP isn't inherently x86-specific.

I think I originally had feature with "s around it & meant this as a
tongue-in-cheek reference to the header, which describes it as an "x86
(mis)feature" or something like that. Think I decided against that but
forgot to drop the x86 bit..
Could easily do s/x86// and it'd still make sense.

> > Fortunately, the D1 has a second timer, which is "currently used in
> > preference to the RISC-V/SBI timer driver" so a revert here does not
> > hurt operation of D1 in it's current form.
> 
> typo: its

Good spot :)

> Acked-by: Samuel Holland <samuel@sholland.org>

Thanks!

Perhaps the two minor commit message bits could be fixed on application?
Otherwise, I will send a reworded one in a few days.

Conor.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [tip: timers/urgent] Revert "clocksource/drivers/riscv: Events are stopped during CPU suspend"
  2022-11-22 12:16 [PATCH v2] Revert "clocksource/drivers/riscv: Events are stopped during CPU suspend" Conor Dooley
  2022-11-23  5:49 ` Samuel Holland
@ 2022-12-01 11:13 ` tip-bot2 for Conor Dooley
  1 sibling, 0 replies; 4+ messages in thread
From: tip-bot2 for Conor Dooley @ 2022-12-01 11:13 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Conor Dooley, Thomas Gleixner, Palmer Dabbelt, Samuel Holland,
	x86, linux-kernel

The following commit has been merged into the timers/urgent branch of tip:

Commit-ID:     d9f15a9de44affe733e34f93bc184945ba277e6d
Gitweb:        https://git.kernel.org/tip/d9f15a9de44affe733e34f93bc184945ba277e6d
Author:        Conor Dooley <conor.dooley@microchip.com>
AuthorDate:    Tue, 22 Nov 2022 12:16:21 
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 01 Dec 2022 12:05:29 +01:00

Revert "clocksource/drivers/riscv: Events are stopped during CPU suspend"

This reverts commit 232ccac1bd9b5bfe73895f527c08623e7fa0752d.

On the subject of suspend, the RISC-V SBI spec states:

  This does not cover whether any given events actually reach the hart or
  not, just what the hart will do if it receives an event. On PolarFire
  SoC, and potentially other SiFive based implementations, events from the
  RISC-V timer do reach a hart during suspend. This is not the case for the
  implementation on the Allwinner D1 - there timer events are not received
  during suspend.

To fix this, the CLOCK_EVT_FEAT_C3STOP (mis)feature was enabled for the
timer driver - but this has broken both RCU stall detection and timers
generally on PolarFire SoC and potentially other SiFive based
implementations.

If an AXI read to the PCIe controller on PolarFire SoC times out, the
system will stall, however, with CLOCK_EVT_FEAT_C3STOP active, the system
just locks up without RCU stalling:

	io scheduler mq-deadline registered
	io scheduler kyber registered
	microchip-pcie 2000000000.pcie: host bridge /soc/pcie@2000000000 ranges:
	microchip-pcie 2000000000.pcie:      MEM 0x2008000000..0x2087ffffff -> 0x0008000000
	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
	microchip-pcie 2000000000.pcie: axi read request error
	microchip-pcie 2000000000.pcie: axi read timeout
	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
	Freeing initrd memory: 7332K

Similarly issues were reported with clock_nanosleep() - with a test app
that sleeps each cpu for 6, 5, 4, 3 ms respectively, HZ=250 & the blamed
commit in place, the sleep times are rounded up to the next jiffy:

== CPU: 1 ==      == CPU: 2 ==      == CPU: 3 ==      == CPU: 4 ==
Mean: 7.974992    Mean: 7.976534    Mean: 7.962591    Mean: 3.952179
Std Dev: 0.154374 Std Dev: 0.156082 Std Dev: 0.171018 Std Dev: 0.076193
Hi: 9.472000      Hi: 10.495000     Hi: 8.864000      Hi: 4.736000
Lo: 6.087000      Lo: 6.380000      Lo: 4.872000      Lo: 3.403000
Samples: 521      Samples: 521      Samples: 521      Samples: 521

Fortunately, the D1 has a second timer, which is "currently used in
preference to the RISC-V/SBI timer driver" so a revert here does not
hurt operation of D1 in its current form.

Ultimately, a DeviceTree property (or node) will be added to encode the
behaviour of the timers, but until then revert the addition of
CLOCK_EVT_FEAT_C3STOP.

Fixes: 232ccac1bd9b ("clocksource/drivers/riscv: Events are stopped during CPU suspend")
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Acked-by: Samuel Holland <samuel@sholland.org>
Link: https://lore.kernel.org/linux-riscv/YzYTNQRxLr7Q9JR0@spud/
Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98/
Link: https://lore.kernel.org/linux-riscv/bf6d3b1f-f703-4a25-833e-972a44a04114@sholland.org/
Link: https://lore.kernel.org/r/20221122121620.3522431-1-conor.dooley@microchip.com

---
 drivers/clocksource/timer-riscv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clocksource/timer-riscv.c b/drivers/clocksource/timer-riscv.c
index 969a552..a0d66fa 100644
--- a/drivers/clocksource/timer-riscv.c
+++ b/drivers/clocksource/timer-riscv.c
@@ -51,7 +51,7 @@ static int riscv_clock_next_event(unsigned long delta,
 static unsigned int riscv_clock_event_irq;
 static DEFINE_PER_CPU(struct clock_event_device, riscv_clock_event) = {
 	.name			= "riscv_timer_clockevent",
-	.features		= CLOCK_EVT_FEAT_ONESHOT | CLOCK_EVT_FEAT_C3STOP,
+	.features		= CLOCK_EVT_FEAT_ONESHOT,
 	.rating			= 100,
 	.set_next_event		= riscv_clock_next_event,
 };

^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-12-01 11:17 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-22 12:16 [PATCH v2] Revert "clocksource/drivers/riscv: Events are stopped during CPU suspend" Conor Dooley
2022-11-23  5:49 ` Samuel Holland
2022-11-23  9:04   ` Conor Dooley
2022-12-01 11:13 ` [tip: timers/urgent] " tip-bot2 for Conor Dooley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).