All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH] watchdog: base rate-limiting on get_ticks() rather than get_timer()
@ 2020-06-05 11:16 Rasmus Villemoes
  2020-06-05 11:48 ` Stefan Roese
  2021-04-09 14:12 ` [PATCH] powerpc, wdt: disable ratelimiting when disabling interrupts Rasmus Villemoes
  0 siblings, 2 replies; 13+ messages in thread
From: Rasmus Villemoes @ 2020-06-05 11:16 UTC (permalink / raw)
  To: u-boot

On powerpc, get_timer() is implemented using a volatile variable that
gets incremented from the decrementer interrupt handler. Hence, when
interrupts are disabled, time as measured by get_timer() ceases to
pass.

Interrupts are necessarily disabled during bootm (see the comment in
bootm_disable_interrupts() for why). But after interrupts are
disabled, there's still lots of work to do - e.g. decompressing the
kernel image to the right load address. Unfortunately, the
rate-limiting logic in wdt_uclass.c's watchdog_reset function means
that WATCHDOG_RESET() becomes a no-op, since get_timer() never
progresses past the next_reset.

This, combined with an external gpio-triggered watchdog that must be
petted at least every 800ms, means our board gets reset before booting
into linux.

Now, at least on powerpc, get_ticks() continues to return sensible
results whether or not interrupts are enabled. So this fixes the above
problem for our board - but I don't know if get_ticks() can be assumed
to always work.

Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
---
This is what I had in mind. I also considered making it a config knob
(possibly just auto-selected based on $ARCH) whether to use
get_timer() or get_ticks(), but that becomes quite ugly.

 drivers/watchdog/wdt-uclass.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/watchdog/wdt-uclass.c b/drivers/watchdog/wdt-uclass.c
index 4cdb7bd64c..7be4e9b5bc 100644
--- a/drivers/watchdog/wdt-uclass.c
+++ b/drivers/watchdog/wdt-uclass.c
@@ -16,14 +16,15 @@ DECLARE_GLOBAL_DATA_PTR;
 
 #define WATCHDOG_TIMEOUT_SECS	(CONFIG_WATCHDOG_TIMEOUT_MSECS / 1000)
 
-/*
- * Reset every 1000ms, or however often is required as indicated by a
- * hw_margin_ms property.
- */
-static ulong reset_period = 1000;
+static u64 ratelimit_ticks;
 
 int initr_watchdog(void)
 {
+	/*
+	 * Reset every 1000ms, or however often is required as indicated by a
+	 * hw_margin_ms property.
+	 */
+	u32 reset_period = 1000;
 	u32 timeout = WATCHDOG_TIMEOUT_SECS;
 
 	/*
@@ -48,6 +49,7 @@ int initr_watchdog(void)
 						    4 * reset_period) / 4;
 	}
 
+	ratelimit_ticks = usec_to_tick(reset_period * 1000);
 	wdt_start(gd->watchdog_dev, timeout * 1000, 0);
 	gd->flags |= GD_FLG_WDT_READY;
 	printf("WDT:   Started with%s servicing (%ds timeout)\n",
@@ -117,17 +119,17 @@ int wdt_expire_now(struct udevice *dev, ulong flags)
  */
 void watchdog_reset(void)
 {
-	static ulong next_reset;
-	ulong now;
+	static u64 last_reset;
+	u64 now;
 
 	/* Exit if GD is not ready or watchdog is not initialized yet */
 	if (!gd || !(gd->flags & GD_FLG_WDT_READY))
 		return;
 
 	/* Do not reset the watchdog too often */
-	now = get_timer(0);
-	if (time_after(now, next_reset)) {
-		next_reset = now + reset_period;
+	now = get_ticks();
+	if (now - last_reset >= ratelimit_ticks) {
+		last_reset = now;
 		wdt_reset(gd->watchdog_dev);
 	}
 }
-- 
2.23.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC PATCH] watchdog: base rate-limiting on get_ticks() rather than get_timer()
  2020-06-05 11:16 [RFC PATCH] watchdog: base rate-limiting on get_ticks() rather than get_timer() Rasmus Villemoes
@ 2020-06-05 11:48 ` Stefan Roese
  2020-06-05 12:11   ` Rasmus Villemoes
  2021-04-09 14:12 ` [PATCH] powerpc, wdt: disable ratelimiting when disabling interrupts Rasmus Villemoes
  1 sibling, 1 reply; 13+ messages in thread
From: Stefan Roese @ 2020-06-05 11:48 UTC (permalink / raw)
  To: u-boot

On 05.06.20 13:16, Rasmus Villemoes wrote:
> On powerpc, get_timer() is implemented using a volatile variable that
> gets incremented from the decrementer interrupt handler. Hence, when
> interrupts are disabled, time as measured by get_timer() ceases to
> pass.
> 
> Interrupts are necessarily disabled during bootm (see the comment in
> bootm_disable_interrupts() for why). But after interrupts are
> disabled, there's still lots of work to do - e.g. decompressing the
> kernel image to the right load address. Unfortunately, the
> rate-limiting logic in wdt_uclass.c's watchdog_reset function means
> that WATCHDOG_RESET() becomes a no-op, since get_timer() never
> progresses past the next_reset.
> 
> This, combined with an external gpio-triggered watchdog that must be
> petted at least every 800ms, means our board gets reset before booting
> into linux.
> 
> Now, at least on powerpc, get_ticks() continues to return sensible
> results whether or not interrupts are enabled. So this fixes the above
> problem for our board - but I don't know if get_ticks() can be assumed
> to always work.
> 
> Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
> ---
> This is what I had in mind. I also considered making it a config knob
> (possibly just auto-selected based on $ARCH) whether to use
> get_timer() or get_ticks(), but that becomes quite ugly.

I hesitate a bit, moving with this generic code from get_timer()
to get_ticks() for all boards. Did you test this patch on other
platforms, like some ARM boards?

Please note that I don't reject it - just asking.

Thanks,
Stefan

>   drivers/watchdog/wdt-uclass.c | 22 ++++++++++++----------
>   1 file changed, 12 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/watchdog/wdt-uclass.c b/drivers/watchdog/wdt-uclass.c
> index 4cdb7bd64c..7be4e9b5bc 100644
> --- a/drivers/watchdog/wdt-uclass.c
> +++ b/drivers/watchdog/wdt-uclass.c
> @@ -16,14 +16,15 @@ DECLARE_GLOBAL_DATA_PTR;
>   
>   #define WATCHDOG_TIMEOUT_SECS	(CONFIG_WATCHDOG_TIMEOUT_MSECS / 1000)
>   
> -/*
> - * Reset every 1000ms, or however often is required as indicated by a
> - * hw_margin_ms property.
> - */
> -static ulong reset_period = 1000;
> +static u64 ratelimit_ticks;
>   
>   int initr_watchdog(void)
>   {
> +	/*
> +	 * Reset every 1000ms, or however often is required as indicated by a
> +	 * hw_margin_ms property.
> +	 */
> +	u32 reset_period = 1000;
>   	u32 timeout = WATCHDOG_TIMEOUT_SECS;
>   
>   	/*
> @@ -48,6 +49,7 @@ int initr_watchdog(void)
>   						    4 * reset_period) / 4;
>   	}
>   
> +	ratelimit_ticks = usec_to_tick(reset_period * 1000);
>   	wdt_start(gd->watchdog_dev, timeout * 1000, 0);
>   	gd->flags |= GD_FLG_WDT_READY;
>   	printf("WDT:   Started with%s servicing (%ds timeout)\n",
> @@ -117,17 +119,17 @@ int wdt_expire_now(struct udevice *dev, ulong flags)
>    */
>   void watchdog_reset(void)
>   {
> -	static ulong next_reset;
> -	ulong now;
> +	static u64 last_reset;
> +	u64 now;
>   
>   	/* Exit if GD is not ready or watchdog is not initialized yet */
>   	if (!gd || !(gd->flags & GD_FLG_WDT_READY))
>   		return;
>   
>   	/* Do not reset the watchdog too often */
> -	now = get_timer(0);
> -	if (time_after(now, next_reset)) {
> -		next_reset = now + reset_period;
> +	now = get_ticks();
> +	if (now - last_reset >= ratelimit_ticks) {
> +		last_reset = now;
>   		wdt_reset(gd->watchdog_dev);
>   	}
>   }
> 


Viele Gr??e,
Stefan

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-51 Fax: (+49)-8142-66989-80 Email: sr at denx.de

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC PATCH] watchdog: base rate-limiting on get_ticks() rather than get_timer()
  2020-06-05 11:48 ` Stefan Roese
@ 2020-06-05 12:11   ` Rasmus Villemoes
  2020-06-05 12:13     ` Stefan Roese
  0 siblings, 1 reply; 13+ messages in thread
From: Rasmus Villemoes @ 2020-06-05 12:11 UTC (permalink / raw)
  To: u-boot

On 05/06/2020 13.48, Stefan Roese wrote:
> On 05.06.20 13:16, Rasmus Villemoes wrote:
>> This is what I had in mind. I also considered making it a config knob
>> (possibly just auto-selected based on $ARCH) whether to use
>> get_timer() or get_ticks(), but that becomes quite ugly.
> 
> I hesitate a bit, moving with this generic code from get_timer()
> to get_ticks() for all boards. Did you test this patch on other
> platforms, like some ARM boards?
> 
> Please note that I don't reject it - just asking.

Yeah, I'm not really too happy about it myself, exactly because it
affects all arches/platforms. And no, I don't have other hardware handy
unfortunately. So it's very much an RFC where I hope someone with
knowledge of the various arches can say whether one can expect
get_ticks() to be at least as widely available as get_timer().

Rasmus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC PATCH] watchdog: base rate-limiting on get_ticks() rather than get_timer()
  2020-06-05 12:11   ` Rasmus Villemoes
@ 2020-06-05 12:13     ` Stefan Roese
  2020-06-05 13:37       ` Stefan Roese
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Roese @ 2020-06-05 12:13 UTC (permalink / raw)
  To: u-boot

On 05.06.20 14:11, Rasmus Villemoes wrote:
> On 05/06/2020 13.48, Stefan Roese wrote:
>> On 05.06.20 13:16, Rasmus Villemoes wrote:
>>> This is what I had in mind. I also considered making it a config knob
>>> (possibly just auto-selected based on $ARCH) whether to use
>>> get_timer() or get_ticks(), but that becomes quite ugly.
>>
>> I hesitate a bit, moving with this generic code from get_timer()
>> to get_ticks() for all boards. Did you test this patch on other
>> platforms, like some ARM boards?
>>
>> Please note that I don't reject it - just asking.
> 
> Yeah, I'm not really too happy about it myself, exactly because it
> affects all arches/platforms. And no, I don't have other hardware handy
> unfortunately. So it's very much an RFC where I hope someone with
> knowledge of the various arches can say whether one can expect
> get_ticks() to be at least as widely available as get_timer().

I'll try to test it on a non-powerpc platform soon'ish.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC PATCH] watchdog: base rate-limiting on get_ticks() rather than get_timer()
  2020-06-05 12:13     ` Stefan Roese
@ 2020-06-05 13:37       ` Stefan Roese
  2020-06-05 14:18         ` Rasmus Villemoes
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Roese @ 2020-06-05 13:37 UTC (permalink / raw)
  To: u-boot

On 05.06.20 14:13, Stefan Roese wrote:
> On 05.06.20 14:11, Rasmus Villemoes wrote:
>> On 05/06/2020 13.48, Stefan Roese wrote:
>>> On 05.06.20 13:16, Rasmus Villemoes wrote:
>>>> This is what I had in mind. I also considered making it a config knob
>>>> (possibly just auto-selected based on $ARCH) whether to use
>>>> get_timer() or get_ticks(), but that becomes quite ugly.
>>>
>>> I hesitate a bit, moving with this generic code from get_timer()
>>> to get_ticks() for all boards. Did you test this patch on other
>>> platforms, like some ARM boards?
>>>
>>> Please note that I don't reject it - just asking.
>>
>> Yeah, I'm not really too happy about it myself, exactly because it
>> affects all arches/platforms. And no, I don't have other hardware handy
>> unfortunately. So it's very much an RFC where I hope someone with
>> knowledge of the various arches can say whether one can expect
>> get_ticks() to be at least as widely available as get_timer().
> 
> I'll try to test it on a non-powerpc platform soon'ish.

I've tested it on an MIPS based platform (gardena-smart-gateway-mt7688)
and it works there without any issues AFAICT.

Two more remarks:

Could you please run a build test with this patch applied for all
boards (Travis, Azure...)?

And do you see an issue, that the timer-wrap fixed by Chris with this
patch [1] will not re-occur with your "ticks" approach?

Thanks,
Stefan

[1]
Author: Chris Packham <judge.packham@gmail.com>  2020-02-24 01:20:33
Committer: Stefan Roese <sr@denx.de>  2020-03-16 11:25:12
Parent: db41d65a97f335167e1fbc0400a83333b5157703 (common: Move hang() to 
the same header as panic())
Child:  b4d9452c442769e6dc25649ac02db2d5ed5ea0c8 (watchdog: move 
initr_watchdog() to wdt-uclass.c)
Branches: master, watchdog-ratelimit-2020-06-05-rfc and many more (84)
Follows: v2020.04-rc3
Precedes: v2020.04-rc4

     watchdog: Handle timer wrap around

     On some platforms/architectures the value from get_timer() can wrap.
     This is particularly problematic when long-running code needs to 
measure
     a time difference as is the case with watchdog_reset() which tries to
     avoid tickling the watchdog too frequently.

     Use time_after() from time.h instead of a plain > comparison to avoid
     any issues with the time wrapping on a system that has been sitting in
     u-boot for a long time.

     Signed-off-by: Chris Packham <judge.packham@gmail.com>
     Reviewed-by: Stefan Roese <sr@denx.de>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC PATCH] watchdog: base rate-limiting on get_ticks() rather than get_timer()
  2020-06-05 13:37       ` Stefan Roese
@ 2020-06-05 14:18         ` Rasmus Villemoes
  2020-06-05 14:23           ` Stefan Roese
  0 siblings, 1 reply; 13+ messages in thread
From: Rasmus Villemoes @ 2020-06-05 14:18 UTC (permalink / raw)
  To: u-boot

On 05/06/2020 15.37, Stefan Roese wrote:
> On 05.06.20 14:13, Stefan Roese wrote:
>> On 05.06.20 14:11, Rasmus Villemoes wrote:
>>> On 05/06/2020 13.48, Stefan Roese wrote:
>>>> On 05.06.20 13:16, Rasmus Villemoes wrote:
>>>>> This is what I had in mind. I also considered making it a config knob
>>>>> (possibly just auto-selected based on $ARCH) whether to use
>>>>> get_timer() or get_ticks(), but that becomes quite ugly.
>>>>
>>>> I hesitate a bit, moving with this generic code from get_timer()
>>>> to get_ticks() for all boards. Did you test this patch on other
>>>> platforms, like some ARM boards?
>>>>
>>>> Please note that I don't reject it - just asking.
>>>
>>> Yeah, I'm not really too happy about it myself, exactly because it
>>> affects all arches/platforms. And no, I don't have other hardware handy
>>> unfortunately. So it's very much an RFC where I hope someone with
>>> knowledge of the various arches can say whether one can expect
>>> get_ticks() to be at least as widely available as get_timer().
>>
>> I'll try to test it on a non-powerpc platform soon'ish.
> 
> I've tested it on an MIPS based platform (gardena-smart-gateway-mt7688)
> and it works there without any issues AFAICT.

Thanks.

> Two more remarks:
> 
> Could you please run a build test with this patch applied for all
> boards (Travis, Azure...)?

I may need a few more pointers than that. What am I supposed to do exactly?

> And do you see an issue, that the timer-wrap fixed by Chris with this
> patch [1] will not re-occur with your "ticks" approach?

No, it will not re-occur, because this does the comparison the right
way, namely by subtracting the absolute times and comparing that delta
to the threshold (that's also what the time_after does under the hood).
The wrong way is saying now >= last_reset + threshold (or as I think the
code used to do, computing next_reset = now + threshold and then asking
now >= next_reset).

Rasmus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC PATCH] watchdog: base rate-limiting on get_ticks() rather than get_timer()
  2020-06-05 14:18         ` Rasmus Villemoes
@ 2020-06-05 14:23           ` Stefan Roese
  2020-06-05 14:34             ` Tom Rini
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Roese @ 2020-06-05 14:23 UTC (permalink / raw)
  To: u-boot

On 05.06.20 16:18, Rasmus Villemoes wrote:
> On 05/06/2020 15.37, Stefan Roese wrote:
>> On 05.06.20 14:13, Stefan Roese wrote:
>>> On 05.06.20 14:11, Rasmus Villemoes wrote:
>>>> On 05/06/2020 13.48, Stefan Roese wrote:
>>>>> On 05.06.20 13:16, Rasmus Villemoes wrote:
>>>>>> This is what I had in mind. I also considered making it a config knob
>>>>>> (possibly just auto-selected based on $ARCH) whether to use
>>>>>> get_timer() or get_ticks(), but that becomes quite ugly.
>>>>>
>>>>> I hesitate a bit, moving with this generic code from get_timer()
>>>>> to get_ticks() for all boards. Did you test this patch on other
>>>>> platforms, like some ARM boards?
>>>>>
>>>>> Please note that I don't reject it - just asking.
>>>>
>>>> Yeah, I'm not really too happy about it myself, exactly because it
>>>> affects all arches/platforms. And no, I don't have other hardware handy
>>>> unfortunately. So it's very much an RFC where I hope someone with
>>>> knowledge of the various arches can say whether one can expect
>>>> get_ticks() to be at least as widely available as get_timer().
>>>
>>> I'll try to test it on a non-powerpc platform soon'ish.
>>
>> I've tested it on an MIPS based platform (gardena-smart-gateway-mt7688)
>> and it works there without any issues AFAICT.
> 
> Thanks.
> 
>> Two more remarks:
>>
>> Could you please run a build test with this patch applied for all
>> boards (Travis, Azure...)?
> 
> I may need a few more pointers than that. What am I supposed to do exactly?

Not sure if there is some documentation on how to use the Travis,
Gitlab or Azure build. It mainly uses buildman [1] internally, so you
can use buildman locally if you don't want to push the build into
the cloud.

>> And do you see an issue, that the timer-wrap fixed by Chris with this
>> patch [1] will not re-occur with your "ticks" approach?
> 
> No, it will not re-occur, because this does the comparison the right
> way, namely by subtracting the absolute times and comparing that delta
> to the threshold (that's also what the time_after does under the hood).
> The wrong way is saying now >= last_reset + threshold (or as I think the
> code used to do, computing next_reset = now + threshold and then asking
> now >= next_reset).

Ah, good to know. Thanks for confirming.

Thanks,
Stefan

[1] tools/buildman/README

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC PATCH] watchdog: base rate-limiting on get_ticks() rather than get_timer()
  2020-06-05 14:23           ` Stefan Roese
@ 2020-06-05 14:34             ` Tom Rini
  2020-06-05 19:56               ` Rasmus Villemoes
  0 siblings, 1 reply; 13+ messages in thread
From: Tom Rini @ 2020-06-05 14:34 UTC (permalink / raw)
  To: u-boot

On Fri, Jun 05, 2020 at 04:23:21PM +0200, Stefan Roese wrote:
> On 05.06.20 16:18, Rasmus Villemoes wrote:
> > On 05/06/2020 15.37, Stefan Roese wrote:
> > > On 05.06.20 14:13, Stefan Roese wrote:
> > > > On 05.06.20 14:11, Rasmus Villemoes wrote:
> > > > > On 05/06/2020 13.48, Stefan Roese wrote:
> > > > > > On 05.06.20 13:16, Rasmus Villemoes wrote:
> > > > > > > This is what I had in mind. I also considered making it a config knob
> > > > > > > (possibly just auto-selected based on $ARCH) whether to use
> > > > > > > get_timer() or get_ticks(), but that becomes quite ugly.
> > > > > > 
> > > > > > I hesitate a bit, moving with this generic code from get_timer()
> > > > > > to get_ticks() for all boards. Did you test this patch on other
> > > > > > platforms, like some ARM boards?
> > > > > > 
> > > > > > Please note that I don't reject it - just asking.
> > > > > 
> > > > > Yeah, I'm not really too happy about it myself, exactly because it
> > > > > affects all arches/platforms. And no, I don't have other hardware handy
> > > > > unfortunately. So it's very much an RFC where I hope someone with
> > > > > knowledge of the various arches can say whether one can expect
> > > > > get_ticks() to be at least as widely available as get_timer().
> > > > 
> > > > I'll try to test it on a non-powerpc platform soon'ish.
> > > 
> > > I've tested it on an MIPS based platform (gardena-smart-gateway-mt7688)
> > > and it works there without any issues AFAICT.
> > 
> > Thanks.
> > 
> > > Two more remarks:
> > > 
> > > Could you please run a build test with this patch applied for all
> > > boards (Travis, Azure...)?
> > 
> > I may need a few more pointers than that. What am I supposed to do exactly?
> 
> Not sure if there is some documentation on how to use the Travis,
> Gitlab or Azure build. It mainly uses buildman [1] internally, so you
> can use buildman locally if you don't want to push the build into
> the cloud.

So, here's a recently-I-learned.  If you submit a pull request on GitHub
to https://github.com/u-boot/u-boot that will trigger Travis and Azure
runs:
https://travis-ci.org/github/u-boot/u-boot/pull_requests
and somewhere under:
https://dev.azure.com/u-boot/u-boot/_build?definitionId=2&_a=summary

Note that sometimes Travis will have connectivity failures and you have
to be me to tell it to re-run the job.  Azure only has failures when we
get those to-be-debugged random failures of sandbox/qemu targets.

-- 
Tom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 659 bytes
Desc: not available
URL: <https://lists.denx.de/pipermail/u-boot/attachments/20200605/43aad7cb/attachment.sig>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC PATCH] watchdog: base rate-limiting on get_ticks() rather than get_timer()
  2020-06-05 14:34             ` Tom Rini
@ 2020-06-05 19:56               ` Rasmus Villemoes
  0 siblings, 0 replies; 13+ messages in thread
From: Rasmus Villemoes @ 2020-06-05 19:56 UTC (permalink / raw)
  To: u-boot

On 05/06/2020 16.34, Tom Rini wrote:
> On Fri, Jun 05, 2020 at 04:23:21PM +0200, Stefan Roese wrote:
>> On 05.06.20 16:18, Rasmus Villemoes wrote:
>>> On 05/06/2020 15.37, Stefan Roese wrote:
>>>> Could you please run a build test with this patch applied for all
>>>> boards (Travis, Azure...)?
>>>
>>> I may need a few more pointers than that. What am I supposed to do exactly?
>>
>> Not sure if there is some documentation on how to use the Travis,
>> Gitlab or Azure build. It mainly uses buildman [1] internally, so you
>> can use buildman locally if you don't want to push the build into
>> the cloud.
> 
> So, here's a recently-I-learned.  If you submit a pull request on GitHub
> to https://github.com/u-boot/u-boot that will trigger Travis and Azure
> runs:
> https://travis-ci.org/github/u-boot/u-boot/pull_requests
> and somewhere under:
> https://dev.azure.com/u-boot/u-boot/_build?definitionId=2&_a=summary
> 
> Note that sometimes Travis will have connectivity failures and you have
> to be me to tell it to re-run the job.  Azure only has failures when we
> get those to-be-debugged random failures of sandbox/qemu targets.
> 

Thanks! FWIW,
https://travis-ci.org/github/u-boot/u-boot/builds/695074829 and
https://dev.azure.com/u-boot/u-boot/_build/results?buildId=713&view=results
are all green - but how many of those actually build wdt-uclass.c I
don't know.

Rasmus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] powerpc, wdt: disable ratelimiting when disabling interrupts
  2020-06-05 11:16 [RFC PATCH] watchdog: base rate-limiting on get_ticks() rather than get_timer() Rasmus Villemoes
  2020-06-05 11:48 ` Stefan Roese
@ 2021-04-09 14:12 ` Rasmus Villemoes
  2021-04-09 14:37   ` Christophe Leroy
  1 sibling, 1 reply; 13+ messages in thread
From: Rasmus Villemoes @ 2021-04-09 14:12 UTC (permalink / raw)
  To: u-boot

On powerpc, time as measured by get_timer() ceases to pass when
interrupts are disabled (since on powerpc get_timer() returns the
value of a volatile variable that gets updated via a timer
interrupt). That in turn means the watchdog_reset() function provided
by CONFIG_WDT ceases to work due to the ratelimiting it imposes.

Normally, interrupts are just disabled very briefly. However, during
bootm, they are disabled for good prior to decompressing the kernel
image, which can be a somewhat time-consuming operation. Even when we
manage to decompress the kernel and do the other preparation steps and
hand over control to the kernel, the kernel also takes some time
before it is ready to assume responsibility for handling the
watchdog. The end result is that the board gets reset prematurely.

The ratelimiting isn't really strictly needed (prior to DM WDT, no
such thing existed), so just disable it when we know that time no
longer passes and have watchdog_reset() (e.g. called from
decompression loop) unconditionally reset the watchdog timer.

Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
---

I previously sent a patch to change the ratelimiting to be based on
get_ticks() instead of get_timer(), but that has gone nowhere
[1]. This is an alternative which only affects powerpc (and only
boards that have enabled CONFIG_WDT). I hope the watchdog maintainers
will accept at least one of these, or suggest a third alternative, so
I don't have to keep some out-of-tree patch applied without knowing if
that's the direction upstream will take.

[1] https://patchwork.ozlabs.org/project/uboot/patch/20200605111657.28773-1-rasmus.villemoes at prevas.dk/



 arch/powerpc/lib/interrupts.c | 3 +++
 drivers/watchdog/wdt-uclass.c | 8 +++++++-
 include/wdt.h                 | 6 ++++++
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/lib/interrupts.c b/arch/powerpc/lib/interrupts.c
index 73f270002c..5c5b5fd7ff 100644
--- a/arch/powerpc/lib/interrupts.c
+++ b/arch/powerpc/lib/interrupts.c
@@ -11,6 +11,7 @@
 #include <irq_func.h>
 #include <asm/processor.h>
 #include <watchdog.h>
+#include <wdt.h>
 #ifdef CONFIG_LED_STATUS
 #include <status_led.h>
 #endif
@@ -43,6 +44,7 @@ static __inline__ void set_dec (unsigned long val)
 void enable_interrupts(void)
 {
 	set_msr (get_msr () | MSR_EE);
+	watchdog_ratelimit(1);
 }
 
 /* returns flag if MSR_EE was set before */
@@ -50,6 +52,7 @@ int disable_interrupts(void)
 {
 	ulong msr = get_msr ();
 
+	watchdog_ratelimit(0);
 	set_msr (msr & ~MSR_EE);
 	return ((msr & MSR_EE) != 0);
 }
diff --git a/drivers/watchdog/wdt-uclass.c b/drivers/watchdog/wdt-uclass.c
index 0603ffbd36..b70a9d50b8 100644
--- a/drivers/watchdog/wdt-uclass.c
+++ b/drivers/watchdog/wdt-uclass.c
@@ -131,6 +131,12 @@ int wdt_expire_now(struct udevice *dev, ulong flags)
 	return ret;
 }
 
+static int ratelimit = 1;
+void watchdog_ratelimit(int on)
+{
+	ratelimit = on;
+}
+
 #if defined(CONFIG_WATCHDOG)
 /*
  * Called by macro WATCHDOG_RESET. This function be called *very* early,
@@ -148,7 +154,7 @@ void watchdog_reset(void)
 
 	/* Do not reset the watchdog too often */
 	now = get_timer(0);
-	if (time_after(now, next_reset)) {
+	if (!ratelimit || time_after(now, next_reset)) {
 		next_reset = now + reset_period;
 		wdt_reset(gd->watchdog_dev);
 	}
diff --git a/include/wdt.h b/include/wdt.h
index bc242c2eb2..9ba1e62dcf 100644
--- a/include/wdt.h
+++ b/include/wdt.h
@@ -107,4 +107,10 @@ struct wdt_ops {
 
 int initr_watchdog(void);
 
+#if CONFIG_IS_ENABLED(WDT)
+void watchdog_ratelimit(int on);
+#else
+static inline void watchdog_ratelimit(int on) { }
+#endif
+
 #endif  /* _WDT_H_ */
-- 
2.29.2

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH] powerpc, wdt: disable ratelimiting when disabling interrupts
  2021-04-09 14:12 ` [PATCH] powerpc, wdt: disable ratelimiting when disabling interrupts Rasmus Villemoes
@ 2021-04-09 14:37   ` Christophe Leroy
  2021-04-12  7:04     ` Rasmus Villemoes
  0 siblings, 1 reply; 13+ messages in thread
From: Christophe Leroy @ 2021-04-09 14:37 UTC (permalink / raw)
  To: u-boot



Le 09/04/2021 ? 16:12, Rasmus Villemoes a ?crit?:
> On powerpc, time as measured by get_timer() ceases to pass when
> interrupts are disabled (since on powerpc get_timer() returns the
> value of a volatile variable that gets updated via a timer
> interrupt). That in turn means the watchdog_reset() function provided
> by CONFIG_WDT ceases to work due to the ratelimiting it imposes.
> 
> Normally, interrupts are just disabled very briefly. However, during
> bootm, they are disabled for good prior to decompressing the kernel
> image, which can be a somewhat time-consuming operation. Even when we
> manage to decompress the kernel and do the other preparation steps and
> hand over control to the kernel, the kernel also takes some time
> before it is ready to assume responsibility for handling the
> watchdog. The end result is that the board gets reset prematurely.
> 
> The ratelimiting isn't really strictly needed (prior to DM WDT, no
> such thing existed), so just disable it when we know that time no
> longer passes and have watchdog_reset() (e.g. called from
> decompression loop) unconditionally reset the watchdog timer.


Do we need to make it that complicated ? I think before the generic implementation, powerpc didn't 
have a rate limitation at all for pinging the watchdog, why not go back this direction, all the time ?

I mean we could simply set reset_period to 0 at all time for powerpc ( and change the test to 
time_after_eq() instead of time_after() ).


> 
> Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
> ---
> 
> I previously sent a patch to change the ratelimiting to be based on
> get_ticks() instead of get_timer(), but that has gone nowhere
> [1]. This is an alternative which only affects powerpc (and only
> boards that have enabled CONFIG_WDT). I hope the watchdog maintainers
> will accept at least one of these, or suggest a third alternative, so
> I don't have to keep some out-of-tree patch applied without knowing if
> that's the direction upstream will take.
> 
> [1] https://patchwork.ozlabs.org/project/uboot/patch/20200605111657.28773-1-rasmus.villemoes at prevas.dk/
> 
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] powerpc, wdt: disable ratelimiting when disabling interrupts
  2021-04-09 14:37   ` Christophe Leroy
@ 2021-04-12  7:04     ` Rasmus Villemoes
  2021-04-12  8:57       ` Stefan Roese
  0 siblings, 1 reply; 13+ messages in thread
From: Rasmus Villemoes @ 2021-04-12  7:04 UTC (permalink / raw)
  To: u-boot

On 09/04/2021 16.37, Christophe Leroy wrote:
> 
> 
> Le 09/04/2021 ? 16:12, Rasmus Villemoes a ?crit?:

>> The ratelimiting isn't really strictly needed (prior to DM WDT, no
>> such thing existed), so just disable it when we know that time no
>> longer passes and have watchdog_reset() (e.g. called from
>> decompression loop) unconditionally reset the watchdog timer.
> 
> 
> Do we need to make it that complicated ? I think before the generic
> implementation, powerpc didn't have a rate limitation at all for pinging
> the watchdog, why not go back this direction, all the time ?
> 
> I mean we could simply set reset_period to 0 at all time for powerpc (
> and change the test to time_after_eq() instead of time_after() ).

Maybe. I think I'd still prefer the one that changes the rate-limiting
to be based on get_ticks() instead of get_timer(), which did seem to
work everywhere.

IIRC, Stefan previously rejected having a config knob or board hook for
disabling the ratelimiting - changing the code [the s/after/after_eq/]
to effectively allow that by passing hw_margin_ms=0 in DT would probably
work, but seems quite hacky (especially when DT is supposed to describe
hardware).

I've also floated the option of making get_timer() return something
sensible even with interrupts off on ppc - that would also solve the
problem, and would benefit other generic code that uses get_timer()
without knowing its limitation on ppc.
https://lists.denx.de/pipermail/u-boot/2020-June/414842.html .

So, there are lots of ways of solving it. Which direction to take really
depends on which viewpoint one has:

(1) the watchdog code should, if rate-limiting at all, use an
implementation that works on all platforms [the get_ticks option]
(2) the watchdog code should, if rate-limiting at all, provide a way for
a board/platform to override or disable that
(3) get_timer() on ppc is broken, it needs to provide sensible results
at all times, even when interrupts are disabled for tens or hundreds of
milliseconds.

Rasmus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] powerpc, wdt: disable ratelimiting when disabling interrupts
  2021-04-12  7:04     ` Rasmus Villemoes
@ 2021-04-12  8:57       ` Stefan Roese
  0 siblings, 0 replies; 13+ messages in thread
From: Stefan Roese @ 2021-04-12  8:57 UTC (permalink / raw)
  To: u-boot

On 12.04.21 09:04, Rasmus Villemoes wrote:
> On 09/04/2021 16.37, Christophe Leroy wrote:
>>
>>
>> Le 09/04/2021 ? 16:12, Rasmus Villemoes a ?crit?:
> 
>>> The ratelimiting isn't really strictly needed (prior to DM WDT, no
>>> such thing existed), so just disable it when we know that time no
>>> longer passes and have watchdog_reset() (e.g. called from
>>> decompression loop) unconditionally reset the watchdog timer.
>>
>>
>> Do we need to make it that complicated ? I think before the generic
>> implementation, powerpc didn't have a rate limitation at all for pinging
>> the watchdog, why not go back this direction, all the time ?
>>
>> I mean we could simply set reset_period to 0 at all time for powerpc (
>> and change the test to time_after_eq() instead of time_after() ).
> 
> Maybe. I think I'd still prefer the one that changes the rate-limiting
> to be based on get_ticks() instead of get_timer(), which did seem to
> work everywhere.
> 
> IIRC, Stefan previously rejected having a config knob or board hook for
> disabling the ratelimiting - changing the code [the s/after/after_eq/]
> to effectively allow that by passing hw_margin_ms=0 in DT would probably
> work, but seems quite hacky (especially when DT is supposed to describe
> hardware).

I'm not sure if I explicitly rejected any patches, sorry it's been too
long. ;)

But I also find this idea from Christophe quite applealing, as it's
impact on the common code is minimal. Either you set the DT property
to 0. Or if you don't have the possibility to do this (no DT on your
platform?), then add a new Kconfig option to configure the default
value for "reset_period" via Kconfig.

> I've also floated the option of making get_timer() return something
> sensible even with interrupts off on ppc - that would also solve the
> problem, and would benefit other generic code that uses get_timer()
> without knowing its limitation on ppc.
> https://lists.denx.de/pipermail/u-boot/2020-June/414842.html .
> 
> So, there are lots of ways of solving it. Which direction to take really
> depends on which viewpoint one has:
> 
> (1) the watchdog code should, if rate-limiting at all, use an
> implementation that works on all platforms [the get_ticks option]
> (2) the watchdog code should, if rate-limiting at all, provide a way for
> a board/platform to override or disable that
> (3) get_timer() on ppc is broken, it needs to provide sensible results
> at all times, even when interrupts are disabled for tens or hundreds of
> milliseconds.

"hw_margin_ms=0" would be my favorite way right now (see above). If this
does not work for you, I'm also fine with the latest patch you sent to
the list.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-04-12  8:57 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-05 11:16 [RFC PATCH] watchdog: base rate-limiting on get_ticks() rather than get_timer() Rasmus Villemoes
2020-06-05 11:48 ` Stefan Roese
2020-06-05 12:11   ` Rasmus Villemoes
2020-06-05 12:13     ` Stefan Roese
2020-06-05 13:37       ` Stefan Roese
2020-06-05 14:18         ` Rasmus Villemoes
2020-06-05 14:23           ` Stefan Roese
2020-06-05 14:34             ` Tom Rini
2020-06-05 19:56               ` Rasmus Villemoes
2021-04-09 14:12 ` [PATCH] powerpc, wdt: disable ratelimiting when disabling interrupts Rasmus Villemoes
2021-04-09 14:37   ` Christophe Leroy
2021-04-12  7:04     ` Rasmus Villemoes
2021-04-12  8:57       ` Stefan Roese

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.