linux-kernel.vger.kernel.org archive mirror
* [PATCH] RFC: arm64: arch_timer: Fix timer inconsistency test for A64
@ 2020-09-29 11:13 Roman Stratiienko
  2020-09-29 11:35 ` Ondřej Jirman
  2020-09-29 15:40 ` Mark Rutland
  0 siblings, 2 replies; 4+ messages in thread
From: Roman Stratiienko @ 2020-09-29 11:13 UTC
  To: linux-sunxi; +Cc: Roman Stratiienko, linux-arm-kernel, linux-kernel, megous

Fixes linux_kselftest:timers_inconsistency-check_arm_64

Test logs without the fix:
'''
binary returned non-zero. Exit code: 1, stderr: , stdout:
Consistent CLOCK_REALTIME
1601335525:467086804
1601335525:467087554
1601335525:467088345
1601335525:467089095
1601335525:467089887
1601335525:467090637
1601335525:467091429
1601335525:467092179
1601335525:467092929
1601335525:467093720
1601335525:467094470
1601335525:467095262
1601335525:467096012
1601335525:467096804
--------------------
1601335525:467097554
1601335525:467077012
--------------------
1601335525:467099095
1601335525:467099845
1601335525:467100637
1601335525:467101387
1601335525:467102179
1601335525:467102929
'''

Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com>
CC: linux-arm-kernel@lists.infradead.org
CC: linux-kernel@vger.kernel.org
CC: linux-sunxi@googlegroups.com
CC: megous@megous.com
---
 drivers/clocksource/arm_arch_timer.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index 6c3e841801461..d50aa43cb654b 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -346,16 +346,17 @@ static u64 notrace arm64_858921_read_cntvct_el0(void)
  * number of CPU cycles in 3 consecutive 24 MHz counter periods.
  */
 #define __sun50i_a64_read_reg(reg) ({					\
-	u64 _val;							\
+	u64 _val1, _val2;						\
 	int _retries = 150;						\
 									\
 	do {								\
-		_val = read_sysreg(reg);				\
+		_val1 = read_sysreg(reg);				\
+		_val2 = read_sysreg(reg);				\
 		_retries--;						\
-	} while (((_val + 1) & GENMASK(9, 0)) <= 1 && _retries);	\
+	} while (((_val2 - _val1) > 0x10) && _retries);			\
 									\
 	WARN_ON_ONCE(!_retries);					\
-	_val;								\
+	_val2;								\
 })
 
 static u64 notrace sun50i_a64_read_cntpct_el0(void)
-- 
2.25.1
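
For readers comparing the two loop conditions in the diff above, here is a
minimal userspace sketch of both predicates. The helper names
old_should_retry() and new_should_retry() and the two macros are hypothetical;
only the expressions come from the patch, with GENMASK(9, 0) written out as
0x3ff. It models the retry decision only, not the hardware behaviour.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; values mirror the expressions in the diff. */
#define LOW10_MASK 0x3ffULL	/* same bits as the kernel's GENMASK(9, 0) */
#define MAX_DELTA  0x10ULL	/* threshold used by the proposed condition */

/*
 * Existing condition: retry when the low ten bits of the value read
 * are all ones (0x3ff) or all zeros (0x000).
 */
static bool old_should_retry(uint64_t val)
{
	return ((val + 1) & LOW10_MASK) <= 1;
}

/*
 * Proposed condition: retry when two back-to-back reads differ by more
 * than MAX_DELTA ticks. The subtraction is unsigned, so a second read
 * that is smaller than the first wraps to a huge value and also retries.
 */
static bool new_should_retry(uint64_t val1, uint64_t val2)
{
	return (val2 - val1) > MAX_DELTA;
}
```

Because the proposed check depends on how many ticks elapse between the two
reads, its behaviour is tied to the ratio between CPU and counter frequency,
which the existing check is not.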



* Re: [PATCH] RFC: arm64: arch_timer: Fix timer inconsistency test for A64
  2020-09-29 11:13 [PATCH] RFC: arm64: arch_timer: Fix timer inconsistency test for A64 Roman Stratiienko
@ 2020-09-29 11:35 ` Ondřej Jirman
  2020-09-29 15:40 ` Mark Rutland
  1 sibling, 0 replies; 4+ messages in thread
From: Ondřej Jirman @ 2020-09-29 11:35 UTC
  To: Roman Stratiienko; +Cc: linux-sunxi, linux-arm-kernel, linux-kernel

Hello Roman,

On Tue, Sep 29, 2020 at 02:13:47PM +0300, Roman Stratiienko wrote:
> Fixes linux_kselftest:timers_inconsistency-check_arm_64
> 
> Test logs without the fix:
> '''
> binary returned non-zero. Exit code: 1, stderr: , stdout:
> Consistent CLOCK_REALTIME
> 1601335525:467086804
> 1601335525:467087554
> 1601335525:467088345
> 1601335525:467089095
> 1601335525:467089887
> 1601335525:467090637
> 1601335525:467091429
> 1601335525:467092179
> 1601335525:467092929
> 1601335525:467093720
> 1601335525:467094470
> 1601335525:467095262
> 1601335525:467096012
> 1601335525:467096804
> --------------------
> 1601335525:467097554
> 1601335525:467077012
> --------------------
> 1601335525:467099095
> 1601335525:467099845
> 1601335525:467100637
> 1601335525:467101387
> 1601335525:467102179
> 1601335525:467102929
> '''

Can you reproduce the issue with a fixed CPU frequency? I suspect the root
cause is in the CPU frequency scaling code on A64, and that the timer jumps
happen when the kernel is changing the CPU frequency.

I fixed a similar issue on H3 SoC just by changing the CPU frequency scaling
code, without having to touch the timer readout code.

https://megous.com/git/linux/commit/?h=ths-5.9&id=51ff1a6d80126f678efca42555f93efa611f50c4

regards,
	o.

> Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com>
> CC: linux-arm-kernel@lists.infradead.org
> CC: linux-kernel@vger.kernel.org
> CC: linux-sunxi@googlegroups.com
> CC: megous@megous.com
> ---
>  drivers/clocksource/arm_arch_timer.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
> index 6c3e841801461..d50aa43cb654b 100644
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -346,16 +346,17 @@ static u64 notrace arm64_858921_read_cntvct_el0(void)
>   * number of CPU cycles in 3 consecutive 24 MHz counter periods.
>   */
>  #define __sun50i_a64_read_reg(reg) ({					\
> -	u64 _val;							\
> +	u64 _val1, _val2;						\
>  	int _retries = 150;						\
>  									\
>  	do {								\
> -		_val = read_sysreg(reg);				\
> +		_val1 = read_sysreg(reg);				\
> +		_val2 = read_sysreg(reg);				\
>  		_retries--;						\
> -	} while (((_val + 1) & GENMASK(9, 0)) <= 1 && _retries);	\
> +	} while (((_val2 - _val1) > 0x10) && _retries);			\
>  									\
>  	WARN_ON_ONCE(!_retries);					\
> -	_val;								\
> +	_val2;								\
>  })
>  
>  static u64 notrace sun50i_a64_read_cntpct_el0(void)
> -- 
> 2.25.1
> 


* Re: [PATCH] RFC: arm64: arch_timer: Fix timer inconsistency test for A64
  2020-09-29 11:13 [PATCH] RFC: arm64: arch_timer: Fix timer inconsistency test for A64 Roman Stratiienko
  2020-09-29 11:35 ` Ondřej Jirman
@ 2020-09-29 15:40 ` Mark Rutland
  2020-09-30  1:00   ` Samuel Holland
  1 sibling, 1 reply; 4+ messages in thread
From: Mark Rutland @ 2020-09-29 15:40 UTC
  To: Roman Stratiienko
  Cc: linux-sunxi, megous, linux-kernel, linux-arm-kernel, maz

Hi,

Please Cc maintainers for drivers -- Marc and I maintain the arch timer
driver.

On Tue, Sep 29, 2020 at 02:13:47PM +0300, Roman Stratiienko wrote:
> Fixes linux_kselftest:timers_inconsistency-check_arm_64
> 
> Test logs without the fix:
> '''
> binary returned non-zero. Exit code: 1, stderr: , stdout:
> Consistent CLOCK_REALTIME
> 1601335525:467086804
> 1601335525:467087554
> 1601335525:467088345
> 1601335525:467089095
> 1601335525:467089887
> 1601335525:467090637
> 1601335525:467091429
> 1601335525:467092179
> 1601335525:467092929
> 1601335525:467093720
> 1601335525:467094470
> 1601335525:467095262
> 1601335525:467096012
> 1601335525:467096804
> --------------------
> 1601335525:467097554
> 1601335525:467077012

That's 0x1BD757D2 followed by 0x1BD70794. The rollback is somewhere in
bits 15:12 to go from 0x1BD75xxx to 0x1BD70xxx, which suggests the
analysis in the existing comment is incomplete.
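
The hex conversion above can be checked with a short userspace sketch. The
inputs are the two nanosecond fields from the log (467097554 and 467077012);
the helper names are hypothetical and nothing here touches the real counter.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Index of the highest bit that differs between a and b, or -1 if equal. */
static int highest_differing_bit(uint64_t a, uint64_t b)
{
	uint64_t diff = a ^ b;
	int bit = -1;

	while (diff) {
		diff >>= 1;
		bit++;
	}
	return bit;
}

/* Print both samples in hex together with the bits that changed. */
static void print_rollback(uint64_t before, uint64_t after)
{
	printf("before=0x%08" PRIX64 " after=0x%08" PRIX64
	       " xor=0x%04" PRIX64 " top differing bit=%d\n",
	       before, after, before ^ after,
	       highest_differing_bit(before, after));
}
```

Calling print_rollback(467097554, 467077012) shows the two values as
0x1BD757D2 and 0x1BD70794; the highest bit that differs is bit 14, which lies
inside the 15:12 field mentioned above.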

> --------------------
> 1601335525:467099095
> 1601335525:467099845
> 1601335525:467100637
> 1601335525:467101387
> 1601335525:467102179
> 1601335525:467102929
> '''

It would be very helpful if the commit message could explain the rough
idea behind the change, because the rationale is not clear to me.

> Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com>
> CC: linux-arm-kernel@lists.infradead.org
> CC: linux-kernel@vger.kernel.org
> CC: linux-sunxi@googlegroups.com
> CC: megous@megous.com
> ---
>  drivers/clocksource/arm_arch_timer.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
> index 6c3e841801461..d50aa43cb654b 100644
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -346,16 +346,17 @@ static u64 notrace arm64_858921_read_cntvct_el0(void)
>   * number of CPU cycles in 3 consecutive 24 MHz counter periods.
>   */
>  #define __sun50i_a64_read_reg(reg) ({					\
> -	u64 _val;							\
> +	u64 _val1, _val2;						\
>  	int _retries = 150;						\
>  									\
>  	do {								\
> -		_val = read_sysreg(reg);				\
> +		_val1 = read_sysreg(reg);				\
> +		_val2 = read_sysreg(reg);				\
>  		_retries--;						\
> -	} while (((_val + 1) & GENMASK(9, 0)) <= 1 && _retries);	\
> +	} while (((_val2 - _val1) > 0x10) && _retries);			\

This is going to fail quite often at low CPU frequencies, and it's not
clear to me that this solves the problem any more generally. Do we know
what the underlying erratum is here?

Thanks,
Mark.

>  									\
>  	WARN_ON_ONCE(!_retries);					\
> -	_val;								\
> +	_val2;								\
>  })
>  
>  static u64 notrace sun50i_a64_read_cntpct_el0(void)
> -- 
> 2.25.1
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re: [PATCH] RFC: arm64: arch_timer: Fix timer inconsistency test for A64
  2020-09-29 15:40 ` Mark Rutland
@ 2020-09-30  1:00   ` Samuel Holland
  0 siblings, 0 replies; 4+ messages in thread
From: Samuel Holland @ 2020-09-30  1:00 UTC
  To: Mark Rutland, Roman Stratiienko
  Cc: linux-sunxi, megous, linux-kernel, linux-arm-kernel, maz

On 9/29/20 10:40 AM, Mark Rutland wrote:
> Hi,
> 
> Please Cc maintainers for drivers -- Marc and I maintain the arch timer
> driver.
> 
> On Tue, Sep 29, 2020 at 02:13:47PM +0300, Roman Stratiienko wrote:
>> Fixes linux_kselftest:timers_inconsistency-check_arm_64
>>
>> Test logs without the fix:
>> '''
>> binary returned non-zero. Exit code: 1, stderr: , stdout:
>> Consistent CLOCK_REALTIME
>> 1601335525:467086804
>> 1601335525:467087554
>> 1601335525:467088345
>> 1601335525:467089095
>> 1601335525:467089887
>> 1601335525:467090637
>> 1601335525:467091429
>> 1601335525:467092179
>> 1601335525:467092929
>> 1601335525:467093720
>> 1601335525:467094470
>> 1601335525:467095262
>> 1601335525:467096012
>> 1601335525:467096804
>> --------------------
>> 1601335525:467097554
>> 1601335525:467077012
> 
> That's 0x1BD757D2 followed by 0x1BD70794. The rollback is somewhere in
> bits 15:12 to go from 0x1BD75xxx to 0x1BD70xxx, which suggests the
> analysis in the existing comment is incomplete.

My analysis only looked at consecutive timer reads on a single core. Judging
from the vendor's comment that they needed CLOCKSOURCE_VALIDATE_LAST_CYCLE
(linked below), there is apparently another inconsistency with reads across
cores.

>> --------------------
>> 1601335525:467099095
>> 1601335525:467099845
>> 1601335525:467100637
>> 1601335525:467101387
>> 1601335525:467102179
>> 1601335525:467102929
>> '''
> 
> It would be very helpful if the commit message could explain the rough
> idea behind the change, because the rationale is not clear to me.
> 
>> Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com>
>> CC: linux-arm-kernel@lists.infradead.org
>> CC: linux-kernel@vger.kernel.org
>> CC: linux-sunxi@googlegroups.com
>> CC: megous@megous.com
>> ---
>>  drivers/clocksource/arm_arch_timer.c | 9 +++++----
>>  1 file changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
>> index 6c3e841801461..d50aa43cb654b 100644
>> --- a/drivers/clocksource/arm_arch_timer.c
>> +++ b/drivers/clocksource/arm_arch_timer.c
>> @@ -346,16 +346,17 @@ static u64 notrace arm64_858921_read_cntvct_el0(void)
>>   * number of CPU cycles in 3 consecutive 24 MHz counter periods.
>>   */
>>  #define __sun50i_a64_read_reg(reg) ({					\
>> -	u64 _val;							\
>> +	u64 _val1, _val2;						\
>>  	int _retries = 150;						\
>>  									\
>>  	do {								\
>> -		_val = read_sysreg(reg);				\
>> +		_val1 = read_sysreg(reg);				\
>> +		_val2 = read_sysreg(reg);				\
>>  		_retries--;						\
>> -	} while (((_val + 1) & GENMASK(9, 0)) <= 1 && _retries);	\
>> +	} while (((_val2 - _val1) > 0x10) && _retries);			\
> 
> This is going to fail quite often at low CPU frequencies, and it's not
> clear to me that this solves the problem any more generally. Do we know
> what the underlying erratum is here?

This is what we have from the vendor:

https://github.com/Allwinner-Homlet/H6-BSP4.9-linux/blob/master/arch/arm64/include/asm/arch_timer.h#L155
https://github.com/Allwinner-Homlet/H6-BSP4.9-linux/blob/master/drivers/clocksource/arm_arch_timer.c#L272

and they select CLOCKSOURCE_VALIDATE_LAST_CYCLE.

Everything else we know (e.g. the tables and explanation in my original commit
log) is from testing by users.

I had not seen the vendor changes to arch_timer.h when I wrote the existing
workaround. Their retry loop is similar to this patch, but as you mention, it
won't work if the CPU frequency is too low. That may not be a problem for the
vendor kernel, but it breaks badly on mainline.

As Ondrej referenced, the mainline DVFS implementation bypasses the PLL during
frequency changes. At that point, both the CPU and the timer are running at 24
MHz, so the chance of hitting the retry limit is high... and there's a udelay()
call in the PLL bypass code (ccu_mux_notifier_cb()) that will need to read the
timer.
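
A rough model of why the 0x10 threshold is sensitive to CPU frequency, as a
userspace sketch. Everything here is an assumption for illustration: the
helper names, the idea that a fixed number of CPU cycles separates the two
reads, and the cycle counts passed in. Only the 24 MHz counter rate and the
0x10 threshold come from this thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define TIMER_HZ  24000000ULL	/* counter rate discussed in the thread */
#define MAX_DELTA 0x10ULL	/* threshold from the proposed patch */

/*
 * Counter ticks that elapse while the CPU executes cycles_between_reads
 * cycles at cpu_hz (must be non-zero). Integer division truncates, which
 * is fine for a rough estimate.
 */
static uint64_t ticks_between_reads(uint64_t cpu_hz,
				    uint64_t cycles_between_reads)
{
	return cycles_between_reads * TIMER_HZ / cpu_hz;
}

/* True if the proposed loop condition would force another iteration. */
static bool would_retry(uint64_t cpu_hz, uint64_t cycles_between_reads)
{
	return ticks_between_reads(cpu_hz, cycles_between_reads) > MAX_DELTA;
}
```

Under this model, when the CPU itself runs at 24 MHz every CPU cycle is one
counter tick, so any gap longer than 16 cycles between the two reads triggers
a retry even though the counter is behaving correctly. At a much higher CPU
clock the same gap covers less than one tick.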

> Thanks,
> Mark.

Cheers,
Samuel

>>  									\
>>  	WARN_ON_ONCE(!_retries);					\
>> -	_val;								\
>> +	_val2;								\
>>  })
>>  
>>  static u64 notrace sun50i_a64_read_cntpct_el0(void)
>> -- 
>> 2.25.1
>>
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


