stable.vger.kernel.org archive mirror
* [4.14] Failing selftest timer/adjtick
@ 2021-02-10 12:43 Joerg Vehlow
  2021-02-10 13:00 ` Greg KH
  0 siblings, 1 reply; 11+ messages in thread
From: Joerg Vehlow @ 2021-02-10 12:43 UTC (permalink / raw)
  To: stable, Ingo Molnar, Miroslav Lichvar, John Stultz

Hi,

we found that the selftest timer/adjtick fails quite frequently on arm64
(tested on some Renesas board and in qemu).
By bisecting the kernel I found that it stopped failing after commit 
78b98e3c5a66 (timekeeping/ntp: Determine the multiplier directly from 
NTP tick length).
Should this patch be applied to 4.14? Is that even possible, or could it
break something else?

Thanks,
Joerg


* Re: [4.14] Failing selftest timer/adjtick
  2021-02-10 12:43 [4.14] Failing selftest timer/adjtick Joerg Vehlow
@ 2021-02-10 13:00 ` Greg KH
  2021-02-10 13:07   ` Joerg Vehlow
  0 siblings, 1 reply; 11+ messages in thread
From: Greg KH @ 2021-02-10 13:00 UTC (permalink / raw)
  To: Joerg Vehlow; +Cc: stable, Ingo Molnar, Miroslav Lichvar, John Stultz

On Wed, Feb 10, 2021 at 01:43:10PM +0100, Joerg Vehlow wrote:
> Hi,
> 
> we found that the selftest timer/adjtick fails quite frequently on arm64
> (tested on some Renesas board and in qemu).
> By bisecting the kernel I found that it stopped failing after commit
> 78b98e3c5a66 (timekeeping/ntp: Determine the multiplier directly from NTP
> tick length).
> Should this patch be applied to 4.14 and is it even possible or could it
> break something else?

Have you tried applying it to that tree to see if it solves your problem
and works properly?  If so, please feel free to provide a working
backported copy, with your signed-off-by and we can consider it.

But, why not just use 4.19 or newer on that system?

thanks,

greg k-h


* Re: [4.14] Failing selftest timer/adjtick
  2021-02-10 13:00 ` Greg KH
@ 2021-02-10 13:07   ` Joerg Vehlow
  2021-02-10 13:19     ` Miroslav Lichvar
  0 siblings, 1 reply; 11+ messages in thread
From: Joerg Vehlow @ 2021-02-10 13:07 UTC (permalink / raw)
  To: Greg KH; +Cc: stable, Ingo Molnar, Miroslav Lichvar, John Stultz

Hi Greg,

On 2/10/2021 2:00 PM, Greg KH wrote:
> Have you tried applying it to that tree to see if it solves your problem
> and works properly?  If so, please feel free to provide a working
> backported copy, with your signed-off-by and we can consider it.

It can be applied without any changes and fixes the problem, but since I
do not know much about this subsystem, I don't know whether it breaks
anything or whether it requires other patches to be applied first.
Maybe the authors of the patch can check this easily, or already know.
That's why I added them to the initial mail.

> But, why not just use 4.19 or newer on that system?

Why does an LTS version of 4.14 exist? Because the customer demands it :)
If the failing test were not one of the kernel selftests, I wouldn't
bother you with this...

Joerg


* Re: [4.14] Failing selftest timer/adjtick
  2021-02-10 13:07   ` Joerg Vehlow
@ 2021-02-10 13:19     ` Miroslav Lichvar
  2021-02-10 18:59       ` Naresh Kamboju
  2021-02-11 10:33       ` Joerg Vehlow
  0 siblings, 2 replies; 11+ messages in thread
From: Miroslav Lichvar @ 2021-02-10 13:19 UTC (permalink / raw)
  To: Joerg Vehlow; +Cc: Greg KH, stable, Ingo Molnar, John Stultz

On Wed, Feb 10, 2021 at 02:07:21PM +0100, Joerg Vehlow wrote:
> On 2/10/2021 2:00 PM, Greg KH wrote:
> > Have you tried applying it to that tree to see if it solves your problem
> > and works properly?  If so, please feel free to provide a working
> > backported copy, with your signed-off-by and we can consider it.
> It can be applied without any changes and fixes the problem, but since I
> have not a lot of knowledge about this subsystem, I don't know if this
> breaks anything or if it requires other patches to be applied first, to not
> break anything..
> Maybe the authors of the patch can check this easily or maybe know it.
> That's why I added them to the initial mail.

That patch cannot be applied alone. It would break the timekeeping in
not so obvious ways as there will be unexpected sources of the NTP
tracking error. IIRC, at least the following changes would need to be
included with it. There may be others.

c2cda2a5bda9 ("timekeeping/ntp: Don't align NTP frequency adjustments to ticks")
aea3706cfc4d ("timekeeping: Remove CONFIG_GENERIC_TIME_VSYSCALL_OLD")
d4d1fc61eb38 ("ia64: Update fsyscall gettime to use modern vsyscall_update")

My suggestion for a fix would be to increase the limit in the failing
test.

-- 
Miroslav Lichvar
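
The suggestion to "increase the limit" can be illustrated with a small sketch. It assumes the nominal tick is 10000 usec (which matches the "0 ppm" rows in the test outputs quoted in this thread) and uses a hypothetical limit parameter; the selftest's actual bound is not quoted in this thread, so the 100 ppm and 500 ppm values below are placeholders, not the test's real numbers.

```python
# Sketch only: names and limit values are illustrative assumptions,
# not taken from the adjtick selftest source.

# Nominal tick length in microseconds; 10000 usec corresponds to the
# "0 ppm" row in the outputs shown in this thread.
NOMINAL_TICK_USEC = 10_000


def expected_ppm(tick_usec, nominal=NOMINAL_TICK_USEC):
    """Frequency offset in ppm implied by a given tick length."""
    return round((tick_usec - nominal) * 1_000_000 / nominal)


def within_limit(expected, measured, limit_ppm):
    """True if the measured offset is within limit_ppm of the expected one."""
    return abs(measured - expected) <= limit_ppm


# A 9250 usec tick should slow the clock by 75000 ppm.
print(expected_ppm(9250))

# A measured offset of -463 ppm against an expected 0 ppm fails a
# (hypothetical) 100 ppm limit but would pass a (hypothetical) 500 ppm one.
print(within_limit(0, -463, limit_ppm=100))
print(within_limit(0, -463, limit_ppm=500))
```

Raising the limit only hides the instability from the test; it does not change how the clock behaves on an idle system.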



* Re: [4.14] Failing selftest timer/adjtick
  2021-02-10 13:19     ` Miroslav Lichvar
@ 2021-02-10 18:59       ` Naresh Kamboju
  2021-02-11 10:34         ` Joerg Vehlow
  2021-02-11 10:33       ` Joerg Vehlow
  1 sibling, 1 reply; 11+ messages in thread
From: Naresh Kamboju @ 2021-02-10 18:59 UTC (permalink / raw)
  To: Joerg Vehlow
  Cc: Greg KH, Miroslav Lichvar, linux-stable, Ingo Molnar,
	John Stultz, lkft-triage

I tested adjtick on an arm64 juno-r2 device and it passed.
Here is the test output on Linux version 4.14.221-rc1.

+ ./adjtick
Each iteration takes about 15 seconds
Estimating tick (act: 9000 usec, -100000 ppm): 9000 usec, -100000 ppm [OK]
Estimating tick (act: 9250 usec, -75000 ppm): 9250 usec, -75000 ppm [OK]
Estimating tick (act: 9500 usec, -50000 ppm): 9500 usec, -50000 ppm [OK]
Estimating tick (act: 9750 usec, -25000 ppm): 9750 usec, -25001 ppm [OK]
Estimating tick (act: 10000 usec, 0 ppm): 10000 usec, 0 ppm [OK]
Estimating tick (act: 10250 usec, 25000 ppm): 10249 usec, 24999 ppm [OK]
Estimating tick (act: 10500 usec, 50000 ppm): 10500 usec, 50000 ppm [OK]
Estimating tick (act: 10750 usec, 75000 ppm): 10750 usec, 75000 ppm [OK]
Pass 0 Fail 0 Xfail 0 Xpass 0 Skip 0 Error 0
1..0

Output link:
https://lkft.validation.linaro.org/scheduler/job/2254102#L1255

- Naresh


* Re: [4.14] Failing selftest timer/adjtick
  2021-02-10 13:19     ` Miroslav Lichvar
  2021-02-10 18:59       ` Naresh Kamboju
@ 2021-02-11 10:33       ` Joerg Vehlow
  2021-02-11 10:45         ` Greg KH
  2021-02-11 10:59         ` Miroslav Lichvar
  1 sibling, 2 replies; 11+ messages in thread
From: Joerg Vehlow @ 2021-02-11 10:33 UTC (permalink / raw)
  To: Miroslav Lichvar; +Cc: Greg KH, stable, Ingo Molnar, John Stultz

Hi Miroslav,

On 2/10/2021 2:19 PM, Miroslav Lichvar wrote:
> That patch cannot be applied alone. It would break the timekeeping in
> not so obvious ways as there will be unexpected sources of the NTP
> tracking error. IIRC, at least the following changes would need to be
> included with it. There may be others.
>
> c2cda2a5bda9 ("timekeeping/ntp: Don't align NTP frequency adjustments to ticks")
> aea3706cfc4d ("timekeeping: Remove CONFIG_GENERIC_TIME_VSYSCALL_OLD")
> d4d1fc61eb38 ("ia64: Update fsyscall gettime to use modern vsyscall_update")
>
> My suggestion for a fix would be to increase the limit in the failing
> test.

Thanks, that's what I expected. But I still wonder why the test is
failing almost 100% of the time for me on qemu-arm64 (running on x86).
Is this a regression in 4.14 that was working at some point, or was it
never tested on arm?

Joerg


* Re: [4.14] Failing selftest timer/adjtick
  2021-02-10 18:59       ` Naresh Kamboju
@ 2021-02-11 10:34         ` Joerg Vehlow
  0 siblings, 0 replies; 11+ messages in thread
From: Joerg Vehlow @ 2021-02-11 10:34 UTC (permalink / raw)
  To: Naresh Kamboju
  Cc: Greg KH, Miroslav Lichvar, linux-stable, Ingo Molnar,
	John Stultz, lkft-triage

Hi,

On 2/10/2021 7:59 PM, Naresh Kamboju wrote:
> I have tested adjtick on arm64 juno-r2 device and it got pass
> and here is the test output on Linux version 4.14.221-rc1.

Interesting. Is this vanilla 4.14.221 or are there some o-e patches applied?
I just tried again on qemu arm with 4.14.222 from the kernel.org stable
tree and still get failures like the one below every time I try. The
failing test step differs, but it always fails.

Each iteration takes about 15 seconds
Estimating tick (act: 9000 usec, -100000 ppm): 9000 usec, -100000 ppm [OK]
Estimating tick (act: 9250 usec, -75000 ppm): 9250 usec, -75001 ppm [OK]
Estimating tick (act: 9500 usec, -50000 ppm): 9501 usec, -49995 ppm [OK]
Estimating tick (act: 9750 usec, -25000 ppm): 9750 usec, -25003 ppm [OK]
Estimating tick (act: 10000 usec, 0 ppm): 9996 usec, -463 ppm [FAILED]
Bail out!
Pass 0 Fail 0 Xfail 0 Xpass 0 Skip 0 Error 0
1..0


Joerg
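
To compare runs like the one above, it can help to pull the expected and measured ppm values out of the log and look at the deviation per step. The following is a rough sketch; the regular expression is written against the output format shown in this thread only, and the function name is made up for illustration.

```python
import re

# Matches lines such as:
#   Estimating tick (act: 10000 usec, 0 ppm): 9996 usec, -463 ppm [FAILED]
# The pattern is an assumption based on the output quoted in this thread.
LINE_RE = re.compile(
    r"Estimating tick \(act: (?P<tick>\d+) usec, (?P<exp>-?\d+) ppm\): "
    r"(?P<est_tick>\d+) usec, (?P<meas>-?\d+) ppm\s+\[(?P<status>\w+)\]"
)


def deviations(log_text):
    """Return (tick_usec, measured_ppm - expected_ppm, status) per step."""
    result = []
    for m in LINE_RE.finditer(log_text):
        deviation = int(m["meas"]) - int(m["exp"])
        result.append((int(m["tick"]), deviation, m["status"]))
    return result


# The failing run reported above.
SAMPLE = """\
Estimating tick (act: 9000 usec, -100000 ppm): 9000 usec, -100000 ppm [OK]
Estimating tick (act: 9250 usec, -75000 ppm): 9250 usec, -75001 ppm [OK]
Estimating tick (act: 9500 usec, -50000 ppm): 9501 usec, -49995 ppm [OK]
Estimating tick (act: 9750 usec, -25000 ppm): 9750 usec, -25003 ppm [OK]
Estimating tick (act: 10000 usec, 0 ppm): 9996 usec, -463 ppm [FAILED]
"""

for tick, dev, status in deviations(SAMPLE):
    print(f"tick {tick} usec: deviation {dev:+d} ppm [{status}]")
```

In this run the first four steps deviate by at most 5 ppm, while the failing step is off by 463 ppm.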


* Re: [4.14] Failing selftest timer/adjtick
  2021-02-11 10:33       ` Joerg Vehlow
@ 2021-02-11 10:45         ` Greg KH
  2021-02-11 10:59         ` Miroslav Lichvar
  1 sibling, 0 replies; 11+ messages in thread
From: Greg KH @ 2021-02-11 10:45 UTC (permalink / raw)
  To: Joerg Vehlow; +Cc: Miroslav Lichvar, stable, Ingo Molnar, John Stultz

On Thu, Feb 11, 2021 at 11:33:05AM +0100, Joerg Vehlow wrote:
> Hi Miroslav,
> 
> On 2/10/2021 2:19 PM, Miroslav Lichvar wrote:
> > That patch cannot be applied alone. It would break the timekeeping in
> > not so obvious ways as there will be unexpected sources of the NTP
> > tracking error. IIRC, at least the following changes would need to be
> > included with it. There may be others.
> > 
> > c2cda2a5bda9 ("timekeeping/ntp: Don't align NTP frequency adjustments to ticks")
> > aea3706cfc4d ("timekeeping: Remove CONFIG_GENERIC_TIME_VSYSCALL_OLD")
> > d4d1fc61eb38 ("ia64: Update fsyscall gettime to use modern vsyscall_update")
> > 
> > My suggestion for a fix would be to increase the limit in the failing
> > test.
> Thanks, that's what I expected. But I still wonder why the test is failing
> almost 100% of time for me on qemu-arm64 (running on x86). Is this a
> regression in 4.14, that was working at some point or was it never tested on
> arm?

Does it work on a real system?  That's the proper test...


* Re: [4.14] Failing selftest timer/adjtick
  2021-02-11 10:33       ` Joerg Vehlow
  2021-02-11 10:45         ` Greg KH
@ 2021-02-11 10:59         ` Miroslav Lichvar
  2021-02-18  7:05           ` Joerg Vehlow
  2021-03-01  7:04           ` Joerg Vehlow
  1 sibling, 2 replies; 11+ messages in thread
From: Miroslav Lichvar @ 2021-02-11 10:59 UTC (permalink / raw)
  To: Joerg Vehlow; +Cc: Greg KH, stable, Ingo Molnar, John Stultz

On Thu, Feb 11, 2021 at 11:33:05AM +0100, Joerg Vehlow wrote:
> > My suggestion for a fix would be to increase the limit in the failing
> > test.
> Thanks, that's what I expected. But I still wonder why the test is failing
> almost 100% of time for me on qemu-arm64 (running on x86). Is this a
> regression in 4.14, that was working at some point or was it never tested on
> arm?

I don't think it is specific to arm or that it is a regression. I
think the virtual machine just happens to be too idle for the test.
There may be unrelated changes, maybe in the kernel, qemu, or
applications, that caused the rate of the clock updates to decrease so
much that the instability now triggers the failure in the test.  The
issue with the clock was there since NO_HZ was introduced, but it
becomes more severe as the activity of the kernel decreases.

-- 
Miroslav Lichvar



* Re: [4.14] Failing selftest timer/adjtick
  2021-02-11 10:59         ` Miroslav Lichvar
@ 2021-02-18  7:05           ` Joerg Vehlow
  2021-03-01  7:04           ` Joerg Vehlow
  1 sibling, 0 replies; 11+ messages in thread
From: Joerg Vehlow @ 2021-02-18  7:05 UTC (permalink / raw)
  To: Miroslav Lichvar; +Cc: Greg KH, stable, Ingo Molnar, John Stultz

Hi Miroslav,

On 2/11/2021 11:59 AM, Miroslav Lichvar wrote:
> I don't think it is specific to arm or that it is a regression. I
> think the virtual machine just happens to be too idle for the test.
> There may be unrelated changes, maybe in the kernel, qemu, or
> applications, that caused the rate of the clock updates to decrease so
> much that the instability now triggers the failure in the test.  The
> issue with the clock was there since NO_HZ was introduced, but it
> becomes more severe as the activity of the kernel decreases.

Thank you for that explanation. I created some background load (copying
from /dev/urandom to /dev/null) and ran the test again. With the load
running, the test passed every time.

Jörg
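
For anyone who wants to reproduce this from a script, here is a minimal sketch of running a command while a background thread keeps the system busy by discarding random bytes. The helper name and chunk size are made up for illustration, and whether this generates enough kernel activity on a given setup is an assumption that needs checking.

```python
import os
import subprocess
import sys
import threading


def run_with_load(cmd, chunk=1 << 16):
    """Run cmd while a background thread reads random bytes and discards them.

    Roughly mimics copying from /dev/urandom to /dev/null; returns the
    exit code of cmd.
    """
    stop = threading.Event()

    def burn():
        # Keep requesting random bytes from the kernel and throw them away.
        with open(os.devnull, "wb") as sink:
            while not stop.is_set():
                sink.write(os.urandom(chunk))

    worker = threading.Thread(target=burn, daemon=True)
    worker.start()
    try:
        return subprocess.run(cmd).returncode
    finally:
        # Always stop the load, even if the command could not be started.
        stop.set()
        worker.join()
```

From the selftest directory this would be called as `run_with_load(["./adjtick"])`, assuming the test binary has been built there.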


* Re: [4.14] Failing selftest timer/adjtick
  2021-02-11 10:59         ` Miroslav Lichvar
  2021-02-18  7:05           ` Joerg Vehlow
@ 2021-03-01  7:04           ` Joerg Vehlow
  1 sibling, 0 replies; 11+ messages in thread
From: Joerg Vehlow @ 2021-03-01  7:04 UTC (permalink / raw)
  To: Miroslav Lichvar; +Cc: Greg KH, stable, Ingo Molnar, John Stultz

Hi Miroslav,

On 2/11/2021 11:59 AM, Miroslav Lichvar wrote:
> I don't think it is specific to arm or that it is a regression. I
> think the virtual machine just happens to be too idle for the test.
> There may be unrelated changes, maybe in the kernel, qemu, or
> applications, that caused the rate of the clock updates to decrease so
> much that the instability now triggers the failure in the test.  The
> issue with the clock was there since NO_HZ was introduced, but it
> becomes more severe as the activity of the kernel decreases.
Thanks for the hint towards NO_HZ. Running the tests with some 
background load makes them pass reliably.

Jörg
