* [PATCH] time, ntp: Do not update time_state in middle of leap second [v3]
@ 2015-02-12 13:58 Prarit Bhargava
  2015-02-17 23:16 ` John Stultz
  2015-02-19 17:00 ` Jiri Bohac
  0 siblings, 2 replies; 8+ messages in thread
From: Prarit Bhargava @ 2015-02-12 13:58 UTC
  To: linux-kernel
  Cc: Prarit Bhargava, John Stultz, Thomas Gleixner, Miroslav Lichvar,
	Peter Zijlstra

During leap second insertion testing it was noticed that a small window
exists where the time_state could be reset such that
time_state = TIME_OK, which then causes the leap second to not occur, or
causes the entire leap second state machine to fail with time_state =
TIME_INS at the end of the leap second.

The test did the following in userspace:

        tx.modes = ADJ_STATUS;
        tx.status = STA_INS;

        /* send leap second request */
        ret = adjtimex(&tx);

        /* Check adjtimex output every half second */
        now = tx.time.tv_sec;
        while (now < next_leap+2) {
                char buf[26];
                ret = adjtimex(&tx);

                ctime_r(&tx.time.tv_sec, buf);
                buf[strlen(buf)-1] = 0; /* remove trailing \n */

                printf("%s + %6ld us\t%s\n",
                                buf,
                                tx.time.tv_usec,
                                time_state_str(ret));
                now = tx.time.tv_sec;
                /* Sleep for another half second */
                ts.tv_sec = 0;
                ts.tv_nsec = NSEC_PER_SEC/2;
                clock_nanosleep(CLOCK_MONOTONIC, 0, &ts, NULL);
        }

which was intended to mimic the insertion of a leap second.  A
successful run of the test would result in the time_state transitioning
from TIME_OK to TIME_INS, then to TIME_OOP when the leap second was
inserted, and then to TIME_WAIT when the leap second was completed.  While
running this code failures were seen in which the time_state remained TIME_INS,
even though the leap second had occurred.

After some investigation it was noted that the test contained a small error:
the test does not reinitialize tx.status, so it reissues STA_INS every
1/2 second.  As a result of this broken test, the following failure was noticed
(the output below is a mix of kernel messages and output from the test
program; the remaining annotations are debug printk's added to the code and my
own additional notes):

[  942.952833] time_state [1] change from TIME_OK to TIME_INS

Fri Feb 13 18:59:51 2015 + 318126 us    TIME_INS
Fri Feb 13 18:59:51 2015 + 818167 us    TIME_INS
Fri Feb 13 18:59:52 2015 + 318208 us    TIME_INS
Fri Feb 13 18:59:52 2015 + 818248 us    TIME_INS
Fri Feb 13 18:59:53 2015 + 318290 us    TIME_INS
Fri Feb 13 18:59:53 2015 + 818331 us    TIME_INS
Fri Feb 13 18:59:54 2015 + 318372 us    TIME_INS
Fri Feb 13 18:59:54 2015 + 818413 us    TIME_INS
Fri Feb 13 18:59:55 2015 + 318454 us    TIME_INS
Fri Feb 13 18:59:55 2015 + 818495 us    TIME_INS
Fri Feb 13 18:59:56 2015 + 318534 us    TIME_INS
Fri Feb 13 18:59:56 2015 + 818575 us    TIME_INS
Fri Feb 13 18:59:57 2015 + 318617 us    TIME_INS
Fri Feb 13 18:59:57 2015 + 818660 us    TIME_INS
Fri Feb 13 18:59:58 2015 + 318702 us    TIME_INS
Fri Feb 13 18:59:58 2015 + 818744 us    TIME_INS
Fri Feb 13 18:59:59 2015 + 318785 us    TIME_INS
Fri Feb 13 18:59:59 2015 + 818837 us    TIME_INS

[  952.953143] time_state [4] change from TIME_INS to TIME_OOP
[  952.953150] Clock: inserting leap second 23:59:60 UTC
[  953.299905] process_adj_status: insert_leap_sec[1223] setting time_state back
to TIME_OK [1, 1]   <<< adjtimex() call every 1/2 second
[  953.299913] time_state [9] change from TIME_OOP to TIME_OK

Fri Feb 13 18:59:59 2015 + 318878 us    TIME_OK
Fri Feb 13 18:59:59 2015 + 818931 us    TIME_OK

[  954.064237] time_state [1] change from TIME_OK to TIME_INS

Fri Feb 13 19:00:00 2015 + 318972 us    TIME_INS
Fri Feb 13 19:00:00 2015 + 819012 us    TIME_INS
Fri Feb 13 19:00:01 2015 + 319051 us    TIME_INS
Fri Feb 13 19:00:01 2015 + 819089 us    TIME_INS
Fri Feb 13 19:00:02 2015 + 319128 us    TIME_INS

As previously stated, the time_state remains TIME_INS even though the leap
second has already occurred @ 952.953150.

The test was changed to reset tx.status to 0 in the loop, and the test then
succeeded with a 100% rate with the time state ending in TIME_WAIT.

While this is highly unlikely to ever happen in the real world it is
still something we should protect against, as breaking the state machine
is bad.

If time_state == TIME_OOP (i.e., the leap second is in progress), do not
allow an external update to time_state in process_adj_status().  This will
prevent external adjtimex() calls from breaking the leap second state
machine.

[v2]: Only block time_state change when TIME_OOP
[v3]: Write a much more detailed explanation of the bug.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 kernel/time/ntp.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 28bf91c..6ff5cd5 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -535,7 +535,8 @@ void ntp_notify_cmos_timer(void) { }
 static inline void process_adj_status(struct timex *txc, struct timespec64 *ts)
 {
 	if ((time_status & STA_PLL) && !(txc->status & STA_PLL)) {
-		time_state = TIME_OK;
+		if (time_state != TIME_OOP)
+			time_state = TIME_OK;
 		time_status = STA_UNSYNC;
 		/* restart PPS frequency calibration */
 		pps_reset_freq_interval();
-- 
1.7.9.3



* Re: [PATCH] time, ntp: Do not update time_state in middle of leap second [v3]
  2015-02-12 13:58 [PATCH] time, ntp: Do not update time_state in middle of leap second [v3] Prarit Bhargava
@ 2015-02-17 23:16 ` John Stultz
  2015-02-18 17:14   ` Jiri Bohac
  2015-02-20 14:12   ` Prarit Bhargava
  2015-02-19 17:00 ` Jiri Bohac
  1 sibling, 2 replies; 8+ messages in thread
From: John Stultz @ 2015-02-17 23:16 UTC
  To: Prarit Bhargava; +Cc: lkml, Thomas Gleixner, Miroslav Lichvar, Peter Zijlstra

On Thu, Feb 12, 2015 at 5:58 AM, Prarit Bhargava <prarit@redhat.com> wrote:
> During leap second insertion testing it was noticed that a small window
> exists where the time_state could be reset such that
> time_state = TIME_OK, which then causes the leap second to not occur, or
> causes the entire leap second state machine to fail with time_state =
> TIME_INS at the end of the leap second.
>
> The test did the following in userspace:
>
>         tx.modes = ADJ_STATUS;
>         tx.status = STA_INS;
>
>         /* send leap second request */
>         ret = adjtimex(&tx);
>
>         /* Check adjtimex output every half second */
>         now = tx.time.tv_sec;
>         while (now < next_leap+2) {
>                 char buf[26];
>                 ret = adjtimex(&tx);
>
>                 ctime_r(&tx.time.tv_sec, buf);
>                 buf[strlen(buf)-1] = 0; /*remove trailing\n */
>
>                 printf("%s + %6ld us\t%s\n",
>                                 buf,
>                                 tx.time.tv_usec,
>                                 time_state_str(ret));
>                 now = tx.time.tv_sec;
>                 /* Sleep for another half second */
>                 ts.tv_sec = 0;
>                 ts.tv_nsec = NSEC_PER_SEC/2;
>                 clock_nanosleep(CLOCK_MONOTONIC, 0, &ts, NULL);
>         }
>
> which was intended to mimic the insertion of a leap second.  A
> successful run of the test would result in the time_state transitioning
> from TIME_OK to TIME_INS, then to TIME_OOP when the leap second was
> inserted, and then to TIME_WAIT when the leap second was completed.  While
> running this code failures were seen in which the time_state remained TIME_INS,
> even though the leap second had occurred.
>
> After some investigation it was noted that the test contained a small error:
> the test does not reinitialize tx.status and reissues the STA_INS every
> 1/2 second.  As a result of this broken test, the following failure was noticed
> (the output below is a mix of kernel messages and the output from the test
> program, the remaining annotations are printk's in the code and my own
> additional notes):
>
> [  942.952833] time_state [1] change from TIME_OK to TIME_INS
>
> Fri Feb 13 18:59:51 2015 + 318126 us    TIME_INS
> Fri Feb 13 18:59:51 2015 + 818167 us    TIME_INS
> Fri Feb 13 18:59:52 2015 + 318208 us    TIME_INS
> Fri Feb 13 18:59:52 2015 + 818248 us    TIME_INS
> Fri Feb 13 18:59:53 2015 + 318290 us    TIME_INS
> Fri Feb 13 18:59:53 2015 + 818331 us    TIME_INS
> Fri Feb 13 18:59:54 2015 + 318372 us    TIME_INS
> Fri Feb 13 18:59:54 2015 + 818413 us    TIME_INS
> Fri Feb 13 18:59:55 2015 + 318454 us    TIME_INS
> Fri Feb 13 18:59:55 2015 + 818495 us    TIME_INS
> Fri Feb 13 18:59:56 2015 + 318534 us    TIME_INS
> Fri Feb 13 18:59:56 2015 + 818575 us    TIME_INS
> Fri Feb 13 18:59:57 2015 + 318617 us    TIME_INS
> Fri Feb 13 18:59:57 2015 + 818660 us    TIME_INS
> Fri Feb 13 18:59:58 2015 + 318702 us    TIME_INS
> Fri Feb 13 18:59:58 2015 + 818744 us    TIME_INS
> Fri Feb 13 18:59:59 2015 + 318785 us    TIME_INS
> Fri Feb 13 18:59:59 2015 + 818837 us    TIME_INS
>
> [  952.953143] time_state [4] change from TIME_INS to TIME_OOP
> [  952.953150] Clock: inserting leap second 23:59:60 UTC
> [  953.299905] process_adj_status: insert_leap_sec[1223] setting time_state back
> to TIME_OK [1, 1]   <<< adjtimex() call every 1/2 second
> [  953.299913] time_state [9] change from TIME_OOP to TIME_OK
>
> Fri Feb 13 18:59:59 2015 + 318878 us    TIME_OK
> Fri Feb 13 18:59:59 2015 + 818931 us    TIME_OK
>
> [  954.064237] time_state [1] change from TIME_OK to TIME_INS
>
> Fri Feb 13 19:00:00 2015 + 318972 us    TIME_INS
> Fri Feb 13 19:00:00 2015 + 819012 us    TIME_INS
> Fri Feb 13 19:00:01 2015 + 319051 us    TIME_INS
> Fri Feb 13 19:00:01 2015 + 819089 us    TIME_INS
> Fri Feb 13 19:00:02 2015 + 319128 us    TIME_INS
>
> As previously stated, the time_state remains TIME_INS even though the leap
> second has already occurred @ 952.953150.
>
> The test was changed to reset tx.status to 0 in the loop, and the test then
> succeeded with a 100% rate with the time state ending in TIME_WAIT.
>
> While this is highly unlikely to ever happen in the real world it is
> still something we should protect against, as breaking the state machine
> is bad.
>
> If the time_state == TIME_OOP (ie, the leap second is in progress) do not
> allow an external update to time_state in process_adj_status().  This will
> prevent external adjtimex() calls from breaking the leap second state
> machine.
>
> [v2]: Only block time_state change when TIME_OOP
> [v3]: Write a much more detailed explanation of the bug.


Ok, thanks for the more verbose explanation. Although this is more a
history of what you've seen rather than the crux of the change.

To distill this down just a bit, the point is that the usual flow of the
NTP time_state machine looks like:

TIME_OK -> TIME_INS -> TIME_OOP
  |                       |
  v                       v
TIME_DEL ------------> TIME_WAIT  -(back)-> TIME_OK

(hopefully the ascii art survives here)

Now, from any of these states, currently if adjtimex is called w/ the
STA_PLL bit cleared (after STA_PLL was set), we reset back to TIME_OK,
effectively cancelling any transitions. (You'll have to imagine a line
from any of the states back to TIME_OK, since that's going to be too
ugly to do in ascii)

Your patch is trying to remove the line from TIME_OOP back to
TIME_OK, basically removing the ability to reset the NTP state during
a leap second.

I do get that the behavior seen was strange due to a bug in the test
code which caused unexpected cancellation of state, but I'm not sure
if we should change the behavior to enforce that cancellation not be
possible. I could imagine some logic which really wants to reset the
state, which just by chance lands during a leap second, and the
application is confused since the state change didn't occur as
expected.

So I guess I'm not seeing that the state machine is actually "broken"
in this case that you've outlined.  If you can articulate better why
the OOP -> OK transition is truly invalid, I'd be interested in
hearing, but I'm not sure I want to risk a behavioral change unless
there's wide agreement.

thanks
-john


* Re: [PATCH] time, ntp: Do not update time_state in middle of leap second [v3]
  2015-02-17 23:16 ` John Stultz
@ 2015-02-18 17:14   ` Jiri Bohac
  2015-02-18 17:38     ` Jiri Bohac
  2015-02-20 14:12   ` Prarit Bhargava
  1 sibling, 1 reply; 8+ messages in thread
From: Jiri Bohac @ 2015-02-18 17:14 UTC
  To: John Stultz, Roman Zippel
  Cc: Prarit Bhargava, lkml, Thomas Gleixner, Miroslav Lichvar, Peter Zijlstra

On Tue, Feb 17, 2015 at 03:16:18PM -0800, John Stultz wrote:
> Ok, thanks for the more verbose explanation. Although this is more a
> history of what you've seen rather than the crux of the change.
> 
> To distill this down just a bit, the point is the usual mode for NTP
> time_state machine looks like:
> 
> TIME_OK -> TIME_INS -> TIME_OOP
>   |                       |
>   v                       v
> TIME_DEL ------------> TIME_WAIT  -(back)-> TIME_OK
> 
> (hopefully the ascii art survives here)
> 
> Now, from any of these states, currently if adjtimex is called w/ the
> STA_PLL bit cleared (after STA_PLL was set), we reset back to TIME_OK,
> effectively cancelling any transitions. (You'll have to imagine a line
> from any of the states back to TIME_OK, since that's going to be too
> ugly to do in ascii)
> 
> Your patch is trying to remove the line back from TIME_OOP back to
> TIME_OK. Basically stopping the ability to reset the ntp state during
> a leapsecond.
> 
> I do get that the behavior seen was strange due to a bug in the test
> code which caused unexpected cancellation of state, but I'm not sure
> if we should change the behavior to enforce that cancellation not be
> possible. I could imagine some logic which really wants to reset the
> state, which just by chance lands during a leap second, and the
> application is confused since the state change didn't occur as
> expected.
> 
> So I guess I'm not seeing that the state machine is actually "broken"
> in this case that you've outlined.  If you can articulate better why
> the OOP -> OK transition is truly invalid, I'd be interested in
> hearing, but I'm not sure I want to risk a behavioral change unless
> there's wide agreement.

I think the only real problem occurs when adjtimex is called in
the TIME_OOP state with STA_PLL cleared _and_ STA_INS set.
In this case the state machine is reset to TIME_OK but goes back
to TIME_INS on the next second_overflow, potentially causing
another false leap second to be inserted at the following
midnight.

The state machine is meant to only go back to TIME_INS once STA_INS is
cleared and then set again - this is what the TIME_WAIT state is
for.

In fact, I don't see a reason why the STA_PLL -> !STA_PLL transition should
ever set the time_state to TIME_OK.
- When the STA_INS/STA_DEL flag is removed from the status, the state
  machine will end up in TIME_OK from any state.
- When STA_INS/STA_DEL is set in the status, the state machine will
  transition from TIME_OK to TIME_INS/TIME_DEL anyway.

I think the "time_state = TIME_OK" should be just dropped.

It was added by commit eea83d896e318bda54be2d2770d2c5d6668d11db
(ntp: NTP4 user space bits update), and it's not clear why.
Roman?


-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ



* Re: [PATCH] time, ntp: Do not update time_state in middle of leap second [v3]
  2015-02-18 17:14   ` Jiri Bohac
@ 2015-02-18 17:38     ` Jiri Bohac
  0 siblings, 0 replies; 8+ messages in thread
From: Jiri Bohac @ 2015-02-18 17:38 UTC
  To: Jiri Bohac
  Cc: John Stultz, Roman Zippel, Prarit Bhargava, lkml,
	Thomas Gleixner, Miroslav Lichvar, Peter Zijlstra

On Wed, Feb 18, 2015 at 06:14:04PM +0100, Jiri Bohac wrote:
> I think the only real problem occurs when adjtimex is called in
> the TIME_OOP state

... or the TIME_WAIT state ... 

> with STA_PLL cleared  _and_ STA_INS set.
> In this case the state machine is reset to TIME_OK but goes back
> to TIME_INS on the next second_overflow, potentially causing
> another false leap second to be inserted on the following
> midnight.
> 
> The state machine is meant to only go back to TIME_INS once STA_INS is
> cleared and then set again - this is what the TIME_WAIT state is
> for.
> 
> In fact, I don't see a reason why the STA_PLL -> !STA_PLL transition should
> ever set the time_state to TIME_OK.
> - When the STA_INS/STA_DEL flag is removed from the status, the state
>   machine will end up in TIME_OK from any state.
> - When STA_INS/STA_DEL is set in the status, the state machine will
>   transition from TIME_OK to TIME_INS/TIME_DEL anyway.
> 
> I think the "time_state = TIME_OK" should be just dropped.
> 
> It has been added by eea83d896e318bda54be2d2770d2c5d6668d11db
> (ntp: NTP4 user space bits update) and it's not clear why.
> Roman?

-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ



* Re: [PATCH] time, ntp: Do not update time_state in middle of leap second [v3]
  2015-02-12 13:58 [PATCH] time, ntp: Do not update time_state in middle of leap second [v3] Prarit Bhargava
  2015-02-17 23:16 ` John Stultz
@ 2015-02-19 17:00 ` Jiri Bohac
  2015-02-20 14:15   ` Prarit Bhargava
  1 sibling, 1 reply; 8+ messages in thread
From: Jiri Bohac @ 2015-02-19 17:00 UTC
  To: Prarit Bhargava
  Cc: linux-kernel, John Stultz, Thomas Gleixner, Miroslav Lichvar,
	Peter Zijlstra

Hi,

I'm trying to understand what exactly is going on here...

On Thu, Feb 12, 2015 at 08:58:19AM -0500, Prarit Bhargava wrote:
> The test did the following in userspace:
> 
>         tx.modes = ADJ_STATUS;
>         tx.status = STA_INS;
> 
> 	/* send leap second request */
>         ret = adjtimex(&tx);
> 
>         /* Check adjtimex output every half second */
>         now = tx.time.tv_sec;
>         while (now < next_leap+2) {
>                 char buf[26];
>                 ret = adjtimex(&tx);
> 
>                 ctime_r(&tx.time.tv_sec, buf);
>                 buf[strlen(buf)-1] = 0; /*remove trailing\n */
> 
>                 printf("%s + %6ld us\t%s\n",
>                                 buf,
>                                 tx.time.tv_usec,
>                                 time_state_str(ret));
>                 now = tx.time.tv_sec;
>                 /* Sleep for another half second */
>                 ts.tv_sec = 0;
>                 ts.tv_nsec = NSEC_PER_SEC/2;
>                 clock_nanosleep(CLOCK_MONOTONIC, 0, &ts, NULL);
>         }
> 
> After some investigation it was noted that the test contained a small error:
> the test does not reinitialize tx.status and reissues the STA_INS every
> 1/2 second.

Prarit, can you explain who sets the STA_PLL flag, so that
process_adj_status() detects a STA_PLL->!STA_PLL transition and
goes to the branch that sets time_state = TIME_OK?

Is that ntpd running in parallel with your test program?  If that
is the case, you would eventually end up oscillating between
the TIME_INS and TIME_OK states anyway, even with your patch.
ntpd will clear the STA_INS flag after midnight, the state
machine will transition from TIME_WAIT to TIME_OK and your test
program will set STA_INS again (fighting with ntpd, which will
clear the flag from time to time) ...  right?

Thanks,

-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ



* Re: [PATCH] time, ntp: Do not update time_state in middle of leap second [v3]
  2015-02-17 23:16 ` John Stultz
  2015-02-18 17:14   ` Jiri Bohac
@ 2015-02-20 14:12   ` Prarit Bhargava
  1 sibling, 0 replies; 8+ messages in thread
From: Prarit Bhargava @ 2015-02-20 14:12 UTC
  To: John Stultz; +Cc: lkml, Thomas Gleixner, Miroslav Lichvar, Peter Zijlstra



On 02/17/2015 06:16 PM, John Stultz wrote:
> On Thu, Feb 12, 2015 at 5:58 AM, Prarit Bhargava <prarit@redhat.com> wrote:

>>
>> which was intended to mimic the insertion of a leap second.  A
>> successful run of the test would result in the time_state transitioning
>> from TIME_OK to TIME_INS, then to TIME_OOP when the leap second was
>> inserted, and then to TIME_WAIT when the leap second was completed.  While
>> running this code failures were seen in which the time_state remained TIME_INS,
>> even though the leap second had occurred.
>>
> 
> 
> Ok, thanks for the more verbose explanation. Although this is more a
> history of what you've seen rather than the crux of the change.
> 
> To distill this down just a bit, the point is the usual mode for NTP
> time_state machine looks like:
> 
> TIME_OK -> TIME_INS -> TIME_OOP
>   |                       |
>   v                       v
> TIME_DEL ------------> TIME_WAIT  -(back)-> TIME_OK
> 
> (hopefully the ascii art survives here)
> 
> Now, from any of these states, currently if adjtimex is called w/ the
> STA_PLL bit cleared (after STA_PLL was set), we reset back to TIME_OK,
> effectively cancelling any transitions. (You'll have to imagine a line
> from any of the states back to TIME_OK, since that's going to be too
> ugly to do in ascii)
> 
> Your patch is trying to remove the line back from TIME_OOP back to
> TIME_OK. Basically stopping the ability to reset the ntp state during
> a leapsecond.

Correct.

> 
> I do get that the behavior seen was strange due to a bug in the test
> code which caused unexpected cancellation of state, but I'm not sure
> if we should change the behavior to enforce that cancellation not be
> possible. I could imagine some logic which really wants to reset the
> state, which just by chance lands during a leap second, and the
> application is confused since the state change didn't occur as
> expected.

I think setting it in the middle of the leap second should be a NOOP.  We all
know how fragile this code has been in the past and allowing a state transition
at that particular time isn't a good idea given the outcome that the state may
remain TIME_INS.

> 
> So I guess I'm not seeing that the state machine is actually "broken"
> in this case that you've outlined.  If you can articulate better why
> the OOP -> OK transition is truly invalid, I'd be interested in
> hearing, but I'm not sure I want to risk a behavioral change unless
> there's wide agreement.

I understand -- After thinking about it from your point of view I agree that
calling it "broken" is not right.  Perhaps a better way of looking at it is, as
you also point out, whether OOP -> OK is truly valid.

P.

> 
> thanks
> -john
> 


* Re: [PATCH] time, ntp: Do not update time_state in middle of leap second [v3]
  2015-02-19 17:00 ` Jiri Bohac
@ 2015-02-20 14:15   ` Prarit Bhargava
  2015-02-20 17:19     ` Jiri Bohac
  0 siblings, 1 reply; 8+ messages in thread
From: Prarit Bhargava @ 2015-02-20 14:15 UTC
  To: Jiri Bohac
  Cc: linux-kernel, John Stultz, Thomas Gleixner, Miroslav Lichvar,
	Peter Zijlstra



On 02/19/2015 12:00 PM, Jiri Bohac wrote:
> Hi,
> 
> I'm trying to understand what exactly is going on here...
> 
> On Thu, Feb 12, 2015 at 08:58:19AM -0500, Prarit Bhargava wrote:
>> The test did the following in userspace:
>>
>>         tx.modes = ADJ_STATUS;
>>         tx.status = STA_INS;
>>
>> 	/* send leap second request */
>>         ret = adjtimex(&tx);
>>
>>         /* Check adjtimex output every half second */
>>         now = tx.time.tv_sec;
>>         while (now < next_leap+2) {
>>                 char buf[26];
>>                 ret = adjtimex(&tx);
>>
>>                 ctime_r(&tx.time.tv_sec, buf);
>>                 buf[strlen(buf)-1] = 0; /*remove trailing\n */
>>
>>                 printf("%s + %6ld us\t%s\n",
>>                                 buf,
>>                                 tx.time.tv_usec,
>>                                 time_state_str(ret));
>>                 now = tx.time.tv_sec;
>>                 /* Sleep for another half second */
>>                 ts.tv_sec = 0;
>>                 ts.tv_nsec = NSEC_PER_SEC/2;
>>                 clock_nanosleep(CLOCK_MONOTONIC, 0, &ts, NULL);
>>         }
>>
>> After some investigation it was noted that the test contained a small error:
>> the test does not reinitialize tx.status and reissues the STA_INS every
>> 1/2 second.
> 
> Prarit, can you explain who sets the STA_PLL flag, so that
> process_adj_status() detects a STA_PLL->!STA_PLL transition and
> goes to the branch that sets time_state = TIME_OK?

Jiri,

The test being run is:

https://github.com/johnstultz-work/timetests/blob/master/leap-a-day.c

prior to commit

https://github.com/johnstultz-work/timetests/commit/be4526e8b5d48cd108a8d2cf7f5c8fd763acf421

> 
> Is that ntpd running in parallel with your test program?  If that

No -- ntpd is disabled (chronyd in the case of systemd + current Fedora).

P.


* Re: [PATCH] time, ntp: Do not update time_state in middle of leap second [v3]
  2015-02-20 14:15   ` Prarit Bhargava
@ 2015-02-20 17:19     ` Jiri Bohac
  0 siblings, 0 replies; 8+ messages in thread
From: Jiri Bohac @ 2015-02-20 17:19 UTC
  To: Prarit Bhargava
  Cc: Jiri Bohac, linux-kernel, John Stultz, Thomas Gleixner,
	Miroslav Lichvar, Peter Zijlstra

On Fri, Feb 20, 2015 at 09:15:23AM -0500, Prarit Bhargava wrote:
> On 02/19/2015 12:00 PM, Jiri Bohac wrote:
> > Prarit, can you explain who sets the STA_PLL flag, so that
> > process_adj_status() detects a STA_PLL->!STA_PLL transition and
> > goes to the branch that sets time_state = TIME_OK?
> 
> Jiri,
> 
> The test being run is:
> 
> https://github.com/johnstultz-work/timetests/blob/master/leap-a-day.c
> 
> prior to commit
> 
> https://github.com/johnstultz-work/timetests/commit/be4526e8b5d48cd108a8d2cf7f5c8fd763acf421


I can't make sense of the output of your test:

On Thu, Feb 12, 2015 at 08:58:19AM -0500, Prarit Bhargava wrote:
> [  942.952833] time_state [1] change from TIME_OK to TIME_INS
> 
> Fri Feb 13 18:59:51 2015 + 318126 us    TIME_INS
> Fri Feb 13 18:59:51 2015 + 818167 us    TIME_INS
> Fri Feb 13 18:59:52 2015 + 318208 us    TIME_INS
> Fri Feb 13 18:59:52 2015 + 818248 us    TIME_INS
> Fri Feb 13 18:59:53 2015 + 318290 us    TIME_INS
> Fri Feb 13 18:59:53 2015 + 818331 us    TIME_INS
> Fri Feb 13 18:59:54 2015 + 318372 us    TIME_INS
> Fri Feb 13 18:59:54 2015 + 818413 us    TIME_INS
> Fri Feb 13 18:59:55 2015 + 318454 us    TIME_INS
> Fri Feb 13 18:59:55 2015 + 818495 us    TIME_INS
> Fri Feb 13 18:59:56 2015 + 318534 us    TIME_INS
> Fri Feb 13 18:59:56 2015 + 818575 us    TIME_INS

1) Why did the test program print the above lines? It's supposed to
sleep until 3 seconds prior to midnight:

	/* Wake up 3 seconds before leap */
	ts.tv_sec = next_leap - 3;
	ts.tv_nsec = 0;
	while(clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &ts, NULL))
		printf("Something woke us up, returning to sleep\n");


> Fri Feb 13 18:59:57 2015 + 318617 us    TIME_INS
> Fri Feb 13 18:59:57 2015 + 818660 us    TIME_INS
> Fri Feb 13 18:59:58 2015 + 318702 us    TIME_INS
> Fri Feb 13 18:59:58 2015 + 818744 us    TIME_INS
> Fri Feb 13 18:59:59 2015 + 318785 us    TIME_INS
> Fri Feb 13 18:59:59 2015 + 818837 us    TIME_INS
> 
> [  952.953143] time_state [4] change from TIME_INS to TIME_OOP
> [  952.953150] Clock: inserting leap second 23:59:60 UTC
> [  953.299905] process_adj_status: insert_leap_sec[1223] setting time_state back
> to TIME_OK [1, 1]   <<< adjtimex() call every 1/2 second
> [  953.299913] time_state [9] change from TIME_OOP to TIME_OK

2) The only place where the test program sets STA_PLL is in
clear_time_state(); it clears it right after that.

clear_time_state() is not called inside the "while (now < next_leap+2)" loop,
except in the SIGINT/SIGKILL handler. Did you send signals to the program
at this point?

If not, I can't understand how the status went from STA_PLL to !STA_PLL
and thus why time_state went to TIME_OK.

-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ

