* phc2sys - does it work?
@ 2020-07-25 12:49 Russell King - ARM Linux admin
  2020-07-25 13:29 ` Vladimir Oltean
  0 siblings, 1 reply; 7+ messages in thread
From: Russell King - ARM Linux admin @ 2020-07-25 12:49 UTC
  To: Richard Cochran; +Cc: netdev

Hi,

I've been writing another PTP clock driver, and I'm wondering whether
phc2sys is actually working correctly.

I'm running it with: phc2sys -c /dev/ptp1 -s CLOCK_REALTIME -q -m -O 0
and I have additional pr_info() to debug in the clock driver.

What I see is the "sys offset" that phc2sys comes out with doesn't
seem to make much sense:

kt: 000000005f1c273ds 371ce3f5ns t: 00005f1c273ds 374c6cf5.2ae8ac76ns
kt: 000000005f1c273ds 377bf04cns t: 00005f1c273ds 37ab792b.4ef91e82ns
kt: 000000005f1c273ds 37daf7ccns t: 00005f1c273ds 380a80c3.5d12653ans
kt: 000000005f1c273ds 383a143cns t: 00005f1c273ds 38699d5c.cf1319f2ns
kt: 000000005f1c273ds 38992094ns t: 00005f1c273ds 38c8a9c8.f4247162ns
kt: 000000005f1c273ds 38f82d13ns t: 00005f1c273ds 3927b640.196edf5ans
kt: 000000005f1c273ds 3957323bns t: 00005f1c273ds 3986bb74.1c28a8fans
kt: 000000005f1c273ds 39b643e3ns t: 00005f1c273ds 39e5cd56.5b34c14ens
kt: 000000005f1c273ds 3a155d5ans t: 00005f1c273ds 3a44e6bf.be0b79e6ns
kt: 000000005f1c273ds 3a746fcans t: 00005f1c273ds 3aa3f943.001a4266ns
phc2sys[127.224]: /dev/ptp1 sys offset      5788 s2 freq  -69793 delay 6229046

Here, ktime_real (kt) is behind the ptp timestamp (t), and we have a
positive "sys offset".  This continues for a while:

kt: 000000005f1c2743s 0e1c25bdns t: 00005f1c2743s 0e4ba86d.91ebf06ans
kt: 000000005f1c2743s 0e7b3801ns t: 00005f1c2743s 0eaaba44.601c1f30ns
kt: 000000005f1c2743s 0eda4225ns t: 00005f1c2743s 0f09c4c3.0fa0d134ns
kt: 000000005f1c2743s 0f395a82ns t: 00005f1c2743s 0f68dd0a.f8abbf48ns
kt: 000000005f1c2743s 0f986abens t: 00005f1c2743s 0fc7ed41.c00f543cns
kt: 000000005f1c2743s 0ff773cans t: 00005f1c2743s 1026f675.6a32920cns
kt: 000000005f1c2743s 1056784ens t: 00005f1c2743s 1085fad9.003b24aans
kt: 000000005f1c2743s 10b57fcans t: 00005f1c2743s 10e50267.a378bd3cns
kt: 000000005f1c2743s 111481f6ns t: 00005f1c2743s 1144047b.2fde6accns
kt: 000000005f1c2743s 117387b9ns t: 00005f1c2743s 11a30a5c.cc1d52b4ns
phc2sys[132.536]: /dev/ptp1 sys offset      4882 s2 freq  -62617 delay 6227425

kt is still behind t, and we still have a positive "sys offset".
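
For reference, the size of the gap between the two readings can be
worked out directly from the hexadecimal nanosecond fields.  A minimal
sketch, assuming both readings fall within the same second and ignoring
the fractional part after the dot; the helper name hex_gap_ns is
illustrative only:

```shell
#!/bin/sh
# Gap in nanoseconds between the PTP reading (t) and ktime_real (kt),
# given the two hexadecimal nanosecond fields from one debug line.
# Assumes both values belong to the same second.
hex_gap_ns() {
	echo $(( 0x$1 - 0x$2 ))
}

# First debug line above: t = 374c6cf5, kt = 371ce3f5
hex_gap_ns 374c6cf5 371ce3f5
```

For the first line this gives 3115264 ns, roughly half of the reported
delay of 6229046 ns.  One possible reading is that the raw kt/t gap
mostly reflects where in the read sequence each value is sampled rather
than the residual offset, but that is an assumption, not something the
debug output proves.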

kt: 000000005f1c2744s 11d56067ns t: 00005f1c2744s 120473bd.c6a08dbfns
kt: 000000005f1c2744s 12346f54ns t: 00005f1c2744s 1263806f.d008fce3ns
kt: 000000005f1c2744s 12937d7ans t: 00005f1c2744s 12c28c2b.d57fe555ns
kt: 000000005f1c2744s 12f29f00ns t: 00005f1c2744s 1321ab52.2aa69e70ns
kt: 000000005f1c2744s 1351aedens t: 00005f1c2744s 1380b903.38258359ns
kt: 000000005f1c2744s 13b0c614ns t: 00005f1c2744s 13dfcdef.635079dbns
kt: 000000005f1c2744s 140fd641ns t: 00005f1c2744s 143edba0.70cf5ec4ns
kt: 000000005f1c2744s 146ee8c7ns t: 00005f1c2744s 149debf2.8913fe8dns
kt: 000000005f1c2744s 14cdf64dns t: 00005f1c2744s 14fcf6d6.8b147d37ns
kt: 000000005f1c2744s 152d1233ns t: 00005f1c2744s 155c10cf.caf99eb8ns
phc2sys[133.599]: /dev/ptp1 sys offset    -25014 s2 freq  -91049 delay 6229170

kt is still behind t, but now we have a negative "sys offset" ?

kt: 000000005f1c2745s 158f0f04ns t: 00005f1c2745s 15bd0ce3.e88dec93ns
kt: 000000005f1c2745s 15ee1e79ns t: 00005f1c2745s 161c1aa9.0c161488ns
kt: 000000005f1c2745s 16503ebcns t: 00005f1c2745s 167e3973.7b6c4843ns
kt: 000000005f1c2745s 16b0ebb1ns t: 00005f1c2745s 16dee4f2.436c858dns
kt: 000000005f1c2745s 1711806dns t: 00005f1c2745s 173f7822.7a65ccddns
kt: 000000005f1c2745s 177215f1ns t: 00005f1c2745s 17a00c04.b57f3ef8ns
kt: 000000005f1c2745s 17d15448ns t: 00005f1c2745s 17ff48bd.f1275cb3ns
kt: 000000005f1c2745s 1830735dns t: 00005f1c2745s 185e6670.73b72a47ns
kt: 000000005f1c2745s 188f95bbns t: 00005f1c2745s 18bd8724.082da8dbns
kt: 000000005f1c2745s 18ee9e77ns t: 00005f1c2745s 191c8e4e.044592dcns
phc2sys[134.662]: /dev/ptp1 sys offset    -98237 s2 freq -171776 delay 6227754

... and an even bigger negative "sys offset" but kt is still behind t.

kt: 000000005f1c2746s 16ad1681ns t: 00005f1c2746s 16db503d.5a2783d6ns
kt: 000000005f1c2746s 170f8b80ns t: 00005f1c2746s 173dc5a6.91660baans
kt: 000000005f1c2746s 176ed88fns t: 00005f1c2746s 179d1366.45f7dc3ans
kt: 000000005f1c2746s 17cdebcdns t: 00005f1c2746s 17fc2725.6da454bbns
kt: 000000005f1c2746s 182cfb23ns t: 00005f1c2746s 185b371b.6ab43403ns
kt: 000000005f1c2746s 188c0208ns t: 00005f1c2746s 18ba3e8f.07fd0ee9ns
kt: 000000005f1c2746s 18eb07fdns t: 00005f1c2746s 1919451f.9b3f2f2bns
kt: 000000005f1c2746s 194a13e3ns t: 00005f1c2746s 19785178.6fad0c97ns
kt: 000000005f1c2746s 19a915c8ns t: 00005f1c2746s 19d753fa.d549cdabns
phc2sys[135.674]: /dev/ptp1 sys offset    -77622 s2 freq -180632 delay 6226562

... same story.

I added the debug (which dramatically increased delay) because I notice
that phc2sys exhibits random sudden jumps in the "sys offset" value.
I've noticed it with this driver (which, without the debug, reports a
delay of around 5000) and also with the Marvell PHY PTP driver.  I had
put the Marvell PHY PTP driver instability down to other MDIO bus
activity, as the delay would increase, but that is not the case here.

There _is_ something odd going on with the adjfine adjustment, but I
can't fathom that (which is another reason for adding the above debug.)

If I undo some of the debug, this is the kind of thing I see:

phc2sys[20.697]: /dev/ptp1 sys offset         2 s2 freq  -25586 delay 5244
phc2sys[21.698]: /dev/ptp1 sys offset        17 s2 freq  -25570 delay 5262
phc2sys[22.698]: /dev/ptp1 sys offset       -11 s2 freq  -25593 delay 5250
phc2sys[23.698]: /dev/ptp1 sys offset       -14 s2 freq  -25600 delay 5265
phc2sys[24.698]: /dev/ptp1 sys offset       -17 s2 freq  -25607 delay 5250
phc2sys[25.698]: /dev/ptp1 sys offset        64 s2 freq  -25531 delay 5244
phc2sys[26.698]: /dev/ptp1 sys offset        -9 s2 freq  -25585 delay 5251
phc2sys[27.699]: /dev/ptp1 sys offset       -44 s2 freq  -25622 delay 5250
phc2sys[28.699]: /dev/ptp1 sys offset        35 s2 freq  -25557 delay 5262
phc2sys[29.699]: /dev/ptp1 sys offset   -433522 s2 freq -459103 delay 5256
phc2sys[30.699]: /dev/ptp1 sys offset   -500029 s2 freq -655667 delay 5228
phc2sys[31.700]: /dev/ptp1 sys offset   -369958 s2 freq -675604 delay 5259

Notice the sudden massive jump in sys offset.
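
Jumps like this are easy to miss in a long log.  A minimal sketch of a
filter that prints only the lines whose "sys offset" magnitude exceeds
a limit, assuming the log format shown above; the function name
flag_jumps and the 1000 ns limit are arbitrary choices:

```shell
#!/bin/sh
# Print phc2sys log lines whose "sys offset" magnitude exceeds a limit.
# Usage: flag_jumps LIMIT_NS < phc2sys.log
flag_jumps() {
	awk -v limit="$1" '
	{
		# Locate the "offset" keyword; the value is the next field.
		for (i = 1; i < NF; i++) {
			if ($i == "offset") {
				off = $(i + 1) + 0
				if (off < 0) off = -off
				if (off > limit) print
				break
			}
		}
	}'
}

# Two lines taken from the log above; only the second is printed.
printf '%s\n' \
	'phc2sys[28.699]: /dev/ptp1 sys offset        35 s2 freq  -25557 delay 5262' \
	'phc2sys[29.699]: /dev/ptp1 sys offset   -433522 s2 freq -459103 delay 5256' |
	flag_jumps 1000
```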

Any ideas?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

* Re: phc2sys - does it work?
  2020-07-25 12:49 phc2sys - does it work? Russell King - ARM Linux admin
@ 2020-07-25 13:29 ` Vladimir Oltean
  2020-07-26 11:01   ` Russell King - ARM Linux admin
  0 siblings, 1 reply; 7+ messages in thread
From: Vladimir Oltean @ 2020-07-25 13:29 UTC
  To: Russell King - ARM Linux admin; +Cc: Richard Cochran, netdev

Just a sanity check: do you have this patch?
https://github.com/richardcochran/linuxptp/commit/e0580929f451e685d92cd10d80b76f39e9b09a97

Thanks,
-Vladimir

* Re: phc2sys - does it work?
  2020-07-25 13:29 ` Vladimir Oltean
@ 2020-07-26 11:01   ` Russell King - ARM Linux admin
  2020-07-26 18:05     ` Richard Cochran
  2020-07-26 19:53     ` Vladimir Oltean
  0 siblings, 2 replies; 7+ messages in thread
From: Russell King - ARM Linux admin @ 2020-07-26 11:01 UTC
  To: Vladimir Oltean, Richard Cochran, netdev

On Sat, Jul 25, 2020 at 04:29:16PM +0300, Vladimir Oltean wrote:
> Just a sanity check: do you have this patch?
> https://github.com/richardcochran/linuxptp/commit/e0580929f451e685d92cd10d80b76f39e9b09a97

I did not, as I was running Debian stable's 1.9.2 version, whereas
current git head for linuxptp appears to behave much better.  Thanks.

I've got to the bottom of stuff like:

phc2sys[7190.912]: /dev/ptp1 sys offset        81 s2 freq  -71290 delay    641
phc2sys[7191.912]: /dev/ptp1 sys offset        66 s2 freq  -71281 delay    640
phc2sys[7192.912]: /dev/ptp1 sys offset      -926 s2 freq  -72253 delay    640
phc2sys[7193.912]: /dev/ptp1 sys offset     -8124 s2 freq  -79729 delay    680
phc2sys[7194.912]: /dev/ptp1 sys offset     -7794 s2 freq  -81836 delay    641
phc2sys[7195.913]: /dev/ptp1 sys offset     -5355 s2 freq  -81735 delay    680
phc2sys[7196.913]: /dev/ptp1 sys offset     -2994 s2 freq  -80981 delay    680
phc2sys[7197.913]: /dev/ptp1 sys offset     -1336 s2 freq  -80221 delay    640
phc2sys[7198.913]: /dev/ptp1 sys offset      -422 s2 freq  -79708 delay    640
phc2sys[7199.913]: /dev/ptp1 sys offset        -9 s2 freq  -79421 delay    680
phc2sys[7200.913]: /dev/ptp1 sys offset       159 s2 freq  -79256 delay    640
phc2sys[7201.913]: /dev/ptp1 sys offset       211 s2 freq  -79156 delay    680

This is due to NTP.  Each NTP period (starting at 64s), ntpd updates
the kernel timekeeping variables with the latest information.  One of
these is the offset, which is applied to the kernel's timekeeping by
adjusting the length of a tick:

        /* Compute the phase adjustment for the next second */
        tick_length      = tick_length_base;

        delta            = ntp_offset_chunk(time_offset);
        time_offset     -= delta;
        tick_length     += delta;

This has the effect of slightly changing the length of a second to slew
small adjustments, which appears as a change of frequency compared to
the PTP clock.  As we progress through the NTP period, the amount of
adjustment is reduced (notice that time_offset is reduced.)  When
time_offset hits zero, then no further adjustment is made, and the rate
that the kernel time passes settles - and in turn phc2sys settles to
a stable "freq" figure.
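
The decay described above can be modelled in a few lines.  This is only
a sketch: it assumes ntp_offset_chunk() hands back a fixed fraction of
the remaining offset (modelled here as a right shift), uses plain
nanosecond integers rather than the kernel's scaled units, and the
shift of 4 and the starting offset are arbitrary.  The name slew_step
is illustrative.

```shell
#!/bin/sh
# One per-second step of the slew: take a chunk of the remaining
# offset, reduce the offset by it, and print "delta remaining".
# The shift value is an illustrative parameter, not a kernel constant.
slew_step() {
	offset=$1
	shift_bits=$2
	delta=$(( offset >> shift_bits ))
	echo "$delta $(( offset - delta ))"
}

# Show how the per-second adjustment shrinks as time_offset drains.
remaining=1000000
for sec in 1 2 3 4 5; do
	result=$(slew_step "$remaining" 4)
	remaining=${result#* }
	echo "second $sec: delta=${result% *} remaining=$remaining"
done
```

Each step applies a smaller delta than the last, which is why the rate
seen by phc2sys changes from second to second until the offset has
drained.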

What this means is that synchronising the PTP clock to the kernel time
on a second by second basis exposes the PTP clock to these properties
of the kernel NTP loop, which has the effect of throwing the PTP clock
off by 10s of PPM.
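
The size of the excursion can be read off the "freq" column.  A minimal
sketch, assuming the log format shown above; the function name
freq_span_ppb is illustrative:

```shell
#!/bin/sh
# Print the spread (max - min) of the "freq" column, in ppb, for
# phc2sys log lines read from standard input.
freq_span_ppb() {
	awk '
	{
		# Locate the "freq" keyword; the value is the next field.
		for (i = 1; i < NF; i++) {
			if ($i == "freq") {
				f = $(i + 1) + 0
				if (n == 0 || f < min) min = f
				if (n == 0 || f > max) max = f
				n++
				break
			}
		}
	}
	END { if (n > 0) print max - min }'
}

# Three lines from the log above.
printf '%s\n' \
	'phc2sys[7190.912]: /dev/ptp1 sys offset        81 s2 freq  -71290 delay    641' \
	'phc2sys[7194.912]: /dev/ptp1 sys offset     -7794 s2 freq  -81836 delay    641' \
	'phc2sys[7201.913]: /dev/ptp1 sys offset       211 s2 freq  -79156 delay    680' |
	freq_span_ppb
```

Over the full twelve-line log above, the freq column runs from -71281
to -81836, a spread of 10555 ppb, or about 10.6 PPM.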

One way around this would be to synchronise the PTP clock updates with
NTP updates, but that is difficult due to NTP selecting how often it
does its updates - it generally starts off at 64s, and the interval
increases through powers of two.  However, just specifying -R to
phc2sys does not give better results - the amount that the PTP clock
fluctuates just gets larger.

Another solution would be to avoid running NTP on any machine intending
to be the source of PTP time on a network, but that then brings up the
problem that you can't synchronise the PTP time source to a reference
time, which rather makes PTP pointless unless all that you're after is
"all my local machines say the same wrong time."

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

* Re: phc2sys - does it work?
  2020-07-26 11:01   ` Russell King - ARM Linux admin
@ 2020-07-26 18:05     ` Richard Cochran
  2020-07-26 21:29       ` Russell King - ARM Linux admin
  2020-07-26 19:53     ` Vladimir Oltean
  1 sibling, 1 reply; 7+ messages in thread
From: Richard Cochran @ 2020-07-26 18:05 UTC
  To: Russell King - ARM Linux admin; +Cc: Vladimir Oltean, netdev

On Sun, Jul 26, 2020 at 12:01:05PM +0100, Russell King - ARM Linux admin wrote:
> Another solution would be to avoid running NTP on any machine intending
> to be the source of PTP time on a network, but that then brings up the
> problem that you can't synchronise the PTP time source to a reference
> time, which rather makes PTP pointless unless all that you're after is
> "all my local machines say the same wrong time."

It is clear that you can't have two services both adjusting the system
time.  For example, running ntpd and chrony on the same machine won't
work, and neither does running ntpd with 'phc2sys -a -r'.

However, if you want to use NTP as the global time source on a PTP GM,
and you have a heterogeneous collection of PHC cards, then you can run

	phc2sys -a -r -r

(note the two -r flags) and ptp4l with

	boundary_clock_jbod	1
	free_running		1
	priority1		100

for example.  After ptp4l starts, it will need to be configured as a
GM, and for that you will need to provide the kernel with the correct
TAI-UTC offset.  The ntpd program will set this offset, but
unfortunately it waits a very long time to do so.  You can
either wait until the kernel reports a non-zero TAI-UTC offset, or you
can script/program the start up logic when starting ptp4l.  See below
for a more or less complete example script.
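
The TAI-UTC lookup that the script at the end of this message performs
can be tried in isolation.  A minimal sketch, assuming a leap-seconds
file with lines of the form "NTP-timestamp offset"; the function name
tai_offset_at is illustrative, and the two sample entries (believed to
be the July 2015 and January 2017 leap seconds) are only test data:

```shell
#!/bin/sh
# Print the TAI-UTC offset in effect at UTC time $2 (seconds since
# 1970), reading an NTP leap-seconds file $1.  Comment lines start
# with "#".  NTP timestamps count from 1900, hence the 2208988800
# second conversion.
tai_offset_at() {
	awk -v utc_now="$2" '
	!/^#/ && NF >= 2 {
		if ($1 - 2208988800 < utc_now) current_offset = $2
	}
	END { print current_offset }' "$1"
}

# Sample data in a temporary file.
sample=$(mktemp)
printf '%s\n' '# sample entries' '3644697600 36' '3692217600 37' > "$sample"
tai_offset_at "$sample" 1500000000
rm -f "$sample"
```

The last entry whose date lies before the given time wins, which is
the same rule the script applies to the file named by the leapfile
directive in /etc/ntp.conf.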

Just bear in mind that, because phc2sys synchronizes the PHCs to the
NTP system time using software time stamps, there might be a time
error on the order of microseconds.

The reasoning behind the above settings is:

- phc2sys -a -r -r

  Option -a makes phc2sys pay attention to the port state from ptp4l,
  and the first -r lets it synchronize the system time from the port
  with the SLAVE role.  The second -r allows phc2sys to consider the
  system time as a time source, thus when all of the ports take the
  MASTER role, phc2sys will synchronize the PHCs to the system time.

- boundary_clock_jbod=1

  This allows ptp4l to act as a BC or GM using a set of PHCs that do
  not share the same hardware clock.  The assumption is that some other process
  (like phc2sys or ts2phc) looks after the PHC-to-PHC synchronization.

- free_running=1

  Since the intention is to become the GM, this prevents ptp4l from
  accidentally adjusting the PHCs in the presence of a "better" remote
  GM.

- priority1=100

  This sets a higher priority in order to let the GM win the BMCA
  election.  Still you need to take care not to install a second GM in
  the network at a higher priority.

If the GM has PHCs that are either synchronized in hardware or can be
synchronized using internal PPS signals, then the configuration should
be different.  Not sure if that applies to your setup.

HTH,
Richard

---8<---
#!/bin/sh

set -e
set -x

#
# Look here for a hacked version of this program that sets the TAI-UTC offset.
# https://github.com/richardcochran/ntpclient-2015
#
adjtimex=/usr/sbin/adjtimex

#
# Read the leapfile to get the current TAI-UTC offset
#
leapfile=$(awk '
{
	if ($1 == "leapfile") print $2;
}
' /etc/ntp.conf)

now=`date +%s`

# NTP/UTC conversion:
# utc = ntp - 2208988800
# ntp = utc + 2208988800

offset=$(awk -v utc_now="$now" '
!($1~/^\#/) {
	ntp_leapsecond_date = $1;
	utc_leapsecond_date = ntp_leapsecond_date - 2208988800;
	if (utc_leapsecond_date < utc_now) {
		current_offset = $2;
	}
}
END {
	print current_offset;
}
' $leapfile)

#
# Tell the kernel the current TAI-UTC offset.
#
$adjtimex -T $offset

#
# Tell ptp4l how to act like a Grand Master.
#
while [ 1 ]; do
	if [ -e /var/run/ptp4l ]; then
		break;
	fi
	echo Waiting for /var/run/ptp4l to appear...
	sleep 1
done
exec /usr/sbin/pmc -u -b 0 \
"set GRANDMASTER_SETTINGS_NP
clockClass 248
clockAccuracy 0xfe
offsetScaledLogVariance 0xffff
currentUtcOffset $offset
leap61 0
leap59 0
currentUtcOffsetValid 1
ptpTimescale 1
timeTraceable 1
frequencyTraceable 1
timeSource 0x50
"

* Re: phc2sys - does it work?
  2020-07-26 11:01   ` Russell King - ARM Linux admin
  2020-07-26 18:05     ` Richard Cochran
@ 2020-07-26 19:53     ` Vladimir Oltean
  1 sibling, 0 replies; 7+ messages in thread
From: Vladimir Oltean @ 2020-07-26 19:53 UTC
  To: Russell King - ARM Linux admin; +Cc: Richard Cochran, netdev

On Sun, Jul 26, 2020 at 12:01:05PM +0100, Russell King - ARM Linux admin wrote:
> 
> Another solution would be to avoid running NTP on any machine intending
> to be the source of PTP time on a network, but that then brings up the
> problem that you can't synchronise the PTP time source to a reference
> time, which rather makes PTP pointless unless all that you're after is
> "all my local machines say the same wrong time."
> 
> -- 
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

TL;DR: if your PHC supports external timestamping (extts), use that,
plus a GPS module. Then synchronize CLOCK_REALTIME to the PHC and not the
other way around.

I guess there is some truth to the saying that "a man with one clock
knows what time it is; a man with two clocks is never sure".

In my corner of the universe, you would never want a 1588 GM to be
disciplined to a Stratum >= 2 NTP server, and possibly never over NTP at
large. That is, _if_ you want your 1588 timing domain to be traceable to
TAI at all (and if the use case doesn't require that, you're 100% better
off leaving the 1588 GM free-running).  Jitter propagates transitively,
and there are few worse things you can do to a synchronization network
than serve a time that is jittery in the first place.

The biggest source of jitter is so-called 'software synchronization'
(aka without hardware assist). phc2sys is a prime example of that, but
also NTP in the configuration most people use it in. There are ways to
improve that (the various species of SYSOFF), and while they do work
fine, the brick wall between hardware and software synchronization still
exists. The one place where it is fine is at the leaves of the clock
distribution tree, aka syncing the system time to the PHC. There, even
if you want to do some periodic tasks based on the PTP schedule, the
scheduling jitter is probably large enough anyway that software
synchronization is not your biggest concern.

-Vladimir

* Re: phc2sys - does it work?
  2020-07-26 18:05     ` Richard Cochran
@ 2020-07-26 21:29       ` Russell King - ARM Linux admin
  2020-07-27 14:10         ` Richard Cochran
  0 siblings, 1 reply; 7+ messages in thread
From: Russell King - ARM Linux admin @ 2020-07-26 21:29 UTC
  To: Richard Cochran; +Cc: Vladimir Oltean, netdev

On Sun, Jul 26, 2020 at 11:05:51AM -0700, Richard Cochran wrote:
> On Sun, Jul 26, 2020 at 12:01:05PM +0100, Russell King - ARM Linux admin wrote:
> > Another solution would be to avoid running NTP on any machine intending
> > to be the source of PTP time on a network, but that then brings up the
> > problem that you can't synchronise the PTP time source to a reference
> > time, which rather makes PTP pointless unless all that you're after is
> > "all my local machines say the same wrong time."
> 
> It is clear that you can't have two services both adjusting the system
> time.  For example, running ntpd and chrony on the same machine won't
> work, and neither does running ntpd with 'phc2sys -a -r'.

You've misunderstood, that is not what I'm doing.  The system time on
the machine is sync'd using ntpd, and then I'm syncing the PTP clock
alone to the system time.  Right now, I'm just testing the PTP
clock implementation, nothing else, to make sure that it is implemented
properly.

So, the setup is:

     +----------+       +--------------------------+
     |  host 1  |       |         test host        |           freq
GPS ---> ntpd ---- lan ---> ntpd -> system -> TAI ---> PPS -> counter
     |          |       |            time          |
     +----------+       +--------------------------+

The good news is - the whole thing has mostly settled - I no longer
see large swings in the PPS signal produced by the PTP/TAI clock,
where large is 10s of PPM.  I'm now down to a frequency error of
around 500PPB.
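
To put that figure in perspective, a frequency error in PPB converts
directly to accumulated drift.  A minimal sketch, assuming the error
stays constant and is never corrected; the name ppb_to_us_per_day is
illustrative:

```shell
#!/bin/sh
# Drift in microseconds per day for a constant frequency error in ppb.
# 1 ppb is 1 ns of drift per second; a day has 86400 seconds.
ppb_to_us_per_day() {
	echo $(( $1 * 86400 / 1000 ))
}

ppb_to_us_per_day 500
```

For 500 PPB this is 43200 microseconds, so about 43 ms per day if left
uncorrected.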

I think what was going on is ntpd on the test host was switching
between different time sources, causing it to almost constantly slew
the system time on the test host.

I have noticed that phc2sys can sometimes get confused and it needs
phc_ctl to reset the frequency back to zero for it to have another go.
The hardware is capable of a max_adj of S32_MAX, and I think that
allows phc2sys to get confused sometimes, so I probably need to clamp
my calculated max_adj to a sane limit.  Is there an upper limit that
phc2sys expects?

Thanks.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

* Re: phc2sys - does it work?
  2020-07-26 21:29       ` Russell King - ARM Linux admin
@ 2020-07-27 14:10         ` Richard Cochran
  0 siblings, 0 replies; 7+ messages in thread
From: Richard Cochran @ 2020-07-27 14:10 UTC
  To: Russell King - ARM Linux admin; +Cc: Vladimir Oltean, netdev

On Sun, Jul 26, 2020 at 10:29:53PM +0100, Russell King - ARM Linux admin wrote:
> I have noticed that phc2sys can sometimes get confused and it needs
> phc_ctl to reset the frequency back to zero for it to have another go.
> The hardware is capable of a max_adj of S32_MAX, and I think that
> allows phc2sys to get confused sometimes, so I probably need to clamp
> my calculated max_adj to a sane limit.  Is there an upper limit that
> phc2sys expects?

The program uses the minimum of the PHC's max_adj and the
max_frequency configuration value (whose default is 900000000).
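
As a worked example of that rule, here is a minimal sketch of the
limit calculation; the function name effective_max_adj is
illustrative, and the default of 900000000 is the value quoted above:

```shell
#!/bin/sh
# Effective frequency limit: the smaller of the PHC's max_adj ($1)
# and the max_frequency setting ($2, defaulting to 900000000).
effective_max_adj() {
	max_adj=$1
	max_frequency=${2:-900000000}
	if [ "$max_adj" -lt "$max_frequency" ]; then
		echo "$max_adj"
	else
		echo "$max_frequency"
	fi
}

# A PHC advertising S32_MAX is limited by max_frequency instead.
effective_max_adj 2147483647
```

With a max_adj of S32_MAX (2147483647) the result is 900000000, so the
configuration value, not the hardware, sets the ceiling.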

In general, huge frequency corrections are a sign that something is
wrong.  If your setup has sudden phase jumps (like ntpd resetting the
clock), then you should consider allowing phc2sys to jump as well.
For example, I use

     phc2sys -S 0.128

which allows phc2sys to jump when the offset is greater than 128
milliseconds.  That value is chosen to match ntpd's threshold for
jumping the time.
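
The threshold comparison can be sketched as below.  This is a
simplified model, not the servo code in linuxptp: it assumes the
decision depends only on the magnitude of the offset, and the function
name servo_action is illustrative.

```shell
#!/bin/sh
# Decide between stepping and slewing for a given offset.
# $1 = offset in ns (may be negative), $2 = step threshold in ns.
servo_action() {
	offset_ns=$1
	threshold_ns=$2
	if [ "$offset_ns" -lt 0 ]; then
		offset_ns=$(( 0 - $offset_ns ))
	fi
	if [ "$offset_ns" -gt "$threshold_ns" ]; then
		echo step
	else
		echo slew
	fi
}

# -S 0.128 corresponds to a threshold of 128000000 ns.
servo_action -433522 128000000
servo_action 200000000 128000000
```

Under this model an offset of -433522 ns, as in the earlier log, stays
well below the 128 ms threshold and is slewed; only an offset beyond
128000000 ns would be stepped.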

HTH,
Richard
