All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Stultz <john.stultz@linaro.org>
To: Michael Ellerman <mpe@ellerman.id.au>
Cc: lkml <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>,
	Miroslav Lichvar <mlichvar@redhat.com>,
	Richard Cochran <richardcochran@gmail.com>,
	Prarit Bhargava <prarit@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	Paul Mackerras <paulus@samba.org>,
	Anton Blanchard <anton@samba.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Tony Luck <tony.luck@intel.com>,
	Fenghua Yu <fenghua.yu@intel.com>
Subject: Re: [RFC][PATCH] time: Add warning about imminent deprecation of CONFIG_GENERIC_TIME_VSYSCALL_OLD
Date: Mon, 22 May 2017 12:06:04 -0700	[thread overview]
Message-ID: <CALAqxLV4AeGnxSjRqsN4Oge4+KHJ6gxm3oXM+uNEx6btuAc2wA@mail.gmail.com> (raw)
In-Reply-To: <87y3tpso51.fsf@concordia.ellerman.id.au>

On Sun, May 21, 2017 at 5:58 PM, Michael Ellerman <mpe@ellerman.id.au> wrote:
> John Stultz <john.stultz@linaro.org> writes:
>
>> CONFIG_GENERIC_TIME_VSYSCALL_OLD was introduced five years ago
>> to allow a transition from the old vsyscall implementations to
>> the new method (which simplified internal accounting and made
>> timekeeping more precise).
>
> I'm sure it's completely obvious to everyone except me what needs to be
> done, but can you spell it out for me? Keeping in mind that I don't know
> anything about the time keeping code.

No. Apologies, I probably should have included something like this.

Basically long ago, timekeeping was handled (roughly) like:

clock_gettime():
    now = tk->clock->read()
    offset_ns = ((now - tk->cycle_last) * tk->clock->mult) >> tk->clock->shift;
    return timespec_add_ns(tk->xtime, offset_ns);

But since for error handling use sub-ns precision, combined with that
for update performance, we accumulate in fixed intervals, there are
situations where in the update, we could accumulate half of a
nanosecond into the base tk->xtime value and leaving half of a
nanosecond in the offset.   This caused the split nanosecond to be
truncated out by the math, causing 1ns discontinuities.

So to address this, we came up with sort of a hack, which when we
accumulate rounds up that partial nanosecond, and adds the amount we
rounded up to the error (which will cause the freq correction code to
slow the clock down slightly). This is the code that is now done in
the old_vsyscall_fixup() logic.

Unfortunately this fix (which generates up to a nanosecond of error
per tick) then made the freq correction code do more work and made it
more difficult to have a stable clock.

So we went for a more proper fix, which was to properly handle the
sub-nanosecond portion of the timekeeping throughout the logic, doing
the truncation last.

clock_gettime():
    now = tk->clock->read()
    ret.tv_sec = tk->xtime_sec;
    offset_sns = (now - tk->cycle_last) * tk->clock->mult;
    ret.tv_nsec = (offset_sns + tk->tkr_mono.xtime_nsec) >> tk->clock->shift;
    return ret;

So in the above, we now use the tk->tkr_mono.xtime_nsec (which despite
its unfortunate name, stores the accumulated shifted nanoseconds), and
add it to the (current_cycle_delta * clock->mult), then we do the
shift last to preserve as much precision as we can.

Unfortunately we need all the reader code to do the same, which wasn't
easy to transition in some cases. So we provided the
CONFIG_GENERIC_TIME_VSYSCALL_OLD option to preserve the old round-up
behavior while arch maintainers could make the transition.


>> However, PPC and IA64 have yet to make the transition, despite
>> in some cases me sending test patches to try to help it along.
>>
>> http://patches.linaro.org/patch/30501/
>> http://patches.linaro.org/patch/35412/
>
> I've never seen a PPC patch, did you send one?

Yea. The PPC patch I never felt comfortable enough with to send.


>> If its helpful, my last pass at the patches can be found here:
>> https://git.linaro.org/people/john.stultz/linux.git dev/oldvsyscall-cleanup
>
> Looks like it's just a draft for PPC. Do you think that should work and
> it just needs testing? The comment about the vdso being "slightly
> behind" is a little concerning.

So, long ago I talked w/ Paul Mackerras about the ppc vdso code, as
ppc has some other legacy "userspace time" code that has to be
maintained as well (I believe there's not code page, just data page
that userspace pulls directly from). So for that case, we have the
problem where we can't do this sub-ns accounting, so the hack there is
rather then rounding-up and adding to ntp_error in the accumulation
code (which then causes the mult to slow), we're basically doing it
in the reader, slowing down mult by one. This will cause userspace
time to have "steps" where after an accumulation time jumps forward a
bit, but avoids the possibility of a discontinuity where time jumps
backwards.

But again, I don't want to pretend I'm not an expert on the ppc side.
This draft patch doesn't even touch the __kernel_clock_gettime()
implementations, and was trying to preserve the existing ppc logic
while transitioning the core code. Its likely that a better fix should
be done deeper in the ppc side (likely splitting the legacy userspace
data formats out so any hacks only apply to them).

thanks
-john

  reply	other threads:[~2017-05-22 19:06 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-12 18:24 [RFC][PATCH] time: Add warning about imminent deprecation of CONFIG_GENERIC_TIME_VSYSCALL_OLD John Stultz
2017-05-22  0:58 ` Michael Ellerman
2017-05-22 19:06   ` John Stultz [this message]
2017-05-22 20:45     ` Benjamin Herrenschmidt
2017-05-24 13:31     ` Michael Ellerman
2017-05-25 12:03     ` Paul Mackerras
2017-05-27  3:58       ` John Stultz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALAqxLV4AeGnxSjRqsN4Oge4+KHJ6gxm3oXM+uNEx6btuAc2wA@mail.gmail.com \
    --to=john.stultz@linaro.org \
    --cc=anton@samba.org \
    --cc=benh@kernel.crashing.org \
    --cc=fenghua.yu@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=mlichvar@redhat.com \
    --cc=mpe@ellerman.id.au \
    --cc=mtosatti@redhat.com \
    --cc=paulus@samba.org \
    --cc=prarit@redhat.com \
    --cc=richardcochran@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.