From: John Stultz <jstultz@google.com>
To: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	peterz@infradead.org, mingo@redhat.com,
	linux-kernel@vger.kernel.org, sboyd@kernel.org,
	eranian@google.com, namhyung@kernel.org, ak@linux.intel.com,
	adrian.hunter@intel.com
Subject: Re: [RFC PATCH V2 2/9] perf: Extend ABI to support post-processing monotonic raw conversion
Date: Tue, 14 Feb 2023 11:52:17 -0800
Message-ID: <CANDhNCo4HugnOeHNaCqbp5R7Na1j_7pU-rWhPE-D9jMO3UdihQ@mail.gmail.com>
In-Reply-To: <6898b1c8-9dbf-67ce-46e6-15d5307ced25@linux.intel.com>

On Tue, Feb 14, 2023 at 6:51 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
> On 2023-02-13 5:22 p.m., John Stultz wrote:
> > On Mon, Feb 13, 2023 at 1:40 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
> >> On 2023-02-13 2:37 p.m., John Stultz wrote:
> >>> On Mon, Feb 13, 2023 at 11:08 AM <kan.liang@linux.intel.com> wrote:
> >>>>
> >>>> From: Kan Liang <kan.liang@linux.intel.com>
> >>>>
> >>>> The monotonic raw clock is not affected by NTP/PTP correction. The
> >>>> calculation of the monotonic raw clock can be done in the
> >>>> post-processing, which can reduce the kernel overhead.
> >>>>
> >>>> Add hw_time in the struct perf_event_attr to tell the kernel to dump
> >>>> the raw HW time to user space. The perf tool will calculate the HW
> >>>> time in post-processing.
> >>>> Currently, only the monotonic raw conversion is supported.
> >>>> Only dump the raw HW time with PERF_RECORD_SAMPLE, because the accurate
> >>>> HW time can only be provided in a sample by HW. For other types of
> >>>> records, the user requested clock should be returned as usual. Nothing
> >>>> is changed.
> >>>>
> >>>> Add perf_event_mmap_page::cap_user_time_mono_raw ABI to dump the
> >>>> conversion information. The cap_user_time_mono_raw also indicates
> >>>> whether the monotonic raw conversion information is available.
> >>>> If yes, the clock monotonic raw can be calculated as
> >>>> mono_raw = base + (((cyc - last) * mult + nsec) >> shift)
> >>>
> >>> Again, I appreciate you reworking and resending this series out, I
> >>> know it took some effort.
> >>>
> >>> But oof, I'd really like to make sure we're not exporting timekeeping
> >>> internals to userland.
> >>>
> >>> I think Thomas' suggestion of doing the timestamp conversion in
> >>> post-processing was more about interpolating collected system times
> >>> with the counter (tsc) values captured.
> >>>
> >>
> >> Thomas, could you please clarify your suggestion regarding "the relevant
> >> conversion information" provided by the kernel?
> >> https://lore.kernel.org/lkml/87ilgsgl5f.ffs@tglx/
> >>
> >> Is it only the interpolation information or the entire conversion
> >> information (Mult, shift etc.)?
> >>
> >> If it's only the interpolation information, user space will lack the
> >> information to handle all the cases. If I understand John's comments
> >> correctly, it could also bring some interpolation error which can only
> >> be addressed by the mult/shift conversion.
> >
>
>
> Thanks for the details John.
>
> > "Only" is maybe too strong a word. I think having the driver use
> > kernel timekeeping accessors to CLOCK_MONOTONIC_RAW time with
> > counter values will minimize the error.
> >
>
> The key motivation for using the TSC in the PEBS record is to get an
> accurate timestamp of each record. We definitely want the conversion to
> have minimal error.

Yep.

> > But again, it's not yet established that any interpolation error using
> > existing interfaces is great enough to be problematic here.
> >
> > The interpolation is pretty easy to do:
> >
> > do {
> >     start = readtsc();
> >     clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
> >     end = readtsc();
> >     delta = end - start;
> > } while (delta > THRESHOLD);   // make sure the reads were not preempted
> > mid = start + (delta + 1) / 2; // round-closest
> >
>
> How to choose the THRESHOLD? It seems the THRESHOLD value also impacts
> the accuracy.

Maybe by running a number of these reads and collecting the deltas,
then setting THRESHOLD to a standard deviation of the results?
(I'm sure there are sounder methods, but I'd have to do some digging
to find them)
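
One possible reading of the standard-deviation idea, sketched in C. The
names pick_threshold() and isqrt_u64() are hypothetical, and taking the
mean plus k standard deviations as the cutoff is an assumption of this
sketch, not something stated in the thread. Integer math is used so that
no libm is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Integer square root: largest r such that r * r <= v. */
uint64_t isqrt_u64(uint64_t v)
{
    uint64_t r = 0;
    uint64_t bit = (uint64_t)1 << 62;

    /* Start from the highest power of four that is <= v. */
    while (bit > v)
        bit >>= 2;

    while (bit != 0) {
        if (v >= r + bit) {
            v -= r + bit;
            r = (r >> 1) + bit;
        } else {
            r >>= 1;
        }
        bit >>= 2;
    }
    return r;
}

/*
 * Derive a cutoff from previously collected (end - start) deltas.
 * ASSUMPTION: cutoff = mean + k * stddev (population standard deviation,
 * truncated to an integer). Deltas are assumed small enough that the
 * sums below do not overflow 64 bits.
 */
uint64_t pick_threshold(const uint64_t *delta, size_t n, uint64_t k)
{
    uint64_t sum = 0, sq = 0, mean;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += delta[i];
    mean = sum / n;

    for (size_t i = 0; i < n; i++) {
        uint64_t d = delta[i] > mean ? delta[i] - mean : mean - delta[i];
        sq += d * d;
    }
    return mean + k * isqrt_u64(sq / n);
}
```

With k = 0 the cutoff is just the mean delta; a larger k accepts more
reads at the cost of a wider window around each clock read.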

Alternatively you could always take 10 samples and then only do the
mapping with the smallest delta value.
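
A minimal sketch of the smallest-delta variant, in C. The struct and
function names (tsc_sample, collect_samples, pick_tightest, sample_mid)
are made up for illustration, and the counter read is passed in as a
callback because how user space reads the TSC is not specified here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

struct tsc_sample {
    uint64_t start;  /* counter value read before the clock read */
    uint64_t end;    /* counter value read after the clock read */
    uint64_t ns;     /* CLOCK_MONOTONIC_RAW reading, in nanoseconds */
};

/* Hypothetical hook: returns the current counter (e.g. TSC) value. */
typedef uint64_t (*read_counter_fn)(void);

/* Take n bracketed clock reads; no filtering is done here. */
void collect_samples(struct tsc_sample *s, size_t n, read_counter_fn read_counter)
{
    for (size_t i = 0; i < n; i++) {
        struct timespec ts;

        s[i].start = read_counter();
        clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
        s[i].end = read_counter();
        s[i].ns = (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
    }
}

/* Index of the sample whose start..end window is the narrowest. */
size_t pick_tightest(const struct tsc_sample *s, size_t n)
{
    size_t best = 0;

    assert(n > 0);
    for (size_t i = 1; i < n; i++) {
        if (s[i].end - s[i].start < s[best].end - s[best].start)
            best = i;
    }
    return best;
}

/* Counter value at the middle of the window, rounded to closest. */
uint64_t sample_mid(const struct tsc_sample *s)
{
    uint64_t delta = s->end - s->start;

    return s->start + (delta + 1) / 2;
}
```

The pair (sample_mid(&s[best]), s[best].ns) is then one approximate
counter-to-clock mapping point; nothing here bounds how far off it is.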


> > and be able to get you a fairly close matching of TSC to
> > CLOCK_MONOTONIC_RAW value.
> >
> > Once you have that mapping you can take a few samples and establish
> > the linear function.
> >
> > But that will have some error, so quantifying that error helps
> > establish why being able to get an atomic mapping of TSC ->
> > CLOCK_MONOTONIC_RAW would help.
> >
> > So I really don't think we need to expose the kernel internal values
> > to userland, but I'm willing to guess the atomic mapping (which the
> > driver will have access to, not userland) may be helpful for the fine
> > granularity you want in the trace.
> >
>
> If I understand correctly, the idea is to let the user space tool run
> the above interpolation algorithm several times to 'guess' the atomic
> mapping, and then use the mapping information to convert the TSC from
> the PEBS record. Is my understanding correct?

So I think that's what Thomas was suggesting.

The next step would probably be to provide a way for the driver to
provide atomic TSC->CLOCK_MONOTONIC_RAW samples, so userland can
calculate the function itself.

So then the problem becomes if X1 and Y1 are exactly mapped, and X2
and Y2 are exactly mapped, then given X3, find Y3.
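
As a rough illustration of that last step, here is a straight-line
mapping through two points in C. The function name tsc_to_ns() is
hypothetical, and the sketch assumes the counter runs at a constant rate
between the two points. It relies on __int128, which is a GCC/Clang
extension, to keep the intermediate product from overflowing for
realistic inputs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Given two exact (counter, nanoseconds) pairs (x1, y1) and (x2, y2),
 * map counter value x3 to nanoseconds on the line through both points:
 *
 *   y3 = y1 + (x3 - x1) * (y2 - y1) / (x2 - x1)
 *
 * x3 may lie outside [x1, x2]; the division truncates toward zero.
 * ASSUMPTION: the result is non-negative and fits in 64 bits.
 */
uint64_t tsc_to_ns(uint64_t x1, uint64_t y1,
                   uint64_t x2, uint64_t y2, uint64_t x3)
{
    __int128 dx, num, den;

    assert(x2 > x1);   /* need two distinct, ordered counter values */
    assert(y2 >= y1);  /* the clock is monotonic */

    dx = (__int128)x3 - (__int128)x1;   /* negative if x3 < x1 */
    num = dx * (__int128)(y2 - y1);
    den = (__int128)(x2 - x1);

    return (uint64_t)((__int128)y1 + num / den);
}
```

For example, with (1000, 500) and (3000, 1500) as the two points, a
counter value of 2000 maps to 1000 ns.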

And if that doesn't work, then we would have to see about having the
driver do all the conversions.

> If so, to be honest, I doubt we can get the accuracy we want.

Sure. I just want to make sure it's quantified that the pure userland
interpolation approach won't work before we go adding in extra
in-kernel logic.

(We'd obviously rather do the logic that can be done in userland in userland)

thanks
-john

