From: Robert Bragg
Subject: Re: [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events
Date: Tue, 5 Dec 2017 14:28:16 +0000
To: Lionel Landwerlin
Cc: Intel Graphics Development
List-Id: intel-gfx@lists.freedesktop.org

On Tue, Dec 5, 2017 at 2:16 PM, Lionel Landwerlin <lionel.g.landwerlin@intel.com> wrote:

> Hey Sagar,
>
> Sorry for the delay looking into this series.
> I've done some userspace/UI work in GPUTop to try to correlate perf
> samples/tracepoints with i915 perf reports.
>
> I wanted to avoid adding too much logic to the kernel, so I tried to
> sample both CPU clocks and GPU timestamps from userspace.
> So far that isn't working. People more knowledgeable than I would have
> realized that the kernel can sneak work into syscalls.
> The result is that two back-to-back syscalls from the same thread (one
> to get the CPU clock, one for the GPU timestamp) show time differences
> of anywhere from a few microseconds to, in some cases, close to a
> millisecond. So it's basically unworkable.
> Anyway, the UI work won't go to waste :)
>
> I'm thinking of going with your approach.
> From my experiments with GPUTop, it seems we might want to use a
> different CPU clock source, though, or make it configurable.
> The perf infrastructure allows you to choose which clock you want to
> use. Since we want to avoid time adjustments on that clock (because
> we're adding deltas), clock monotonic raw would make the most sense.

I would guess the most generally useful clock domain to correlate with
the largest number of interesting events would be CLOCK_MONOTONIC, not
_MONOTONIC_RAW.

E.g. here's some discussion around why vblank events use CLOCK_MONOTONIC:
https://lists.freedesktop.org/archives/dri-devel/2012-October/028878.html

Br,
- Robert

> I'll look at adding some tests for this too.
>
> Thanks,
>
> -
> Lionel
>
> On 15/11/17 12:13, Sagar Arun Kamble wrote:
>
>> We can compute the system time corresponding to a GPU timestamp by
>> taking a reference point (CPU monotonic time, GPU timestamp) and then
>> adding a delta time computed using the timecounter/cyclecounter
>> support in the kernel. We have to configure the cyclecounter with the
>> GPU timestamp frequency. The earlier approach, based on
>> cross-timestamps, is no longer needed. It was being used to
>> approximate the frequency based on invalid assumptions (the drift
>> seen in the time was possibly due to a precision issue).
>> The precision of the time from GPU clocks is already in ns, and the
>> timecounter takes care of it, as verified over variable durations.
>>
>> This series adds base timecounter/cyclecounter changes and changes to
>> get GPU and CPU timestamps in OA samples.
>>
>> Sagar Arun Kamble (1):
>>   drm/i915/perf: Add support to correlate GPU timestamp with system time
>>
>> Sourab Gupta (3):
>>   drm/i915/perf: Add support for collecting 64 bit timestamps with OA
>>     reports
>>   drm/i915/perf: Extract raw GPU timestamps from OA reports
>>   drm/i915/perf: Send system clock monotonic time in perf samples
>>
>>  drivers/gpu/drm/i915/i915_drv.h  |  11 ++++
>>  drivers/gpu/drm/i915/i915_perf.c | 124 ++++++++++++++++++++++++++++++++++++++-
>>  drivers/gpu/drm/i915/i915_reg.h  |   6 ++
>>  include/uapi/drm/i915_drm.h      |  14 +++++
>>  4 files changed, 154 insertions(+), 1 deletion(-)
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx