All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
To: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Cc: intel-gfx@lists.freedesktop.org, Chris Wilson <chris.p.wilson@intel.com>
Subject: Re: [Intel-gfx] [PATCH] i915/query: Correlate engine and cpu timestamps with better accuracy
Date: Wed, 3 Mar 2021 18:28:22 +0200	[thread overview]
Message-ID: <ea032cd7-e02a-a2e0-6673-d653e24a56f6@intel.com> (raw)
In-Reply-To: <20210303162717.GB38857@orsosgc001.ra.intel.com>

On 03/03/2021 18:27, Umesh Nerlige Ramappa wrote:
> On Wed, Mar 03, 2021 at 11:21:39AM +0200, Lionel Landwerlin wrote:
>> On 03/03/2021 02:12, Umesh Nerlige Ramappa wrote:
>>> On Tue, Mar 02, 2021 at 10:35:19PM +0200, Lionel Landwerlin wrote:
>>>> Thanks a bunch for sharing this!
>>>>
>>>> On 02/03/2021 20:29, Umesh Nerlige Ramappa wrote:
>>>>> Perf measurements rely on CPU and engine timestamps to correlate
>>>>> events of interest across these time domains. Current mechanisms get
>>>>> these timestamps separately and the calculated delta between these
>>>>> timestamps lack enough accuracy.
>>>>>
>>>>> To improve the accuracy of these time measurements to within a few 
>>>>> us,
>>>>> add a query that returns the engine and cpu timestamps captured as
>>>>> close to each other as possible.
>>>>>
>>>>> v2: (Tvrtko)
>>>>> - document clock reference used
>>>>> - return cpu timestamp always
>>>>> - capture cpu time just before lower dword of cs timestamp
>>>>>
>>>>> v3: (Chris)
>>>>> - use uncore-rpm
>>>>> - use __query_cs_timestamp helper
>>>>>
>>>>> v4: (Lionel)
>>>>> - Kernel perf subsytem allows users to specify the clock id to be 
>>>>> used
>>>>>   in perf_event_open. This clock id is used by the perf subsystem to
>>>>>   return the appropriate cpu timestamp in perf events. Similarly, let
>>>>>   the user pass the clockid to this query so that cpu timestamp
>>>>>   corresponds to the clock id requested.
>>>>>
>>>>> v5: (Tvrtko)
>>>>> - Use normal ktime accessors instead of fast versions
>>>>> - Add more uApi documentation
>>>>>
>>>>> v6: (Lionel)
>>>>> - Move switch out of spinlock
>>>>>
>>>>> v7: (Chris)
>>>>> - cs_timestamp is a misnomer, use cs_cycles instead
>>>>> - return the cs cycle frequency as well in the query
>>>>>
>>>>> v8:
>>>>> - Add platform and engine specific checks
>>>>>
>>>>> Signed-off-by: Umesh Nerlige Ramappa 
>>>>> <umesh.nerlige.ramappa@intel.com>
>>>>> ---
>>>>>  drivers/gpu/drm/i915/i915_query.c | 140 
>>>>> ++++++++++++++++++++++++++++++
>>>>>  include/uapi/drm/i915_drm.h       |  43 +++++++++
>>>>>  2 files changed, 183 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/i915_query.c 
>>>>> b/drivers/gpu/drm/i915/i915_query.c
>>>>> index fed337ad7b68..763f0f918065 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_query.c
>>>>> +++ b/drivers/gpu/drm/i915/i915_query.c
>>>>> @@ -6,6 +6,8 @@
>>>>>  #include <linux/nospec.h>
>>>>> +#include "gt/intel_engine_pm.h"
>>>>> +#include "gt/intel_engine_user.h"
>>>>>  #include "i915_drv.h"
>>>>>  #include "i915_perf.h"
>>>>>  #include "i915_query.h"
>>>>> @@ -90,6 +92,143 @@ static int query_topology_info(struct 
>>>>> drm_i915_private *dev_priv,
>>>>>      return total_length;
>>>>>  }
>>>>> +typedef u64 (*__ktime_func_t)(void);
>>>>> +static __ktime_func_t __clock_id_to_func(clockid_t clk_id)
>>>>> +{
>>>>> +    /*
>>>>> +     * Use logic same as the perf subsystem to allow user to 
>>>>> select the
>>>>> +     * reference clock id to be used for timestamps.
>>>>> +     */
>>>>> +    switch (clk_id) {
>>>>> +    case CLOCK_MONOTONIC:
>>>>> +        return &ktime_get_ns;
>>>>> +    case CLOCK_MONOTONIC_RAW:
>>>>> +        return &ktime_get_raw_ns;
>>>>> +    case CLOCK_REALTIME:
>>>>> +        return &ktime_get_real_ns;
>>>>> +    case CLOCK_BOOTTIME:
>>>>> +        return &ktime_get_boottime_ns;
>>>>> +    case CLOCK_TAI:
>>>>> +        return &ktime_get_clocktai_ns;
>>>>> +    default:
>>>>> +        return NULL;
>>>>> +    }
>>>>> +}
>>>>> +
>>>>> +static inline int
>>>>> +__read_timestamps(struct intel_uncore *uncore,
>>>>> +          i915_reg_t lower_reg,
>>>>> +          i915_reg_t upper_reg,
>>>>> +          u64 *cs_ts,
>>>>> +          u64 *cpu_ts,
>>>>> +          __ktime_func_t cpu_clock)
>>>>> +{
>>>>> +    u32 upper, lower, old_upper, loop = 0;
>>>>> +
>>>>> +    upper = intel_uncore_read_fw(uncore, upper_reg);
>>>>> +    do {
>>>>> +        *cpu_ts = cpu_clock();
>>>>> +        lower = intel_uncore_read_fw(uncore, lower_reg);
>>>>> +        old_upper = upper;
>>>>> +        upper = intel_uncore_read_fw(uncore, upper_reg);
>>>>> +    } while (upper != old_upper && loop++ < 2);
>>>>
>>>>
>>>> With the 2 cpu timestamps things I mentioned below, this would be
>>>>
>>>>
>>>> do {
>>>>
>>>>     *cpu_ts0 = cpu_clock();
>>>>
>>>>     lower = intel_uncore_read_fw(uncore, lower_reg);
>>>>
>>>>     *cpu_ts1 = cpu_clock();
>>>>
>>>>     upper = intel_uncore_read_fw(uncore, upper_reg);
>>>>
>>>> } while (upper != old_upper && loop++ < 2);
>>>>
>>>>
>>>>> +
>>>>> +    *cs_ts = (u64)upper << 32 | lower;
>>>>> +
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>> +static int
>>>>> +__query_cs_cycles(struct intel_engine_cs *engine,
>>>>> +          u64 *cs_ts, u64 *cpu_ts,
>>>>> +          __ktime_func_t cpu_clock)
>>>>> +{
>>>>> +    struct intel_uncore *uncore = engine->uncore;
>>>>> +    enum forcewake_domains fw_domains;
>>>>> +    u32 base = engine->mmio_base;
>>>>> +    intel_wakeref_t wakeref;
>>>>> +    int ret;
>>>>> +
>>>>> +    fw_domains = intel_uncore_forcewake_for_reg(uncore,
>>>>> +                            RING_TIMESTAMP(base),
>>>>> +                            FW_REG_READ);
>>>>> +
>>>>> +    with_intel_runtime_pm(uncore->rpm, wakeref) {
>>>>> +        spin_lock_irq(&uncore->lock);
>>>>> +        intel_uncore_forcewake_get__locked(uncore, fw_domains);
>>>>> +
>>>>> +        ret = __read_timestamps(uncore,
>>>>> +                    RING_TIMESTAMP(base),
>>>>> +                    RING_TIMESTAMP_UDW(base),
>>>>> +                    cs_ts,
>>>>> +                    cpu_ts,
>>>>> +                    cpu_clock);
>>>>> +
>>>>> +        intel_uncore_forcewake_put__locked(uncore, fw_domains);
>>>>> +        spin_unlock_irq(&uncore->lock);
>>>>> +    }
>>>>> +
>>>>> +    return ret;
>>>>> +}
>>>>> +
>>>>> +static int
>>>>> +query_cs_cycles(struct drm_i915_private *i915,
>>>>> +        struct drm_i915_query_item *query_item)
>>>>> +{
>>>>> +    struct drm_i915_query_cs_cycles __user *query_ptr;
>>>>> +    struct drm_i915_query_cs_cycles query;
>>>>> +    struct intel_engine_cs *engine;
>>>>> +    __ktime_func_t cpu_clock;
>>>>> +    int ret;
>>>>> +
>>>>> +    if (INTEL_GEN(i915) < 6)
>>>>> +        return -ENODEV;
>>>>> +
>>>>> +    query_ptr = u64_to_user_ptr(query_item->data_ptr);
>>>>> +    ret = copy_query_item(&query, sizeof(query), sizeof(query), 
>>>>> query_item);
>>>>> +    if (ret != 0)
>>>>> +        return ret;
>>>>> +
>>>>> +    if (query.flags)
>>>>> +        return -EINVAL;
>>>>> +
>>>>> +    if (query.rsvd)
>>>>> +        return -EINVAL;
>>>>> +
>>>>> +    cpu_clock = __clock_id_to_func(query.clockid);
>>>>> +    if (!cpu_clock)
>>>>> +        return -EINVAL;
>>>>> +
>>>>> +    engine = intel_engine_lookup_user(i915,
>>>>> +                      query.engine.engine_class,
>>>>> +                      query.engine.engine_instance);
>>>>> +    if (!engine)
>>>>> +        return -EINVAL;
>>>>> +
>>>>> +    if (IS_GEN(i915, 6) &&
>>>>> +        query.engine.engine_class != I915_ENGINE_CLASS_RENDER)
>>>>> +        return -ENODEV;
>>>>> +
>>>>> +    query.cs_frequency = engine->gt->clock_frequency;
>>>>> +    ret = __query_cs_cycles(engine,
>>>>> +                &query.cs_cycles,
>>>>> +                &query.cpu_timestamp,
>>>>> +                cpu_clock);
>>>>> +    if (ret)
>>>>> +        return ret;
>>>>> +
>>>>> +    if (put_user(query.cs_frequency, &query_ptr->cs_frequency))
>>>>> +        return -EFAULT;
>>>>> +
>>>>> +    if (put_user(query.cpu_timestamp, &query_ptr->cpu_timestamp))
>>>>> +        return -EFAULT;
>>>>> +
>>>>> +    if (put_user(query.cs_cycles, &query_ptr->cs_cycles))
>>>>> +        return -EFAULT;
>>>>> +
>>>>> +    return sizeof(query);
>>>>> +}
>>>>> +
>>>>>  static int
>>>>>  query_engine_info(struct drm_i915_private *i915,
>>>>>            struct drm_i915_query_item *query_item)
>>>>> @@ -424,6 +563,7 @@ static int (* const i915_query_funcs[])(struct 
>>>>> drm_i915_private *dev_priv,
>>>>>      query_topology_info,
>>>>>      query_engine_info,
>>>>>      query_perf_config,
>>>>> +    query_cs_cycles,
>>>>>  };
>>>>>  int i915_query_ioctl(struct drm_device *dev, void *data, struct 
>>>>> drm_file *file)
>>>>> diff --git a/include/uapi/drm/i915_drm.h 
>>>>> b/include/uapi/drm/i915_drm.h
>>>>> index 1987e2ea79a3..379ae6e7aeb0 100644
>>>>> --- a/include/uapi/drm/i915_drm.h
>>>>> +++ b/include/uapi/drm/i915_drm.h
>>>>> @@ -2176,6 +2176,10 @@ struct drm_i915_query_item {
>>>>>  #define DRM_I915_QUERY_TOPOLOGY_INFO    1
>>>>>  #define DRM_I915_QUERY_ENGINE_INFO    2
>>>>>  #define DRM_I915_QUERY_PERF_CONFIG      3
>>>>> +    /**
>>>>> +     * Query Command Streamer timestamp register.
>>>>> +     */
>>>>> +#define DRM_I915_QUERY_CS_CYCLES    4
>>>>>  /* Must be kept compact -- no holes and well documented */
>>>>>      /*
>>>>> @@ -2309,6 +2313,45 @@ struct drm_i915_engine_info {
>>>>>      __u64 rsvd1[4];
>>>>>  };
>>>>> +/**
>>>>> + * struct drm_i915_query_cs_cycles
>>>>> + *
>>>>> + * The query returns the command streamer cycles and the 
>>>>> frequency that can be
>>>>> + * used to calculate the command streamer timestamp. In addition 
>>>>> the query
>>>>> + * returns the cpu timestamp that indicates when the command 
>>>>> streamer cycle
>>>>> + * count was captured.
>>>>> + */
>>>>> +struct drm_i915_query_cs_cycles {
>>>>> +    /** Engine for which command streamer cycles is queried. */
>>>>> +    struct i915_engine_class_instance engine;
>>>>> +
>>>>> +    /** Must be zero. */
>>>>> +    __u32 flags;
>>>>> +
>>>>> +    /**
>>>>> +     * Command streamer cycles as read from the command streamer
>>>>> +     * register at 0x358 offset.
>>>>> +     */
>>>>> +    __u64 cs_cycles;
>>>>> +
>>>>> +    /** Frequency of the cs cycles in Hz. */
>>>>> +    __u64 cs_frequency;
>>>>> +
>>>>> +    /** CPU timestamp in nanoseconds. */
>>>>> +    __u64 cpu_timestamp;
>>>>
>>>>
>>>> Would it be possible to have : u64 cpu_timestamps[2];
>>>>
>>>> with cpu_timestamps[0] taken before & cpu_timestamps[1] taken after 
>>>> the cs_cycles, so we can have an idea of how long the read takes.
>>>
>>> Possible, but I thought multiple queries would indirectly provide 
>>> such information. If query1 returns cpu1 and cs1 time and query2 
>>> returns cpu2 and cs2 times. Assuming neither overflowed,
>>>
>>> |((cpu2 - cpu1) - (cs1 - cs2))|
>>>
>>> should be the worst case time taken to read the register 
>>> (essentially delta_delta in the IGT test). Thoughts?
>>
>>
>> Going through 2 syscalls introduces a delay.
>>
>> I did some measurements and it appears to be in the orders of 20~30us.
>>
>
> Have you tried multiple query items in the same call to the query 
> ioctl?  Does that make any difference?


Yeah, it was the average :/


-Lionel


>
>>
>> While doing the 2 cpu timestamp capture with a single mmio read in 
>> between should be below 2us.
>>
>> We're hoping to go as precise as possible with this :)
>
> I see. I will post an update.
>
> Can you also share how you intend to use the query result with 2 cpu 
> timestamps? I want to add that to the IGT.
>
> Thanks,
> Umesh
>
>>
>>
>> -Lionel
>>
>>
>>>
>>> Thanks,
>>> Umesh
>>>
>>>>
>>>>
>>>>> +
>>>>> +    /**
>>>>> +     * Reference clock id for CPU timestamp. For definition, see
>>>>> +     * clock_gettime(2) and perf_event_open(2). Supported clock 
>>>>> ids are
>>>>> +     * CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW, CLOCK_REALTIME, 
>>>>> CLOCK_BOOTTIME,
>>>>> +     * CLOCK_TAI.
>>>>> +     */
>>>>> +    __s32 clockid;
>>>>> +
>>>>> +    /** Must be zero. */
>>>>> +    __u32 rsvd;
>>>>> +};
>>>>> +
>>>>>  /**
>>>>>   * struct drm_i915_query_engine_info
>>>>>   *
>>>>
>>>>
>>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2021-03-03 16:28 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-02 18:29 [Intel-gfx] [PATCH] i915/query: Correlate engine and cpu timestamps with better accuracy Umesh Nerlige Ramappa
2021-03-02 20:35 ` Lionel Landwerlin
2021-03-03  0:12   ` Umesh Nerlige Ramappa
2021-03-03  9:21     ` Lionel Landwerlin
2021-03-03 16:27       ` Umesh Nerlige Ramappa
2021-03-03 16:28         ` Lionel Landwerlin [this message]
2021-03-03 17:02 ` [Intel-gfx] ✓ Fi.CI.BAT: success for " Patchwork
2021-03-03 20:23 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
2021-03-03 21:28 [Intel-gfx] [PATCH] " Umesh Nerlige Ramappa
2021-03-04  0:09 ` Chris Wilson
2021-03-04  8:28   ` Lionel Landwerlin
2021-03-04  8:58     ` Chris Wilson
2021-03-04  9:45       ` Lionel Landwerlin
2021-03-04  9:54         ` Chris Wilson
2021-03-04 10:27           ` Lionel Landwerlin
2021-03-04  8:32 ` Lionel Landwerlin
2021-03-04 12:23 ` Lionel Landwerlin
2021-03-04  0:11 kernel test robot
2021-03-05 19:37 Umesh Nerlige Ramappa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ea032cd7-e02a-a2e0-6673-d653e24a56f6@intel.com \
    --to=lionel.g.landwerlin@intel.com \
    --cc=chris.p.wilson@intel.com \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=umesh.nerlige.ramappa@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.