All of lore.kernel.org
* [Intel-gfx] [PATCH 0/1] Add support for querying engine cycles
@ 2021-04-27 21:49 Umesh Nerlige Ramappa
  2021-04-27 21:49 ` [Intel-gfx] [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy Umesh Nerlige Ramappa
                   ` (3 more replies)
  0 siblings, 4 replies; 26+ messages in thread
From: Umesh Nerlige Ramappa @ 2021-04-27 21:49 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

This is a refresh of the earlier patch, along with a cover letter for the
IGT testing. The query returns the engine CS cycles counter.

v2: Use GRAPHICS_VER() instead of IS_GEN()

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Test-with: 20210421172046.65062-1-umesh.nerlige.ramappa@intel.com

Umesh Nerlige Ramappa (1):
  i915/query: Correlate engine and cpu timestamps with better accuracy

 drivers/gpu/drm/i915/i915_query.c | 145 ++++++++++++++++++++++++++++++
 include/uapi/drm/i915_drm.h       |  48 ++++++++++
 2 files changed, 193 insertions(+)

-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Intel-gfx] [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy
  2021-04-27 21:49 [Intel-gfx] [PATCH 0/1] Add support for querying engine cycles Umesh Nerlige Ramappa
@ 2021-04-27 21:49 ` Umesh Nerlige Ramappa
  2021-04-28  8:43     ` [Intel-gfx] " Jani Nikula
  2021-04-27 22:16 ` [Intel-gfx] ✗ Fi.CI.DOCS: warning for Add support for querying engine cycles Patchwork
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 26+ messages in thread
From: Umesh Nerlige Ramappa @ 2021-04-27 21:49 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Perf measurements rely on CPU and engine timestamps to correlate
events of interest across these time domains. Current mechanisms get
these timestamps separately, and the calculated delta between them
lacks sufficient accuracy.

To improve the accuracy of these time measurements to within a few us,
add a query that returns the engine and CPU timestamps captured as
close to each other as possible.
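
For illustration (not part of the patch), a consumer of the query would
convert the returned cycle count using cs_frequency and pair it with the
returned CPU timestamps roughly like this:

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ull

/*
 * Convert a CS cycle count to nanoseconds. Split the division to avoid
 * 64-bit overflow of cs_cycles * NSEC_PER_SEC for large cycle counts.
 */
static uint64_t cs_cycles_to_ns(uint64_t cs_cycles, uint64_t cs_frequency)
{
	uint64_t secs = cs_cycles / cs_frequency;
	uint64_t rem = cs_cycles % cs_frequency;

	return secs * NSEC_PER_SEC + rem * NSEC_PER_SEC / cs_frequency;
}

/*
 * cpu_timestamp[0] is captured just before the lower-dword register read
 * and cpu_timestamp[1] is that read's duration, so their sum bounds the
 * moment the CS value was sampled.
 */
static uint64_t cs_read_cpu_ns(const uint64_t cpu_timestamp[2])
{
	return cpu_timestamp[0] + cpu_timestamp[1];
}
```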

v2: (Tvrtko)
- document clock reference used
- return cpu timestamp always
- capture cpu time just before lower dword of cs timestamp

v3: (Chris)
- use uncore-rpm
- use __query_cs_timestamp helper

v4: (Lionel)
- The kernel perf subsystem allows users to specify the clock id to be
  used in perf_event_open. This clock id is used by the perf subsystem
  to return the appropriate cpu timestamp in perf events. Similarly,
  let the user pass a clockid to this query so that the cpu timestamp
  corresponds to the requested clock id.

v5: (Tvrtko)
- Use normal ktime accessors instead of fast versions
- Add more uApi documentation

v6: (Lionel)
- Move switch out of spinlock

v7: (Chris)
- cs_timestamp is a misnomer, use cs_cycles instead
- return the cs cycle frequency as well in the query

v8:
- Add platform and engine specific checks

v9: (Lionel)
- Return 2 cpu timestamps in the query - captured before and after the
  register read

v10: (Chris)
- Use local_clock() to measure time taken to read lower dword of
  register and return it to user.

v11: (Jani)
- IS_GEN() is deprecated. Use GRAPHICS_VER() instead.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_query.c | 145 ++++++++++++++++++++++++++++++
 include/uapi/drm/i915_drm.h       |  48 ++++++++++
 2 files changed, 193 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
index fed337ad7b68..2594b93901ac 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -6,6 +6,8 @@
 
 #include <linux/nospec.h>
 
+#include "gt/intel_engine_pm.h"
+#include "gt/intel_engine_user.h"
 #include "i915_drv.h"
 #include "i915_perf.h"
 #include "i915_query.h"
@@ -90,6 +92,148 @@ static int query_topology_info(struct drm_i915_private *dev_priv,
 	return total_length;
 }
 
+typedef u64 (*__ktime_func_t)(void);
+static __ktime_func_t __clock_id_to_func(clockid_t clk_id)
+{
+	/*
+	 * Use logic same as the perf subsystem to allow user to select the
+	 * reference clock id to be used for timestamps.
+	 */
+	switch (clk_id) {
+	case CLOCK_MONOTONIC:
+		return &ktime_get_ns;
+	case CLOCK_MONOTONIC_RAW:
+		return &ktime_get_raw_ns;
+	case CLOCK_REALTIME:
+		return &ktime_get_real_ns;
+	case CLOCK_BOOTTIME:
+		return &ktime_get_boottime_ns;
+	case CLOCK_TAI:
+		return &ktime_get_clocktai_ns;
+	default:
+		return NULL;
+	}
+}
+
+static inline int
+__read_timestamps(struct intel_uncore *uncore,
+		  i915_reg_t lower_reg,
+		  i915_reg_t upper_reg,
+		  u64 *cs_ts,
+		  u64 *cpu_ts,
+		  __ktime_func_t cpu_clock)
+{
+	u32 upper, lower, old_upper, loop = 0;
+
+	upper = intel_uncore_read_fw(uncore, upper_reg);
+	do {
+		cpu_ts[1] = local_clock();
+		cpu_ts[0] = cpu_clock();
+		lower = intel_uncore_read_fw(uncore, lower_reg);
+		cpu_ts[1] = local_clock() - cpu_ts[1];
+		old_upper = upper;
+		upper = intel_uncore_read_fw(uncore, upper_reg);
+	} while (upper != old_upper && loop++ < 2);
+
+	*cs_ts = (u64)upper << 32 | lower;
+
+	return 0;
+}
+
+static int
+__query_cs_cycles(struct intel_engine_cs *engine,
+		  u64 *cs_ts, u64 *cpu_ts,
+		  __ktime_func_t cpu_clock)
+{
+	struct intel_uncore *uncore = engine->uncore;
+	enum forcewake_domains fw_domains;
+	u32 base = engine->mmio_base;
+	intel_wakeref_t wakeref;
+	int ret;
+
+	fw_domains = intel_uncore_forcewake_for_reg(uncore,
+						    RING_TIMESTAMP(base),
+						    FW_REG_READ);
+
+	with_intel_runtime_pm(uncore->rpm, wakeref) {
+		spin_lock_irq(&uncore->lock);
+		intel_uncore_forcewake_get__locked(uncore, fw_domains);
+
+		ret = __read_timestamps(uncore,
+					RING_TIMESTAMP(base),
+					RING_TIMESTAMP_UDW(base),
+					cs_ts,
+					cpu_ts,
+					cpu_clock);
+
+		intel_uncore_forcewake_put__locked(uncore, fw_domains);
+		spin_unlock_irq(&uncore->lock);
+	}
+
+	return ret;
+}
+
+static int
+query_cs_cycles(struct drm_i915_private *i915,
+		struct drm_i915_query_item *query_item)
+{
+	struct drm_i915_query_cs_cycles __user *query_ptr;
+	struct drm_i915_query_cs_cycles query;
+	struct intel_engine_cs *engine;
+	__ktime_func_t cpu_clock;
+	int ret;
+
+	if (GRAPHICS_VER(i915) < 6)
+		return -ENODEV;
+
+	query_ptr = u64_to_user_ptr(query_item->data_ptr);
+	ret = copy_query_item(&query, sizeof(query), sizeof(query), query_item);
+	if (ret != 0)
+		return ret;
+
+	if (query.flags)
+		return -EINVAL;
+
+	if (query.rsvd)
+		return -EINVAL;
+
+	cpu_clock = __clock_id_to_func(query.clockid);
+	if (!cpu_clock)
+		return -EINVAL;
+
+	engine = intel_engine_lookup_user(i915,
+					  query.engine.engine_class,
+					  query.engine.engine_instance);
+	if (!engine)
+		return -EINVAL;
+
+	if (GRAPHICS_VER(i915) == 6 &&
+	    query.engine.engine_class != I915_ENGINE_CLASS_RENDER)
+		return -ENODEV;
+
+	query.cs_frequency = engine->gt->clock_frequency;
+	ret = __query_cs_cycles(engine,
+				&query.cs_cycles,
+				query.cpu_timestamp,
+				cpu_clock);
+	if (ret)
+		return ret;
+
+	if (put_user(query.cs_frequency, &query_ptr->cs_frequency))
+		return -EFAULT;
+
+	if (put_user(query.cpu_timestamp[0], &query_ptr->cpu_timestamp[0]))
+		return -EFAULT;
+
+	if (put_user(query.cpu_timestamp[1], &query_ptr->cpu_timestamp[1]))
+		return -EFAULT;
+
+	if (put_user(query.cs_cycles, &query_ptr->cs_cycles))
+		return -EFAULT;
+
+	return sizeof(query);
+}
+
 static int
 query_engine_info(struct drm_i915_private *i915,
 		  struct drm_i915_query_item *query_item)
@@ -424,6 +568,7 @@ static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
 	query_topology_info,
 	query_engine_info,
 	query_perf_config,
+	query_cs_cycles,
 };
 
 int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 6a34243a7646..08b00f1709b5 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -2230,6 +2230,10 @@ struct drm_i915_query_item {
 #define DRM_I915_QUERY_TOPOLOGY_INFO    1
 #define DRM_I915_QUERY_ENGINE_INFO	2
 #define DRM_I915_QUERY_PERF_CONFIG      3
+	/**
+	 * Query Command Streamer timestamp register.
+	 */
+#define DRM_I915_QUERY_CS_CYCLES	4
 /* Must be kept compact -- no holes and well documented */
 
 	/**
@@ -2397,6 +2401,50 @@ struct drm_i915_engine_info {
 	__u64 rsvd1[4];
 };
 
+/**
+ * struct drm_i915_query_cs_cycles
+ *
+ * The query returns the command streamer cycles and the frequency that can be
+ * used to calculate the command streamer timestamp. In addition the query
+ * returns a set of cpu timestamps that indicate when the command streamer cycle
+ * count was captured.
+ */
+struct drm_i915_query_cs_cycles {
+	/** Engine for which command streamer cycles is queried. */
+	struct i915_engine_class_instance engine;
+
+	/** Must be zero. */
+	__u32 flags;
+
+	/**
+	 * Command streamer cycles as read from the command streamer
+	 * register at 0x358 offset.
+	 */
+	__u64 cs_cycles;
+
+	/** Frequency of the cs cycles in Hz. */
+	__u64 cs_frequency;
+
+	/**
+	 * CPU timestamps in ns. cpu_timestamp[0] is captured before reading the
+	 * cs_cycles register using the reference clockid set by the user.
+	 * cpu_timestamp[1] is the time taken in ns to read the lower dword of
+	 * the cs_cycles register.
+	 */
+	__u64 cpu_timestamp[2];
+
+	/**
+	 * Reference clock id for CPU timestamp. For definition, see
+	 * clock_gettime(2) and perf_event_open(2). Supported clock ids are
+	 * CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW, CLOCK_REALTIME, CLOCK_BOOTTIME,
+	 * CLOCK_TAI.
+	 */
+	__s32 clockid;
+
+	/** Must be zero. */
+	__u32 rsvd;
+};
+
 /**
  * struct drm_i915_query_engine_info
  *
-- 
2.20.1


* [Intel-gfx] ✗ Fi.CI.DOCS: warning for Add support for querying engine cycles
  2021-04-27 21:49 [Intel-gfx] [PATCH 0/1] Add support for querying engine cycles Umesh Nerlige Ramappa
  2021-04-27 21:49 ` [Intel-gfx] [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy Umesh Nerlige Ramappa
@ 2021-04-27 22:16 ` Patchwork
  2021-04-27 22:41 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
  2021-04-28  3:31 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
  3 siblings, 0 replies; 26+ messages in thread
From: Patchwork @ 2021-04-27 22:16 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

== Series Details ==

Series: Add support for querying engine cycles
URL   : https://patchwork.freedesktop.org/series/89560/
State : warning

== Summary ==

$ make htmldocs 2>&1 > /dev/null | grep i915
./include/uapi/drm/i915_drm.h:2234: warning: Incorrect use of kernel-doc format:          * Query Command Streamer timestamp register.
./include/uapi/drm/i915_drm.h:2420: warning: Incorrect use of kernel-doc format:          * Command streamer cycles as read from the command streamer
./include/uapi/drm/i915_drm.h:2429: warning: Incorrect use of kernel-doc format:          * CPU timestamps in ns. cpu_timestamp[0] is captured before reading the
./include/uapi/drm/i915_drm.h:2437: warning: Incorrect use of kernel-doc format:          * Reference clock id for CPU timestamp. For definition, see
./include/uapi/drm/i915_drm.h:2446: warning: Function parameter or member 'engine' not described in 'drm_i915_query_cs_cycles'
./include/uapi/drm/i915_drm.h:2446: warning: Function parameter or member 'flags' not described in 'drm_i915_query_cs_cycles'
./include/uapi/drm/i915_drm.h:2446: warning: Function parameter or member 'cs_cycles' not described in 'drm_i915_query_cs_cycles'
./include/uapi/drm/i915_drm.h:2446: warning: Function parameter or member 'cs_frequency' not described in 'drm_i915_query_cs_cycles'
./include/uapi/drm/i915_drm.h:2446: warning: Function parameter or member 'cpu_timestamp' not described in 'drm_i915_query_cs_cycles'
./include/uapi/drm/i915_drm.h:2446: warning: Function parameter or member 'clockid' not described in 'drm_i915_query_cs_cycles'
./include/uapi/drm/i915_drm.h:2446: warning: Function parameter or member 'rsvd' not described in 'drm_i915_query_cs_cycles'



* [Intel-gfx] ✓ Fi.CI.BAT: success for Add support for querying engine cycles
  2021-04-27 21:49 [Intel-gfx] [PATCH 0/1] Add support for querying engine cycles Umesh Nerlige Ramappa
  2021-04-27 21:49 ` [Intel-gfx] [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy Umesh Nerlige Ramappa
  2021-04-27 22:16 ` [Intel-gfx] ✗ Fi.CI.DOCS: warning for Add support for querying engine cycles Patchwork
@ 2021-04-27 22:41 ` Patchwork
  2021-04-28  3:31 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
  3 siblings, 0 replies; 26+ messages in thread
From: Patchwork @ 2021-04-27 22:41 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx



== Series Details ==

Series: Add support for querying engine cycles
URL   : https://patchwork.freedesktop.org/series/89560/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_10018 -> Patchwork_20008
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/index.html

Known issues
------------

  Here are the changes found in Patchwork_20008 that come from known issues:

### IGT changes ###

#### Possible fixes ####

  * igt@gem_exec_suspend@basic-s3:
    - fi-tgl-u2:          [FAIL][1] ([i915#1888]) -> [PASS][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10018/fi-tgl-u2/igt@gem_exec_suspend@basic-s3.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/fi-tgl-u2/igt@gem_exec_suspend@basic-s3.html

  * igt@i915_selftest@live@gt_heartbeat:
    - fi-tgl-y:           [DMESG-FAIL][3] ([i915#541]) -> [PASS][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10018/fi-tgl-y/igt@i915_selftest@live@gt_heartbeat.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/fi-tgl-y/igt@i915_selftest@live@gt_heartbeat.html

  
#### Warnings ####

  * igt@kms_chamelium@vga-edid-read:
    - fi-icl-u2:          [SKIP][5] -> [SKIP][6] ([fdo#109309]) +1 similar issue
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10018/fi-icl-u2/igt@kms_chamelium@vga-edid-read.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/fi-icl-u2/igt@kms_chamelium@vga-edid-read.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109309]: https://bugs.freedesktop.org/show_bug.cgi?id=109309
  [i915#1849]: https://gitlab.freedesktop.org/drm/intel/issues/1849
  [i915#1888]: https://gitlab.freedesktop.org/drm/intel/issues/1888
  [i915#3180]: https://gitlab.freedesktop.org/drm/intel/issues/3180
  [i915#541]: https://gitlab.freedesktop.org/drm/intel/issues/541


Participating hosts (47 -> 41)
------------------------------

  Missing    (6): fi-ilk-m540 fi-hsw-4200u fi-bsw-cyan fi-ctg-p8600 fi-icl-y fi-bdw-samus 


Build changes
-------------

  * IGT: IGT_6076 -> IGTPW_5757
  * Linux: CI_DRM_10018 -> Patchwork_20008

  CI-20190529: 20190529
  CI_DRM_10018: 929a4fa94d31990066fd8be6a02f1a6c2b9f1d2d @ git://anongit.freedesktop.org/gfx-ci/linux
  IGTPW_5757: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_5757/index.html
  IGT_6076: 9ab0820dbd07781161c1ace6973ea222fd24e53a @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_20008: 043ae82f7a388989581edb393cd22ff6fb2fde02 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

043ae82f7a38 i915/query: Correlate engine and cpu timestamps with better accuracy

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/index.html



* [Intel-gfx] ✓ Fi.CI.IGT: success for Add support for querying engine cycles
  2021-04-27 21:49 [Intel-gfx] [PATCH 0/1] Add support for querying engine cycles Umesh Nerlige Ramappa
                   ` (2 preceding siblings ...)
  2021-04-27 22:41 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
@ 2021-04-28  3:31 ` Patchwork
  3 siblings, 0 replies; 26+ messages in thread
From: Patchwork @ 2021-04-28  3:31 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx



== Series Details ==

Series: Add support for querying engine cycles
URL   : https://patchwork.freedesktop.org/series/89560/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_10018_full -> Patchwork_20008_full
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  

New tests
---------

  New tests have been introduced between CI_DRM_10018_full and Patchwork_20008_full:

### New IGT tests (2) ###

  * igt@i915_query@cs-cycles:
    - Statuses : 6 pass(s)
    - Exec time: [0.01, 0.14] s

  * igt@i915_query@cs-cycles-invalid:
    - Statuses : 6 pass(s)
    - Exec time: [0.00, 0.04] s

  

Known issues
------------

  Here are the changes found in Patchwork_20008_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@feature_discovery@display-3x:
    - shard-iclb:         NOTRUN -> [SKIP][1] ([i915#1839]) +1 similar issue
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-iclb7/igt@feature_discovery@display-3x.html

  * igt@gem_create@create-clear:
    - shard-glk:          [PASS][2] -> [FAIL][3] ([i915#3160])
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10018/shard-glk7/igt@gem_create@create-clear.html
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-glk3/igt@gem_create@create-clear.html

  * igt@gem_create@create-massive:
    - shard-apl:          NOTRUN -> [DMESG-WARN][4] ([i915#3002])
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-apl6/igt@gem_create@create-massive.html

  * igt@gem_ctx_persistence@legacy-engines-queued:
    - shard-snb:          NOTRUN -> [SKIP][5] ([fdo#109271] / [i915#1099]) +6 similar issues
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-snb2/igt@gem_ctx_persistence@legacy-engines-queued.html

  * igt@gem_exec_fair@basic-deadline:
    - shard-skl:          NOTRUN -> [FAIL][6] ([i915#2846])
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-skl5/igt@gem_exec_fair@basic-deadline.html

  * igt@gem_exec_fair@basic-none-vip@rcs0:
    - shard-kbl:          [PASS][7] -> [FAIL][8] ([i915#2842]) +1 similar issue
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10018/shard-kbl1/igt@gem_exec_fair@basic-none-vip@rcs0.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-kbl1/igt@gem_exec_fair@basic-none-vip@rcs0.html

  * igt@gem_exec_fair@basic-none@vcs1:
    - shard-iclb:         NOTRUN -> [FAIL][9] ([i915#2842]) +4 similar issues
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-iclb1/igt@gem_exec_fair@basic-none@vcs1.html

  * igt@gem_exec_fair@basic-pace-share@rcs0:
    - shard-tglb:         [PASS][10] -> [FAIL][11] ([i915#2842])
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10018/shard-tglb1/igt@gem_exec_fair@basic-pace-share@rcs0.html
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-tglb5/igt@gem_exec_fair@basic-pace-share@rcs0.html

  * igt@gem_exec_fair@basic-pace-solo@rcs0:
    - shard-glk:          [PASS][12] -> [FAIL][13] ([i915#2842])
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10018/shard-glk6/igt@gem_exec_fair@basic-pace-solo@rcs0.html
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-glk2/igt@gem_exec_fair@basic-pace-solo@rcs0.html

  * igt@gem_exec_fair@basic-pace@vcs1:
    - shard-tglb:         NOTRUN -> [FAIL][14] ([i915#2842]) +2 similar issues
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-tglb8/igt@gem_exec_fair@basic-pace@vcs1.html

  * igt@gem_exec_fair@basic-pace@vecs0:
    - shard-glk:          NOTRUN -> [FAIL][15] ([i915#2842]) +2 similar issues
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-glk6/igt@gem_exec_fair@basic-pace@vecs0.html

  * igt@gem_exec_params@secure-non-root:
    - shard-iclb:         NOTRUN -> [SKIP][16] ([fdo#112283])
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-iclb5/igt@gem_exec_params@secure-non-root.html

  * igt@gem_exec_reloc@basic-wide-active@bcs0:
    - shard-apl:          NOTRUN -> [FAIL][17] ([i915#2389]) +3 similar issues
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-apl3/igt@gem_exec_reloc@basic-wide-active@bcs0.html
    - shard-skl:          NOTRUN -> [FAIL][18] ([i915#2389]) +3 similar issues
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-skl1/igt@gem_exec_reloc@basic-wide-active@bcs0.html

  * igt@gem_exec_reloc@basic-wide-active@rcs0:
    - shard-snb:          NOTRUN -> [FAIL][19] ([i915#2389]) +2 similar issues
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-snb6/igt@gem_exec_reloc@basic-wide-active@rcs0.html

  * igt@gem_exec_whisper@basic-queues-forked:
    - shard-glk:          [PASS][20] -> [DMESG-WARN][21] ([i915#118] / [i915#95]) +1 similar issue
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10018/shard-glk7/igt@gem_exec_whisper@basic-queues-forked.html
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-glk3/igt@gem_exec_whisper@basic-queues-forked.html

  * igt@gem_mmap_gtt@cpuset-big-copy:
    - shard-iclb:         [PASS][22] -> [FAIL][23] ([i915#307])
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10018/shard-iclb2/igt@gem_mmap_gtt@cpuset-big-copy.html
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-iclb3/igt@gem_mmap_gtt@cpuset-big-copy.html

  * igt@gem_mmap_gtt@cpuset-big-copy-xy:
    - shard-glk:          [PASS][24] -> [FAIL][25] ([i915#307]) +1 similar issue
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10018/shard-glk4/igt@gem_mmap_gtt@cpuset-big-copy-xy.html
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-glk1/igt@gem_mmap_gtt@cpuset-big-copy-xy.html

  * igt@gem_pread@exhaustion:
    - shard-apl:          NOTRUN -> [WARN][26] ([i915#2658])
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-apl6/igt@gem_pread@exhaustion.html

  * igt@gem_render_copy@y-tiled-mc-ccs-to-vebox-y-tiled:
    - shard-iclb:         NOTRUN -> [SKIP][27] ([i915#768]) +2 similar issues
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-iclb5/igt@gem_render_copy@y-tiled-mc-ccs-to-vebox-y-tiled.html

  * igt@gem_userptr_blits@dmabuf-sync:
    - shard-apl:          NOTRUN -> [SKIP][28] ([fdo#109271] / [i915#3323])
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-apl6/igt@gem_userptr_blits@dmabuf-sync.html

  * igt@gem_userptr_blits@dmabuf-unsync:
    - shard-tglb:         NOTRUN -> [SKIP][29] ([i915#3297]) +1 similar issue
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-tglb1/igt@gem_userptr_blits@dmabuf-unsync.html
    - shard-iclb:         NOTRUN -> [SKIP][30] ([i915#3297]) +2 similar issues
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-iclb2/igt@gem_userptr_blits@dmabuf-unsync.html

  * igt@gem_userptr_blits@set-cache-level:
    - shard-kbl:          NOTRUN -> [FAIL][31] ([i915#3324])
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-kbl2/igt@gem_userptr_blits@set-cache-level.html
    - shard-snb:          NOTRUN -> [FAIL][32] ([i915#3324])
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-snb7/igt@gem_userptr_blits@set-cache-level.html
    - shard-skl:          NOTRUN -> [FAIL][33] ([i915#3324])
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-skl10/igt@gem_userptr_blits@set-cache-level.html
    - shard-tglb:         NOTRUN -> [FAIL][34] ([i915#3324])
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-tglb2/igt@gem_userptr_blits@set-cache-level.html
    - shard-apl:          NOTRUN -> [FAIL][35] ([i915#3324])
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-apl8/igt@gem_userptr_blits@set-cache-level.html
    - shard-iclb:         NOTRUN -> [FAIL][36] ([i915#3324])
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-iclb5/igt@gem_userptr_blits@set-cache-level.html
    - shard-glk:          NOTRUN -> [FAIL][37] ([i915#3324])
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-glk9/igt@gem_userptr_blits@set-cache-level.html

  * igt@gem_userptr_blits@vma-merge:
    - shard-apl:          NOTRUN -> [FAIL][38] ([i915#3318])
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-apl1/igt@gem_userptr_blits@vma-merge.html

  * igt@gen9_exec_parse@bb-large:
    - shard-iclb:         NOTRUN -> [SKIP][39] ([i915#2527])
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-iclb3/igt@gen9_exec_parse@bb-large.html

  * igt@gen9_exec_parse@valid-registers:
    - shard-iclb:         NOTRUN -> [SKIP][40] ([fdo#112306]) +1 similar issue
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-iclb2/igt@gen9_exec_parse@valid-registers.html

  * igt@i915_pm_rpm@modeset-non-lpsp:
    - shard-iclb:         NOTRUN -> [SKIP][41] ([fdo#110892])
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-iclb4/igt@i915_pm_rpm@modeset-non-lpsp.html

  * igt@i915_pm_rpm@modeset-pc8-residency-stress:
    - shard-tglb:         NOTRUN -> [SKIP][42] ([fdo#109506] / [i915#2411])
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-tglb3/igt@i915_pm_rpm@modeset-pc8-residency-stress.html
    - shard-iclb:         NOTRUN -> [SKIP][43] ([fdo#109293] / [fdo#109506])
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-iclb1/igt@i915_pm_rpm@modeset-pc8-residency-stress.html

  * igt@kms_big_fb@linear-64bpp-rotate-90:
    - shard-tglb:         NOTRUN -> [SKIP][44] ([fdo#111614]) +1 similar issue
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-tglb3/igt@kms_big_fb@linear-64bpp-rotate-90.html

  * igt@kms_big_fb@x-tiled-32bpp-rotate-270:
    - shard-iclb:         NOTRUN -> [SKIP][45] ([fdo#110725] / [fdo#111614]) +1 similar issue
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-iclb1/igt@kms_big_fb@x-tiled-32bpp-rotate-270.html

  * igt@kms_big_joiner@invalid-modeset:
    - shard-skl:          NOTRUN -> [SKIP][46] ([fdo#109271] / [i915#2705]) +1 similar issue
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-skl8/igt@kms_big_joiner@invalid-modeset.html

  * igt@kms_ccs@pipe-c-bad-aux-stride:
    - shard-skl:          NOTRUN -> [SKIP][47] ([fdo#109271] / [fdo#111304]) +4 similar issues
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-skl9/igt@kms_ccs@pipe-c-bad-aux-stride.html

  * igt@kms_ccs@pipe-c-ccs-on-another-bo:
    - shard-glk:          NOTRUN -> [SKIP][48] ([fdo#109271]) +39 similar issues
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-glk6/igt@kms_ccs@pipe-c-ccs-on-another-bo.html

  * igt@kms_chamelium@hdmi-hpd-storm:
    - shard-kbl:          NOTRUN -> [SKIP][49] ([fdo#109271] / [fdo#111827]) +8 similar issues
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-kbl2/igt@kms_chamelium@hdmi-hpd-storm.html

  * igt@kms_chamelium@vga-hpd:
    - shard-apl:          NOTRUN -> [SKIP][50] ([fdo#109271] / [fdo#111827]) +22 similar issues
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-apl6/igt@kms_chamelium@vga-hpd.html

  * igt@kms_chamelium@vga-hpd-with-enabled-mode:
    - shard-iclb:         NOTRUN -> [SKIP][51] ([fdo#109284] / [fdo#111827]) +7 similar issues
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-iclb3/igt@kms_chamelium@vga-hpd-with-enabled-mode.html

  * igt@kms_chamelium@vga-hpd-without-ddc:
    - shard-snb:          NOTRUN -> [SKIP][52] ([fdo#109271] / [fdo#111827]) +24 similar issues
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-snb6/igt@kms_chamelium@vga-hpd-without-ddc.html
    - shard-tglb:         NOTRUN -> [SKIP][53] ([fdo#109284] / [fdo#111827]) +1 similar issue
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-tglb5/igt@kms_chamelium@vga-hpd-without-ddc.html
    - shard-glk:          NOTRUN -> [SKIP][54] ([fdo#109271] / [fdo#111827]) +1 similar issue
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-glk7/igt@kms_chamelium@vga-hpd-without-ddc.html

  * igt@kms_color@pipe-b-ctm-0-75:
    - shard-skl:          [PASS][55] -> [DMESG-WARN][56] ([i915#1982])
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10018/shard-skl2/igt@kms_color@pipe-b-ctm-0-75.html
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-skl8/igt@kms_color@pipe-b-ctm-0-75.html

  * igt@kms_color_chamelium@pipe-b-ctm-max:
    - shard-skl:          NOTRUN -> [SKIP][57] ([fdo#109271] / [fdo#111827]) +12 similar issues
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-skl7/igt@kms_color_chamelium@pipe-b-ctm-max.html

  * igt@kms_color_chamelium@pipe-d-ctm-green-to-red:
    - shard-iclb:         NOTRUN -> [SKIP][58] ([fdo#109278] / [fdo#109284] / [fdo#111827])
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-iclb8/igt@kms_color_chamelium@pipe-d-ctm-green-to-red.html

  * igt@kms_content_protection@atomic-dpms:
    - shard-apl:          NOTRUN -> [TIMEOUT][59] ([i915#1319]) +1 similar issue
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-apl8/igt@kms_content_protection@atomic-dpms.html

  * igt@kms_cursor_crc@pipe-a-cursor-512x512-offscreen:
    - shard-iclb:         NOTRUN -> [SKIP][60] ([fdo#109278] / [fdo#109279]) +1 similar issue
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-iclb7/igt@kms_cursor_crc@pipe-a-cursor-512x512-offscreen.html

  * igt@kms_cursor_crc@pipe-b-cursor-32x10-random:
    - shard-kbl:          NOTRUN -> [SKIP][61] ([fdo#109271]) +106 similar issues
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-kbl7/igt@kms_cursor_crc@pipe-b-cursor-32x10-random.html
    - shard-tglb:         NOTRUN -> [SKIP][62] ([i915#3359]) +1 similar issue
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-tglb3/igt@kms_cursor_crc@pipe-b-cursor-32x10-random.html

  * igt@kms_cursor_crc@pipe-d-cursor-32x32-rapid-movement:
    - shard-tglb:         NOTRUN -> [SKIP][63] ([i915#3319])
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-tglb6/igt@kms_cursor_crc@pipe-d-cursor-32x32-rapid-movement.html

  * igt@kms_cursor_crc@pipe-d-cursor-512x512-onscreen:
    - shard-tglb:         NOTRUN -> [SKIP][64] ([fdo#109279] / [i915#3359]) +1 similar issue
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-tglb5/igt@kms_cursor_crc@pipe-d-cursor-512x512-onscreen.html

  * igt@kms_cursor_edge_walk@pipe-d-256x256-left-edge:
    - shard-iclb:         NOTRUN -> [SKIP][65] ([fdo#109278]) +23 similar issues
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-iclb7/igt@kms_cursor_edge_walk@pipe-d-256x256-left-edge.html

  * igt@kms_cursor_legacy@2x-long-nonblocking-modeset-vs-cursor-atomic:
    - shard-iclb:         NOTRUN -> [SKIP][66] ([fdo#109274] / [fdo#109278]) +2 similar issues
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-iclb7/igt@kms_cursor_legacy@2x-long-nonblocking-modeset-vs-cursor-atomic.html

  * igt@kms_cursor_legacy@flip-vs-cursor-busy-crc-atomic:
    - shard-skl:          [PASS][67] -> [FAIL][68] ([i915#2346])
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10018/shard-skl2/igt@kms_cursor_legacy@flip-vs-cursor-busy-crc-atomic.html
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-skl8/igt@kms_cursor_legacy@flip-vs-cursor-busy-crc-atomic.html

  * igt@kms_flip@2x-flip-vs-expired-vblank-interruptible@ab-hdmi-a1-hdmi-a2:
    - shard-glk:          [PASS][69] -> [FAIL][70] ([i915#2122])
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10018/shard-glk3/igt@kms_flip@2x-flip-vs-expired-vblank-interruptible@ab-hdmi-a1-hdmi-a2.html
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-glk2/igt@kms_flip@2x-flip-vs-expired-vblank-interruptible@ab-hdmi-a1-hdmi-a2.html

  * igt@kms_flip@2x-flip-vs-fences:
    - shard-iclb:         NOTRUN -> [SKIP][71] ([fdo#109274]) +3 similar issues
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-iclb3/igt@kms_flip@2x-flip-vs-fences.html

  * igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-32bpp-ytilegen12rcccs:
    - shard-apl:          NOTRUN -> [SKIP][72] ([fdo#109271] / [i915#2672])
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-apl2/igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-32bpp-ytilegen12rcccs.html

  * igt@kms_flip_scaled_crc@flip-32bpp-ytileccs-to-64bpp-ytile:
    - shard-skl:          NOTRUN -> [SKIP][73] ([fdo#109271] / [i915#2642]) +1 similar issue
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-skl2/igt@kms_flip_scaled_crc@flip-32bpp-ytileccs-to-64bpp-ytile.html
    - shard-apl:          NOTRUN -> [SKIP][74] ([fdo#109271] / [i915#2642])
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-apl7/igt@kms_flip_scaled_crc@flip-32bpp-ytileccs-to-64bpp-ytile.html

  * igt@kms_frontbuffer_tracking@fbc-1p-shrfb-fliptrack-mmap-gtt:
    - shard-skl:          NOTRUN -> [SKIP][75] ([fdo#109271]) +234 similar issues
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-skl1/igt@kms_frontbuffer_tracking@fbc-1p-shrfb-fliptrack-mmap-gtt.html

  * igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-cur-indfb-onoff:
    - shard-tglb:         NOTRUN -> [SKIP][76] ([fdo#111825]) +12 similar issues
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-tglb2/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-cur-indfb-onoff.html

  * igt@kms_frontbuffer_tracking@psr-1p-offscren-pri-indfb-draw-mmap-gtt:
    - shard-skl:          [PASS][77] -> [SKIP][78] ([fdo#109271]) +7 similar issues
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10018/shard-skl1/igt@kms_frontbuffer_tracking@psr-1p-offscren-pri-indfb-draw-mmap-gtt.html
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-skl7/igt@kms_frontbuffer_tracking@psr-1p-offscren-pri-indfb-draw-mmap-gtt.html

  * igt@kms_frontbuffer_tracking@psr-1p-primscrn-cur-indfb-onoff:
    - shard-snb:          NOTRUN -> [SKIP][79] ([fdo#109271]) +378 similar issues
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-snb5/igt@kms_frontbuffer_tracking@psr-1p-primscrn-cur-indfb-onoff.html

  * igt@kms_frontbuffer_tracking@psr-2p-scndscrn-shrfb-plflip-blt:
    - shard-iclb:         NOTRUN -> [SKIP][80] ([fdo#109280]) +24 similar issues
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-iclb3/igt@kms_frontbuffer_tracking@psr-2p-scndscrn-shrfb-plflip-blt.html

  * igt@kms_hdr@bpc-switch-suspend:
    - shard-skl:          [PASS][81] -> [FAIL][82] ([i915#1188])
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10018/shard-skl1/igt@kms_hdr@bpc-switch-suspend.html
   [82]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-skl9/igt@kms_hdr@bpc-switch-suspend.html

  * igt@kms_pipe_b_c_ivb@from-pipe-c-to-b-with-3-lanes:
    - shard-iclb:         NOTRUN -> [SKIP][83] ([fdo#109289]) +3 similar issues
   [83]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-iclb1/igt@kms_pipe_b_c_ivb@from-pipe-c-to-b-with-3-lanes.html
    - shard-tglb:         NOTRUN -> [SKIP][84] ([fdo#109289]) +2 similar issues
   [84]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-tglb3/igt@kms_pipe_b_c_ivb@from-pipe-c-to-b-with-3-lanes.html

  * igt@kms_pipe_crc_basic@hang-read-crc-pipe-d:
    - shard-skl:          NOTRUN -> [SKIP][85] ([fdo#109271] / [i915#533]) +2 similar issues
   [85]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-skl2/igt@kms_pipe_crc_basic@hang-read-crc-pipe-d.html

  * igt@kms_plane_alpha_blend@pipe-a-alpha-basic:
    - shard-apl:          NOTRUN -> [FAIL][86] ([fdo#108145] / [i915#265]) +2 similar issues
   [86]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-apl8/igt@kms_plane_alpha_blend@pipe-a-alpha-basic.html

  * igt@kms_plane_alpha_blend@pipe-a-alpha-transparent-fb:
    - shard-skl:          NOTRUN -> [FAIL][87] ([i915#265]) +1 similar issue
   [87]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-skl1/igt@kms_plane_alpha_blend@pipe-a-alpha-transparent-fb.html
    - shard-apl:          NOTRUN -> [FAIL][88] ([i915#265])
   [88]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-apl3/igt@kms_plane_alpha_blend@pipe-a-alpha-transparent-fb.html
    - shard-glk:          NOTRUN -> [FAIL][89] ([i915#265])
   [89]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-glk3/igt@kms_plane_alpha_blend@pipe-a-alpha-transparent-fb.html
    - shard-kbl:          NOTRUN -> [FAIL][90] ([i915#265])
   [90]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-kbl6/igt@kms_plane_alpha_blend@pipe-a-alpha-transparent-fb.html

  * igt@kms_plane_alpha_blend@pipe-c-alpha-opaque-fb:
    - shard-kbl:          NOTRUN -> [FAIL][91] ([fdo#108145] / [i915#265]) +1 similar issue
   [91]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-kbl3/igt@kms_plane_alpha_blend@pipe-c-alpha-opaque-fb.html
    - shard-skl:          NOTRUN -> [FAIL][92] ([fdo#108145] / [i915#265]) +3 similar issues
   [92]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-skl6/igt@kms_plane_alpha_blend@pipe-c-alpha-opaque-fb.html
    - shard-glk:          NOTRUN -> [FAIL][93] ([fdo#108145] / [i915#265])
   [93]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-glk7/igt@kms_plane_alpha_blend@pipe-c-alpha-opaque-fb.html

  * igt@kms_plane_lowres@pipe-a-tiling-x:
    - shard-tglb:         [PASS][94] -> [FAIL][95] ([i915#899])
   [94]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10018/shard-tglb2/igt@kms_plane_lowres@pipe-a-tiling-x.html
   [95]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-tglb5/igt@kms_plane_lowres@pipe-a-tiling-x.html

  * igt@kms_plane_multiple@atomic-pipe-a-tiling-yf:
    - shard-tglb:         NOTRUN -> [SKIP][96] ([fdo#111615])
   [96]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-tglb6/igt@kms_plane_multiple@atomic-pipe-a-tiling-yf.html

  * igt@kms_plane_scaling@scaler-with-clipping-clamping@pipe-c-scaler-with-clipping-clamping:
    - shard-apl:          NOTRUN -> [SKIP][97] ([fdo#109271] / [i915#2733])
   [97]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-apl6/igt@kms_plane_scaling@scaler-with-clipping-clamping@pipe-c-scaler-with-clipping-clamping.html
    - shard-skl:          NOTRUN -> [SKIP][98] ([fdo#109271] / [i915#2733])
   [98]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-skl3/igt@kms_plane_scaling@scaler-with-clipping-clamping@pipe-c-scaler-with-clipping-clamping.html

  * igt@kms_psr2_sf@overlay-primary-update-sf-dmg-area-1:
    - shard-skl:          NOTRUN -> [SKIP][99] ([fdo#109271] / [i915#658]) +4 similar issues
   [99]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-skl8/igt@kms_psr2_sf@overlay-primary-update-sf-dmg-area-1.html

  * igt@kms_psr2_sf@overlay-primary-update-sf-dmg-area-4:
    - shard-apl:          NOTRUN -> [SKIP][100] ([fdo#109271] / [i915#658]) +5 similar issues
   [100]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-apl7/igt@kms_psr2_sf@overlay-primary-update-sf-dmg-area-4.html

  * igt@kms_psr2_sf@plane-move-sf-dmg-area-0:
    - shard-iclb:         NOTRUN -> [SKIP][101] ([i915#658]) +1 similar issue
   [101]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-iclb1/igt@kms_psr2_sf@plane-move-sf-dmg-area-0.html
    - shard-glk:          NOTRUN -> [SKIP][102] ([fdo#109271] / [i915#658])
   [102]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-glk1/igt@kms_psr2_sf@plane-move-sf-dmg-area-0.html
    - shard-kbl:          NOTRUN -> [SKIP][103] ([fdo#109271] / [i915#658]) +1 similar issue
   [103]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-kbl3/igt@kms_psr2_sf@plane-move-sf-dmg-area-0.html

  * igt@kms_psr@psr2_primary_mmap_cpu:
    - shard-iclb:         NOTRUN -> [SKIP][104] ([fdo#109441]) +2 similar issues
   [104]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-iclb5/igt@kms_psr@psr2_primary_mmap_cpu.html
    - shard-tglb:         NOTRUN -> [FAIL][105] ([i915#132]) +1 similar issue
   [105]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-tglb2/igt@kms_psr@psr2_primary_mmap_cpu.html

  * igt@kms_psr@psr2_sprite_plane_move:
    - shard-iclb:         [PASS][106] -> [SKIP][107] ([fdo#109441]) +2 similar issues
   [106]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10018/shard-iclb2/igt@kms_psr@psr2_sprite_plane_move.html
   [107]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-iclb5/igt@kms_psr@psr2_sprite_plane_move.html

  * igt@kms_setmode@invalid-clone-single-crtc:
    - shard-skl:          [PASS][108] -> [WARN][109] ([i915#2100])
   [108]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10018/shard-skl5/igt@kms_setmode@invalid-clone-single-crtc.html
   [109]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-skl7/igt@kms_setmode@invalid-clone-single-crtc.html

  * igt@kms_vblank@pipe-c-ts-continuation-suspend:
    - shard-kbl:          NOTRUN -> [DMESG-WARN][110] ([i915#180]) +1 similar issue
   [110]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-kbl1/igt@kms_vblank@pipe-c-ts-continuation-suspend.html

  * igt@kms_vblank@pipe-d-wait-idle:
    - shard-apl:          NOTRUN -> [SKIP][111] ([fdo#109271] / [i915#533]) +2 similar issues
   [111]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-apl1/igt@kms_vblank@pipe-d-wait-idle.html

  * igt@kms_writeback@writeback-pixel-formats:
    - shard-skl:          NOTRUN -> [SKIP][112] ([fdo#109271] / [i915#2437])
   [112]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-skl10/igt@kms_writeback@writeback-pixel-formats.html

  * igt@nouveau_crc@pipe-c-source-outp-inactive:
    - shard-iclb:         NOTRUN -> [SKIP][113] ([i915#2530]) +1 similar issue
   [113]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-iclb7/igt@nouveau_crc@pipe-c-source-outp-inactive.html

  * igt@perf@polling-small-buf:
    - shard-skl:          [PASS][114] -> [FAIL][115] ([i915#1722])
   [114]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10018/shard-skl8/igt@perf@polling-small-buf.html
   [115]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-skl8/igt@perf@polling-small-buf.html

  * igt@prime_nv_api@i915_nv_reimport_twice_check_flink_name:
    - shard-apl:          NOTRUN -> [SKIP][116] ([fdo#109271]) +237 similar issues
   [116]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-apl6/igt@prime_nv_api@i915_nv_reimport_twice_check_flink_name.html

  * igt@prime_nv_api@nv_i915_import_twice_check_flink_name:
    - shard-iclb:         NOTRUN -> [SKIP][117] ([fdo#109291]) +4 similar issues
   [117]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-iclb3/igt@prime_nv_api@nv_i915_import_twice_check_flink_name.html

  * igt@prime_nv_api@nv_self_import:
    - shard-tglb:         NOTRUN -> [SKIP][118] ([fdo#109291]) +1 similar issue
   [118]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-tglb5/igt@prime_nv_api@nv_self_import.html

  * igt@sysfs_clients@fair-1:
    - shard-glk:          NOTRUN -> [SKIP][119] ([fdo#109271] / [i915#2994])
   [119]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-glk7/igt@sysfs_clients@fair-1.html
    - shard-iclb:         NOTRUN -> [SKIP][120] ([i915#2994]) +1 similar issue
   [120]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-iclb2/igt@sysfs_clients@fair-1.html
    - shard-apl:          NOTRUN -> [SKIP][121] ([fdo#109271] / [i915#2994]) +4 similar issues
   [121]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-apl2/igt@sysfs_clients@fair-1.html
    - shard-tglb:         NOTRUN -> [SKIP][122] ([i915#2994])
   [122]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-tglb5/igt@sysfs_clients@fair-1.html

  * igt@sysfs_clients@fair-7:
    - shard-skl:          NOTRUN -> [SKIP][123] ([fdo#109271] / [i915#2994]) +3 similar issues
   [123]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-skl6/igt@sysfs_clients@fair-7.html

  * igt@sysfs_clients@split-25:
    - shard-kbl:          NOTRUN -> [SKIP][124] ([fdo#109271] / [i915#2994]) +1 similar issue
   [124]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-kbl7/igt@sysfs_clients@split-25.html

  
#### Possible fixes ####

  * igt@gem_ctx_persistence@many-contexts:
    - shard-tglb:         [FAIL][125] ([i915#2410]) -> [PASS][126]
   [125]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10018/shard-tglb1/igt@gem_ctx_persistence@many-contexts.html
   [126]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-tglb3/igt@gem_ctx_persistence@many-contexts.html

  * igt@gem_eio@unwedge-stress:
    - shard-tglb:         [TIMEOUT][127] ([i915#2369] / [i915#3063]) -> [PASS][128]
   [127]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10018/shard-tglb1/igt@gem_eio@unwedge-stress.html
   [128]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-tglb6/igt@gem_eio@unwedge-stress.html
    - shard-iclb:         [TIMEOUT][129] ([i915#2369] / [i915#2481] / [i915#3070]) -> [PASS][130]
   [129]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10018/shard-iclb1/igt@gem_eio@unwedge-stress.html
   [130]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-iclb2/igt@gem_eio@unwedge-stress.html

  * igt@gem_exec_fair@basic-deadline:
    - shard-glk:          [FAIL][131] ([i915#2846]) -> [PASS][132]
   [131]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10018/shard-glk9/igt@gem_exec_fair@basic-deadline.html
   [132]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/shard-glk4/igt@gem_exec_fair@basic-deadline.html

  * igt@gem_exec_fair@basic-flow@rcs0:
    - shard-tglb:         [FAIL][1

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20008/index.html


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy
  2021-04-27 21:49 ` [Intel-gfx] [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy Umesh Nerlige Ramappa
@ 2021-04-28  8:43     ` Jani Nikula
  0 siblings, 0 replies; 26+ messages in thread
From: Jani Nikula @ 2021-04-28  8:43 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa, intel-gfx; +Cc: Jason Ekstrand, dri-devel, Chris Wilson

On Tue, 27 Apr 2021, Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> wrote:
> Perf measurements rely on CPU and engine timestamps to correlate
> events of interest across these time domains. Current mechanisms get
> these timestamps separately and the calculated delta between these
> timestamps lacks sufficient accuracy.
>
> To improve the accuracy of these time measurements to within a few us,
> add a query that returns the engine and cpu timestamps captured as
> close to each other as possible.

Cc: dri-devel, Jason and Daniel for review.

>
> v2: (Tvrtko)
> - document clock reference used
> - return cpu timestamp always
> - capture cpu time just before lower dword of cs timestamp
>
> v3: (Chris)
> - use uncore-rpm
> - use __query_cs_timestamp helper
>
> v4: (Lionel)
> - Kernel perf subsystem allows users to specify the clock id to be used
>   in perf_event_open. This clock id is used by the perf subsystem to
>   return the appropriate cpu timestamp in perf events. Similarly, let
>   the user pass the clockid to this query so that cpu timestamp
>   corresponds to the clock id requested.
>
> v5: (Tvrtko)
> - Use normal ktime accessors instead of fast versions
> - Add more uApi documentation
>
> v6: (Lionel)
> - Move switch out of spinlock
>
> v7: (Chris)
> - cs_timestamp is a misnomer, use cs_cycles instead
> - return the cs cycle frequency as well in the query
>
> v8:
> - Add platform and engine specific checks
>
> v9: (Lionel)
> - Return 2 cpu timestamps in the query - captured before and after the
>   register read
>
> v10: (Chris)
> - Use local_clock() to measure time taken to read lower dword of
>   register and return it to user.
>
> v11: (Jani)
> - IS_GEN is deprecated. Use GRAPHICS_VER instead.
>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_query.c | 145 ++++++++++++++++++++++++++++++
>  include/uapi/drm/i915_drm.h       |  48 ++++++++++
>  2 files changed, 193 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
> index fed337ad7b68..2594b93901ac 100644
> --- a/drivers/gpu/drm/i915/i915_query.c
> +++ b/drivers/gpu/drm/i915/i915_query.c
> @@ -6,6 +6,8 @@
>  
>  #include <linux/nospec.h>
>  
> +#include "gt/intel_engine_pm.h"
> +#include "gt/intel_engine_user.h"
>  #include "i915_drv.h"
>  #include "i915_perf.h"
>  #include "i915_query.h"
> @@ -90,6 +92,148 @@ static int query_topology_info(struct drm_i915_private *dev_priv,
>  	return total_length;
>  }
>  
> +typedef u64 (*__ktime_func_t)(void);
> +static __ktime_func_t __clock_id_to_func(clockid_t clk_id)
> +{
> +	/*
> +	 * Use logic same as the perf subsystem to allow user to select the
> +	 * reference clock id to be used for timestamps.
> +	 */
> +	switch (clk_id) {
> +	case CLOCK_MONOTONIC:
> +		return &ktime_get_ns;
> +	case CLOCK_MONOTONIC_RAW:
> +		return &ktime_get_raw_ns;
> +	case CLOCK_REALTIME:
> +		return &ktime_get_real_ns;
> +	case CLOCK_BOOTTIME:
> +		return &ktime_get_boottime_ns;
> +	case CLOCK_TAI:
> +		return &ktime_get_clocktai_ns;
> +	default:
> +		return NULL;
> +	}
> +}
> +
> +static inline int
> +__read_timestamps(struct intel_uncore *uncore,
> +		  i915_reg_t lower_reg,
> +		  i915_reg_t upper_reg,
> +		  u64 *cs_ts,
> +		  u64 *cpu_ts,
> +		  __ktime_func_t cpu_clock)
> +{
> +	u32 upper, lower, old_upper, loop = 0;
> +
> +	upper = intel_uncore_read_fw(uncore, upper_reg);
> +	do {
> +		cpu_ts[1] = local_clock();
> +		cpu_ts[0] = cpu_clock();
> +		lower = intel_uncore_read_fw(uncore, lower_reg);
> +		cpu_ts[1] = local_clock() - cpu_ts[1];
> +		old_upper = upper;
> +		upper = intel_uncore_read_fw(uncore, upper_reg);
> +	} while (upper != old_upper && loop++ < 2);
> +
> +	*cs_ts = (u64)upper << 32 | lower;
> +
> +	return 0;
> +}
> +
> +static int
> +__query_cs_cycles(struct intel_engine_cs *engine,
> +		  u64 *cs_ts, u64 *cpu_ts,
> +		  __ktime_func_t cpu_clock)
> +{
> +	struct intel_uncore *uncore = engine->uncore;
> +	enum forcewake_domains fw_domains;
> +	u32 base = engine->mmio_base;
> +	intel_wakeref_t wakeref;
> +	int ret;
> +
> +	fw_domains = intel_uncore_forcewake_for_reg(uncore,
> +						    RING_TIMESTAMP(base),
> +						    FW_REG_READ);
> +
> +	with_intel_runtime_pm(uncore->rpm, wakeref) {
> +		spin_lock_irq(&uncore->lock);
> +		intel_uncore_forcewake_get__locked(uncore, fw_domains);
> +
> +		ret = __read_timestamps(uncore,
> +					RING_TIMESTAMP(base),
> +					RING_TIMESTAMP_UDW(base),
> +					cs_ts,
> +					cpu_ts,
> +					cpu_clock);
> +
> +		intel_uncore_forcewake_put__locked(uncore, fw_domains);
> +		spin_unlock_irq(&uncore->lock);
> +	}
> +
> +	return ret;
> +}
> +
> +static int
> +query_cs_cycles(struct drm_i915_private *i915,
> +		struct drm_i915_query_item *query_item)
> +{
> +	struct drm_i915_query_cs_cycles __user *query_ptr;
> +	struct drm_i915_query_cs_cycles query;
> +	struct intel_engine_cs *engine;
> +	__ktime_func_t cpu_clock;
> +	int ret;
> +
> +	if (GRAPHICS_VER(i915) < 6)
> +		return -ENODEV;
> +
> +	query_ptr = u64_to_user_ptr(query_item->data_ptr);
> +	ret = copy_query_item(&query, sizeof(query), sizeof(query), query_item);
> +	if (ret != 0)
> +		return ret;
> +
> +	if (query.flags)
> +		return -EINVAL;
> +
> +	if (query.rsvd)
> +		return -EINVAL;
> +
> +	cpu_clock = __clock_id_to_func(query.clockid);
> +	if (!cpu_clock)
> +		return -EINVAL;
> +
> +	engine = intel_engine_lookup_user(i915,
> +					  query.engine.engine_class,
> +					  query.engine.engine_instance);
> +	if (!engine)
> +		return -EINVAL;
> +
> +	if (GRAPHICS_VER(i915) == 6 &&
> +	    query.engine.engine_class != I915_ENGINE_CLASS_RENDER)
> +		return -ENODEV;
> +
> +	query.cs_frequency = engine->gt->clock_frequency;
> +	ret = __query_cs_cycles(engine,
> +				&query.cs_cycles,
> +				query.cpu_timestamp,
> +				cpu_clock);
> +	if (ret)
> +		return ret;
> +
> +	if (put_user(query.cs_frequency, &query_ptr->cs_frequency))
> +		return -EFAULT;
> +
> +	if (put_user(query.cpu_timestamp[0], &query_ptr->cpu_timestamp[0]))
> +		return -EFAULT;
> +
> +	if (put_user(query.cpu_timestamp[1], &query_ptr->cpu_timestamp[1]))
> +		return -EFAULT;
> +
> +	if (put_user(query.cs_cycles, &query_ptr->cs_cycles))
> +		return -EFAULT;
> +
> +	return sizeof(query);
> +}
> +
>  static int
>  query_engine_info(struct drm_i915_private *i915,
>  		  struct drm_i915_query_item *query_item)
> @@ -424,6 +568,7 @@ static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
>  	query_topology_info,
>  	query_engine_info,
>  	query_perf_config,
> +	query_cs_cycles,
>  };
>  
>  int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 6a34243a7646..08b00f1709b5 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -2230,6 +2230,10 @@ struct drm_i915_query_item {
>  #define DRM_I915_QUERY_TOPOLOGY_INFO    1
>  #define DRM_I915_QUERY_ENGINE_INFO	2
>  #define DRM_I915_QUERY_PERF_CONFIG      3
> +	/**
> +	 * Query Command Streamer timestamp register.
> +	 */
> +#define DRM_I915_QUERY_CS_CYCLES	4
>  /* Must be kept compact -- no holes and well documented */
>  
>  	/**
> @@ -2397,6 +2401,50 @@ struct drm_i915_engine_info {
>  	__u64 rsvd1[4];
>  };
>  
> +/**
> + * struct drm_i915_query_cs_cycles
> + *
> + * The query returns the command streamer cycles and the frequency that can be
> + * used to calculate the command streamer timestamp. In addition the query
> + * returns a set of cpu timestamps that indicate when the command streamer cycle
> + * count was captured.
> + */
> +struct drm_i915_query_cs_cycles {
> +	/** Engine for which command streamer cycles is queried. */
> +	struct i915_engine_class_instance engine;
> +
> +	/** Must be zero. */
> +	__u32 flags;
> +
> +	/**
> +	 * Command streamer cycles as read from the command streamer
> +	 * register at offset 0x358.
> +	 */
> +	__u64 cs_cycles;
> +
> +	/** Frequency of the cs cycles in Hz. */
> +	__u64 cs_frequency;
> +
> +	/**
> +	 * CPU timestamps in ns. cpu_timestamp[0] is captured before reading the
> +	 * cs_cycles register using the reference clockid set by the user.
> +	 * cpu_timestamp[1] is the time taken in ns to read the lower dword of
> +	 * the cs_cycles register.
> +	 */
> +	__u64 cpu_timestamp[2];
> +
> +	/**
> +	 * Reference clock id for CPU timestamp. For definition, see
> +	 * clock_gettime(2) and perf_event_open(2). Supported clock ids are
> +	 * CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW, CLOCK_REALTIME, CLOCK_BOOTTIME,
> +	 * CLOCK_TAI.
> +	 */
> +	__s32 clockid;
> +
> +	/** Must be zero. */
> +	__u32 rsvd;
> +};
> +
>  /**
>   * struct drm_i915_query_engine_info
>   *

-- 
Jani Nikula, Intel Open Source Graphics Center
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

> +	 * CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW, CLOCK_REALTIME, CLOCK_BOOTTIME,
> +	 * CLOCK_TAI.
> +	 */
> +	__s32 clockid;
> +
> +	/** Must be zero. */
> +	__u32 rsvd;
> +};
> +
>  /**
>   * struct drm_i915_query_engine_info
>   *

-- 
Jani Nikula, Intel Open Source Graphics Center
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy
  2021-04-28  8:43     ` [Intel-gfx] " Jani Nikula
@ 2021-04-28 19:24       ` Jason Ekstrand
  -1 siblings, 0 replies; 26+ messages in thread
From: Jason Ekstrand @ 2021-04-28 19:24 UTC (permalink / raw)
  To: Jani Nikula
  Cc: Intel GFX, Mailing list - DRI developers, Chris Wilson,
	Umesh Nerlige Ramappa

On Wed, Apr 28, 2021 at 3:43 AM Jani Nikula <jani.nikula@linux.intel.com> wrote:
>
> On Tue, 27 Apr 2021, Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> wrote:
> > Perf measurements rely on CPU and engine timestamps to correlate
> > events of interest across these time domains. Current mechanisms get
> > these timestamps separately and the calculated delta between these
> > timestamps lacks sufficient accuracy.
> >
> > To improve the accuracy of these time measurements to within a few us,
> > add a query that returns the engine and cpu timestamps captured as
> > close to each other as possible.
>
> Cc: dri-devel, Jason and Daniel for review.

Thanks!

> >
> > v2: (Tvrtko)
> > - document clock reference used
> > - return cpu timestamp always
> > - capture cpu time just before lower dword of cs timestamp
> >
> > v3: (Chris)
> > - use uncore-rpm
> > - use __query_cs_timestamp helper
> >
> > v4: (Lionel)
> > - Kernel perf subsystem allows users to specify the clock id to be used
> >   in perf_event_open. This clock id is used by the perf subsystem to
> >   return the appropriate cpu timestamp in perf events. Similarly, let
> >   the user pass the clockid to this query so that cpu timestamp
> >   corresponds to the clock id requested.
> >
> > v5: (Tvrtko)
> > - Use normal ktime accessors instead of fast versions
> > - Add more uApi documentation
> >
> > v6: (Lionel)
> > - Move switch out of spinlock
> >
> > v7: (Chris)
> > - cs_timestamp is a misnomer, use cs_cycles instead
> > - return the cs cycle frequency as well in the query
> >
> > v8:
> > - Add platform and engine specific checks
> >
> > v9: (Lionel)
> > - Return 2 cpu timestamps in the query - captured before and after the
> >   register read
> >
> > v10: (Chris)
> > - Use local_clock() to measure time taken to read lower dword of
> >   register and return it to user.
> >
> > v11: (Jani)
> > - IS_GEN deprecated. Use GRAPHICS_VER instead.
> >
> > Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_query.c | 145 ++++++++++++++++++++++++++++++
> >  include/uapi/drm/i915_drm.h       |  48 ++++++++++
> >  2 files changed, 193 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
> > index fed337ad7b68..2594b93901ac 100644
> > --- a/drivers/gpu/drm/i915/i915_query.c
> > +++ b/drivers/gpu/drm/i915/i915_query.c
> > @@ -6,6 +6,8 @@
> >
> >  #include <linux/nospec.h>
> >
> > +#include "gt/intel_engine_pm.h"
> > +#include "gt/intel_engine_user.h"
> >  #include "i915_drv.h"
> >  #include "i915_perf.h"
> >  #include "i915_query.h"
> > @@ -90,6 +92,148 @@ static int query_topology_info(struct drm_i915_private *dev_priv,
> >       return total_length;
> >  }
> >
> > +typedef u64 (*__ktime_func_t)(void);
> > +static __ktime_func_t __clock_id_to_func(clockid_t clk_id)
> > +{
> > +     /*
> > +      * Use logic same as the perf subsystem to allow user to select the
> > +      * reference clock id to be used for timestamps.
> > +      */
> > +     switch (clk_id) {
> > +     case CLOCK_MONOTONIC:
> > +             return &ktime_get_ns;
> > +     case CLOCK_MONOTONIC_RAW:
> > +             return &ktime_get_raw_ns;
> > +     case CLOCK_REALTIME:
> > +             return &ktime_get_real_ns;
> > +     case CLOCK_BOOTTIME:
> > +             return &ktime_get_boottime_ns;
> > +     case CLOCK_TAI:
> > +             return &ktime_get_clocktai_ns;
> > +     default:
> > +             return NULL;
> > +     }
> > +}
> > +
> > +static inline int
> > +__read_timestamps(struct intel_uncore *uncore,
> > +               i915_reg_t lower_reg,
> > +               i915_reg_t upper_reg,
> > +               u64 *cs_ts,
> > +               u64 *cpu_ts,
> > +               __ktime_func_t cpu_clock)
> > +{
> > +     u32 upper, lower, old_upper, loop = 0;
> > +
> > +     upper = intel_uncore_read_fw(uncore, upper_reg);
> > +     do {
> > +             cpu_ts[1] = local_clock();
> > +             cpu_ts[0] = cpu_clock();
> > +             lower = intel_uncore_read_fw(uncore, lower_reg);
> > +             cpu_ts[1] = local_clock() - cpu_ts[1];
> > +             old_upper = upper;
> > +             upper = intel_uncore_read_fw(uncore, upper_reg);
> > +     } while (upper != old_upper && loop++ < 2);
> > +
> > +     *cs_ts = (u64)upper << 32 | lower;
> > +
> > +     return 0;
> > +}
> > +
> > +static int
> > +__query_cs_cycles(struct intel_engine_cs *engine,
> > +               u64 *cs_ts, u64 *cpu_ts,
> > +               __ktime_func_t cpu_clock)
> > +{
> > +     struct intel_uncore *uncore = engine->uncore;
> > +     enum forcewake_domains fw_domains;
> > +     u32 base = engine->mmio_base;
> > +     intel_wakeref_t wakeref;
> > +     int ret;
> > +
> > +     fw_domains = intel_uncore_forcewake_for_reg(uncore,
> > +                                                 RING_TIMESTAMP(base),
> > +                                                 FW_REG_READ);
> > +
> > +     with_intel_runtime_pm(uncore->rpm, wakeref) {
> > +             spin_lock_irq(&uncore->lock);
> > +             intel_uncore_forcewake_get__locked(uncore, fw_domains);
> > +
> > +             ret = __read_timestamps(uncore,
> > +                                     RING_TIMESTAMP(base),
> > +                                     RING_TIMESTAMP_UDW(base),
> > +                                     cs_ts,
> > +                                     cpu_ts,
> > +                                     cpu_clock);
> > +
> > +             intel_uncore_forcewake_put__locked(uncore, fw_domains);
> > +             spin_unlock_irq(&uncore->lock);
> > +     }
> > +
> > +     return ret;
> > +}
> > +
> > +static int
> > +query_cs_cycles(struct drm_i915_private *i915,
> > +             struct drm_i915_query_item *query_item)
> > +{
> > +     struct drm_i915_query_cs_cycles __user *query_ptr;
> > +     struct drm_i915_query_cs_cycles query;
> > +     struct intel_engine_cs *engine;
> > +     __ktime_func_t cpu_clock;
> > +     int ret;
> > +
> > +     if (GRAPHICS_VER(i915) < 6)
> > +             return -ENODEV;
> > +
> > +     query_ptr = u64_to_user_ptr(query_item->data_ptr);
> > +     ret = copy_query_item(&query, sizeof(query), sizeof(query), query_item);
> > +     if (ret != 0)
> > +             return ret;
> > +
> > +     if (query.flags)
> > +             return -EINVAL;
> > +
> > +     if (query.rsvd)
> > +             return -EINVAL;
> > +
> > +     cpu_clock = __clock_id_to_func(query.clockid);
> > +     if (!cpu_clock)
> > +             return -EINVAL;
> > +
> > +     engine = intel_engine_lookup_user(i915,
> > +                                       query.engine.engine_class,
> > +                                       query.engine.engine_instance);
> > +     if (!engine)
> > +             return -EINVAL;
> > +
> > +     if (GRAPHICS_VER(i915) == 6 &&
> > +         query.engine.engine_class != I915_ENGINE_CLASS_RENDER)
> > +             return -ENODEV;
> > +
> > +     query.cs_frequency = engine->gt->clock_frequency;
> > +     ret = __query_cs_cycles(engine,
> > +                             &query.cs_cycles,
> > +                             query.cpu_timestamp,
> > +                             cpu_clock);
> > +     if (ret)
> > +             return ret;
> > +
> > +     if (put_user(query.cs_frequency, &query_ptr->cs_frequency))
> > +             return -EFAULT;
> > +
> > +     if (put_user(query.cpu_timestamp[0], &query_ptr->cpu_timestamp[0]))
> > +             return -EFAULT;
> > +
> > +     if (put_user(query.cpu_timestamp[1], &query_ptr->cpu_timestamp[1]))
> > +             return -EFAULT;
> > +
> > +     if (put_user(query.cs_cycles, &query_ptr->cs_cycles))
> > +             return -EFAULT;
> > +
> > +     return sizeof(query);
> > +}
> > +
> >  static int
> >  query_engine_info(struct drm_i915_private *i915,
> >                 struct drm_i915_query_item *query_item)
> > @@ -424,6 +568,7 @@ static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
> >       query_topology_info,
> >       query_engine_info,
> >       query_perf_config,
> > +     query_cs_cycles,
> >  };
> >
> >  int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> > index 6a34243a7646..08b00f1709b5 100644
> > --- a/include/uapi/drm/i915_drm.h
> > +++ b/include/uapi/drm/i915_drm.h
> > @@ -2230,6 +2230,10 @@ struct drm_i915_query_item {
> >  #define DRM_I915_QUERY_TOPOLOGY_INFO    1
> >  #define DRM_I915_QUERY_ENGINE_INFO   2
> >  #define DRM_I915_QUERY_PERF_CONFIG      3
> > +     /**
> > +      * Query Command Streamer timestamp register.
> > +      */
> > +#define DRM_I915_QUERY_CS_CYCLES     4
> >  /* Must be kept compact -- no holes and well documented */
> >
> >       /**
> > @@ -2397,6 +2401,50 @@ struct drm_i915_engine_info {
> >       __u64 rsvd1[4];
> >  };
> >
> > +/**
> > + * struct drm_i915_query_cs_cycles
> > + *
> > + * The query returns the command streamer cycles and the frequency that can be
> > + * used to calculate the command streamer timestamp. In addition the query
> > + * returns a set of cpu timestamps that indicate when the command streamer cycle
> > + * count was captured.
> > + */
> > +struct drm_i915_query_cs_cycles {
> > +     /** Engine for which command streamer cycles is queried. */
> > +     struct i915_engine_class_instance engine;

Why is this per-engine?  Do we actually expect it to change between
engines?  If so, we may have a problem because Vulkan expects a
unified timestamp domain for all command streamer timestamp queries.

--Jason


> > +     /** Must be zero. */
> > +     __u32 flags;
> > +
> > +     /**
> > +      * Command streamer cycles as read from the command streamer
> > +      * register at 0x358 offset.
> > +      */
> > +     __u64 cs_cycles;
> > +
> > +     /** Frequency of the cs cycles in Hz. */
> > +     __u64 cs_frequency;
> > +
> > +     /**
> > +      * CPU timestamps in ns. cpu_timestamp[0] is captured before reading the
> > +      * cs_cycles register using the reference clockid set by the user.
> > +      * cpu_timestamp[1] is the time taken in ns to read the lower dword of
> > +      * the cs_cycles register.
> > +      */
> > +     __u64 cpu_timestamp[2];
> > +
> > +     /**
> > +      * Reference clock id for CPU timestamp. For definition, see
> > +      * clock_gettime(2) and perf_event_open(2). Supported clock ids are
> > +      * CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW, CLOCK_REALTIME, CLOCK_BOOTTIME,
> > +      * CLOCK_TAI.
> > +      */
> > +     __s32 clockid;
> > +
> > +     /** Must be zero. */
> > +     __u32 rsvd;
> > +};
> > +
> >  /**
> >   * struct drm_i915_query_engine_info
> >   *
>
> --
> Jani Nikula, Intel Open Source Graphics Center
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy
  2021-04-28 19:24       ` [Intel-gfx] " Jason Ekstrand
@ 2021-04-28 19:49         ` Lionel Landwerlin
  -1 siblings, 0 replies; 26+ messages in thread
From: Lionel Landwerlin @ 2021-04-28 19:49 UTC (permalink / raw)
  To: Jason Ekstrand, Jani Nikula
  Cc: Intel GFX, Umesh Nerlige Ramappa, Mailing list - DRI developers,
	Chris Wilson


On 28/04/2021 22:24, Jason Ekstrand wrote:
> On Wed, Apr 28, 2021 at 3:43 AM Jani Nikula <jani.nikula@linux.intel.com> wrote:
>> On Tue, 27 Apr 2021, Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> wrote:
>>> Perf measurements rely on CPU and engine timestamps to correlate
>>> events of interest across these time domains. Current mechanisms get
>>> these timestamps separately and the calculated delta between these
>>> timestamps lacks sufficient accuracy.
>>>
>>> To improve the accuracy of these time measurements to within a few us,
>>> add a query that returns the engine and cpu timestamps captured as
>>> close to each other as possible.
>> Cc: dri-devel, Jason and Daniel for review.
> Thanks!
>
>>> v2: (Tvrtko)
>>> - document clock reference used
>>> - return cpu timestamp always
>>> - capture cpu time just before lower dword of cs timestamp
>>>
>>> v3: (Chris)
>>> - use uncore-rpm
>>> - use __query_cs_timestamp helper
>>>
>>> v4: (Lionel)
>>> - Kernel perf subsystem allows users to specify the clock id to be used
>>>    in perf_event_open. This clock id is used by the perf subsystem to
>>>    return the appropriate cpu timestamp in perf events. Similarly, let
>>>    the user pass the clockid to this query so that cpu timestamp
>>>    corresponds to the clock id requested.
>>>
>>> v5: (Tvrtko)
>>> - Use normal ktime accessors instead of fast versions
>>> - Add more uApi documentation
>>>
>>> v6: (Lionel)
>>> - Move switch out of spinlock
>>>
>>> v7: (Chris)
>>> - cs_timestamp is a misnomer, use cs_cycles instead
>>> - return the cs cycle frequency as well in the query
>>>
>>> v8:
>>> - Add platform and engine specific checks
>>>
>>> v9: (Lionel)
>>> - Return 2 cpu timestamps in the query - captured before and after the
>>>    register read
>>>
>>> v10: (Chris)
>>> - Use local_clock() to measure time taken to read lower dword of
>>>    register and return it to user.
>>>
>>> v11: (Jani)
>>> - IS_GEN deprecated. Use GRAPHICS_VER instead.
>>>
>>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_query.c | 145 ++++++++++++++++++++++++++++++
>>>   include/uapi/drm/i915_drm.h       |  48 ++++++++++
>>>   2 files changed, 193 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
>>> index fed337ad7b68..2594b93901ac 100644
>>> --- a/drivers/gpu/drm/i915/i915_query.c
>>> +++ b/drivers/gpu/drm/i915/i915_query.c
>>> @@ -6,6 +6,8 @@
>>>
>>>   #include <linux/nospec.h>
>>>
>>> +#include "gt/intel_engine_pm.h"
>>> +#include "gt/intel_engine_user.h"
>>>   #include "i915_drv.h"
>>>   #include "i915_perf.h"
>>>   #include "i915_query.h"
>>> @@ -90,6 +92,148 @@ static int query_topology_info(struct drm_i915_private *dev_priv,
>>>        return total_length;
>>>   }
>>>
>>> +typedef u64 (*__ktime_func_t)(void);
>>> +static __ktime_func_t __clock_id_to_func(clockid_t clk_id)
>>> +{
>>> +     /*
>>> +      * Use logic same as the perf subsystem to allow user to select the
>>> +      * reference clock id to be used for timestamps.
>>> +      */
>>> +     switch (clk_id) {
>>> +     case CLOCK_MONOTONIC:
>>> +             return &ktime_get_ns;
>>> +     case CLOCK_MONOTONIC_RAW:
>>> +             return &ktime_get_raw_ns;
>>> +     case CLOCK_REALTIME:
>>> +             return &ktime_get_real_ns;
>>> +     case CLOCK_BOOTTIME:
>>> +             return &ktime_get_boottime_ns;
>>> +     case CLOCK_TAI:
>>> +             return &ktime_get_clocktai_ns;
>>> +     default:
>>> +             return NULL;
>>> +     }
>>> +}
>>> +
>>> +static inline int
>>> +__read_timestamps(struct intel_uncore *uncore,
>>> +               i915_reg_t lower_reg,
>>> +               i915_reg_t upper_reg,
>>> +               u64 *cs_ts,
>>> +               u64 *cpu_ts,
>>> +               __ktime_func_t cpu_clock)
>>> +{
>>> +     u32 upper, lower, old_upper, loop = 0;
>>> +
>>> +     upper = intel_uncore_read_fw(uncore, upper_reg);
>>> +     do {
>>> +             cpu_ts[1] = local_clock();
>>> +             cpu_ts[0] = cpu_clock();
>>> +             lower = intel_uncore_read_fw(uncore, lower_reg);
>>> +             cpu_ts[1] = local_clock() - cpu_ts[1];
>>> +             old_upper = upper;
>>> +             upper = intel_uncore_read_fw(uncore, upper_reg);
>>> +     } while (upper != old_upper && loop++ < 2);
>>> +
>>> +     *cs_ts = (u64)upper << 32 | lower;
>>> +
>>> +     return 0;
>>> +}
>>> +
>>> +static int
>>> +__query_cs_cycles(struct intel_engine_cs *engine,
>>> +               u64 *cs_ts, u64 *cpu_ts,
>>> +               __ktime_func_t cpu_clock)
>>> +{
>>> +     struct intel_uncore *uncore = engine->uncore;
>>> +     enum forcewake_domains fw_domains;
>>> +     u32 base = engine->mmio_base;
>>> +     intel_wakeref_t wakeref;
>>> +     int ret;
>>> +
>>> +     fw_domains = intel_uncore_forcewake_for_reg(uncore,
>>> +                                                 RING_TIMESTAMP(base),
>>> +                                                 FW_REG_READ);
>>> +
>>> +     with_intel_runtime_pm(uncore->rpm, wakeref) {
>>> +             spin_lock_irq(&uncore->lock);
>>> +             intel_uncore_forcewake_get__locked(uncore, fw_domains);
>>> +
>>> +             ret = __read_timestamps(uncore,
>>> +                                     RING_TIMESTAMP(base),
>>> +                                     RING_TIMESTAMP_UDW(base),
>>> +                                     cs_ts,
>>> +                                     cpu_ts,
>>> +                                     cpu_clock);
>>> +
>>> +             intel_uncore_forcewake_put__locked(uncore, fw_domains);
>>> +             spin_unlock_irq(&uncore->lock);
>>> +     }
>>> +
>>> +     return ret;
>>> +}
>>> +
>>> +static int
>>> +query_cs_cycles(struct drm_i915_private *i915,
>>> +             struct drm_i915_query_item *query_item)
>>> +{
>>> +     struct drm_i915_query_cs_cycles __user *query_ptr;
>>> +     struct drm_i915_query_cs_cycles query;
>>> +     struct intel_engine_cs *engine;
>>> +     __ktime_func_t cpu_clock;
>>> +     int ret;
>>> +
>>> +     if (GRAPHICS_VER(i915) < 6)
>>> +             return -ENODEV;
>>> +
>>> +     query_ptr = u64_to_user_ptr(query_item->data_ptr);
>>> +     ret = copy_query_item(&query, sizeof(query), sizeof(query), query_item);
>>> +     if (ret != 0)
>>> +             return ret;
>>> +
>>> +     if (query.flags)
>>> +             return -EINVAL;
>>> +
>>> +     if (query.rsvd)
>>> +             return -EINVAL;
>>> +
>>> +     cpu_clock = __clock_id_to_func(query.clockid);
>>> +     if (!cpu_clock)
>>> +             return -EINVAL;
>>> +
>>> +     engine = intel_engine_lookup_user(i915,
>>> +                                       query.engine.engine_class,
>>> +                                       query.engine.engine_instance);
>>> +     if (!engine)
>>> +             return -EINVAL;
>>> +
>>> +     if (GRAPHICS_VER(i915) == 6 &&
>>> +         query.engine.engine_class != I915_ENGINE_CLASS_RENDER)
>>> +             return -ENODEV;
>>> +
>>> +     query.cs_frequency = engine->gt->clock_frequency;
>>> +     ret = __query_cs_cycles(engine,
>>> +                             &query.cs_cycles,
>>> +                             query.cpu_timestamp,
>>> +                             cpu_clock);
>>> +     if (ret)
>>> +             return ret;
>>> +
>>> +     if (put_user(query.cs_frequency, &query_ptr->cs_frequency))
>>> +             return -EFAULT;
>>> +
>>> +     if (put_user(query.cpu_timestamp[0], &query_ptr->cpu_timestamp[0]))
>>> +             return -EFAULT;
>>> +
>>> +     if (put_user(query.cpu_timestamp[1], &query_ptr->cpu_timestamp[1]))
>>> +             return -EFAULT;
>>> +
>>> +     if (put_user(query.cs_cycles, &query_ptr->cs_cycles))
>>> +             return -EFAULT;
>>> +
>>> +     return sizeof(query);
>>> +}
>>> +
>>>   static int
>>>   query_engine_info(struct drm_i915_private *i915,
>>>                  struct drm_i915_query_item *query_item)
>>> @@ -424,6 +568,7 @@ static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
>>>        query_topology_info,
>>>        query_engine_info,
>>>        query_perf_config,
>>> +     query_cs_cycles,
>>>   };
>>>
>>>   int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>>> index 6a34243a7646..08b00f1709b5 100644
>>> --- a/include/uapi/drm/i915_drm.h
>>> +++ b/include/uapi/drm/i915_drm.h
>>> @@ -2230,6 +2230,10 @@ struct drm_i915_query_item {
>>>   #define DRM_I915_QUERY_TOPOLOGY_INFO    1
>>>   #define DRM_I915_QUERY_ENGINE_INFO   2
>>>   #define DRM_I915_QUERY_PERF_CONFIG      3
>>> +     /**
>>> +      * Query Command Streamer timestamp register.
>>> +      */
>>> +#define DRM_I915_QUERY_CS_CYCLES     4
>>>   /* Must be kept compact -- no holes and well documented */
>>>
>>>        /**
>>> @@ -2397,6 +2401,50 @@ struct drm_i915_engine_info {
>>>        __u64 rsvd1[4];
>>>   };
>>>
>>> +/**
>>> + * struct drm_i915_query_cs_cycles
>>> + *
>>> + * The query returns the command streamer cycles and the frequency that can be
>>> + * used to calculate the command streamer timestamp. In addition the query
>>> + * returns a set of cpu timestamps that indicate when the command streamer cycle
>>> + * count was captured.
>>> + */
>>> +struct drm_i915_query_cs_cycles {
>>> +     /** Engine for which command streamer cycles are queried. */
>>> +     struct i915_engine_class_instance engine;
> Why is this per-engine?  Do we actually expect it to change between
> engines?


Each engine has its own timestamp register.


>    If so, we may have a problem because Vulkan expects a
> unified timestamp domain for all command streamer timestamp queries.


I don't think it does: "Timestamps *may* only be meaningfully compared if they are written by commands submitted to the same queue." [1]


[1]: https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/vkCmdWriteTimestamp.html


-Lionel


>
> --Jason
>
>
>>> +     /** Must be zero. */
>>> +     __u32 flags;
>>> +
>>> +     /**
>>> +      * Command streamer cycles as read from the command streamer
>>> +      * register at 0x358 offset.
>>> +      */
>>> +     __u64 cs_cycles;
>>> +
>>> +     /** Frequency of the cs cycles in Hz. */
>>> +     __u64 cs_frequency;
>>> +
>>> +     /**
>>> +      * CPU timestamps in ns. cpu_timestamp[0] is captured before reading the
>>> +      * cs_cycles register using the reference clockid set by the user.
>>> +      * cpu_timestamp[1] is the time taken in ns to read the lower dword of
>>> +      * the cs_cycles register.
>>> +      */
>>> +     __u64 cpu_timestamp[2];
>>> +
>>> +     /**
>>> +      * Reference clock id for CPU timestamp. For definition, see
>>> +      * clock_gettime(2) and perf_event_open(2). Supported clock ids are
>>> +      * CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW, CLOCK_REALTIME, CLOCK_BOOTTIME,
>>> +      * CLOCK_TAI.
>>> +      */
>>> +     __s32 clockid;
>>> +
>>> +     /** Must be zero. */
>>> +     __u32 rsvd;
>>> +};
>>> +
>>>   /**
>>>    * struct drm_i915_query_engine_info
>>>    *
>> --
>> Jani Nikula, Intel Open Source Graphics Center




_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Intel-gfx] [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy
@ 2021-04-28 19:49         ` Lionel Landwerlin
  0 siblings, 0 replies; 26+ messages in thread
From: Lionel Landwerlin @ 2021-04-28 19:49 UTC (permalink / raw)
  To: Jason Ekstrand, Jani Nikula
  Cc: Intel GFX, Maling list - DRI developers, Chris Wilson



On 28/04/2021 22:24, Jason Ekstrand wrote:
> On Wed, Apr 28, 2021 at 3:43 AM Jani Nikula <jani.nikula@linux.intel.com> wrote:
>> On Tue, 27 Apr 2021, Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> wrote:
>>> Perf measurements rely on CPU and engine timestamps to correlate
>>> events of interest across these time domains. Current mechanisms get
>>> these timestamps separately and the calculated delta between these
>>> timestamps lacks enough accuracy.
>>>
>>> To improve the accuracy of these time measurements to within a few us,
>>> add a query that returns the engine and cpu timestamps captured as
>>> close to each other as possible.
>> Cc: dri-devel, Jason and Daniel for review.
> Thanks!
>
>>> v2: (Tvrtko)
>>> - document clock reference used
>>> - return cpu timestamp always
>>> - capture cpu time just before lower dword of cs timestamp
>>>
>>> v3: (Chris)
>>> - use uncore-rpm
>>> - use __query_cs_timestamp helper
>>>
>>> v4: (Lionel)
>>> - Kernel perf subsystem allows users to specify the clock id to be used
>>>    in perf_event_open. This clock id is used by the perf subsystem to
>>>    return the appropriate cpu timestamp in perf events. Similarly, let
>>>    the user pass the clockid to this query so that cpu timestamp
>>>    corresponds to the clock id requested.
>>>
>>> v5: (Tvrtko)
>>> - Use normal ktime accessors instead of fast versions
>>> - Add more uApi documentation
>>>
>>> v6: (Lionel)
>>> - Move switch out of spinlock
>>>
>>> v7: (Chris)
>>> - cs_timestamp is a misnomer, use cs_cycles instead
>>> - return the cs cycle frequency as well in the query
>>>
>>> v8:
>>> - Add platform and engine specific checks
>>>
>>> v9: (Lionel)
>>> - Return 2 cpu timestamps in the query - captured before and after the
>>>    register read
>>>
>>> v10: (Chris)
>>> - Use local_clock() to measure time taken to read lower dword of
>>>    register and return it to user.
>>>
>>> v11: (Jani)
>>> - IS_GEN deprecated. Use GRAPHICS_VER instead.
>>>
>>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_query.c | 145 ++++++++++++++++++++++++++++++
>>>   include/uapi/drm/i915_drm.h       |  48 ++++++++++
>>>   2 files changed, 193 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
>>> index fed337ad7b68..2594b93901ac 100644
>>> --- a/drivers/gpu/drm/i915/i915_query.c
>>> +++ b/drivers/gpu/drm/i915/i915_query.c
>>> @@ -6,6 +6,8 @@
>>>
>>>   #include <linux/nospec.h>
>>>
>>> +#include "gt/intel_engine_pm.h"
>>> +#include "gt/intel_engine_user.h"
>>>   #include "i915_drv.h"
>>>   #include "i915_perf.h"
>>>   #include "i915_query.h"
>>> @@ -90,6 +92,148 @@ static int query_topology_info(struct drm_i915_private *dev_priv,
>>>        return total_length;
>>>   }
>>>
>>> +typedef u64 (*__ktime_func_t)(void);
>>> +static __ktime_func_t __clock_id_to_func(clockid_t clk_id)
>>> +{
>>> +     /*
>>> +      * Use the same logic as the perf subsystem to allow the user to select the
>>> +      * reference clock id to be used for timestamps.
>>> +      */
>>> +     switch (clk_id) {
>>> +     case CLOCK_MONOTONIC:
>>> +             return &ktime_get_ns;
>>> +     case CLOCK_MONOTONIC_RAW:
>>> +             return &ktime_get_raw_ns;
>>> +     case CLOCK_REALTIME:
>>> +             return &ktime_get_real_ns;
>>> +     case CLOCK_BOOTTIME:
>>> +             return &ktime_get_boottime_ns;
>>> +     case CLOCK_TAI:
>>> +             return &ktime_get_clocktai_ns;
>>> +     default:
>>> +             return NULL;
>>> +     }
>>> +}
>>> +
>>> +static inline int
>>> +__read_timestamps(struct intel_uncore *uncore,
>>> +               i915_reg_t lower_reg,
>>> +               i915_reg_t upper_reg,
>>> +               u64 *cs_ts,
>>> +               u64 *cpu_ts,
>>> +               __ktime_func_t cpu_clock)
>>> +{
>>> +     u32 upper, lower, old_upper, loop = 0;
>>> +
>>> +     upper = intel_uncore_read_fw(uncore, upper_reg);
>>> +     do {
>>> +             cpu_ts[1] = local_clock();
>>> +             cpu_ts[0] = cpu_clock();
>>> +             lower = intel_uncore_read_fw(uncore, lower_reg);
>>> +             cpu_ts[1] = local_clock() - cpu_ts[1];
>>> +             old_upper = upper;
>>> +             upper = intel_uncore_read_fw(uncore, upper_reg);
>>> +     } while (upper != old_upper && loop++ < 2);
>>> +
>>> +     *cs_ts = (u64)upper << 32 | lower;
>>> +
>>> +     return 0;
>>> +}
>>> +
>>> +static int
>>> +__query_cs_cycles(struct intel_engine_cs *engine,
>>> +               u64 *cs_ts, u64 *cpu_ts,
>>> +               __ktime_func_t cpu_clock)
>>> +{
>>> +     struct intel_uncore *uncore = engine->uncore;
>>> +     enum forcewake_domains fw_domains;
>>> +     u32 base = engine->mmio_base;
>>> +     intel_wakeref_t wakeref;
>>> +     int ret;
>>> +
>>> +     fw_domains = intel_uncore_forcewake_for_reg(uncore,
>>> +                                                 RING_TIMESTAMP(base),
>>> +                                                 FW_REG_READ);
>>> +
>>> +     with_intel_runtime_pm(uncore->rpm, wakeref) {
>>> +             spin_lock_irq(&uncore->lock);
>>> +             intel_uncore_forcewake_get__locked(uncore, fw_domains);
>>> +
>>> +             ret = __read_timestamps(uncore,
>>> +                                     RING_TIMESTAMP(base),
>>> +                                     RING_TIMESTAMP_UDW(base),
>>> +                                     cs_ts,
>>> +                                     cpu_ts,
>>> +                                     cpu_clock);
>>> +
>>> +             intel_uncore_forcewake_put__locked(uncore, fw_domains);
>>> +             spin_unlock_irq(&uncore->lock);
>>> +     }
>>> +
>>> +     return ret;
>>> +}
>>> +
>>> +static int
>>> +query_cs_cycles(struct drm_i915_private *i915,
>>> +             struct drm_i915_query_item *query_item)
>>> +{
>>> +     struct drm_i915_query_cs_cycles __user *query_ptr;
>>> +     struct drm_i915_query_cs_cycles query;
>>> +     struct intel_engine_cs *engine;
>>> +     __ktime_func_t cpu_clock;
>>> +     int ret;
>>> +
>>> +     if (GRAPHICS_VER(i915) < 6)
>>> +             return -ENODEV;
>>> +
>>> +     query_ptr = u64_to_user_ptr(query_item->data_ptr);
>>> +     ret = copy_query_item(&query, sizeof(query), sizeof(query), query_item);
>>> +     if (ret != 0)
>>> +             return ret;
>>> +
>>> +     if (query.flags)
>>> +             return -EINVAL;
>>> +
>>> +     if (query.rsvd)
>>> +             return -EINVAL;
>>> +
>>> +     cpu_clock = __clock_id_to_func(query.clockid);
>>> +     if (!cpu_clock)
>>> +             return -EINVAL;
>>> +
>>> +     engine = intel_engine_lookup_user(i915,
>>> +                                       query.engine.engine_class,
>>> +                                       query.engine.engine_instance);
>>> +     if (!engine)
>>> +             return -EINVAL;
>>> +
>>> +     if (GRAPHICS_VER(i915) == 6 &&
>>> +         query.engine.engine_class != I915_ENGINE_CLASS_RENDER)
>>> +             return -ENODEV;
>>> +
>>> +     query.cs_frequency = engine->gt->clock_frequency;
>>> +     ret = __query_cs_cycles(engine,
>>> +                             &query.cs_cycles,
>>> +                             query.cpu_timestamp,
>>> +                             cpu_clock);
>>> +     if (ret)
>>> +             return ret;
>>> +
>>> +     if (put_user(query.cs_frequency, &query_ptr->cs_frequency))
>>> +             return -EFAULT;
>>> +
>>> +     if (put_user(query.cpu_timestamp[0], &query_ptr->cpu_timestamp[0]))
>>> +             return -EFAULT;
>>> +
>>> +     if (put_user(query.cpu_timestamp[1], &query_ptr->cpu_timestamp[1]))
>>> +             return -EFAULT;
>>> +
>>> +     if (put_user(query.cs_cycles, &query_ptr->cs_cycles))
>>> +             return -EFAULT;
>>> +
>>> +     return sizeof(query);
>>> +}
>>> +
>>>   static int
>>>   query_engine_info(struct drm_i915_private *i915,
>>>                  struct drm_i915_query_item *query_item)
>>> @@ -424,6 +568,7 @@ static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
>>>        query_topology_info,
>>>        query_engine_info,
>>>        query_perf_config,
>>> +     query_cs_cycles,
>>>   };
>>>
>>>   int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>>> index 6a34243a7646..08b00f1709b5 100644
>>> --- a/include/uapi/drm/i915_drm.h
>>> +++ b/include/uapi/drm/i915_drm.h
>>> @@ -2230,6 +2230,10 @@ struct drm_i915_query_item {
>>>   #define DRM_I915_QUERY_TOPOLOGY_INFO    1
>>>   #define DRM_I915_QUERY_ENGINE_INFO   2
>>>   #define DRM_I915_QUERY_PERF_CONFIG      3
>>> +     /**
>>> +      * Query Command Streamer timestamp register.
>>> +      */
>>> +#define DRM_I915_QUERY_CS_CYCLES     4
>>>   /* Must be kept compact -- no holes and well documented */
>>>
>>>        /**
>>> @@ -2397,6 +2401,50 @@ struct drm_i915_engine_info {
>>>        __u64 rsvd1[4];
>>>   };
>>>
>>> +/**
>>> + * struct drm_i915_query_cs_cycles
>>> + *
>>> + * The query returns the command streamer cycles and the frequency that can be
>>> + * used to calculate the command streamer timestamp. In addition the query
>>> + * returns a set of cpu timestamps that indicate when the command streamer cycle
>>> + * count was captured.
>>> + */
>>> +struct drm_i915_query_cs_cycles {
>>> +     /** Engine for which command streamer cycles are queried. */
>>> +     struct i915_engine_class_instance engine;
> Why is this per-engine?  Do we actually expect it to change between
> engines?


Each engine has its own timestamp register.


>    If so, we may have a problem because Vulkan expects a
> unified timestamp domain for all command streamer timestamp queries.


I don't think it does: "Timestamps *may* only be meaningfully compared if they are written by commands submitted to the same queue." [1]


[1]: https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/vkCmdWriteTimestamp.html


-Lionel


>
> --Jason
>
>
>>> +     /** Must be zero. */
>>> +     __u32 flags;
>>> +
>>> +     /**
>>> +      * Command streamer cycles as read from the command streamer
>>> +      * register at 0x358 offset.
>>> +      */
>>> +     __u64 cs_cycles;
>>> +
>>> +     /** Frequency of the cs cycles in Hz. */
>>> +     __u64 cs_frequency;
>>> +
>>> +     /**
>>> +      * CPU timestamps in ns. cpu_timestamp[0] is captured before reading the
>>> +      * cs_cycles register using the reference clockid set by the user.
>>> +      * cpu_timestamp[1] is the time taken in ns to read the lower dword of
>>> +      * the cs_cycles register.
>>> +      */
>>> +     __u64 cpu_timestamp[2];
>>> +
>>> +     /**
>>> +      * Reference clock id for CPU timestamp. For definition, see
>>> +      * clock_gettime(2) and perf_event_open(2). Supported clock ids are
>>> +      * CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW, CLOCK_REALTIME, CLOCK_BOOTTIME,
>>> +      * CLOCK_TAI.
>>> +      */
>>> +     __s32 clockid;
>>> +
>>> +     /** Must be zero. */
>>> +     __u32 rsvd;
>>> +};
>>> +
>>>   /**
>>>    * struct drm_i915_query_engine_info
>>>    *
>> --
>> Jani Nikula, Intel Open Source Graphics Center





^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy
  2021-04-28 19:49         ` [Intel-gfx] " Lionel Landwerlin
@ 2021-04-28 19:54           ` Jason Ekstrand
  -1 siblings, 0 replies; 26+ messages in thread
From: Jason Ekstrand @ 2021-04-28 19:54 UTC (permalink / raw)
  To: Lionel Landwerlin
  Cc: Intel GFX, Maling list - DRI developers, Chris Wilson,
	Umesh Nerlige Ramappa

On Wed, Apr 28, 2021 at 2:50 PM Lionel Landwerlin
<lionel.g.landwerlin@intel.com> wrote:
>
> On 28/04/2021 22:24, Jason Ekstrand wrote:
>
> On Wed, Apr 28, 2021 at 3:43 AM Jani Nikula <jani.nikula@linux.intel.com> wrote:
>
> On Tue, 27 Apr 2021, Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> wrote:
>
> Perf measurements rely on CPU and engine timestamps to correlate
> events of interest across these time domains. Current mechanisms get
> these timestamps separately and the calculated delta between these
> timestamps lack enough accuracy.
>
> To improve the accuracy of these time measurements to within a few us,
> add a query that returns the engine and cpu timestamps captured as
> close to each other as possible.
>
> Cc: dri-devel, Jason and Daniel for review.
>
> Thanks!
>
> v2: (Tvrtko)
> - document clock reference used
> - return cpu timestamp always
> - capture cpu time just before lower dword of cs timestamp
>
> v3: (Chris)
> - use uncore-rpm
> - use __query_cs_timestamp helper
>
> v4: (Lionel)
> - Kernel perf subsystem allows users to specify the clock id to be used
>   in perf_event_open. This clock id is used by the perf subsystem to
>   return the appropriate cpu timestamp in perf events. Similarly, let
>   the user pass the clockid to this query so that cpu timestamp
>   corresponds to the clock id requested.
>
> v5: (Tvrtko)
> - Use normal ktime accessors instead of fast versions
> - Add more uApi documentation
>
> v6: (Lionel)
> - Move switch out of spinlock
>
> v7: (Chris)
> - cs_timestamp is a misnomer, use cs_cycles instead
> - return the cs cycle frequency as well in the query
>
> v8:
> - Add platform and engine specific checks
>
> v9: (Lionel)
> - Return 2 cpu timestamps in the query - captured before and after the
>   register read
>
> v10: (Chris)
> - Use local_clock() to measure time taken to read lower dword of
>   register and return it to user.
>
> v11: (Jani)
> - IS_GEN deprecated. Use GRAPHICS_VER instead.
>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_query.c | 145 ++++++++++++++++++++++++++++++
>  include/uapi/drm/i915_drm.h       |  48 ++++++++++
>  2 files changed, 193 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
> index fed337ad7b68..2594b93901ac 100644
> --- a/drivers/gpu/drm/i915/i915_query.c
> +++ b/drivers/gpu/drm/i915/i915_query.c
> @@ -6,6 +6,8 @@
>
>  #include <linux/nospec.h>
>
> +#include "gt/intel_engine_pm.h"
> +#include "gt/intel_engine_user.h"
>  #include "i915_drv.h"
>  #include "i915_perf.h"
>  #include "i915_query.h"
> @@ -90,6 +92,148 @@ static int query_topology_info(struct drm_i915_private *dev_priv,
>       return total_length;
>  }
>
> +typedef u64 (*__ktime_func_t)(void);
> +static __ktime_func_t __clock_id_to_func(clockid_t clk_id)
> +{
> +     /*
> +      * Use the same logic as the perf subsystem to allow the user to select the
> +      * reference clock id to be used for timestamps.
> +      */
> +     switch (clk_id) {
> +     case CLOCK_MONOTONIC:
> +             return &ktime_get_ns;
> +     case CLOCK_MONOTONIC_RAW:
> +             return &ktime_get_raw_ns;
> +     case CLOCK_REALTIME:
> +             return &ktime_get_real_ns;
> +     case CLOCK_BOOTTIME:
> +             return &ktime_get_boottime_ns;
> +     case CLOCK_TAI:
> +             return &ktime_get_clocktai_ns;
> +     default:
> +             return NULL;
> +     }
> +}
> +
> +static inline int
> +__read_timestamps(struct intel_uncore *uncore,
> +               i915_reg_t lower_reg,
> +               i915_reg_t upper_reg,
> +               u64 *cs_ts,
> +               u64 *cpu_ts,
> +               __ktime_func_t cpu_clock)
> +{
> +     u32 upper, lower, old_upper, loop = 0;
> +
> +     upper = intel_uncore_read_fw(uncore, upper_reg);
> +     do {
> +             cpu_ts[1] = local_clock();
> +             cpu_ts[0] = cpu_clock();
> +             lower = intel_uncore_read_fw(uncore, lower_reg);
> +             cpu_ts[1] = local_clock() - cpu_ts[1];
> +             old_upper = upper;
> +             upper = intel_uncore_read_fw(uncore, upper_reg);
> +     } while (upper != old_upper && loop++ < 2);
> +
> +     *cs_ts = (u64)upper << 32 | lower;
> +
> +     return 0;
> +}
> +
> +static int
> +__query_cs_cycles(struct intel_engine_cs *engine,
> +               u64 *cs_ts, u64 *cpu_ts,
> +               __ktime_func_t cpu_clock)
> +{
> +     struct intel_uncore *uncore = engine->uncore;
> +     enum forcewake_domains fw_domains;
> +     u32 base = engine->mmio_base;
> +     intel_wakeref_t wakeref;
> +     int ret;
> +
> +     fw_domains = intel_uncore_forcewake_for_reg(uncore,
> +                                                 RING_TIMESTAMP(base),
> +                                                 FW_REG_READ);
> +
> +     with_intel_runtime_pm(uncore->rpm, wakeref) {
> +             spin_lock_irq(&uncore->lock);
> +             intel_uncore_forcewake_get__locked(uncore, fw_domains);
> +
> +             ret = __read_timestamps(uncore,
> +                                     RING_TIMESTAMP(base),
> +                                     RING_TIMESTAMP_UDW(base),
> +                                     cs_ts,
> +                                     cpu_ts,
> +                                     cpu_clock);
> +
> +             intel_uncore_forcewake_put__locked(uncore, fw_domains);
> +             spin_unlock_irq(&uncore->lock);
> +     }
> +
> +     return ret;
> +}
> +
> +static int
> +query_cs_cycles(struct drm_i915_private *i915,
> +             struct drm_i915_query_item *query_item)
> +{
> +     struct drm_i915_query_cs_cycles __user *query_ptr;
> +     struct drm_i915_query_cs_cycles query;
> +     struct intel_engine_cs *engine;
> +     __ktime_func_t cpu_clock;
> +     int ret;
> +
> +     if (GRAPHICS_VER(i915) < 6)
> +             return -ENODEV;
> +
> +     query_ptr = u64_to_user_ptr(query_item->data_ptr);
> +     ret = copy_query_item(&query, sizeof(query), sizeof(query), query_item);
> +     if (ret != 0)
> +             return ret;
> +
> +     if (query.flags)
> +             return -EINVAL;
> +
> +     if (query.rsvd)
> +             return -EINVAL;
> +
> +     cpu_clock = __clock_id_to_func(query.clockid);
> +     if (!cpu_clock)
> +             return -EINVAL;
> +
> +     engine = intel_engine_lookup_user(i915,
> +                                       query.engine.engine_class,
> +                                       query.engine.engine_instance);
> +     if (!engine)
> +             return -EINVAL;
> +
> +     if (GRAPHICS_VER(i915) == 6 &&
> +         query.engine.engine_class != I915_ENGINE_CLASS_RENDER)
> +             return -ENODEV;
> +
> +     query.cs_frequency = engine->gt->clock_frequency;
> +     ret = __query_cs_cycles(engine,
> +                             &query.cs_cycles,
> +                             query.cpu_timestamp,
> +                             cpu_clock);
> +     if (ret)
> +             return ret;
> +
> +     if (put_user(query.cs_frequency, &query_ptr->cs_frequency))
> +             return -EFAULT;
> +
> +     if (put_user(query.cpu_timestamp[0], &query_ptr->cpu_timestamp[0]))
> +             return -EFAULT;
> +
> +     if (put_user(query.cpu_timestamp[1], &query_ptr->cpu_timestamp[1]))
> +             return -EFAULT;
> +
> +     if (put_user(query.cs_cycles, &query_ptr->cs_cycles))
> +             return -EFAULT;
> +
> +     return sizeof(query);
> +}
> +
>  static int
>  query_engine_info(struct drm_i915_private *i915,
>                 struct drm_i915_query_item *query_item)
> @@ -424,6 +568,7 @@ static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
>       query_topology_info,
>       query_engine_info,
>       query_perf_config,
> +     query_cs_cycles,
>  };
>
>  int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 6a34243a7646..08b00f1709b5 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -2230,6 +2230,10 @@ struct drm_i915_query_item {
>  #define DRM_I915_QUERY_TOPOLOGY_INFO    1
>  #define DRM_I915_QUERY_ENGINE_INFO   2
>  #define DRM_I915_QUERY_PERF_CONFIG      3
> +     /**
> +      * Query Command Streamer timestamp register.
> +      */
> +#define DRM_I915_QUERY_CS_CYCLES     4
>  /* Must be kept compact -- no holes and well documented */
>
>       /**
> @@ -2397,6 +2401,50 @@ struct drm_i915_engine_info {
>       __u64 rsvd1[4];
>  };
>
> +/**
> + * struct drm_i915_query_cs_cycles
> + *
> + * The query returns the command streamer cycles and the frequency that can be
> + * used to calculate the command streamer timestamp. In addition the query
> + * returns a set of cpu timestamps that indicate when the command streamer cycle
> + * count was captured.
> + */
> +struct drm_i915_query_cs_cycles {
> +     /** Engine for which command streamer cycles are queried. */
> +     struct i915_engine_class_instance engine;
>
> Why is this per-engine?  Do we actually expect it to change between
> engines?
>
>
> Each engine has its own timestamp register.
>
>
>   If so, we may have a problem because Vulkan expects a
> unified timestamp domain for all command streamer timestamp queries.
>
>
> I don't think it does : "
>
> Timestamps may only be meaningfully compared if they are written by commands submitted to the same queue.

Yes but vkGetCalibratedTimestampsEXT() doesn't take a queue or even a
queue family.  Also, VkPhysicalDeviceLimits::timestampPeriod gives a
single timestampPeriod for all queues.  It's possible that Vulkan
messed up real bad there but I thought we did a HW survey at the time
and determined that it was ok.

--Jason


> " [1]
>
>
> [1] : https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/vkCmdWriteTimestamp.html
>
>
> -Lionel
>
>
>
> --Jason
>
>
> +     /** Must be zero. */
> +     __u32 flags;
> +
> +     /**
> +      * Command streamer cycles as read from the command streamer
> +      * register at 0x358 offset.
> +      */
> +     __u64 cs_cycles;
> +
> +     /** Frequency of the cs cycles in Hz. */
> +     __u64 cs_frequency;
> +
> +     /**
> +      * CPU timestamps in ns. cpu_timestamp[0] is captured before reading the
> +      * cs_cycles register using the reference clockid set by the user.
> +      * cpu_timestamp[1] is the time taken in ns to read the lower dword of
> +      * the cs_cycles register.
> +      */
> +     __u64 cpu_timestamp[2];
> +
> +     /**
> +      * Reference clock id for CPU timestamp. For definition, see
> +      * clock_gettime(2) and perf_event_open(2). Supported clock ids are
> +      * CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW, CLOCK_REALTIME, CLOCK_BOOTTIME,
> +      * CLOCK_TAI.
> +      */
> +     __s32 clockid;
> +
> +     /** Must be zero. */
> +     __u32 rsvd;
> +};
> +
>  /**
>   * struct drm_i915_query_engine_info
>   *
>
> --
> Jani Nikula, Intel Open Source Graphics Center
>
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Intel-gfx] [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy
@ 2021-04-28 19:54           ` Jason Ekstrand
  0 siblings, 0 replies; 26+ messages in thread
From: Jason Ekstrand @ 2021-04-28 19:54 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: Intel GFX, Maling list - DRI developers, Chris Wilson

On Wed, Apr 28, 2021 at 2:50 PM Lionel Landwerlin
<lionel.g.landwerlin@intel.com> wrote:
>
> On 28/04/2021 22:24, Jason Ekstrand wrote:
>
> On Wed, Apr 28, 2021 at 3:43 AM Jani Nikula <jani.nikula@linux.intel.com> wrote:
>
> On Tue, 27 Apr 2021, Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> wrote:
>
> Perf measurements rely on CPU and engine timestamps to correlate
> events of interest across these time domains. Current mechanisms get
> these timestamps separately and the calculated delta between these
> timestamps lacks enough accuracy.
>
> To improve the accuracy of these time measurements to within a few us,
> add a query that returns the engine and cpu timestamps captured as
> close to each other as possible.
>
> Cc: dri-devel, Jason and Daniel for review.
>
> Thanks!
>
> v2: (Tvrtko)
> - document clock reference used
> - return cpu timestamp always
> - capture cpu time just before lower dword of cs timestamp
>
> v3: (Chris)
> - use uncore-rpm
> - use __query_cs_timestamp helper
>
> v4: (Lionel)
> - Kernel perf subsystem allows users to specify the clock id to be used
>   in perf_event_open. This clock id is used by the perf subsystem to
>   return the appropriate cpu timestamp in perf events. Similarly, let
>   the user pass the clockid to this query so that cpu timestamp
>   corresponds to the clock id requested.
>
> v5: (Tvrtko)
> - Use normal ktime accessors instead of fast versions
> - Add more uApi documentation
>
> v6: (Lionel)
> - Move switch out of spinlock
>
> v7: (Chris)
> - cs_timestamp is a misnomer, use cs_cycles instead
> - return the cs cycle frequency as well in the query
>
> v8:
> - Add platform and engine specific checks
>
> v9: (Lionel)
> - Return 2 cpu timestamps in the query - captured before and after the
>   register read
>
> v10: (Chris)
> - Use local_clock() to measure time taken to read lower dword of
>   register and return it to user.
>
> v11: (Jani)
> - IS_GEN deprecated. Use GRAPHICS_VER instead.
>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_query.c | 145 ++++++++++++++++++++++++++++++
>  include/uapi/drm/i915_drm.h       |  48 ++++++++++
>  2 files changed, 193 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
> index fed337ad7b68..2594b93901ac 100644
> --- a/drivers/gpu/drm/i915/i915_query.c
> +++ b/drivers/gpu/drm/i915/i915_query.c
> @@ -6,6 +6,8 @@
>
>  #include <linux/nospec.h>
>
> +#include "gt/intel_engine_pm.h"
> +#include "gt/intel_engine_user.h"
>  #include "i915_drv.h"
>  #include "i915_perf.h"
>  #include "i915_query.h"
> @@ -90,6 +92,148 @@ static int query_topology_info(struct drm_i915_private *dev_priv,
>       return total_length;
>  }
>
> +typedef u64 (*__ktime_func_t)(void);
> +static __ktime_func_t __clock_id_to_func(clockid_t clk_id)
> +{
> +     /*
> +      * Use logic same as the perf subsystem to allow user to select the
> +      * reference clock id to be used for timestamps.
> +      */
> +     switch (clk_id) {
> +     case CLOCK_MONOTONIC:
> +             return &ktime_get_ns;
> +     case CLOCK_MONOTONIC_RAW:
> +             return &ktime_get_raw_ns;
> +     case CLOCK_REALTIME:
> +             return &ktime_get_real_ns;
> +     case CLOCK_BOOTTIME:
> +             return &ktime_get_boottime_ns;
> +     case CLOCK_TAI:
> +             return &ktime_get_clocktai_ns;
> +     default:
> +             return NULL;
> +     }
> +}
> +
> +static inline int
> +__read_timestamps(struct intel_uncore *uncore,
> +               i915_reg_t lower_reg,
> +               i915_reg_t upper_reg,
> +               u64 *cs_ts,
> +               u64 *cpu_ts,
> +               __ktime_func_t cpu_clock)
> +{
> +     u32 upper, lower, old_upper, loop = 0;
> +
> +     upper = intel_uncore_read_fw(uncore, upper_reg);
> +     do {
> +             cpu_ts[1] = local_clock();
> +             cpu_ts[0] = cpu_clock();
> +             lower = intel_uncore_read_fw(uncore, lower_reg);
> +             cpu_ts[1] = local_clock() - cpu_ts[1];
> +             old_upper = upper;
> +             upper = intel_uncore_read_fw(uncore, upper_reg);
> +     } while (upper != old_upper && loop++ < 2);
> +
> +     *cs_ts = (u64)upper << 32 | lower;
> +
> +     return 0;
> +}
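The retry loop above guards against a carry propagating from the lower to the upper dword between the two register reads. A minimal, self-contained sketch of the same pattern — mock reads stand in for intel_uncore_read_fw(), and the counter value is illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Mock 64-bit hardware counter exposed as two 32-bit registers;
 * stands in for RING_TIMESTAMP / RING_TIMESTAMP_UDW. */
static uint64_t mock_counter = 0x00000001ffffffffULL;

static uint32_t read_lower(void) { return (uint32_t)mock_counter; }
static uint32_t read_upper(void) { return (uint32_t)(mock_counter >> 32); }

/*
 * Overflow-safe 64-bit sample: if the upper dword changed while the
 * lower dword was being read, a carry may have raced with the read,
 * so retry (bounded, as in the patch, to avoid spinning forever).
 */
static uint64_t read_counter64(void)
{
	uint32_t upper, lower, old_upper;
	int loop = 0;

	upper = read_upper();
	do {
		lower = read_lower();
		old_upper = upper;
		upper = read_upper();
	} while (upper != old_upper && loop++ < 2);

	return (uint64_t)upper << 32 | lower;
}
```

The in-kernel version additionally samples local_clock() and the chosen cpu_clock() around the lower-dword read to produce the CPU-side timestamps; that part is omitted here.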
> +
> +static int
> +__query_cs_cycles(struct intel_engine_cs *engine,
> +               u64 *cs_ts, u64 *cpu_ts,
> +               __ktime_func_t cpu_clock)
> +{
> +     struct intel_uncore *uncore = engine->uncore;
> +     enum forcewake_domains fw_domains;
> +     u32 base = engine->mmio_base;
> +     intel_wakeref_t wakeref;
> +     int ret;
> +
> +     fw_domains = intel_uncore_forcewake_for_reg(uncore,
> +                                                 RING_TIMESTAMP(base),
> +                                                 FW_REG_READ);
> +
> +     with_intel_runtime_pm(uncore->rpm, wakeref) {
> +             spin_lock_irq(&uncore->lock);
> +             intel_uncore_forcewake_get__locked(uncore, fw_domains);
> +
> +             ret = __read_timestamps(uncore,
> +                                     RING_TIMESTAMP(base),
> +                                     RING_TIMESTAMP_UDW(base),
> +                                     cs_ts,
> +                                     cpu_ts,
> +                                     cpu_clock);
> +
> +             intel_uncore_forcewake_put__locked(uncore, fw_domains);
> +             spin_unlock_irq(&uncore->lock);
> +     }
> +
> +     return ret;
> +}
> +
> +static int
> +query_cs_cycles(struct drm_i915_private *i915,
> +             struct drm_i915_query_item *query_item)
> +{
> +     struct drm_i915_query_cs_cycles __user *query_ptr;
> +     struct drm_i915_query_cs_cycles query;
> +     struct intel_engine_cs *engine;
> +     __ktime_func_t cpu_clock;
> +     int ret;
> +
> +     if (GRAPHICS_VER(i915) < 6)
> +             return -ENODEV;
> +
> +     query_ptr = u64_to_user_ptr(query_item->data_ptr);
> +     ret = copy_query_item(&query, sizeof(query), sizeof(query), query_item);
> +     if (ret != 0)
> +             return ret;
> +
> +     if (query.flags)
> +             return -EINVAL;
> +
> +     if (query.rsvd)
> +             return -EINVAL;
> +
> +     cpu_clock = __clock_id_to_func(query.clockid);
> +     if (!cpu_clock)
> +             return -EINVAL;
> +
> +     engine = intel_engine_lookup_user(i915,
> +                                       query.engine.engine_class,
> +                                       query.engine.engine_instance);
> +     if (!engine)
> +             return -EINVAL;
> +
> +     if (GRAPHICS_VER(i915) == 6 &&
> +         query.engine.engine_class != I915_ENGINE_CLASS_RENDER)
> +             return -ENODEV;
> +
> +     query.cs_frequency = engine->gt->clock_frequency;
> +     ret = __query_cs_cycles(engine,
> +                             &query.cs_cycles,
> +                             query.cpu_timestamp,
> +                             cpu_clock);
> +     if (ret)
> +             return ret;
> +
> +     if (put_user(query.cs_frequency, &query_ptr->cs_frequency))
> +             return -EFAULT;
> +
> +     if (put_user(query.cpu_timestamp[0], &query_ptr->cpu_timestamp[0]))
> +             return -EFAULT;
> +
> +     if (put_user(query.cpu_timestamp[1], &query_ptr->cpu_timestamp[1]))
> +             return -EFAULT;
> +
> +     if (put_user(query.cs_cycles, &query_ptr->cs_cycles))
> +             return -EFAULT;
> +
> +     return sizeof(query);
> +}
> +
>  static int
>  query_engine_info(struct drm_i915_private *i915,
>                 struct drm_i915_query_item *query_item)
> @@ -424,6 +568,7 @@ static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
>       query_topology_info,
>       query_engine_info,
>       query_perf_config,
> +     query_cs_cycles,
>  };
>
>  int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 6a34243a7646..08b00f1709b5 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -2230,6 +2230,10 @@ struct drm_i915_query_item {
>  #define DRM_I915_QUERY_TOPOLOGY_INFO    1
>  #define DRM_I915_QUERY_ENGINE_INFO   2
>  #define DRM_I915_QUERY_PERF_CONFIG      3
> +     /**
> +      * Query Command Streamer timestamp register.
> +      */
> +#define DRM_I915_QUERY_CS_CYCLES     4
>  /* Must be kept compact -- no holes and well documented */
>
>       /**
> @@ -2397,6 +2401,50 @@ struct drm_i915_engine_info {
>       __u64 rsvd1[4];
>  };
>
> +/**
> + * struct drm_i915_query_cs_cycles
> + *
> + * The query returns the command streamer cycles and the frequency that can be
> + * used to calculate the command streamer timestamp. In addition the query
> + * returns a set of cpu timestamps that indicate when the command streamer cycle
> + * count was captured.
> + */
> +struct drm_i915_query_cs_cycles {
> +     /** Engine for which command streamer cycles is queried. */
> +     struct i915_engine_class_instance engine;
>
> Why is this per-engine?  Do we actually expect it to change between
> engines?
>
>
> Each engine has its own timestamp register.
>
>
>   If so, we may have a problem because Vulkan expects a
> unified timestamp domain for all command streamer timestamp queries.
>
>
> I don't think it does : "
>
> Timestamps may only be meaningfully compared if they are written by commands submitted to the same queue.

Yes but vkGetCalibratedTimestampsEXT() doesn't take a queue or even a
queue family.  Also, VkPhysicalDeviceLimits::timestampPeriod gives a
single timestampPeriod for all queues.  It's possible that Vulkan
messed up real bad there but I thought we did a HW survey at the time
and determined that it was ok.

--Jason


> " [1]
>
>
> [1] : https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/vkCmdWriteTimestamp.html
>
>
> -Lionel
>
>
>
> --Jason
>
>
> +     /** Must be zero. */
> +     __u32 flags;
> +
> +     /**
> +      * Command streamer cycles as read from the command streamer
> +      * register at 0x358 offset.
> +      */
> +     __u64 cs_cycles;
> +
> +     /** Frequency of the cs cycles in Hz. */
> +     __u64 cs_frequency;
> +
> +     /**
> +      * CPU timestamps in ns. cpu_timestamp[0] is captured before reading the
> +      * cs_cycles register using the reference clockid set by the user.
> +      * cpu_timestamp[1] is the time taken in ns to read the lower dword of
> +      * the cs_cycles register.
> +      */
> +     __u64 cpu_timestamp[2];
> +
> +     /**
> +      * Reference clock id for CPU timestamp. For definition, see
> +      * clock_gettime(2) and perf_event_open(2). Supported clock ids are
> +      * CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW, CLOCK_REALTIME, CLOCK_BOOTTIME,
> +      * CLOCK_TAI.
> +      */
> +     __s32 clockid;
> +
> +     /** Must be zero. */
> +     __u32 rsvd;
> +};
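Once the query returns, correlating the two time domains is simple arithmetic: cs_cycles converts to nanoseconds via cs_frequency, and cpu_timestamp[0] (offset by half the read duration reported in cpu_timestamp[1]) anchors that sample on the CPU clock. A sketch of that arithmetic — the helper names are illustrative, not part of the proposed uapi:

```c
#include <assert.h>
#include <stdint.h>

/* Convert a cycle count to ns using the returned cs_frequency (Hz),
 * splitting the multiply so large cycle counts do not overflow u64. */
static uint64_t cs_cycles_to_ns(uint64_t cycles, uint64_t freq_hz)
{
	const uint64_t NSEC_PER_SEC = 1000000000ULL;

	return (cycles / freq_hz) * NSEC_PER_SEC +
	       (cycles % freq_hz) * NSEC_PER_SEC / freq_hz;
}

/*
 * cpu_timestamp[0] is captured just before the lower-dword register
 * read and cpu_timestamp[1] is the duration of that read, so the
 * midpoint is a reasonable CPU-side anchor for the cs_cycles sample.
 */
static uint64_t cpu_anchor_ns(const uint64_t cpu_timestamp[2])
{
	return cpu_timestamp[0] + cpu_timestamp[1] / 2;
}
```

On success the handler returns sizeof(struct drm_i915_query_cs_cycles) in the query item's length, matching the convention of the other i915 query items.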
> +
>  /**
>   * struct drm_i915_query_engine_info
>   *
>
> --
> Jani Nikula, Intel Open Source Graphics Center
>
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy
  2021-04-28 19:54           ` [Intel-gfx] " Jason Ekstrand
@ 2021-04-28 20:14             ` Lionel Landwerlin
  -1 siblings, 0 replies; 26+ messages in thread
From: Lionel Landwerlin @ 2021-04-28 20:14 UTC (permalink / raw)
  To: Jason Ekstrand
  Cc: Intel GFX, Mailing list - DRI developers, Chris Wilson,
	Umesh Nerlige Ramappa

On 28/04/2021 22:54, Jason Ekstrand wrote:
> On Wed, Apr 28, 2021 at 2:50 PM Lionel Landwerlin
> <lionel.g.landwerlin@intel.com> wrote:
>> On 28/04/2021 22:24, Jason Ekstrand wrote:
>>
>> On Wed, Apr 28, 2021 at 3:43 AM Jani Nikula <jani.nikula@linux.intel.com> wrote:
>>
>> On Tue, 27 Apr 2021, Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> wrote:
>>
>> Perf measurements rely on CPU and engine timestamps to correlate
>> events of interest across these time domains. Current mechanisms get
>> these timestamps separately and the calculated delta between these
>> timestamps lacks enough accuracy.
>>
>> To improve the accuracy of these time measurements to within a few us,
>> add a query that returns the engine and cpu timestamps captured as
>> close to each other as possible.
>>
>> Cc: dri-devel, Jason and Daniel for review.
>>
>> Thanks!
>>
>> v2: (Tvrtko)
>> - document clock reference used
>> - return cpu timestamp always
>> - capture cpu time just before lower dword of cs timestamp
>>
>> v3: (Chris)
>> - use uncore-rpm
>> - use __query_cs_timestamp helper
>>
>> v4: (Lionel)
>> - Kernel perf subsystem allows users to specify the clock id to be used
>>    in perf_event_open. This clock id is used by the perf subsystem to
>>    return the appropriate cpu timestamp in perf events. Similarly, let
>>    the user pass the clockid to this query so that cpu timestamp
>>    corresponds to the clock id requested.
>>
>> v5: (Tvrtko)
>> - Use normal ktime accessors instead of fast versions
>> - Add more uApi documentation
>>
>> v6: (Lionel)
>> - Move switch out of spinlock
>>
>> v7: (Chris)
>> - cs_timestamp is a misnomer, use cs_cycles instead
>> - return the cs cycle frequency as well in the query
>>
>> v8:
>> - Add platform and engine specific checks
>>
>> v9: (Lionel)
>> - Return 2 cpu timestamps in the query - captured before and after the
>>    register read
>>
>> v10: (Chris)
>> - Use local_clock() to measure time taken to read lower dword of
>>    register and return it to user.
>>
>> v11: (Jani)
>> - IS_GEN is deprecated. Use GRAPHICS_VER instead.
>>
>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_query.c | 145 ++++++++++++++++++++++++++++++
>>   include/uapi/drm/i915_drm.h       |  48 ++++++++++
>>   2 files changed, 193 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
>> index fed337ad7b68..2594b93901ac 100644
>> --- a/drivers/gpu/drm/i915/i915_query.c
>> +++ b/drivers/gpu/drm/i915/i915_query.c
>> @@ -6,6 +6,8 @@
>>
>>   #include <linux/nospec.h>
>>
>> +#include "gt/intel_engine_pm.h"
>> +#include "gt/intel_engine_user.h"
>>   #include "i915_drv.h"
>>   #include "i915_perf.h"
>>   #include "i915_query.h"
>> @@ -90,6 +92,148 @@ static int query_topology_info(struct drm_i915_private *dev_priv,
>>        return total_length;
>>   }
>>
>> +typedef u64 (*__ktime_func_t)(void);
>> +static __ktime_func_t __clock_id_to_func(clockid_t clk_id)
>> +{
>> +     /*
>> +      * Use logic same as the perf subsystem to allow user to select the
>> +      * reference clock id to be used for timestamps.
>> +      */
>> +     switch (clk_id) {
>> +     case CLOCK_MONOTONIC:
>> +             return &ktime_get_ns;
>> +     case CLOCK_MONOTONIC_RAW:
>> +             return &ktime_get_raw_ns;
>> +     case CLOCK_REALTIME:
>> +             return &ktime_get_real_ns;
>> +     case CLOCK_BOOTTIME:
>> +             return &ktime_get_boottime_ns;
>> +     case CLOCK_TAI:
>> +             return &ktime_get_clocktai_ns;
>> +     default:
>> +             return NULL;
>> +     }
>> +}
>> +
>> +static inline int
>> +__read_timestamps(struct intel_uncore *uncore,
>> +               i915_reg_t lower_reg,
>> +               i915_reg_t upper_reg,
>> +               u64 *cs_ts,
>> +               u64 *cpu_ts,
>> +               __ktime_func_t cpu_clock)
>> +{
>> +     u32 upper, lower, old_upper, loop = 0;
>> +
>> +     upper = intel_uncore_read_fw(uncore, upper_reg);
>> +     do {
>> +             cpu_ts[1] = local_clock();
>> +             cpu_ts[0] = cpu_clock();
>> +             lower = intel_uncore_read_fw(uncore, lower_reg);
>> +             cpu_ts[1] = local_clock() - cpu_ts[1];
>> +             old_upper = upper;
>> +             upper = intel_uncore_read_fw(uncore, upper_reg);
>> +     } while (upper != old_upper && loop++ < 2);
>> +
>> +     *cs_ts = (u64)upper << 32 | lower;
>> +
>> +     return 0;
>> +}
>> +
>> +static int
>> +__query_cs_cycles(struct intel_engine_cs *engine,
>> +               u64 *cs_ts, u64 *cpu_ts,
>> +               __ktime_func_t cpu_clock)
>> +{
>> +     struct intel_uncore *uncore = engine->uncore;
>> +     enum forcewake_domains fw_domains;
>> +     u32 base = engine->mmio_base;
>> +     intel_wakeref_t wakeref;
>> +     int ret;
>> +
>> +     fw_domains = intel_uncore_forcewake_for_reg(uncore,
>> +                                                 RING_TIMESTAMP(base),
>> +                                                 FW_REG_READ);
>> +
>> +     with_intel_runtime_pm(uncore->rpm, wakeref) {
>> +             spin_lock_irq(&uncore->lock);
>> +             intel_uncore_forcewake_get__locked(uncore, fw_domains);
>> +
>> +             ret = __read_timestamps(uncore,
>> +                                     RING_TIMESTAMP(base),
>> +                                     RING_TIMESTAMP_UDW(base),
>> +                                     cs_ts,
>> +                                     cpu_ts,
>> +                                     cpu_clock);
>> +
>> +             intel_uncore_forcewake_put__locked(uncore, fw_domains);
>> +             spin_unlock_irq(&uncore->lock);
>> +     }
>> +
>> +     return ret;
>> +}
>> +
>> +static int
>> +query_cs_cycles(struct drm_i915_private *i915,
>> +             struct drm_i915_query_item *query_item)
>> +{
>> +     struct drm_i915_query_cs_cycles __user *query_ptr;
>> +     struct drm_i915_query_cs_cycles query;
>> +     struct intel_engine_cs *engine;
>> +     __ktime_func_t cpu_clock;
>> +     int ret;
>> +
>> +     if (GRAPHICS_VER(i915) < 6)
>> +             return -ENODEV;
>> +
>> +     query_ptr = u64_to_user_ptr(query_item->data_ptr);
>> +     ret = copy_query_item(&query, sizeof(query), sizeof(query), query_item);
>> +     if (ret != 0)
>> +             return ret;
>> +
>> +     if (query.flags)
>> +             return -EINVAL;
>> +
>> +     if (query.rsvd)
>> +             return -EINVAL;
>> +
>> +     cpu_clock = __clock_id_to_func(query.clockid);
>> +     if (!cpu_clock)
>> +             return -EINVAL;
>> +
>> +     engine = intel_engine_lookup_user(i915,
>> +                                       query.engine.engine_class,
>> +                                       query.engine.engine_instance);
>> +     if (!engine)
>> +             return -EINVAL;
>> +
>> +     if (GRAPHICS_VER(i915) == 6 &&
>> +         query.engine.engine_class != I915_ENGINE_CLASS_RENDER)
>> +             return -ENODEV;
>> +
>> +     query.cs_frequency = engine->gt->clock_frequency;
>> +     ret = __query_cs_cycles(engine,
>> +                             &query.cs_cycles,
>> +                             query.cpu_timestamp,
>> +                             cpu_clock);
>> +     if (ret)
>> +             return ret;
>> +
>> +     if (put_user(query.cs_frequency, &query_ptr->cs_frequency))
>> +             return -EFAULT;
>> +
>> +     if (put_user(query.cpu_timestamp[0], &query_ptr->cpu_timestamp[0]))
>> +             return -EFAULT;
>> +
>> +     if (put_user(query.cpu_timestamp[1], &query_ptr->cpu_timestamp[1]))
>> +             return -EFAULT;
>> +
>> +     if (put_user(query.cs_cycles, &query_ptr->cs_cycles))
>> +             return -EFAULT;
>> +
>> +     return sizeof(query);
>> +}
>> +
>>   static int
>>   query_engine_info(struct drm_i915_private *i915,
>>                  struct drm_i915_query_item *query_item)
>> @@ -424,6 +568,7 @@ static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
>>        query_topology_info,
>>        query_engine_info,
>>        query_perf_config,
>> +     query_cs_cycles,
>>   };
>>
>>   int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>> index 6a34243a7646..08b00f1709b5 100644
>> --- a/include/uapi/drm/i915_drm.h
>> +++ b/include/uapi/drm/i915_drm.h
>> @@ -2230,6 +2230,10 @@ struct drm_i915_query_item {
>>   #define DRM_I915_QUERY_TOPOLOGY_INFO    1
>>   #define DRM_I915_QUERY_ENGINE_INFO   2
>>   #define DRM_I915_QUERY_PERF_CONFIG      3
>> +     /**
>> +      * Query Command Streamer timestamp register.
>> +      */
>> +#define DRM_I915_QUERY_CS_CYCLES     4
>>   /* Must be kept compact -- no holes and well documented */
>>
>>        /**
>> @@ -2397,6 +2401,50 @@ struct drm_i915_engine_info {
>>        __u64 rsvd1[4];
>>   };
>>
>> +/**
>> + * struct drm_i915_query_cs_cycles
>> + *
>> + * The query returns the command streamer cycles and the frequency that can be
>> + * used to calculate the command streamer timestamp. In addition the query
>> + * returns a set of cpu timestamps that indicate when the command streamer cycle
>> + * count was captured.
>> + */
>> +struct drm_i915_query_cs_cycles {
>> +     /** Engine for which command streamer cycles is queried. */
>> +     struct i915_engine_class_instance engine;
>>
>> Why is this per-engine?  Do we actually expect it to change between
>> engines?
>>
>>
>> Each engine has its own timestamp register.
>>
>>
>>    If so, we may have a problem because Vulkan expects a
>> unified timestamp domain for all command streamer timestamp queries.
>>
>>
>> I don't think it does : "
>>
>> Timestamps may only be meaningfully compared if they are written by commands submitted to the same queue.
> Yes but vkGetCalibratedTimestampsEXT() doesn't take a queue or even a
> queue family.


I know, I brought up the issue recently. See khronos issue 2551.

You might not like the resolution... I did propose to do a rev2 of the 
extension to let the user specify the queue.

We can still do that in the future.


>    Also, VkPhysicalDeviceLimits::timestampPeriod gives a
> single timestampPeriod for all queues.


That is fine for us, we should have the same period on all command 
streamers.


-Lionel


>    It's possible that Vulkan
> messed up real bad there but I thought we did a HW survey at the time
> and determined that it was ok.
>
> --Jason
>
>
>> " [1]
>>
>>
>> [1] : https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/vkCmdWriteTimestamp.html
>>
>>
>> -Lionel
>>
>>
>>
>> --Jason
>>
>>
>> +     /** Must be zero. */
>> +     __u32 flags;
>> +
>> +     /**
>> +      * Command streamer cycles as read from the command streamer
>> +      * register at 0x358 offset.
>> +      */
>> +     __u64 cs_cycles;
>> +
>> +     /** Frequency of the cs cycles in Hz. */
>> +     __u64 cs_frequency;
>> +
>> +     /**
>> +      * CPU timestamps in ns. cpu_timestamp[0] is captured before reading the
>> +      * cs_cycles register using the reference clockid set by the user.
>> +      * cpu_timestamp[1] is the time taken in ns to read the lower dword of
>> +      * the cs_cycles register.
>> +      */
>> +     __u64 cpu_timestamp[2];
>> +
>> +     /**
>> +      * Reference clock id for CPU timestamp. For definition, see
>> +      * clock_gettime(2) and perf_event_open(2). Supported clock ids are
>> +      * CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW, CLOCK_REALTIME, CLOCK_BOOTTIME,
>> +      * CLOCK_TAI.
>> +      */
>> +     __s32 clockid;
>> +
>> +     /** Must be zero. */
>> +     __u32 rsvd;
>> +};
>> +
>>   /**
>>    * struct drm_i915_query_engine_info
>>    *
>>
>> --
>> Jani Nikula, Intel Open Source Graphics Center
>>
>>

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Intel-gfx] [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy
@ 2021-04-28 20:14             ` Lionel Landwerlin
  0 siblings, 0 replies; 26+ messages in thread
From: Lionel Landwerlin @ 2021-04-28 20:14 UTC (permalink / raw)
  To: Jason Ekstrand; +Cc: Intel GFX, Maling list - DRI developers, Chris Wilson

On 28/04/2021 22:54, Jason Ekstrand wrote:
> On Wed, Apr 28, 2021 at 2:50 PM Lionel Landwerlin
> <lionel.g.landwerlin@intel.com> wrote:
>> On 28/04/2021 22:24, Jason Ekstrand wrote:
>>
>> On Wed, Apr 28, 2021 at 3:43 AM Jani Nikula <jani.nikula@linux.intel.com> wrote:
>>
>> On Tue, 27 Apr 2021, Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> wrote:
>>
>> Perf measurements rely on CPU and engine timestamps to correlate
>> events of interest across these time domains. Current mechanisms get
>> these timestamps separately and the calculated delta between these
>> timestamps lack enough accuracy.
>>
>> To improve the accuracy of these time measurements to within a few us,
>> add a query that returns the engine and cpu timestamps captured as
>> close to each other as possible.
>>
>> Cc: dri-devel, Jason and Daniel for review.
>>
>> Thanks!
>>
>> v2: (Tvrtko)
>> - document clock reference used
>> - return cpu timestamp always
>> - capture cpu time just before lower dword of cs timestamp
>>
>> v3: (Chris)
>> - use uncore-rpm
>> - use __query_cs_timestamp helper
>>
>> v4: (Lionel)
>> - Kernel perf subsytem allows users to specify the clock id to be used
>>    in perf_event_open. This clock id is used by the perf subsystem to
>>    return the appropriate cpu timestamp in perf events. Similarly, let
>>    the user pass the clockid to this query so that cpu timestamp
>>    corresponds to the clock id requested.
>>
>> v5: (Tvrtko)
>> - Use normal ktime accessors instead of fast versions
>> - Add more uApi documentation
>>
>> v6: (Lionel)
>> - Move switch out of spinlock
>>
>> v7: (Chris)
>> - cs_timestamp is a misnomer, use cs_cycles instead
>> - return the cs cycle frequency as well in the query
>>
>> v8:
>> - Add platform and engine specific checks
>>
>> v9: (Lionel)
>> - Return 2 cpu timestamps in the query - captured before and after the
>>    register read
>>
>> v10: (Chris)
>> - Use local_clock() to measure time taken to read lower dword of
>>    register and return it to user.
>>
>> v11: (Jani)
>> - IS_GEN deprecated. User GRAPHICS_VER instead.
>>
>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_query.c | 145 ++++++++++++++++++++++++++++++
>>   include/uapi/drm/i915_drm.h       |  48 ++++++++++
>>   2 files changed, 193 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
>> index fed337ad7b68..2594b93901ac 100644
>> --- a/drivers/gpu/drm/i915/i915_query.c
>> +++ b/drivers/gpu/drm/i915/i915_query.c
>> @@ -6,6 +6,8 @@
>>
>>   #include <linux/nospec.h>
>>
>> +#include "gt/intel_engine_pm.h"
>> +#include "gt/intel_engine_user.h"
>>   #include "i915_drv.h"
>>   #include "i915_perf.h"
>>   #include "i915_query.h"
>> @@ -90,6 +92,148 @@ static int query_topology_info(struct drm_i915_private *dev_priv,
>>        return total_length;
>>   }
>>
>> +typedef u64 (*__ktime_func_t)(void);
>> +static __ktime_func_t __clock_id_to_func(clockid_t clk_id)
>> +{
>> +     /*
>> +      * Use logic same as the perf subsystem to allow user to select the
>> +      * reference clock id to be used for timestamps.
>> +      */
>> +     switch (clk_id) {
>> +     case CLOCK_MONOTONIC:
>> +             return &ktime_get_ns;
>> +     case CLOCK_MONOTONIC_RAW:
>> +             return &ktime_get_raw_ns;
>> +     case CLOCK_REALTIME:
>> +             return &ktime_get_real_ns;
>> +     case CLOCK_BOOTTIME:
>> +             return &ktime_get_boottime_ns;
>> +     case CLOCK_TAI:
>> +             return &ktime_get_clocktai_ns;
>> +     default:
>> +             return NULL;
>> +     }
>> +}
>> +
>> +static inline int
>> +__read_timestamps(struct intel_uncore *uncore,
>> +               i915_reg_t lower_reg,
>> +               i915_reg_t upper_reg,
>> +               u64 *cs_ts,
>> +               u64 *cpu_ts,
>> +               __ktime_func_t cpu_clock)
>> +{
>> +     u32 upper, lower, old_upper, loop = 0;
>> +
>> +     upper = intel_uncore_read_fw(uncore, upper_reg);
>> +     do {
>> +             cpu_ts[1] = local_clock();
>> +             cpu_ts[0] = cpu_clock();
>> +             lower = intel_uncore_read_fw(uncore, lower_reg);
>> +             cpu_ts[1] = local_clock() - cpu_ts[1];
>> +             old_upper = upper;
>> +             upper = intel_uncore_read_fw(uncore, upper_reg);
>> +     } while (upper != old_upper && loop++ < 2);
>> +
>> +     *cs_ts = (u64)upper << 32 | lower;
>> +
>> +     return 0;
>> +}
>> +
>> +static int
>> +__query_cs_cycles(struct intel_engine_cs *engine,
>> +               u64 *cs_ts, u64 *cpu_ts,
>> +               __ktime_func_t cpu_clock)
>> +{
>> +     struct intel_uncore *uncore = engine->uncore;
>> +     enum forcewake_domains fw_domains;
>> +     u32 base = engine->mmio_base;
>> +     intel_wakeref_t wakeref;
>> +     int ret;
>> +
>> +     fw_domains = intel_uncore_forcewake_for_reg(uncore,
>> +                                                 RING_TIMESTAMP(base),
>> +                                                 FW_REG_READ);
>> +
>> +     with_intel_runtime_pm(uncore->rpm, wakeref) {
>> +             spin_lock_irq(&uncore->lock);
>> +             intel_uncore_forcewake_get__locked(uncore, fw_domains);
>> +
>> +             ret = __read_timestamps(uncore,
>> +                                     RING_TIMESTAMP(base),
>> +                                     RING_TIMESTAMP_UDW(base),
>> +                                     cs_ts,
>> +                                     cpu_ts,
>> +                                     cpu_clock);
>> +
>> +             intel_uncore_forcewake_put__locked(uncore, fw_domains);
>> +             spin_unlock_irq(&uncore->lock);
>> +     }
>> +
>> +     return ret;
>> +}
>> +
>> +static int
>> +query_cs_cycles(struct drm_i915_private *i915,
>> +             struct drm_i915_query_item *query_item)
>> +{
>> +     struct drm_i915_query_cs_cycles __user *query_ptr;
>> +     struct drm_i915_query_cs_cycles query;
>> +     struct intel_engine_cs *engine;
>> +     __ktime_func_t cpu_clock;
>> +     int ret;
>> +
>> +     if (GRAPHICS_VER(i915) < 6)
>> +             return -ENODEV;
>> +
>> +     query_ptr = u64_to_user_ptr(query_item->data_ptr);
>> +     ret = copy_query_item(&query, sizeof(query), sizeof(query), query_item);
>> +     if (ret != 0)
>> +             return ret;
>> +
>> +     if (query.flags)
>> +             return -EINVAL;
>> +
>> +     if (query.rsvd)
>> +             return -EINVAL;
>> +
>> +     cpu_clock = __clock_id_to_func(query.clockid);
>> +     if (!cpu_clock)
>> +             return -EINVAL;
>> +
>> +     engine = intel_engine_lookup_user(i915,
>> +                                       query.engine.engine_class,
>> +                                       query.engine.engine_instance);
>> +     if (!engine)
>> +             return -EINVAL;
>> +
>> +     if (GRAPHICS_VER(i915) == 6 &&
>> +         query.engine.engine_class != I915_ENGINE_CLASS_RENDER)
>> +             return -ENODEV;
>> +
>> +     query.cs_frequency = engine->gt->clock_frequency;
>> +     ret = __query_cs_cycles(engine,
>> +                             &query.cs_cycles,
>> +                             query.cpu_timestamp,
>> +                             cpu_clock);
>> +     if (ret)
>> +             return ret;
>> +
>> +     if (put_user(query.cs_frequency, &query_ptr->cs_frequency))
>> +             return -EFAULT;
>> +
>> +     if (put_user(query.cpu_timestamp[0], &query_ptr->cpu_timestamp[0]))
>> +             return -EFAULT;
>> +
>> +     if (put_user(query.cpu_timestamp[1], &query_ptr->cpu_timestamp[1]))
>> +             return -EFAULT;
>> +
>> +     if (put_user(query.cs_cycles, &query_ptr->cs_cycles))
>> +             return -EFAULT;
>> +
>> +     return sizeof(query);
>> +}
>> +
>>   static int
>>   query_engine_info(struct drm_i915_private *i915,
>>                  struct drm_i915_query_item *query_item)
>> @@ -424,6 +568,7 @@ static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
>>        query_topology_info,
>>        query_engine_info,
>>        query_perf_config,
>> +     query_cs_cycles,
>>   };
>>
>>   int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>> index 6a34243a7646..08b00f1709b5 100644
>> --- a/include/uapi/drm/i915_drm.h
>> +++ b/include/uapi/drm/i915_drm.h
>> @@ -2230,6 +2230,10 @@ struct drm_i915_query_item {
>>   #define DRM_I915_QUERY_TOPOLOGY_INFO    1
>>   #define DRM_I915_QUERY_ENGINE_INFO   2
>>   #define DRM_I915_QUERY_PERF_CONFIG      3
>> +     /**
>> +      * Query Command Streamer timestamp register.
>> +      */
>> +#define DRM_I915_QUERY_CS_CYCLES     4
>>   /* Must be kept compact -- no holes and well documented */
>>
>>        /**
>> @@ -2397,6 +2401,50 @@ struct drm_i915_engine_info {
>>        __u64 rsvd1[4];
>>   };
>>
>> +/**
>> + * struct drm_i915_query_cs_cycles
>> + *
>> + * The query returns the command streamer cycles and the frequency that can be
>> + * used to calculate the command streamer timestamp. In addition the query
>> + * returns a set of cpu timestamps that indicate when the command streamer cycle
>> + * count was captured.
>> + */
>> +struct drm_i915_query_cs_cycles {
>> +     /** Engine for which command streamer cycles is queried. */
>> +     struct i915_engine_class_instance engine;
>>
>> Why is this per-engine?  Do we actually expect it to change between
>> engines?
>>
>>
>> Each engine has its own timestamp register.
>>
>>
>>    If so, we may have a problem because Vulkan expects a
>> unified timestamp domain for all command streamer timestamp queries.
>>
>>
>> I don't think it does : "
>>
>> Timestamps may only be meaningfully compared if they are written by commands submitted to the same queue.
> Yes but vkGetCalibratedTimestampsEXT() doesn't take a queue or even a
> queue family.


I know, I brought up the issue recently. See khronos issue 2551.

You might not like the resolution... I did propose to do a rev2 of the 
extension to let the user specify the queue.

We can still do that in the future.


>    Also, VkPhysicalDeviceLimits::timestampPeriod gives a
> single timestampPeriod for all queues.


That is fine for us, we should have the same period on all command 
streamers.


-Lionel


>    It's possible that Vulkan
> messed up real bad there but I thought we did a HW survey at the time
> and determined that it was ok.
>
> --Jason
>
>
>> " [1]
>>
>>
>> [1] : https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/vkCmdWriteTimestamp.html
>>
>>
>> -Lionel
>>
>>
>>
>> --Jason
>>
>>
>> +     /** Must be zero. */
>> +     __u32 flags;
>> +
>> +     /**
>> +      * Command streamer cycles as read from the command streamer
>> +      * register at 0x358 offset.
>> +      */
>> +     __u64 cs_cycles;
>> +
>> +     /** Frequency of the cs cycles in Hz. */
>> +     __u64 cs_frequency;
>> +
>> +     /**
>> +      * CPU timestamps in ns. cpu_timestamp[0] is captured before reading the
>> +      * cs_cycles register using the reference clockid set by the user.
>> +      * cpu_timestamp[1] is the time taken in ns to read the lower dword of
>> +      * the cs_cycles register.
>> +      */
>> +     __u64 cpu_timestamp[2];
>> +
>> +     /**
>> +      * Reference clock id for CPU timestamp. For definition, see
>> +      * clock_gettime(2) and perf_event_open(2). Supported clock ids are
>> +      * CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW, CLOCK_REALTIME, CLOCK_BOOTTIME,
>> +      * CLOCK_TAI.
>> +      */
>> +     __s32 clockid;
>> +
>> +     /** Must be zero. */
>> +     __u32 rsvd;
>> +};
>> +
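[Editor's note: for userspace consumers of this uapi, the `cs_cycles`/`cs_frequency` pair converts to a nanosecond timestamp, and the two CPU timestamps bound the register read. A minimal sketch of that arithmetic follows; the helper names are illustrative only and not part of the proposed uapi, and the midpoint heuristic is an assumption, not something the patch mandates.]

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert command streamer cycles to nanoseconds. Splitting into whole
 * seconds and a remainder avoids overflowing 64 bits when scaling by 1e9.
 */
uint64_t cs_cycles_to_ns(uint64_t cycles, uint64_t freq_hz)
{
	uint64_t secs = cycles / freq_hz;
	uint64_t rem = cycles % freq_hz;

	return secs * 1000000000ull + rem * 1000000000ull / freq_hz;
}

/*
 * Estimate the CPU time at which the GPU counter was latched:
 * cpu_timestamp[0] is captured just before the lower-dword read and
 * cpu_timestamp[1] is the measured duration of that read, so the
 * midpoint of the read window is a reasonable correlation point.
 */
uint64_t correlated_cpu_ns(const uint64_t cpu_timestamp[2])
{
	return cpu_timestamp[0] + cpu_timestamp[1] / 2;
}
```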
>>   /**
>>    * struct drm_i915_query_engine_info
>>    *
>>
>> --
>> Jani Nikula, Intel Open Source Graphics Center
>>
>>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy
  2021-04-28 20:14             ` [Intel-gfx] " Lionel Landwerlin
@ 2021-04-28 20:16               ` Lionel Landwerlin
  -1 siblings, 0 replies; 26+ messages in thread
From: Lionel Landwerlin @ 2021-04-28 20:16 UTC (permalink / raw)
  To: Jason Ekstrand
  Cc: Intel GFX, Umesh Nerlige Ramappa, Chris Wilson,
	Mailing list - DRI developers

On 28/04/2021 23:14, Lionel Landwerlin wrote:
> On 28/04/2021 22:54, Jason Ekstrand wrote:
>> On Wed, Apr 28, 2021 at 2:50 PM Lionel Landwerlin
>> <lionel.g.landwerlin@intel.com> wrote:
>>> On 28/04/2021 22:24, Jason Ekstrand wrote:
>>>
>>> On Wed, Apr 28, 2021 at 3:43 AM Jani Nikula 
>>> <jani.nikula@linux.intel.com> wrote:
>>>
>>> On Tue, 27 Apr 2021, Umesh Nerlige Ramappa 
>>> <umesh.nerlige.ramappa@intel.com> wrote:
>>>
>>> Perf measurements rely on CPU and engine timestamps to correlate
>>> events of interest across these time domains. Current mechanisms get
>>> these timestamps separately and the calculated delta between these
>>> timestamps lacks enough accuracy.
>>>
>>> To improve the accuracy of these time measurements to within a few us,
>>> add a query that returns the engine and cpu timestamps captured as
>>> close to each other as possible.
>>>
>>> Cc: dri-devel, Jason and Daniel for review.
>>>
>>> Thanks!
>>>
>>> v2: (Tvrtko)
>>> - document clock reference used
>>> - return cpu timestamp always
>>> - capture cpu time just before lower dword of cs timestamp
>>>
>>> v3: (Chris)
>>> - use uncore-rpm
>>> - use __query_cs_timestamp helper
>>>
>>> v4: (Lionel)
>>> - Kernel perf subsystem allows users to specify the clock id to be used
>>>    in perf_event_open. This clock id is used by the perf subsystem to
>>>    return the appropriate cpu timestamp in perf events. Similarly, let
>>>    the user pass the clockid to this query so that cpu timestamp
>>>    corresponds to the clock id requested.
>>>
>>> v5: (Tvrtko)
>>> - Use normal ktime accessors instead of fast versions
>>> - Add more uApi documentation
>>>
>>> v6: (Lionel)
>>> - Move switch out of spinlock
>>>
>>> v7: (Chris)
>>> - cs_timestamp is a misnomer, use cs_cycles instead
>>> - return the cs cycle frequency as well in the query
>>>
>>> v8:
>>> - Add platform and engine specific checks
>>>
>>> v9: (Lionel)
>>> - Return 2 cpu timestamps in the query - captured before and after the
>>>    register read
>>>
>>> v10: (Chris)
>>> - Use local_clock() to measure time taken to read lower dword of
>>>    register and return it to user.
>>>
>>> v11: (Jani)
>>> - IS_GEN deprecated. Use GRAPHICS_VER instead.
>>>
>>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_query.c | 145 
>>> ++++++++++++++++++++++++++++++
>>>   include/uapi/drm/i915_drm.h       |  48 ++++++++++
>>>   2 files changed, 193 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_query.c 
>>> b/drivers/gpu/drm/i915/i915_query.c
>>> index fed337ad7b68..2594b93901ac 100644
>>> --- a/drivers/gpu/drm/i915/i915_query.c
>>> +++ b/drivers/gpu/drm/i915/i915_query.c
>>> @@ -6,6 +6,8 @@
>>>
>>>   #include <linux/nospec.h>
>>>
>>> +#include "gt/intel_engine_pm.h"
>>> +#include "gt/intel_engine_user.h"
>>>   #include "i915_drv.h"
>>>   #include "i915_perf.h"
>>>   #include "i915_query.h"
>>> @@ -90,6 +92,148 @@ static int query_topology_info(struct 
>>> drm_i915_private *dev_priv,
>>>        return total_length;
>>>   }
>>>
>>> +typedef u64 (*__ktime_func_t)(void);
>>> +static __ktime_func_t __clock_id_to_func(clockid_t clk_id)
>>> +{
>>> +     /*
>>> +      * Use logic same as the perf subsystem to allow user to 
>>> select the
>>> +      * reference clock id to be used for timestamps.
>>> +      */
>>> +     switch (clk_id) {
>>> +     case CLOCK_MONOTONIC:
>>> +             return &ktime_get_ns;
>>> +     case CLOCK_MONOTONIC_RAW:
>>> +             return &ktime_get_raw_ns;
>>> +     case CLOCK_REALTIME:
>>> +             return &ktime_get_real_ns;
>>> +     case CLOCK_BOOTTIME:
>>> +             return &ktime_get_boottime_ns;
>>> +     case CLOCK_TAI:
>>> +             return &ktime_get_clocktai_ns;
>>> +     default:
>>> +             return NULL;
>>> +     }
>>> +}
>>> +
>>> +static inline int
>>> +__read_timestamps(struct intel_uncore *uncore,
>>> +               i915_reg_t lower_reg,
>>> +               i915_reg_t upper_reg,
>>> +               u64 *cs_ts,
>>> +               u64 *cpu_ts,
>>> +               __ktime_func_t cpu_clock)
>>> +{
>>> +     u32 upper, lower, old_upper, loop = 0;
>>> +
>>> +     upper = intel_uncore_read_fw(uncore, upper_reg);
>>> +     do {
>>> +             cpu_ts[1] = local_clock();
>>> +             cpu_ts[0] = cpu_clock();
>>> +             lower = intel_uncore_read_fw(uncore, lower_reg);
>>> +             cpu_ts[1] = local_clock() - cpu_ts[1];
>>> +             old_upper = upper;
>>> +             upper = intel_uncore_read_fw(uncore, upper_reg);
>>> +     } while (upper != old_upper && loop++ < 2);
>>> +
>>> +     *cs_ts = (u64)upper << 32 | lower;
>>> +
>>> +     return 0;
>>> +}
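[Editor's note: for readers unfamiliar with the pattern above, a 64-bit counter exposed as two 32-bit registers must be read upper/lower/upper, retrying when a carry into the upper dword happens mid-read. A standalone sketch of the same loop against a mocked register pair — all names here are hypothetical, not kernel APIs:]

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mocked 64-bit counter exposed as two 32-bit registers. Every register
 * read advances the counter by a large step, modelling time passing
 * between MMIO accesses and making dword rollover easy to provoke.
 */
uint64_t mock_counter;
static const uint64_t mock_tick = 1ull << 30;

static uint32_t mock_read_reg(int upper)
{
	uint32_t v = upper ? (uint32_t)(mock_counter >> 32)
			   : (uint32_t)mock_counter;

	mock_counter += mock_tick;
	return v;
}

/*
 * Same structure as __read_timestamps(): sample the upper dword, read
 * the lower dword, then re-sample the upper dword; retry a bounded
 * number of times if the upper dword changed in between.
 */
uint64_t mock_read_counter64(void)
{
	uint32_t upper, lower, old_upper;
	uint32_t loop = 0;

	upper = mock_read_reg(1);
	do {
		lower = mock_read_reg(0);
		old_upper = upper;
		upper = mock_read_reg(1);
	} while (upper != old_upper && loop++ < 2);

	return (uint64_t)upper << 32 | lower;
}
```

With the mock starting just below a 2^32 boundary, the first pass observes a carry and retries, returning a value consistent with the counter at the time of the successful lower-dword read.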
>>> +
>>> +static int
>>> +__query_cs_cycles(struct intel_engine_cs *engine,
>>> +               u64 *cs_ts, u64 *cpu_ts,
>>> +               __ktime_func_t cpu_clock)
>>> +{
>>> +     struct intel_uncore *uncore = engine->uncore;
>>> +     enum forcewake_domains fw_domains;
>>> +     u32 base = engine->mmio_base;
>>> +     intel_wakeref_t wakeref;
>>> +     int ret;
>>> +
>>> +     fw_domains = intel_uncore_forcewake_for_reg(uncore,
>>> +                                                 RING_TIMESTAMP(base),
>>> +                                                 FW_REG_READ);
>>> +
>>> +     with_intel_runtime_pm(uncore->rpm, wakeref) {
>>> +             spin_lock_irq(&uncore->lock);
>>> +             intel_uncore_forcewake_get__locked(uncore, fw_domains);
>>> +
>>> +             ret = __read_timestamps(uncore,
>>> +                                     RING_TIMESTAMP(base),
>>> +                                     RING_TIMESTAMP_UDW(base),
>>> +                                     cs_ts,
>>> +                                     cpu_ts,
>>> +                                     cpu_clock);
>>> +
>>> +             intel_uncore_forcewake_put__locked(uncore, fw_domains);
>>> +             spin_unlock_irq(&uncore->lock);
>>> +     }
>>> +
>>> +     return ret;
>>> +}
>>> +
>>> +static int
>>> +query_cs_cycles(struct drm_i915_private *i915,
>>> +             struct drm_i915_query_item *query_item)
>>> +{
>>> +     struct drm_i915_query_cs_cycles __user *query_ptr;
>>> +     struct drm_i915_query_cs_cycles query;
>>> +     struct intel_engine_cs *engine;
>>> +     __ktime_func_t cpu_clock;
>>> +     int ret;
>>> +
>>> +     if (GRAPHICS_VER(i915) < 6)
>>> +             return -ENODEV;
>>> +
>>> +     query_ptr = u64_to_user_ptr(query_item->data_ptr);
>>> +     ret = copy_query_item(&query, sizeof(query), sizeof(query), 
>>> query_item);
>>> +     if (ret != 0)
>>> +             return ret;
>>> +
>>> +     if (query.flags)
>>> +             return -EINVAL;
>>> +
>>> +     if (query.rsvd)
>>> +             return -EINVAL;
>>> +
>>> +     cpu_clock = __clock_id_to_func(query.clockid);
>>> +     if (!cpu_clock)
>>> +             return -EINVAL;
>>> +
>>> +     engine = intel_engine_lookup_user(i915,
>>> +                                       query.engine.engine_class,
>>> +                                       query.engine.engine_instance);
>>> +     if (!engine)
>>> +             return -EINVAL;
>>> +
>>> +     if (GRAPHICS_VER(i915) == 6 &&
>>> +         query.engine.engine_class != I915_ENGINE_CLASS_RENDER)
>>> +             return -ENODEV;
>>> +
>>> +     query.cs_frequency = engine->gt->clock_frequency;
>>> +     ret = __query_cs_cycles(engine,
>>> +                             &query.cs_cycles,
>>> +                             query.cpu_timestamp,
>>> +                             cpu_clock);
>>> +     if (ret)
>>> +             return ret;
>>> +
>>> +     if (put_user(query.cs_frequency, &query_ptr->cs_frequency))
>>> +             return -EFAULT;
>>> +
>>> +     if (put_user(query.cpu_timestamp[0], 
>>> &query_ptr->cpu_timestamp[0]))
>>> +             return -EFAULT;
>>> +
>>> +     if (put_user(query.cpu_timestamp[1], 
>>> &query_ptr->cpu_timestamp[1]))
>>> +             return -EFAULT;
>>> +
>>> +     if (put_user(query.cs_cycles, &query_ptr->cs_cycles))
>>> +             return -EFAULT;
>>> +
>>> +     return sizeof(query);
>>> +}
>>> +
>>>   static int
>>>   query_engine_info(struct drm_i915_private *i915,
>>>                  struct drm_i915_query_item *query_item)
>>> @@ -424,6 +568,7 @@ static int (* const i915_query_funcs[])(struct 
>>> drm_i915_private *dev_priv,
>>>        query_topology_info,
>>>        query_engine_info,
>>>        query_perf_config,
>>> +     query_cs_cycles,
>>>   };
>>>
>>>   int i915_query_ioctl(struct drm_device *dev, void *data, struct 
>>> drm_file *file)
>>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>>> index 6a34243a7646..08b00f1709b5 100644
>>> --- a/include/uapi/drm/i915_drm.h
>>> +++ b/include/uapi/drm/i915_drm.h
>>> @@ -2230,6 +2230,10 @@ struct drm_i915_query_item {
>>>   #define DRM_I915_QUERY_TOPOLOGY_INFO    1
>>>   #define DRM_I915_QUERY_ENGINE_INFO   2
>>>   #define DRM_I915_QUERY_PERF_CONFIG      3
>>> +     /**
>>> +      * Query Command Streamer timestamp register.
>>> +      */
>>> +#define DRM_I915_QUERY_CS_CYCLES     4
>>>   /* Must be kept compact -- no holes and well documented */
>>>
>>>        /**
>>> @@ -2397,6 +2401,50 @@ struct drm_i915_engine_info {
>>>        __u64 rsvd1[4];
>>>   };
>>>
>>> +/**
>>> + * struct drm_i915_query_cs_cycles
>>> + *
>>> + * The query returns the command streamer cycles and the frequency 
>>> that can be
>>> + * used to calculate the command streamer timestamp. In addition 
>>> the query
>>> + * returns a set of cpu timestamps that indicate when the command 
>>> streamer cycle
>>> + * count was captured.
>>> + */
>>> +struct drm_i915_query_cs_cycles {
>>> +     /** Engine for which command streamer cycles are queried. */
>>> +     struct i915_engine_class_instance engine;
>>>
>>> Why is this per-engine?  Do we actually expect it to change between
>>> engines?
>>>
>>>
>>> Each engine has its own timestamp register.
>>>
>>>
>>>    If so, we may have a problem because Vulkan expects a
>>> unified timestamp domain for all command streamer timestamp queries.
>>>
>>>
>>> I don't think it does : "
>>>
>>> Timestamps may only be meaningfully compared if they are written by 
>>> commands submitted to the same queue.
>> Yes but vkGetCalibratedTimestampsEXT() doesn't take a queue or even a
>> queue family.
>
>
> I know, I brought up the issue recently. See khronos issue 2551.
>
> You might not like the resolution... I did propose to do a rev2 of the 
> extension to let the user specify the queue.
>
> We can still do that in the future.
>
>
>>    Also, VkPhysicalDeviceLimits::timestampPeriod gives a
>> single timestampPeriod for all queues.
>
>
> That is fine for us, we should have the same period on all command 
> streamers.
>
>
> -Lionel


Here is the Mesa MR using this extension btw : 
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9407


-Lionel


>
>
>>    It's possible that Vulkan
>> messed up real bad there but I thought we did a HW survey at the time
>> and determined that it was ok.
>>
>> --Jason
>>
>>
>>> " [1]
>>>
>>>
>>> [1] : 
>>> https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/vkCmdWriteTimestamp.html
>>>
>>>
>>> -Lionel
>>>
>>>
>>>
>>> --Jason
>>>
>>>
>>> +     /** Must be zero. */
>>> +     __u32 flags;
>>> +
>>> +     /**
>>> +      * Command streamer cycles as read from the command streamer
>>> +      * register at 0x358 offset.
>>> +      */
>>> +     __u64 cs_cycles;
>>> +
>>> +     /** Frequency of the cs cycles in Hz. */
>>> +     __u64 cs_frequency;
>>> +
>>> +     /**
>>> +      * CPU timestamps in ns. cpu_timestamp[0] is captured before 
>>> reading the
>>> +      * cs_cycles register using the reference clockid set by the 
>>> user.
>>> +      * cpu_timestamp[1] is the time taken in ns to read the lower 
>>> dword of
>>> +      * the cs_cycles register.
>>> +      */
>>> +     __u64 cpu_timestamp[2];
>>> +
>>> +     /**
>>> +      * Reference clock id for CPU timestamp. For definition, see
>>> +      * clock_gettime(2) and perf_event_open(2). Supported clock 
>>> ids are
>>> +      * CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW, CLOCK_REALTIME, 
>>> CLOCK_BOOTTIME,
>>> +      * CLOCK_TAI.
>>> +      */
>>> +     __s32 clockid;
>>> +
>>> +     /** Must be zero. */
>>> +     __u32 rsvd;
>>> +};
>>> +
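[Editor's note: since this query struct is copied across the user/kernel boundary, it should have no implicit padding. The intended layout can be sanity-checked with a local mirror of the fields quoted above — the mirror types below are for illustration only, not the installed uapi header:]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of the proposed uapi structs, for layout checking only. */
struct mirror_engine_class_instance {
	uint16_t engine_class;
	uint16_t engine_instance;
};

struct mirror_query_cs_cycles {
	struct mirror_engine_class_instance engine;	/* offset 0  */
	uint32_t flags;					/* offset 4  */
	uint64_t cs_cycles;				/* offset 8  */
	uint64_t cs_frequency;				/* offset 16 */
	uint64_t cpu_timestamp[2];			/* offset 24 */
	int32_t clockid;				/* offset 40 */
	uint32_t rsvd;					/* offset 44 */
};
```

The u64 members start at an 8-byte-aligned offset and the trailing clockid/rsvd pair pads the struct back to a multiple of 8, so the layout is hole-free at 48 bytes.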
>>>   /**
>>>    * struct drm_i915_query_engine_info
>>>    *
>>>
>>> -- 
>>> Jani Nikula, Intel Open Source Graphics Center
>>>
>>>
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel



^ permalink raw reply	[flat|nested] 26+ messages in thread


* Re: [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy
  2021-04-28 20:14             ` [Intel-gfx] " Lionel Landwerlin
@ 2021-04-28 20:45               ` Jason Ekstrand
  -1 siblings, 0 replies; 26+ messages in thread
From: Jason Ekstrand @ 2021-04-28 20:45 UTC (permalink / raw)
  To: Lionel Landwerlin
  Cc: Intel GFX, Mailing list - DRI developers, Chris Wilson,
	Umesh Nerlige Ramappa

On Wed, Apr 28, 2021 at 3:14 PM Lionel Landwerlin
<lionel.g.landwerlin@intel.com> wrote:
>
> On 28/04/2021 22:54, Jason Ekstrand wrote:
> > On Wed, Apr 28, 2021 at 2:50 PM Lionel Landwerlin
> > <lionel.g.landwerlin@intel.com> wrote:
> >> On 28/04/2021 22:24, Jason Ekstrand wrote:
> >>
> >> On Wed, Apr 28, 2021 at 3:43 AM Jani Nikula <jani.nikula@linux.intel.com> wrote:
> >>
> >> On Tue, 27 Apr 2021, Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> wrote:
> >>
> >> Perf measurements rely on CPU and engine timestamps to correlate
> >> events of interest across these time domains. Current mechanisms get
> >> these timestamps separately and the calculated delta between these
> >> timestamps lacks enough accuracy.
> >>
> >> To improve the accuracy of these time measurements to within a few us,
> >> add a query that returns the engine and cpu timestamps captured as
> >> close to each other as possible.
> >>
> >> Cc: dri-devel, Jason and Daniel for review.
> >>
> >> Thanks!
> >>
> >> v2: (Tvrtko)
> >> - document clock reference used
> >> - return cpu timestamp always
> >> - capture cpu time just before lower dword of cs timestamp
> >>
> >> v3: (Chris)
> >> - use uncore-rpm
> >> - use __query_cs_timestamp helper
> >>
> >> v4: (Lionel)
> >> - Kernel perf subsystem allows users to specify the clock id to be used
> >>    in perf_event_open. This clock id is used by the perf subsystem to
> >>    return the appropriate cpu timestamp in perf events. Similarly, let
> >>    the user pass the clockid to this query so that cpu timestamp
> >>    corresponds to the clock id requested.
> >>
> >> v5: (Tvrtko)
> >> - Use normal ktime accessors instead of fast versions
> >> - Add more uApi documentation
> >>
> >> v6: (Lionel)
> >> - Move switch out of spinlock
> >>
> >> v7: (Chris)
> >> - cs_timestamp is a misnomer, use cs_cycles instead
> >> - return the cs cycle frequency as well in the query
> >>
> >> v8:
> >> - Add platform and engine specific checks
> >>
> >> v9: (Lionel)
> >> - Return 2 cpu timestamps in the query - captured before and after the
> >>    register read
> >>
> >> v10: (Chris)
> >> - Use local_clock() to measure time taken to read lower dword of
> >>    register and return it to user.
> >>
> >> v11: (Jani)
> >> - IS_GEN deprecated. Use GRAPHICS_VER instead.
> >>
> >> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> >> ---
> >>   drivers/gpu/drm/i915/i915_query.c | 145 ++++++++++++++++++++++++++++++
> >>   include/uapi/drm/i915_drm.h       |  48 ++++++++++
> >>   2 files changed, 193 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
> >> index fed337ad7b68..2594b93901ac 100644
> >> --- a/drivers/gpu/drm/i915/i915_query.c
> >> +++ b/drivers/gpu/drm/i915/i915_query.c
> >> @@ -6,6 +6,8 @@
> >>
> >>   #include <linux/nospec.h>
> >>
> >> +#include "gt/intel_engine_pm.h"
> >> +#include "gt/intel_engine_user.h"
> >>   #include "i915_drv.h"
> >>   #include "i915_perf.h"
> >>   #include "i915_query.h"
> >> @@ -90,6 +92,148 @@ static int query_topology_info(struct drm_i915_private *dev_priv,
> >>        return total_length;
> >>   }
> >>
> >> +typedef u64 (*__ktime_func_t)(void);
> >> +static __ktime_func_t __clock_id_to_func(clockid_t clk_id)
> >> +{
> >> +     /*
> >> +      * Use logic same as the perf subsystem to allow user to select the
> >> +      * reference clock id to be used for timestamps.
> >> +      */
> >> +     switch (clk_id) {
> >> +     case CLOCK_MONOTONIC:
> >> +             return &ktime_get_ns;
> >> +     case CLOCK_MONOTONIC_RAW:
> >> +             return &ktime_get_raw_ns;
> >> +     case CLOCK_REALTIME:
> >> +             return &ktime_get_real_ns;
> >> +     case CLOCK_BOOTTIME:
> >> +             return &ktime_get_boottime_ns;
> >> +     case CLOCK_TAI:
> >> +             return &ktime_get_clocktai_ns;
> >> +     default:
> >> +             return NULL;
> >> +     }
> >> +}
> >> +
> >> +static inline int
> >> +__read_timestamps(struct intel_uncore *uncore,
> >> +               i915_reg_t lower_reg,
> >> +               i915_reg_t upper_reg,
> >> +               u64 *cs_ts,
> >> +               u64 *cpu_ts,
> >> +               __ktime_func_t cpu_clock)
> >> +{
> >> +     u32 upper, lower, old_upper, loop = 0;
> >> +
> >> +     upper = intel_uncore_read_fw(uncore, upper_reg);
> >> +     do {
> >> +             cpu_ts[1] = local_clock();
> >> +             cpu_ts[0] = cpu_clock();
> >> +             lower = intel_uncore_read_fw(uncore, lower_reg);
> >> +             cpu_ts[1] = local_clock() - cpu_ts[1];
> >> +             old_upper = upper;
> >> +             upper = intel_uncore_read_fw(uncore, upper_reg);
> >> +     } while (upper != old_upper && loop++ < 2);
> >> +
> >> +     *cs_ts = (u64)upper << 32 | lower;
> >> +
> >> +     return 0;
> >> +}
> >> +
> >> +static int
> >> +__query_cs_cycles(struct intel_engine_cs *engine,
> >> +               u64 *cs_ts, u64 *cpu_ts,
> >> +               __ktime_func_t cpu_clock)
> >> +{
> >> +     struct intel_uncore *uncore = engine->uncore;
> >> +     enum forcewake_domains fw_domains;
> >> +     u32 base = engine->mmio_base;
> >> +     intel_wakeref_t wakeref;
> >> +     int ret;
> >> +
> >> +     fw_domains = intel_uncore_forcewake_for_reg(uncore,
> >> +                                                 RING_TIMESTAMP(base),
> >> +                                                 FW_REG_READ);
> >> +
> >> +     with_intel_runtime_pm(uncore->rpm, wakeref) {
> >> +             spin_lock_irq(&uncore->lock);
> >> +             intel_uncore_forcewake_get__locked(uncore, fw_domains);
> >> +
> >> +             ret = __read_timestamps(uncore,
> >> +                                     RING_TIMESTAMP(base),
> >> +                                     RING_TIMESTAMP_UDW(base),
> >> +                                     cs_ts,
> >> +                                     cpu_ts,
> >> +                                     cpu_clock);
> >> +
> >> +             intel_uncore_forcewake_put__locked(uncore, fw_domains);
> >> +             spin_unlock_irq(&uncore->lock);
> >> +     }
> >> +
> >> +     return ret;
> >> +}
> >> +
> >> +static int
> >> +query_cs_cycles(struct drm_i915_private *i915,
> >> +             struct drm_i915_query_item *query_item)
> >> +{
> >> +     struct drm_i915_query_cs_cycles __user *query_ptr;
> >> +     struct drm_i915_query_cs_cycles query;
> >> +     struct intel_engine_cs *engine;
> >> +     __ktime_func_t cpu_clock;
> >> +     int ret;
> >> +
> >> +     if (GRAPHICS_VER(i915) < 6)
> >> +             return -ENODEV;
> >> +
> >> +     query_ptr = u64_to_user_ptr(query_item->data_ptr);
> >> +     ret = copy_query_item(&query, sizeof(query), sizeof(query), query_item);
> >> +     if (ret != 0)
> >> +             return ret;
> >> +
> >> +     if (query.flags)
> >> +             return -EINVAL;
> >> +
> >> +     if (query.rsvd)
> >> +             return -EINVAL;
> >> +
> >> +     cpu_clock = __clock_id_to_func(query.clockid);
> >> +     if (!cpu_clock)
> >> +             return -EINVAL;
> >> +
> >> +     engine = intel_engine_lookup_user(i915,
> >> +                                       query.engine.engine_class,
> >> +                                       query.engine.engine_instance);
> >> +     if (!engine)
> >> +             return -EINVAL;
> >> +
> >> +     if (GRAPHICS_VER(i915) == 6 &&
> >> +         query.engine.engine_class != I915_ENGINE_CLASS_RENDER)
> >> +             return -ENODEV;
> >> +
> >> +     query.cs_frequency = engine->gt->clock_frequency;
> >> +     ret = __query_cs_cycles(engine,
> >> +                             &query.cs_cycles,
> >> +                             query.cpu_timestamp,
> >> +                             cpu_clock);
> >> +     if (ret)
> >> +             return ret;
> >> +
> >> +     if (put_user(query.cs_frequency, &query_ptr->cs_frequency))
> >> +             return -EFAULT;
> >> +
> >> +     if (put_user(query.cpu_timestamp[0], &query_ptr->cpu_timestamp[0]))
> >> +             return -EFAULT;
> >> +
> >> +     if (put_user(query.cpu_timestamp[1], &query_ptr->cpu_timestamp[1]))
> >> +             return -EFAULT;
> >> +
> >> +     if (put_user(query.cs_cycles, &query_ptr->cs_cycles))
> >> +             return -EFAULT;
> >> +
> >> +     return sizeof(query);
> >> +}
> >> +
> >>   static int
> >>   query_engine_info(struct drm_i915_private *i915,
> >>                  struct drm_i915_query_item *query_item)
> >> @@ -424,6 +568,7 @@ static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
> >>        query_topology_info,
> >>        query_engine_info,
> >>        query_perf_config,
> >> +     query_cs_cycles,
> >>   };
> >>
> >>   int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> >> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> >> index 6a34243a7646..08b00f1709b5 100644
> >> --- a/include/uapi/drm/i915_drm.h
> >> +++ b/include/uapi/drm/i915_drm.h
> >> @@ -2230,6 +2230,10 @@ struct drm_i915_query_item {
> >>   #define DRM_I915_QUERY_TOPOLOGY_INFO    1
> >>   #define DRM_I915_QUERY_ENGINE_INFO   2
> >>   #define DRM_I915_QUERY_PERF_CONFIG      3
> >> +     /**
> >> +      * Query Command Streamer timestamp register.
> >> +      */
> >> +#define DRM_I915_QUERY_CS_CYCLES     4
> >>   /* Must be kept compact -- no holes and well documented */
> >>
> >>        /**
> >> @@ -2397,6 +2401,50 @@ struct drm_i915_engine_info {
> >>        __u64 rsvd1[4];
> >>   };
> >>
> >> +/**
> >> + * struct drm_i915_query_cs_cycles
> >> + *
> >> + * The query returns the command streamer cycles and the frequency that can be
> >> + * used to calculate the command streamer timestamp. In addition the query
> >> + * returns a set of cpu timestamps that indicate when the command streamer cycle
> >> + * count was captured.
> >> + */
> >> +struct drm_i915_query_cs_cycles {
> >> +     /** Engine for which command streamer cycles is queried. */
> >> +     struct i915_engine_class_instance engine;
> >>
> >> Why is this per-engine?  Do we actually expect it to change between
> >> engines?
> >>
> >>
> >> Each engine has its own timestamp register.
> >>
> >>
> >>    If so, we may have a problem because Vulkan expects a
> >> unified timestamp domain for all command streamer timestamp queries.
> >>
> >>
> >> I don't think it does : "
> >>
> >> Timestamps may only be meaningfully compared if they are written by commands submitted to the same queue.
> > Yes but vkGetCalibratedTimestampsEXT() doesn't take a queue or even a
> > queue family.
>
>
> I know, I brought up the issue recently. See khronos issue 2551.

I guess this is what I get for not attending the Vulkan SI call
anymore.  Small price to pay....

So the answer is that we just stop exposing the DEVICE time domain as
soon as we start using anything other than RENDER?  Seems a bit rough
but should be doable.

> You might not like the resolution... I did propose to do a rev2 of the
> extension to let the user specify the queue.
>
> We can still do that in the future.

Yeah, I think we'll want to do something if we care about this
extension.  One option would be to make it take a queue family.
Another would be to expose it as one domain per queue family.
Anyway... that's a discussion for another forum.

> >    Also, VkPhysicalDeviceLimits::timestampPeriod gives a
> > single timestampPeriod for all queues.
>
>
> That is fine for us, we should have the same period on all command
> streamers.

I guess I've got no problem returning the period as part of this
query.  ANV should probably assert that it's what it expects, though.

> -Lionel
>
>
> >    It's possible that Vulkan
> > messed up real bad there but I thought we did a HW survey at the time
> > and determined that it was ok.
> >
> > --Jason
> >
> >
> >> " [1]
> >>
> >>
> >> [1] : https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/vkCmdWriteTimestamp.html
> >>
> >>
> >> -Lionel
> >>
> >>
> >>
> >> --Jason
> >>
> >>
> >> +     /** Must be zero. */
> >> +     __u32 flags;
> >> +
> >> +     /**
> >> +      * Command streamer cycles as read from the command streamer
> >> +      * register at 0x358 offset.
> >> +      */
> >> +     __u64 cs_cycles;
> >> +
> >> +     /** Frequency of the cs cycles in Hz. */
> >> +     __u64 cs_frequency;
> >> +
> >> +     /**
> >> +      * CPU timestamps in ns. cpu_timestamp[0] is captured before reading the
> >> +      * cs_cycles register using the reference clockid set by the user.
> >> +      * cpu_timestamp[1] is the time taken in ns to read the lower dword of
> >> +      * the cs_cycles register.
> >> +      */
> >> +     __u64 cpu_timestamp[2];

I think the API would be more clear if we had separate cpu_timestamp
and cpu_delta fields or something like that.  That or make
cpu_timestamp[1] the end time rather than a delta.  It's weird to have
an array where the first entry is absolute and the second entry is a
delta.

--Jason


> >> +
> >> +     /**
> >> +      * Reference clock id for CPU timestamp. For definition, see
> >> +      * clock_gettime(2) and perf_event_open(2). Supported clock ids are
> >> +      * CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW, CLOCK_REALTIME, CLOCK_BOOTTIME,
> >> +      * CLOCK_TAI.
> >> +      */
> >> +     __s32 clockid;
> >> +
> >> +     /** Must be zero. */
> >> +     __u32 rsvd;
> >> +};
> >> +
> >>   /**
> >>    * struct drm_i915_query_engine_info
> >>    *
> >>
> >> --
> >> Jani Nikula, Intel Open Source Graphics Center
> >>
> >>
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


* Re: [Intel-gfx] [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy
@ 2021-04-28 20:45               ` Jason Ekstrand
  0 siblings, 0 replies; 26+ messages in thread
From: Jason Ekstrand @ 2021-04-28 20:45 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: Intel GFX, Maling list - DRI developers, Chris Wilson

On Wed, Apr 28, 2021 at 3:14 PM Lionel Landwerlin
<lionel.g.landwerlin@intel.com> wrote:
>
> On 28/04/2021 22:54, Jason Ekstrand wrote:
> > On Wed, Apr 28, 2021 at 2:50 PM Lionel Landwerlin
> > <lionel.g.landwerlin@intel.com> wrote:
> >> On 28/04/2021 22:24, Jason Ekstrand wrote:
> >>
> >> On Wed, Apr 28, 2021 at 3:43 AM Jani Nikula <jani.nikula@linux.intel.com> wrote:
> >>
> >> On Tue, 27 Apr 2021, Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> wrote:
> >>
> >> Perf measurements rely on CPU and engine timestamps to correlate
> >> events of interest across these time domains. Current mechanisms get
> >> these timestamps separately and the calculated delta between these
> >> timestamps lack enough accuracy.
> >>
> >> To improve the accuracy of these time measurements to within a few us,
> >> add a query that returns the engine and cpu timestamps captured as
> >> close to each other as possible.
> >>
> >> Cc: dri-devel, Jason and Daniel for review.
> >>
> >> Thanks!
> >>
> >> v2: (Tvrtko)
> >> - document clock reference used
> >> - return cpu timestamp always
> >> - capture cpu time just before lower dword of cs timestamp
> >>
> >> v3: (Chris)
> >> - use uncore-rpm
> >> - use __query_cs_timestamp helper
> >>
> >> v4: (Lionel)
> >> - Kernel perf subsystem allows users to specify the clock id to be used
> >>    in perf_event_open. This clock id is used by the perf subsystem to
> >>    return the appropriate cpu timestamp in perf events. Similarly, let
> >>    the user pass the clockid to this query so that cpu timestamp
> >>    corresponds to the clock id requested.
> >>
> >> v5: (Tvrtko)
> >> - Use normal ktime accessors instead of fast versions
> >> - Add more uApi documentation
> >>
> >> v6: (Lionel)
> >> - Move switch out of spinlock
> >>
> >> v7: (Chris)
> >> - cs_timestamp is a misnomer, use cs_cycles instead
> >> - return the cs cycle frequency as well in the query
> >>
> >> v8:
> >> - Add platform and engine specific checks
> >>
> >> v9: (Lionel)
> >> - Return 2 cpu timestamps in the query - captured before and after the
> >>    register read
> >>
> >> v10: (Chris)
> >> - Use local_clock() to measure time taken to read lower dword of
> >>    register and return it to user.
> >>
> >> v11: (Jani)
> >> - IS_GEN deprecated. Use GRAPHICS_VER instead.
> >>
> >> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> >> ---
> >>   drivers/gpu/drm/i915/i915_query.c | 145 ++++++++++++++++++++++++++++++
> >>   include/uapi/drm/i915_drm.h       |  48 ++++++++++
> >>   2 files changed, 193 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
> >> index fed337ad7b68..2594b93901ac 100644
> >> --- a/drivers/gpu/drm/i915/i915_query.c
> >> +++ b/drivers/gpu/drm/i915/i915_query.c
> >> @@ -6,6 +6,8 @@
> >>
> >>   #include <linux/nospec.h>
> >>
> >> +#include "gt/intel_engine_pm.h"
> >> +#include "gt/intel_engine_user.h"
> >>   #include "i915_drv.h"
> >>   #include "i915_perf.h"
> >>   #include "i915_query.h"
> >> @@ -90,6 +92,148 @@ static int query_topology_info(struct drm_i915_private *dev_priv,
> >>        return total_length;
> >>   }
> >>
> >> +typedef u64 (*__ktime_func_t)(void);
> >> +static __ktime_func_t __clock_id_to_func(clockid_t clk_id)
> >> +{
> >> +     /*
> >> +      * Use logic same as the perf subsystem to allow user to select the
> >> +      * reference clock id to be used for timestamps.
> >> +      */
> >> +     switch (clk_id) {
> >> +     case CLOCK_MONOTONIC:
> >> +             return &ktime_get_ns;
> >> +     case CLOCK_MONOTONIC_RAW:
> >> +             return &ktime_get_raw_ns;
> >> +     case CLOCK_REALTIME:
> >> +             return &ktime_get_real_ns;
> >> +     case CLOCK_BOOTTIME:
> >> +             return &ktime_get_boottime_ns;
> >> +     case CLOCK_TAI:
> >> +             return &ktime_get_clocktai_ns;
> >> +     default:
> >> +             return NULL;
> >> +     }
> >> +}
> >> +
> >> +static inline int
> >> +__read_timestamps(struct intel_uncore *uncore,
> >> +               i915_reg_t lower_reg,
> >> +               i915_reg_t upper_reg,
> >> +               u64 *cs_ts,
> >> +               u64 *cpu_ts,
> >> +               __ktime_func_t cpu_clock)
> >> +{
> >> +     u32 upper, lower, old_upper, loop = 0;
> >> +
> >> +     upper = intel_uncore_read_fw(uncore, upper_reg);
> >> +     do {
> >> +             cpu_ts[1] = local_clock();
> >> +             cpu_ts[0] = cpu_clock();
> >> +             lower = intel_uncore_read_fw(uncore, lower_reg);
> >> +             cpu_ts[1] = local_clock() - cpu_ts[1];
> >> +             old_upper = upper;
> >> +             upper = intel_uncore_read_fw(uncore, upper_reg);
> >> +     } while (upper != old_upper && loop++ < 2);
> >> +
> >> +     *cs_ts = (u64)upper << 32 | lower;
> >> +
> >> +     return 0;
> >> +}
> >> +
> >> +static int
> >> +__query_cs_cycles(struct intel_engine_cs *engine,
> >> +               u64 *cs_ts, u64 *cpu_ts,
> >> +               __ktime_func_t cpu_clock)
> >> +{
> >> +     struct intel_uncore *uncore = engine->uncore;
> >> +     enum forcewake_domains fw_domains;
> >> +     u32 base = engine->mmio_base;
> >> +     intel_wakeref_t wakeref;
> >> +     int ret;
> >> +
> >> +     fw_domains = intel_uncore_forcewake_for_reg(uncore,
> >> +                                                 RING_TIMESTAMP(base),
> >> +                                                 FW_REG_READ);
> >> +
> >> +     with_intel_runtime_pm(uncore->rpm, wakeref) {
> >> +             spin_lock_irq(&uncore->lock);
> >> +             intel_uncore_forcewake_get__locked(uncore, fw_domains);
> >> +
> >> +             ret = __read_timestamps(uncore,
> >> +                                     RING_TIMESTAMP(base),
> >> +                                     RING_TIMESTAMP_UDW(base),
> >> +                                     cs_ts,
> >> +                                     cpu_ts,
> >> +                                     cpu_clock);
> >> +
> >> +             intel_uncore_forcewake_put__locked(uncore, fw_domains);
> >> +             spin_unlock_irq(&uncore->lock);
> >> +     }
> >> +
> >> +     return ret;
> >> +}
> >> +
> >> +static int
> >> +query_cs_cycles(struct drm_i915_private *i915,
> >> +             struct drm_i915_query_item *query_item)
> >> +{
> >> +     struct drm_i915_query_cs_cycles __user *query_ptr;
> >> +     struct drm_i915_query_cs_cycles query;
> >> +     struct intel_engine_cs *engine;
> >> +     __ktime_func_t cpu_clock;
> >> +     int ret;
> >> +
> >> +     if (GRAPHICS_VER(i915) < 6)
> >> +             return -ENODEV;
> >> +
> >> +     query_ptr = u64_to_user_ptr(query_item->data_ptr);
> >> +     ret = copy_query_item(&query, sizeof(query), sizeof(query), query_item);
> >> +     if (ret != 0)
> >> +             return ret;
> >> +
> >> +     if (query.flags)
> >> +             return -EINVAL;
> >> +
> >> +     if (query.rsvd)
> >> +             return -EINVAL;
> >> +
> >> +     cpu_clock = __clock_id_to_func(query.clockid);
> >> +     if (!cpu_clock)
> >> +             return -EINVAL;
> >> +
> >> +     engine = intel_engine_lookup_user(i915,
> >> +                                       query.engine.engine_class,
> >> +                                       query.engine.engine_instance);
> >> +     if (!engine)
> >> +             return -EINVAL;
> >> +
> >> +     if (GRAPHICS_VER(i915) == 6 &&
> >> +         query.engine.engine_class != I915_ENGINE_CLASS_RENDER)
> >> +             return -ENODEV;
> >> +
> >> +     query.cs_frequency = engine->gt->clock_frequency;
> >> +     ret = __query_cs_cycles(engine,
> >> +                             &query.cs_cycles,
> >> +                             query.cpu_timestamp,
> >> +                             cpu_clock);
> >> +     if (ret)
> >> +             return ret;
> >> +
> >> +     if (put_user(query.cs_frequency, &query_ptr->cs_frequency))
> >> +             return -EFAULT;
> >> +
> >> +     if (put_user(query.cpu_timestamp[0], &query_ptr->cpu_timestamp[0]))
> >> +             return -EFAULT;
> >> +
> >> +     if (put_user(query.cpu_timestamp[1], &query_ptr->cpu_timestamp[1]))
> >> +             return -EFAULT;
> >> +
> >> +     if (put_user(query.cs_cycles, &query_ptr->cs_cycles))
> >> +             return -EFAULT;
> >> +
> >> +     return sizeof(query);
> >> +}
> >> +
> >>   static int
> >>   query_engine_info(struct drm_i915_private *i915,
> >>                  struct drm_i915_query_item *query_item)
> >> @@ -424,6 +568,7 @@ static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
> >>        query_topology_info,
> >>        query_engine_info,
> >>        query_perf_config,
> >> +     query_cs_cycles,
> >>   };
> >>
> >>   int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> >> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> >> index 6a34243a7646..08b00f1709b5 100644
> >> --- a/include/uapi/drm/i915_drm.h
> >> +++ b/include/uapi/drm/i915_drm.h
> >> @@ -2230,6 +2230,10 @@ struct drm_i915_query_item {
> >>   #define DRM_I915_QUERY_TOPOLOGY_INFO    1
> >>   #define DRM_I915_QUERY_ENGINE_INFO   2
> >>   #define DRM_I915_QUERY_PERF_CONFIG      3
> >> +     /**
> >> +      * Query Command Streamer timestamp register.
> >> +      */
> >> +#define DRM_I915_QUERY_CS_CYCLES     4
> >>   /* Must be kept compact -- no holes and well documented */
> >>
> >>        /**
> >> @@ -2397,6 +2401,50 @@ struct drm_i915_engine_info {
> >>        __u64 rsvd1[4];
> >>   };
> >>
> >> +/**
> >> + * struct drm_i915_query_cs_cycles
> >> + *
> >> + * The query returns the command streamer cycles and the frequency that can be
> >> + * used to calculate the command streamer timestamp. In addition the query
> >> + * returns a set of cpu timestamps that indicate when the command streamer cycle
> >> + * count was captured.
> >> + */
> >> +struct drm_i915_query_cs_cycles {
> >> +     /** Engine for which command streamer cycles is queried. */
> >> +     struct i915_engine_class_instance engine;
> >>
> >> Why is this per-engine?  Do we actually expect it to change between
> >> engines?
> >>
> >>
> >> Each engine has its own timestamp register.
> >>
> >>
> >>    If so, we may have a problem because Vulkan expects a
> >> unified timestamp domain for all command streamer timestamp queries.
> >>
> >>
> >> I don't think it does : "
> >>
> >> Timestamps may only be meaningfully compared if they are written by commands submitted to the same queue.
> > Yes but vkGetCalibratedTimestampsEXT() doesn't take a queue or even a
> > queue family.
>
>
> I know, I brought up the issue recently. See khronos issue 2551.

I guess this is what I get for not attending the Vulkan SI call
anymore.  Small price to pay....

So the answer is that we just stop exposing the DEVICE time domain as
soon as we start using anything other than RENDER?  Seems a bit rough
but should be doable.

> You might not like the resolution... I did propose to do a rev2 of the
> extension to let the user specify the queue.
>
> We can still do that in the future.

Yeah, I think we'll want to do something if we care about this
extension.  One option would be to make it take a queue family.
Another would be to expose it as one domain per queue family.
Anyway... that's a discussion for another forum.

> >    Also, VkPhysicalDeviceLimits::timestampPeriod gives a
> > single timestampPeriod for all queues.
>
>
> That is fine for us, we should have the same period on all command
> streamers.

I guess I've got no problem returning the period as part of this
query.  ANV should probably assert that it's what it expects, though.

> -Lionel
>
>
> >    It's possible that Vulkan
> > messed up real bad there but I thought we did a HW survey at the time
> > and determined that it was ok.
> >
> > --Jason
> >
> >
> >> " [1]
> >>
> >>
> >> [1] : https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/vkCmdWriteTimestamp.html
> >>
> >>
> >> -Lionel
> >>
> >>
> >>
> >> --Jason
> >>
> >>
> >> +     /** Must be zero. */
> >> +     __u32 flags;
> >> +
> >> +     /**
> >> +      * Command streamer cycles as read from the command streamer
> >> +      * register at 0x358 offset.
> >> +      */
> >> +     __u64 cs_cycles;
> >> +
> >> +     /** Frequency of the cs cycles in Hz. */
> >> +     __u64 cs_frequency;
> >> +
> >> +     /**
> >> +      * CPU timestamps in ns. cpu_timestamp[0] is captured before reading the
> >> +      * cs_cycles register using the reference clockid set by the user.
> >> +      * cpu_timestamp[1] is the time taken in ns to read the lower dword of
> >> +      * the cs_cycles register.
> >> +      */
> >> +     __u64 cpu_timestamp[2];

I think the API would be more clear if we had separate cpu_timestamp
and cpu_delta fields or something like that.  That or make
cpu_timestamp[1] the end time rather than a delta.  It's weird to have
an array where the first entry is absolute and the second entry is a
delta.

--Jason


> >> +
> >> +     /**
> >> +      * Reference clock id for CPU timestamp. For definition, see
> >> +      * clock_gettime(2) and perf_event_open(2). Supported clock ids are
> >> +      * CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW, CLOCK_REALTIME, CLOCK_BOOTTIME,
> >> +      * CLOCK_TAI.
> >> +      */
> >> +     __s32 clockid;
> >> +
> >> +     /** Must be zero. */
> >> +     __u32 rsvd;
> >> +};
> >> +
> >>   /**
> >>    * struct drm_i915_query_engine_info
> >>    *
> >>
> >> --
> >> Jani Nikula, Intel Open Source Graphics Center
> >>
> >>
>


* Re: [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy
  2021-04-28 20:45               ` [Intel-gfx] " Jason Ekstrand
@ 2021-04-28 21:18                 ` Lionel Landwerlin
  -1 siblings, 0 replies; 26+ messages in thread
From: Lionel Landwerlin @ 2021-04-28 21:18 UTC (permalink / raw)
  To: Jason Ekstrand
  Cc: Intel GFX, Maling list - DRI developers, Chris Wilson,
	Umesh Nerlige Ramappa

On 28/04/2021 23:45, Jason Ekstrand wrote:
> On Wed, Apr 28, 2021 at 3:14 PM Lionel Landwerlin
> <lionel.g.landwerlin@intel.com> wrote:
>> On 28/04/2021 22:54, Jason Ekstrand wrote:
>>> On Wed, Apr 28, 2021 at 2:50 PM Lionel Landwerlin
>>> <lionel.g.landwerlin@intel.com> wrote:
>>>> On 28/04/2021 22:24, Jason Ekstrand wrote:
>>>>
>>>> On Wed, Apr 28, 2021 at 3:43 AM Jani Nikula <jani.nikula@linux.intel.com> wrote:
>>>>
>>>> On Tue, 27 Apr 2021, Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> wrote:
>>>>
>>>> Perf measurements rely on CPU and engine timestamps to correlate
>>>> events of interest across these time domains. Current mechanisms get
>>>> these timestamps separately and the calculated delta between these
>>>> timestamps lack enough accuracy.
>>>>
>>>> To improve the accuracy of these time measurements to within a few us,
>>>> add a query that returns the engine and cpu timestamps captured as
>>>> close to each other as possible.
>>>>
>>>> Cc: dri-devel, Jason and Daniel for review.
>>>>
>>>> Thanks!
>>>>
>>>> v2: (Tvrtko)
>>>> - document clock reference used
>>>> - return cpu timestamp always
>>>> - capture cpu time just before lower dword of cs timestamp
>>>>
>>>> v3: (Chris)
>>>> - use uncore-rpm
>>>> - use __query_cs_timestamp helper
>>>>
>>>> v4: (Lionel)
>>>> - Kernel perf subsystem allows users to specify the clock id to be used
>>>>     in perf_event_open. This clock id is used by the perf subsystem to
>>>>     return the appropriate cpu timestamp in perf events. Similarly, let
>>>>     the user pass the clockid to this query so that cpu timestamp
>>>>     corresponds to the clock id requested.
>>>>
>>>> v5: (Tvrtko)
>>>> - Use normal ktime accessors instead of fast versions
>>>> - Add more uApi documentation
>>>>
>>>> v6: (Lionel)
>>>> - Move switch out of spinlock
>>>>
>>>> v7: (Chris)
>>>> - cs_timestamp is a misnomer, use cs_cycles instead
>>>> - return the cs cycle frequency as well in the query
>>>>
>>>> v8:
>>>> - Add platform and engine specific checks
>>>>
>>>> v9: (Lionel)
>>>> - Return 2 cpu timestamps in the query - captured before and after the
>>>>     register read
>>>>
>>>> v10: (Chris)
>>>> - Use local_clock() to measure time taken to read lower dword of
>>>>     register and return it to user.
>>>>
>>>> v11: (Jani)
>>>> - IS_GEN deprecated. Use GRAPHICS_VER instead.
>>>>
>>>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>>>> ---
>>>>    drivers/gpu/drm/i915/i915_query.c | 145 ++++++++++++++++++++++++++++++
>>>>    include/uapi/drm/i915_drm.h       |  48 ++++++++++
>>>>    2 files changed, 193 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
>>>> index fed337ad7b68..2594b93901ac 100644
>>>> --- a/drivers/gpu/drm/i915/i915_query.c
>>>> +++ b/drivers/gpu/drm/i915/i915_query.c
>>>> @@ -6,6 +6,8 @@
>>>>
>>>>    #include <linux/nospec.h>
>>>>
>>>> +#include "gt/intel_engine_pm.h"
>>>> +#include "gt/intel_engine_user.h"
>>>>    #include "i915_drv.h"
>>>>    #include "i915_perf.h"
>>>>    #include "i915_query.h"
>>>> @@ -90,6 +92,148 @@ static int query_topology_info(struct drm_i915_private *dev_priv,
>>>>         return total_length;
>>>>    }
>>>>
>>>> +typedef u64 (*__ktime_func_t)(void);
>>>> +static __ktime_func_t __clock_id_to_func(clockid_t clk_id)
>>>> +{
>>>> +     /*
>>>> +      * Use logic same as the perf subsystem to allow user to select the
>>>> +      * reference clock id to be used for timestamps.
>>>> +      */
>>>> +     switch (clk_id) {
>>>> +     case CLOCK_MONOTONIC:
>>>> +             return &ktime_get_ns;
>>>> +     case CLOCK_MONOTONIC_RAW:
>>>> +             return &ktime_get_raw_ns;
>>>> +     case CLOCK_REALTIME:
>>>> +             return &ktime_get_real_ns;
>>>> +     case CLOCK_BOOTTIME:
>>>> +             return &ktime_get_boottime_ns;
>>>> +     case CLOCK_TAI:
>>>> +             return &ktime_get_clocktai_ns;
>>>> +     default:
>>>> +             return NULL;
>>>> +     }
>>>> +}
>>>> +
>>>> +static inline int
>>>> +__read_timestamps(struct intel_uncore *uncore,
>>>> +               i915_reg_t lower_reg,
>>>> +               i915_reg_t upper_reg,
>>>> +               u64 *cs_ts,
>>>> +               u64 *cpu_ts,
>>>> +               __ktime_func_t cpu_clock)
>>>> +{
>>>> +     u32 upper, lower, old_upper, loop = 0;
>>>> +
>>>> +     upper = intel_uncore_read_fw(uncore, upper_reg);
>>>> +     do {
>>>> +             cpu_ts[1] = local_clock();
>>>> +             cpu_ts[0] = cpu_clock();
>>>> +             lower = intel_uncore_read_fw(uncore, lower_reg);
>>>> +             cpu_ts[1] = local_clock() - cpu_ts[1];
>>>> +             old_upper = upper;
>>>> +             upper = intel_uncore_read_fw(uncore, upper_reg);
>>>> +     } while (upper != old_upper && loop++ < 2);
>>>> +
>>>> +     *cs_ts = (u64)upper << 32 | lower;
>>>> +
>>>> +     return 0;
>>>> +}
>>>> +
>>>> +static int
>>>> +__query_cs_cycles(struct intel_engine_cs *engine,
>>>> +               u64 *cs_ts, u64 *cpu_ts,
>>>> +               __ktime_func_t cpu_clock)
>>>> +{
>>>> +     struct intel_uncore *uncore = engine->uncore;
>>>> +     enum forcewake_domains fw_domains;
>>>> +     u32 base = engine->mmio_base;
>>>> +     intel_wakeref_t wakeref;
>>>> +     int ret;
>>>> +
>>>> +     fw_domains = intel_uncore_forcewake_for_reg(uncore,
>>>> +                                                 RING_TIMESTAMP(base),
>>>> +                                                 FW_REG_READ);
>>>> +
>>>> +     with_intel_runtime_pm(uncore->rpm, wakeref) {
>>>> +             spin_lock_irq(&uncore->lock);
>>>> +             intel_uncore_forcewake_get__locked(uncore, fw_domains);
>>>> +
>>>> +             ret = __read_timestamps(uncore,
>>>> +                                     RING_TIMESTAMP(base),
>>>> +                                     RING_TIMESTAMP_UDW(base),
>>>> +                                     cs_ts,
>>>> +                                     cpu_ts,
>>>> +                                     cpu_clock);
>>>> +
>>>> +             intel_uncore_forcewake_put__locked(uncore, fw_domains);
>>>> +             spin_unlock_irq(&uncore->lock);
>>>> +     }
>>>> +
>>>> +     return ret;
>>>> +}
>>>> +
>>>> +static int
>>>> +query_cs_cycles(struct drm_i915_private *i915,
>>>> +             struct drm_i915_query_item *query_item)
>>>> +{
>>>> +     struct drm_i915_query_cs_cycles __user *query_ptr;
>>>> +     struct drm_i915_query_cs_cycles query;
>>>> +     struct intel_engine_cs *engine;
>>>> +     __ktime_func_t cpu_clock;
>>>> +     int ret;
>>>> +
>>>> +     if (GRAPHICS_VER(i915) < 6)
>>>> +             return -ENODEV;
>>>> +
>>>> +     query_ptr = u64_to_user_ptr(query_item->data_ptr);
>>>> +     ret = copy_query_item(&query, sizeof(query), sizeof(query), query_item);
>>>> +     if (ret != 0)
>>>> +             return ret;
>>>> +
>>>> +     if (query.flags)
>>>> +             return -EINVAL;
>>>> +
>>>> +     if (query.rsvd)
>>>> +             return -EINVAL;
>>>> +
>>>> +     cpu_clock = __clock_id_to_func(query.clockid);
>>>> +     if (!cpu_clock)
>>>> +             return -EINVAL;
>>>> +
>>>> +     engine = intel_engine_lookup_user(i915,
>>>> +                                       query.engine.engine_class,
>>>> +                                       query.engine.engine_instance);
>>>> +     if (!engine)
>>>> +             return -EINVAL;
>>>> +
>>>> +     if (GRAPHICS_VER(i915) == 6 &&
>>>> +         query.engine.engine_class != I915_ENGINE_CLASS_RENDER)
>>>> +             return -ENODEV;
>>>> +
>>>> +     query.cs_frequency = engine->gt->clock_frequency;
>>>> +     ret = __query_cs_cycles(engine,
>>>> +                             &query.cs_cycles,
>>>> +                             query.cpu_timestamp,
>>>> +                             cpu_clock);
>>>> +     if (ret)
>>>> +             return ret;
>>>> +
>>>> +     if (put_user(query.cs_frequency, &query_ptr->cs_frequency))
>>>> +             return -EFAULT;
>>>> +
>>>> +     if (put_user(query.cpu_timestamp[0], &query_ptr->cpu_timestamp[0]))
>>>> +             return -EFAULT;
>>>> +
>>>> +     if (put_user(query.cpu_timestamp[1], &query_ptr->cpu_timestamp[1]))
>>>> +             return -EFAULT;
>>>> +
>>>> +     if (put_user(query.cs_cycles, &query_ptr->cs_cycles))
>>>> +             return -EFAULT;
>>>> +
>>>> +     return sizeof(query);
>>>> +}
>>>> +
>>>>    static int
>>>>    query_engine_info(struct drm_i915_private *i915,
>>>>                   struct drm_i915_query_item *query_item)
>>>> @@ -424,6 +568,7 @@ static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
>>>>         query_topology_info,
>>>>         query_engine_info,
>>>>         query_perf_config,
>>>> +     query_cs_cycles,
>>>>    };
>>>>
>>>>    int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>>>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>>>> index 6a34243a7646..08b00f1709b5 100644
>>>> --- a/include/uapi/drm/i915_drm.h
>>>> +++ b/include/uapi/drm/i915_drm.h
>>>> @@ -2230,6 +2230,10 @@ struct drm_i915_query_item {
>>>>    #define DRM_I915_QUERY_TOPOLOGY_INFO    1
>>>>    #define DRM_I915_QUERY_ENGINE_INFO   2
>>>>    #define DRM_I915_QUERY_PERF_CONFIG      3
>>>> +     /**
>>>> +      * Query Command Streamer timestamp register.
>>>> +      */
>>>> +#define DRM_I915_QUERY_CS_CYCLES     4
>>>>    /* Must be kept compact -- no holes and well documented */
>>>>
>>>>         /**
>>>> @@ -2397,6 +2401,50 @@ struct drm_i915_engine_info {
>>>>         __u64 rsvd1[4];
>>>>    };
>>>>
>>>> +/**
>>>> + * struct drm_i915_query_cs_cycles
>>>> + *
>>>> + * The query returns the command streamer cycles and the frequency that can be
>>>> + * used to calculate the command streamer timestamp. In addition the query
>>>> + * returns a set of cpu timestamps that indicate when the command streamer cycle
>>>> + * count was captured.
>>>> + */
>>>> +struct drm_i915_query_cs_cycles {
>>>> +     /** Engine for which command streamer cycles is queried. */
>>>> +     struct i915_engine_class_instance engine;
>>>>
>>>> Why is this per-engine?  Do we actually expect it to change between
>>>> engines?
>>>>
>>>>
>>>> Each engine has its own timestamp register.
>>>>
>>>>
>>>>     If so, we may have a problem because Vulkan expects a
>>>> unified timestamp domain for all command streamer timestamp queries.
>>>>
>>>>
>>>> I don't think it does : "
>>>>
>>>> Timestamps may only be meaningfully compared if they are written by commands submitted to the same queue.
>>> Yes but vkGetCalibratedTimestampsEXT() doesn't take a queue or even a
>>> queue family.
>>
>> I know, I brought up the issue recently. See khronos issue 2551.
> I guess this is what I get for not attending the Vulkan SI call
> anymore.  Small price to pay....
>
> So the answer is that we just stop exposing the DEVICE time domain as
> soon as we start using anything other than RENDER?  Seems a bit rough
> but should be doable.
>
>> You might not like the resolution... I did propose to do a rev2 of the
>> extension to let the user specify the queue.
>>
>> We can still do that in the future.
> Yeah, I think we'll want to do something if we care about this
> extension.  One option would be to make it take a queue family.
> Another would be to expose it as one domain per queue family.
> Anyway... that's a discussion for another forum.
>
>>>     Also, VkPhysicalDeviceLimits::timestampPeriod gives a
>>> single timestampPeriod for all queues.
>>
>> That is fine for us, we should have the same period on all command
>> streamers.
> I guess I've got no problem returning the period as part of this
> query.  ANV should probably assert that it's what it expects, though.
>
>> -Lionel
>>
>>
>>>     It's possible that Vulkan
>>> messed up real bad there but I thought we did a HW survey at the time
>>> and determined that it was ok.
>>>
>>> --Jason
>>>
>>>
>>>> " [1]
>>>>
>>>>
>>>> [1] : https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/vkCmdWriteTimestamp.html
>>>>
>>>>
>>>> -Lionel
>>>>
>>>>
>>>>
>>>> --Jason
>>>>
>>>>
>>>> +     /** Must be zero. */
>>>> +     __u32 flags;
>>>> +
>>>> +     /**
>>>> +      * Command streamer cycles as read from the command streamer
>>>> +      * register at 0x358 offset.
>>>> +      */
>>>> +     __u64 cs_cycles;
>>>> +
>>>> +     /** Frequency of the cs cycles in Hz. */
>>>> +     __u64 cs_frequency;
>>>> +
>>>> +     /**
>>>> +      * CPU timestamps in ns. cpu_timestamp[0] is captured before reading the
>>>> +      * cs_cycles register using the reference clockid set by the user.
>>>> +      * cpu_timestamp[1] is the time taken in ns to read the lower dword of
>>>> +      * the cs_cycles register.
>>>> +      */
>>>> +     __u64 cpu_timestamp[2];
> I think the API would be more clear if we had separate cpu_timestamp
> and cpu_delta fields or something like that.  That or make
> cpu_timestamp[1] the end time rather than a delta.  It's weird to have
> an array where the first entry is absolute and the second entry is a
> delta.


Oh dear... I did not notice that :(

I thought that was just a little dance to save a local variable...

Agreed, 2 different names or 2 snapshots.


-Lionel


>
> --Jason
>
>
>>>> +
>>>> +     /**
>>>> +      * Reference clock id for CPU timestamp. For definition, see
>>>> +      * clock_gettime(2) and perf_event_open(2). Supported clock ids are
>>>> +      * CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW, CLOCK_REALTIME, CLOCK_BOOTTIME,
>>>> +      * CLOCK_TAI.
>>>> +      */
>>>> +     __s32 clockid;
>>>> +
>>>> +     /** Must be zero. */
>>>> +     __u32 rsvd;
>>>> +};
>>>> +
>>>>    /**
>>>>     * struct drm_i915_query_engine_info
>>>>     *
>>>>
>>>> --
>>>> Jani Nikula, Intel Open Source Graphics Center
>>>>
>>>>

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread
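A side note for readers following the uapi discussion above: the returned fields can be combined into a CPU-time estimate for the GPU timestamp. The sketch below is not part of the patch; it assumes cs_frequency is in Hz and that cpu_timestamp[1] stays a delta as in v10 (the thread discusses changing that), and the helper names are illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Convert a cs_cycles reading to nanoseconds given cs_frequency in Hz.
 * The multiply is split so large cycle counts do not overflow 64 bits. */
static uint64_t cs_cycles_to_ns(uint64_t cycles, uint64_t freq_hz)
{
	const uint64_t NSEC_PER_SEC = 1000000000ull;

	return (cycles / freq_hz) * NSEC_PER_SEC +
	       (cycles % freq_hz) * NSEC_PER_SEC / freq_hz;
}

/* Estimate the CPU time at which the register was sampled: the read
 * happened somewhere between cpu_timestamp[0] and
 * cpu_timestamp[0] + cpu_timestamp[1], so take the midpoint. */
static uint64_t correlated_cpu_ns(const uint64_t cpu_timestamp[2])
{
	return cpu_timestamp[0] + cpu_timestamp[1] / 2;
}
```

With both values in one clock domain, later GPU timestamps can be mapped to CPU time by offsetting from this calibration point.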

* Re: [Intel-gfx] [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy
@ 2021-04-28 21:18                 ` Lionel Landwerlin
  0 siblings, 0 replies; 26+ messages in thread
From: Lionel Landwerlin @ 2021-04-28 21:18 UTC (permalink / raw)
  To: Jason Ekstrand; +Cc: Intel GFX, Maling list - DRI developers, Chris Wilson

On 28/04/2021 23:45, Jason Ekstrand wrote:
> On Wed, Apr 28, 2021 at 3:14 PM Lionel Landwerlin
> <lionel.g.landwerlin@intel.com> wrote:
>> On 28/04/2021 22:54, Jason Ekstrand wrote:
>>> On Wed, Apr 28, 2021 at 2:50 PM Lionel Landwerlin
>>> <lionel.g.landwerlin@intel.com> wrote:
>>>> On 28/04/2021 22:24, Jason Ekstrand wrote:
>>>>
>>>> On Wed, Apr 28, 2021 at 3:43 AM Jani Nikula <jani.nikula@linux.intel.com> wrote:
>>>>
>>>> On Tue, 27 Apr 2021, Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> wrote:
>>>>
>>>> Perf measurements rely on CPU and engine timestamps to correlate
>>>> events of interest across these time domains. Current mechanisms get
>>>> these timestamps separately and the calculated delta between these
>>>> timestamps lack enough accuracy.
>>>>
>>>> To improve the accuracy of these time measurements to within a few us,
>>>> add a query that returns the engine and cpu timestamps captured as
>>>> close to each other as possible.
>>>>
>>>> Cc: dri-devel, Jason and Daniel for review.
>>>>
>>>> Thanks!
>>>>
>>>> v2: (Tvrtko)
>>>> - document clock reference used
>>>> - return cpu timestamp always
>>>> - capture cpu time just before lower dword of cs timestamp
>>>>
>>>> v3: (Chris)
>>>> - use uncore-rpm
>>>> - use __query_cs_timestamp helper
>>>>
>>>> v4: (Lionel)
>>>> - Kernel perf subsystem allows users to specify the clock id to be used
>>>>     in perf_event_open. This clock id is used by the perf subsystem to
>>>>     return the appropriate cpu timestamp in perf events. Similarly, let
>>>>     the user pass the clockid to this query so that cpu timestamp
>>>>     corresponds to the clock id requested.
>>>>
>>>> v5: (Tvrtko)
>>>> - Use normal ktime accessors instead of fast versions
>>>> - Add more uApi documentation
>>>>
>>>> v6: (Lionel)
>>>> - Move switch out of spinlock
>>>>
>>>> v7: (Chris)
>>>> - cs_timestamp is a misnomer, use cs_cycles instead
>>>> - return the cs cycle frequency as well in the query
>>>>
>>>> v8:
>>>> - Add platform and engine specific checks
>>>>
>>>> v9: (Lionel)
>>>> - Return 2 cpu timestamps in the query - captured before and after the
>>>>     register read
>>>>
>>>> v10: (Chris)
>>>> - Use local_clock() to measure time taken to read lower dword of
>>>>     register and return it to user.
>>>>
>>>> v11: (Jani)
>>>> - IS_GEN deprecated. Use GRAPHICS_VER instead.
>>>>
>>>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>>>> ---
>>>>    drivers/gpu/drm/i915/i915_query.c | 145 ++++++++++++++++++++++++++++++
>>>>    include/uapi/drm/i915_drm.h       |  48 ++++++++++
>>>>    2 files changed, 193 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
>>>> index fed337ad7b68..2594b93901ac 100644
>>>> --- a/drivers/gpu/drm/i915/i915_query.c
>>>> +++ b/drivers/gpu/drm/i915/i915_query.c
>>>> @@ -6,6 +6,8 @@
>>>>
>>>>    #include <linux/nospec.h>
>>>>
>>>> +#include "gt/intel_engine_pm.h"
>>>> +#include "gt/intel_engine_user.h"
>>>>    #include "i915_drv.h"
>>>>    #include "i915_perf.h"
>>>>    #include "i915_query.h"
>>>> @@ -90,6 +92,148 @@ static int query_topology_info(struct drm_i915_private *dev_priv,
>>>>         return total_length;
>>>>    }
>>>>
>>>> +typedef u64 (*__ktime_func_t)(void);
>>>> +static __ktime_func_t __clock_id_to_func(clockid_t clk_id)
>>>> +{
>>>> +     /*
>>>> +      * Use logic same as the perf subsystem to allow user to select the
>>>> +      * reference clock id to be used for timestamps.
>>>> +      */
>>>> +     switch (clk_id) {
>>>> +     case CLOCK_MONOTONIC:
>>>> +             return &ktime_get_ns;
>>>> +     case CLOCK_MONOTONIC_RAW:
>>>> +             return &ktime_get_raw_ns;
>>>> +     case CLOCK_REALTIME:
>>>> +             return &ktime_get_real_ns;
>>>> +     case CLOCK_BOOTTIME:
>>>> +             return &ktime_get_boottime_ns;
>>>> +     case CLOCK_TAI:
>>>> +             return &ktime_get_clocktai_ns;
>>>> +     default:
>>>> +             return NULL;
>>>> +     }
>>>> +}
>>>> +
>>>> +static inline int
>>>> +__read_timestamps(struct intel_uncore *uncore,
>>>> +               i915_reg_t lower_reg,
>>>> +               i915_reg_t upper_reg,
>>>> +               u64 *cs_ts,
>>>> +               u64 *cpu_ts,
>>>> +               __ktime_func_t cpu_clock)
>>>> +{
>>>> +     u32 upper, lower, old_upper, loop = 0;
>>>> +
>>>> +     upper = intel_uncore_read_fw(uncore, upper_reg);
>>>> +     do {
>>>> +             cpu_ts[1] = local_clock();
>>>> +             cpu_ts[0] = cpu_clock();
>>>> +             lower = intel_uncore_read_fw(uncore, lower_reg);
>>>> +             cpu_ts[1] = local_clock() - cpu_ts[1];
>>>> +             old_upper = upper;
>>>> +             upper = intel_uncore_read_fw(uncore, upper_reg);
>>>> +     } while (upper != old_upper && loop++ < 2);
>>>> +
>>>> +     *cs_ts = (u64)upper << 32 | lower;
>>>> +
>>>> +     return 0;
>>>> +}
>>>> +
>>>> +static int
>>>> +__query_cs_cycles(struct intel_engine_cs *engine,
>>>> +               u64 *cs_ts, u64 *cpu_ts,
>>>> +               __ktime_func_t cpu_clock)
>>>> +{
>>>> +     struct intel_uncore *uncore = engine->uncore;
>>>> +     enum forcewake_domains fw_domains;
>>>> +     u32 base = engine->mmio_base;
>>>> +     intel_wakeref_t wakeref;
>>>> +     int ret;
>>>> +
>>>> +     fw_domains = intel_uncore_forcewake_for_reg(uncore,
>>>> +                                                 RING_TIMESTAMP(base),
>>>> +                                                 FW_REG_READ);
>>>> +
>>>> +     with_intel_runtime_pm(uncore->rpm, wakeref) {
>>>> +             spin_lock_irq(&uncore->lock);
>>>> +             intel_uncore_forcewake_get__locked(uncore, fw_domains);
>>>> +
>>>> +             ret = __read_timestamps(uncore,
>>>> +                                     RING_TIMESTAMP(base),
>>>> +                                     RING_TIMESTAMP_UDW(base),
>>>> +                                     cs_ts,
>>>> +                                     cpu_ts,
>>>> +                                     cpu_clock);
>>>> +
>>>> +             intel_uncore_forcewake_put__locked(uncore, fw_domains);
>>>> +             spin_unlock_irq(&uncore->lock);
>>>> +     }
>>>> +
>>>> +     return ret;
>>>> +}
>>>> +
>>>> +static int
>>>> +query_cs_cycles(struct drm_i915_private *i915,
>>>> +             struct drm_i915_query_item *query_item)
>>>> +{
>>>> +     struct drm_i915_query_cs_cycles __user *query_ptr;
>>>> +     struct drm_i915_query_cs_cycles query;
>>>> +     struct intel_engine_cs *engine;
>>>> +     __ktime_func_t cpu_clock;
>>>> +     int ret;
>>>> +
>>>> +     if (GRAPHICS_VER(i915) < 6)
>>>> +             return -ENODEV;
>>>> +
>>>> +     query_ptr = u64_to_user_ptr(query_item->data_ptr);
>>>> +     ret = copy_query_item(&query, sizeof(query), sizeof(query), query_item);
>>>> +     if (ret != 0)
>>>> +             return ret;
>>>> +
>>>> +     if (query.flags)
>>>> +             return -EINVAL;
>>>> +
>>>> +     if (query.rsvd)
>>>> +             return -EINVAL;
>>>> +
>>>> +     cpu_clock = __clock_id_to_func(query.clockid);
>>>> +     if (!cpu_clock)
>>>> +             return -EINVAL;
>>>> +
>>>> +     engine = intel_engine_lookup_user(i915,
>>>> +                                       query.engine.engine_class,
>>>> +                                       query.engine.engine_instance);
>>>> +     if (!engine)
>>>> +             return -EINVAL;
>>>> +
>>>> +     if (GRAPHICS_VER(i915) == 6 &&
>>>> +         query.engine.engine_class != I915_ENGINE_CLASS_RENDER)
>>>> +             return -ENODEV;
>>>> +
>>>> +     query.cs_frequency = engine->gt->clock_frequency;
>>>> +     ret = __query_cs_cycles(engine,
>>>> +                             &query.cs_cycles,
>>>> +                             query.cpu_timestamp,
>>>> +                             cpu_clock);
>>>> +     if (ret)
>>>> +             return ret;
>>>> +
>>>> +     if (put_user(query.cs_frequency, &query_ptr->cs_frequency))
>>>> +             return -EFAULT;
>>>> +
>>>> +     if (put_user(query.cpu_timestamp[0], &query_ptr->cpu_timestamp[0]))
>>>> +             return -EFAULT;
>>>> +
>>>> +     if (put_user(query.cpu_timestamp[1], &query_ptr->cpu_timestamp[1]))
>>>> +             return -EFAULT;
>>>> +
>>>> +     if (put_user(query.cs_cycles, &query_ptr->cs_cycles))
>>>> +             return -EFAULT;
>>>> +
>>>> +     return sizeof(query);
>>>> +}
>>>> +
>>>>    static int
>>>>    query_engine_info(struct drm_i915_private *i915,
>>>>                   struct drm_i915_query_item *query_item)
>>>> @@ -424,6 +568,7 @@ static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
>>>>         query_topology_info,
>>>>         query_engine_info,
>>>>         query_perf_config,
>>>> +     query_cs_cycles,
>>>>    };
>>>>
>>>>    int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>>>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>>>> index 6a34243a7646..08b00f1709b5 100644
>>>> --- a/include/uapi/drm/i915_drm.h
>>>> +++ b/include/uapi/drm/i915_drm.h
>>>> @@ -2230,6 +2230,10 @@ struct drm_i915_query_item {
>>>>    #define DRM_I915_QUERY_TOPOLOGY_INFO    1
>>>>    #define DRM_I915_QUERY_ENGINE_INFO   2
>>>>    #define DRM_I915_QUERY_PERF_CONFIG      3
>>>> +     /**
>>>> +      * Query Command Streamer timestamp register.
>>>> +      */
>>>> +#define DRM_I915_QUERY_CS_CYCLES     4
>>>>    /* Must be kept compact -- no holes and well documented */
>>>>
>>>>         /**
>>>> @@ -2397,6 +2401,50 @@ struct drm_i915_engine_info {
>>>>         __u64 rsvd1[4];
>>>>    };
>>>>
>>>> +/**
>>>> + * struct drm_i915_query_cs_cycles
>>>> + *
>>>> + * The query returns the command streamer cycles and the frequency that can be
>>>> + * used to calculate the command streamer timestamp. In addition the query
>>>> + * returns a set of cpu timestamps that indicate when the command streamer cycle
>>>> + * count was captured.
>>>> + */
>>>> +struct drm_i915_query_cs_cycles {
>>>> +     /** Engine for which command streamer cycles is queried. */
>>>> +     struct i915_engine_class_instance engine;
>>>>
>>>> Why is this per-engine?  Do we actually expect it to change between
>>>> engines?
>>>>
>>>>
>>>> Each engine has its own timestamp register.
>>>>
>>>>
>>>>     If so, we may have a problem because Vulkan expects a
>>>> unified timestamp domain for all command streamer timestamp queries.
>>>>
>>>>
>>>> I don't think it does : "
>>>>
>>>> Timestamps may only be meaningfully compared if they are written by commands submitted to the same queue.
>>> Yes but vkGetCalibratedTimestampsEXT() doesn't take a queue or even a
>>> queue family.
>>
>> I know, I brought up the issue recently. See khronos issue 2551.
> I guess this is what I get for not attending the Vulkan SI call
> anymore.  Small price to pay....
>
> So the answer is that we just stop exposing the DEVICE time domain as
> soon as we start using anything other than RENDER?  Seems a bit rough
> but should be doable.
>
>> You might not like the resolution... I did propose to do a rev2 of the
>> extension to let the user specify the queue.
>>
>> We can still do that in the future.
> Yeah, I think we'll want to do something if we care about this
> extension.  One option would be to make it take a queue family.
> Another would be to expose it as one domain per queue family.
> Anyway... that's a discussion for another forum.
>
>>>     Also, VkPhysicalDeviceLimits::timestampPeriod gives a
>>> single timestampPeriod for all queues.
>>
>> That is fine for us, we should have the same period on all command
>> streamers.
> I guess I've got no problem returning the period as part of this
> query.  ANV should probably assert that it's what it expects, though.
>
>> -Lionel
>>
>>
>>>     It's possible that Vulkan
>>> messed up real bad there but I thought we did a HW survey at the time
>>> and determined that it was ok.
>>>
>>> --Jason
>>>
>>>
>>>> " [1]
>>>>
>>>>
>>>> [1] : https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/vkCmdWriteTimestamp.html
>>>>
>>>>
>>>> -Lionel
>>>>
>>>>
>>>>
>>>> --Jason
>>>>
>>>>
>>>> +     /** Must be zero. */
>>>> +     __u32 flags;
>>>> +
>>>> +     /**
>>>> +      * Command streamer cycles as read from the command streamer
>>>> +      * register at 0x358 offset.
>>>> +      */
>>>> +     __u64 cs_cycles;
>>>> +
>>>> +     /** Frequency of the cs cycles in Hz. */
>>>> +     __u64 cs_frequency;
>>>> +
>>>> +     /**
>>>> +      * CPU timestamps in ns. cpu_timestamp[0] is captured before reading the
>>>> +      * cs_cycles register using the reference clockid set by the user.
>>>> +      * cpu_timestamp[1] is the time taken in ns to read the lower dword of
>>>> +      * the cs_cycles register.
>>>> +      */
>>>> +     __u64 cpu_timestamp[2];
> I think the API would be more clear if we had separate cpu_timestamp
> and cpu_delta fields or something like that.  That or make
> cpu_timestamp[1] the end time rather than a delta.  It's weird to have
> an array where the first entry is absolute and the second entry is a
> delta.


Oh dear... I did not notice that :(

I thought that was just a little dance to save a local variable...

Agreed, 2 different names or 2 snapshots.


-Lionel


>
> --Jason
>
>
>>>> +
>>>> +     /**
>>>> +      * Reference clock id for CPU timestamp. For definition, see
>>>> +      * clock_gettime(2) and perf_event_open(2). Supported clock ids are
>>>> +      * CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW, CLOCK_REALTIME, CLOCK_BOOTTIME,
>>>> +      * CLOCK_TAI.
>>>> +      */
>>>> +     __s32 clockid;
>>>> +
>>>> +     /** Must be zero. */
>>>> +     __u32 rsvd;
>>>> +};
>>>> +
>>>>    /**
>>>>     * struct drm_i915_query_engine_info
>>>>     *
>>>>
>>>> --
>>>> Jani Nikula, Intel Open Source Graphics Center
>>>>
>>>>


^ permalink raw reply	[flat|nested] 26+ messages in thread
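The upper/lower/upper loop in __read_timestamps() quoted above is the usual technique for reading a 64-bit counter exposed as two 32-bit registers without tearing when the lower dword rolls over mid-read. A standalone sketch of the same pattern with mocked registers (the mock_* names are illustrative, not from the patch; the CPU-timestamp capture is omitted):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for RING_TIMESTAMP / RING_TIMESTAMP_UDW: a 64-bit counter
 * that can only be read 32 bits at a time. */
static uint64_t mock_counter;

static uint32_t read_lower(void) { return (uint32_t)mock_counter; }
static uint32_t read_upper(void) { return (uint32_t)(mock_counter >> 32); }

static uint64_t read_counter64(void)
{
	uint32_t upper, lower, old_upper;
	int loop = 0;

	upper = read_upper();
	do {
		lower = read_lower();
		old_upper = upper;
		/* Re-read the upper dword; if it changed, the lower dword
		 * rolled over between the two reads, so retry. */
		upper = read_upper();
	} while (upper != old_upper && loop++ < 2);

	return (uint64_t)upper << 32 | lower;
}
```

The bounded retry count mirrors the kernel loop: two retries suffice because the upper dword cannot advance twice during a handful of register reads.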

* Re: [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy
  2021-04-28  8:43     ` [Intel-gfx] " Jani Nikula
@ 2021-04-29 11:15       ` Daniel Vetter
  -1 siblings, 0 replies; 26+ messages in thread
From: Daniel Vetter @ 2021-04-29 11:15 UTC (permalink / raw)
  To: Jani Nikula
  Cc: intel-gfx, dri-devel, Chris Wilson, Jason Ekstrand,
	Umesh Nerlige Ramappa

On Wed, Apr 28, 2021 at 11:43:22AM +0300, Jani Nikula wrote:
> On Tue, 27 Apr 2021, Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> wrote:
> > Perf measurements rely on CPU and engine timestamps to correlate
> > events of interest across these time domains. Current mechanisms get
> > these timestamps separately and the calculated delta between these
> > timestamps lack enough accuracy.
> >
> > To improve the accuracy of these time measurements to within a few us,
> > add a query that returns the engine and cpu timestamps captured as
> > close to each other as possible.
> 
> Cc: dri-devel, Jason and Daniel for review.

Yeah going forward pls cc: dri-devel on everything touching
gem/core/scheduler and anything related. Review for these is supposed to
be cc: dri-devel, including anything that's in-flight right now.

Thanks, Daniel

> 
> >
> > v2: (Tvrtko)
> > - document clock reference used
> > - return cpu timestamp always
> > - capture cpu time just before lower dword of cs timestamp
> >
> > v3: (Chris)
> > - use uncore-rpm
> > - use __query_cs_timestamp helper
> >
> > v4: (Lionel)
> > - Kernel perf subsystem allows users to specify the clock id to be used
> >   in perf_event_open. This clock id is used by the perf subsystem to
> >   return the appropriate cpu timestamp in perf events. Similarly, let
> >   the user pass the clockid to this query so that cpu timestamp
> >   corresponds to the clock id requested.
> >
> > v5: (Tvrtko)
> > - Use normal ktime accessors instead of fast versions
> > - Add more uApi documentation
> >
> > v6: (Lionel)
> > - Move switch out of spinlock
> >
> > v7: (Chris)
> > - cs_timestamp is a misnomer, use cs_cycles instead
> > - return the cs cycle frequency as well in the query
> >
> > v8:
> > - Add platform and engine specific checks
> >
> > v9: (Lionel)
> > - Return 2 cpu timestamps in the query - captured before and after the
> >   register read
> >
> > v10: (Chris)
> > - Use local_clock() to measure time taken to read lower dword of
> >   register and return it to user.
> >
> > v11: (Jani)
> > - IS_GEN deprecated. Use GRAPHICS_VER instead.
> >
> > Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_query.c | 145 ++++++++++++++++++++++++++++++
> >  include/uapi/drm/i915_drm.h       |  48 ++++++++++
> >  2 files changed, 193 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
> > index fed337ad7b68..2594b93901ac 100644
> > --- a/drivers/gpu/drm/i915/i915_query.c
> > +++ b/drivers/gpu/drm/i915/i915_query.c
> > @@ -6,6 +6,8 @@
> >  
> >  #include <linux/nospec.h>
> >  
> > +#include "gt/intel_engine_pm.h"
> > +#include "gt/intel_engine_user.h"
> >  #include "i915_drv.h"
> >  #include "i915_perf.h"
> >  #include "i915_query.h"
> > @@ -90,6 +92,148 @@ static int query_topology_info(struct drm_i915_private *dev_priv,
> >  	return total_length;
> >  }
> >  
> > +typedef u64 (*__ktime_func_t)(void);
> > +static __ktime_func_t __clock_id_to_func(clockid_t clk_id)
> > +{
> > +	/*
> > +	 * Use logic same as the perf subsystem to allow user to select the
> > +	 * reference clock id to be used for timestamps.
> > +	 */
> > +	switch (clk_id) {
> > +	case CLOCK_MONOTONIC:
> > +		return &ktime_get_ns;
> > +	case CLOCK_MONOTONIC_RAW:
> > +		return &ktime_get_raw_ns;
> > +	case CLOCK_REALTIME:
> > +		return &ktime_get_real_ns;
> > +	case CLOCK_BOOTTIME:
> > +		return &ktime_get_boottime_ns;
> > +	case CLOCK_TAI:
> > +		return &ktime_get_clocktai_ns;
> > +	default:
> > +		return NULL;
> > +	}
> > +}
> > +
> > +static inline int
> > +__read_timestamps(struct intel_uncore *uncore,
> > +		  i915_reg_t lower_reg,
> > +		  i915_reg_t upper_reg,
> > +		  u64 *cs_ts,
> > +		  u64 *cpu_ts,
> > +		  __ktime_func_t cpu_clock)
> > +{
> > +	u32 upper, lower, old_upper, loop = 0;
> > +
> > +	upper = intel_uncore_read_fw(uncore, upper_reg);
> > +	do {
> > +		cpu_ts[1] = local_clock();
> > +		cpu_ts[0] = cpu_clock();
> > +		lower = intel_uncore_read_fw(uncore, lower_reg);
> > +		cpu_ts[1] = local_clock() - cpu_ts[1];
> > +		old_upper = upper;
> > +		upper = intel_uncore_read_fw(uncore, upper_reg);
> > +	} while (upper != old_upper && loop++ < 2);
> > +
> > +	*cs_ts = (u64)upper << 32 | lower;
> > +
> > +	return 0;
> > +}
> > +
> > +static int
> > +__query_cs_cycles(struct intel_engine_cs *engine,
> > +		  u64 *cs_ts, u64 *cpu_ts,
> > +		  __ktime_func_t cpu_clock)
> > +{
> > +	struct intel_uncore *uncore = engine->uncore;
> > +	enum forcewake_domains fw_domains;
> > +	u32 base = engine->mmio_base;
> > +	intel_wakeref_t wakeref;
> > +	int ret;
> > +
> > +	fw_domains = intel_uncore_forcewake_for_reg(uncore,
> > +						    RING_TIMESTAMP(base),
> > +						    FW_REG_READ);
> > +
> > +	with_intel_runtime_pm(uncore->rpm, wakeref) {
> > +		spin_lock_irq(&uncore->lock);
> > +		intel_uncore_forcewake_get__locked(uncore, fw_domains);
> > +
> > +		ret = __read_timestamps(uncore,
> > +					RING_TIMESTAMP(base),
> > +					RING_TIMESTAMP_UDW(base),
> > +					cs_ts,
> > +					cpu_ts,
> > +					cpu_clock);
> > +
> > +		intel_uncore_forcewake_put__locked(uncore, fw_domains);
> > +		spin_unlock_irq(&uncore->lock);
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +static int
> > +query_cs_cycles(struct drm_i915_private *i915,
> > +		struct drm_i915_query_item *query_item)
> > +{
> > +	struct drm_i915_query_cs_cycles __user *query_ptr;
> > +	struct drm_i915_query_cs_cycles query;
> > +	struct intel_engine_cs *engine;
> > +	__ktime_func_t cpu_clock;
> > +	int ret;
> > +
> > +	if (GRAPHICS_VER(i915) < 6)
> > +		return -ENODEV;
> > +
> > +	query_ptr = u64_to_user_ptr(query_item->data_ptr);
> > +	ret = copy_query_item(&query, sizeof(query), sizeof(query), query_item);
> > +	if (ret != 0)
> > +		return ret;
> > +
> > +	if (query.flags)
> > +		return -EINVAL;
> > +
> > +	if (query.rsvd)
> > +		return -EINVAL;
> > +
> > +	cpu_clock = __clock_id_to_func(query.clockid);
> > +	if (!cpu_clock)
> > +		return -EINVAL;
> > +
> > +	engine = intel_engine_lookup_user(i915,
> > +					  query.engine.engine_class,
> > +					  query.engine.engine_instance);
> > +	if (!engine)
> > +		return -EINVAL;
> > +
> > +	if (GRAPHICS_VER(i915) == 6 &&
> > +	    query.engine.engine_class != I915_ENGINE_CLASS_RENDER)
> > +		return -ENODEV;
> > +
> > +	query.cs_frequency = engine->gt->clock_frequency;
> > +	ret = __query_cs_cycles(engine,
> > +				&query.cs_cycles,
> > +				query.cpu_timestamp,
> > +				cpu_clock);
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (put_user(query.cs_frequency, &query_ptr->cs_frequency))
> > +		return -EFAULT;
> > +
> > +	if (put_user(query.cpu_timestamp[0], &query_ptr->cpu_timestamp[0]))
> > +		return -EFAULT;
> > +
> > +	if (put_user(query.cpu_timestamp[1], &query_ptr->cpu_timestamp[1]))
> > +		return -EFAULT;
> > +
> > +	if (put_user(query.cs_cycles, &query_ptr->cs_cycles))
> > +		return -EFAULT;
> > +
> > +	return sizeof(query);
> > +}
> > +
> >  static int
> >  query_engine_info(struct drm_i915_private *i915,
> >  		  struct drm_i915_query_item *query_item)
> > @@ -424,6 +568,7 @@ static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
> >  	query_topology_info,
> >  	query_engine_info,
> >  	query_perf_config,
> > +	query_cs_cycles,
> >  };
> >  
> >  int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> > index 6a34243a7646..08b00f1709b5 100644
> > --- a/include/uapi/drm/i915_drm.h
> > +++ b/include/uapi/drm/i915_drm.h
> > @@ -2230,6 +2230,10 @@ struct drm_i915_query_item {
> >  #define DRM_I915_QUERY_TOPOLOGY_INFO    1
> >  #define DRM_I915_QUERY_ENGINE_INFO	2
> >  #define DRM_I915_QUERY_PERF_CONFIG      3
> > +	/**
> > +	 * Query Command Streamer timestamp register.
> > +	 */
> > +#define DRM_I915_QUERY_CS_CYCLES	4
> >  /* Must be kept compact -- no holes and well documented */
> >  
> >  	/**
> > @@ -2397,6 +2401,50 @@ struct drm_i915_engine_info {
> >  	__u64 rsvd1[4];
> >  };
> >  
> > +/**
> > + * struct drm_i915_query_cs_cycles
> > + *
> > + * The query returns the command streamer cycles and the frequency that can be
> > + * used to calculate the command streamer timestamp. In addition the query
> > + * returns a set of cpu timestamps that indicate when the command streamer cycle
> > + * count was captured.
> > + */
> > +struct drm_i915_query_cs_cycles {
> > +	/** Engine for which command streamer cycles is queried. */
> > +	struct i915_engine_class_instance engine;
> > +
> > +	/** Must be zero. */
> > +	__u32 flags;
> > +
> > +	/**
> > +	 * Command streamer cycles as read from the command streamer
> > +	 * register at 0x358 offset.
> > +	 */
> > +	__u64 cs_cycles;
> > +
> > +	/** Frequency of the cs cycles in Hz. */
> > +	__u64 cs_frequency;
> > +
> > +	/**
> > +	 * CPU timestamps in ns. cpu_timestamp[0] is captured before reading the
> > +	 * cs_cycles register using the reference clockid set by the user.
> > +	 * cpu_timestamp[1] is the time taken in ns to read the lower dword of
> > +	 * the cs_cycles register.
> > +	 */
> > +	__u64 cpu_timestamp[2];
> > +
> > +	/**
> > +	 * Reference clock id for CPU timestamp. For definition, see
> > +	 * clock_gettime(2) and perf_event_open(2). Supported clock ids are
> > +	 * CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW, CLOCK_REALTIME, CLOCK_BOOTTIME,
> > +	 * CLOCK_TAI.
> > +	 */
> > +	__s32 clockid;
> > +
> > +	/** Must be zero. */
> > +	__u32 rsvd;
> > +};
> > +
> >  /**
> >   * struct drm_i915_query_engine_info
> >   *
> 
> -- 
> Jani Nikula, Intel Open Source Graphics Center

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Intel-gfx] [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy
@ 2021-04-29 11:15       ` Daniel Vetter
  0 siblings, 0 replies; 26+ messages in thread
From: Daniel Vetter @ 2021-04-29 11:15 UTC (permalink / raw)
  To: Jani Nikula; +Cc: intel-gfx, dri-devel, Chris Wilson

On Wed, Apr 28, 2021 at 11:43:22AM +0300, Jani Nikula wrote:
> On Tue, 27 Apr 2021, Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> wrote:
> > Perf measurements rely on CPU and engine timestamps to correlate
> > events of interest across these time domains. Current mechanisms get
> > these timestamps separately and the calculated delta between these
> > timestamps lack enough accuracy.
> >
> > To improve the accuracy of these time measurements to within a few us,
> > add a query that returns the engine and cpu timestamps captured as
> > close to each other as possible.
> 
> Cc: dri-devel, Jason and Daniel for review.

Yeah going forward pls cc: dri-devel on everything touching
gem/core/scheduler and anything related. Review for these is supposed to
be cc: dri-devel, including anything that's in-flight right now.

Thanks, Daniel

> 
> >
> > v2: (Tvrtko)
> > - document clock reference used
> > - return cpu timestamp always
> > - capture cpu time just before lower dword of cs timestamp
> >
> > v3: (Chris)
> > - use uncore-rpm
> > - use __query_cs_timestamp helper
> >
> > v4: (Lionel)
> > - Kernel perf subsystem allows users to specify the clock id to be used
> >   in perf_event_open. This clock id is used by the perf subsystem to
> >   return the appropriate cpu timestamp in perf events. Similarly, let
> >   the user pass the clockid to this query so that cpu timestamp
> >   corresponds to the clock id requested.
> >
> > v5: (Tvrtko)
> > - Use normal ktime accessors instead of fast versions
> > - Add more uApi documentation
> >
> > v6: (Lionel)
> > - Move switch out of spinlock
> >
> > v7: (Chris)
> > - cs_timestamp is a misnomer, use cs_cycles instead
> > - return the cs cycle frequency as well in the query
> >
> > v8:
> > - Add platform and engine specific checks
> >
> > v9: (Lionel)
> > - Return 2 cpu timestamps in the query - captured before and after the
> >   register read
> >
> > v10: (Chris)
> > - Use local_clock() to measure time taken to read lower dword of
> >   register and return it to user.
> >
> > v11: (Jani)
> > - IS_GEN() is deprecated. Use GRAPHICS_VER() instead.
> >
> > Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_query.c | 145 ++++++++++++++++++++++++++++++
> >  include/uapi/drm/i915_drm.h       |  48 ++++++++++
> >  2 files changed, 193 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
> > index fed337ad7b68..2594b93901ac 100644
> > --- a/drivers/gpu/drm/i915/i915_query.c
> > +++ b/drivers/gpu/drm/i915/i915_query.c
> > @@ -6,6 +6,8 @@
> >  
> >  #include <linux/nospec.h>
> >  
> > +#include "gt/intel_engine_pm.h"
> > +#include "gt/intel_engine_user.h"
> >  #include "i915_drv.h"
> >  #include "i915_perf.h"
> >  #include "i915_query.h"
> > @@ -90,6 +92,148 @@ static int query_topology_info(struct drm_i915_private *dev_priv,
> >  	return total_length;
> >  }
> >  
> > +typedef u64 (*__ktime_func_t)(void);
> > +static __ktime_func_t __clock_id_to_func(clockid_t clk_id)
> > +{
> > +	/*
> > +	 * Use the same logic as the perf subsystem to allow the user to
> > +	 * reference clock id to be used for timestamps.
> > +	 */
> > +	switch (clk_id) {
> > +	case CLOCK_MONOTONIC:
> > +		return &ktime_get_ns;
> > +	case CLOCK_MONOTONIC_RAW:
> > +		return &ktime_get_raw_ns;
> > +	case CLOCK_REALTIME:
> > +		return &ktime_get_real_ns;
> > +	case CLOCK_BOOTTIME:
> > +		return &ktime_get_boottime_ns;
> > +	case CLOCK_TAI:
> > +		return &ktime_get_clocktai_ns;
> > +	default:
> > +		return NULL;
> > +	}
> > +}
> > +
> > +static inline int
> > +__read_timestamps(struct intel_uncore *uncore,
> > +		  i915_reg_t lower_reg,
> > +		  i915_reg_t upper_reg,
> > +		  u64 *cs_ts,
> > +		  u64 *cpu_ts,
> > +		  __ktime_func_t cpu_clock)
> > +{
> > +	u32 upper, lower, old_upper, loop = 0;
> > +
> > +	upper = intel_uncore_read_fw(uncore, upper_reg);
> > +	do {
> > +		cpu_ts[1] = local_clock();
> > +		cpu_ts[0] = cpu_clock();
> > +		lower = intel_uncore_read_fw(uncore, lower_reg);
> > +		cpu_ts[1] = local_clock() - cpu_ts[1];
> > +		old_upper = upper;
> > +		upper = intel_uncore_read_fw(uncore, upper_reg);
> > +	} while (upper != old_upper && loop++ < 2);
> > +
> > +	*cs_ts = (u64)upper << 32 | lower;
> > +
> > +	return 0;
> > +}
> > +
> > +static int
> > +__query_cs_cycles(struct intel_engine_cs *engine,
> > +		  u64 *cs_ts, u64 *cpu_ts,
> > +		  __ktime_func_t cpu_clock)
> > +{
> > +	struct intel_uncore *uncore = engine->uncore;
> > +	enum forcewake_domains fw_domains;
> > +	u32 base = engine->mmio_base;
> > +	intel_wakeref_t wakeref;
> > +	int ret;
> > +
> > +	fw_domains = intel_uncore_forcewake_for_reg(uncore,
> > +						    RING_TIMESTAMP(base),
> > +						    FW_REG_READ);
> > +
> > +	with_intel_runtime_pm(uncore->rpm, wakeref) {
> > +		spin_lock_irq(&uncore->lock);
> > +		intel_uncore_forcewake_get__locked(uncore, fw_domains);
> > +
> > +		ret = __read_timestamps(uncore,
> > +					RING_TIMESTAMP(base),
> > +					RING_TIMESTAMP_UDW(base),
> > +					cs_ts,
> > +					cpu_ts,
> > +					cpu_clock);
> > +
> > +		intel_uncore_forcewake_put__locked(uncore, fw_domains);
> > +		spin_unlock_irq(&uncore->lock);
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +static int
> > +query_cs_cycles(struct drm_i915_private *i915,
> > +		struct drm_i915_query_item *query_item)
> > +{
> > +	struct drm_i915_query_cs_cycles __user *query_ptr;
> > +	struct drm_i915_query_cs_cycles query;
> > +	struct intel_engine_cs *engine;
> > +	__ktime_func_t cpu_clock;
> > +	int ret;
> > +
> > +	if (GRAPHICS_VER(i915) < 6)
> > +		return -ENODEV;
> > +
> > +	query_ptr = u64_to_user_ptr(query_item->data_ptr);
> > +	ret = copy_query_item(&query, sizeof(query), sizeof(query), query_item);
> > +	if (ret != 0)
> > +		return ret;
> > +
> > +	if (query.flags)
> > +		return -EINVAL;
> > +
> > +	if (query.rsvd)
> > +		return -EINVAL;
> > +
> > +	cpu_clock = __clock_id_to_func(query.clockid);
> > +	if (!cpu_clock)
> > +		return -EINVAL;
> > +
> > +	engine = intel_engine_lookup_user(i915,
> > +					  query.engine.engine_class,
> > +					  query.engine.engine_instance);
> > +	if (!engine)
> > +		return -EINVAL;
> > +
> > +	if (GRAPHICS_VER(i915) == 6 &&
> > +	    query.engine.engine_class != I915_ENGINE_CLASS_RENDER)
> > +		return -ENODEV;
> > +
> > +	query.cs_frequency = engine->gt->clock_frequency;
> > +	ret = __query_cs_cycles(engine,
> > +				&query.cs_cycles,
> > +				query.cpu_timestamp,
> > +				cpu_clock);
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (put_user(query.cs_frequency, &query_ptr->cs_frequency))
> > +		return -EFAULT;
> > +
> > +	if (put_user(query.cpu_timestamp[0], &query_ptr->cpu_timestamp[0]))
> > +		return -EFAULT;
> > +
> > +	if (put_user(query.cpu_timestamp[1], &query_ptr->cpu_timestamp[1]))
> > +		return -EFAULT;
> > +
> > +	if (put_user(query.cs_cycles, &query_ptr->cs_cycles))
> > +		return -EFAULT;
> > +
> > +	return sizeof(query);
> > +}
> > +
> >  static int
> >  query_engine_info(struct drm_i915_private *i915,
> >  		  struct drm_i915_query_item *query_item)
> > @@ -424,6 +568,7 @@ static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
> >  	query_topology_info,
> >  	query_engine_info,
> >  	query_perf_config,
> > +	query_cs_cycles,
> >  };
> >  
> >  int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> > index 6a34243a7646..08b00f1709b5 100644
> > --- a/include/uapi/drm/i915_drm.h
> > +++ b/include/uapi/drm/i915_drm.h
> > @@ -2230,6 +2230,10 @@ struct drm_i915_query_item {
> >  #define DRM_I915_QUERY_TOPOLOGY_INFO    1
> >  #define DRM_I915_QUERY_ENGINE_INFO	2
> >  #define DRM_I915_QUERY_PERF_CONFIG      3
> > +	/**
> > +	 * Query Command Streamer timestamp register.
> > +	 */
> > +#define DRM_I915_QUERY_CS_CYCLES	4
> >  /* Must be kept compact -- no holes and well documented */
> >  
> >  	/**
> > @@ -2397,6 +2401,50 @@ struct drm_i915_engine_info {
> >  	__u64 rsvd1[4];
> >  };
> >  
> > +/**
> > + * struct drm_i915_query_cs_cycles
> > + *
> > + * The query returns the command streamer cycles and the frequency that can be
> > + * used to calculate the command streamer timestamp. In addition the query
> > + * returns a set of cpu timestamps that indicate when the command streamer cycle
> > + * count was captured.
> > + */
> > +struct drm_i915_query_cs_cycles {
> > +	/** Engine for which command streamer cycles are queried. */
> > +	struct i915_engine_class_instance engine;
> > +
> > +	/** Must be zero. */
> > +	__u32 flags;
> > +
> > +	/**
> > +	 * Command streamer cycles as read from the command streamer
> > +	 * register at 0x358 offset.
> > +	 */
> > +	__u64 cs_cycles;
> > +
> > +	/** Frequency of the cs cycles in Hz. */
> > +	__u64 cs_frequency;
> > +
> > +	/**
> > +	 * CPU timestamps in ns. cpu_timestamp[0] is captured before reading the
> > +	 * cs_cycles register using the reference clockid set by the user.
> > +	 * cpu_timestamp[1] is the time taken in ns to read the lower dword of
> > +	 * the cs_cycles register.
> > +	 */
> > +	__u64 cpu_timestamp[2];
> > +
> > +	/**
> > +	 * Reference clock id for CPU timestamp. For definition, see
> > +	 * clock_gettime(2) and perf_event_open(2). Supported clock ids are
> > +	 * CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW, CLOCK_REALTIME, CLOCK_BOOTTIME,
> > +	 * CLOCK_TAI.
> > +	 */
> > +	__s32 clockid;
> > +
> > +	/** Must be zero. */
> > +	__u32 rsvd;
> > +};
> > +
> >  /**
> >   * struct drm_i915_query_engine_info
> >   *
> 
> -- 
> Jani Nikula, Intel Open Source Graphics Center

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread
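
For readers following the patch above, the two key techniques can be sketched in plain, standalone C. This is an illustrative sketch only, not the kernel code: `read_split_counter64`, `cs_cycles_to_ns`, and `fake_read` are made-up names, and `read_reg32` merely stands in for `intel_uncore_read_fw()`. The first helper shows the upper/lower/upper pattern from `__read_timestamps()` — a 64-bit counter exposed as two 32-bit MMIO registers can carry from the lower into the upper dword between reads, so the upper dword is re-read until stable (with a small retry cap). The second shows how the returned `cs_cycles` can be converted to nanoseconds using the `cs_frequency` reported by the query, for comparison against `cpu_timestamp[0]`.

```c
#include <stdint.h>

/* Rollover-safe read of a 64-bit counter split across two 32-bit
 * registers, modeled on __read_timestamps() in the patch above.
 * read_reg32() is a stand-in for the real MMIO accessor. */
static uint64_t read_split_counter64(uint32_t (*read_reg32)(int),
				     int lower_reg, int upper_reg)
{
	uint32_t lower, upper, old_upper;
	int loop = 0;

	upper = read_reg32(upper_reg);
	do {
		old_upper = upper;
		lower = read_reg32(lower_reg);
		upper = read_reg32(upper_reg);
	} while (upper != old_upper && loop++ < 2);

	return (uint64_t)upper << 32 | lower;
}

/* Convert CS cycles to ns using the frequency returned by the query.
 * Note: cs_cycles * 1e9 can overflow u64 for large cycle counts; a
 * real implementation would use a wider or split multiplication. */
static uint64_t cs_cycles_to_ns(uint64_t cs_cycles, uint64_t cs_frequency_hz)
{
	return cs_cycles * 1000000000ull / cs_frequency_hz;
}

/* Demo backing store: a 64-bit counter that ticks on every lower-dword
 * read, so the first read attempt straddles a 32-bit rollover and the
 * retry loop above is exercised. */
static uint64_t fake_counter = 0x00000001FFFFFFFFull;

static uint32_t fake_read(int reg)
{
	if (reg == 0) {			/* lower dword */
		uint32_t v = (uint32_t)fake_counter;
		fake_counter++;		/* counter advances under us */
		return v;
	}
	return (uint32_t)(fake_counter >> 32);	/* upper dword */
}
```

With the demo backing store above, the first iteration observes upper dwords 1 and 2 around the rollover, retries once, and returns the consistent value 0x200000000.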

* [Intel-gfx] ✓ Fi.CI.BAT: success for Add support for querying engine cycles
  2021-04-29  0:34 [PATCH 0/1] " Umesh Nerlige Ramappa
@ 2021-04-29  1:59 ` Patchwork
  0 siblings, 0 replies; 26+ messages in thread
From: Patchwork @ 2021-04-29  1:59 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 2830 bytes --]

== Series Details ==

Series: Add support for querying engine cycles
URL   : https://patchwork.freedesktop.org/series/89615/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_10023 -> Patchwork_20025
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20025/index.html

Known issues
------------

  Here are the changes found in Patchwork_20025 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@i915_pm_rpm@module-reload:
    - fi-kbl-7500u:       [PASS][1] -> [DMESG-WARN][2] ([i915#2605])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10023/fi-kbl-7500u/igt@i915_pm_rpm@module-reload.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20025/fi-kbl-7500u/igt@i915_pm_rpm@module-reload.html

  * igt@runner@aborted:
    - fi-bdw-5557u:       NOTRUN -> [FAIL][3] ([i915#1602] / [i915#2029])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20025/fi-bdw-5557u/igt@runner@aborted.html

  
#### Possible fixes ####

  * igt@kms_frontbuffer_tracking@basic:
    - {fi-rkl-11500t}:    [SKIP][4] ([i915#1849] / [i915#3180]) -> [PASS][5]
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10023/fi-rkl-11500t/igt@kms_frontbuffer_tracking@basic.html
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20025/fi-rkl-11500t/igt@kms_frontbuffer_tracking@basic.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [i915#1602]: https://gitlab.freedesktop.org/drm/intel/issues/1602
  [i915#1849]: https://gitlab.freedesktop.org/drm/intel/issues/1849
  [i915#2029]: https://gitlab.freedesktop.org/drm/intel/issues/2029
  [i915#2605]: https://gitlab.freedesktop.org/drm/intel/issues/2605
  [i915#3180]: https://gitlab.freedesktop.org/drm/intel/issues/3180


Participating hosts (44 -> 40)
------------------------------

  Missing    (4): fi-ilk-m540 fi-bsw-cyan fi-bdw-samus fi-hsw-4200u 


Build changes
-------------

  * IGT: IGT_6076 -> IGTPW_5769
  * Linux: CI_DRM_10023 -> Patchwork_20025

  CI-20190529: 20190529
  CI_DRM_10023: a8bf9e284933fa5c1cb821b48ba95821e5d1cc3f @ git://anongit.freedesktop.org/gfx-ci/linux
  IGTPW_5769: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_5769/index.html
  IGT_6076: 9ab0820dbd07781161c1ace6973ea222fd24e53a @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_20025: 2ad9b6fced1ef0ba95279c3b6c9891829ce37694 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

2ad9b6fced1e i915/query: Correlate engine and cpu timestamps with better accuracy

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20025/index.html


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Intel-gfx] ✓ Fi.CI.BAT: success for Add support for querying engine cycles
  2021-04-27 21:53 [Intel-gfx] [PATCH 0/1] " Umesh Nerlige Ramappa
@ 2021-04-28  0:22 ` Patchwork
  0 siblings, 0 replies; 26+ messages in thread
From: Patchwork @ 2021-04-28  0:22 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx



== Series Details ==

Series: Add support for querying engine cycles
URL   : https://patchwork.freedesktop.org/series/89561/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_10019 -> Patchwork_20009
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20009/index.html

Known issues
------------

  Here are the changes found in Patchwork_20009 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@amdgpu/amd_cs_nop@sync-fork-compute0:
    - fi-snb-2600:        NOTRUN -> [SKIP][1] ([fdo#109271]) +17 similar issues
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20009/fi-snb-2600/igt@amdgpu/amd_cs_nop@sync-fork-compute0.html

  
#### Possible fixes ####

  * igt@i915_selftest@live@hangcheck:
    - fi-snb-2600:        [INCOMPLETE][2] ([i915#2782]) -> [PASS][3]
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10019/fi-snb-2600/igt@i915_selftest@live@hangcheck.html
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20009/fi-snb-2600/igt@i915_selftest@live@hangcheck.html

  * igt@kms_frontbuffer_tracking@basic:
    - {fi-rkl-11500t}:    [SKIP][4] ([i915#1849] / [i915#3180]) -> [PASS][5]
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10019/fi-rkl-11500t/igt@kms_frontbuffer_tracking@basic.html
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20009/fi-rkl-11500t/igt@kms_frontbuffer_tracking@basic.html

  
#### Warnings ####

  * igt@kms_chamelium@vga-edid-read:
    - fi-icl-u2:          [SKIP][6] -> [SKIP][7] ([fdo#109309]) +1 similar issue
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10019/fi-icl-u2/igt@kms_chamelium@vga-edid-read.html
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20009/fi-icl-u2/igt@kms_chamelium@vga-edid-read.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109309]: https://bugs.freedesktop.org/show_bug.cgi?id=109309
  [i915#1849]: https://gitlab.freedesktop.org/drm/intel/issues/1849
  [i915#2782]: https://gitlab.freedesktop.org/drm/intel/issues/2782
  [i915#3180]: https://gitlab.freedesktop.org/drm/intel/issues/3180


Participating hosts (45 -> 41)
------------------------------

  Missing    (4): fi-ilk-m540 fi-bsw-cyan fi-bdw-samus fi-hsw-4200u 


Build changes
-------------

  * IGT: IGT_6076 -> IGTPW_5757
  * Linux: CI_DRM_10019 -> Patchwork_20009

  CI-20190529: 20190529
  CI_DRM_10019: acf28153df39c6dab44a8691ecaad05f1f37ed46 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGTPW_5757: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_5757/index.html
  IGT_6076: 9ab0820dbd07781161c1ace6973ea222fd24e53a @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_20009: 99d2530ba90b8230fa31735614b07a1c27f1117d @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

99d2530ba90b i915/query: Correlate engine and cpu timestamps with better accuracy

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20009/index.html


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Intel-gfx] ✓ Fi.CI.BAT: success for Add support for querying engine cycles
  2021-04-21 17:28 [Intel-gfx] [PATCH 0/1] " Umesh Nerlige Ramappa
@ 2021-04-21 18:50 ` Patchwork
  0 siblings, 0 replies; 26+ messages in thread
From: Patchwork @ 2021-04-21 18:50 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx



== Series Details ==

Series: Add support for querying engine cycles
URL   : https://patchwork.freedesktop.org/series/89314/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_9993 -> Patchwork_19966
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19966/index.html

Known issues
------------

  Here are the changes found in Patchwork_19966 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_suspend@basic-s3:
    - fi-tgl-u2:          [PASS][1] -> [FAIL][2] ([i915#1888])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9993/fi-tgl-u2/igt@gem_exec_suspend@basic-s3.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19966/fi-tgl-u2/igt@gem_exec_suspend@basic-s3.html

  
#### Possible fixes ####

  * igt@kms_frontbuffer_tracking@basic:
    - {fi-rkl-11500t}:    [SKIP][3] ([i915#1849] / [i915#3180]) -> [PASS][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9993/fi-rkl-11500t/igt@kms_frontbuffer_tracking@basic.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19966/fi-rkl-11500t/igt@kms_frontbuffer_tracking@basic.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [i915#1849]: https://gitlab.freedesktop.org/drm/intel/issues/1849
  [i915#1888]: https://gitlab.freedesktop.org/drm/intel/issues/1888
  [i915#3180]: https://gitlab.freedesktop.org/drm/intel/issues/3180


Participating hosts (42 -> 39)
------------------------------

  Missing    (3): fi-kbl-soraka fi-bsw-cyan fi-bdw-samus 


Build changes
-------------

  * IGT: IGT_6072 -> IGTPW_5757
  * Linux: CI_DRM_9993 -> Patchwork_19966

  CI-20190529: 20190529
  CI_DRM_9993: 629d3809e6d926c77ba5e9c5405e64eeba564560 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGTPW_5757: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_5757/index.html
  IGT_6072: 0a51f49df9f5ca535fc0206a27a6780de6b52320 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_19966: ca44314a5a5d70c39de70db3f333a2228809e1d4 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

ca44314a5a5d i915/query: Correlate engine and cpu timestamps with better accuracy

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19966/index.html


^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2021-04-29 11:15 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-27 21:49 [Intel-gfx] [PATCH 0/1] Add support for querying engine cycles Umesh Nerlige Ramappa
2021-04-27 21:49 ` [Intel-gfx] [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy Umesh Nerlige Ramappa
2021-04-28  8:43   ` Jani Nikula
2021-04-28  8:43     ` [Intel-gfx] " Jani Nikula
2021-04-28 19:24     ` Jason Ekstrand
2021-04-28 19:24       ` [Intel-gfx] " Jason Ekstrand
2021-04-28 19:49       ` Lionel Landwerlin
2021-04-28 19:49         ` [Intel-gfx] " Lionel Landwerlin
2021-04-28 19:54         ` Jason Ekstrand
2021-04-28 19:54           ` [Intel-gfx] " Jason Ekstrand
2021-04-28 20:14           ` Lionel Landwerlin
2021-04-28 20:14             ` [Intel-gfx] " Lionel Landwerlin
2021-04-28 20:16             ` Lionel Landwerlin
2021-04-28 20:16               ` [Intel-gfx] " Lionel Landwerlin
2021-04-28 20:45             ` Jason Ekstrand
2021-04-28 20:45               ` [Intel-gfx] " Jason Ekstrand
2021-04-28 21:18               ` Lionel Landwerlin
2021-04-28 21:18                 ` [Intel-gfx] " Lionel Landwerlin
2021-04-29 11:15     ` Daniel Vetter
2021-04-29 11:15       ` [Intel-gfx] " Daniel Vetter
2021-04-27 22:16 ` [Intel-gfx] ✗ Fi.CI.DOCS: warning for Add support for querying engine cycles Patchwork
2021-04-27 22:41 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2021-04-28  3:31 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
  -- strict thread matches above, loose matches on Subject: below --
2021-04-29  0:34 [PATCH 0/1] " Umesh Nerlige Ramappa
2021-04-29  1:59 ` [Intel-gfx] ✓ Fi.CI.BAT: success for " Patchwork
2021-04-27 21:53 [Intel-gfx] [PATCH 0/1] " Umesh Nerlige Ramappa
2021-04-28  0:22 ` [Intel-gfx] ✓ Fi.CI.BAT: success for " Patchwork
2021-04-21 17:28 [Intel-gfx] [PATCH 0/1] " Umesh Nerlige Ramappa
2021-04-21 18:50 ` [Intel-gfx] ✓ Fi.CI.BAT: success for " Patchwork
