* [Intel-gfx] [PATCH 00/19] Add DG2 OA support
@ 2022-08-23 20:41 Umesh Nerlige Ramappa
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 01/19] drm/i915/perf: Fix OA filtering logic for GuC mode Umesh Nerlige Ramappa
                   ` (22 more replies)
  0 siblings, 23 replies; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-08-23 20:41 UTC
  To: intel-gfx

Add OA format support for DG2 and various fixes for DG2.
The below 2 patches have uapi changes:

drm/i915/perf: Add OA formats for DG2
drm/i915/perf: Apply Wa_18013179988

v2:
- Drop inline (Jani)
- Repost as some patches did not make it to the ML
- Update Test-with id

Test-with: 20220823183036.5270-1-umesh.nerlige.ramappa@intel.com
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

Umesh Nerlige Ramappa (18):
  drm/i915/perf: Fix OA filtering logic for GuC mode
  drm/i915/perf: Add OA formats for DG2
  drm/i915/perf: Fix noa wait predication for DG2
  drm/i915/perf: Determine gen12 oa ctx offset at runtime
  drm/i915/perf: Enable commands per clock reporting in OA
  drm/i915/perf: Use helpers to process reports w.r.t. OA buffer size
  drm/i915/perf: Simply use stream->ctx
  drm/i915/perf: Move gt-specific data from i915->perf to gt->perf
  drm/i915/perf: Replace gt->perf.lock with stream->lock for file ops
  drm/i915/perf: Use gt-specific ggtt for OA and noa-wait buffers
  drm/i915/perf: Store a pointer to oa_format in oa_buffer
  drm/i915/perf: Parse 64bit report header formats correctly
  drm/i915/perf: Add Wa_16010703925:dg2
  drm/i915/perf: Add Wa_1608133521:dg2
  drm/i915/perf: Add Wa_1508761755:dg2
  drm/i915/perf: Apply Wa_18013179988
  drm/i915/perf: Save/restore EU flex counters across reset
  drm/i915/perf: Enable OA for DG2

Vinay Belgaumkar (1):
  drm/i915/guc: Support OA when Wa_16011777198 is enabled

 drivers/gpu/drm/i915/gt/intel_engine_regs.h   |   1 +
 drivers/gpu/drm/i915/gt/intel_gt_regs.h       |   1 +
 drivers/gpu/drm/i915/gt/intel_gt_types.h      |   3 +
 drivers/gpu/drm/i915/gt/intel_lrc.h           |   2 +
 drivers/gpu/drm/i915/gt/intel_sseu.c          |   4 +-
 .../drm/i915/gt/uc/abi/guc_actions_slpc_abi.h |   9 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c    |   8 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c   |  45 ++
 drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h   |   2 +
 drivers/gpu/drm/i915/i915_drv.h               |   3 +
 drivers/gpu/drm/i915/i915_getparam.c          |   3 +
 drivers/gpu/drm/i915/i915_pci.c               |   1 +
 drivers/gpu/drm/i915/i915_perf.c              | 760 ++++++++++++++----
 drivers/gpu/drm/i915/i915_perf.h              |   2 +
 drivers/gpu/drm/i915/i915_perf_oa_regs.h      |   6 +-
 drivers/gpu/drm/i915/i915_perf_types.h        |  53 +-
 drivers/gpu/drm/i915/intel_device_info.h      |   1 +
 drivers/gpu/drm/i915/selftests/i915_perf.c    |  16 +-
 include/uapi/drm/i915_drm.h                   |  12 +
 19 files changed, 737 insertions(+), 195 deletions(-)

-- 
2.25.1



* [Intel-gfx] [PATCH 01/19] drm/i915/perf: Fix OA filtering logic for GuC mode
  2022-08-23 20:41 [Intel-gfx] [PATCH 00/19] Add DG2 OA support Umesh Nerlige Ramappa
@ 2022-08-23 20:41 ` Umesh Nerlige Ramappa
  2022-09-06 14:33   ` Lionel Landwerlin
  2022-09-09 23:47   ` Dixit, Ashutosh
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 02/19] drm/i915/perf: Add OA formats for DG2 Umesh Nerlige Ramappa
                   ` (21 subsequent siblings)
  22 siblings, 2 replies; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-08-23 20:41 UTC
  To: intel-gfx

With GuC mode of submission, GuC is in control of defining the context id field
that is part of the OA reports. To filter reports, UMD and KMD must know what sw
context id was chosen by GuC. There is no interface between KMD and GuC to
determine this, so read the upper-dword of EXECLIST_STATUS to filter/squash OA
reports for the specific context.
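
For readers following the filtering arithmetic, here is a rough standalone sketch of how the context id field is isolated in the upper dword. The shift (39) and width (16) are the GEN12_GUC_SW_CTX_ID_* values this patch adds; the helper names are hypothetical and exist only for illustration, they are not driver functions.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from this patch's intel_lrc.h hunk. */
#define GEN12_GUC_SW_CTX_ID_SHIFT	39
#define GEN12_GUC_SW_CTX_ID_WIDTH	16

/*
 * Hypothetical helper: mask of a field as seen in the upper dword of a
 * 64-bit register, given the field's shift and width in the full register.
 */
uint32_t upper_dword_field_mask(unsigned int shift, unsigned int width)
{
	assert(shift >= 32 && width < 32 && shift + width <= 64);
	return ((1U << width) - 1) << (shift - 32);
}

/*
 * Hypothetical helper: the masked value that OA report context ids would
 * be compared against, given the upper dword read from EXECLIST_STATUS.
 */
uint32_t guc_ctx_id_for_filter(uint32_t execlist_status_hi)
{
	uint32_t mask = upper_dword_field_mask(GEN12_GUC_SW_CTX_ID_SHIFT,
					       GEN12_GUC_SW_CTX_ID_WIDTH);

	return execlist_status_hi & mask;
}
```

With these values the mask covers bits 7 to 22 of the upper dword, which is why the patch subtracts 32 from the shift before applying it.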

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_lrc.h |   2 +
 drivers/gpu/drm/i915/i915_perf.c    | 141 ++++++++++++++++++++++++----
 2 files changed, 124 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.h b/drivers/gpu/drm/i915/gt/intel_lrc.h
index a390f0813c8b..7111bae759f3 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.h
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.h
@@ -110,6 +110,8 @@ enum {
 #define XEHP_SW_CTX_ID_WIDTH			16
 #define XEHP_SW_COUNTER_SHIFT			58
 #define XEHP_SW_COUNTER_WIDTH			6
+#define GEN12_GUC_SW_CTX_ID_SHIFT		39
+#define GEN12_GUC_SW_CTX_ID_WIDTH		16
 
 static inline void lrc_runtime_start(struct intel_context *ce)
 {
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index f3c23fe9ad9c..735244a3aedd 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1233,6 +1233,125 @@ static struct intel_context *oa_pin_context(struct i915_perf_stream *stream)
 	return stream->pinned_ctx;
 }
 
+static int
+__store_reg_to_mem(struct i915_request *rq, i915_reg_t reg, u32 ggtt_offset)
+{
+	u32 *cs, cmd;
+
+	cmd = MI_STORE_REGISTER_MEM | MI_SRM_LRM_GLOBAL_GTT;
+	if (GRAPHICS_VER(rq->engine->i915) >= 8)
+		cmd++;
+
+	cs = intel_ring_begin(rq, 4);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	*cs++ = cmd;
+	*cs++ = i915_mmio_reg_offset(reg);
+	*cs++ = ggtt_offset;
+	*cs++ = 0;
+
+	intel_ring_advance(rq, cs);
+
+	return 0;
+}
+
+static int
+__read_reg(struct intel_context *ce, i915_reg_t reg, u32 ggtt_offset)
+{
+	struct i915_request *rq;
+	int err;
+
+	rq = i915_request_create(ce);
+	if (IS_ERR(rq))
+		return PTR_ERR(rq);
+
+	i915_request_get(rq);
+
+	err = __store_reg_to_mem(rq, reg, ggtt_offset);
+
+	i915_request_add(rq);
+	if (!err && i915_request_wait(rq, 0, HZ / 2) < 0)
+		err = -ETIME;
+
+	i915_request_put(rq);
+
+	return err;
+}
+
+static int
+gen12_guc_sw_ctx_id(struct intel_context *ce, u32 *ctx_id)
+{
+	struct i915_vma *scratch;
+	u32 *val;
+	int err;
+
+	scratch = __vm_create_scratch_for_read_pinned(&ce->engine->gt->ggtt->vm, 4);
+	if (IS_ERR(scratch))
+		return PTR_ERR(scratch);
+
+	err = i915_vma_sync(scratch);
+	if (err)
+		goto err_scratch;
+
+	err = __read_reg(ce, RING_EXECLIST_STATUS_HI(ce->engine->mmio_base),
+			 i915_ggtt_offset(scratch));
+	if (err)
+		goto err_scratch;
+
+	val = i915_gem_object_pin_map_unlocked(scratch->obj, I915_MAP_WB);
+	if (IS_ERR(val)) {
+		err = PTR_ERR(val);
+		goto err_scratch;
+	}
+
+	*ctx_id = *val;
+	i915_gem_object_unpin_map(scratch->obj);
+
+err_scratch:
+	i915_vma_unpin_and_release(&scratch, 0);
+	return err;
+}
+
+/*
+ * For execlist mode of submission, pick an unused context id
+ * 0 - (NUM_CONTEXT_TAG -1) are used by other contexts
+ * XXX_MAX_CONTEXT_HW_ID is used by idle context
+ *
+ * For GuC mode of submission read context id from the upper dword of the
+ * EXECLIST_STATUS register.
+ */
+static int gen12_get_render_context_id(struct i915_perf_stream *stream)
+{
+	u32 ctx_id, mask;
+	int ret;
+
+	if (intel_engine_uses_guc(stream->engine)) {
+		ret = gen12_guc_sw_ctx_id(stream->pinned_ctx, &ctx_id);
+		if (ret)
+			return ret;
+
+		mask = ((1U << GEN12_GUC_SW_CTX_ID_WIDTH) - 1) <<
+			(GEN12_GUC_SW_CTX_ID_SHIFT - 32);
+	} else if (GRAPHICS_VER_FULL(stream->engine->i915) >= IP_VER(12, 50)) {
+		ctx_id = (XEHP_MAX_CONTEXT_HW_ID - 1) <<
+			(XEHP_SW_CTX_ID_SHIFT - 32);
+
+		mask = ((1U << XEHP_SW_CTX_ID_WIDTH) - 1) <<
+			(XEHP_SW_CTX_ID_SHIFT - 32);
+	} else {
+		ctx_id = (GEN12_MAX_CONTEXT_HW_ID - 1) <<
+			 (GEN11_SW_CTX_ID_SHIFT - 32);
+
+		mask = ((1U << GEN11_SW_CTX_ID_WIDTH) - 1) <<
+			(GEN11_SW_CTX_ID_SHIFT - 32);
+	}
+	stream->specific_ctx_id = ctx_id & mask;
+	stream->specific_ctx_id_mask = mask;
+
+	return 0;
+}
+
 /**
  * oa_get_render_ctx_id - determine and hold ctx hw id
  * @stream: An i915-perf stream opened for OA metrics
@@ -1246,6 +1365,7 @@ static struct intel_context *oa_pin_context(struct i915_perf_stream *stream)
 static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
 {
 	struct intel_context *ce;
+	int ret = 0;
 
 	ce = oa_pin_context(stream);
 	if (IS_ERR(ce))
@@ -1292,24 +1412,7 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
 
 	case 11:
 	case 12:
-		if (GRAPHICS_VER_FULL(ce->engine->i915) >= IP_VER(12, 50)) {
-			stream->specific_ctx_id_mask =
-				((1U << XEHP_SW_CTX_ID_WIDTH) - 1) <<
-				(XEHP_SW_CTX_ID_SHIFT - 32);
-			stream->specific_ctx_id =
-				(XEHP_MAX_CONTEXT_HW_ID - 1) <<
-				(XEHP_SW_CTX_ID_SHIFT - 32);
-		} else {
-			stream->specific_ctx_id_mask =
-				((1U << GEN11_SW_CTX_ID_WIDTH) - 1) << (GEN11_SW_CTX_ID_SHIFT - 32);
-			/*
-			 * Pick an unused context id
-			 * 0 - BITS_PER_LONG are used by other contexts
-			 * GEN12_MAX_CONTEXT_HW_ID (0x7ff) is used by idle context
-			 */
-			stream->specific_ctx_id =
-				(GEN12_MAX_CONTEXT_HW_ID - 1) << (GEN11_SW_CTX_ID_SHIFT - 32);
-		}
+		ret = gen12_get_render_context_id(stream);
 		break;
 
 	default:
@@ -1323,7 +1426,7 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
 		stream->specific_ctx_id,
 		stream->specific_ctx_id_mask);
 
-	return 0;
+	return ret;
 }
 
 /**
-- 
2.25.1



* [Intel-gfx] [PATCH 02/19] drm/i915/perf: Add OA formats for DG2
  2022-08-23 20:41 [Intel-gfx] [PATCH 00/19] Add DG2 OA support Umesh Nerlige Ramappa
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 01/19] drm/i915/perf: Fix OA filtering logic for GuC mode Umesh Nerlige Ramappa
@ 2022-08-23 20:41 ` Umesh Nerlige Ramappa
  2022-09-06 19:35   ` Lionel Landwerlin
  2022-09-13 15:40   ` Dixit, Ashutosh
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 03/19] drm/i915/perf: Fix noa wait predication " Umesh Nerlige Ramappa
                   ` (20 subsequent siblings)
  22 siblings, 2 replies; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-08-23 20:41 UTC
  To: intel-gfx

Add new OA formats for DG2. Some of the newer OA formats are not
multiples of 64 bytes and are not powers of 2. For those formats, adjust
hw_tail accordingly when checking for new reports.
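
To illustrate the tail adjustment, a minimal sketch under simplifying assumptions: all pointers are plain offsets from the start of the buffer (the driver works with GGTT addresses), the buffer size is a power of two, and the previously accepted tail sits on a report boundary. The function names are hypothetical; only the modulo-and-subtract idea comes from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes between two offsets in a ring of buf_size bytes (power of two).
 * Same shape as the driver's OA_TAKEN(), with the size passed explicitly.
 */
uint32_t ring_taken(uint32_t tail, uint32_t head, uint32_t buf_size)
{
	return (tail - head) & (buf_size - 1);
}

/*
 * Pull hw_tail back to the end of the last whole report. The hardware
 * tail moves in 64 byte steps, so with a 192 or 448 byte report it may
 * point into the middle of a report that has not fully landed yet.
 */
uint32_t align_tail_to_report(uint32_t hw_tail, uint32_t old_tail,
			      uint32_t report_size, uint32_t buf_size)
{
	uint32_t partial;

	assert(report_size && buf_size && !(buf_size & (buf_size - 1)));

	partial = ring_taken(hw_tail, old_tail, buf_size) % report_size;

	/* Subtract the partial amount, wrapping within the ring. */
	return (hw_tail - partial) & (buf_size - 1);
}
```

For example, with 192 byte reports, an old tail of 0 and a hardware tail of 512, two whole reports (384 bytes) are available and the remaining 128 bytes are treated as a partially landed report.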

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 63 ++++++++++++++++++++------------
 include/uapi/drm/i915_drm.h      |  6 +++
 2 files changed, 46 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 735244a3aedd..c8331b549d31 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -306,7 +306,8 @@ static u32 i915_oa_max_sample_rate = 100000;
 
 /* XXX: beware if future OA HW adds new report formats that the current
  * code assumes all reports have a power-of-two size and ~(size - 1) can
- * be used as a mask to align the OA tail pointer.
+ * be used as a mask to align the OA tail pointer. In some of the
+ * formats, R is used to denote reserved field.
  */
 static const struct i915_oa_format oa_formats[I915_OA_FORMAT_MAX] = {
 	[I915_OA_FORMAT_A13]	    = { 0, 64 },
@@ -320,6 +321,10 @@ static const struct i915_oa_format oa_formats[I915_OA_FORMAT_MAX] = {
 	[I915_OA_FORMAT_A12]		    = { 0, 64 },
 	[I915_OA_FORMAT_A12_B8_C8]	    = { 2, 128 },
 	[I915_OA_FORMAT_A32u40_A4u32_B8_C8] = { 5, 256 },
+	[I915_OAR_FORMAT_A32u40_A4u32_B8_C8]    = { 5, 256 },
+	[I915_OA_FORMAT_A24u40_A14u32_B8_C8]    = { 5, 256 },
+	[I915_OAR_FORMAT_A36u64_B8_C8]		= { 1, 384 },
+	[I915_OA_FORMAT_A38u64_R2u64_B8_C8]	= { 1, 448 },
 };
 
 #define SAMPLE_OA_REPORT      (1<<0)
@@ -467,6 +472,7 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 	bool pollin;
 	u32 hw_tail;
 	u64 now;
+	u32 partial_report_size;
 
 	/* We have to consider the (unlikely) possibility that read() errors
 	 * could result in an OA buffer reset which might reset the head and
@@ -476,10 +482,16 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 
 	hw_tail = stream->perf->ops.oa_hw_tail_read(stream);
 
-	/* The tail pointer increases in 64 byte increments,
-	 * not in report_size steps...
+	/* The tail pointer increases in 64 byte increments, whereas report
+	 * sizes need not be integral multiples of 64 or powers of 2.
+	 * Compute potentially partially landed report in the OA buffer
 	 */
-	hw_tail &= ~(report_size - 1);
+	partial_report_size = OA_TAKEN(hw_tail, stream->oa_buffer.tail);
+	partial_report_size %= report_size;
+
+	/* Subtract partial amount off the tail */
+	hw_tail = gtt_offset + ((hw_tail - partial_report_size) &
+				(stream->oa_buffer.vma->size - 1));
 
 	now = ktime_get_mono_fast_ns();
 
@@ -601,6 +613,8 @@ static int append_oa_sample(struct i915_perf_stream *stream,
 {
 	int report_size = stream->oa_buffer.format_size;
 	struct drm_i915_perf_record_header header;
+	int report_size_partial;
+	u8 *oa_buf_end;
 
 	header.type = DRM_I915_PERF_RECORD_SAMPLE;
 	header.pad = 0;
@@ -614,7 +628,19 @@ static int append_oa_sample(struct i915_perf_stream *stream,
 		return -EFAULT;
 	buf += sizeof(header);
 
-	if (copy_to_user(buf, report, report_size))
+	oa_buf_end = stream->oa_buffer.vaddr +
+		     stream->oa_buffer.vma->size;
+	report_size_partial = oa_buf_end - report;
+
+	if (report_size_partial < report_size) {
+		if (copy_to_user(buf, report, report_size_partial))
+			return -EFAULT;
+		buf += report_size_partial;
+
+		if (copy_to_user(buf, stream->oa_buffer.vaddr,
+				 report_size - report_size_partial))
+			return -EFAULT;
+	} else if (copy_to_user(buf, report, report_size))
 		return -EFAULT;
 
 	(*offset) += header.size;
@@ -684,8 +710,8 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 	 * all a power of two).
 	 */
 	if (drm_WARN_ONCE(&uncore->i915->drm,
-			  head > OA_BUFFER_SIZE || head % report_size ||
-			  tail > OA_BUFFER_SIZE || tail % report_size,
+			  head > stream->oa_buffer.vma->size ||
+			  tail > stream->oa_buffer.vma->size,
 			  "Inconsistent OA buffer pointers: head = %u, tail = %u\n",
 			  head, tail))
 		return -EIO;
@@ -699,22 +725,6 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 		u32 ctx_id;
 		u32 reason;
 
-		/*
-		 * All the report sizes factor neatly into the buffer
-		 * size so we never expect to see a report split
-		 * between the beginning and end of the buffer.
-		 *
-		 * Given the initial alignment check a misalignment
-		 * here would imply a driver bug that would result
-		 * in an overrun.
-		 */
-		if (drm_WARN_ON(&uncore->i915->drm,
-				(OA_BUFFER_SIZE - head) < report_size)) {
-			drm_err(&uncore->i915->drm,
-				"Spurious OA head ptr: non-integral report offset\n");
-			break;
-		}
-
 		/*
 		 * The reason field includes flags identifying what
 		 * triggered this specific report (mostly timer
@@ -4513,6 +4523,13 @@ static void oa_init_supported_formats(struct i915_perf *perf)
 		oa_format_add(perf, I915_OA_FORMAT_C4_B8);
 		break;
 
+	case INTEL_DG2:
+		oa_format_add(perf, I915_OAR_FORMAT_A32u40_A4u32_B8_C8);
+		oa_format_add(perf, I915_OA_FORMAT_A24u40_A14u32_B8_C8);
+		oa_format_add(perf, I915_OAR_FORMAT_A36u64_B8_C8);
+		oa_format_add(perf, I915_OA_FORMAT_A38u64_R2u64_B8_C8);
+		break;
+
 	default:
 		MISSING_CASE(platform);
 	}
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 520ad2691a99..d20d723925b5 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -2650,6 +2650,12 @@ enum drm_i915_oa_format {
 	I915_OA_FORMAT_A12_B8_C8,
 	I915_OA_FORMAT_A32u40_A4u32_B8_C8,
 
+	/* DG2 */
+	I915_OAR_FORMAT_A32u40_A4u32_B8_C8,
+	I915_OA_FORMAT_A24u40_A14u32_B8_C8,
+	I915_OAR_FORMAT_A36u64_B8_C8,
+	I915_OA_FORMAT_A38u64_R2u64_B8_C8,
+
 	I915_OA_FORMAT_MAX	    /* non-ABI */
 };
 
-- 
2.25.1



* [Intel-gfx] [PATCH 03/19] drm/i915/perf: Fix noa wait predication for DG2
  2022-08-23 20:41 [Intel-gfx] [PATCH 00/19] Add DG2 OA support Umesh Nerlige Ramappa
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 01/19] drm/i915/perf: Fix OA filtering logic for GuC mode Umesh Nerlige Ramappa
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 02/19] drm/i915/perf: Add OA formats for DG2 Umesh Nerlige Ramappa
@ 2022-08-23 20:41 ` Umesh Nerlige Ramappa
  2022-09-20  0:35   ` Dixit, Ashutosh
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 04/19] drm/i915/perf: Determine gen12 oa ctx offset at runtime Umesh Nerlige Ramappa
                   ` (19 subsequent siblings)
  22 siblings, 1 reply; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-08-23 20:41 UTC
  To: intel-gfx

Predication for batch buffer commands changed in XEHPSDV.
MI_BATCH_BUFFER_START predicates based on MI_SET_PREDICATE_RESULT
register. The MI_SET_PREDICATE_RESULT register can only be modified
with MI_SET_PREDICATE command. When configured, the MI_SET_PREDICATE
command sets MI_SET_PREDICATE_RESULT based on bit 0 of
MI_PREDICATE_RESULT_2. Use this to configure predication in noa_wait.
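
As a sketch of the resulting command sequence, the helper below writes a predicated jump into a dword buffer, bracketing it with MI_SET_PREDICATE when the platform needs it. The command encodings are assumptions for illustration and should be checked against intel_gpu_commands.h; the helper names are hypothetical and no real batch is built here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed encodings, for illustration only. */
#define MI_INSTR(opcode, flags)	(((uint32_t)(opcode) << 23) | (flags))
#define MI_SET_PREDICATE	MI_INSTR(0x01, 0)
#define MI_BATCH_BUFFER_START	MI_INSTR(0x31, 1)
#define MI_BATCH_PREDICATE	(1u << 15)

/*
 * Hypothetical helper: emit a predicated jump to 'target'. On platforms
 * with MI_SET_PREDICATE the jump is wrapped in an enable/disable pair,
 * mirroring the "| 1" and plain forms used in this patch.
 */
uint32_t *emit_predicated_jump(uint32_t *cs, int has_mi_set_predicate,
			       uint32_t target)
{
	assert(cs);

	if (has_mi_set_predicate)
		*cs++ = MI_SET_PREDICATE | 1;	/* enable predication */

	*cs++ = MI_BATCH_BUFFER_START | MI_BATCH_PREDICATE;
	*cs++ = target;				/* lower 32 address bits */
	*cs++ = 0;				/* upper 32 address bits */

	if (has_mi_set_predicate)
		*cs++ = MI_SET_PREDICATE;	/* disable predication */

	return cs;
}

/* Number of dwords the sequence occupies, e.g. for sizing a batch. */
size_t predicated_jump_dwords(int has_mi_set_predicate)
{
	uint32_t scratch[8];

	return (size_t)(emit_predicated_jump(scratch, has_mi_set_predicate, 0) -
			scratch);
}
```

The sequence grows from three to five dwords on platforms with MI_SET_PREDICATE, which matters when checking that the noa_wait batch still fits its buffer.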

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine_regs.h |  1 +
 drivers/gpu/drm/i915/i915_perf.c            | 24 +++++++++++++++++----
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_regs.h b/drivers/gpu/drm/i915/gt/intel_engine_regs.h
index 889f0df3940b..25d23f3a4769 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_regs.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_regs.h
@@ -200,6 +200,7 @@
 #define RING_CONTEXT_STATUS_PTR(base)		_MMIO((base) + 0x3a0)
 #define RING_CTX_TIMESTAMP(base)		_MMIO((base) + 0x3a8) /* gen8+ */
 #define RING_PREDICATE_RESULT(base)		_MMIO((base) + 0x3b8)
+#define MI_PREDICATE_RESULT_2_ENGINE(base)	_MMIO((base) + 0x3bc)
 #define RING_FORCE_TO_NONPRIV(base, i)		_MMIO(((base) + 0x4D0) + (i) * 4)
 #define   RING_FORCE_TO_NONPRIV_DENY		REG_BIT(30)
 #define   RING_FORCE_TO_NONPRIV_ADDRESS_MASK	REG_GENMASK(25, 2)
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index c8331b549d31..3526693d64fa 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -286,6 +286,7 @@ static u32 i915_perf_stream_paranoid = true;
 #define OAREPORT_REASON_CTX_SWITCH     (1<<3)
 #define OAREPORT_REASON_CLK_RATIO      (1<<5)
 
+#define HAS_MI_SET_PREDICATE(i915) (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50))
 
 /* For sysctl proc_dointvec_minmax of i915_oa_max_sample_rate
  *
@@ -1766,6 +1767,9 @@ static int alloc_noa_wait(struct i915_perf_stream *stream)
 		DELTA_TARGET,
 		N_CS_GPR
 	};
+	i915_reg_t mi_predicate_result = HAS_MI_SET_PREDICATE(i915) ?
+					  MI_PREDICATE_RESULT_2_ENGINE(base) :
+					  MI_PREDICATE_RESULT_1(RENDER_RING_BASE);
 
 	bo = i915_gem_object_create_internal(i915, 4096);
 	if (IS_ERR(bo)) {
@@ -1803,7 +1807,7 @@ static int alloc_noa_wait(struct i915_perf_stream *stream)
 			stream, cs, true /* save */, CS_GPR(i),
 			INTEL_GT_SCRATCH_FIELD_PERF_CS_GPR + 8 * i, 2);
 	cs = save_restore_register(
-		stream, cs, true /* save */, MI_PREDICATE_RESULT_1(RENDER_RING_BASE),
+		stream, cs, true /* save */, mi_predicate_result,
 		INTEL_GT_SCRATCH_FIELD_PERF_PREDICATE_RESULT_1, 1);
 
 	/* First timestamp snapshot location. */
@@ -1857,7 +1861,10 @@ static int alloc_noa_wait(struct i915_perf_stream *stream)
 	 */
 	*cs++ = MI_LOAD_REGISTER_REG | (3 - 2);
 	*cs++ = i915_mmio_reg_offset(CS_GPR(JUMP_PREDICATE));
-	*cs++ = i915_mmio_reg_offset(MI_PREDICATE_RESULT_1(RENDER_RING_BASE));
+	*cs++ = i915_mmio_reg_offset(mi_predicate_result);
+
+	if (HAS_MI_SET_PREDICATE(i915))
+		*cs++ = MI_SET_PREDICATE | 1;
 
 	/* Restart from the beginning if we had timestamps roll over. */
 	*cs++ = (GRAPHICS_VER(i915) < 8 ?
@@ -1867,6 +1874,9 @@ static int alloc_noa_wait(struct i915_perf_stream *stream)
 	*cs++ = i915_ggtt_offset(vma) + (ts0 - batch) * 4;
 	*cs++ = 0;
 
+	if (HAS_MI_SET_PREDICATE(i915))
+		*cs++ = MI_SET_PREDICATE;
+
 	/*
 	 * Now add the diff between to previous timestamps and add it to :
 	 *      (((1 * << 64) - 1) - delay_ns)
@@ -1894,7 +1904,10 @@ static int alloc_noa_wait(struct i915_perf_stream *stream)
 	 */
 	*cs++ = MI_LOAD_REGISTER_REG | (3 - 2);
 	*cs++ = i915_mmio_reg_offset(CS_GPR(JUMP_PREDICATE));
-	*cs++ = i915_mmio_reg_offset(MI_PREDICATE_RESULT_1(RENDER_RING_BASE));
+	*cs++ = i915_mmio_reg_offset(mi_predicate_result);
+
+	if (HAS_MI_SET_PREDICATE(i915))
+		*cs++ = MI_SET_PREDICATE | 1;
 
 	/* Predicate the jump.  */
 	*cs++ = (GRAPHICS_VER(i915) < 8 ?
@@ -1904,13 +1917,16 @@ static int alloc_noa_wait(struct i915_perf_stream *stream)
 	*cs++ = i915_ggtt_offset(vma) + (jump - batch) * 4;
 	*cs++ = 0;
 
+	if (HAS_MI_SET_PREDICATE(i915))
+		*cs++ = MI_SET_PREDICATE;
+
 	/* Restore registers. */
 	for (i = 0; i < N_CS_GPR; i++)
 		cs = save_restore_register(
 			stream, cs, false /* restore */, CS_GPR(i),
 			INTEL_GT_SCRATCH_FIELD_PERF_CS_GPR + 8 * i, 2);
 	cs = save_restore_register(
-		stream, cs, false /* restore */, MI_PREDICATE_RESULT_1(RENDER_RING_BASE),
+		stream, cs, false /* restore */, mi_predicate_result,
 		INTEL_GT_SCRATCH_FIELD_PERF_PREDICATE_RESULT_1, 1);
 
 	/* And return to the ring. */
-- 
2.25.1



* [Intel-gfx] [PATCH 04/19] drm/i915/perf: Determine gen12 oa ctx offset at runtime
  2022-08-23 20:41 [Intel-gfx] [PATCH 00/19] Add DG2 OA support Umesh Nerlige Ramappa
                   ` (2 preceding siblings ...)
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 03/19] drm/i915/perf: Fix noa wait predication " Umesh Nerlige Ramappa
@ 2022-08-23 20:41 ` Umesh Nerlige Ramappa
  2022-09-06 19:48   ` Lionel Landwerlin
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 05/19] drm/i915/perf: Enable commands per clock reporting in OA Umesh Nerlige Ramappa
                   ` (18 subsequent siblings)
  22 siblings, 1 reply; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-08-23 20:41 UTC
  To: intel-gfx

Some SKUs of the same gen12 platform may have different oactxctrl
offsets. For gen12, determine oactxctrl offsets at runtime.
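
The runtime lookup amounts to walking the context image, finding each MI_LOAD_REGISTER_IMM block, and searching its (register, value) pairs for the wanted register. Below is a standalone, bounds-checked sketch of that idea. The MI_* macros follow the ones in this patch; the function names and the example image are made up for illustration and do not reflect a real context layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MI_INSTR(opcode, flags)	(((uint32_t)(opcode) << 23) | (flags))
#define MI_OPCODE(x)		(((x) >> 23) & 0x3f)
#define IS_MI_LRI_CMD(x)	(MI_OPCODE(x) == 0x22)
#define MI_LRI_LEN(x)		(((x) & 0xff) + 1)	/* payload dwords */
#define OFFSET_NOT_FOUND	UINT32_MAX

/*
 * Return the dword index at which 'reg' appears as the register half of
 * an LRI pair, or OFFSET_NOT_FOUND. The register's value then lives at
 * the returned index + 1.
 */
uint32_t find_reg_dword_offset(const uint32_t *state, uint32_t len,
			       uint32_t reg)
{
	uint32_t idx = 0;

	assert(state);

	while (idx < len) {
		uint32_t end, i;

		if (!IS_MI_LRI_CMD(state[idx])) {
			idx++;
			continue;
		}

		/* Header dword plus payload, clamped to the image size. */
		end = idx + 1 + MI_LRI_LEN(state[idx]);
		if (end > len)
			end = len;

		/* Only every other dword is a register address. */
		for (i = idx + 1; i + 1 < end; i += 2)
			if (state[i] == reg)
				return i;

		idx = end;
	}

	return OFFSET_NOT_FOUND;
}

/* Made-up image: NOOP, LRI with two pairs, NOOP, LRI with one pair. */
static const uint32_t example_image[] = {
	0x00000000,
	MI_INSTR(0x22, 3), 0x2244, 0xaaaa, 0x2360, 0xbbbb,
	0x00000000,
	MI_INSTR(0x22, 1), 0x2580, 0x0001,
};

uint32_t example_lookup(uint32_t reg)
{
	return find_reg_dword_offset(example_image,
				     sizeof(example_image) /
				     sizeof(example_image[0]), reg);
}
```

Stepping by two inside an LRI block keeps a value dword that happens to equal the register address from being reported as a match; the clamp on the block end is an addition of this sketch.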

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c         | 149 ++++++++++++++++++-----
 drivers/gpu/drm/i915/i915_perf_oa_regs.h |   2 +-
 2 files changed, 120 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 3526693d64fa..efa7eda83edd 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1363,6 +1363,67 @@ static int gen12_get_render_context_id(struct i915_perf_stream *stream)
 	return 0;
 }
 
+#define MI_OPCODE(x) (((x) >> 23) & 0x3f)
+#define IS_MI_LRI_CMD(x) (MI_OPCODE(x) == MI_OPCODE(MI_INSTR(0x22, 0)))
+#define MI_LRI_LEN(x) (((x) & 0xff) + 1)
+#define __valid_oactxctrl_offset(x) ((x) && (x) != U32_MAX)
+static bool __find_reg_in_lri(u32 *state, u32 reg, u32 *offset)
+{
+	u32 idx = *offset;
+	u32 len = MI_LRI_LEN(state[idx]) + idx;
+
+	idx++;
+	for (; idx < len; idx += 2)
+		if (state[idx] == reg)
+			break;
+
+	*offset = idx;
+	return state[idx] == reg;
+}
+
+static u32 __context_image_offset(struct intel_context *ce, u32 reg)
+{
+	u32 offset, len = (ce->engine->context_size - PAGE_SIZE) / 4;
+	u32 *state = ce->lrc_reg_state;
+
+	for (offset = 0; offset < len; ) {
+		if (IS_MI_LRI_CMD(state[offset])) {
+			if (__find_reg_in_lri(state, reg, &offset))
+				break;
+		} else {
+			offset++;
+		}
+	}
+
+	return offset < len ? offset : U32_MAX;
+}
+
+static int __set_oa_ctx_ctrl_offset(struct intel_context *ce)
+{
+	i915_reg_t reg = GEN12_OACTXCONTROL(ce->engine->mmio_base);
+	struct i915_perf *perf = &ce->engine->i915->perf;
+	u32 saved_offset = perf->ctx_oactxctrl_offset;
+	u32 offset;
+
+	/* Do this only once. Failure is stored as offset of U32_MAX */
+	if (saved_offset)
+		return 0;
+
+	offset = __context_image_offset(ce, i915_mmio_reg_offset(reg));
+	perf->ctx_oactxctrl_offset = offset;
+
+	drm_dbg(&ce->engine->i915->drm,
+		"%s oa ctx control at 0x%08x dword offset\n",
+		ce->engine->name, offset);
+
+	return __valid_oactxctrl_offset(offset) ? 0 : -ENODEV;
+}
+
+static bool engine_supports_mi_query(struct intel_engine_cs *engine)
+{
+	return engine->class == RENDER_CLASS;
+}
+
 /**
  * oa_get_render_ctx_id - determine and hold ctx hw id
  * @stream: An i915-perf stream opened for OA metrics
@@ -1382,6 +1443,17 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
 	if (IS_ERR(ce))
 		return PTR_ERR(ce);
 
+	if (engine_supports_mi_query(stream->engine)) {
+		ret = __set_oa_ctx_ctrl_offset(ce);
+		if (ret && !(stream->sample_flags & SAMPLE_OA_REPORT)) {
+			intel_context_unpin(ce);
+			drm_err(&stream->perf->i915->drm,
+				"Enabling perf query failed for %s\n",
+				stream->engine->name);
+			return ret;
+		}
+	}
+
 	switch (GRAPHICS_VER(ce->engine->i915)) {
 	case 7: {
 		/*
@@ -2412,10 +2484,11 @@ static int gen12_configure_oar_context(struct i915_perf_stream *stream,
 	int err;
 	struct intel_context *ce = stream->pinned_ctx;
 	u32 format = stream->oa_buffer.format;
+	u32 offset = stream->perf->ctx_oactxctrl_offset;
 	struct flex regs_context[] = {
 		{
 			GEN8_OACTXCONTROL,
-			stream->perf->ctx_oactxctrl_offset + 1,
+			offset + 1,
 			active ? GEN8_OA_COUNTER_RESUME : 0,
 		},
 	};
@@ -2440,15 +2513,18 @@ static int gen12_configure_oar_context(struct i915_perf_stream *stream,
 		},
 	};
 
-	/* Modify the context image of pinned context with regs_context*/
-	err = intel_context_lock_pinned(ce);
-	if (err)
-		return err;
+	/* Modify the context image of pinned context with regs_context */
+	if (__valid_oactxctrl_offset(offset)) {
+		err = intel_context_lock_pinned(ce);
+		if (err)
+			return err;
 
-	err = gen8_modify_context(ce, regs_context, ARRAY_SIZE(regs_context));
-	intel_context_unlock_pinned(ce);
-	if (err)
-		return err;
+		err = gen8_modify_context(ce, regs_context,
+					  ARRAY_SIZE(regs_context));
+		intel_context_unlock_pinned(ce);
+		if (err)
+			return err;
+	}
 
 	/* Apply regs_lri using LRI with pinned context */
 	return gen8_modify_self(ce, regs_lri, ARRAY_SIZE(regs_lri), active);
@@ -2570,6 +2646,7 @@ lrc_configure_all_contexts(struct i915_perf_stream *stream,
 			   const struct i915_oa_config *oa_config,
 			   struct i915_active *active)
 {
+	u32 ctx_oactxctrl = stream->perf->ctx_oactxctrl_offset;
 	/* The MMIO offsets for Flex EU registers aren't contiguous */
 	const u32 ctx_flexeu0 = stream->perf->ctx_flexeu0_offset;
 #define ctx_flexeuN(N) (ctx_flexeu0 + 2 * (N) + 1)
@@ -2580,7 +2657,7 @@ lrc_configure_all_contexts(struct i915_perf_stream *stream,
 		},
 		{
 			GEN8_OACTXCONTROL,
-			stream->perf->ctx_oactxctrl_offset + 1,
+			ctx_oactxctrl + 1,
 		},
 		{ EU_PERF_CNTL0, ctx_flexeuN(0) },
 		{ EU_PERF_CNTL1, ctx_flexeuN(1) },
@@ -4551,6 +4628,37 @@ static void oa_init_supported_formats(struct i915_perf *perf)
 	}
 }
 
+static void i915_perf_init_info(struct drm_i915_private *i915)
+{
+	struct i915_perf *perf = &i915->perf;
+
+	switch (GRAPHICS_VER(i915)) {
+	case 8:
+		perf->ctx_oactxctrl_offset = 0x120;
+		perf->ctx_flexeu0_offset = 0x2ce;
+		perf->gen8_valid_ctx_bit = BIT(25);
+		break;
+	case 9:
+		perf->ctx_oactxctrl_offset = 0x128;
+		perf->ctx_flexeu0_offset = 0x3de;
+		perf->gen8_valid_ctx_bit = BIT(16);
+		break;
+	case 11:
+		perf->ctx_oactxctrl_offset = 0x124;
+		perf->ctx_flexeu0_offset = 0x78e;
+		perf->gen8_valid_ctx_bit = BIT(16);
+		break;
+	case 12:
+		/*
+		 * Calculate offset at runtime in oa_pin_context for gen12 and
+		 * cache the value in perf->ctx_oactxctrl_offset.
+		 */
+		break;
+	default:
+		MISSING_CASE(GRAPHICS_VER(i915));
+	}
+}
+
 /**
  * i915_perf_init - initialize i915-perf state on module bind
  * @i915: i915 device instance
@@ -4589,6 +4697,7 @@ void i915_perf_init(struct drm_i915_private *i915)
 		 * execlist mode by default.
 		 */
 		perf->ops.read = gen8_oa_read;
+		i915_perf_init_info(i915);
 
 		if (IS_GRAPHICS_VER(i915, 8, 9)) {
 			perf->ops.is_valid_b_counter_reg =
@@ -4608,18 +4717,6 @@ void i915_perf_init(struct drm_i915_private *i915)
 			perf->ops.enable_metric_set = gen8_enable_metric_set;
 			perf->ops.disable_metric_set = gen8_disable_metric_set;
 			perf->ops.oa_hw_tail_read = gen8_oa_hw_tail_read;
-
-			if (GRAPHICS_VER(i915) == 8) {
-				perf->ctx_oactxctrl_offset = 0x120;
-				perf->ctx_flexeu0_offset = 0x2ce;
-
-				perf->gen8_valid_ctx_bit = BIT(25);
-			} else {
-				perf->ctx_oactxctrl_offset = 0x128;
-				perf->ctx_flexeu0_offset = 0x3de;
-
-				perf->gen8_valid_ctx_bit = BIT(16);
-			}
 		} else if (GRAPHICS_VER(i915) == 11) {
 			perf->ops.is_valid_b_counter_reg =
 				gen7_is_valid_b_counter_addr;
@@ -4633,11 +4730,6 @@ void i915_perf_init(struct drm_i915_private *i915)
 			perf->ops.enable_metric_set = gen8_enable_metric_set;
 			perf->ops.disable_metric_set = gen11_disable_metric_set;
 			perf->ops.oa_hw_tail_read = gen8_oa_hw_tail_read;
-
-			perf->ctx_oactxctrl_offset = 0x124;
-			perf->ctx_flexeu0_offset = 0x78e;
-
-			perf->gen8_valid_ctx_bit = BIT(16);
 		} else if (GRAPHICS_VER(i915) == 12) {
 			perf->ops.is_valid_b_counter_reg =
 				gen12_is_valid_b_counter_addr;
@@ -4651,9 +4743,6 @@ void i915_perf_init(struct drm_i915_private *i915)
 			perf->ops.enable_metric_set = gen12_enable_metric_set;
 			perf->ops.disable_metric_set = gen12_disable_metric_set;
 			perf->ops.oa_hw_tail_read = gen12_oa_hw_tail_read;
-
-			perf->ctx_flexeu0_offset = 0;
-			perf->ctx_oactxctrl_offset = 0x144;
 		}
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_perf_oa_regs.h b/drivers/gpu/drm/i915/i915_perf_oa_regs.h
index f31c9f13a9fc..0ef3562ff4aa 100644
--- a/drivers/gpu/drm/i915/i915_perf_oa_regs.h
+++ b/drivers/gpu/drm/i915/i915_perf_oa_regs.h
@@ -97,7 +97,7 @@
 #define  GEN12_OAR_OACONTROL_COUNTER_FORMAT_SHIFT 1
 #define  GEN12_OAR_OACONTROL_COUNTER_ENABLE       (1 << 0)
 
-#define GEN12_OACTXCONTROL _MMIO(0x2360)
+#define GEN12_OACTXCONTROL(base) _MMIO((base) + 0x360)
 #define GEN12_OAR_OASTATUS _MMIO(0x2968)
 
 /* Gen12 OAG unit */
-- 
2.25.1



* [Intel-gfx] [PATCH 05/19] drm/i915/perf: Enable commands per clock reporting in OA
  2022-08-23 20:41 [Intel-gfx] [PATCH 00/19] Add DG2 OA support Umesh Nerlige Ramappa
                   ` (3 preceding siblings ...)
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 04/19] drm/i915/perf: Determine gen12 oa ctx offset at runtime Umesh Nerlige Ramappa
@ 2022-08-23 20:41 ` Umesh Nerlige Ramappa
  2022-09-06 19:51   ` Lionel Landwerlin
  2022-09-14  0:19   ` Dixit, Ashutosh
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 06/19] drm/i915/perf: Use helpers to process reports w.r.t. OA buffer size Umesh Nerlige Ramappa
                   ` (17 subsequent siblings)
  22 siblings, 2 replies; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-08-23 20:41 UTC
  To: intel-gfx

XEHPSDV and DG2 provide a way to configure bytes per clock vs commands
per clock reporting. Enable the commands-per-clock setting when enabling OA.
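
A small sketch of the read-modify-write this amounts to, assuming the usual (old & ~clear) | set semantics with nothing cleared. The bit positions below are placeholders chosen for illustration, and the function names are hypothetical; the real definitions live in i915_perf_oa_regs.h.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit positions, for illustration only. */
#define SQCNT1_PMON_ENABLE	(1u << 30)
#define SQCNT1_OABPC		(1u << 29)

/* Model of a register read-modify-write: clear bits, then set bits. */
uint32_t rmw(uint32_t old, uint32_t clear, uint32_t set)
{
	return (old & ~clear) | set;
}

/*
 * Value the register would hold after enabling OA: PMON is always
 * enabled, the per-clock reporting bit only where the platform has it.
 * Existing bits are preserved because nothing is cleared.
 */
uint32_t sqcnt1_on_enable(uint32_t old, int has_oa_bpc_reporting)
{
	uint32_t set = SQCNT1_PMON_ENABLE |
		       (has_oa_bpc_reporting ? SQCNT1_OABPC : 0);

	return rmw(old, 0, set);
}
```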

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h          |  3 +++
 drivers/gpu/drm/i915/i915_pci.c          |  1 +
 drivers/gpu/drm/i915/i915_perf.c         | 20 ++++++++++++++++++++
 drivers/gpu/drm/i915/i915_perf_oa_regs.h |  4 ++++
 drivers/gpu/drm/i915/intel_device_info.h |  1 +
 5 files changed, 29 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index b4733c5a01da..b2e8a44bd976 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1287,6 +1287,9 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
 #define HAS_RUNTIME_PM(dev_priv) (INTEL_INFO(dev_priv)->has_runtime_pm)
 #define HAS_64BIT_RELOC(dev_priv) (INTEL_INFO(dev_priv)->has_64bit_reloc)
 
+#define HAS_OA_BPC_REPORTING(dev_priv) \
+	(INTEL_INFO(dev_priv)->has_oa_bpc_reporting)
+
 /*
  * Set this flag, when platform requires 64K GTT page sizes or larger for
  * device local memory access.
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index d8446bb25d5e..bd0b8502b91e 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -1019,6 +1019,7 @@ static const struct intel_device_info adl_p_info = {
 	.has_logical_ring_contexts = 1, \
 	.has_logical_ring_elsq = 1, \
 	.has_mslice_steering = 1, \
+	.has_oa_bpc_reporting = 1, \
 	.has_rc6 = 1, \
 	.has_reset_engine = 1, \
 	.has_rps = 1, \
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index efa7eda83edd..6fc4f0d8fc5a 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -2745,10 +2745,12 @@ static int
 gen12_enable_metric_set(struct i915_perf_stream *stream,
 			struct i915_active *active)
 {
+	struct drm_i915_private *i915 = stream->perf->i915;
 	struct intel_uncore *uncore = stream->uncore;
 	struct i915_oa_config *oa_config = stream->oa_config;
 	bool periodic = stream->periodic;
 	u32 period_exponent = stream->period_exponent;
+	u32 sqcnt1;
 	int ret;
 
 	intel_uncore_write(uncore, GEN12_OAG_OA_DEBUG,
@@ -2767,6 +2769,16 @@ gen12_enable_metric_set(struct i915_perf_stream *stream,
 			    (period_exponent << GEN12_OAG_OAGLBCTXCTRL_TIMER_PERIOD_SHIFT))
 			    : 0);
 
+	/*
+	 * Initialize Super Queue Internal Cnt Register
+	 * Set PMON Enable in order to collect valid metrics.
+	 * Enable commands per clock reporting in OA for XEHPSDV onward.
+	 */
+	sqcnt1 = GEN12_SQCNT1_PMON_ENABLE |
+		 (HAS_OA_BPC_REPORTING(i915) ? GEN12_SQCNT1_OABPC : 0);
+
+	intel_uncore_rmw(uncore, GEN12_SQCNT1, 0, sqcnt1);
+
 	/*
 	 * Update all contexts prior writing the mux configurations as we need
 	 * to make sure all slices/subslices are ON before writing to NOA
@@ -2816,6 +2828,8 @@ static void gen11_disable_metric_set(struct i915_perf_stream *stream)
 static void gen12_disable_metric_set(struct i915_perf_stream *stream)
 {
 	struct intel_uncore *uncore = stream->uncore;
+	struct drm_i915_private *i915 = stream->perf->i915;
+	u32 sqcnt1;
 
 	/* Reset all contexts' slices/subslices configurations. */
 	gen12_configure_all_contexts(stream, NULL, NULL);
@@ -2826,6 +2840,12 @@ static void gen12_disable_metric_set(struct i915_perf_stream *stream)
 
 	/* Make sure we disable noa to save power. */
 	intel_uncore_rmw(uncore, RPM_CONFIG1, GEN10_GT_NOA_ENABLE, 0);
+
+	sqcnt1 = GEN12_SQCNT1_PMON_ENABLE |
+		 (HAS_OA_BPC_REPORTING(i915) ? GEN12_SQCNT1_OABPC : 0);
+
+	/* Reset PMON Enable to save power. */
+	intel_uncore_rmw(uncore, GEN12_SQCNT1, sqcnt1, 0);
 }
 
 static void gen7_oa_enable(struct i915_perf_stream *stream)
diff --git a/drivers/gpu/drm/i915/i915_perf_oa_regs.h b/drivers/gpu/drm/i915/i915_perf_oa_regs.h
index 0ef3562ff4aa..381d94101610 100644
--- a/drivers/gpu/drm/i915/i915_perf_oa_regs.h
+++ b/drivers/gpu/drm/i915/i915_perf_oa_regs.h
@@ -134,4 +134,8 @@
 #define GDT_CHICKEN_BITS    _MMIO(0x9840)
 #define   GT_NOA_ENABLE	    0x00000080
 
+#define GEN12_SQCNT1				_MMIO(0x8718)
+#define   GEN12_SQCNT1_PMON_ENABLE		REG_BIT(30)
+#define   GEN12_SQCNT1_OABPC			REG_BIT(29)
+
 #endif /* __INTEL_PERF_OA_REGS__ */
diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
index 23bf230aa104..fc2a0660426e 100644
--- a/drivers/gpu/drm/i915/intel_device_info.h
+++ b/drivers/gpu/drm/i915/intel_device_info.h
@@ -163,6 +163,7 @@ enum intel_ppgtt_type {
 	func(has_logical_ring_elsq); \
 	func(has_media_ratio_mode); \
 	func(has_mslice_steering); \
+	func(has_oa_bpc_reporting); \
 	func(has_one_eu_per_fuse_bit); \
 	func(has_pooled_eu); \
 	func(has_pxp); \
-- 
2.25.1



* [Intel-gfx] [PATCH 06/19] drm/i915/perf: Use helpers to process reports w.r.t. OA buffer size
  2022-08-23 20:41 [Intel-gfx] [PATCH 00/19] Add DG2 OA support Umesh Nerlige Ramappa
                   ` (4 preceding siblings ...)
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 05/19] drm/i915/perf: Enable commands per clock reporting in OA Umesh Nerlige Ramappa
@ 2022-08-23 20:41 ` Umesh Nerlige Ramappa
  2022-09-14 16:04   ` Dixit, Ashutosh
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 07/19] drm/i915/perf: Simply use stream->ctx Umesh Nerlige Ramappa
                   ` (16 subsequent siblings)
  22 siblings, 1 reply; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-08-23 20:41 UTC (permalink / raw)
  To: intel-gfx

DG2 has a new feature that supports OA buffer sizes up to 128MB by
toggling a bit in OA_DEBUG. This would eventually be a user-configurable
parameter. Use the OA buffer vma size in all calculations, via some helpers.
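
For illustration only, here is a minimal standalone sketch of the
wrap-around arithmetic such helpers perform, with the buffer size passed
in explicitly rather than read from the stream. The names ring_taken(),
ring_rewind() and ring_advance() are hypothetical and not part of the
driver; the asserts encode the assumption that offsets are relative to
the start of the buffer.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes available to read between head and tail in a ring of size bytes. */
static uint32_t ring_taken(uint32_t size, uint32_t tail, uint32_t head)
{
	assert(head < size && tail < size);
	return tail >= head ? tail - head : size - (head - tail);
}

/* Move tail back by delta bytes, wrapping past the start of the ring. */
static uint32_t ring_rewind(uint32_t size, uint32_t tail, uint32_t delta)
{
	assert(tail < size && delta <= size);
	return delta > tail ? size - (delta - tail) : tail - delta;
}

/* Advance head by one report, wrapping at the end of the ring. */
static uint32_t ring_advance(uint32_t size, uint32_t head, uint32_t report_size)
{
	assert(head < size);
	return (head + report_size) % size;
}
```

Unlike masking with (size - 1), none of these expressions relies on the
size being a power of two, so the same arithmetic holds for any runtime
buffer size.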

v2: Let compiler decide inline (Jani)

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 46 +++++++++++++++++++++-----------
 1 file changed, 31 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 6fc4f0d8fc5a..bbf1c574f393 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -385,6 +385,21 @@ static struct ctl_table_header *sysctl_header;
 
 static enum hrtimer_restart oa_poll_check_timer_cb(struct hrtimer *hrtimer);
 
+static u32 _oa_taken(struct i915_perf_stream *stream, u32 tail, u32 head)
+{
+	u32 size = stream->oa_buffer.vma->size;
+
+	return tail >= head ? tail - head : size - (head - tail);
+}
+
+static u32 _rewind_tail(struct i915_perf_stream *stream, u32 relative_hw_tail,
+			u32 rewind_delta)
+{
+	return rewind_delta > relative_hw_tail ?
+	       stream->oa_buffer.vma->size - (rewind_delta - relative_hw_tail) :
+	       relative_hw_tail - rewind_delta;
+}
+
 void i915_oa_config_release(struct kref *ref)
 {
 	struct i915_oa_config *oa_config =
@@ -487,12 +502,14 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 	 * sizes need not be integral multiples or 64 or powers of 2.
 	 * Compute potentially partially landed report in the OA buffer
 	 */
-	partial_report_size = OA_TAKEN(hw_tail, stream->oa_buffer.tail);
+	partial_report_size =
+		_oa_taken(stream, hw_tail, stream->oa_buffer.tail);
 	partial_report_size %= report_size;
 
 	/* Subtract partial amount off the tail */
-	hw_tail = gtt_offset + ((hw_tail - partial_report_size) &
-				(stream->oa_buffer.vma->size - 1));
+	hw_tail = gtt_offset + _rewind_tail(stream,
+					    hw_tail - gtt_offset,
+					    partial_report_size);
 
 	now = ktime_get_mono_fast_ns();
 
@@ -527,16 +544,16 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 		 * memory in the order they were written to.
 		 * If not : (╯°□°)╯︵ ┻━┻
 		 */
-		while (OA_TAKEN(tail, aged_tail) >= report_size) {
+		while (_oa_taken(stream, tail, aged_tail) >= report_size) {
 			u32 *report32 = (void *)(stream->oa_buffer.vaddr + tail);
 
 			if (report32[0] != 0 || report32[1] != 0)
 				break;
 
-			tail = (tail - report_size) & (OA_BUFFER_SIZE - 1);
+			tail = _rewind_tail(stream, tail, report_size);
 		}
 
-		if (OA_TAKEN(hw_tail, tail) > report_size &&
+		if (_oa_taken(stream, hw_tail, tail) > report_size &&
 		    __ratelimit(&stream->perf->tail_pointer_race))
 			DRM_NOTE("unlanded report(s) head=0x%x "
 				 "tail=0x%x hw_tail=0x%x\n",
@@ -547,8 +564,9 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 		stream->oa_buffer.aging_timestamp = now;
 	}
 
-	pollin = OA_TAKEN(stream->oa_buffer.tail - gtt_offset,
-			  stream->oa_buffer.head - gtt_offset) >= report_size;
+	pollin = _oa_taken(stream,
+			   stream->oa_buffer.tail,
+			   stream->oa_buffer.head) >= report_size;
 
 	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
 
@@ -679,11 +697,9 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 	int report_size = stream->oa_buffer.format_size;
 	u8 *oa_buf_base = stream->oa_buffer.vaddr;
 	u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
-	u32 mask = (OA_BUFFER_SIZE - 1);
 	size_t start_offset = *offset;
 	unsigned long flags;
-	u32 head, tail;
-	u32 taken;
+	u32 head, tail, size;
 	int ret = 0;
 
 	if (drm_WARN_ON(&uncore->i915->drm, !stream->enabled))
@@ -693,6 +709,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 
 	head = stream->oa_buffer.head;
 	tail = stream->oa_buffer.tail;
+	size = stream->oa_buffer.vma->size;
 
 	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
 
@@ -711,16 +728,15 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 	 * all a power of two).
 	 */
 	if (drm_WARN_ONCE(&uncore->i915->drm,
-			  head > stream->oa_buffer.vma->size ||
-			  tail > stream->oa_buffer.vma->size,
+			  head > size || tail > size,
 			  "Inconsistent OA buffer pointers: head = %u, tail = %u\n",
 			  head, tail))
 		return -EIO;
 
 
 	for (/* none */;
-	     (taken = OA_TAKEN(tail, head));
-	     head = (head + report_size) & mask) {
+	     _oa_taken(stream, tail, head);
+	     head = (head + report_size) % size) {
 		u8 *report = oa_buf_base + head;
 		u32 *report32 = (void *)report;
 		u32 ctx_id;
-- 
2.25.1



* [Intel-gfx] [PATCH 07/19] drm/i915/perf: Simply use stream->ctx
  2022-08-23 20:41 [Intel-gfx] [PATCH 00/19] Add DG2 OA support Umesh Nerlige Ramappa
                   ` (5 preceding siblings ...)
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 06/19] drm/i915/perf: Use helpers to process reports w.r.t. OA buffer size Umesh Nerlige Ramappa
@ 2022-08-23 20:41 ` Umesh Nerlige Ramappa
  2022-09-06 19:52   ` Lionel Landwerlin
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 08/19] drm/i915/perf: Move gt-specific data from i915->perf to gt->perf Umesh Nerlige Ramappa
                   ` (15 subsequent siblings)
  22 siblings, 1 reply; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-08-23 20:41 UTC (permalink / raw)
  To: intel-gfx

Earlier code used exclusive_stream to check for a user-passed context.
Simplify this by accessing stream->ctx directly.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index bbf1c574f393..3e3bda147c48 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -801,7 +801,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 		 * switches since it's not-uncommon for periodic samples to
 		 * identify a switch before any 'context switch' report.
 		 */
-		if (!stream->perf->exclusive_stream->ctx ||
+		if (!stream->ctx ||
 		    stream->specific_ctx_id == ctx_id ||
 		    stream->oa_buffer.last_ctx_id == stream->specific_ctx_id ||
 		    reason & OAREPORT_REASON_CTX_SWITCH) {
@@ -810,7 +810,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 			 * While filtering for a single context we avoid
 			 * leaking the IDs of other contexts.
 			 */
-			if (stream->perf->exclusive_stream->ctx &&
+			if (stream->ctx &&
 			    stream->specific_ctx_id != ctx_id) {
 				report32[2] = INVALID_CTX_ID;
 			}
-- 
2.25.1



* [Intel-gfx] [PATCH 08/19] drm/i915/perf: Move gt-specific data from i915->perf to gt->perf
  2022-08-23 20:41 [Intel-gfx] [PATCH 00/19] Add DG2 OA support Umesh Nerlige Ramappa
                   ` (6 preceding siblings ...)
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 07/19] drm/i915/perf: Simply use stream->ctx Umesh Nerlige Ramappa
@ 2022-08-23 20:41 ` Umesh Nerlige Ramappa
  2022-09-06 19:54   ` Lionel Landwerlin
  2022-09-14 18:20   ` Dixit, Ashutosh
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 09/19] drm/i915/perf: Replace gt->perf.lock with stream->lock for file ops Umesh Nerlige Ramappa
                   ` (14 subsequent siblings)
  22 siblings, 2 replies; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-08-23 20:41 UTC (permalink / raw)
  To: intel-gfx

Make perf part of gt as the OAG buffer is specific to a gt. The refactor
eventually simplifies programming the right OA buffer and the right HW
registers when supporting multiple gts.
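
For illustration only, here is a minimal standalone sketch of what
per-gt exclusivity means: each gt tracks its own exclusive stream, so a
stream on one gt does not block a stream on another. struct gt_perf,
struct stream and gt_claim_oa() are hypothetical simplifications, and
locking is omitted.

```c
#include <errno.h>
#include <stddef.h>

struct stream {
	int id;
};

/* Hypothetical per-gt perf state: one exclusive stream per gt. */
struct gt_perf {
	struct stream *exclusive_stream;
};

/* Claim the OA unit of one gt; fails if that gt already has a stream. */
static int gt_claim_oa(struct gt_perf *perf, struct stream *s)
{
	if (perf->exclusive_stream)
		return -EBUSY;

	perf->exclusive_stream = s;
	return 0;
}
```

With device-wide state the second open would fail regardless of which gt
it targets; with per-gt state only a second open on the same gt returns
-EBUSY.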

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt_types.h   |  3 +
 drivers/gpu/drm/i915/gt/intel_sseu.c       |  4 +-
 drivers/gpu/drm/i915/i915_perf.c           | 75 +++++++++++++---------
 drivers/gpu/drm/i915/i915_perf_types.h     | 39 +++++------
 drivers/gpu/drm/i915/selftests/i915_perf.c | 16 +++--
 5 files changed, 80 insertions(+), 57 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index 4d56f7d5a3be..3d079d206cec 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -20,6 +20,7 @@
 #include "intel_gsc.h"
 
 #include "i915_vma.h"
+#include "i915_perf_types.h"
 #include "intel_engine_types.h"
 #include "intel_gt_buffer_pool_types.h"
 #include "intel_hwconfig.h"
@@ -260,6 +261,8 @@ struct intel_gt {
 	/* sysfs defaults per gt */
 	struct gt_defaults defaults;
 	struct kobject *sysfs_defaults;
+
+	struct i915_perf_gt perf;
 };
 
 enum intel_gt_scratch_field {
diff --git a/drivers/gpu/drm/i915/gt/intel_sseu.c b/drivers/gpu/drm/i915/gt/intel_sseu.c
index c6d3050604c8..fcaf3c58b554 100644
--- a/drivers/gpu/drm/i915/gt/intel_sseu.c
+++ b/drivers/gpu/drm/i915/gt/intel_sseu.c
@@ -678,8 +678,8 @@ u32 intel_sseu_make_rpcs(struct intel_gt *gt,
 	 * If i915/perf is active, we want a stable powergating configuration
 	 * on the system. Use the configuration pinned by i915/perf.
 	 */
-	if (i915->perf.exclusive_stream)
-		req_sseu = &i915->perf.sseu;
+	if (gt->perf.exclusive_stream)
+		req_sseu = &gt->perf.sseu;
 
 	slices = hweight8(req_sseu->slice_mask);
 	subslices = hweight8(req_sseu->subslice_mask);
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 3e3bda147c48..5dccb3ffffc5 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1577,8 +1577,9 @@ free_noa_wait(struct i915_perf_stream *stream)
 static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
 {
 	struct i915_perf *perf = stream->perf;
+	struct intel_gt *gt = stream->engine->gt;
 
-	BUG_ON(stream != perf->exclusive_stream);
+	BUG_ON(stream != gt->perf.exclusive_stream);
 
 	/*
 	 * Unset exclusive_stream first, it will be checked while disabling
@@ -1586,7 +1587,7 @@ static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
 	 *
 	 * See i915_oa_init_reg_state() and lrc_configure_all_contexts()
 	 */
-	WRITE_ONCE(perf->exclusive_stream, NULL);
+	WRITE_ONCE(gt->perf.exclusive_stream, NULL);
 	perf->ops.disable_metric_set(stream);
 
 	free_oa_buffer(stream);
@@ -2579,10 +2580,11 @@ oa_configure_all_contexts(struct i915_perf_stream *stream,
 {
 	struct drm_i915_private *i915 = stream->perf->i915;
 	struct intel_engine_cs *engine;
+	struct intel_gt *gt = stream->engine->gt;
 	struct i915_gem_context *ctx, *cn;
 	int err;
 
-	lockdep_assert_held(&stream->perf->lock);
+	lockdep_assert_held(&gt->perf.lock);
 
 	/*
 	 * The OA register config is setup through the context image. This image
@@ -3103,6 +3105,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 {
 	struct drm_i915_private *i915 = stream->perf->i915;
 	struct i915_perf *perf = stream->perf;
+	struct intel_gt *gt;
 	int format_size;
 	int ret;
 
@@ -3111,6 +3114,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 			"OA engine not specified\n");
 		return -EINVAL;
 	}
+	gt = props->engine->gt;
 
 	/*
 	 * If the sysfs metrics/ directory wasn't registered for some
@@ -3141,7 +3145,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 	 * counter reports and marshal to the appropriate client
 	 * we currently only allow exclusive access
 	 */
-	if (perf->exclusive_stream) {
+	if (gt->perf.exclusive_stream) {
 		drm_dbg(&stream->perf->i915->drm,
 			"OA unit already in use\n");
 		return -EBUSY;
@@ -3221,8 +3225,8 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 
 	stream->ops = &i915_oa_stream_ops;
 
-	perf->sseu = props->sseu;
-	WRITE_ONCE(perf->exclusive_stream, stream);
+	gt->perf.sseu = props->sseu;
+	WRITE_ONCE(gt->perf.exclusive_stream, stream);
 
 	ret = i915_perf_stream_enable_sync(stream);
 	if (ret) {
@@ -3244,7 +3248,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 	return 0;
 
 err_enable:
-	WRITE_ONCE(perf->exclusive_stream, NULL);
+	WRITE_ONCE(gt->perf.exclusive_stream, NULL);
 	perf->ops.disable_metric_set(stream);
 
 	free_oa_buffer(stream);
@@ -3274,7 +3278,7 @@ void i915_oa_init_reg_state(const struct intel_context *ce,
 		return;
 
 	/* perf.exclusive_stream serialised by lrc_configure_all_contexts() */
-	stream = READ_ONCE(engine->i915->perf.exclusive_stream);
+	stream = READ_ONCE(engine->gt->perf.exclusive_stream);
 	if (stream && GRAPHICS_VER(stream->perf->i915) < 12)
 		gen8_update_reg_state_unlocked(ce, stream);
 }
@@ -3303,7 +3307,7 @@ static ssize_t i915_perf_read(struct file *file,
 			      loff_t *ppos)
 {
 	struct i915_perf_stream *stream = file->private_data;
-	struct i915_perf *perf = stream->perf;
+	struct intel_gt *gt = stream->engine->gt;
 	size_t offset = 0;
 	int ret;
 
@@ -3327,14 +3331,14 @@ static ssize_t i915_perf_read(struct file *file,
 			if (ret)
 				return ret;
 
-			mutex_lock(&perf->lock);
+			mutex_lock(&gt->perf.lock);
 			ret = stream->ops->read(stream, buf, count, &offset);
-			mutex_unlock(&perf->lock);
+			mutex_unlock(&gt->perf.lock);
 		} while (!offset && !ret);
 	} else {
-		mutex_lock(&perf->lock);
+		mutex_lock(&gt->perf.lock);
 		ret = stream->ops->read(stream, buf, count, &offset);
-		mutex_unlock(&perf->lock);
+		mutex_unlock(&gt->perf.lock);
 	}
 
 	/* We allow the poll checking to sometimes report false positive EPOLLIN
@@ -3381,7 +3385,7 @@ static enum hrtimer_restart oa_poll_check_timer_cb(struct hrtimer *hrtimer)
  * &i915_perf_stream_ops->poll_wait to call poll_wait() with a wait queue that
  * will be woken for new stream data.
  *
- * Note: The &perf->lock mutex has been taken to serialize
+ * Note: The &gt->perf.lock mutex has been taken to serialize
  * with any non-file-operation driver hooks.
  *
  * Returns: any poll events that are ready without sleeping
@@ -3422,12 +3426,12 @@ static __poll_t i915_perf_poll_locked(struct i915_perf_stream *stream,
 static __poll_t i915_perf_poll(struct file *file, poll_table *wait)
 {
 	struct i915_perf_stream *stream = file->private_data;
-	struct i915_perf *perf = stream->perf;
+	struct intel_gt *gt = stream->engine->gt;
 	__poll_t ret;
 
-	mutex_lock(&perf->lock);
+	mutex_lock(&gt->perf.lock);
 	ret = i915_perf_poll_locked(stream, file, wait);
-	mutex_unlock(&perf->lock);
+	mutex_unlock(&gt->perf.lock);
 
 	return ret;
 }
@@ -3526,7 +3530,7 @@ static long i915_perf_config_locked(struct i915_perf_stream *stream,
  * @cmd: the ioctl request
  * @arg: the ioctl data
  *
- * Note: The &perf->lock mutex has been taken to serialize
+ * Note: The &gt->perf.lock mutex has been taken to serialize
  * with any non-file-operation driver hooks.
  *
  * Returns: zero on success or a negative error code. Returns -EINVAL for
@@ -3566,12 +3570,12 @@ static long i915_perf_ioctl(struct file *file,
 			    unsigned long arg)
 {
 	struct i915_perf_stream *stream = file->private_data;
-	struct i915_perf *perf = stream->perf;
+	struct intel_gt *gt = stream->engine->gt;
 	long ret;
 
-	mutex_lock(&perf->lock);
+	mutex_lock(&gt->perf.lock);
 	ret = i915_perf_ioctl_locked(stream, cmd, arg);
-	mutex_unlock(&perf->lock);
+	mutex_unlock(&gt->perf.lock);
 
 	return ret;
 }
@@ -3583,7 +3587,7 @@ static long i915_perf_ioctl(struct file *file,
  * Frees all resources associated with the given i915 perf @stream, disabling
  * any associated data capture in the process.
  *
- * Note: The &perf->lock mutex has been taken to serialize
+ * Note: The &gt->perf.lock mutex has been taken to serialize
  * with any non-file-operation driver hooks.
  */
 static void i915_perf_destroy_locked(struct i915_perf_stream *stream)
@@ -3615,10 +3619,11 @@ static int i915_perf_release(struct inode *inode, struct file *file)
 {
 	struct i915_perf_stream *stream = file->private_data;
 	struct i915_perf *perf = stream->perf;
+	struct intel_gt *gt = stream->engine->gt;
 
-	mutex_lock(&perf->lock);
+	mutex_lock(&gt->perf.lock);
 	i915_perf_destroy_locked(stream);
-	mutex_unlock(&perf->lock);
+	mutex_unlock(&gt->perf.lock);
 
 	/* Release the reference the perf stream kept on the driver. */
 	drm_dev_put(&perf->i915->drm);
@@ -3651,7 +3656,7 @@ static const struct file_operations fops = {
  * See i915_perf_ioctl_open() for interface details.
  *
  * Implements further stream config validation and stream initialization on
- * behalf of i915_perf_open_ioctl() with the &perf->lock mutex
+ * behalf of i915_perf_open_ioctl() with the &gt->perf.lock mutex
  * taken to serialize with any non-file-operation driver hooks.
  *
  * Note: at this point the @props have only been validated in isolation and
@@ -4035,7 +4040,7 @@ static int read_properties_unlocked(struct i915_perf *perf,
  * mutex to avoid an awkward lockdep with mmap_lock.
  *
  * Most of the implementation details are handled by
- * i915_perf_open_ioctl_locked() after taking the &perf->lock
+ * i915_perf_open_ioctl_locked() after taking the &gt->perf.lock
  * mutex for serializing with any non-file-operation driver hooks.
  *
  * Return: A newly opened i915 Perf stream file descriptor or negative
@@ -4046,6 +4051,7 @@ int i915_perf_open_ioctl(struct drm_device *dev, void *data,
 {
 	struct i915_perf *perf = &to_i915(dev)->perf;
 	struct drm_i915_perf_open_param *param = data;
+	struct intel_gt *gt;
 	struct perf_open_properties props;
 	u32 known_open_flags;
 	int ret;
@@ -4072,9 +4078,11 @@ int i915_perf_open_ioctl(struct drm_device *dev, void *data,
 	if (ret)
 		return ret;
 
-	mutex_lock(&perf->lock);
+	gt = props.engine->gt;
+
+	mutex_lock(&gt->perf.lock);
 	ret = i915_perf_open_ioctl_locked(perf, param, &props, file);
-	mutex_unlock(&perf->lock);
+	mutex_unlock(&gt->perf.lock);
 
 	return ret;
 }
@@ -4090,6 +4098,7 @@ int i915_perf_open_ioctl(struct drm_device *dev, void *data,
 void i915_perf_register(struct drm_i915_private *i915)
 {
 	struct i915_perf *perf = &i915->perf;
+	struct intel_gt *gt = to_gt(i915);
 
 	if (!perf->i915)
 		return;
@@ -4098,13 +4107,13 @@ void i915_perf_register(struct drm_i915_private *i915)
 	 * i915_perf_open_ioctl(); considering that we register after
 	 * being exposed to userspace.
 	 */
-	mutex_lock(&perf->lock);
+	mutex_lock(&gt->perf.lock);
 
 	perf->metrics_kobj =
 		kobject_create_and_add("metrics",
 				       &i915->drm.primary->kdev->kobj);
 
-	mutex_unlock(&perf->lock);
+	mutex_unlock(&gt->perf.lock);
 }
 
 /**
@@ -4783,7 +4792,11 @@ void i915_perf_init(struct drm_i915_private *i915)
 	}
 
 	if (perf->ops.enable_metric_set) {
-		mutex_init(&perf->lock);
+		struct intel_gt *gt;
+		int i;
+
+		for_each_gt(gt, i915, i)
+			mutex_init(&gt->perf.lock);
 
 		/* Choose a representative limit */
 		oa_sample_rate_hard_limit = to_gt(i915)->clock_frequency / 2;
diff --git a/drivers/gpu/drm/i915/i915_perf_types.h b/drivers/gpu/drm/i915/i915_perf_types.h
index 05cb9a335a97..e888bfab478f 100644
--- a/drivers/gpu/drm/i915/i915_perf_types.h
+++ b/drivers/gpu/drm/i915/i915_perf_types.h
@@ -380,6 +380,26 @@ struct i915_oa_ops {
 	u32 (*oa_hw_tail_read)(struct i915_perf_stream *stream);
 };
 
+struct i915_perf_gt {
+	/*
+	 * Lock associated with anything below within this structure.
+	 */
+	struct mutex lock;
+
+	/**
+	 * @sseu: sseu configuration selected to run while perf is active,
+	 * applies to all contexts.
+	 */
+	struct intel_sseu sseu;
+
+	/**
+	 * @exclusive_stream: The stream currently using the OA unit. This is
+	 * sometimes accessed outside a syscall associated to its file
+	 * descriptor.
+	 */
+	struct i915_perf_stream *exclusive_stream;
+};
+
 struct i915_perf {
 	struct drm_i915_private *i915;
 
@@ -397,25 +417,6 @@ struct i915_perf {
 	 */
 	struct idr metrics_idr;
 
-	/*
-	 * Lock associated with anything below within this structure
-	 * except exclusive_stream.
-	 */
-	struct mutex lock;
-
-	/*
-	 * The stream currently using the OA unit. If accessed
-	 * outside a syscall associated to its file
-	 * descriptor.
-	 */
-	struct i915_perf_stream *exclusive_stream;
-
-	/**
-	 * @sseu: sseu configuration selected to run while perf is active,
-	 * applies to all contexts.
-	 */
-	struct intel_sseu sseu;
-
 	/**
 	 * For rate limiting any notifications of spurious
 	 * invalid OA reports
diff --git a/drivers/gpu/drm/i915/selftests/i915_perf.c b/drivers/gpu/drm/i915/selftests/i915_perf.c
index 429c6d73b159..24dde5531423 100644
--- a/drivers/gpu/drm/i915/selftests/i915_perf.c
+++ b/drivers/gpu/drm/i915/selftests/i915_perf.c
@@ -102,6 +102,12 @@ test_stream(struct i915_perf *perf)
 		I915_OA_FORMAT_A32u40_A4u32_B8_C8 : I915_OA_FORMAT_C4_B8,
 	};
 	struct i915_perf_stream *stream;
+	struct intel_gt *gt;
+
+	if (!props.engine)
+		return NULL;
+
+	gt = props.engine->gt;
 
 	if (!oa_config)
 		return NULL;
@@ -116,12 +122,12 @@ test_stream(struct i915_perf *perf)
 
 	stream->perf = perf;
 
-	mutex_lock(&perf->lock);
+	mutex_lock(&gt->perf.lock);
 	if (i915_oa_stream_init(stream, &param, &props)) {
 		kfree(stream);
 		stream =  NULL;
 	}
-	mutex_unlock(&perf->lock);
+	mutex_unlock(&gt->perf.lock);
 
 	i915_oa_config_put(oa_config);
 
@@ -130,11 +136,11 @@ test_stream(struct i915_perf *perf)
 
 static void stream_destroy(struct i915_perf_stream *stream)
 {
-	struct i915_perf *perf = stream->perf;
+	struct intel_gt *gt = stream->engine->gt;
 
-	mutex_lock(&perf->lock);
+	mutex_lock(&gt->perf.lock);
 	i915_perf_destroy_locked(stream);
-	mutex_unlock(&perf->lock);
+	mutex_unlock(&gt->perf.lock);
 }
 
 static int live_sanitycheck(void *arg)
-- 
2.25.1



* [Intel-gfx] [PATCH 09/19] drm/i915/perf: Replace gt->perf.lock with stream->lock for file ops
  2022-08-23 20:41 [Intel-gfx] [PATCH 00/19] Add DG2 OA support Umesh Nerlige Ramappa
                   ` (7 preceding siblings ...)
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 08/19] drm/i915/perf: Move gt-specific data from i915->perf to gt->perf Umesh Nerlige Ramappa
@ 2022-08-23 20:41 ` Umesh Nerlige Ramappa
  2022-09-14 19:04   ` Dixit, Ashutosh
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 10/19] drm/i915/perf: Use gt-specific ggtt for OA and noa-wait buffers Umesh Nerlige Ramappa
                   ` (13 subsequent siblings)
  22 siblings, 1 reply; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-08-23 20:41 UTC (permalink / raw)
  To: intel-gfx

With multi-gt, a user can access multiple OA buffers concurrently. Use
stream->lock instead of gt->perf.lock to serialize file operations.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c       | 31 ++++++++++++--------------
 drivers/gpu/drm/i915/i915_perf_types.h |  5 +++++
 2 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 5dccb3ffffc5..87b92d2946f4 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -3244,6 +3244,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 	stream->poll_check_timer.function = oa_poll_check_timer_cb;
 	init_waitqueue_head(&stream->poll_wq);
 	spin_lock_init(&stream->oa_buffer.ptr_lock);
+	mutex_init(&stream->lock);
 
 	return 0;
 
@@ -3307,7 +3308,6 @@ static ssize_t i915_perf_read(struct file *file,
 			      loff_t *ppos)
 {
 	struct i915_perf_stream *stream = file->private_data;
-	struct intel_gt *gt = stream->engine->gt;
 	size_t offset = 0;
 	int ret;
 
@@ -3331,14 +3331,14 @@ static ssize_t i915_perf_read(struct file *file,
 			if (ret)
 				return ret;
 
-			mutex_lock(&gt->perf.lock);
+			mutex_lock(&stream->lock);
 			ret = stream->ops->read(stream, buf, count, &offset);
-			mutex_unlock(&gt->perf.lock);
+			mutex_unlock(&stream->lock);
 		} while (!offset && !ret);
 	} else {
-		mutex_lock(&gt->perf.lock);
+		mutex_lock(&stream->lock);
 		ret = stream->ops->read(stream, buf, count, &offset);
-		mutex_unlock(&gt->perf.lock);
+		mutex_unlock(&stream->lock);
 	}
 
 	/* We allow the poll checking to sometimes report false positive EPOLLIN
@@ -3385,9 +3385,6 @@ static enum hrtimer_restart oa_poll_check_timer_cb(struct hrtimer *hrtimer)
  * &i915_perf_stream_ops->poll_wait to call poll_wait() with a wait queue that
  * will be woken for new stream data.
  *
- * Note: The &gt->perf.lock mutex has been taken to serialize
- * with any non-file-operation driver hooks.
- *
  * Returns: any poll events that are ready without sleeping
  */
 static __poll_t i915_perf_poll_locked(struct i915_perf_stream *stream,
@@ -3426,12 +3423,11 @@ static __poll_t i915_perf_poll_locked(struct i915_perf_stream *stream,
 static __poll_t i915_perf_poll(struct file *file, poll_table *wait)
 {
 	struct i915_perf_stream *stream = file->private_data;
-	struct intel_gt *gt = stream->engine->gt;
 	__poll_t ret;
 
-	mutex_lock(&gt->perf.lock);
+	mutex_lock(&stream->lock);
 	ret = i915_perf_poll_locked(stream, file, wait);
-	mutex_unlock(&gt->perf.lock);
+	mutex_unlock(&stream->lock);
 
 	return ret;
 }
@@ -3530,9 +3526,6 @@ static long i915_perf_config_locked(struct i915_perf_stream *stream,
  * @cmd: the ioctl request
  * @arg: the ioctl data
  *
- * Note: The &gt->perf.lock mutex has been taken to serialize
- * with any non-file-operation driver hooks.
- *
  * Returns: zero on success or a negative error code. Returns -EINVAL for
  * an unknown ioctl request.
  */
@@ -3570,12 +3563,11 @@ static long i915_perf_ioctl(struct file *file,
 			    unsigned long arg)
 {
 	struct i915_perf_stream *stream = file->private_data;
-	struct intel_gt *gt = stream->engine->gt;
 	long ret;
 
-	mutex_lock(&gt->perf.lock);
+	mutex_lock(&stream->lock);
 	ret = i915_perf_ioctl_locked(stream, cmd, arg);
-	mutex_unlock(&gt->perf.lock);
+	mutex_unlock(&stream->lock);
 
 	return ret;
 }
@@ -3621,6 +3613,11 @@ static int i915_perf_release(struct inode *inode, struct file *file)
 	struct i915_perf *perf = stream->perf;
 	struct intel_gt *gt = stream->engine->gt;
 
+	/*
+	 * Within this call, we know that the fd is being closed and we have no
+	 * other user of stream->lock. Use the perf lock to destroy the stream
+	 * here.
+	 */
 	mutex_lock(&gt->perf.lock);
 	i915_perf_destroy_locked(stream);
 	mutex_unlock(&gt->perf.lock);
diff --git a/drivers/gpu/drm/i915/i915_perf_types.h b/drivers/gpu/drm/i915/i915_perf_types.h
index e888bfab478f..dc9bfd8086cf 100644
--- a/drivers/gpu/drm/i915/i915_perf_types.h
+++ b/drivers/gpu/drm/i915/i915_perf_types.h
@@ -146,6 +146,11 @@ struct i915_perf_stream {
 	 */
 	struct intel_engine_cs *engine;
 
+	/*
+	 * Lock associated with operations on stream
+	 */
+	struct mutex lock;
+
 	/**
 	 * @sample_flags: Flags representing the `DRM_I915_PERF_PROP_SAMPLE_*`
 	 * properties given when opening a stream, representing the contents
-- 
2.25.1



* [Intel-gfx] [PATCH 10/19] drm/i915/perf: Use gt-specific ggtt for OA and noa-wait buffers
  2022-08-23 20:41 [Intel-gfx] [PATCH 00/19] Add DG2 OA support Umesh Nerlige Ramappa
                   ` (8 preceding siblings ...)
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 09/19] drm/i915/perf: Replace gt->perf.lock with stream->lock for file ops Umesh Nerlige Ramappa
@ 2022-08-23 20:41 ` Umesh Nerlige Ramappa
  2022-09-06 19:56   ` Lionel Landwerlin
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 11/19] drm/i915/perf: Store a pointer to oa_format in oa_buffer Umesh Nerlige Ramappa
                   ` (12 subsequent siblings)
  22 siblings, 1 reply; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-08-23 20:41 UTC (permalink / raw)
  To: intel-gfx

The user passes a uabi engine class and instance to the perf OA interface.
Use the gt corresponding to that engine so that the buffers are pinned in
the right ggtt.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 87b92d2946f4..f7621b45966c 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1765,6 +1765,7 @@ static void gen12_init_oa_buffer(struct i915_perf_stream *stream)
 static int alloc_oa_buffer(struct i915_perf_stream *stream)
 {
 	struct drm_i915_private *i915 = stream->perf->i915;
+	struct intel_gt *gt = stream->engine->gt;
 	struct drm_i915_gem_object *bo;
 	struct i915_vma *vma;
 	int ret;
@@ -1784,11 +1785,22 @@ static int alloc_oa_buffer(struct i915_perf_stream *stream)
 	i915_gem_object_set_cache_coherency(bo, I915_CACHE_LLC);
 
 	/* PreHSW required 512K alignment, HSW requires 16M */
-	vma = i915_gem_object_ggtt_pin(bo, NULL, 0, SZ_16M, 0);
+	vma = i915_vma_instance(bo, &gt->ggtt->vm, NULL);
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);
 		goto err_unref;
 	}
+
+	/*
+	 * PreHSW required 512K alignment.
+	 * HSW and onwards, align to requested size of OA buffer.
+	 */
+	ret = i915_vma_pin(vma, 0, SZ_16M, PIN_GLOBAL | PIN_HIGH);
+	if (ret) {
+		drm_err(&gt->i915->drm, "Failed to pin OA buffer %d\n", ret);
+		goto err_unref;
+	}
+
 	stream->oa_buffer.vma = vma;
 
 	stream->oa_buffer.vaddr =
@@ -1838,6 +1850,7 @@ static u32 *save_restore_register(struct i915_perf_stream *stream, u32 *cs,
 static int alloc_noa_wait(struct i915_perf_stream *stream)
 {
 	struct drm_i915_private *i915 = stream->perf->i915;
+	struct intel_gt *gt = stream->engine->gt;
 	struct drm_i915_gem_object *bo;
 	struct i915_vma *vma;
 	const u64 delay_ticks = 0xffffffffffffffff -
@@ -1878,12 +1891,16 @@ static int alloc_noa_wait(struct i915_perf_stream *stream)
 	 * multiple OA config BOs will have a jump to this address and it
 	 * needs to be fixed during the lifetime of the i915/perf stream.
 	 */
-	vma = i915_gem_object_ggtt_pin_ww(bo, &ww, NULL, 0, 0, PIN_HIGH);
+	vma = i915_vma_instance(bo, &gt->ggtt->vm, NULL);
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);
 		goto out_ww;
 	}
 
+	ret = i915_vma_pin_ww(vma, &ww, 0, 0, PIN_GLOBAL | PIN_HIGH);
+	if (ret)
+		goto out_ww;
+
 	batch = cs = i915_gem_object_pin_map(bo, I915_MAP_WB);
 	if (IS_ERR(batch)) {
 		ret = PTR_ERR(batch);
-- 
2.25.1



* [Intel-gfx] [PATCH 11/19] drm/i915/perf: Store a pointer to oa_format in oa_buffer
  2022-08-23 20:41 [Intel-gfx] [PATCH 00/19] Add DG2 OA support Umesh Nerlige Ramappa
                   ` (9 preceding siblings ...)
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 10/19] drm/i915/perf: Use gt-specific ggtt for OA and noa-wait buffers Umesh Nerlige Ramappa
@ 2022-08-23 20:41 ` Umesh Nerlige Ramappa
  2022-09-06 19:56   ` Lionel Landwerlin
  2022-09-14 20:43   ` Dixit, Ashutosh
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 12/19] drm/i915/perf: Parse 64bit report header formats correctly Umesh Nerlige Ramappa
                   ` (11 subsequent siblings)
  22 siblings, 2 replies; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-08-23 20:41 UTC (permalink / raw)
  To: intel-gfx

DG2 introduces OA reports with 64 bit report header fields. Perf OA
would need more information about the OA format in order to process such
reports. Store all OA format info in oa_buffer instead of just the size
and format-id.
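
For readers following along, here is a minimal standalone sketch of the
idea, not the driver code. The type and function names (oa_format_desc,
oa_buffer_state, demo_formats, oa_buffer_set_format, oa_report_size) are
hypothetical stand-ins. It only illustrates keeping one pointer to a table
entry instead of copying the id and size into separate fields:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver types; names are illustrative only. */
struct oa_format_desc {
	uint32_t format;	/* hardware format id */
	int size;		/* report size in bytes */
};

struct oa_buffer_state {
	/* One pointer replaces separately copied id and size fields. */
	const struct oa_format_desc *format;
};

static const struct oa_format_desc demo_formats[] = {
	{ 5, 256 },
	{ 1, 384 },
};

/* Bind the buffer to one table entry; returns -1 for an unusable entry. */
static int oa_buffer_set_format(struct oa_buffer_state *buf, size_t idx)
{
	assert(buf != NULL);
	if (idx >= sizeof(demo_formats) / sizeof(demo_formats[0]))
		return -1;
	if (demo_formats[idx].size == 0)
		return -1;
	buf->format = &demo_formats[idx];
	return 0;
}

/* Every consumer reads the size through the shared descriptor. */
static int oa_report_size(const struct oa_buffer_state *buf)
{
	assert(buf != NULL && buf->format != NULL);
	return buf->format->size;
}
```

Any attribute added to the descriptor later becomes visible to every user
of the buffer state without adding another copied field.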

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c       | 23 ++++++++++-------------
 drivers/gpu/drm/i915/i915_perf_types.h |  3 +--
 2 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index f7621b45966c..9e455bd3bce5 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -483,7 +483,7 @@ static u32 gen7_oa_hw_tail_read(struct i915_perf_stream *stream)
 static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 {
 	u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
-	int report_size = stream->oa_buffer.format_size;
+	int report_size = stream->oa_buffer.format->size;
 	unsigned long flags;
 	bool pollin;
 	u32 hw_tail;
@@ -630,7 +630,7 @@ static int append_oa_sample(struct i915_perf_stream *stream,
 			    size_t *offset,
 			    const u8 *report)
 {
-	int report_size = stream->oa_buffer.format_size;
+	int report_size = stream->oa_buffer.format->size;
 	struct drm_i915_perf_record_header header;
 	int report_size_partial;
 	u8 *oa_buf_end;
@@ -694,7 +694,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 				  size_t *offset)
 {
 	struct intel_uncore *uncore = stream->uncore;
-	int report_size = stream->oa_buffer.format_size;
+	int report_size = stream->oa_buffer.format->size;
 	u8 *oa_buf_base = stream->oa_buffer.vaddr;
 	u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
 	size_t start_offset = *offset;
@@ -970,7 +970,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 				  size_t *offset)
 {
 	struct intel_uncore *uncore = stream->uncore;
-	int report_size = stream->oa_buffer.format_size;
+	int report_size = stream->oa_buffer.format->size;
 	u8 *oa_buf_base = stream->oa_buffer.vaddr;
 	u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
 	u32 mask = (OA_BUFFER_SIZE - 1);
@@ -2517,7 +2517,7 @@ static int gen12_configure_oar_context(struct i915_perf_stream *stream,
 {
 	int err;
 	struct intel_context *ce = stream->pinned_ctx;
-	u32 format = stream->oa_buffer.format;
+	u32 format = stream->oa_buffer.format->format;
 	u32 offset = stream->perf->ctx_oactxctrl_offset;
 	struct flex regs_context[] = {
 		{
@@ -2890,7 +2890,7 @@ static void gen7_oa_enable(struct i915_perf_stream *stream)
 	u32 ctx_id = stream->specific_ctx_id;
 	bool periodic = stream->periodic;
 	u32 period_exponent = stream->period_exponent;
-	u32 report_format = stream->oa_buffer.format;
+	u32 report_format = stream->oa_buffer.format->format;
 
 	/*
 	 * Reset buf pointers so we don't forward reports from before now.
@@ -2916,7 +2916,7 @@ static void gen7_oa_enable(struct i915_perf_stream *stream)
 static void gen8_oa_enable(struct i915_perf_stream *stream)
 {
 	struct intel_uncore *uncore = stream->uncore;
-	u32 report_format = stream->oa_buffer.format;
+	u32 report_format = stream->oa_buffer.format->format;
 
 	/*
 	 * Reset buf pointers so we don't forward reports from before now.
@@ -2942,7 +2942,7 @@ static void gen8_oa_enable(struct i915_perf_stream *stream)
 static void gen12_oa_enable(struct i915_perf_stream *stream)
 {
 	struct intel_uncore *uncore = stream->uncore;
-	u32 report_format = stream->oa_buffer.format;
+	u32 report_format = stream->oa_buffer.format->format;
 
 	/*
 	 * If we don't want OA reports from the OA buffer, then we don't even
@@ -3184,15 +3184,12 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 	stream->sample_flags = props->sample_flags;
 	stream->sample_size += format_size;
 
-	stream->oa_buffer.format_size = format_size;
-	if (drm_WARN_ON(&i915->drm, stream->oa_buffer.format_size == 0))
+	stream->oa_buffer.format = &perf->oa_formats[props->oa_format];
+	if (drm_WARN_ON(&i915->drm, stream->oa_buffer.format->size == 0))
 		return -EINVAL;
 
 	stream->hold_preemption = props->hold_preemption;
 
-	stream->oa_buffer.format =
-		perf->oa_formats[props->oa_format].format;
-
 	stream->periodic = props->oa_periodic;
 	if (stream->periodic)
 		stream->period_exponent = props->oa_period_exponent;
diff --git a/drivers/gpu/drm/i915/i915_perf_types.h b/drivers/gpu/drm/i915/i915_perf_types.h
index dc9bfd8086cf..e0c96b44eda8 100644
--- a/drivers/gpu/drm/i915/i915_perf_types.h
+++ b/drivers/gpu/drm/i915/i915_perf_types.h
@@ -250,11 +250,10 @@ struct i915_perf_stream {
 	 * @oa_buffer: State of the OA buffer.
 	 */
 	struct {
+		const struct i915_oa_format *format;
 		struct i915_vma *vma;
 		u8 *vaddr;
 		u32 last_ctx_id;
-		int format;
-		int format_size;
 		int size_exponent;
 
 		/**
-- 
2.25.1



* [Intel-gfx] [PATCH 12/19] drm/i915/perf: Parse 64bit report header formats correctly
  2022-08-23 20:41 [Intel-gfx] [PATCH 00/19] Add DG2 OA support Umesh Nerlige Ramappa
                   ` (10 preceding siblings ...)
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 11/19] drm/i915/perf: Store a pointer to oa_format in oa_buffer Umesh Nerlige Ramappa
@ 2022-08-23 20:41 ` Umesh Nerlige Ramappa
  2022-09-16  0:47   ` Dixit, Ashutosh
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 13/19] drm/i915/perf: Add Wa_16010703925:dg2 Umesh Nerlige Ramappa
                   ` (10 subsequent siblings)
  22 siblings, 1 reply; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-08-23 20:41 UTC (permalink / raw)
  To: intel-gfx

Now that some OA formats use 64-bit reports, the report header has 64-bit
report-id, timestamp, context-id and gpu-ticks fields. When filtering these
reports, use the right width for these fields.
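
As a rough standalone illustration of width-aware header access (not the
code in this patch), the sketch below reads the report id and timestamp at
either width and applies the "both zero means not landed yet" check. The
names oa_hdr_field and oa_report_landed are hypothetical, and memcpy is
used here only to keep the example free of alignment assumptions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Read header field number idx (0 = report id, 1 = timestamp) as either a
 * 64-bit or a 32-bit value, depending on the header width of the format.
 */
static uint64_t oa_hdr_field(const void *report, bool hdr64, unsigned int idx)
{
	assert(report != NULL);
	if (hdr64) {
		uint64_t v;

		memcpy(&v, (const uint8_t *)report + idx * sizeof(v), sizeof(v));
		return v;
	} else {
		uint32_t v;

		memcpy(&v, (const uint8_t *)report + idx * sizeof(v), sizeof(v));
		return v;
	}
}

/* A report counts as landed once its id or its timestamp is non-zero. */
static bool oa_report_landed(const void *report, bool hdr64)
{
	return oa_hdr_field(report, hdr64, 0) != 0 ||
	       oa_hdr_field(report, hdr64, 1) != 0;
}
```

With a 64-bit header the timestamp starts at byte offset 8, so a check that
only looks at the first two dwords would miss it.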

v2: Let compiler decide on inline (Jani)

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c       | 101 ++++++++++++++++++++-----
 drivers/gpu/drm/i915/i915_perf_types.h |   6 ++
 2 files changed, 86 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 9e455bd3bce5..167e7355980a 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -324,8 +324,8 @@ static const struct i915_oa_format oa_formats[I915_OA_FORMAT_MAX] = {
 	[I915_OA_FORMAT_A32u40_A4u32_B8_C8] = { 5, 256 },
 	[I915_OAR_FORMAT_A32u40_A4u32_B8_C8]    = { 5, 256 },
 	[I915_OA_FORMAT_A24u40_A14u32_B8_C8]    = { 5, 256 },
-	[I915_OAR_FORMAT_A36u64_B8_C8]		= { 1, 384 },
-	[I915_OA_FORMAT_A38u64_R2u64_B8_C8]	= { 1, 448 },
+	[I915_OAR_FORMAT_A36u64_B8_C8]		= { 1, 384, HDR_64_BIT },
+	[I915_OA_FORMAT_A38u64_R2u64_B8_C8]	= { 1, 448, HDR_64_BIT },
 };
 
 #define SAMPLE_OA_REPORT      (1<<0)
@@ -456,6 +456,67 @@ static u32 gen7_oa_hw_tail_read(struct i915_perf_stream *stream)
 	return oastatus1 & GEN7_OASTATUS1_TAIL_MASK;
 }
 
+#define oa_report_header_64bit(__s) \
+	((__s)->oa_buffer.format->header == HDR_64_BIT)
+
+static u64 oa_report_id(struct i915_perf_stream *stream, void *report)
+{
+	return oa_report_header_64bit(stream) ? *(u64 *)report : *(u32 *)report;
+}
+
+static u64 oa_report_reason(struct i915_perf_stream *stream, void *report)
+{
+	return (oa_report_id(stream, report) >> OAREPORT_REASON_SHIFT) &
+	       (GRAPHICS_VER(stream->perf->i915) == 12 ?
+		OAREPORT_REASON_MASK_EXTENDED :
+		OAREPORT_REASON_MASK);
+}
+
+static void oa_report_id_clear(struct i915_perf_stream *stream, u32 *report)
+{
+	if (oa_report_header_64bit(stream))
+		*(u64 *)report = 0;
+	else
+		*report = 0;
+}
+
+static bool oa_report_ctx_invalid(struct i915_perf_stream *stream, void *report)
+{
+	return !(oa_report_id(stream, report) &
+	       stream->perf->gen8_valid_ctx_bit) &&
+	       GRAPHICS_VER(stream->perf->i915) <= 11;
+}
+
+static u64 oa_timestamp(struct i915_perf_stream *stream, void *report)
+{
+	return oa_report_header_64bit(stream) ?
+		*((u64 *)report + 1) :
+		*((u32 *)report + 1);
+}
+
+static void oa_timestamp_clear(struct i915_perf_stream *stream, u32 *report)
+{
+	if (oa_report_header_64bit(stream))
+		*(u64 *)&report[2] = 0;
+	else
+		report[1] = 0;
+}
+
+static u32 oa_context_id(struct i915_perf_stream *stream, u32 *report)
+{
+	u32 ctx_id = oa_report_header_64bit(stream) ? report[4] : report[2];
+
+	return ctx_id & stream->specific_ctx_id_mask;
+}
+
+static void oa_context_id_squash(struct i915_perf_stream *stream, u32 *report)
+{
+	if (oa_report_header_64bit(stream))
+		report[4] = INVALID_CTX_ID;
+	else
+		report[2] = INVALID_CTX_ID;
+}
+
 /**
  * oa_buffer_check_unlocked - check for data and update tail ptr state
  * @stream: i915 stream instance
@@ -545,9 +606,10 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 		 * If not : (╯°□°)╯︵ ┻━┻
 		 */
 		while (_oa_taken(stream, tail, aged_tail) >= report_size) {
-			u32 *report32 = (void *)(stream->oa_buffer.vaddr + tail);
+			void *report = stream->oa_buffer.vaddr + tail;
 
-			if (report32[0] != 0 || report32[1] != 0)
+			if (oa_report_id(stream, report) ||
+			    oa_timestamp(stream, report))
 				break;
 
 			tail = _rewind_tail(stream, tail, report_size);
@@ -740,23 +802,19 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 		u8 *report = oa_buf_base + head;
 		u32 *report32 = (void *)report;
 		u32 ctx_id;
-		u32 reason;
+		u64 reason;
 
 		/*
 		 * The reason field includes flags identifying what
 		 * triggered this specific report (mostly timer
 		 * triggered or e.g. due to a context switch).
 		 *
-		 * This field is never expected to be zero so we can
-		 * check that the report isn't invalid before copying
-		 * it to userspace...
+		 * In MMIO triggered reports, some platforms do not set the
+		 * reason bit in this field and it is valid to have a reason
+		 * field of zero.
 		 */
-		reason = ((report32[0] >> OAREPORT_REASON_SHIFT) &
-			  (GRAPHICS_VER(stream->perf->i915) == 12 ?
-			   OAREPORT_REASON_MASK_EXTENDED :
-			   OAREPORT_REASON_MASK));
-
-		ctx_id = report32[2] & stream->specific_ctx_id_mask;
+		reason = oa_report_reason(stream, report);
+		ctx_id = oa_context_id(stream, report32);
 
 		/*
 		 * Squash whatever is in the CTX_ID field if it's marked as
@@ -766,9 +824,10 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 		 * Note: that we don't clear the valid_ctx_bit so userspace can
 		 * understand that the ID has been squashed by the kernel.
 		 */
-		if (!(report32[0] & stream->perf->gen8_valid_ctx_bit) &&
-		    GRAPHICS_VER(stream->perf->i915) <= 11)
-			ctx_id = report32[2] = INVALID_CTX_ID;
+		if (oa_report_ctx_invalid(stream, report)) {
+			ctx_id = INVALID_CTX_ID;
+			oa_context_id_squash(stream, report32);
+		}
 
 		/*
 		 * NB: For Gen 8 the OA unit no longer supports clock gating
@@ -812,7 +871,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 			 */
 			if (stream->ctx &&
 			    stream->specific_ctx_id != ctx_id) {
-				report32[2] = INVALID_CTX_ID;
+				oa_context_id_squash(stream, report32);
 			}
 
 			ret = append_oa_sample(stream, buf, count, offset,
@@ -824,11 +883,11 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 		}
 
 		/*
-		 * Clear out the first 2 dword as a mean to detect unlanded
+		 * Clear out the report id and timestamp as a means to detect unlanded
 		 * reports.
 		 */
-		report32[0] = 0;
-		report32[1] = 0;
+		oa_report_id_clear(stream, report32);
+		oa_timestamp_clear(stream, report32);
 	}
 
 	if (start_offset != *offset) {
diff --git a/drivers/gpu/drm/i915/i915_perf_types.h b/drivers/gpu/drm/i915/i915_perf_types.h
index e0c96b44eda8..68db5f94bc58 100644
--- a/drivers/gpu/drm/i915/i915_perf_types.h
+++ b/drivers/gpu/drm/i915/i915_perf_types.h
@@ -30,9 +30,15 @@ struct i915_vma;
 struct intel_context;
 struct intel_engine_cs;
 
+enum report_header {
+	HDR_32_BIT = 0,
+	HDR_64_BIT,
+};
+
 struct i915_oa_format {
 	u32 format;
 	int size;
+	enum report_header header;
 };
 
 struct i915_oa_reg {
-- 
2.25.1



* [Intel-gfx] [PATCH 13/19] drm/i915/perf: Add Wa_16010703925:dg2
  2022-08-23 20:41 [Intel-gfx] [PATCH 00/19] Add DG2 OA support Umesh Nerlige Ramappa
                   ` (11 preceding siblings ...)
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 12/19] drm/i915/perf: Parse 64bit report header formats correctly Umesh Nerlige Ramappa
@ 2022-08-23 20:41 ` Umesh Nerlige Ramappa
  2022-09-16  1:08   ` Dixit, Ashutosh
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 14/19] drm/i915/perf: Add Wa_1608133521:dg2 Umesh Nerlige Ramappa
                   ` (9 subsequent siblings)
  22 siblings, 1 reply; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-08-23 20:41 UTC (permalink / raw)
  To: intel-gfx

On DG2 A0, the OAR report format is buggy. The workaround is to not use it
on A0, so remove the OAR format from the bitmask of supported formats
there.
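
A small standalone model of this kind of workaround is shown below. It is
only a sketch: the enum values and the helpers demo_supported_formats and
demo_format_supported are hypothetical, and a plain 64-bit mask stands in
for the driver's bitmap:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical format ids; the real enum lives in the uapi header. */
enum demo_oa_format { DEMO_FMT_OAG = 0, DEMO_FMT_OAR = 1, DEMO_FMT_MAX };

/* Build the supported-format mask, hiding OAR on affected steppings. */
static uint64_t demo_supported_formats(bool affected_stepping)
{
	uint64_t mask = (UINT64_C(1) << DEMO_FMT_OAG) |
			(UINT64_C(1) << DEMO_FMT_OAR);

	if (affected_stepping)
		mask &= ~(UINT64_C(1) << DEMO_FMT_OAR);

	return mask;
}

/* A stream open request would be rejected when this returns false. */
static bool demo_format_supported(uint64_t mask, enum demo_oa_format fmt)
{
	assert(fmt < DEMO_FMT_MAX);
	return (mask & (UINT64_C(1) << fmt)) != 0;
}
```

Filtering at the mask level means no other code path needs to know about
the stepping.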

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 167e7355980a..a28f07923d8f 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -4741,6 +4741,11 @@ static void oa_init_supported_formats(struct i915_perf *perf)
 	default:
 		MISSING_CASE(platform);
 	}
+
+	if (IS_DG2_GRAPHICS_STEP(i915, G10, STEP_A0, STEP_B0)) {
+		/* Wa_16010703925:dg2 */
+		clear_bit(I915_OAR_FORMAT_A36u64_B8_C8, perf->format_mask);
+	}
 }
 
 static void i915_perf_init_info(struct drm_i915_private *i915)
-- 
2.25.1



* [Intel-gfx] [PATCH 14/19] drm/i915/perf: Add Wa_1608133521:dg2
  2022-08-23 20:41 [Intel-gfx] [PATCH 00/19] Add DG2 OA support Umesh Nerlige Ramappa
                   ` (12 preceding siblings ...)
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 13/19] drm/i915/perf: Add Wa_16010703925:dg2 Umesh Nerlige Ramappa
@ 2022-08-23 20:41 ` Umesh Nerlige Ramappa
  2022-08-29 14:04   ` Jani Nikula
  2022-09-16  1:21   ` Dixit, Ashutosh
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 15/19] drm/i915/perf: Add Wa_1508761755:dg2 Umesh Nerlige Ramappa
                   ` (8 subsequent siblings)
  22 siblings, 2 replies; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-08-23 20:41 UTC (permalink / raw)
  To: intel-gfx

DG2 introduces 64 bit counters and OA reports that have 64 bit values
for fields in the report header - report_id, timestamp, context_id and
gpu ticks. i915 uses report_id, timestamp and context_id to check for
valid reports.

In some DG2 variants, only the lower dwords for timestamp, report_id and
context_id are accessible. Add workaround for such reports.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index a28f07923d8f..a858ce57e465 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -310,7 +310,7 @@ static u32 i915_oa_max_sample_rate = 100000;
  * be used as a mask to align the OA tail pointer. In some of the
  * formats, R is used to denote reserved field.
  */
-static const struct i915_oa_format oa_formats[I915_OA_FORMAT_MAX] = {
+static struct i915_oa_format oa_formats[I915_OA_FORMAT_MAX] = {
 	[I915_OA_FORMAT_A13]	    = { 0, 64 },
 	[I915_OA_FORMAT_A29]	    = { 1, 128 },
 	[I915_OA_FORMAT_A13_B8_C8]  = { 2, 128 },
@@ -4746,6 +4746,13 @@ static void oa_init_supported_formats(struct i915_perf *perf)
 		/* Wa_16010703925:dg2 */
 		clear_bit(I915_OAR_FORMAT_A36u64_B8_C8, perf->format_mask);
 	}
+
+	if (IS_DG2_GRAPHICS_STEP(i915, G10, STEP_A0, STEP_B0) ||
+	    IS_DG2_GRAPHICS_STEP(i915, G11, STEP_A0, STEP_FOREVER)) {
+		/* Wa_1608133521:dg2 */
+		oa_formats[I915_OAR_FORMAT_A36u64_B8_C8].header = HDR_32_BIT;
+		oa_formats[I915_OA_FORMAT_A38u64_R2u64_B8_C8].header = HDR_32_BIT;
+	}
 }
 
 static void i915_perf_init_info(struct drm_i915_private *i915)
-- 
2.25.1



* [Intel-gfx] [PATCH 15/19] drm/i915/perf: Add Wa_1508761755:dg2
  2022-08-23 20:41 [Intel-gfx] [PATCH 00/19] Add DG2 OA support Umesh Nerlige Ramappa
                   ` (13 preceding siblings ...)
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 14/19] drm/i915/perf: Add Wa_1608133521:dg2 Umesh Nerlige Ramappa
@ 2022-08-23 20:41 ` Umesh Nerlige Ramappa
  2022-09-16  1:34   ` Dixit, Ashutosh
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 16/19] drm/i915/perf: Apply Wa_18013179988 Umesh Nerlige Ramappa
                   ` (7 subsequent siblings)
  22 siblings, 1 reply; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-08-23 20:41 UTC (permalink / raw)
  To: intel-gfx

Disable EU clock gating while gathering events so that EU events are not
lost.
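
The diff toggles the bits with _MASKED_BIT_ENABLE/_MASKED_BIT_DISABLE. As a
hedged illustration of what a masked write does, here is a standalone model
that assumes the usual i915 convention where the upper 16 bits of the
written value select which of the lower 16 bits may change. The helper
names are hypothetical and nothing here touches hardware:

```c
#include <assert.h>
#include <stdint.h>

/* Upper 16 bits select which of the lower 16 bits the write may change. */
static uint32_t masked_bit_enable(uint32_t bit)
{
	return (bit << 16) | bit;
}

static uint32_t masked_bit_disable(uint32_t bit)
{
	return bit << 16;
}

/* Model of how a masked register applies a write to its current value. */
static uint32_t masked_reg_write(uint32_t cur, uint32_t val)
{
	uint32_t mask = val >> 16;

	assert((cur >> 16) == 0);	/* only the low 16 bits hold state */
	return (cur & ~mask) | (val & mask);
}
```

Under that assumption, enabling and later disabling one bit leaves every
other bit in the register as it was, so no read-modify-write is needed.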

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt_regs.h |  1 +
 drivers/gpu/drm/i915/i915_perf.c        | 23 +++++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
index 94f9ddcfb3a5..28ef74f948e2 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
@@ -1129,6 +1129,7 @@
 #define   GEN12_DISABLE_EARLY_READ		REG_BIT(14)
 #define   GEN12_ENABLE_LARGE_GRF_MODE		REG_BIT(12)
 #define   GEN12_PUSH_CONST_DEREF_HOLD_DIS	REG_BIT(8)
+#define   GEN12_DISABLE_DOP_GATING              REG_BIT(0)
 
 #define RT_CTRL					_MMIO(0xe530)
 #define   DIS_NULL_QUERY			REG_BIT(10)
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index a858ce57e465..efdd16edf8f3 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -2847,6 +2847,18 @@ gen12_enable_metric_set(struct i915_perf_stream *stream,
 	u32 sqcnt1;
 	int ret;
 
+	/*
+	 * Wa_1508761755:xehpsdv, dg2
+	 * EU NOA signals behave incorrectly if EU clock gating is enabled.
+	 * Disable thread stall DOP gating and EU DOP gating.
+	 */
+	if (IS_XEHPSDV(i915) || IS_DG2(i915)) {
+		intel_uncore_write(uncore, GEN8_ROW_CHICKEN,
+				_MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE));
+		intel_uncore_write(uncore, GEN7_ROW_CHICKEN2,
+				_MASKED_BIT_ENABLE(GEN12_DISABLE_DOP_GATING));
+	}
+
 	intel_uncore_write(uncore, GEN12_OAG_OA_DEBUG,
 			   /* Disable clk ratio reports, like previous Gens. */
 			   _MASKED_BIT_ENABLE(GEN12_OAG_OA_DEBUG_DISABLE_CLK_RATIO_REPORTS |
@@ -2925,6 +2937,17 @@ static void gen12_disable_metric_set(struct i915_perf_stream *stream)
 	struct drm_i915_private *i915 = stream->perf->i915;
 	u32 sqcnt1;
 
+	/*
+	 * Wa_1508761755:xehpsdv, dg2
+	 * Enable thread stall DOP gating and EU DOP gating.
+	 */
+	if (IS_XEHPSDV(i915) || IS_DG2(i915)) {
+		intel_uncore_write(uncore, GEN8_ROW_CHICKEN,
+				_MASKED_BIT_DISABLE(STALL_DOP_GATING_DISABLE));
+		intel_uncore_write(uncore, GEN7_ROW_CHICKEN2,
+				_MASKED_BIT_DISABLE(GEN12_DISABLE_DOP_GATING));
+	}
+
 	/* Reset all contexts' slices/subslices configurations. */
 	gen12_configure_all_contexts(stream, NULL, NULL);
 
-- 
2.25.1



* [Intel-gfx] [PATCH 16/19] drm/i915/perf: Apply Wa_18013179988
  2022-08-23 20:41 [Intel-gfx] [PATCH 00/19] Add DG2 OA support Umesh Nerlige Ramappa
                   ` (14 preceding siblings ...)
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 15/19] drm/i915/perf: Add Wa_1508761755:dg2 Umesh Nerlige Ramappa
@ 2022-08-23 20:41 ` Umesh Nerlige Ramappa
  2022-09-16  5:16   ` Dixit, Ashutosh
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 17/19] drm/i915/perf: Save/restore EU flex counters across reset Umesh Nerlige Ramappa
                   ` (6 subsequent siblings)
  22 siblings, 1 reply; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-08-23 20:41 UTC (permalink / raw)
  To: intel-gfx

OA reports in the OA buffer contain an OA timestamp field that helps the
user calculate the delta between two OA reports. The calculation relies on
the CS timestamp frequency to convert the timestamp value to nanoseconds.
The CS timestamp frequency is a function of the CTC_SHIFT value in
RPM_CONFIG0.

On DG2, the OA unit assumes that CTC_SHIFT is 3 instead of using the
actual value from RPM_CONFIG0. At the user level, this results in an
error when calculating the delta between two OA reports, since the OA
timestamp is not shifted in the same manner as the CS timestamp.

To resolve this, return the actual OA timestamp frequency to the user in
i915_getparam_ioctl.
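
The arithmetic can be checked with a standalone sketch. The function names
mirror the patch but this is plain userspace C, the frequency used in the
usage note is an arbitrary example value rather than a real platform
number, and the bounds in the asserts are assumptions of the sketch:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NSEC_PER_SEC UINT64_C(1000000000)

/*
 * The OA unit behaves as if CTC_SHIFT were 3, so rescale a CS timestamp
 * frequency that was derived with the actual shift value.
 */
static uint32_t oa_timestamp_frequency(uint32_t cs_freq_hz, uint32_t ctc_shift)
{
	assert(ctc_shift <= 3);
	return cs_freq_hz << (3 - ctc_shift);
}

/* The sampling period is 2^(exponent + 1) OA ticks; convert it to ns, rounding up. */
static uint64_t oa_exponent_to_ns(uint32_t oa_freq_hz, unsigned int exponent)
{
	uint64_t ticks = UINT64_C(2) << exponent;

	assert(oa_freq_hz != 0 && exponent <= 31);
	return (ticks * DEMO_NSEC_PER_SEC + oa_freq_hz - 1) / oa_freq_hz;
}
```

For example, a CS frequency of 2400000 Hz with a shift of 0 gives an OA
frequency of 19200000 Hz, and exponent 0 (two ticks) then rounds up to
105 ns.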

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_getparam.c |  3 +++
 drivers/gpu/drm/i915/i915_perf.c     | 30 ++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/i915_perf.h     |  2 ++
 include/uapi/drm/i915_drm.h          |  6 ++++++
 4 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_getparam.c b/drivers/gpu/drm/i915/i915_getparam.c
index 6fd15b39570c..cdb2208ecabd 100644
--- a/drivers/gpu/drm/i915/i915_getparam.c
+++ b/drivers/gpu/drm/i915/i915_getparam.c
@@ -175,6 +175,9 @@ int i915_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_PARAM_PERF_REVISION:
 		value = i915_perf_ioctl_version();
 		break;
+	case I915_PARAM_OA_TIMESTAMP_FREQUENCY:
+		value = i915_perf_oa_timestamp_frequency(i915);
+		break;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", param->param);
 		return -EINVAL;
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index efdd16edf8f3..132c2ce8b33b 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -3180,6 +3180,30 @@ get_sseu_config(struct intel_sseu *out_sseu,
 	return i915_gem_user_to_context_sseu(engine->gt, drm_sseu, out_sseu);
 }
 
+/*
+ * OA timestamp frequency = CS timestamp frequency on most platforms. On some
+ * platforms the OA unit ignores CTC_SHIFT and the two timestamps differ. In
+ * such cases, return the adjusted CS timestamp frequency to the user.
+ */
+u32 i915_perf_oa_timestamp_frequency(struct drm_i915_private *i915)
+{
+	/* Wa_18013179988:dg2 */
+	if (IS_DG2(i915)) {
+		intel_wakeref_t wakeref;
+		u32 reg, shift;
+
+		with_intel_runtime_pm(to_gt(i915)->uncore->rpm, wakeref)
+			reg = intel_uncore_read(to_gt(i915)->uncore, RPM_CONFIG0);
+
+		shift = (reg & GEN10_RPM_CONFIG0_CTC_SHIFT_PARAMETER_MASK) >>
+			 GEN10_RPM_CONFIG0_CTC_SHIFT_PARAMETER_SHIFT;
+
+		return to_gt(i915)->clock_frequency << (3 - shift);
+	}
+
+	return to_gt(i915)->clock_frequency;
+}
+
 /**
  * i915_oa_stream_init - validate combined props for OA stream and init
  * @stream: An i915 perf stream
@@ -3904,8 +3928,10 @@ i915_perf_open_ioctl_locked(struct i915_perf *perf,
 
 static u64 oa_exponent_to_ns(struct i915_perf *perf, int exponent)
 {
-	return intel_gt_clock_interval_to_ns(to_gt(perf->i915),
-					     2ULL << exponent);
+	u64 nom = (2ULL << exponent) * NSEC_PER_SEC;
+	u32 den = i915_perf_oa_timestamp_frequency(perf->i915);
+
+	return div_u64(nom + den - 1, den);
 }
 
 static __always_inline bool
diff --git a/drivers/gpu/drm/i915/i915_perf.h b/drivers/gpu/drm/i915/i915_perf.h
index 1d1329e5af3a..f96e09a4af04 100644
--- a/drivers/gpu/drm/i915/i915_perf.h
+++ b/drivers/gpu/drm/i915/i915_perf.h
@@ -57,4 +57,6 @@ static inline void i915_oa_config_put(struct i915_oa_config *oa_config)
 	kref_put(&oa_config->ref, i915_oa_config_release);
 }
 
+u32 i915_perf_oa_timestamp_frequency(struct drm_i915_private *i915);
+
 #endif /* __I915_PERF_H__ */
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index d20d723925b5..5e42e94ea534 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -749,6 +749,12 @@ typedef struct drm_i915_irq_wait {
 /* Query if the kernel supports the I915_USERPTR_PROBE flag. */
 #define I915_PARAM_HAS_USERPTR_PROBE 56
 
+/*
+ * Frequency of the timestamps in OA reports. This used to be the same as the CS
+ * timestamp frequency, but differs on some platforms.
+ */
+#define I915_PARAM_OA_TIMESTAMP_FREQUENCY 57
+
 /* Must be kept compact -- no holes and well documented */
 
 /**
-- 
2.25.1



* [Intel-gfx] [PATCH 17/19] drm/i915/perf: Save/restore EU flex counters across reset
  2022-08-23 20:41 [Intel-gfx] [PATCH 00/19] Add DG2 OA support Umesh Nerlige Ramappa
                   ` (15 preceding siblings ...)
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 16/19] drm/i915/perf: Apply Wa_18013179988 Umesh Nerlige Ramappa
@ 2022-08-23 20:41 ` Umesh Nerlige Ramappa
  2022-09-16  5:40   ` Dixit, Ashutosh
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 18/19] drm/i915/guc: Support OA when Wa_16011777198 is enabled Umesh Nerlige Ramappa
                   ` (5 subsequent siblings)
  22 siblings, 1 reply; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-08-23 20:41 UTC (permalink / raw)
  To: intel-gfx

If a DRM client is killed, the hw contexts used by the client are reset
immediately. This reset clears the EU flex counter configuration. If an
OA use case is running in parallel, it would start seeing zeroed EU
counter values following the reset, even if the DRM client is restarted.
Save/restore the EU flex counter config so that the EU counters can be
monitored continuously across resets.
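
To make the intent concrete, here is a host-side toy model of a
save/restore register list. In the actual patch the registers are only
added to a list that the GuC consumes. Everything below (demo_regset and
its helpers, the register file array) is hypothetical and exists purely to
show the behaviour:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NUM_REGS	16	/* size of the modelled register file */
#define DEMO_MAX_SAVED	8	/* capacity of the save/restore list */

struct demo_regset {
	size_t count;
	unsigned int idx[DEMO_MAX_SAVED];
	uint32_t val[DEMO_MAX_SAVED];
};

/* Add a register to the list; returns -1 if the list is full or idx is bad. */
static int demo_regset_add(struct demo_regset *rs, unsigned int idx)
{
	assert(rs != NULL);
	if (rs->count >= DEMO_MAX_SAVED || idx >= DEMO_NUM_REGS)
		return -1;
	rs->idx[rs->count++] = idx;
	return 0;
}

/* Snapshot the current value of every listed register. */
static void demo_regset_save(struct demo_regset *rs, const uint32_t *regs)
{
	for (size_t i = 0; i < rs->count; i++)
		rs->val[i] = regs[rs->idx[i]];
}

/* Model a reset that zeroes everything, then restore the listed registers. */
static void demo_reset_and_restore(const struct demo_regset *rs, uint32_t *regs)
{
	for (size_t i = 0; i < DEMO_NUM_REGS; i++)
		regs[i] = 0;
	for (size_t i = 0; i < rs->count; i++)
		regs[rs->idx[i]] = rs->val[i];
}
```

Registers on the list keep their programmed value across the modelled
reset, while anything not listed comes back as zero, which is the symptom
described above.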

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index 74cbe8eaf531..3e152219fcb2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -375,6 +375,14 @@ static int guc_mmio_regset_init(struct temp_regset *regset,
 	for (i = 0; i < GEN9_LNCFCMOCS_REG_COUNT; i++)
 		ret |= GUC_MMIO_REG_ADD(gt, regset, GEN9_LNCFCMOCS(i), false);
 
+	ret |= GUC_MMIO_REG_ADD(gt, regset, EU_PERF_CNTL0, false);
+	ret |= GUC_MMIO_REG_ADD(gt, regset, EU_PERF_CNTL1, false);
+	ret |= GUC_MMIO_REG_ADD(gt, regset, EU_PERF_CNTL2, false);
+	ret |= GUC_MMIO_REG_ADD(gt, regset, EU_PERF_CNTL3, false);
+	ret |= GUC_MMIO_REG_ADD(gt, regset, EU_PERF_CNTL4, false);
+	ret |= GUC_MMIO_REG_ADD(gt, regset, EU_PERF_CNTL5, false);
+	ret |= GUC_MMIO_REG_ADD(gt, regset, EU_PERF_CNTL6, false);
+
 	return ret ? -1 : 0;
 }
 
-- 
2.25.1



* [Intel-gfx] [PATCH 18/19] drm/i915/guc: Support OA when Wa_16011777198 is enabled
  2022-08-23 20:41 [Intel-gfx] [PATCH 00/19] Add DG2 OA support Umesh Nerlige Ramappa
                   ` (16 preceding siblings ...)
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 17/19] drm/i915/perf: Save/restore EU flex counters across reset Umesh Nerlige Ramappa
@ 2022-08-23 20:41 ` Umesh Nerlige Ramappa
  2022-09-16 21:41   ` Dixit, Ashutosh
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 19/19] drm/i915/perf: Enable OA for DG2 Umesh Nerlige Ramappa
                   ` (4 subsequent siblings)
  22 siblings, 1 reply; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-08-23 20:41 UTC (permalink / raw)
  To: intel-gfx

From: Vinay Belgaumkar <vinay.belgaumkar@intel.com>

There is a workaround that resets RCS/CCS before it goes into RC6. This
breaks OA. Fix it by disabling RC6.

Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
---
 .../drm/i915/gt/uc/abi/guc_actions_slpc_abi.h |  9 ++++
 drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c   | 45 +++++++++++++++++++
 drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h   |  2 +
 drivers/gpu/drm/i915/i915_perf.c              | 29 ++++++++++++
 4 files changed, 85 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_slpc_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_slpc_abi.h
index 4c840a2639dc..811add10c30d 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_slpc_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_slpc_abi.h
@@ -128,6 +128,15 @@ enum slpc_media_ratio_mode {
 	SLPC_MEDIA_RATIO_MODE_FIXED_ONE_TO_TWO = 2,
 };
 
+enum slpc_gucrc_mode {
+	SLPC_GUCRC_MODE_HW = 0,
+	SLPC_GUCRC_MODE_GUCRC_NO_RC6 = 1,
+	SLPC_GUCRC_MODE_GUCRC_STATIC_TIMEOUT = 2,
+	SLPC_GUCRC_MODE_GUCRC_DYNAMIC_HYSTERESIS = 3,
+
+	SLPC_GUCRC_MODE_MAX,
+};
+
 enum slpc_event_id {
 	SLPC_EVENT_RESET = 0,
 	SLPC_EVENT_SHUTDOWN = 1,
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
index e1fa1f32f29e..23989f5452a7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
@@ -642,6 +642,51 @@ static void slpc_get_rp_values(struct intel_guc_slpc *slpc)
 		slpc->boost_freq = slpc->rp0_freq;
 }
 
+/**
+ * intel_guc_slpc_override_gucrc_mode() - override GUCRC mode
+ * @slpc: pointer to intel_guc_slpc.
+ * @mode: new value of the mode.
+ *
+ * This function will override the GUCRC mode.
+ *
+ * Return: 0 on success, non-zero error code on failure.
+ */
+int intel_guc_slpc_override_gucrc_mode(struct intel_guc_slpc *slpc, u32 mode)
+{
+	int ret;
+	struct drm_i915_private *i915 = slpc_to_i915(slpc);
+	intel_wakeref_t wakeref;
+
+	if (mode >= SLPC_GUCRC_MODE_MAX)
+		return -EINVAL;
+
+	wakeref = intel_runtime_pm_get(&i915->runtime_pm);
+
+	ret = slpc_set_param(slpc, SLPC_PARAM_PWRGATE_RC_MODE, mode);
+	if (ret)
+		drm_err(&i915->drm,
+			"Override gucrc mode %d failed %d\n",
+			mode, ret);
+
+	intel_runtime_pm_put(&i915->runtime_pm, wakeref);
+
+	return ret;
+}
+
+int intel_guc_slpc_unset_gucrc_mode(struct intel_guc_slpc *slpc)
+{
+	struct drm_i915_private *i915 = slpc_to_i915(slpc);
+	int ret = 0;
+
+	ret = slpc_unset_param(slpc, SLPC_PARAM_PWRGATE_RC_MODE);
+	if (ret)
+		drm_err(&i915->drm,
+			"Unsetting gucrc mode failed %d\n",
+			ret);
+
+	return ret;
+}
+
 /*
  * intel_guc_slpc_enable() - Start SLPC
  * @slpc: pointer to intel_guc_slpc.
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h
index 82a98f78f96c..ccf483730d9d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h
@@ -42,5 +42,7 @@ int intel_guc_slpc_set_media_ratio_mode(struct intel_guc_slpc *slpc, u32 val);
 void intel_guc_pm_intrmsk_enable(struct intel_gt *gt);
 void intel_guc_slpc_boost(struct intel_guc_slpc *slpc);
 void intel_guc_slpc_dec_waiters(struct intel_guc_slpc *slpc);
+int intel_guc_slpc_unset_gucrc_mode(struct intel_guc_slpc *slpc);
+int intel_guc_slpc_override_gucrc_mode(struct intel_guc_slpc *slpc, u32 mode);
 
 #endif
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 132c2ce8b33b..ce1b6ad4d107 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -208,6 +208,7 @@
 #include "gt/intel_lrc.h"
 #include "gt/intel_lrc_reg.h"
 #include "gt/intel_ring.h"
+#include "gt/uc/intel_guc_slpc.h"
 
 #include "i915_drv.h"
 #include "i915_file_private.h"
@@ -1651,6 +1652,16 @@ static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
 
 	free_oa_buffer(stream);
 
+	/*
+	 * Wa_16011777198:dg2: Unset the override of GUCRC mode to enable rc6.
+	 */
+	if (intel_guc_slpc_is_used(&gt->uc.guc) &&
+	    intel_uc_uses_guc_rc(&gt->uc) &&
+	    (IS_DG2_GRAPHICS_STEP(gt->i915, G10, STEP_A0, STEP_C0) ||
+	     IS_DG2_GRAPHICS_STEP(gt->i915, G11, STEP_A0, STEP_B0)))
+		drm_WARN_ON(&gt->i915->drm,
+			    intel_guc_slpc_unset_gucrc_mode(&gt->uc.guc.slpc));
+
 	intel_uncore_forcewake_put(stream->uncore, FORCEWAKE_ALL);
 	intel_engine_pm_put(stream->engine);
 
@@ -3339,6 +3350,24 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 	intel_engine_pm_get(stream->engine);
 	intel_uncore_forcewake_get(stream->uncore, FORCEWAKE_ALL);
 
+	/*
+	 * Wa_16011777198:dg2: GuC resets render as part of the Wa. This causes
+	 * OA to lose the configuration state. Prevent this by overriding GUCRC
+	 * mode.
+	 */
+	if (intel_guc_slpc_is_used(&gt->uc.guc) &&
+	    intel_uc_uses_guc_rc(&gt->uc) &&
+	    (IS_DG2_GRAPHICS_STEP(gt->i915, G10, STEP_A0, STEP_C0) ||
+	     IS_DG2_GRAPHICS_STEP(gt->i915, G11, STEP_A0, STEP_B0))) {
+		ret = intel_guc_slpc_override_gucrc_mode(&gt->uc.guc.slpc,
+							 SLPC_GUCRC_MODE_GUCRC_NO_RC6);
+		if (ret) {
+			drm_dbg(&stream->perf->i915->drm,
+				"Unable to override gucrc mode\n");
+			goto err_config;
+		}
+	}
+
 	ret = alloc_oa_buffer(stream);
 	if (ret)
 		goto err_oa_buf_alloc;
-- 
2.25.1


* [Intel-gfx] [PATCH 19/19] drm/i915/perf: Enable OA for DG2
  2022-08-23 20:41 [Intel-gfx] [PATCH 00/19] Add DG2 OA support Umesh Nerlige Ramappa
                   ` (17 preceding siblings ...)
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 18/19] drm/i915/guc: Support OA when Wa_16011777198 is enabled Umesh Nerlige Ramappa
@ 2022-08-23 20:41 ` Umesh Nerlige Ramappa
  2022-08-23 21:11 ` [Intel-gfx] [PATCH 02/19] drm/i915/perf: Add OA formats " Umesh Nerlige Ramappa
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-08-23 20:41 UTC (permalink / raw)
  To: intel-gfx

OA was disabled for DG2 because support was missing. Re-enable it now.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index ce1b6ad4d107..f109aeeece8d 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -4877,12 +4877,6 @@ void i915_perf_init(struct drm_i915_private *i915)
 {
 	struct i915_perf *perf = &i915->perf;
 
-	/* XXX const struct i915_perf_ops! */
-
-	/* i915_perf is not enabled for DG2 yet */
-	if (IS_DG2(i915))
-		return;
-
 	perf->oa_formats = oa_formats;
 	if (IS_HASWELL(i915)) {
 		perf->ops.is_valid_b_counter_reg = gen7_is_valid_b_counter_addr;
-- 
2.25.1


* [Intel-gfx] [PATCH 02/19] drm/i915/perf: Add OA formats for DG2
  2022-08-23 20:41 [Intel-gfx] [PATCH 00/19] Add DG2 OA support Umesh Nerlige Ramappa
                   ` (18 preceding siblings ...)
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 19/19] drm/i915/perf: Enable OA for DG2 Umesh Nerlige Ramappa
@ 2022-08-23 21:11 ` Umesh Nerlige Ramappa
  2022-08-23 21:12 ` [Intel-gfx] [PATCH 19/19] drm/i915/perf: Enable OA " Umesh Nerlige Ramappa
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-08-23 21:11 UTC (permalink / raw)
  To: intel-gfx

Add new OA formats for DG2. Some of the newer OA formats are not
multiples of 64 bytes and are not powers of 2. For those formats, adjust
hw_tail accordingly when checking for new reports.
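
As a rough illustration of that adjustment (a standalone sketch, not the
driver code: BUF_SIZE, oa_taken() and rewind_tail() are hypothetical names,
and offsets are assumed to be relative to the start of a power-of-two sized
buffer), the bytes of a partially landed report can be trimmed off the
hardware tail like this:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical buffer size for the sketch; must be a power of two. */
#define BUF_SIZE (1u << 24)

/* Bytes available between head and tail in the circular buffer. */
static uint32_t oa_taken(uint32_t tail, uint32_t head)
{
	return (tail - head) & (BUF_SIZE - 1);
}

/*
 * Rewind hw_tail to the last whole-report boundary, counting reports
 * from aligned_tail (the last tail known to sit on a report boundary).
 * All offsets are relative to the start of the buffer.
 */
static uint32_t rewind_tail(uint32_t hw_tail, uint32_t aligned_tail,
			    uint32_t report_size)
{
	uint32_t partial;

	assert(report_size > 0);

	/* Bytes of a report that has only partially landed. */
	partial = oa_taken(hw_tail, aligned_tail) % report_size;

	/* Subtract the partial amount, wrapping within the buffer. */
	return (hw_tail - partial) & (BUF_SIZE - 1);
}
```

For example, with 448-byte reports and the last aligned tail at 0, a
hardware tail of 512 is rewound to 448, since only one whole report has
landed. Masking with ~(report_size - 1) would not work here because 448 is
not a power of 2.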

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramampa@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 63 ++++++++++++++++++++------------
 include/uapi/drm/i915_drm.h      |  6 +++
 2 files changed, 46 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 735244a3aedd..c8331b549d31 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -306,7 +306,8 @@ static u32 i915_oa_max_sample_rate = 100000;
 
 /* XXX: beware if future OA HW adds new report formats that the current
  * code assumes all reports have a power-of-two size and ~(size - 1) can
- * be used as a mask to align the OA tail pointer.
+ * be used as a mask to align the OA tail pointer. In some of the
+ * formats, R is used to denote reserved field.
  */
 static const struct i915_oa_format oa_formats[I915_OA_FORMAT_MAX] = {
 	[I915_OA_FORMAT_A13]	    = { 0, 64 },
@@ -320,6 +321,10 @@ static const struct i915_oa_format oa_formats[I915_OA_FORMAT_MAX] = {
 	[I915_OA_FORMAT_A12]		    = { 0, 64 },
 	[I915_OA_FORMAT_A12_B8_C8]	    = { 2, 128 },
 	[I915_OA_FORMAT_A32u40_A4u32_B8_C8] = { 5, 256 },
+	[I915_OAR_FORMAT_A32u40_A4u32_B8_C8]    = { 5, 256 },
+	[I915_OA_FORMAT_A24u40_A14u32_B8_C8]    = { 5, 256 },
+	[I915_OAR_FORMAT_A36u64_B8_C8]		= { 1, 384 },
+	[I915_OA_FORMAT_A38u64_R2u64_B8_C8]	= { 1, 448 },
 };
 
 #define SAMPLE_OA_REPORT      (1<<0)
@@ -467,6 +472,7 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 	bool pollin;
 	u32 hw_tail;
 	u64 now;
+	u32 partial_report_size;
 
 	/* We have to consider the (unlikely) possibility that read() errors
 	 * could result in an OA buffer reset which might reset the head and
@@ -476,10 +482,16 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 
 	hw_tail = stream->perf->ops.oa_hw_tail_read(stream);
 
-	/* The tail pointer increases in 64 byte increments,
-	 * not in report_size steps...
+	/* The tail pointer increases in 64 byte increments, whereas report
+	 * sizes need not be integral multiples of 64 or powers of 2.
+	 * Compute potentially partially landed report in the OA buffer
 	 */
-	hw_tail &= ~(report_size - 1);
+	partial_report_size = OA_TAKEN(hw_tail, stream->oa_buffer.tail);
+	partial_report_size %= report_size;
+
+	/* Subtract partial amount off the tail */
+	hw_tail = gtt_offset + ((hw_tail - partial_report_size) &
+				(stream->oa_buffer.vma->size - 1));
 
 	now = ktime_get_mono_fast_ns();
 
@@ -601,6 +613,8 @@ static int append_oa_sample(struct i915_perf_stream *stream,
 {
 	int report_size = stream->oa_buffer.format_size;
 	struct drm_i915_perf_record_header header;
+	int report_size_partial;
+	u8 *oa_buf_end;
 
 	header.type = DRM_I915_PERF_RECORD_SAMPLE;
 	header.pad = 0;
@@ -614,7 +628,19 @@ static int append_oa_sample(struct i915_perf_stream *stream,
 		return -EFAULT;
 	buf += sizeof(header);
 
-	if (copy_to_user(buf, report, report_size))
+	oa_buf_end = stream->oa_buffer.vaddr +
+		     stream->oa_buffer.vma->size;
+	report_size_partial = oa_buf_end - report;
+
+	if (report_size_partial < report_size) {
+		if(copy_to_user(buf, report, report_size_partial))
+			return -EFAULT;
+		buf += report_size_partial;
+
+		if(copy_to_user(buf, stream->oa_buffer.vaddr,
+				report_size - report_size_partial))
+			return -EFAULT;
+	} else if (copy_to_user(buf, report, report_size))
 		return -EFAULT;
 
 	(*offset) += header.size;
@@ -684,8 +710,8 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 	 * all a power of two).
 	 */
 	if (drm_WARN_ONCE(&uncore->i915->drm,
-			  head > OA_BUFFER_SIZE || head % report_size ||
-			  tail > OA_BUFFER_SIZE || tail % report_size,
+			  head > stream->oa_buffer.vma->size ||
+			  tail > stream->oa_buffer.vma->size,
 			  "Inconsistent OA buffer pointers: head = %u, tail = %u\n",
 			  head, tail))
 		return -EIO;
@@ -699,22 +725,6 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 		u32 ctx_id;
 		u32 reason;
 
-		/*
-		 * All the report sizes factor neatly into the buffer
-		 * size so we never expect to see a report split
-		 * between the beginning and end of the buffer.
-		 *
-		 * Given the initial alignment check a misalignment
-		 * here would imply a driver bug that would result
-		 * in an overrun.
-		 */
-		if (drm_WARN_ON(&uncore->i915->drm,
-				(OA_BUFFER_SIZE - head) < report_size)) {
-			drm_err(&uncore->i915->drm,
-				"Spurious OA head ptr: non-integral report offset\n");
-			break;
-		}
-
 		/*
 		 * The reason field includes flags identifying what
 		 * triggered this specific report (mostly timer
@@ -4513,6 +4523,13 @@ static void oa_init_supported_formats(struct i915_perf *perf)
 		oa_format_add(perf, I915_OA_FORMAT_C4_B8);
 		break;
 
+	case INTEL_DG2:
+		oa_format_add(perf, I915_OAR_FORMAT_A32u40_A4u32_B8_C8);
+		oa_format_add(perf, I915_OA_FORMAT_A24u40_A14u32_B8_C8);
+		oa_format_add(perf, I915_OAR_FORMAT_A36u64_B8_C8);
+		oa_format_add(perf, I915_OA_FORMAT_A38u64_R2u64_B8_C8);
+		break;
+
 	default:
 		MISSING_CASE(platform);
 	}
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 520ad2691a99..d20d723925b5 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -2650,6 +2650,12 @@ enum drm_i915_oa_format {
 	I915_OA_FORMAT_A12_B8_C8,
 	I915_OA_FORMAT_A32u40_A4u32_B8_C8,
 
+	/* DG2 */
+	I915_OAR_FORMAT_A32u40_A4u32_B8_C8,
+	I915_OA_FORMAT_A24u40_A14u32_B8_C8,
+	I915_OAR_FORMAT_A36u64_B8_C8,
+	I915_OA_FORMAT_A38u64_R2u64_B8_C8,
+
 	I915_OA_FORMAT_MAX	    /* non-ABI */
 };
 
-- 
2.25.1


* [Intel-gfx] [PATCH 19/19] drm/i915/perf: Enable OA for DG2
  2022-08-23 20:41 [Intel-gfx] [PATCH 00/19] Add DG2 OA support Umesh Nerlige Ramappa
                   ` (19 preceding siblings ...)
  2022-08-23 21:11 ` [Intel-gfx] [PATCH 02/19] drm/i915/perf: Add OA formats " Umesh Nerlige Ramappa
@ 2022-08-23 21:12 ` Umesh Nerlige Ramappa
  2022-08-23 22:07 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Add DG2 OA support (rev2) Patchwork
  2022-08-23 22:07 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
  22 siblings, 0 replies; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-08-23 21:12 UTC (permalink / raw)
  To: intel-gfx

OA was disabled for DG2 because support was missing. Re-enable it now.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index ce1b6ad4d107..f109aeeece8d 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -4877,12 +4877,6 @@ void i915_perf_init(struct drm_i915_private *i915)
 {
 	struct i915_perf *perf = &i915->perf;
 
-	/* XXX const struct i915_perf_ops! */
-
-	/* i915_perf is not enabled for DG2 yet */
-	if (IS_DG2(i915))
-		return;
-
 	perf->oa_formats = oa_formats;
 	if (IS_HASWELL(i915)) {
 		perf->ops.is_valid_b_counter_reg = gen7_is_valid_b_counter_addr;
-- 
2.25.1


* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Add DG2 OA support (rev2)
  2022-08-23 20:41 [Intel-gfx] [PATCH 00/19] Add DG2 OA support Umesh Nerlige Ramappa
                   ` (20 preceding siblings ...)
  2022-08-23 21:12 ` [Intel-gfx] [PATCH 19/19] drm/i915/perf: Enable OA " Umesh Nerlige Ramappa
@ 2022-08-23 22:07 ` Patchwork
  2022-08-23 22:07 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
  22 siblings, 0 replies; 84+ messages in thread
From: Patchwork @ 2022-08-23 22:07 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

== Series Details ==

Series: Add DG2 OA support (rev2)
URL   : https://patchwork.freedesktop.org/series/107584/
State : warning

== Summary ==

Error: dim checkpatch failed
52bc7af769a8 drm/i915/perf: Fix OA filtering logic for GuC mode
-:6: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#6: 
With GuC mode of submission, GuC is in control of defining the context id field

total: 0 errors, 1 warnings, 0 checks, 173 lines checked
8030e7cc5068 drm/i915/perf: Add OA formats for DG2
-:83: CHECK:BRACES: braces {} should be used on all arms of this statement
#83: FILE: drivers/gpu/drm/i915/i915_perf.c:635:
+	if (report_size_partial < report_size) {
[...]
+	} else if (copy_to_user(buf, report, report_size))
[...]

-:84: ERROR:SPACING: space required before the open parenthesis '('
#84: FILE: drivers/gpu/drm/i915/i915_perf.c:636:
+		if(copy_to_user(buf, report, report_size_partial))

-:88: ERROR:SPACING: space required before the open parenthesis '('
#88: FILE: drivers/gpu/drm/i915/i915_perf.c:640:
+		if(copy_to_user(buf, stream->oa_buffer.vaddr,

-:159: WARNING:FROM_SIGN_OFF_MISMATCH: From:/Signed-off-by: email address mismatch: 'From: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>' != 'Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramampa@intel.com>'

total: 2 errors, 1 warnings, 1 checks, 130 lines checked
0f1d1aa255aa drm/i915/perf: Fix noa wait predication for DG2
eb16b3efd45e drm/i915/perf: Determine gen12 oa ctx offset at runtime
-:22: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'x' - possible side-effects?
#22: FILE: drivers/gpu/drm/i915/i915_perf.c:1369:
+#define __valid_oactxctrl_offset(x) ((x) && (x) != U32_MAX)

total: 0 errors, 0 warnings, 1 checks, 227 lines checked
094c94d27845 drm/i915/perf: Enable commands per clock reporting in OA
-:58: ERROR:CODE_INDENT: code indent should use tabs where possible
#58: FILE: drivers/gpu/drm/i915/i915_perf.c:2772:
+ ^I/*$

-:58: WARNING:SPACE_BEFORE_TAB: please, no space before tabs
#58: FILE: drivers/gpu/drm/i915/i915_perf.c:2772:
+ ^I/*$

-:58: WARNING:LEADING_SPACE: please, no spaces at the start of a line
#58: FILE: drivers/gpu/drm/i915/i915_perf.c:2772:
+ ^I/*$

-:59: ERROR:CODE_INDENT: code indent should use tabs where possible
#59: FILE: drivers/gpu/drm/i915/i915_perf.c:2773:
+ ^I * Initialize Super Queue Internal Cnt Register$

-:59: WARNING:SPACE_BEFORE_TAB: please, no space before tabs
#59: FILE: drivers/gpu/drm/i915/i915_perf.c:2773:
+ ^I * Initialize Super Queue Internal Cnt Register$

-:60: ERROR:CODE_INDENT: code indent should use tabs where possible
#60: FILE: drivers/gpu/drm/i915/i915_perf.c:2774:
+ ^I * Set PMON Enable in order to collect valid metrics.$

-:60: WARNING:SPACE_BEFORE_TAB: please, no space before tabs
#60: FILE: drivers/gpu/drm/i915/i915_perf.c:2774:
+ ^I * Set PMON Enable in order to collect valid metrics.$

-:62: ERROR:CODE_INDENT: code indent should use tabs where possible
#62: FILE: drivers/gpu/drm/i915/i915_perf.c:2776:
+ ^I */$

-:62: WARNING:SPACE_BEFORE_TAB: please, no space before tabs
#62: FILE: drivers/gpu/drm/i915/i915_perf.c:2776:
+ ^I */$

-:88: ERROR:CODE_INDENT: code indent should use tabs where possible
#88: FILE: drivers/gpu/drm/i915/i915_perf.c:2847:
+ ^I/* Reset PMON Enable to save power. */$

-:88: WARNING:SPACE_BEFORE_TAB: please, no space before tabs
#88: FILE: drivers/gpu/drm/i915/i915_perf.c:2847:
+ ^I/* Reset PMON Enable to save power. */$

-:88: WARNING:LEADING_SPACE: please, no spaces at the start of a line
#88: FILE: drivers/gpu/drm/i915/i915_perf.c:2847:
+ ^I/* Reset PMON Enable to save power. */$

total: 5 errors, 7 warnings, 0 checks, 79 lines checked
011d3bb3b13a drm/i915/perf: Use helpers to process reports w.r.t. OA buffer size
-:23: ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar"
#23: FILE: drivers/gpu/drm/i915/i915_perf.c:388:
+static u32 _oa_taken(struct i915_perf_stream * stream, u32 tail, u32 head)

-:30: ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar"
#30: FILE: drivers/gpu/drm/i915/i915_perf.c:395:
+static u32 _rewind_tail(struct i915_perf_stream * stream, u32 relative_hw_tail,

total: 2 errors, 0 warnings, 0 checks, 106 lines checked
706eb71d9849 drm/i915/perf: Simply use stream->ctx
e33a6e0a3b91 drm/i915/perf: Move gt-specific data from i915->perf to gt->perf
-:60: WARNING:AVOID_BUG: Avoid crashing the kernel - try using WARN_ON & recovery code rather than BUG() or BUG_ON()
#60: FILE: drivers/gpu/drm/i915/i915_perf.c:1582:
+	BUG_ON(stream != gt->perf.exclusive_stream);

total: 0 errors, 1 warnings, 0 checks, 357 lines checked
a6af6257d624 drm/i915/perf: Replace gt->perf.lock with stream->lock for file ops
6b6ca4561fa0 drm/i915/perf: Use gt-specific ggtt for OA and noa-wait buffers
2b94ddd53cff drm/i915/perf: Store a pointer to oa_format in oa_buffer
4b075ef6d3c3 drm/i915/perf: Parse 64bit report header formats correctly
0eef5c79db8b drm/i915/perf: Add Wa_16010703925:dg2
-:21: ERROR:CODE_INDENT: code indent should use tabs where possible
#21: FILE: drivers/gpu/drm/i915/i915_perf.c:4745:
+ ^Iif (IS_DG2_GRAPHICS_STEP(i915, G10, STEP_A0, STEP_B0)) {$

-:21: WARNING:SPACE_BEFORE_TAB: please, no space before tabs
#21: FILE: drivers/gpu/drm/i915/i915_perf.c:4745:
+ ^Iif (IS_DG2_GRAPHICS_STEP(i915, G10, STEP_A0, STEP_B0)) {$

-:21: WARNING:LEADING_SPACE: please, no spaces at the start of a line
#21: FILE: drivers/gpu/drm/i915/i915_perf.c:4745:
+ ^Iif (IS_DG2_GRAPHICS_STEP(i915, G10, STEP_A0, STEP_B0)) {$

-:24: ERROR:CODE_INDENT: code indent should use tabs where possible
#24: FILE: drivers/gpu/drm/i915/i915_perf.c:4748:
+ ^I}$

-:24: WARNING:SPACE_BEFORE_TAB: please, no space before tabs
#24: FILE: drivers/gpu/drm/i915/i915_perf.c:4748:
+ ^I}$

-:24: WARNING:LEADING_SPACE: please, no spaces at the start of a line
#24: FILE: drivers/gpu/drm/i915/i915_perf.c:4748:
+ ^I}$

total: 2 errors, 4 warnings, 0 checks, 11 lines checked
ad8d1018d5c7 drm/i915/perf: Add Wa_1608133521:dg2
b1cdf8bc11ea drm/i915/perf: Add Wa_1508761755:dg2
-:31: ERROR:CODE_INDENT: code indent should use tabs where possible
#31: FILE: drivers/gpu/drm/i915/i915_perf.c:2850:
+ ^I/*$

-:31: WARNING:SPACE_BEFORE_TAB: please, no space before tabs
#31: FILE: drivers/gpu/drm/i915/i915_perf.c:2850:
+ ^I/*$

-:31: WARNING:LEADING_SPACE: please, no spaces at the start of a line
#31: FILE: drivers/gpu/drm/i915/i915_perf.c:2850:
+ ^I/*$

-:33: ERROR:CODE_INDENT: code indent should use tabs where possible
#33: FILE: drivers/gpu/drm/i915/i915_perf.c:2852:
+ ^I * EU NOA signals behave incorrectly if EU clock gating is enabled.$

-:33: WARNING:SPACE_BEFORE_TAB: please, no space before tabs
#33: FILE: drivers/gpu/drm/i915/i915_perf.c:2852:
+ ^I * EU NOA signals behave incorrectly if EU clock gating is enabled.$

-:34: ERROR:CODE_INDENT: code indent should use tabs where possible
#34: FILE: drivers/gpu/drm/i915/i915_perf.c:2853:
+ ^I * Disable thread stall DOP gating and EU DOP gating.$

-:34: WARNING:SPACE_BEFORE_TAB: please, no space before tabs
#34: FILE: drivers/gpu/drm/i915/i915_perf.c:2853:
+ ^I * Disable thread stall DOP gating and EU DOP gating.$

-:35: ERROR:CODE_INDENT: code indent should use tabs where possible
#35: FILE: drivers/gpu/drm/i915/i915_perf.c:2854:
+ ^I */$

-:35: WARNING:SPACE_BEFORE_TAB: please, no space before tabs
#35: FILE: drivers/gpu/drm/i915/i915_perf.c:2854:
+ ^I */$

-:38: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#38: FILE: drivers/gpu/drm/i915/i915_perf.c:2857:
+		intel_uncore_write(uncore, GEN8_ROW_CHICKEN,
+				_MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE));

-:40: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#40: FILE: drivers/gpu/drm/i915/i915_perf.c:2859:
+		intel_uncore_write(uncore, GEN7_ROW_CHICKEN2,
+				_MASKED_BIT_ENABLE(GEN12_DISABLE_DOP_GATING));

-:56: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#56: FILE: drivers/gpu/drm/i915/i915_perf.c:2946:
+		intel_uncore_write(uncore, GEN8_ROW_CHICKEN,
+				_MASKED_BIT_DISABLE(STALL_DOP_GATING_DISABLE));

-:58: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#58: FILE: drivers/gpu/drm/i915/i915_perf.c:2948:
+		intel_uncore_write(uncore, GEN7_ROW_CHICKEN2,
+				_MASKED_BIT_DISABLE(GEN12_DISABLE_DOP_GATING));

total: 4 errors, 5 warnings, 4 checks, 42 lines checked
5c73ed9b4c4a drm/i915/perf: Apply Wa_18013179988
39bef7bd2f84 drm/i915/perf: Save/restore EU flex counters across reset
906505495164 drm/i915/guc: Support OA when Wa_16011777198 is enabled
556ab08afed9 drm/i915/perf: Enable OA for DG2



* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for Add DG2 OA support (rev2)
  2022-08-23 20:41 [Intel-gfx] [PATCH 00/19] Add DG2 OA support Umesh Nerlige Ramappa
                   ` (21 preceding siblings ...)
  2022-08-23 22:07 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Add DG2 OA support (rev2) Patchwork
@ 2022-08-23 22:07 ` Patchwork
  22 siblings, 0 replies; 84+ messages in thread
From: Patchwork @ 2022-08-23 22:07 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

== Series Details ==

Series: Add DG2 OA support (rev2)
URL   : https://patchwork.freedesktop.org/series/107584/
State : warning

== Summary ==

Error: dim sparse failed
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.



* Re: [Intel-gfx] [PATCH 14/19] drm/i915/perf: Add Wa_1608133521:dg2
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 14/19] drm/i915/perf: Add Wa_1608133521:dg2 Umesh Nerlige Ramappa
@ 2022-08-29 14:04   ` Jani Nikula
  2022-09-16  1:21   ` Dixit, Ashutosh
  1 sibling, 0 replies; 84+ messages in thread
From: Jani Nikula @ 2022-08-29 14:04 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa, intel-gfx

On Tue, 23 Aug 2022, Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> wrote:
> DG2 introduces 64 bit counters and OA reports that have 64 bit values
> for fields in the report header - report_id, timestamp, context_id and
> gpu ticks. i915 uses report_id, timestamp and context_id to check for
> valid reports.
>
> In some DG2 variants, only the lower dwords for timestamp, report_id and
> context_id are accessible. Add workaround for such reports.
>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_perf.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index a28f07923d8f..a858ce57e465 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -310,7 +310,7 @@ static u32 i915_oa_max_sample_rate = 100000;
>   * be used as a mask to align the OA tail pointer. In some of the
>   * formats, R is used to denote reserved field.
>   */
> -static const struct i915_oa_format oa_formats[I915_OA_FORMAT_MAX] = {
> +static struct i915_oa_format oa_formats[I915_OA_FORMAT_MAX] = {

Can't do this. This is shared between devices, and once you make it
mutable and change it, you'll change it for *all* devices.

Your options are having const data for all variants you need, or you
make a device specific copy and modify. The former is generally better
if you usually don't need to modify it.
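
A minimal sketch of the second option, assuming made-up names (struct
oa_format, struct device and device_init_formats() below are for
illustration only and are not the i915 definitions): keep the shared table
const and give each device its own copy to patch.

```c
#include <assert.h>
#include <string.h>

enum hdr_type { HDR_32_BIT, HDR_64_BIT };

/* Hypothetical format descriptor, not the driver's struct. */
struct oa_format {
	int format;
	int size;
	enum hdr_type header;
};

enum { FMT_A36, FMT_A38, FMT_MAX };

/* Shared, read-only defaults: never written after build time. */
static const struct oa_format default_formats[FMT_MAX] = {
	[FMT_A36] = { 1, 384, HDR_64_BIT },
	[FMT_A38] = { 1, 448, HDR_64_BIT },
};

/* Each device carries a private, mutable copy of the table. */
struct device {
	struct oa_format formats[FMT_MAX];
};

static void device_init_formats(struct device *dev, int needs_hdr_wa)
{
	assert(dev);

	memcpy(dev->formats, default_formats, sizeof(default_formats));

	/* The workaround only touches this device's copy. */
	if (needs_hdr_wa) {
		dev->formats[FMT_A36].header = HDR_32_BIT;
		dev->formats[FMT_A38].header = HDR_32_BIT;
	}
}
```

With this layout a second device that does not need the workaround still
sees 64 bit headers, because the const defaults are never modified.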

BR,
Jani.


>  	[I915_OA_FORMAT_A13]	    = { 0, 64 },
>  	[I915_OA_FORMAT_A29]	    = { 1, 128 },
>  	[I915_OA_FORMAT_A13_B8_C8]  = { 2, 128 },
> @@ -4746,6 +4746,13 @@ static void oa_init_supported_formats(struct i915_perf *perf)
>  		/* Wa_16010703925:dg2 */
>  		clear_bit(I915_OAR_FORMAT_A36u64_B8_C8, perf->format_mask);
>   	}
> +
> +	if (IS_DG2_GRAPHICS_STEP(i915, G10, STEP_A0, STEP_B0) ||
> +	    IS_DG2_GRAPHICS_STEP(i915, G11, STEP_A0, STEP_FOREVER)) {
> +		/* Wa_1608133521:dg2 */
> +		oa_formats[I915_OAR_FORMAT_A36u64_B8_C8].header = HDR_32_BIT;
> +		oa_formats[I915_OA_FORMAT_A38u64_R2u64_B8_C8].header = HDR_32_BIT;
> +	}
>  }
>  
>  static void i915_perf_init_info(struct drm_i915_private *i915)

-- 
Jani Nikula, Intel Open Source Graphics Center

* Re: [Intel-gfx] [PATCH 01/19] drm/i915/perf: Fix OA filtering logic for GuC mode
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 01/19] drm/i915/perf: Fix OA filtering logic for GuC mode Umesh Nerlige Ramappa
@ 2022-09-06 14:33   ` Lionel Landwerlin
  2022-09-06 17:39     ` Umesh Nerlige Ramappa
  2022-09-09 23:47   ` Dixit, Ashutosh
  1 sibling, 1 reply; 84+ messages in thread
From: Lionel Landwerlin @ 2022-09-06 14:33 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa, intel-gfx

On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote:
> With GuC mode of submission, GuC is in control of defining the context id field
> that is part of the OA reports. To filter reports, UMD and KMD must know what sw
> context id was chosen by GuC. There is no interface between KMD and GuC to
> determine this, so read the upper-dword of EXECLIST_STATUS to filter/squash OA
> reports for the specific context.
>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>


I assume you checked with GuC that this doesn't change as the context is 
running?

With i915/execlist submission mode, we had to ask i915 to pin the 
sw_id/ctx_id.


If that's not the case then filtering is broken.


-Lionel


> ---
>   drivers/gpu/drm/i915/gt/intel_lrc.h |   2 +
>   drivers/gpu/drm/i915/i915_perf.c    | 141 ++++++++++++++++++++++++----
>   2 files changed, 124 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.h b/drivers/gpu/drm/i915/gt/intel_lrc.h
> index a390f0813c8b..7111bae759f3 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.h
> @@ -110,6 +110,8 @@ enum {
>   #define XEHP_SW_CTX_ID_WIDTH			16
>   #define XEHP_SW_COUNTER_SHIFT			58
>   #define XEHP_SW_COUNTER_WIDTH			6
> +#define GEN12_GUC_SW_CTX_ID_SHIFT		39
> +#define GEN12_GUC_SW_CTX_ID_WIDTH		16
>   
>   static inline void lrc_runtime_start(struct intel_context *ce)
>   {
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index f3c23fe9ad9c..735244a3aedd 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -1233,6 +1233,125 @@ static struct intel_context *oa_pin_context(struct i915_perf_stream *stream)
>   	return stream->pinned_ctx;
>   }
>   
> +static int
> +__store_reg_to_mem(struct i915_request *rq, i915_reg_t reg, u32 ggtt_offset)
> +{
> +	u32 *cs, cmd;
> +
> +	cmd = MI_STORE_REGISTER_MEM | MI_SRM_LRM_GLOBAL_GTT;
> +	if (GRAPHICS_VER(rq->engine->i915) >= 8)
> +		cmd++;
> +
> +	cs = intel_ring_begin(rq, 4);
> +	if (IS_ERR(cs))
> +		return PTR_ERR(cs);
> +
> +	*cs++ = cmd;
> +	*cs++ = i915_mmio_reg_offset(reg);
> +	*cs++ = ggtt_offset;
> +	*cs++ = 0;
> +
> +	intel_ring_advance(rq, cs);
> +
> +	return 0;
> +}
> +
> +static int
> +__read_reg(struct intel_context *ce, i915_reg_t reg, u32 ggtt_offset)
> +{
> +	struct i915_request *rq;
> +	int err;
> +
> +	rq = i915_request_create(ce);
> +	if (IS_ERR(rq))
> +		return PTR_ERR(rq);
> +
> +	i915_request_get(rq);
> +
> +	err = __store_reg_to_mem(rq, reg, ggtt_offset);
> +
> +	i915_request_add(rq);
> +	if (!err && i915_request_wait(rq, 0, HZ / 2) < 0)
> +		err = -ETIME;
> +
> +	i915_request_put(rq);
> +
> +	return err;
> +}
> +
> +static int
> +gen12_guc_sw_ctx_id(struct intel_context *ce, u32 *ctx_id)
> +{
> +	struct i915_vma *scratch;
> +	u32 *val;
> +	int err;
> +
> +	scratch = __vm_create_scratch_for_read_pinned(&ce->engine->gt->ggtt->vm, 4);
> +	if (IS_ERR(scratch))
> +		return PTR_ERR(scratch);
> +
> +	err = i915_vma_sync(scratch);
> +	if (err)
> +		goto err_scratch;
> +
> +	err = __read_reg(ce, RING_EXECLIST_STATUS_HI(ce->engine->mmio_base),
> +			 i915_ggtt_offset(scratch));
> +	if (err)
> +		goto err_scratch;
> +
> +	val = i915_gem_object_pin_map_unlocked(scratch->obj, I915_MAP_WB);
> +	if (IS_ERR(val)) {
> +		err = PTR_ERR(val);
> +		goto err_scratch;
> +	}
> +
> +	*ctx_id = *val;
> +	i915_gem_object_unpin_map(scratch->obj);
> +
> +err_scratch:
> +	i915_vma_unpin_and_release(&scratch, 0);
> +	return err;
> +}
> +
> +/*
> + * For execlist mode of submission, pick an unused context id
> + * 0 - (NUM_CONTEXT_TAG -1) are used by other contexts
> + * XXX_MAX_CONTEXT_HW_ID is used by idle context
> + *
> + * For GuC mode of submission read context id from the upper dword of the
> + * EXECLIST_STATUS register.
> + */
> +static int gen12_get_render_context_id(struct i915_perf_stream *stream)
> +{
> +	u32 ctx_id, mask;
> +	int ret;
> +
> +	if (intel_engine_uses_guc(stream->engine)) {
> +		ret = gen12_guc_sw_ctx_id(stream->pinned_ctx, &ctx_id);
> +		if (ret)
> +			return ret;
> +
> +		mask = ((1U << GEN12_GUC_SW_CTX_ID_WIDTH) - 1) <<
> +			(GEN12_GUC_SW_CTX_ID_SHIFT - 32);
> +	} else if (GRAPHICS_VER_FULL(stream->engine->i915) >= IP_VER(12, 50)) {
> +		ctx_id = (XEHP_MAX_CONTEXT_HW_ID - 1) <<
> +			(XEHP_SW_CTX_ID_SHIFT - 32);
> +
> +		mask = ((1U << XEHP_SW_CTX_ID_WIDTH) - 1) <<
> +			(XEHP_SW_CTX_ID_SHIFT - 32);
> +	} else {
> +		ctx_id = (GEN12_MAX_CONTEXT_HW_ID - 1) <<
> +			 (GEN11_SW_CTX_ID_SHIFT - 32);
> +
> +		mask = ((1U << GEN11_SW_CTX_ID_WIDTH) - 1) <<
> +			(GEN11_SW_CTX_ID_SHIFT - 32);
> +	}
> +	stream->specific_ctx_id = ctx_id & mask;
> +	stream->specific_ctx_id_mask = mask;
> +
> +	return 0;
> +}
> +
>   /**
>    * oa_get_render_ctx_id - determine and hold ctx hw id
>    * @stream: An i915-perf stream opened for OA metrics
> @@ -1246,6 +1365,7 @@ static struct intel_context *oa_pin_context(struct i915_perf_stream *stream)
>   static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
>   {
>   	struct intel_context *ce;
> +	int ret = 0;
>   
>   	ce = oa_pin_context(stream);
>   	if (IS_ERR(ce))
> @@ -1292,24 +1412,7 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
>   
>   	case 11:
>   	case 12:
> -		if (GRAPHICS_VER_FULL(ce->engine->i915) >= IP_VER(12, 50)) {
> -			stream->specific_ctx_id_mask =
> -				((1U << XEHP_SW_CTX_ID_WIDTH) - 1) <<
> -				(XEHP_SW_CTX_ID_SHIFT - 32);
> -			stream->specific_ctx_id =
> -				(XEHP_MAX_CONTEXT_HW_ID - 1) <<
> -				(XEHP_SW_CTX_ID_SHIFT - 32);
> -		} else {
> -			stream->specific_ctx_id_mask =
> -				((1U << GEN11_SW_CTX_ID_WIDTH) - 1) << (GEN11_SW_CTX_ID_SHIFT - 32);
> -			/*
> -			 * Pick an unused context id
> -			 * 0 - BITS_PER_LONG are used by other contexts
> -			 * GEN12_MAX_CONTEXT_HW_ID (0x7ff) is used by idle context
> -			 */
> -			stream->specific_ctx_id =
> -				(GEN12_MAX_CONTEXT_HW_ID - 1) << (GEN11_SW_CTX_ID_SHIFT - 32);
> -		}
> +		ret = gen12_get_render_context_id(stream);
>   		break;
>   
>   	default:
> @@ -1323,7 +1426,7 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
>   		stream->specific_ctx_id,
>   		stream->specific_ctx_id_mask);
>   
> -	return 0;
> +	return ret;
>   }
>   
>   /**




* Re: [Intel-gfx] [PATCH 01/19] drm/i915/perf: Fix OA filtering logic for GuC mode
  2022-09-06 14:33   ` Lionel Landwerlin
@ 2022-09-06 17:39     ` Umesh Nerlige Ramappa
  2022-09-06 18:39       ` Lionel Landwerlin
  0 siblings, 1 reply; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-06 17:39 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

On Tue, Sep 06, 2022 at 05:33:00PM +0300, Lionel Landwerlin wrote:
>On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote:
>>With GuC mode of submission, GuC is in control of defining the context id field
>>that is part of the OA reports. To filter reports, UMD and KMD must know what sw
>>context id was chosen by GuC. There is no interface between KMD and GuC to
>>determine this, so read the upper-dword of EXECLIST_STATUS to filter/squash OA
>>reports for the specific context.
>>
>>Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>
>
>I assume you checked with GuC that this doesn't change as the context 
>is running?

Correct.

>
>With i915/execlist submission mode, we had to ask i915 to pin the 
>sw_id/ctx_id.
>

From the GuC perspective, the context id can change once KMD de-registers
the context, and that will not happen while the context is in use.
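
For illustration only (this is not part of the patch): a minimal standalone sketch of how the GuC mask from the intel_lrc.h hunk would be applied when filtering. The shift and width values are copied from the patch; the helper names guc_ctx_id_mask() and report_matches_ctx() are hypothetical, and the sketch assumes the context id dword of an OA report uses the same bit layout as the upper dword of EXECLIST_STATUS.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the intel_lrc.h hunk of this patch. */
#define GEN12_GUC_SW_CTX_ID_SHIFT	39
#define GEN12_GUC_SW_CTX_ID_WIDTH	16

/*
 * Mask selecting the sw context id inside the upper dword of
 * EXECLIST_STATUS (hence the "- 32" on the 64-bit shift).
 */
static uint32_t guc_ctx_id_mask(void)
{
	return ((1U << GEN12_GUC_SW_CTX_ID_WIDTH) - 1) <<
		(GEN12_GUC_SW_CTX_ID_SHIFT - 32);
}

/*
 * Hypothetical helper: compare the context id dword of a report with
 * the value read back from EXECLIST_STATUS_HI, ignoring all bits that
 * fall outside the sw context id field.
 */
static bool report_matches_ctx(uint32_t report_ctx_id, uint32_t status_hi)
{
	uint32_t mask = guc_ctx_id_mask();

	return (report_ctx_id & mask) == (status_hi & mask);
}
```

With a shift of 39 and a width of 16, the field occupies bits 7 to 22 of the upper dword, so anything in bits 0 to 6 is ignored by the comparison.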

Thanks,
Umesh

>
>If that's not the case then filtering is broken.
>
>
>-Lionel
>
>
>>---
>>  drivers/gpu/drm/i915/gt/intel_lrc.h |   2 +
>>  drivers/gpu/drm/i915/i915_perf.c    | 141 ++++++++++++++++++++++++----
>>  2 files changed, 124 insertions(+), 19 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.h b/drivers/gpu/drm/i915/gt/intel_lrc.h
>>index a390f0813c8b..7111bae759f3 100644
>>--- a/drivers/gpu/drm/i915/gt/intel_lrc.h
>>+++ b/drivers/gpu/drm/i915/gt/intel_lrc.h
>>@@ -110,6 +110,8 @@ enum {
>>  #define XEHP_SW_CTX_ID_WIDTH			16
>>  #define XEHP_SW_COUNTER_SHIFT			58
>>  #define XEHP_SW_COUNTER_WIDTH			6
>>+#define GEN12_GUC_SW_CTX_ID_SHIFT		39
>>+#define GEN12_GUC_SW_CTX_ID_WIDTH		16
>>  static inline void lrc_runtime_start(struct intel_context *ce)
>>  {
>>diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>>index f3c23fe9ad9c..735244a3aedd 100644
>>--- a/drivers/gpu/drm/i915/i915_perf.c
>>+++ b/drivers/gpu/drm/i915/i915_perf.c
>>@@ -1233,6 +1233,125 @@ static struct intel_context *oa_pin_context(struct i915_perf_stream *stream)
>>  	return stream->pinned_ctx;
>>  }
>>+static int
>>+__store_reg_to_mem(struct i915_request *rq, i915_reg_t reg, u32 ggtt_offset)
>>+{
>>+	u32 *cs, cmd;
>>+
>>+	cmd = MI_STORE_REGISTER_MEM | MI_SRM_LRM_GLOBAL_GTT;
>>+	if (GRAPHICS_VER(rq->engine->i915) >= 8)
>>+		cmd++;
>>+
>>+	cs = intel_ring_begin(rq, 4);
>>+	if (IS_ERR(cs))
>>+		return PTR_ERR(cs);
>>+
>>+	*cs++ = cmd;
>>+	*cs++ = i915_mmio_reg_offset(reg);
>>+	*cs++ = ggtt_offset;
>>+	*cs++ = 0;
>>+
>>+	intel_ring_advance(rq, cs);
>>+
>>+	return 0;
>>+}
>>+
>>+static int
>>+__read_reg(struct intel_context *ce, i915_reg_t reg, u32 ggtt_offset)
>>+{
>>+	struct i915_request *rq;
>>+	int err;
>>+
>>+	rq = i915_request_create(ce);
>>+	if (IS_ERR(rq))
>>+		return PTR_ERR(rq);
>>+
>>+	i915_request_get(rq);
>>+
>>+	err = __store_reg_to_mem(rq, reg, ggtt_offset);
>>+
>>+	i915_request_add(rq);
>>+	if (!err && i915_request_wait(rq, 0, HZ / 2) < 0)
>>+		err = -ETIME;
>>+
>>+	i915_request_put(rq);
>>+
>>+	return err;
>>+}
>>+
>>+static int
>>+gen12_guc_sw_ctx_id(struct intel_context *ce, u32 *ctx_id)
>>+{
>>+	struct i915_vma *scratch;
>>+	u32 *val;
>>+	int err;
>>+
>>+	scratch = __vm_create_scratch_for_read_pinned(&ce->engine->gt->ggtt->vm, 4);
>>+	if (IS_ERR(scratch))
>>+		return PTR_ERR(scratch);
>>+
>>+	err = i915_vma_sync(scratch);
>>+	if (err)
>>+		goto err_scratch;
>>+
>>+	err = __read_reg(ce, RING_EXECLIST_STATUS_HI(ce->engine->mmio_base),
>>+			 i915_ggtt_offset(scratch));
>>+	if (err)
>>+		goto err_scratch;
>>+
>>+	val = i915_gem_object_pin_map_unlocked(scratch->obj, I915_MAP_WB);
>>+	if (IS_ERR(val)) {
>>+		err = PTR_ERR(val);
>>+		goto err_scratch;
>>+	}
>>+
>>+	*ctx_id = *val;
>>+	i915_gem_object_unpin_map(scratch->obj);
>>+
>>+err_scratch:
>>+	i915_vma_unpin_and_release(&scratch, 0);
>>+	return err;
>>+}
>>+
>>+/*
>>+ * For execlist mode of submission, pick an unused context id
>>+ * 0 - (NUM_CONTEXT_TAG -1) are used by other contexts
>>+ * XXX_MAX_CONTEXT_HW_ID is used by idle context
>>+ *
>>+ * For GuC mode of submission read context id from the upper dword of the
>>+ * EXECLIST_STATUS register.
>>+ */
>>+static int gen12_get_render_context_id(struct i915_perf_stream *stream)
>>+{
>>+	u32 ctx_id, mask;
>>+	int ret;
>>+
>>+	if (intel_engine_uses_guc(stream->engine)) {
>>+		ret = gen12_guc_sw_ctx_id(stream->pinned_ctx, &ctx_id);
>>+		if (ret)
>>+			return ret;
>>+
>>+		mask = ((1U << GEN12_GUC_SW_CTX_ID_WIDTH) - 1) <<
>>+			(GEN12_GUC_SW_CTX_ID_SHIFT - 32);
>>+	} else if (GRAPHICS_VER_FULL(stream->engine->i915) >= IP_VER(12, 50)) {
>>+		ctx_id = (XEHP_MAX_CONTEXT_HW_ID - 1) <<
>>+			(XEHP_SW_CTX_ID_SHIFT - 32);
>>+
>>+		mask = ((1U << XEHP_SW_CTX_ID_WIDTH) - 1) <<
>>+			(XEHP_SW_CTX_ID_SHIFT - 32);
>>+	} else {
>>+		ctx_id = (GEN12_MAX_CONTEXT_HW_ID - 1) <<
>>+			 (GEN11_SW_CTX_ID_SHIFT - 32);
>>+
>>+		mask = ((1U << GEN11_SW_CTX_ID_WIDTH) - 1) <<
>>+			(GEN11_SW_CTX_ID_SHIFT - 32);
>>+	}
>>+	stream->specific_ctx_id = ctx_id & mask;
>>+	stream->specific_ctx_id_mask = mask;
>>+
>>+	return 0;
>>+}
>>+
>>  /**
>>   * oa_get_render_ctx_id - determine and hold ctx hw id
>>   * @stream: An i915-perf stream opened for OA metrics
>>@@ -1246,6 +1365,7 @@ static struct intel_context *oa_pin_context(struct i915_perf_stream *stream)
>>  static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
>>  {
>>  	struct intel_context *ce;
>>+	int ret = 0;
>>  	ce = oa_pin_context(stream);
>>  	if (IS_ERR(ce))
>>@@ -1292,24 +1412,7 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
>>  	case 11:
>>  	case 12:
>>-		if (GRAPHICS_VER_FULL(ce->engine->i915) >= IP_VER(12, 50)) {
>>-			stream->specific_ctx_id_mask =
>>-				((1U << XEHP_SW_CTX_ID_WIDTH) - 1) <<
>>-				(XEHP_SW_CTX_ID_SHIFT - 32);
>>-			stream->specific_ctx_id =
>>-				(XEHP_MAX_CONTEXT_HW_ID - 1) <<
>>-				(XEHP_SW_CTX_ID_SHIFT - 32);
>>-		} else {
>>-			stream->specific_ctx_id_mask =
>>-				((1U << GEN11_SW_CTX_ID_WIDTH) - 1) << (GEN11_SW_CTX_ID_SHIFT - 32);
>>-			/*
>>-			 * Pick an unused context id
>>-			 * 0 - BITS_PER_LONG are used by other contexts
>>-			 * GEN12_MAX_CONTEXT_HW_ID (0x7ff) is used by idle context
>>-			 */
>>-			stream->specific_ctx_id =
>>-				(GEN12_MAX_CONTEXT_HW_ID - 1) << (GEN11_SW_CTX_ID_SHIFT - 32);
>>-		}
>>+		ret = gen12_get_render_context_id(stream);
>>  		break;
>>  	default:
>>@@ -1323,7 +1426,7 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
>>  		stream->specific_ctx_id,
>>  		stream->specific_ctx_id_mask);
>>-	return 0;
>>+	return ret;
>>  }
>>  /**
>
>


* Re: [Intel-gfx] [PATCH 01/19] drm/i915/perf: Fix OA filtering logic for GuC mode
  2022-09-06 17:39     ` Umesh Nerlige Ramappa
@ 2022-09-06 18:39       ` Lionel Landwerlin
  2022-09-14 22:26         ` Umesh Nerlige Ramappa
  0 siblings, 1 reply; 84+ messages in thread
From: Lionel Landwerlin @ 2022-09-06 18:39 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On 06/09/2022 20:39, Umesh Nerlige Ramappa wrote:
> On Tue, Sep 06, 2022 at 05:33:00PM +0300, Lionel Landwerlin wrote:
>> On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote:
>>> With GuC mode of submission, GuC is in control of defining the 
>>> context id field
>>> that is part of the OA reports. To filter reports, UMD and KMD must 
>>> know what sw
>>> context id was chosen by GuC. There is no interface between KMD and 
>>> GuC to
>>> determine this, so read the upper-dword of EXECLIST_STATUS to 
>>> filter/squash OA
>>> reports for the specific context.
>>>
>>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>>
>>
>> I assume you checked with GuC that this doesn't change as the context 
>> is running?
>
> Correct.
>
>>
>> With i915/execlist submission mode, we had to ask i915 to pin the 
>> sw_id/ctx_id.
>>
>
> From the GuC perspective, the context id can change once KMD de-registers
> the context, and that will not happen while the context is in use.
>
> Thanks,
> Umesh


Thanks Umesh,


Maybe I should have been more precise in my question :


Can the ID change while the i915-perf stream is opened?

Because the ID not changing while the context is running makes sense.

But since the number of available IDs is limited to 2k or something on 
Gfx12, it's possible the GuC has to reuse IDs if too many apps want to 
run during the period of time while i915-perf is active and filtering.


-Lionel


>
>>
>> If that's not the case then filtering is broken.
>>
>>
>> -Lionel
>>
>>
>>> ---
>>>  drivers/gpu/drm/i915/gt/intel_lrc.h |   2 +
>>>  drivers/gpu/drm/i915/i915_perf.c    | 141 ++++++++++++++++++++++++----
>>>  2 files changed, 124 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.h 
>>> b/drivers/gpu/drm/i915/gt/intel_lrc.h
>>> index a390f0813c8b..7111bae759f3 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.h
>>> @@ -110,6 +110,8 @@ enum {
>>>  #define XEHP_SW_CTX_ID_WIDTH            16
>>>  #define XEHP_SW_COUNTER_SHIFT            58
>>>  #define XEHP_SW_COUNTER_WIDTH            6
>>> +#define GEN12_GUC_SW_CTX_ID_SHIFT        39
>>> +#define GEN12_GUC_SW_CTX_ID_WIDTH        16
>>>  static inline void lrc_runtime_start(struct intel_context *ce)
>>>  {
>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
>>> b/drivers/gpu/drm/i915/i915_perf.c
>>> index f3c23fe9ad9c..735244a3aedd 100644
>>> --- a/drivers/gpu/drm/i915/i915_perf.c
>>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>>> @@ -1233,6 +1233,125 @@ static struct intel_context 
>>> *oa_pin_context(struct i915_perf_stream *stream)
>>>      return stream->pinned_ctx;
>>>  }
>>> +static int
>>> +__store_reg_to_mem(struct i915_request *rq, i915_reg_t reg, u32 
>>> ggtt_offset)
>>> +{
>>> +    u32 *cs, cmd;
>>> +
>>> +    cmd = MI_STORE_REGISTER_MEM | MI_SRM_LRM_GLOBAL_GTT;
>>> +    if (GRAPHICS_VER(rq->engine->i915) >= 8)
>>> +        cmd++;
>>> +
>>> +    cs = intel_ring_begin(rq, 4);
>>> +    if (IS_ERR(cs))
>>> +        return PTR_ERR(cs);
>>> +
>>> +    *cs++ = cmd;
>>> +    *cs++ = i915_mmio_reg_offset(reg);
>>> +    *cs++ = ggtt_offset;
>>> +    *cs++ = 0;
>>> +
>>> +    intel_ring_advance(rq, cs);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int
>>> +__read_reg(struct intel_context *ce, i915_reg_t reg, u32 ggtt_offset)
>>> +{
>>> +    struct i915_request *rq;
>>> +    int err;
>>> +
>>> +    rq = i915_request_create(ce);
>>> +    if (IS_ERR(rq))
>>> +        return PTR_ERR(rq);
>>> +
>>> +    i915_request_get(rq);
>>> +
>>> +    err = __store_reg_to_mem(rq, reg, ggtt_offset);
>>> +
>>> +    i915_request_add(rq);
>>> +    if (!err && i915_request_wait(rq, 0, HZ / 2) < 0)
>>> +        err = -ETIME;
>>> +
>>> +    i915_request_put(rq);
>>> +
>>> +    return err;
>>> +}
>>> +
>>> +static int
>>> +gen12_guc_sw_ctx_id(struct intel_context *ce, u32 *ctx_id)
>>> +{
>>> +    struct i915_vma *scratch;
>>> +    u32 *val;
>>> +    int err;
>>> +
>>> +    scratch = 
>>> __vm_create_scratch_for_read_pinned(&ce->engine->gt->ggtt->vm, 4);
>>> +    if (IS_ERR(scratch))
>>> +        return PTR_ERR(scratch);
>>> +
>>> +    err = i915_vma_sync(scratch);
>>> +    if (err)
>>> +        goto err_scratch;
>>> +
>>> +    err = __read_reg(ce, 
>>> RING_EXECLIST_STATUS_HI(ce->engine->mmio_base),
>>> +             i915_ggtt_offset(scratch));
>>> +    if (err)
>>> +        goto err_scratch;
>>> +
>>> +    val = i915_gem_object_pin_map_unlocked(scratch->obj, I915_MAP_WB);
>>> +    if (IS_ERR(val)) {
>>> +        err = PTR_ERR(val);
>>> +        goto err_scratch;
>>> +    }
>>> +
>>> +    *ctx_id = *val;
>>> +    i915_gem_object_unpin_map(scratch->obj);
>>> +
>>> +err_scratch:
>>> +    i915_vma_unpin_and_release(&scratch, 0);
>>> +    return err;
>>> +}
>>> +
>>> +/*
>>> + * For execlist mode of submission, pick an unused context id
>>> + * 0 - (NUM_CONTEXT_TAG -1) are used by other contexts
>>> + * XXX_MAX_CONTEXT_HW_ID is used by idle context
>>> + *
>>> + * For GuC mode of submission read context id from the upper dword 
>>> of the
>>> + * EXECLIST_STATUS register.
>>> + */
>>> +static int gen12_get_render_context_id(struct i915_perf_stream 
>>> *stream)
>>> +{
>>> +    u32 ctx_id, mask;
>>> +    int ret;
>>> +
>>> +    if (intel_engine_uses_guc(stream->engine)) {
>>> +        ret = gen12_guc_sw_ctx_id(stream->pinned_ctx, &ctx_id);
>>> +        if (ret)
>>> +            return ret;
>>> +
>>> +        mask = ((1U << GEN12_GUC_SW_CTX_ID_WIDTH) - 1) <<
>>> +            (GEN12_GUC_SW_CTX_ID_SHIFT - 32);
>>> +    } else if (GRAPHICS_VER_FULL(stream->engine->i915) >= 
>>> IP_VER(12, 50)) {
>>> +        ctx_id = (XEHP_MAX_CONTEXT_HW_ID - 1) <<
>>> +            (XEHP_SW_CTX_ID_SHIFT - 32);
>>> +
>>> +        mask = ((1U << XEHP_SW_CTX_ID_WIDTH) - 1) <<
>>> +            (XEHP_SW_CTX_ID_SHIFT - 32);
>>> +    } else {
>>> +        ctx_id = (GEN12_MAX_CONTEXT_HW_ID - 1) <<
>>> +             (GEN11_SW_CTX_ID_SHIFT - 32);
>>> +
>>> +        mask = ((1U << GEN11_SW_CTX_ID_WIDTH) - 1) <<
>>> +            (GEN11_SW_CTX_ID_SHIFT - 32);
>>> +    }
>>> +    stream->specific_ctx_id = ctx_id & mask;
>>> +    stream->specific_ctx_id_mask = mask;
>>> +
>>> +    return 0;
>>> +}
>>> +
>>>  /**
>>>   * oa_get_render_ctx_id - determine and hold ctx hw id
>>>   * @stream: An i915-perf stream opened for OA metrics
>>> @@ -1246,6 +1365,7 @@ static struct intel_context 
>>> *oa_pin_context(struct i915_perf_stream *stream)
>>>  static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
>>>  {
>>>      struct intel_context *ce;
>>> +    int ret = 0;
>>>      ce = oa_pin_context(stream);
>>>      if (IS_ERR(ce))
>>> @@ -1292,24 +1412,7 @@ static int oa_get_render_ctx_id(struct 
>>> i915_perf_stream *stream)
>>>      case 11:
>>>      case 12:
>>> -        if (GRAPHICS_VER_FULL(ce->engine->i915) >= IP_VER(12, 50)) {
>>> -            stream->specific_ctx_id_mask =
>>> -                ((1U << XEHP_SW_CTX_ID_WIDTH) - 1) <<
>>> -                (XEHP_SW_CTX_ID_SHIFT - 32);
>>> -            stream->specific_ctx_id =
>>> -                (XEHP_MAX_CONTEXT_HW_ID - 1) <<
>>> -                (XEHP_SW_CTX_ID_SHIFT - 32);
>>> -        } else {
>>> -            stream->specific_ctx_id_mask =
>>> -                ((1U << GEN11_SW_CTX_ID_WIDTH) - 1) << 
>>> (GEN11_SW_CTX_ID_SHIFT - 32);
>>> -            /*
>>> -             * Pick an unused context id
>>> -             * 0 - BITS_PER_LONG are used by other contexts
>>> -             * GEN12_MAX_CONTEXT_HW_ID (0x7ff) is used by idle context
>>> -             */
>>> -            stream->specific_ctx_id =
>>> -                (GEN12_MAX_CONTEXT_HW_ID - 1) << 
>>> (GEN11_SW_CTX_ID_SHIFT - 32);
>>> -        }
>>> +        ret = gen12_get_render_context_id(stream);
>>>          break;
>>>      default:
>>> @@ -1323,7 +1426,7 @@ static int oa_get_render_ctx_id(struct 
>>> i915_perf_stream *stream)
>>>          stream->specific_ctx_id,
>>>          stream->specific_ctx_id_mask);
>>> -    return 0;
>>> +    return ret;
>>>  }
>>>  /**
>>
>>



* Re: [Intel-gfx] [PATCH 02/19] drm/i915/perf: Add OA formats for DG2
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 02/19] drm/i915/perf: Add OA formats for DG2 Umesh Nerlige Ramappa
@ 2022-09-06 19:35   ` Lionel Landwerlin
  2022-09-06 19:46     ` Umesh Nerlige Ramappa
  2022-09-13 15:40   ` Dixit, Ashutosh
  1 sibling, 1 reply; 84+ messages in thread
From: Lionel Landwerlin @ 2022-09-06 19:35 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa, intel-gfx

On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote:
> Add new OA formats for DG2. Some of the newer OA formats are not
> multiples of 64 bytes and are not powers of 2. For those formats, adjust
> hw_tail accordingly when checking for new reports.
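
For illustration only (not part of the patch): a standalone sketch of the tail adjustment described above. It assumes a power-of-two buffer size and offsets relative to the start of the buffer (the patch itself works with GGTT addresses); oa_taken() mirrors the driver's OA_TAKEN() macro, and align_tail() is a hypothetical name.

```c
#include <stdint.h>

/* Bytes between head and tail in a power-of-two sized circular buffer. */
static uint32_t oa_taken(uint32_t tail, uint32_t head, uint32_t buf_size)
{
	return (tail - head) & (buf_size - 1);
}

/*
 * Pull hw_tail back to the last whole-report boundary, measured from
 * the previously aligned tail. The hardware tail moves in 64 byte
 * steps, so for a 384 or 448 byte report it may point into the middle
 * of a report that has only partially landed.
 */
static uint32_t align_tail(uint32_t hw_tail, uint32_t aligned_tail,
			   uint32_t report_size, uint32_t buf_size)
{
	uint32_t partial = oa_taken(hw_tail, aligned_tail, buf_size) %
			   report_size;

	/* Subtract the partial amount and wrap within the buffer. */
	return (hw_tail - partial) & (buf_size - 1);
}
```

For example, with 384 byte reports, an aligned tail of 0 and a hardware tail of 448, the 64 bytes of the second report are dropped and the tail used for reading becomes 384. The old mask-based alignment, hw_tail &= ~(report_size - 1), only gives this result when report_size is a power of 2.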
>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

Apart from the coding style issue :


Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>


> ---
>   drivers/gpu/drm/i915/i915_perf.c | 63 ++++++++++++++++++++------------
>   include/uapi/drm/i915_drm.h      |  6 +++
>   2 files changed, 46 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 735244a3aedd..c8331b549d31 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -306,7 +306,8 @@ static u32 i915_oa_max_sample_rate = 100000;
>   
>   /* XXX: beware if future OA HW adds new report formats that the current
>    * code assumes all reports have a power-of-two size and ~(size - 1) can
> - * be used as a mask to align the OA tail pointer.
> + * be used as a mask to align the OA tail pointer. In some of the
> + * formats, R is used to denote reserved field.
>    */
>   static const struct i915_oa_format oa_formats[I915_OA_FORMAT_MAX] = {
>   	[I915_OA_FORMAT_A13]	    = { 0, 64 },
> @@ -320,6 +321,10 @@ static const struct i915_oa_format oa_formats[I915_OA_FORMAT_MAX] = {
>   	[I915_OA_FORMAT_A12]		    = { 0, 64 },
>   	[I915_OA_FORMAT_A12_B8_C8]	    = { 2, 128 },
>   	[I915_OA_FORMAT_A32u40_A4u32_B8_C8] = { 5, 256 },
> +	[I915_OAR_FORMAT_A32u40_A4u32_B8_C8]    = { 5, 256 },
> +	[I915_OA_FORMAT_A24u40_A14u32_B8_C8]    = { 5, 256 },
> +	[I915_OAR_FORMAT_A36u64_B8_C8]		= { 1, 384 },
> +	[I915_OA_FORMAT_A38u64_R2u64_B8_C8]	= { 1, 448 },
>   };
>   
>   #define SAMPLE_OA_REPORT      (1<<0)
> @@ -467,6 +472,7 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>   	bool pollin;
>   	u32 hw_tail;
>   	u64 now;
> +	u32 partial_report_size;
>   
>   	/* We have to consider the (unlikely) possibility that read() errors
>   	 * could result in an OA buffer reset which might reset the head and
> @@ -476,10 +482,16 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>   
>   	hw_tail = stream->perf->ops.oa_hw_tail_read(stream);
>   
> -	/* The tail pointer increases in 64 byte increments,
> -	 * not in report_size steps...
> +	/* The tail pointer increases in 64 byte increments, whereas report
> +	 * sizes need not be integral multiples of 64 or powers of 2.
> +	 * Compute potentially partially landed report in the OA buffer
>   	 */
> -	hw_tail &= ~(report_size - 1);
> +	partial_report_size = OA_TAKEN(hw_tail, stream->oa_buffer.tail);
> +	partial_report_size %= report_size;
> +
> +	/* Subtract partial amount off the tail */
> +	hw_tail = gtt_offset + ((hw_tail - partial_report_size) &
> +				(stream->oa_buffer.vma->size - 1));
>   
>   	now = ktime_get_mono_fast_ns();
>   
> @@ -601,6 +613,8 @@ static int append_oa_sample(struct i915_perf_stream *stream,
>   {
>   	int report_size = stream->oa_buffer.format_size;
>   	struct drm_i915_perf_record_header header;
> +	int report_size_partial;
> +	u8 *oa_buf_end;
>   
>   	header.type = DRM_I915_PERF_RECORD_SAMPLE;
>   	header.pad = 0;
> @@ -614,7 +628,19 @@ static int append_oa_sample(struct i915_perf_stream *stream,
>   		return -EFAULT;
>   	buf += sizeof(header);
>   
> -	if (copy_to_user(buf, report, report_size))
> +	oa_buf_end = stream->oa_buffer.vaddr +
> +		     stream->oa_buffer.vma->size;
> +	report_size_partial = oa_buf_end - report;
> +
> +	if (report_size_partial < report_size) {
> +		if(copy_to_user(buf, report, report_size_partial))
> +			return -EFAULT;
> +		buf += report_size_partial;
> +
> +		if(copy_to_user(buf, stream->oa_buffer.vaddr,
> +				report_size - report_size_partial))
> +			return -EFAULT;

I think the coding style requires you to use if () not if()


Just a suggestion : you could make this code deal with the partial bit 
as the main bit of the function :


oa_buf_end = stream->oa_buffer.vaddr +
	     stream->oa_buffer.vma->size;

report_size_partial = oa_buf_end - report;

if (copy_to_user(buf, report, report_size_partial))
	return -EFAULT;
buf += report_size_partial;

if (report_size_partial < report_size &&
     copy_to_user(buf, stream->oa_buffer.vaddr,
		report_size - report_size_partial))
	return -EFAULT;
buf += report_size - report_size_partial;


> +	} else if (copy_to_user(buf, report, report_size))
>   		return -EFAULT;
>   
>   	(*offset) += header.size;
> @@ -684,8 +710,8 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>   	 * all a power of two).
>   	 */
>   	if (drm_WARN_ONCE(&uncore->i915->drm,
> -			  head > OA_BUFFER_SIZE || head % report_size ||
> -			  tail > OA_BUFFER_SIZE || tail % report_size,
> +			  head > stream->oa_buffer.vma->size ||
> +			  tail > stream->oa_buffer.vma->size,
>   			  "Inconsistent OA buffer pointers: head = %u, tail = %u\n",
>   			  head, tail))
>   		return -EIO;
> @@ -699,22 +725,6 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>   		u32 ctx_id;
>   		u32 reason;
>   
> -		/*
> -		 * All the report sizes factor neatly into the buffer
> -		 * size so we never expect to see a report split
> -		 * between the beginning and end of the buffer.
> -		 *
> -		 * Given the initial alignment check a misalignment
> -		 * here would imply a driver bug that would result
> -		 * in an overrun.
> -		 */
> -		if (drm_WARN_ON(&uncore->i915->drm,
> -				(OA_BUFFER_SIZE - head) < report_size)) {
> -			drm_err(&uncore->i915->drm,
> -				"Spurious OA head ptr: non-integral report offset\n");
> -			break;
> -		}
> -
>   		/*
>   		 * The reason field includes flags identifying what
>   		 * triggered this specific report (mostly timer
> @@ -4513,6 +4523,13 @@ static void oa_init_supported_formats(struct i915_perf *perf)
>   		oa_format_add(perf, I915_OA_FORMAT_C4_B8);
>   		break;
>   
> +	case INTEL_DG2:
> +		oa_format_add(perf, I915_OAR_FORMAT_A32u40_A4u32_B8_C8);
> +		oa_format_add(perf, I915_OA_FORMAT_A24u40_A14u32_B8_C8);
> +		oa_format_add(perf, I915_OAR_FORMAT_A36u64_B8_C8);
> +		oa_format_add(perf, I915_OA_FORMAT_A38u64_R2u64_B8_C8);
> +		break;
> +
>   	default:
>   		MISSING_CASE(platform);
>   	}
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 520ad2691a99..d20d723925b5 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -2650,6 +2650,12 @@ enum drm_i915_oa_format {
>   	I915_OA_FORMAT_A12_B8_C8,
>   	I915_OA_FORMAT_A32u40_A4u32_B8_C8,
>   
> +	/* DG2 */
> +	I915_OAR_FORMAT_A32u40_A4u32_B8_C8,
> +	I915_OA_FORMAT_A24u40_A14u32_B8_C8,
> +	I915_OAR_FORMAT_A36u64_B8_C8,
> +	I915_OA_FORMAT_A38u64_R2u64_B8_C8,
> +
>   	I915_OA_FORMAT_MAX	    /* non-ABI */
>   };
>   




* Re: [Intel-gfx] [PATCH 02/19] drm/i915/perf: Add OA formats for DG2
  2022-09-06 19:35   ` Lionel Landwerlin
@ 2022-09-06 19:46     ` Umesh Nerlige Ramappa
  2022-09-06 19:59       ` Lionel Landwerlin
  0 siblings, 1 reply; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-06 19:46 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

On Tue, Sep 06, 2022 at 10:35:16PM +0300, Lionel Landwerlin wrote:
>On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote:
>>Add new OA formats for DG2. Some of the newer OA formats are not
>>multiples of 64 bytes and are not powers of 2. For those formats, adjust
>>hw_tail accordingly when checking for new reports.
>>
>>Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>
>Apart from the coding style issue :
>
>
>Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>
>
>>---
>>  drivers/gpu/drm/i915/i915_perf.c | 63 ++++++++++++++++++++------------
>>  include/uapi/drm/i915_drm.h      |  6 +++
>>  2 files changed, 46 insertions(+), 23 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>>index 735244a3aedd..c8331b549d31 100644
>>--- a/drivers/gpu/drm/i915/i915_perf.c
>>+++ b/drivers/gpu/drm/i915/i915_perf.c
>>@@ -306,7 +306,8 @@ static u32 i915_oa_max_sample_rate = 100000;
>>  /* XXX: beware if future OA HW adds new report formats that the current
>>   * code assumes all reports have a power-of-two size and ~(size - 1) can
>>- * be used as a mask to align the OA tail pointer.
>>+ * be used as a mask to align the OA tail pointer. In some of the
>>+ * formats, R is used to denote reserved field.
>>   */
>>  static const struct i915_oa_format oa_formats[I915_OA_FORMAT_MAX] = {
>>  	[I915_OA_FORMAT_A13]	    = { 0, 64 },
>>@@ -320,6 +321,10 @@ static const struct i915_oa_format oa_formats[I915_OA_FORMAT_MAX] = {
>>  	[I915_OA_FORMAT_A12]		    = { 0, 64 },
>>  	[I915_OA_FORMAT_A12_B8_C8]	    = { 2, 128 },
>>  	[I915_OA_FORMAT_A32u40_A4u32_B8_C8] = { 5, 256 },
>>+	[I915_OAR_FORMAT_A32u40_A4u32_B8_C8]    = { 5, 256 },
>>+	[I915_OA_FORMAT_A24u40_A14u32_B8_C8]    = { 5, 256 },
>>+	[I915_OAR_FORMAT_A36u64_B8_C8]		= { 1, 384 },
>>+	[I915_OA_FORMAT_A38u64_R2u64_B8_C8]	= { 1, 448 },
>>  };
>>  #define SAMPLE_OA_REPORT      (1<<0)
>>@@ -467,6 +472,7 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>>  	bool pollin;
>>  	u32 hw_tail;
>>  	u64 now;
>>+	u32 partial_report_size;
>>  	/* We have to consider the (unlikely) possibility that read() errors
>>  	 * could result in an OA buffer reset which might reset the head and
>>@@ -476,10 +482,16 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>>  	hw_tail = stream->perf->ops.oa_hw_tail_read(stream);
>>-	/* The tail pointer increases in 64 byte increments,
>>-	 * not in report_size steps...
>>+	/* The tail pointer increases in 64 byte increments, whereas report
>>+	 * sizes need not be integral multiples of 64 or powers of 2.
>>+	 * Compute potentially partially landed report in the OA buffer
>>  	 */
>>-	hw_tail &= ~(report_size - 1);
>>+	partial_report_size = OA_TAKEN(hw_tail, stream->oa_buffer.tail);
>>+	partial_report_size %= report_size;
>>+
>>+	/* Subtract partial amount off the tail */
>>+	hw_tail = gtt_offset + ((hw_tail - partial_report_size) &
>>+				(stream->oa_buffer.vma->size - 1));
>>  	now = ktime_get_mono_fast_ns();
>>@@ -601,6 +613,8 @@ static int append_oa_sample(struct i915_perf_stream *stream,
>>  {
>>  	int report_size = stream->oa_buffer.format_size;
>>  	struct drm_i915_perf_record_header header;
>>+	int report_size_partial;
>>+	u8 *oa_buf_end;
>>  	header.type = DRM_I915_PERF_RECORD_SAMPLE;
>>  	header.pad = 0;
>>@@ -614,7 +628,19 @@ static int append_oa_sample(struct i915_perf_stream *stream,
>>  		return -EFAULT;
>>  	buf += sizeof(header);
>>-	if (copy_to_user(buf, report, report_size))
>>+	oa_buf_end = stream->oa_buffer.vaddr +
>>+		     stream->oa_buffer.vma->size;
>>+	report_size_partial = oa_buf_end - report;
>>+
>>+	if (report_size_partial < report_size) {
>>+		if(copy_to_user(buf, report, report_size_partial))
>>+			return -EFAULT;
>>+		buf += report_size_partial;
>>+
>>+		if(copy_to_user(buf, stream->oa_buffer.vaddr,
>>+				report_size - report_size_partial))
>>+			return -EFAULT;
>
>I think the coding style requires you to use if () not if()
>

Will fix.

>
>Just a suggestion : you could make this code deal with the partial bit 
>as the main bit of the function :
>
>
>oa_buf_end = stream->oa_buffer.vaddr +
>	     stream->oa_buffer.vma->size;
>
>report_size_partial = oa_buf_end - report;
>
>if (copy_to_user(buf, report, report_size_partial))
>	return -EFAULT;
>buf += report_size_partial;

This ^ may not work because append_oa_sample is appending exactly one 
report to the user buffer, whereas the above may append more than one.
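
For illustration only (a userspace sketch, not the driver code): copying exactly one report out of a circular buffer, split at the wrap point when needed. memcpy() stands in for copy_to_user(), and copy_one_report() and its parameters are hypothetical names. The first chunk is clamped to report_size, which is the step an unconditional copy of everything up to the end of the buffer would be missing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy exactly one report of report_size bytes starting at report_off
 * in a circular buffer of buf_size bytes. If the report straddles the
 * end of the buffer, the remainder is taken from the start.
 */
static void copy_one_report(uint8_t *dst, const uint8_t *buf,
			    size_t buf_size, size_t report_off,
			    size_t report_size)
{
	size_t to_end = buf_size - report_off;
	/* Never copy more than one report in the first chunk. */
	size_t first = to_end < report_size ? to_end : report_size;

	memcpy(dst, buf + report_off, first);
	/* Length is zero when the report did not wrap. */
	memcpy(dst + first, buf, report_size - first);
}
```

When the report does not wrap, first equals report_size and the second copy has zero length, so a single code path covers both cases.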

Thanks,
Umesh

>
>if (report_size_partial < report_size &&
>    copy_to_user(buf, stream->oa_buffer.vaddr,
>		report_size - report_size_partial))
>	return -EFAULT;
>buf += report_size - report_size_partial;
>
>
>>+	} else if (copy_to_user(buf, report, report_size))
>>  		return -EFAULT;
>>  	(*offset) += header.size;
>>@@ -684,8 +710,8 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>>  	 * all a power of two).
>>  	 */
>>  	if (drm_WARN_ONCE(&uncore->i915->drm,
>>-			  head > OA_BUFFER_SIZE || head % report_size ||
>>-			  tail > OA_BUFFER_SIZE || tail % report_size,
>>+			  head > stream->oa_buffer.vma->size ||
>>+			  tail > stream->oa_buffer.vma->size,
>>  			  "Inconsistent OA buffer pointers: head = %u, tail = %u\n",
>>  			  head, tail))
>>  		return -EIO;
>>@@ -699,22 +725,6 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>>  		u32 ctx_id;
>>  		u32 reason;
>>-		/*
>>-		 * All the report sizes factor neatly into the buffer
>>-		 * size so we never expect to see a report split
>>-		 * between the beginning and end of the buffer.
>>-		 *
>>-		 * Given the initial alignment check a misalignment
>>-		 * here would imply a driver bug that would result
>>-		 * in an overrun.
>>-		 */
>>-		if (drm_WARN_ON(&uncore->i915->drm,
>>-				(OA_BUFFER_SIZE - head) < report_size)) {
>>-			drm_err(&uncore->i915->drm,
>>-				"Spurious OA head ptr: non-integral report offset\n");
>>-			break;
>>-		}
>>-
>>  		/*
>>  		 * The reason field includes flags identifying what
>>  		 * triggered this specific report (mostly timer
>>@@ -4513,6 +4523,13 @@ static void oa_init_supported_formats(struct i915_perf *perf)
>>  		oa_format_add(perf, I915_OA_FORMAT_C4_B8);
>>  		break;
>>+	case INTEL_DG2:
>>+		oa_format_add(perf, I915_OAR_FORMAT_A32u40_A4u32_B8_C8);
>>+		oa_format_add(perf, I915_OA_FORMAT_A24u40_A14u32_B8_C8);
>>+		oa_format_add(perf, I915_OAR_FORMAT_A36u64_B8_C8);
>>+		oa_format_add(perf, I915_OA_FORMAT_A38u64_R2u64_B8_C8);
>>+		break;
>>+
>>  	default:
>>  		MISSING_CASE(platform);
>>  	}
>>diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>>index 520ad2691a99..d20d723925b5 100644
>>--- a/include/uapi/drm/i915_drm.h
>>+++ b/include/uapi/drm/i915_drm.h
>>@@ -2650,6 +2650,12 @@ enum drm_i915_oa_format {
>>  	I915_OA_FORMAT_A12_B8_C8,
>>  	I915_OA_FORMAT_A32u40_A4u32_B8_C8,
>>+	/* DG2 */
>>+	I915_OAR_FORMAT_A32u40_A4u32_B8_C8,
>>+	I915_OA_FORMAT_A24u40_A14u32_B8_C8,
>>+	I915_OAR_FORMAT_A36u64_B8_C8,
>>+	I915_OA_FORMAT_A38u64_R2u64_B8_C8,
>>+
>>  	I915_OA_FORMAT_MAX	    /* non-ABI */
>>  };
>
>


* Re: [Intel-gfx] [PATCH 04/19] drm/i915/perf: Determine gen12 oa ctx offset at runtime
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 04/19] drm/i915/perf: Determine gen12 oa ctx offset at runtime Umesh Nerlige Ramappa
@ 2022-09-06 19:48   ` Lionel Landwerlin
  2022-09-06 20:35     ` Umesh Nerlige Ramappa
  0 siblings, 1 reply; 84+ messages in thread
From: Lionel Landwerlin @ 2022-09-06 19:48 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa, intel-gfx

On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote:
> Some SKUs of the same gen12 platform may have different oactxctrl
> offsets. For gen12, determine oactxctrl offsets at runtime.
>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_perf.c         | 149 ++++++++++++++++++-----
>   drivers/gpu/drm/i915/i915_perf_oa_regs.h |   2 +-
>   2 files changed, 120 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 3526693d64fa..efa7eda83edd 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -1363,6 +1363,67 @@ static int gen12_get_render_context_id(struct i915_perf_stream *stream)
>   	return 0;
>   }
>   
> +#define MI_OPCODE(x) (((x) >> 23) & 0x3f)
> +#define IS_MI_LRI_CMD(x) (MI_OPCODE(x) == MI_OPCODE(MI_INSTR(0x22, 0)))
> +#define MI_LRI_LEN(x) (((x) & 0xff) + 1)


Maybe you want to put these macros in intel_gpu_commands.h.


> +#define __valid_oactxctrl_offset(x) ((x) && (x) != U32_MAX)
> +static bool __find_reg_in_lri(u32 *state, u32 reg, u32 *offset)
> +{
> +	u32 idx = *offset;
> +	u32 len = MI_LRI_LEN(state[idx]) + idx;
> +
> +	idx++;
> +	for (; idx < len; idx += 2)
> +		if (state[idx] == reg)
> +			break;
> +
> +	*offset = idx;
> +	return state[idx] == reg;
> +}
> +
> +static u32 __context_image_offset(struct intel_context *ce, u32 reg)
> +{
> +	u32 offset, len = (ce->engine->context_size - PAGE_SIZE) / 4;
> +	u32 *state = ce->lrc_reg_state;
> +
> +	for (offset = 0; offset < len; ) {
> +		if (IS_MI_LRI_CMD(state[offset])) {

I'm a bit concerned you might find other matches with this.

Because let's say you run into a 3DSTATE_SUBSLICE_HASH_TABLE 
instruction, you'll iterate the instruction dword by dword because you 
don't know how to read its length and skip to the next one.

Now some of the fields can be programmed from userspace to look like an 
MI_LRI header, so you start to read data in the wrong way.


Unfortunately I don't have a better solution. My only ask is that you 
make __find_reg_in_lri() take the context image size as a parameter so it 
NEVER reads past the end of the context image.


To limit the risk you should run this function only once at driver 
initialization and store the found offset.
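
For illustration only, here is a minimal userspace sketch of what a bounded
scan could look like. It is not the driver code: the "limit" parameter, the
MI_LRI_HDR() helper and the sample image are hypothetical, and only the
header decoding macros mirror the ones in the patch. The point is that the
end of each LRI packet is clamped to the image size, so no dword at or
beyond "limit" is ever read, even when a header claims a longer packet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Header decoding, mirroring the macros in the patch. */
#define MI_OPCODE(x)     (((x) >> 23) & 0x3f)
#define MI_LRI_OPCODE    UINT32_C(0x22)
#define IS_MI_LRI_CMD(x) (MI_OPCODE(x) == MI_LRI_OPCODE)
#define MI_LRI_LEN(x)    (((x) & 0xff) + 1)

/* Hypothetical helper: build an LRI header with the given length field. */
#define MI_LRI_HDR(len)  ((MI_LRI_OPCODE << 23) | (len))

/*
 * Scan one LRI packet whose header sits at *offset. "limit" is the image
 * size in dwords (an assumed parameter, not in the original patch).
 * On a miss, *offset is advanced to the next command or to limit.
 */
static bool find_reg_in_lri(const uint32_t *state, uint32_t limit,
			    uint32_t reg, uint32_t *offset)
{
	uint32_t idx = *offset;
	uint32_t end;

	assert(idx < limit);

	/* One past the last payload dword, clamped to the image size. */
	end = idx + MI_LRI_LEN(state[idx]) + 1;
	if (end > limit)
		end = limit;

	/* Payload is (register, value) pairs; only look at the registers. */
	for (idx++; idx < end; idx += 2) {
		if (state[idx] == reg) {
			*offset = idx;
			return true;
		}
	}

	*offset = end;
	return false;
}

/* Returns the dword offset of "reg" in the image, or UINT32_MAX. */
static uint32_t context_image_offset(const uint32_t *state, uint32_t limit,
				     uint32_t reg)
{
	uint32_t offset = 0;

	while (offset < limit) {
		if (!IS_MI_LRI_CMD(state[offset])) {
			offset++;
			continue;
		}
		if (find_reg_in_lri(state, limit, reg, &offset))
			return offset;
	}

	return UINT32_MAX;
}

/* Made-up image: NOOP, a complete LRI, then an LRI cut off by the end. */
static const uint32_t sample_image[] = {
	0x00000000,
	MI_LRI_HDR(3),		/* two pairs: length field is 2 * 2 - 1 */
	0x2000, 0x1,
	0x2360, 0x2,
	MI_LRI_HDR(7),		/* claims four pairs, image ends early */
	0x2400,
};
#define SAMPLE_IMAGE_DWORDS (sizeof(sample_image) / sizeof(sample_image[0]))
```

Note this only addresses the out-of-bounds read. It does not help with the
false-match concern above, which is why running the scan once and caching
the result still seems worthwhile.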


Thanks,


-Lionel


> +			if (__find_reg_in_lri(state, reg, &offset))
> +				break;
> +		} else {
> +			offset++;
> +		}
> +	}
> +
> +	return offset < len ? offset : U32_MAX;
> +}
> +
> +static int __set_oa_ctx_ctrl_offset(struct intel_context *ce)
> +{
> +	i915_reg_t reg = GEN12_OACTXCONTROL(ce->engine->mmio_base);
> +	struct i915_perf *perf = &ce->engine->i915->perf;
> +	u32 saved_offset = perf->ctx_oactxctrl_offset;
> +	u32 offset;
> +
> +	/* Do this only once. Failure is stored as offset of U32_MAX */
> +	if (saved_offset)
> +		return 0;
> +
> +	offset = __context_image_offset(ce, i915_mmio_reg_offset(reg));
> +	perf->ctx_oactxctrl_offset = offset;
> +
> +	drm_dbg(&ce->engine->i915->drm,
> +		"%s oa ctx control at 0x%08x dword offset\n",
> +		ce->engine->name, offset);
> +
> +	return __valid_oactxctrl_offset(offset) ? 0 : -ENODEV;
> +}
> +
> +static bool engine_supports_mi_query(struct intel_engine_cs *engine)
> +{
> +	return engine->class == RENDER_CLASS;
> +}
> +
>   /**
>    * oa_get_render_ctx_id - determine and hold ctx hw id
>    * @stream: An i915-perf stream opened for OA metrics
> @@ -1382,6 +1443,17 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
>   	if (IS_ERR(ce))
>   		return PTR_ERR(ce);
>   
> +	if (engine_supports_mi_query(stream->engine)) {
> +		ret = __set_oa_ctx_ctrl_offset(ce);
> +		if (ret && !(stream->sample_flags & SAMPLE_OA_REPORT)) {
> +			intel_context_unpin(ce);
> +			drm_err(&stream->perf->i915->drm,
> +				"Enabling perf query failed for %s\n",
> +				stream->engine->name);
> +			return ret;
> +		}
> +	}
> +
>   	switch (GRAPHICS_VER(ce->engine->i915)) {
>   	case 7: {
>   		/*
> @@ -2412,10 +2484,11 @@ static int gen12_configure_oar_context(struct i915_perf_stream *stream,
>   	int err;
>   	struct intel_context *ce = stream->pinned_ctx;
>   	u32 format = stream->oa_buffer.format;
> +	u32 offset = stream->perf->ctx_oactxctrl_offset;
>   	struct flex regs_context[] = {
>   		{
>   			GEN8_OACTXCONTROL,
> -			stream->perf->ctx_oactxctrl_offset + 1,
> +			offset + 1,
>   			active ? GEN8_OA_COUNTER_RESUME : 0,
>   		},
>   	};
> @@ -2440,15 +2513,18 @@ static int gen12_configure_oar_context(struct i915_perf_stream *stream,
>   		},
>   	};
>   
> -	/* Modify the context image of pinned context with regs_context*/
> -	err = intel_context_lock_pinned(ce);
> -	if (err)
> -		return err;
> +	/* Modify the context image of pinned context with regs_context */
> +	if (__valid_oactxctrl_offset(offset)) {
> +		err = intel_context_lock_pinned(ce);
> +		if (err)
> +			return err;
>   
> -	err = gen8_modify_context(ce, regs_context, ARRAY_SIZE(regs_context));
> -	intel_context_unlock_pinned(ce);
> -	if (err)
> -		return err;
> +		err = gen8_modify_context(ce, regs_context,
> +					  ARRAY_SIZE(regs_context));
> +		intel_context_unlock_pinned(ce);
> +		if (err)
> +			return err;
> +	}
>   
>   	/* Apply regs_lri using LRI with pinned context */
>   	return gen8_modify_self(ce, regs_lri, ARRAY_SIZE(regs_lri), active);
> @@ -2570,6 +2646,7 @@ lrc_configure_all_contexts(struct i915_perf_stream *stream,
>   			   const struct i915_oa_config *oa_config,
>   			   struct i915_active *active)
>   {
> +	u32 ctx_oactxctrl = stream->perf->ctx_oactxctrl_offset;
>   	/* The MMIO offsets for Flex EU registers aren't contiguous */
>   	const u32 ctx_flexeu0 = stream->perf->ctx_flexeu0_offset;
>   #define ctx_flexeuN(N) (ctx_flexeu0 + 2 * (N) + 1)
> @@ -2580,7 +2657,7 @@ lrc_configure_all_contexts(struct i915_perf_stream *stream,
>   		},
>   		{
>   			GEN8_OACTXCONTROL,
> -			stream->perf->ctx_oactxctrl_offset + 1,
> +			ctx_oactxctrl + 1,
>   		},
>   		{ EU_PERF_CNTL0, ctx_flexeuN(0) },
>   		{ EU_PERF_CNTL1, ctx_flexeuN(1) },
> @@ -4551,6 +4628,37 @@ static void oa_init_supported_formats(struct i915_perf *perf)
>   	}
>   }
>   
> +static void i915_perf_init_info(struct drm_i915_private *i915)
> +{
> +	struct i915_perf *perf = &i915->perf;
> +
> +	switch (GRAPHICS_VER(i915)) {
> +	case 8:
> +		perf->ctx_oactxctrl_offset = 0x120;
> +		perf->ctx_flexeu0_offset = 0x2ce;
> +		perf->gen8_valid_ctx_bit = BIT(25);
> +		break;
> +	case 9:
> +		perf->ctx_oactxctrl_offset = 0x128;
> +		perf->ctx_flexeu0_offset = 0x3de;
> +		perf->gen8_valid_ctx_bit = BIT(16);
> +		break;
> +	case 11:
> +		perf->ctx_oactxctrl_offset = 0x124;
> +		perf->ctx_flexeu0_offset = 0x78e;
> +		perf->gen8_valid_ctx_bit = BIT(16);
> +		break;
> +	case 12:
> +		/*
> +		 * Calculate offset at runtime in oa_pin_context for gen12 and
> +		 * cache the value in perf->ctx_oactxctrl_offset.
> +		 */
> +		break;
> +	default:
> +		MISSING_CASE(GRAPHICS_VER(i915));
> +	}
> +}
> +
>   /**
>    * i915_perf_init - initialize i915-perf state on module bind
>    * @i915: i915 device instance
> @@ -4589,6 +4697,7 @@ void i915_perf_init(struct drm_i915_private *i915)
>   		 * execlist mode by default.
>   		 */
>   		perf->ops.read = gen8_oa_read;
> +		i915_perf_init_info(i915);
>   
>   		if (IS_GRAPHICS_VER(i915, 8, 9)) {
>   			perf->ops.is_valid_b_counter_reg =
> @@ -4608,18 +4717,6 @@ void i915_perf_init(struct drm_i915_private *i915)
>   			perf->ops.enable_metric_set = gen8_enable_metric_set;
>   			perf->ops.disable_metric_set = gen8_disable_metric_set;
>   			perf->ops.oa_hw_tail_read = gen8_oa_hw_tail_read;
> -
> -			if (GRAPHICS_VER(i915) == 8) {
> -				perf->ctx_oactxctrl_offset = 0x120;
> -				perf->ctx_flexeu0_offset = 0x2ce;
> -
> -				perf->gen8_valid_ctx_bit = BIT(25);
> -			} else {
> -				perf->ctx_oactxctrl_offset = 0x128;
> -				perf->ctx_flexeu0_offset = 0x3de;
> -
> -				perf->gen8_valid_ctx_bit = BIT(16);
> -			}
>   		} else if (GRAPHICS_VER(i915) == 11) {
>   			perf->ops.is_valid_b_counter_reg =
>   				gen7_is_valid_b_counter_addr;
> @@ -4633,11 +4730,6 @@ void i915_perf_init(struct drm_i915_private *i915)
>   			perf->ops.enable_metric_set = gen8_enable_metric_set;
>   			perf->ops.disable_metric_set = gen11_disable_metric_set;
>   			perf->ops.oa_hw_tail_read = gen8_oa_hw_tail_read;
> -
> -			perf->ctx_oactxctrl_offset = 0x124;
> -			perf->ctx_flexeu0_offset = 0x78e;
> -
> -			perf->gen8_valid_ctx_bit = BIT(16);
>   		} else if (GRAPHICS_VER(i915) == 12) {
>   			perf->ops.is_valid_b_counter_reg =
>   				gen12_is_valid_b_counter_addr;
> @@ -4651,9 +4743,6 @@ void i915_perf_init(struct drm_i915_private *i915)
>   			perf->ops.enable_metric_set = gen12_enable_metric_set;
>   			perf->ops.disable_metric_set = gen12_disable_metric_set;
>   			perf->ops.oa_hw_tail_read = gen12_oa_hw_tail_read;
> -
> -			perf->ctx_flexeu0_offset = 0;
> -			perf->ctx_oactxctrl_offset = 0x144;
>   		}
>   	}
>   
> diff --git a/drivers/gpu/drm/i915/i915_perf_oa_regs.h b/drivers/gpu/drm/i915/i915_perf_oa_regs.h
> index f31c9f13a9fc..0ef3562ff4aa 100644
> --- a/drivers/gpu/drm/i915/i915_perf_oa_regs.h
> +++ b/drivers/gpu/drm/i915/i915_perf_oa_regs.h
> @@ -97,7 +97,7 @@
>   #define  GEN12_OAR_OACONTROL_COUNTER_FORMAT_SHIFT 1
>   #define  GEN12_OAR_OACONTROL_COUNTER_ENABLE       (1 << 0)
>   
> -#define GEN12_OACTXCONTROL _MMIO(0x2360)
> +#define GEN12_OACTXCONTROL(base) _MMIO((base) + 0x360)
>   #define GEN12_OAR_OASTATUS _MMIO(0x2968)
>   
>   /* Gen12 OAG unit */




* Re: [Intel-gfx] [PATCH 05/19] drm/i915/perf: Enable commands per clock reporting in OA
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 05/19] drm/i915/perf: Enable commands per clock reporting in OA Umesh Nerlige Ramappa
@ 2022-09-06 19:51   ` Lionel Landwerlin
  2022-09-14  0:19   ` Dixit, Ashutosh
  1 sibling, 0 replies; 84+ messages in thread
From: Lionel Landwerlin @ 2022-09-06 19:51 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa, intel-gfx

On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote:
> XEHPSDV and DG2 provide a way to configure bytes per clock vs commands
> per clock reporting. Enable command per clock setting on enabling OA.
>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.h          |  3 +++
>   drivers/gpu/drm/i915/i915_pci.c          |  1 +
>   drivers/gpu/drm/i915/i915_perf.c         | 20 ++++++++++++++++++++
>   drivers/gpu/drm/i915/i915_perf_oa_regs.h |  4 ++++
>   drivers/gpu/drm/i915/intel_device_info.h |  1 +
>   5 files changed, 29 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index b4733c5a01da..b2e8a44bd976 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1287,6 +1287,9 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
>   #define HAS_RUNTIME_PM(dev_priv) (INTEL_INFO(dev_priv)->has_runtime_pm)
>   #define HAS_64BIT_RELOC(dev_priv) (INTEL_INFO(dev_priv)->has_64bit_reloc)
>   
> +#define HAS_OA_BPC_REPORTING(dev_priv) \
> +	(INTEL_INFO(dev_priv)->has_oa_bpc_reporting)
> +
>   /*
>    * Set this flag, when platform requires 64K GTT page sizes or larger for
>    * device local memory access.
> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> index d8446bb25d5e..bd0b8502b91e 100644
> --- a/drivers/gpu/drm/i915/i915_pci.c
> +++ b/drivers/gpu/drm/i915/i915_pci.c
> @@ -1019,6 +1019,7 @@ static const struct intel_device_info adl_p_info = {
>   	.has_logical_ring_contexts = 1, \
>   	.has_logical_ring_elsq = 1, \
>   	.has_mslice_steering = 1, \
> +	.has_oa_bpc_reporting = 1, \
>   	.has_rc6 = 1, \
>   	.has_reset_engine = 1, \
>   	.has_rps = 1, \
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index efa7eda83edd..6fc4f0d8fc5a 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -2745,10 +2745,12 @@ static int
>   gen12_enable_metric_set(struct i915_perf_stream *stream,
>   			struct i915_active *active)
>   {
> +	struct drm_i915_private *i915 = stream->perf->i915;
>   	struct intel_uncore *uncore = stream->uncore;
>   	struct i915_oa_config *oa_config = stream->oa_config;
>   	bool periodic = stream->periodic;
>   	u32 period_exponent = stream->period_exponent;
> +	u32 sqcnt1;
>   	int ret;
>   
>   	intel_uncore_write(uncore, GEN12_OAG_OA_DEBUG,
> @@ -2767,6 +2769,16 @@ gen12_enable_metric_set(struct i915_perf_stream *stream,
>   			    (period_exponent << GEN12_OAG_OAGLBCTXCTRL_TIMER_PERIOD_SHIFT))
>   			    : 0);
>   
> + 	/*
> + 	 * Initialize Super Queue Internal Cnt Register
> + 	 * Set PMON Enable in order to collect valid metrics.
> +	 * Enable commands per clock reporting in OA for XEHPSDV onward.
> + 	 */
> +	sqcnt1 = GEN12_SQCNT1_PMON_ENABLE |
> +		 (HAS_OA_BPC_REPORTING(i915) ? GEN12_SQCNT1_OABPC : 0);
> +
> +	intel_uncore_rmw(uncore, GEN12_SQCNT1, 0, sqcnt1);
> +
>   	/*
>   	 * Update all contexts prior writing the mux configurations as we need
>   	 * to make sure all slices/subslices are ON before writing to NOA
> @@ -2816,6 +2828,8 @@ static void gen11_disable_metric_set(struct i915_perf_stream *stream)
>   static void gen12_disable_metric_set(struct i915_perf_stream *stream)
>   {
>   	struct intel_uncore *uncore = stream->uncore;
> +	struct drm_i915_private *i915 = stream->perf->i915;
> +	u32 sqcnt1;
>   
>   	/* Reset all contexts' slices/subslices configurations. */
>   	gen12_configure_all_contexts(stream, NULL, NULL);
> @@ -2826,6 +2840,12 @@ static void gen12_disable_metric_set(struct i915_perf_stream *stream)
>   
>   	/* Make sure we disable noa to save power. */
>   	intel_uncore_rmw(uncore, RPM_CONFIG1, GEN10_GT_NOA_ENABLE, 0);
> +
> +	sqcnt1 = GEN12_SQCNT1_PMON_ENABLE |
> +		 (HAS_OA_BPC_REPORTING(i915) ? GEN12_SQCNT1_OABPC : 0);
> +
> + 	/* Reset PMON Enable to save power. */
> +	intel_uncore_rmw(uncore, GEN12_SQCNT1, sqcnt1, 0);
>   }
>   
>   static void gen7_oa_enable(struct i915_perf_stream *stream)
> diff --git a/drivers/gpu/drm/i915/i915_perf_oa_regs.h b/drivers/gpu/drm/i915/i915_perf_oa_regs.h
> index 0ef3562ff4aa..381d94101610 100644
> --- a/drivers/gpu/drm/i915/i915_perf_oa_regs.h
> +++ b/drivers/gpu/drm/i915/i915_perf_oa_regs.h
> @@ -134,4 +134,8 @@
>   #define GDT_CHICKEN_BITS    _MMIO(0x9840)
>   #define   GT_NOA_ENABLE	    0x00000080
>   
> +#define GEN12_SQCNT1				_MMIO(0x8718)
> +#define   GEN12_SQCNT1_PMON_ENABLE		REG_BIT(30)
> +#define   GEN12_SQCNT1_OABPC			REG_BIT(29)
> +
>   #endif /* __INTEL_PERF_OA_REGS__ */
> diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
> index 23bf230aa104..fc2a0660426e 100644
> --- a/drivers/gpu/drm/i915/intel_device_info.h
> +++ b/drivers/gpu/drm/i915/intel_device_info.h
> @@ -163,6 +163,7 @@ enum intel_ppgtt_type {
>   	func(has_logical_ring_elsq); \
>   	func(has_media_ratio_mode); \
>   	func(has_mslice_steering); \
> +	func(has_oa_bpc_reporting); \
>   	func(has_one_eu_per_fuse_bit); \
>   	func(has_pooled_eu); \
>   	func(has_pxp); \




* Re: [Intel-gfx] [PATCH 07/19] drm/i915/perf: Simply use stream->ctx
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 07/19] drm/i915/perf: Simply use stream->ctx Umesh Nerlige Ramappa
@ 2022-09-06 19:52   ` Lionel Landwerlin
  0 siblings, 0 replies; 84+ messages in thread
From: Lionel Landwerlin @ 2022-09-06 19:52 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa, intel-gfx

On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote:
> Earlier code used exclusive_stream to check for user passed context.
> Simplify this by accessing stream->ctx.
>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_perf.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index bbf1c574f393..3e3bda147c48 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -801,7 +801,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>   		 * switches since it's not-uncommon for periodic samples to
>   		 * identify a switch before any 'context switch' report.
>   		 */
> -		if (!stream->perf->exclusive_stream->ctx ||
> +		if (!stream->ctx ||
>   		    stream->specific_ctx_id == ctx_id ||
>   		    stream->oa_buffer.last_ctx_id == stream->specific_ctx_id ||
>   		    reason & OAREPORT_REASON_CTX_SWITCH) {
> @@ -810,7 +810,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>   			 * While filtering for a single context we avoid
>   			 * leaking the IDs of other contexts.
>   			 */
> -			if (stream->perf->exclusive_stream->ctx &&
> +			if (stream->ctx &&
>   			    stream->specific_ctx_id != ctx_id) {
>   				report32[2] = INVALID_CTX_ID;
>   			}




* Re: [Intel-gfx] [PATCH 08/19] drm/i915/perf: Move gt-specific data from i915->perf to gt->perf
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 08/19] drm/i915/perf: Move gt-specific data from i915->perf to gt->perf Umesh Nerlige Ramappa
@ 2022-09-06 19:54   ` Lionel Landwerlin
  2022-09-14 18:20   ` Dixit, Ashutosh
  1 sibling, 0 replies; 84+ messages in thread
From: Lionel Landwerlin @ 2022-09-06 19:54 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa, intel-gfx

On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote:
> Make perf part of gt as the OAG buffer is specific to a gt. The refactor
> eventually simplifies programming the right OA buffer and the right HW
> registers when supporting multiple gts.
>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_gt_types.h   |  3 +
>   drivers/gpu/drm/i915/gt/intel_sseu.c       |  4 +-
>   drivers/gpu/drm/i915/i915_perf.c           | 75 +++++++++++++---------
>   drivers/gpu/drm/i915/i915_perf_types.h     | 39 +++++------
>   drivers/gpu/drm/i915/selftests/i915_perf.c | 16 +++--
>   5 files changed, 80 insertions(+), 57 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> index 4d56f7d5a3be..3d079d206cec 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> @@ -20,6 +20,7 @@
>   #include "intel_gsc.h"
>   
>   #include "i915_vma.h"
> +#include "i915_perf_types.h"
>   #include "intel_engine_types.h"
>   #include "intel_gt_buffer_pool_types.h"
>   #include "intel_hwconfig.h"
> @@ -260,6 +261,8 @@ struct intel_gt {
>   	/* sysfs defaults per gt */
>   	struct gt_defaults defaults;
>   	struct kobject *sysfs_defaults;
> +
> +	struct i915_perf_gt perf;
>   };
>   
>   enum intel_gt_scratch_field {
> diff --git a/drivers/gpu/drm/i915/gt/intel_sseu.c b/drivers/gpu/drm/i915/gt/intel_sseu.c
> index c6d3050604c8..fcaf3c58b554 100644
> --- a/drivers/gpu/drm/i915/gt/intel_sseu.c
> +++ b/drivers/gpu/drm/i915/gt/intel_sseu.c
> @@ -678,8 +678,8 @@ u32 intel_sseu_make_rpcs(struct intel_gt *gt,
>   	 * If i915/perf is active, we want a stable powergating configuration
>   	 * on the system. Use the configuration pinned by i915/perf.
>   	 */
> -	if (i915->perf.exclusive_stream)
> -		req_sseu = &i915->perf.sseu;
> +	if (gt->perf.exclusive_stream)
> +		req_sseu = &gt->perf.sseu;
>   
>   	slices = hweight8(req_sseu->slice_mask);
>   	subslices = hweight8(req_sseu->subslice_mask);
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 3e3bda147c48..5dccb3ffffc5 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -1577,8 +1577,9 @@ free_noa_wait(struct i915_perf_stream *stream)
>   static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
>   {
>   	struct i915_perf *perf = stream->perf;
> +	struct intel_gt *gt = stream->engine->gt;
>   
> -	BUG_ON(stream != perf->exclusive_stream);
> +	BUG_ON(stream != gt->perf.exclusive_stream);
>   
>   	/*
>   	 * Unset exclusive_stream first, it will be checked while disabling
> @@ -1586,7 +1587,7 @@ static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
>   	 *
>   	 * See i915_oa_init_reg_state() and lrc_configure_all_contexts()
>   	 */
> -	WRITE_ONCE(perf->exclusive_stream, NULL);
> +	WRITE_ONCE(gt->perf.exclusive_stream, NULL);
>   	perf->ops.disable_metric_set(stream);
>   
>   	free_oa_buffer(stream);
> @@ -2579,10 +2580,11 @@ oa_configure_all_contexts(struct i915_perf_stream *stream,
>   {
>   	struct drm_i915_private *i915 = stream->perf->i915;
>   	struct intel_engine_cs *engine;
> +	struct intel_gt *gt = stream->engine->gt;
>   	struct i915_gem_context *ctx, *cn;
>   	int err;
>   
> -	lockdep_assert_held(&stream->perf->lock);
> +	lockdep_assert_held(&gt->perf.lock);
>   
>   	/*
>   	 * The OA register config is setup through the context image. This image
> @@ -3103,6 +3105,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
>   {
>   	struct drm_i915_private *i915 = stream->perf->i915;
>   	struct i915_perf *perf = stream->perf;
> +	struct intel_gt *gt;
>   	int format_size;
>   	int ret;
>   
> @@ -3111,6 +3114,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
>   			"OA engine not specified\n");
>   		return -EINVAL;
>   	}
> +	gt = props->engine->gt;
>   
>   	/*
>   	 * If the sysfs metrics/ directory wasn't registered for some
> @@ -3141,7 +3145,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
>   	 * counter reports and marshal to the appropriate client
>   	 * we currently only allow exclusive access
>   	 */
> -	if (perf->exclusive_stream) {
> +	if (gt->perf.exclusive_stream) {
>   		drm_dbg(&stream->perf->i915->drm,
>   			"OA unit already in use\n");
>   		return -EBUSY;
> @@ -3221,8 +3225,8 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
>   
>   	stream->ops = &i915_oa_stream_ops;
>   
> -	perf->sseu = props->sseu;
> -	WRITE_ONCE(perf->exclusive_stream, stream);
> +	stream->engine->gt->perf.sseu = props->sseu;
> +	WRITE_ONCE(gt->perf.exclusive_stream, stream);
>   
>   	ret = i915_perf_stream_enable_sync(stream);
>   	if (ret) {
> @@ -3244,7 +3248,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
>   	return 0;
>   
>   err_enable:
> -	WRITE_ONCE(perf->exclusive_stream, NULL);
> +	WRITE_ONCE(gt->perf.exclusive_stream, NULL);
>   	perf->ops.disable_metric_set(stream);
>   
>   	free_oa_buffer(stream);
> @@ -3274,7 +3278,7 @@ void i915_oa_init_reg_state(const struct intel_context *ce,
>   		return;
>   
>   	/* perf.exclusive_stream serialised by lrc_configure_all_contexts() */
> -	stream = READ_ONCE(engine->i915->perf.exclusive_stream);
> +	stream = READ_ONCE(engine->gt->perf.exclusive_stream);
>   	if (stream && GRAPHICS_VER(stream->perf->i915) < 12)
>   		gen8_update_reg_state_unlocked(ce, stream);
>   }
> @@ -3303,7 +3307,7 @@ static ssize_t i915_perf_read(struct file *file,
>   			      loff_t *ppos)
>   {
>   	struct i915_perf_stream *stream = file->private_data;
> -	struct i915_perf *perf = stream->perf;
> +	struct intel_gt *gt = stream->engine->gt;
>   	size_t offset = 0;
>   	int ret;
>   
> @@ -3327,14 +3331,14 @@ static ssize_t i915_perf_read(struct file *file,
>   			if (ret)
>   				return ret;
>   
> -			mutex_lock(&perf->lock);
> +			mutex_lock(&gt->perf.lock);
>   			ret = stream->ops->read(stream, buf, count, &offset);
> -			mutex_unlock(&perf->lock);
> +			mutex_unlock(&gt->perf.lock);
>   		} while (!offset && !ret);
>   	} else {
> -		mutex_lock(&perf->lock);
> +		mutex_lock(&gt->perf.lock);
>   		ret = stream->ops->read(stream, buf, count, &offset);
> -		mutex_unlock(&perf->lock);
> +		mutex_unlock(&gt->perf.lock);
>   	}
>   
>   	/* We allow the poll checking to sometimes report false positive EPOLLIN
> @@ -3381,7 +3385,7 @@ static enum hrtimer_restart oa_poll_check_timer_cb(struct hrtimer *hrtimer)
>    * &i915_perf_stream_ops->poll_wait to call poll_wait() with a wait queue that
>    * will be woken for new stream data.
>    *
> - * Note: The &perf->lock mutex has been taken to serialize
> + * Note: The &gt->perf.lock mutex has been taken to serialize
>    * with any non-file-operation driver hooks.
>    *
>    * Returns: any poll events that are ready without sleeping
> @@ -3422,12 +3426,12 @@ static __poll_t i915_perf_poll_locked(struct i915_perf_stream *stream,
>   static __poll_t i915_perf_poll(struct file *file, poll_table *wait)
>   {
>   	struct i915_perf_stream *stream = file->private_data;
> -	struct i915_perf *perf = stream->perf;
> +	struct intel_gt *gt = stream->engine->gt;
>   	__poll_t ret;
>   
> -	mutex_lock(&perf->lock);
> +	mutex_lock(&gt->perf.lock);
>   	ret = i915_perf_poll_locked(stream, file, wait);
> -	mutex_unlock(&perf->lock);
> +	mutex_unlock(&gt->perf.lock);
>   
>   	return ret;
>   }
> @@ -3526,7 +3530,7 @@ static long i915_perf_config_locked(struct i915_perf_stream *stream,
>    * @cmd: the ioctl request
>    * @arg: the ioctl data
>    *
> - * Note: The &perf->lock mutex has been taken to serialize
> + * Note: The &gt->perf.lock mutex has been taken to serialize
>    * with any non-file-operation driver hooks.
>    *
>    * Returns: zero on success or a negative error code. Returns -EINVAL for
> @@ -3566,12 +3570,12 @@ static long i915_perf_ioctl(struct file *file,
>   			    unsigned long arg)
>   {
>   	struct i915_perf_stream *stream = file->private_data;
> -	struct i915_perf *perf = stream->perf;
> +	struct intel_gt *gt = stream->engine->gt;
>   	long ret;
>   
> -	mutex_lock(&perf->lock);
> +	mutex_lock(&gt->perf.lock);
>   	ret = i915_perf_ioctl_locked(stream, cmd, arg);
> -	mutex_unlock(&perf->lock);
> +	mutex_unlock(&gt->perf.lock);
>   
>   	return ret;
>   }
> @@ -3583,7 +3587,7 @@ static long i915_perf_ioctl(struct file *file,
>    * Frees all resources associated with the given i915 perf @stream, disabling
>    * any associated data capture in the process.
>    *
> - * Note: The &perf->lock mutex has been taken to serialize
> + * Note: The &gt->perf.lock mutex has been taken to serialize
>    * with any non-file-operation driver hooks.
>    */
>   static void i915_perf_destroy_locked(struct i915_perf_stream *stream)
> @@ -3615,10 +3619,11 @@ static int i915_perf_release(struct inode *inode, struct file *file)
>   {
>   	struct i915_perf_stream *stream = file->private_data;
>   	struct i915_perf *perf = stream->perf;
> +	struct intel_gt *gt = stream->engine->gt;
>   
> -	mutex_lock(&perf->lock);
> +	mutex_lock(&gt->perf.lock);
>   	i915_perf_destroy_locked(stream);
> -	mutex_unlock(&perf->lock);
> +	mutex_unlock(&gt->perf.lock);
>   
>   	/* Release the reference the perf stream kept on the driver. */
>   	drm_dev_put(&perf->i915->drm);
> @@ -3651,7 +3656,7 @@ static const struct file_operations fops = {
>    * See i915_perf_ioctl_open() for interface details.
>    *
>    * Implements further stream config validation and stream initialization on
> - * behalf of i915_perf_open_ioctl() with the &perf->lock mutex
> + * behalf of i915_perf_open_ioctl() with the &gt->perf.lock mutex
>    * taken to serialize with any non-file-operation driver hooks.
>    *
>    * Note: at this point the @props have only been validated in isolation and
> @@ -4035,7 +4040,7 @@ static int read_properties_unlocked(struct i915_perf *perf,
>    * mutex to avoid an awkward lockdep with mmap_lock.
>    *
>    * Most of the implementation details are handled by
> - * i915_perf_open_ioctl_locked() after taking the &perf->lock
> + * i915_perf_open_ioctl_locked() after taking the &gt->perf.lock
>    * mutex for serializing with any non-file-operation driver hooks.
>    *
>    * Return: A newly opened i915 Perf stream file descriptor or negative
> @@ -4046,6 +4051,7 @@ int i915_perf_open_ioctl(struct drm_device *dev, void *data,
>   {
>   	struct i915_perf *perf = &to_i915(dev)->perf;
>   	struct drm_i915_perf_open_param *param = data;
> +	struct intel_gt *gt;
>   	struct perf_open_properties props;
>   	u32 known_open_flags;
>   	int ret;
> @@ -4072,9 +4078,11 @@ int i915_perf_open_ioctl(struct drm_device *dev, void *data,
>   	if (ret)
>   		return ret;
>   
> -	mutex_lock(&perf->lock);
> +	gt = props.engine->gt;
> +
> +	mutex_lock(&gt->perf.lock);
>   	ret = i915_perf_open_ioctl_locked(perf, param, &props, file);
> -	mutex_unlock(&perf->lock);
> +	mutex_unlock(&gt->perf.lock);
>   
>   	return ret;
>   }
> @@ -4090,6 +4098,7 @@ int i915_perf_open_ioctl(struct drm_device *dev, void *data,
>   void i915_perf_register(struct drm_i915_private *i915)
>   {
>   	struct i915_perf *perf = &i915->perf;
> +	struct intel_gt *gt = to_gt(i915);
>   
>   	if (!perf->i915)
>   		return;
> @@ -4098,13 +4107,13 @@ void i915_perf_register(struct drm_i915_private *i915)
>   	 * i915_perf_open_ioctl(); considering that we register after
>   	 * being exposed to userspace.
>   	 */
> -	mutex_lock(&perf->lock);
> +	mutex_lock(&gt->perf.lock);
>   
>   	perf->metrics_kobj =
>   		kobject_create_and_add("metrics",
>   				       &i915->drm.primary->kdev->kobj);
>   
> -	mutex_unlock(&perf->lock);
> +	mutex_unlock(&gt->perf.lock);
>   }
>   
>   /**
> @@ -4783,7 +4792,11 @@ void i915_perf_init(struct drm_i915_private *i915)
>   	}
>   
>   	if (perf->ops.enable_metric_set) {
> -		mutex_init(&perf->lock);
> +		struct intel_gt *gt;
> +		int i;
> +
> +		for_each_gt(gt, i915, i)
> +			mutex_init(&gt->perf.lock);
>   
>   		/* Choose a representative limit */
>   		oa_sample_rate_hard_limit = to_gt(i915)->clock_frequency / 2;
> diff --git a/drivers/gpu/drm/i915/i915_perf_types.h b/drivers/gpu/drm/i915/i915_perf_types.h
> index 05cb9a335a97..e888bfab478f 100644
> --- a/drivers/gpu/drm/i915/i915_perf_types.h
> +++ b/drivers/gpu/drm/i915/i915_perf_types.h
> @@ -380,6 +380,26 @@ struct i915_oa_ops {
>   	u32 (*oa_hw_tail_read)(struct i915_perf_stream *stream);
>   };
>   
> +struct i915_perf_gt {
> +	/*
> +	 * Lock associated with anything below within this structure.
> +	 */
> +	struct mutex lock;
> +
> +	/**
> +	 * @sseu: sseu configuration selected to run while perf is active,
> +	 * applies to all contexts.
> +	 */
> +	struct intel_sseu sseu;
> +
> +	/*
> +	 * @exclusive_stream: The stream currently using the OA unit. This is
> +	 * sometimes accessed outside a syscall associated to its file
> +	 * descriptor.
> +	 */
> +	struct i915_perf_stream *exclusive_stream;
> +};
> +
>   struct i915_perf {
>   	struct drm_i915_private *i915;
>   
> @@ -397,25 +417,6 @@ struct i915_perf {
>   	 */
>   	struct idr metrics_idr;
>   
> -	/*
> -	 * Lock associated with anything below within this structure
> -	 * except exclusive_stream.
> -	 */
> -	struct mutex lock;
> -
> -	/*
> -	 * The stream currently using the OA unit. If accessed
> -	 * outside a syscall associated to its file
> -	 * descriptor.
> -	 */
> -	struct i915_perf_stream *exclusive_stream;
> -
> -	/**
> -	 * @sseu: sseu configuration selected to run while perf is active,
> -	 * applies to all contexts.
> -	 */
> -	struct intel_sseu sseu;
> -
>   	/**
>   	 * For rate limiting any notifications of spurious
>   	 * invalid OA reports
> diff --git a/drivers/gpu/drm/i915/selftests/i915_perf.c b/drivers/gpu/drm/i915/selftests/i915_perf.c
> index 429c6d73b159..24dde5531423 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_perf.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_perf.c
> @@ -102,6 +102,12 @@ test_stream(struct i915_perf *perf)
>   		I915_OA_FORMAT_A32u40_A4u32_B8_C8 : I915_OA_FORMAT_C4_B8,
>   	};
>   	struct i915_perf_stream *stream;
> +	struct intel_gt *gt;
> +
> +	if (!props.engine)
> +		return NULL;
> +
> +	gt = props.engine->gt;
>   
>   	if (!oa_config)
>   		return NULL;
> @@ -116,12 +122,12 @@ test_stream(struct i915_perf *perf)
>   
>   	stream->perf = perf;
>   
> -	mutex_lock(&perf->lock);
> +	mutex_lock(&gt->perf.lock);
>   	if (i915_oa_stream_init(stream, &param, &props)) {
>   		kfree(stream);
>   		stream =  NULL;
>   	}
> -	mutex_unlock(&perf->lock);
> +	mutex_unlock(&gt->perf.lock);
>   
>   	i915_oa_config_put(oa_config);
>   
> @@ -130,11 +136,11 @@ test_stream(struct i915_perf *perf)
>   
>   static void stream_destroy(struct i915_perf_stream *stream)
>   {
> -	struct i915_perf *perf = stream->perf;
> +	struct intel_gt *gt = stream->engine->gt;
>   
> -	mutex_lock(&perf->lock);
> +	mutex_lock(&gt->perf.lock);
>   	i915_perf_destroy_locked(stream);
> -	mutex_unlock(&perf->lock);
> +	mutex_unlock(&gt->perf.lock);
>   }
>   
>   static int live_sanitycheck(void *arg)




* Re: [Intel-gfx] [PATCH 10/19] drm/i915/perf: Use gt-specific ggtt for OA and noa-wait buffers
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 10/19] drm/i915/perf: Use gt-specific ggtt for OA and noa-wait buffers Umesh Nerlige Ramappa
@ 2022-09-06 19:56   ` Lionel Landwerlin
  2022-09-06 20:28     ` Umesh Nerlige Ramappa
  0 siblings, 1 reply; 84+ messages in thread
From: Lionel Landwerlin @ 2022-09-06 19:56 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa, intel-gfx

On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote:
> User passes uabi engine class and instance to the perf OA interface. Use
> gt corresponding to the engine to pin the buffers to the right ggtt.
>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

I didn't know there was a GGTT per engine.

Do I understand this correctly?


Thanks,

-Lionel


> ---
>   drivers/gpu/drm/i915/i915_perf.c | 21 +++++++++++++++++++--
>   1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 87b92d2946f4..f7621b45966c 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -1765,6 +1765,7 @@ static void gen12_init_oa_buffer(struct i915_perf_stream *stream)
>   static int alloc_oa_buffer(struct i915_perf_stream *stream)
>   {
>   	struct drm_i915_private *i915 = stream->perf->i915;
> +	struct intel_gt *gt = stream->engine->gt;
>   	struct drm_i915_gem_object *bo;
>   	struct i915_vma *vma;
>   	int ret;
> @@ -1784,11 +1785,22 @@ static int alloc_oa_buffer(struct i915_perf_stream *stream)
>   	i915_gem_object_set_cache_coherency(bo, I915_CACHE_LLC);
>   
>   	/* PreHSW required 512K alignment, HSW requires 16M */
> -	vma = i915_gem_object_ggtt_pin(bo, NULL, 0, SZ_16M, 0);
> +	vma = i915_vma_instance(bo, &gt->ggtt->vm, NULL);
>   	if (IS_ERR(vma)) {
>   		ret = PTR_ERR(vma);
>   		goto err_unref;
>   	}
> +
> +	/*
> +	 * PreHSW required 512K alignment.
> +	 * HSW and onwards, align to requested size of OA buffer.
> +	 */
> +	ret = i915_vma_pin(vma, 0, SZ_16M, PIN_GLOBAL | PIN_HIGH);
> +	if (ret) {
> +		drm_err(&gt->i915->drm, "Failed to pin OA buffer %d\n", ret);
> +		goto err_unref;
> +	}
> +
>   	stream->oa_buffer.vma = vma;
>   
>   	stream->oa_buffer.vaddr =
> @@ -1838,6 +1850,7 @@ static u32 *save_restore_register(struct i915_perf_stream *stream, u32 *cs,
>   static int alloc_noa_wait(struct i915_perf_stream *stream)
>   {
>   	struct drm_i915_private *i915 = stream->perf->i915;
> +	struct intel_gt *gt = stream->engine->gt;
>   	struct drm_i915_gem_object *bo;
>   	struct i915_vma *vma;
>   	const u64 delay_ticks = 0xffffffffffffffff -
> @@ -1878,12 +1891,16 @@ static int alloc_noa_wait(struct i915_perf_stream *stream)
>   	 * multiple OA config BOs will have a jump to this address and it
>   	 * needs to be fixed during the lifetime of the i915/perf stream.
>   	 */
> -	vma = i915_gem_object_ggtt_pin_ww(bo, &ww, NULL, 0, 0, PIN_HIGH);
> +	vma = i915_vma_instance(bo, &gt->ggtt->vm, NULL);
>   	if (IS_ERR(vma)) {
>   		ret = PTR_ERR(vma);
>   		goto out_ww;
>   	}
>   
> +	ret = i915_vma_pin_ww(vma, &ww, 0, 0, PIN_GLOBAL | PIN_HIGH);
> +	if (ret)
> +		goto out_ww;
> +
>   	batch = cs = i915_gem_object_pin_map(bo, I915_MAP_WB);
>   	if (IS_ERR(batch)) {
>   		ret = PTR_ERR(batch);




* Re: [Intel-gfx] [PATCH 11/19] drm/i915/perf: Store a pointer to oa_format in oa_buffer
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 11/19] drm/i915/perf: Store a pointer to oa_format in oa_buffer Umesh Nerlige Ramappa
@ 2022-09-06 19:56   ` Lionel Landwerlin
  2022-09-14 20:43   ` Dixit, Ashutosh
  1 sibling, 0 replies; 84+ messages in thread
From: Lionel Landwerlin @ 2022-09-06 19:56 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa, intel-gfx

On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote:
> DG2 introduces OA reports with 64 bit report header fields. Perf OA
> would need more information about the OA format in order to process such
> reports. Store all OA format info in oa_buffer instead of just the size
> and format-id.
>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_perf.c       | 23 ++++++++++-------------
>   drivers/gpu/drm/i915/i915_perf_types.h |  3 +--
>   2 files changed, 11 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index f7621b45966c..9e455bd3bce5 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -483,7 +483,7 @@ static u32 gen7_oa_hw_tail_read(struct i915_perf_stream *stream)
>   static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>   {
>   	u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
> -	int report_size = stream->oa_buffer.format_size;
> +	int report_size = stream->oa_buffer.format->size;
>   	unsigned long flags;
>   	bool pollin;
>   	u32 hw_tail;
> @@ -630,7 +630,7 @@ static int append_oa_sample(struct i915_perf_stream *stream,
>   			    size_t *offset,
>   			    const u8 *report)
>   {
> -	int report_size = stream->oa_buffer.format_size;
> +	int report_size = stream->oa_buffer.format->size;
>   	struct drm_i915_perf_record_header header;
>   	int report_size_partial;
>   	u8 *oa_buf_end;
> @@ -694,7 +694,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>   				  size_t *offset)
>   {
>   	struct intel_uncore *uncore = stream->uncore;
> -	int report_size = stream->oa_buffer.format_size;
> +	int report_size = stream->oa_buffer.format->size;
>   	u8 *oa_buf_base = stream->oa_buffer.vaddr;
>   	u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
>   	size_t start_offset = *offset;
> @@ -970,7 +970,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
>   				  size_t *offset)
>   {
>   	struct intel_uncore *uncore = stream->uncore;
> -	int report_size = stream->oa_buffer.format_size;
> +	int report_size = stream->oa_buffer.format->size;
>   	u8 *oa_buf_base = stream->oa_buffer.vaddr;
>   	u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
>   	u32 mask = (OA_BUFFER_SIZE - 1);
> @@ -2517,7 +2517,7 @@ static int gen12_configure_oar_context(struct i915_perf_stream *stream,
>   {
>   	int err;
>   	struct intel_context *ce = stream->pinned_ctx;
> -	u32 format = stream->oa_buffer.format;
> +	u32 format = stream->oa_buffer.format->format;
>   	u32 offset = stream->perf->ctx_oactxctrl_offset;
>   	struct flex regs_context[] = {
>   		{
> @@ -2890,7 +2890,7 @@ static void gen7_oa_enable(struct i915_perf_stream *stream)
>   	u32 ctx_id = stream->specific_ctx_id;
>   	bool periodic = stream->periodic;
>   	u32 period_exponent = stream->period_exponent;
> -	u32 report_format = stream->oa_buffer.format;
> +	u32 report_format = stream->oa_buffer.format->format;
>   
>   	/*
>   	 * Reset buf pointers so we don't forward reports from before now.
> @@ -2916,7 +2916,7 @@ static void gen7_oa_enable(struct i915_perf_stream *stream)
>   static void gen8_oa_enable(struct i915_perf_stream *stream)
>   {
>   	struct intel_uncore *uncore = stream->uncore;
> -	u32 report_format = stream->oa_buffer.format;
> +	u32 report_format = stream->oa_buffer.format->format;
>   
>   	/*
>   	 * Reset buf pointers so we don't forward reports from before now.
> @@ -2942,7 +2942,7 @@ static void gen8_oa_enable(struct i915_perf_stream *stream)
>   static void gen12_oa_enable(struct i915_perf_stream *stream)
>   {
>   	struct intel_uncore *uncore = stream->uncore;
> -	u32 report_format = stream->oa_buffer.format;
> +	u32 report_format = stream->oa_buffer.format->format;
>   
>   	/*
>   	 * If we don't want OA reports from the OA buffer, then we don't even
> @@ -3184,15 +3184,12 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
>   	stream->sample_flags = props->sample_flags;
>   	stream->sample_size += format_size;
>   
> -	stream->oa_buffer.format_size = format_size;
> -	if (drm_WARN_ON(&i915->drm, stream->oa_buffer.format_size == 0))
> +	stream->oa_buffer.format = &perf->oa_formats[props->oa_format];
> +	if (drm_WARN_ON(&i915->drm, stream->oa_buffer.format->size == 0))
>   		return -EINVAL;
>   
>   	stream->hold_preemption = props->hold_preemption;
>   
> -	stream->oa_buffer.format =
> -		perf->oa_formats[props->oa_format].format;
> -
>   	stream->periodic = props->oa_periodic;
>   	if (stream->periodic)
>   		stream->period_exponent = props->oa_period_exponent;
> diff --git a/drivers/gpu/drm/i915/i915_perf_types.h b/drivers/gpu/drm/i915/i915_perf_types.h
> index dc9bfd8086cf..e0c96b44eda8 100644
> --- a/drivers/gpu/drm/i915/i915_perf_types.h
> +++ b/drivers/gpu/drm/i915/i915_perf_types.h
> @@ -250,11 +250,10 @@ struct i915_perf_stream {
>   	 * @oa_buffer: State of the OA buffer.
>   	 */
>   	struct {
> +		const struct i915_oa_format *format;
>   		struct i915_vma *vma;
>   		u8 *vaddr;
>   		u32 last_ctx_id;
> -		int format;
> -		int format_size;
>   		int size_exponent;
>   
>   		/**




* Re: [Intel-gfx] [PATCH 02/19] drm/i915/perf: Add OA formats for DG2
  2022-09-06 19:46     ` Umesh Nerlige Ramappa
@ 2022-09-06 19:59       ` Lionel Landwerlin
  0 siblings, 0 replies; 84+ messages in thread
From: Lionel Landwerlin @ 2022-09-06 19:59 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On 06/09/2022 22:46, Umesh Nerlige Ramappa wrote:
> On Tue, Sep 06, 2022 at 10:35:16PM +0300, Lionel Landwerlin wrote:
>> On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote:
>>> Add new OA formats for DG2. Some of the newer OA formats are not
>>> multiples of 64 bytes and are not powers of 2. For those formats, adjust
>>> hw_tail accordingly when checking for new reports.
>>>
>>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramampa@intel.com>
>>
>> Apart from the coding style issue:
>>
>>
>> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>>
>>
>>> ---
>>>  drivers/gpu/drm/i915/i915_perf.c | 63 ++++++++++++++++++++------------
>>>  include/uapi/drm/i915_drm.h      |  6 +++
>>>  2 files changed, 46 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
>>> b/drivers/gpu/drm/i915/i915_perf.c
>>> index 735244a3aedd..c8331b549d31 100644
>>> --- a/drivers/gpu/drm/i915/i915_perf.c
>>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>>> @@ -306,7 +306,8 @@ static u32 i915_oa_max_sample_rate = 100000;
>>>  /* XXX: beware if future OA HW adds new report formats that the 
>>> current
>>>   * code assumes all reports have a power-of-two size and ~(size - 
>>> 1) can
>>> - * be used as a mask to align the OA tail pointer.
>>> + * be used as a mask to align the OA tail pointer. In some of the
>>> + * formats, R is used to denote reserved field.
>>>   */
>>>  static const struct i915_oa_format oa_formats[I915_OA_FORMAT_MAX] = {
>>>      [I915_OA_FORMAT_A13]        = { 0, 64 },
>>> @@ -320,6 +321,10 @@ static const struct i915_oa_format 
>>> oa_formats[I915_OA_FORMAT_MAX] = {
>>>      [I915_OA_FORMAT_A12]            = { 0, 64 },
>>>      [I915_OA_FORMAT_A12_B8_C8]        = { 2, 128 },
>>>      [I915_OA_FORMAT_A32u40_A4u32_B8_C8] = { 5, 256 },
>>> +    [I915_OAR_FORMAT_A32u40_A4u32_B8_C8]    = { 5, 256 },
>>> +    [I915_OA_FORMAT_A24u40_A14u32_B8_C8]    = { 5, 256 },
>>> +    [I915_OAR_FORMAT_A36u64_B8_C8]        = { 1, 384 },
>>> +    [I915_OA_FORMAT_A38u64_R2u64_B8_C8]    = { 1, 448 },
>>>  };
>>>  #define SAMPLE_OA_REPORT      (1<<0)
>>> @@ -467,6 +472,7 @@ static bool oa_buffer_check_unlocked(struct 
>>> i915_perf_stream *stream)
>>>      bool pollin;
>>>      u32 hw_tail;
>>>      u64 now;
>>> +    u32 partial_report_size;
>>>      /* We have to consider the (unlikely) possibility that read() 
>>> errors
>>>       * could result in an OA buffer reset which might reset the 
>>> head and
>>> @@ -476,10 +482,16 @@ static bool oa_buffer_check_unlocked(struct 
>>> i915_perf_stream *stream)
>>>      hw_tail = stream->perf->ops.oa_hw_tail_read(stream);
>>> -    /* The tail pointer increases in 64 byte increments,
>>> -     * not in report_size steps...
>>> +    /* The tail pointer increases in 64 byte increments, whereas 
>>> report
>>> +     * sizes need not be integral multiples of 64 or powers of 2.
>>> +     * Compute potentially partially landed report in the OA buffer
>>>       */
>>> -    hw_tail &= ~(report_size - 1);
>>> +    partial_report_size = OA_TAKEN(hw_tail, stream->oa_buffer.tail);
>>> +    partial_report_size %= report_size;
>>> +
>>> +    /* Subtract partial amount off the tail */
>>> +    hw_tail = gtt_offset + ((hw_tail - partial_report_size) &
>>> +                (stream->oa_buffer.vma->size - 1));
>>>      now = ktime_get_mono_fast_ns();
>>> @@ -601,6 +613,8 @@ static int append_oa_sample(struct 
>>> i915_perf_stream *stream,
>>>  {
>>>      int report_size = stream->oa_buffer.format_size;
>>>      struct drm_i915_perf_record_header header;
>>> +    int report_size_partial;
>>> +    u8 *oa_buf_end;
>>>      header.type = DRM_I915_PERF_RECORD_SAMPLE;
>>>      header.pad = 0;
>>> @@ -614,7 +628,19 @@ static int append_oa_sample(struct 
>>> i915_perf_stream *stream,
>>>          return -EFAULT;
>>>      buf += sizeof(header);
>>> -    if (copy_to_user(buf, report, report_size))
>>> +    oa_buf_end = stream->oa_buffer.vaddr +
>>> +             stream->oa_buffer.vma->size;
>>> +    report_size_partial = oa_buf_end - report;
>>> +
>>> +    if (report_size_partial < report_size) {
>>> +        if(copy_to_user(buf, report, report_size_partial))
>>> +            return -EFAULT;
>>> +        buf += report_size_partial;
>>> +
>>> +        if(copy_to_user(buf, stream->oa_buffer.vaddr,
>>> +                report_size - report_size_partial))
>>> +            return -EFAULT;
>>
>> I think the coding style requires you to use if () not if()
>>
>
> Will fix.
>
>>
>> Just a suggestion: you could make this code handle the partial
>> copy as the main path of the function:
>>
>>
>> oa_buf_end = stream->oa_buffer.vaddr +
>>          stream->oa_buffer.vma->size;
>>
>> report_size_partial = oa_buf_end - report;
>>
>> if (copy_to_user(buf, report, report_size_partial))
>>     return -EFAULT;
>> buf += report_size_partial;
>
> This ^ may not work because append_oa_sample is appending exactly one 
> report to the user buffer, whereas the above may append more than one.
>
> Thanks,
> Umesh


Ah I see, thanks for pointing this out.

-Lionel
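
For reference, the single-report copy discussed above can be sketched in
userspace C. This is only an illustration under stated assumptions:
memcpy() stands in for copy_to_user(), and copy_one_report() is a
hypothetical helper name, not a function in i915_perf.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy exactly one report of report_size bytes starting at `report`,
 * wrapping to the start of the ring if the report straddles its end.
 * memcpy() stands in for copy_to_user() in this userspace sketch.
 */
static void copy_one_report(uint8_t *dst, const uint8_t *ring,
			    size_t ring_size, const uint8_t *report,
			    size_t report_size)
{
	size_t to_end;

	assert(report >= ring && report < ring + ring_size);
	assert(report_size <= ring_size);

	/* Bytes available between the report start and the ring end. */
	to_end = (size_t)(ring + ring_size - report);

	if (to_end < report_size) {
		/* Split copy: tail of the ring, then its beginning. */
		memcpy(dst, report, to_end);
		memcpy(dst + to_end, ring, report_size - to_end);
	} else {
		/* Common case: the whole report is contiguous. */
		memcpy(dst, report, report_size);
	}
}
```

The comparison against report_size is what keeps this to one report: when
the report sits far from the end of the ring, to_end is larger than
report_size, so copying to_end bytes unconditionally would append more
than one report.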


>
>>
>> if (report_size_partial < report_size &&
>>    copy_to_user(buf, stream->oa_buffer.vaddr,
>>         report_size - report_size_partial))
>>     return -EFAULT;
>> buf += report_size - report_size_partial;
>>
>>
>>> +    } else if (copy_to_user(buf, report, report_size))
>>>          return -EFAULT;
>>>      (*offset) += header.size;
>>> @@ -684,8 +710,8 @@ static int gen8_append_oa_reports(struct 
>>> i915_perf_stream *stream,
>>>       * all a power of two).
>>>       */
>>>      if (drm_WARN_ONCE(&uncore->i915->drm,
>>> -              head > OA_BUFFER_SIZE || head % report_size ||
>>> -              tail > OA_BUFFER_SIZE || tail % report_size,
>>> +              head > stream->oa_buffer.vma->size ||
>>> +              tail > stream->oa_buffer.vma->size,
>>>                "Inconsistent OA buffer pointers: head = %u, tail = 
>>> %u\n",
>>>                head, tail))
>>>          return -EIO;
>>> @@ -699,22 +725,6 @@ static int gen8_append_oa_reports(struct 
>>> i915_perf_stream *stream,
>>>          u32 ctx_id;
>>>          u32 reason;
>>> -        /*
>>> -         * All the report sizes factor neatly into the buffer
>>> -         * size so we never expect to see a report split
>>> -         * between the beginning and end of the buffer.
>>> -         *
>>> -         * Given the initial alignment check a misalignment
>>> -         * here would imply a driver bug that would result
>>> -         * in an overrun.
>>> -         */
>>> -        if (drm_WARN_ON(&uncore->i915->drm,
>>> -                (OA_BUFFER_SIZE - head) < report_size)) {
>>> -            drm_err(&uncore->i915->drm,
>>> -                "Spurious OA head ptr: non-integral report offset\n");
>>> -            break;
>>> -        }
>>> -
>>>          /*
>>>           * The reason field includes flags identifying what
>>>           * triggered this specific report (mostly timer
>>> @@ -4513,6 +4523,13 @@ static void oa_init_supported_formats(struct 
>>> i915_perf *perf)
>>>          oa_format_add(perf, I915_OA_FORMAT_C4_B8);
>>>          break;
>>> +    case INTEL_DG2:
>>> +        oa_format_add(perf, I915_OAR_FORMAT_A32u40_A4u32_B8_C8);
>>> +        oa_format_add(perf, I915_OA_FORMAT_A24u40_A14u32_B8_C8);
>>> +        oa_format_add(perf, I915_OAR_FORMAT_A36u64_B8_C8);
>>> +        oa_format_add(perf, I915_OA_FORMAT_A38u64_R2u64_B8_C8);
>>> +        break;
>>> +
>>>      default:
>>>          MISSING_CASE(platform);
>>>      }
>>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>>> index 520ad2691a99..d20d723925b5 100644
>>> --- a/include/uapi/drm/i915_drm.h
>>> +++ b/include/uapi/drm/i915_drm.h
>>> @@ -2650,6 +2650,12 @@ enum drm_i915_oa_format {
>>>      I915_OA_FORMAT_A12_B8_C8,
>>>      I915_OA_FORMAT_A32u40_A4u32_B8_C8,
>>> +    /* DG2 */
>>> +    I915_OAR_FORMAT_A32u40_A4u32_B8_C8,
>>> +    I915_OA_FORMAT_A24u40_A14u32_B8_C8,
>>> +    I915_OAR_FORMAT_A36u64_B8_C8,
>>> +    I915_OA_FORMAT_A38u64_R2u64_B8_C8,
>>> +
>>>      I915_OA_FORMAT_MAX        /* non-ABI */
>>>  };
>>
>>
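
The tail adjustment described in the commit message can be sketched as
follows. This is a minimal userspace illustration, assuming a
power-of-two buffer size and offsets taken relative to the start of the
buffer (the patch additionally adds gtt_offset back). oa_taken() is
assumed to mirror the driver's OA_TAKEN() macro, and
align_tail_to_report() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes between head and tail in a ring of power-of-two size buf_size. */
static uint32_t oa_taken(uint32_t tail, uint32_t head, uint32_t buf_size)
{
	return (tail - head) & (buf_size - 1);
}

/*
 * Hypothetical helper: round hw_tail down so that the distance from the
 * last known aligned tail is a whole number of reports. All offsets are
 * relative to the start of the buffer.
 */
static uint32_t align_tail_to_report(uint32_t hw_tail, uint32_t last_tail,
				     uint32_t report_size, uint32_t buf_size)
{
	uint32_t partial;

	assert(report_size > 0);
	assert(buf_size && (buf_size & (buf_size - 1)) == 0);

	/* Bytes of a report that has only partially landed. */
	partial = oa_taken(hw_tail, last_tail, buf_size) % report_size;

	/* Subtract the partial amount, wrapping within the ring. */
	return (hw_tail - partial) & (buf_size - 1);
}
```

Unlike the old `hw_tail &= ~(report_size - 1)` mask, the modulo works for
report sizes such as 384 or 448 bytes, but it depends on the previous
tail being report-aligned.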



* Re: [Intel-gfx] [PATCH 10/19] drm/i915/perf: Use gt-specific ggtt for OA and noa-wait buffers
  2022-09-06 19:56   ` Lionel Landwerlin
@ 2022-09-06 20:28     ` Umesh Nerlige Ramappa
  2022-09-06 20:31       ` Lionel Landwerlin
  0 siblings, 1 reply; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-06 20:28 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

On Tue, Sep 06, 2022 at 10:56:13PM +0300, Lionel Landwerlin wrote:
>On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote:
>>User passes uabi engine class and instance to the perf OA interface. Use
>>gt corresponding to the engine to pin the buffers to the right ggtt.
>>
>>Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>
>I didn't know there was a GGTT per engine.
>
>Do I understand this correctly?

No, GGTT is still per-gt. We just derive the gt from the engine class
and instance passed in (as in engine->gt).
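
As a toy illustration of that relationship (the toy_* types below are
hypothetical stand-ins, not the driver's structures), engines that belong
to the same gt resolve to the same GGTT:

```c
#include <assert.h>

/* Toy stand-ins for the driver structures; only the links matter here. */
struct toy_ggtt { int id; };
struct toy_gt { struct toy_ggtt *ggtt; };
struct toy_engine { struct toy_gt *gt; };

/* The GGTT is reached through the engine's gt; it is not per engine. */
static struct toy_ggtt *engine_to_ggtt(const struct toy_engine *engine)
{
	assert(engine && engine->gt);
	return engine->gt->ggtt;
}
```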

>
>
>Thanks,
>
>-Lionel
>
>
>>---
>>  drivers/gpu/drm/i915/i915_perf.c | 21 +++++++++++++++++++--
>>  1 file changed, 19 insertions(+), 2 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>>index 87b92d2946f4..f7621b45966c 100644
>>--- a/drivers/gpu/drm/i915/i915_perf.c
>>+++ b/drivers/gpu/drm/i915/i915_perf.c
>>@@ -1765,6 +1765,7 @@ static void gen12_init_oa_buffer(struct i915_perf_stream *stream)
>>  static int alloc_oa_buffer(struct i915_perf_stream *stream)
>>  {
>>  	struct drm_i915_private *i915 = stream->perf->i915;
>>+	struct intel_gt *gt = stream->engine->gt;
>>  	struct drm_i915_gem_object *bo;
>>  	struct i915_vma *vma;
>>  	int ret;
>>@@ -1784,11 +1785,22 @@ static int alloc_oa_buffer(struct i915_perf_stream *stream)
>>  	i915_gem_object_set_cache_coherency(bo, I915_CACHE_LLC);
>>  	/* PreHSW required 512K alignment, HSW requires 16M */
>>-	vma = i915_gem_object_ggtt_pin(bo, NULL, 0, SZ_16M, 0);
>>+	vma = i915_vma_instance(bo, &gt->ggtt->vm, NULL);
>>  	if (IS_ERR(vma)) {
>>  		ret = PTR_ERR(vma);
>>  		goto err_unref;
>>  	}
>>+
>>+	/*
>>+	 * PreHSW required 512K alignment.
>>+	 * HSW and onwards, align to requested size of OA buffer.
>>+	 */
>>+	ret = i915_vma_pin(vma, 0, SZ_16M, PIN_GLOBAL | PIN_HIGH);
>>+	if (ret) {
>>+		drm_err(&gt->i915->drm, "Failed to pin OA buffer %d\n", ret);
>>+		goto err_unref;
>>+	}
>>+
>>  	stream->oa_buffer.vma = vma;
>>  	stream->oa_buffer.vaddr =
>>@@ -1838,6 +1850,7 @@ static u32 *save_restore_register(struct i915_perf_stream *stream, u32 *cs,
>>  static int alloc_noa_wait(struct i915_perf_stream *stream)
>>  {
>>  	struct drm_i915_private *i915 = stream->perf->i915;
>>+	struct intel_gt *gt = stream->engine->gt;
>>  	struct drm_i915_gem_object *bo;
>>  	struct i915_vma *vma;
>>  	const u64 delay_ticks = 0xffffffffffffffff -
>>@@ -1878,12 +1891,16 @@ static int alloc_noa_wait(struct i915_perf_stream *stream)
>>  	 * multiple OA config BOs will have a jump to this address and it
>>  	 * needs to be fixed during the lifetime of the i915/perf stream.
>>  	 */
>>-	vma = i915_gem_object_ggtt_pin_ww(bo, &ww, NULL, 0, 0, PIN_HIGH);
>>+	vma = i915_vma_instance(bo, &gt->ggtt->vm, NULL);
>>  	if (IS_ERR(vma)) {
>>  		ret = PTR_ERR(vma);
>>  		goto out_ww;
>>  	}
>>+	ret = i915_vma_pin_ww(vma, &ww, 0, 0, PIN_GLOBAL | PIN_HIGH);
>>+	if (ret)
>>+		goto out_ww;
>>+
>>  	batch = cs = i915_gem_object_pin_map(bo, I915_MAP_WB);
>>  	if (IS_ERR(batch)) {
>>  		ret = PTR_ERR(batch);
>
>


* Re: [Intel-gfx] [PATCH 10/19] drm/i915/perf: Use gt-specific ggtt for OA and noa-wait buffers
  2022-09-06 20:28     ` Umesh Nerlige Ramappa
@ 2022-09-06 20:31       ` Lionel Landwerlin
  0 siblings, 0 replies; 84+ messages in thread
From: Lionel Landwerlin @ 2022-09-06 20:31 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On 06/09/2022 23:28, Umesh Nerlige Ramappa wrote:
> On Tue, Sep 06, 2022 at 10:56:13PM +0300, Lionel Landwerlin wrote:
>> On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote:
>>> User passes uabi engine class and instance to the perf OA interface. 
>>> Use
>>> gt corresponding to the engine to pin the buffers to the right ggtt.
>>>
>>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>>
>> I didn't know there was a GGTT per engine.
>>
>> Do I understand this correctly?
>
> No, GGTT is still per-gt. We just derive the gt from engine class 
> instance passed (as in engine->gt).


Oh thanks I understand now.


Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>


>
>>
>>
>> Thanks,
>>
>> -Lionel
>>
>>
>>> ---
>>>  drivers/gpu/drm/i915/i915_perf.c | 21 +++++++++++++++++++--
>>>  1 file changed, 19 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
>>> b/drivers/gpu/drm/i915/i915_perf.c
>>> index 87b92d2946f4..f7621b45966c 100644
>>> --- a/drivers/gpu/drm/i915/i915_perf.c
>>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>>> @@ -1765,6 +1765,7 @@ static void gen12_init_oa_buffer(struct 
>>> i915_perf_stream *stream)
>>>  static int alloc_oa_buffer(struct i915_perf_stream *stream)
>>>  {
>>>      struct drm_i915_private *i915 = stream->perf->i915;
>>> +    struct intel_gt *gt = stream->engine->gt;
>>>      struct drm_i915_gem_object *bo;
>>>      struct i915_vma *vma;
>>>      int ret;
>>> @@ -1784,11 +1785,22 @@ static int alloc_oa_buffer(struct 
>>> i915_perf_stream *stream)
>>>      i915_gem_object_set_cache_coherency(bo, I915_CACHE_LLC);
>>>      /* PreHSW required 512K alignment, HSW requires 16M */
>>> -    vma = i915_gem_object_ggtt_pin(bo, NULL, 0, SZ_16M, 0);
>>> +    vma = i915_vma_instance(bo, &gt->ggtt->vm, NULL);
>>>      if (IS_ERR(vma)) {
>>>          ret = PTR_ERR(vma);
>>>          goto err_unref;
>>>      }
>>> +
>>> +    /*
>>> +     * PreHSW required 512K alignment.
>>> +     * HSW and onwards, align to requested size of OA buffer.
>>> +     */
>>> +    ret = i915_vma_pin(vma, 0, SZ_16M, PIN_GLOBAL | PIN_HIGH);
>>> +    if (ret) {
>>> +        drm_err(&gt->i915->drm, "Failed to pin OA buffer %d\n", ret);
>>> +        goto err_unref;
>>> +    }
>>> +
>>>      stream->oa_buffer.vma = vma;
>>>      stream->oa_buffer.vaddr =
>>> @@ -1838,6 +1850,7 @@ static u32 *save_restore_register(struct 
>>> i915_perf_stream *stream, u32 *cs,
>>>  static int alloc_noa_wait(struct i915_perf_stream *stream)
>>>  {
>>>      struct drm_i915_private *i915 = stream->perf->i915;
>>> +    struct intel_gt *gt = stream->engine->gt;
>>>      struct drm_i915_gem_object *bo;
>>>      struct i915_vma *vma;
>>>      const u64 delay_ticks = 0xffffffffffffffff -
>>> @@ -1878,12 +1891,16 @@ static int alloc_noa_wait(struct 
>>> i915_perf_stream *stream)
>>>       * multiple OA config BOs will have a jump to this address and it
>>>       * needs to be fixed during the lifetime of the i915/perf stream.
>>>       */
>>> -    vma = i915_gem_object_ggtt_pin_ww(bo, &ww, NULL, 0, 0, PIN_HIGH);
>>> +    vma = i915_vma_instance(bo, &gt->ggtt->vm, NULL);
>>>      if (IS_ERR(vma)) {
>>>          ret = PTR_ERR(vma);
>>>          goto out_ww;
>>>      }
>>> +    ret = i915_vma_pin_ww(vma, &ww, 0, 0, PIN_GLOBAL | PIN_HIGH);
>>> +    if (ret)
>>> +        goto out_ww;
>>> +
>>>      batch = cs = i915_gem_object_pin_map(bo, I915_MAP_WB);
>>>      if (IS_ERR(batch)) {
>>>          ret = PTR_ERR(batch);
>>
>>



* Re: [Intel-gfx] [PATCH 04/19] drm/i915/perf: Determine gen12 oa ctx offset at runtime
  2022-09-06 19:48   ` Lionel Landwerlin
@ 2022-09-06 20:35     ` Umesh Nerlige Ramappa
  2022-09-08 18:32       ` Lionel Landwerlin
  0 siblings, 1 reply; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-06 20:35 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

On Tue, Sep 06, 2022 at 10:48:50PM +0300, Lionel Landwerlin wrote:
>On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote:
>>Some SKUs of same gen12 platform may have different oactxctrl
>>offsets. For gen12, determine oactxctrl offsets at runtime.
>>
>>Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>>---
>>  drivers/gpu/drm/i915/i915_perf.c         | 149 ++++++++++++++++++-----
>>  drivers/gpu/drm/i915/i915_perf_oa_regs.h |   2 +-
>>  2 files changed, 120 insertions(+), 31 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>>index 3526693d64fa..efa7eda83edd 100644
>>--- a/drivers/gpu/drm/i915/i915_perf.c
>>+++ b/drivers/gpu/drm/i915/i915_perf.c
>>@@ -1363,6 +1363,67 @@ static int gen12_get_render_context_id(struct i915_perf_stream *stream)
>>  	return 0;
>>  }
>>+#define MI_OPCODE(x) (((x) >> 23) & 0x3f)
>>+#define IS_MI_LRI_CMD(x) (MI_OPCODE(x) == MI_OPCODE(MI_INSTR(0x22, 0)))
>>+#define MI_LRI_LEN(x) (((x) & 0xff) + 1)
>
>
>Maybe you want to put this in intel_gpu_commands.h
>
>
>>+#define __valid_oactxctrl_offset(x) ((x) && (x) != U32_MAX)
>>+static bool __find_reg_in_lri(u32 *state, u32 reg, u32 *offset)
>>+{
>>+	u32 idx = *offset;
>>+	u32 len = MI_LRI_LEN(state[idx]) + idx;
>>+
>>+	idx++;
>>+	for (; idx < len; idx += 2)
>>+		if (state[idx] == reg)
>>+			break;
>>+
>>+	*offset = idx;
>>+	return state[idx] == reg;
>>+}
>>+
>>+static u32 __context_image_offset(struct intel_context *ce, u32 reg)
>>+{
>>+	u32 offset, len = (ce->engine->context_size - PAGE_SIZE) / 4;
>>+	u32 *state = ce->lrc_reg_state;
>>+
>>+	for (offset = 0; offset < len; ) {
>>+		if (IS_MI_LRI_CMD(state[offset])) {
>
>I'm a bit concerned you might find other matches with this.
>
>Because let's say you run into a 3DSTATE_SUBSLICE_HASH_TABLE 
>instruction, you'll iterate the instruction dword by dword because you 
>don't know how to read its length and skip to the next one.
>
>Now some of the fields can be programmed from userspace to look like 
>an MI_LRI header, so you start to read data in the wrong way.
>
>
>Unfortunately I don't have a better solution. My only ask is that you
>make __find_reg_in_lri() take the context image size as a parameter so
>it NEVER reads past the end of the context image.
>
>
>To limit the risk you should run this function only once at driver
>initialization and store the found offset.
>

Hmm, I didn't know that there may be non-LRI commands in the context
image, or that the user could somehow add to the context image. Does
using the context image size alone address these issues?

Even after including the size in the logic, is there any reason you
think it would be much safer to do this from init? Is it because the
context image has not been touched by the user yet?

Thanks,
Umesh
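
To make the size question concrete, here is a userspace sketch of a
bounded variant of the scan. The MI_* macros follow the ones in the
patch; find_reg_in_lri_bounded() and context_image_offset() are
hypothetical names, and `len` is assumed to be the image size in dwords.
It only guarantees that no read goes past the image; it does not by
itself rule out a false match on data that merely looks like an LRI
header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MI_OPCODE(x)     (((x) >> 23) & 0x3f)
#define MI_LRI_OPCODE    0x22u
#define IS_MI_LRI_CMD(x) (MI_OPCODE(x) == MI_LRI_OPCODE)
#define MI_LRI_LEN(x)    (((x) & 0xff) + 1)

/*
 * Scan one LRI packet starting at state[*offset] for `reg`, never
 * reading at or beyond state[len]. On return *offset is either the
 * index of the matching register dword or the index to resume from.
 */
static bool find_reg_in_lri_bounded(const uint32_t *state, size_t len,
				    uint32_t reg, size_t *offset)
{
	size_t idx = *offset;
	size_t end;

	assert(idx < len);
	end = idx + MI_LRI_LEN(state[idx]);
	if (end > len)
		end = len;	/* clamp a bogus length to the image */

	/* Register dwords sit at every other index after the header. */
	for (idx++; idx < end; idx += 2) {
		if (state[idx] == reg) {
			*offset = idx;
			return true;
		}
	}

	*offset = idx;
	return false;
}

/* Returns the dword index of `reg`, or SIZE_MAX if it is not found. */
static size_t context_image_offset(const uint32_t *state, size_t len,
				   uint32_t reg)
{
	size_t offset = 0;

	while (offset < len) {
		if (IS_MI_LRI_CMD(state[offset])) {
			if (find_reg_in_lri_bounded(state, len, reg, &offset))
				return offset;
		} else {
			offset++;
		}
	}

	return SIZE_MAX;
}
```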

>
>Thanks,
>
>
>-Lionel
>
>
>>+			if (__find_reg_in_lri(state, reg, &offset))
>>+				break;
>>+		} else {
>>+			offset++;
>>+		}
>>+	}
>>+
>>+	return offset < len ? offset : U32_MAX;
>>+}
>>+
>>+static int __set_oa_ctx_ctrl_offset(struct intel_context *ce)
>>+{
>>+	i915_reg_t reg = GEN12_OACTXCONTROL(ce->engine->mmio_base);
>>+	struct i915_perf *perf = &ce->engine->i915->perf;
>>+	u32 saved_offset = perf->ctx_oactxctrl_offset;
>>+	u32 offset;
>>+
>>+	/* Do this only once. Failure is stored as offset of U32_MAX */
>>+	if (saved_offset)
>>+		return 0;
>>+
>>+	offset = __context_image_offset(ce, i915_mmio_reg_offset(reg));
>>+	perf->ctx_oactxctrl_offset = offset;
>>+
>>+	drm_dbg(&ce->engine->i915->drm,
>>+		"%s oa ctx control at 0x%08x dword offset\n",
>>+		ce->engine->name, offset);
>>+
>>+	return __valid_oactxctrl_offset(offset) ? 0 : -ENODEV;
>>+}
>>+
>>+static bool engine_supports_mi_query(struct intel_engine_cs *engine)
>>+{
>>+	return engine->class == RENDER_CLASS;
>>+}
>>+
>>  /**
>>   * oa_get_render_ctx_id - determine and hold ctx hw id
>>   * @stream: An i915-perf stream opened for OA metrics
>>@@ -1382,6 +1443,17 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
>>  	if (IS_ERR(ce))
>>  		return PTR_ERR(ce);
>>+	if (engine_supports_mi_query(stream->engine)) {
>>+		ret = __set_oa_ctx_ctrl_offset(ce);
>>+		if (ret && !(stream->sample_flags & SAMPLE_OA_REPORT)) {
>>+			intel_context_unpin(ce);
>>+			drm_err(&stream->perf->i915->drm,
>>+				"Enabling perf query failed for %s\n",
>>+				stream->engine->name);
>>+			return ret;
>>+		}
>>+	}
>>+
>>  	switch (GRAPHICS_VER(ce->engine->i915)) {
>>  	case 7: {
>>  		/*
>>@@ -2412,10 +2484,11 @@ static int gen12_configure_oar_context(struct i915_perf_stream *stream,
>>  	int err;
>>  	struct intel_context *ce = stream->pinned_ctx;
>>  	u32 format = stream->oa_buffer.format;
>>+	u32 offset = stream->perf->ctx_oactxctrl_offset;
>>  	struct flex regs_context[] = {
>>  		{
>>  			GEN8_OACTXCONTROL,
>>-			stream->perf->ctx_oactxctrl_offset + 1,
>>+			offset + 1,
>>  			active ? GEN8_OA_COUNTER_RESUME : 0,
>>  		},
>>  	};
>>@@ -2440,15 +2513,18 @@ static int gen12_configure_oar_context(struct i915_perf_stream *stream,
>>  		},
>>  	};
>>-	/* Modify the context image of pinned context with regs_context*/
>>-	err = intel_context_lock_pinned(ce);
>>-	if (err)
>>-		return err;
>>+	/* Modify the context image of pinned context with regs_context */
>>+	if (__valid_oactxctrl_offset(offset)) {
>>+		err = intel_context_lock_pinned(ce);
>>+		if (err)
>>+			return err;
>>-	err = gen8_modify_context(ce, regs_context, ARRAY_SIZE(regs_context));
>>-	intel_context_unlock_pinned(ce);
>>-	if (err)
>>-		return err;
>>+		err = gen8_modify_context(ce, regs_context,
>>+					  ARRAY_SIZE(regs_context));
>>+		intel_context_unlock_pinned(ce);
>>+		if (err)
>>+			return err;
>>+	}
>>  	/* Apply regs_lri using LRI with pinned context */
>>  	return gen8_modify_self(ce, regs_lri, ARRAY_SIZE(regs_lri), active);
>>@@ -2570,6 +2646,7 @@ lrc_configure_all_contexts(struct i915_perf_stream *stream,
>>  			   const struct i915_oa_config *oa_config,
>>  			   struct i915_active *active)
>>  {
>>+	u32 ctx_oactxctrl = stream->perf->ctx_oactxctrl_offset;
>>  	/* The MMIO offsets for Flex EU registers aren't contiguous */
>>  	const u32 ctx_flexeu0 = stream->perf->ctx_flexeu0_offset;
>>  #define ctx_flexeuN(N) (ctx_flexeu0 + 2 * (N) + 1)
>>@@ -2580,7 +2657,7 @@ lrc_configure_all_contexts(struct i915_perf_stream *stream,
>>  		},
>>  		{
>>  			GEN8_OACTXCONTROL,
>>-			stream->perf->ctx_oactxctrl_offset + 1,
>>+			ctx_oactxctrl + 1,
>>  		},
>>  		{ EU_PERF_CNTL0, ctx_flexeuN(0) },
>>  		{ EU_PERF_CNTL1, ctx_flexeuN(1) },
>>@@ -4551,6 +4628,37 @@ static void oa_init_supported_formats(struct i915_perf *perf)
>>  	}
>>  }
>>+static void i915_perf_init_info(struct drm_i915_private *i915)
>>+{
>>+	struct i915_perf *perf = &i915->perf;
>>+
>>+	switch (GRAPHICS_VER(i915)) {
>>+	case 8:
>>+		perf->ctx_oactxctrl_offset = 0x120;
>>+		perf->ctx_flexeu0_offset = 0x2ce;
>>+		perf->gen8_valid_ctx_bit = BIT(25);
>>+		break;
>>+	case 9:
>>+		perf->ctx_oactxctrl_offset = 0x128;
>>+		perf->ctx_flexeu0_offset = 0x3de;
>>+		perf->gen8_valid_ctx_bit = BIT(16);
>>+		break;
>>+	case 11:
>>+		perf->ctx_oactxctrl_offset = 0x124;
>>+		perf->ctx_flexeu0_offset = 0x78e;
>>+		perf->gen8_valid_ctx_bit = BIT(16);
>>+		break;
>>+	case 12:
>>+		/*
>>+		 * Calculate offset at runtime in oa_pin_context for gen12 and
>>+		 * cache the value in perf->ctx_oactxctrl_offset.
>>+		 */
>>+		break;
>>+	default:
>>+		MISSING_CASE(GRAPHICS_VER(i915));
>>+	}
>>+}
>>+
>>  /**
>>   * i915_perf_init - initialize i915-perf state on module bind
>>   * @i915: i915 device instance
>>@@ -4589,6 +4697,7 @@ void i915_perf_init(struct drm_i915_private *i915)
>>  		 * execlist mode by default.
>>  		 */
>>  		perf->ops.read = gen8_oa_read;
>>+		i915_perf_init_info(i915);
>>  		if (IS_GRAPHICS_VER(i915, 8, 9)) {
>>  			perf->ops.is_valid_b_counter_reg =
>>@@ -4608,18 +4717,6 @@ void i915_perf_init(struct drm_i915_private *i915)
>>  			perf->ops.enable_metric_set = gen8_enable_metric_set;
>>  			perf->ops.disable_metric_set = gen8_disable_metric_set;
>>  			perf->ops.oa_hw_tail_read = gen8_oa_hw_tail_read;
>>-
>>-			if (GRAPHICS_VER(i915) == 8) {
>>-				perf->ctx_oactxctrl_offset = 0x120;
>>-				perf->ctx_flexeu0_offset = 0x2ce;
>>-
>>-				perf->gen8_valid_ctx_bit = BIT(25);
>>-			} else {
>>-				perf->ctx_oactxctrl_offset = 0x128;
>>-				perf->ctx_flexeu0_offset = 0x3de;
>>-
>>-				perf->gen8_valid_ctx_bit = BIT(16);
>>-			}
>>  		} else if (GRAPHICS_VER(i915) == 11) {
>>  			perf->ops.is_valid_b_counter_reg =
>>  				gen7_is_valid_b_counter_addr;
>>@@ -4633,11 +4730,6 @@ void i915_perf_init(struct drm_i915_private *i915)
>>  			perf->ops.enable_metric_set = gen8_enable_metric_set;
>>  			perf->ops.disable_metric_set = gen11_disable_metric_set;
>>  			perf->ops.oa_hw_tail_read = gen8_oa_hw_tail_read;
>>-
>>-			perf->ctx_oactxctrl_offset = 0x124;
>>-			perf->ctx_flexeu0_offset = 0x78e;
>>-
>>-			perf->gen8_valid_ctx_bit = BIT(16);
>>  		} else if (GRAPHICS_VER(i915) == 12) {
>>  			perf->ops.is_valid_b_counter_reg =
>>  				gen12_is_valid_b_counter_addr;
>>@@ -4651,9 +4743,6 @@ void i915_perf_init(struct drm_i915_private *i915)
>>  			perf->ops.enable_metric_set = gen12_enable_metric_set;
>>  			perf->ops.disable_metric_set = gen12_disable_metric_set;
>>  			perf->ops.oa_hw_tail_read = gen12_oa_hw_tail_read;
>>-
>>-			perf->ctx_flexeu0_offset = 0;
>>-			perf->ctx_oactxctrl_offset = 0x144;
>>  		}
>>  	}
>>diff --git a/drivers/gpu/drm/i915/i915_perf_oa_regs.h b/drivers/gpu/drm/i915/i915_perf_oa_regs.h
>>index f31c9f13a9fc..0ef3562ff4aa 100644
>>--- a/drivers/gpu/drm/i915/i915_perf_oa_regs.h
>>+++ b/drivers/gpu/drm/i915/i915_perf_oa_regs.h
>>@@ -97,7 +97,7 @@
>>  #define  GEN12_OAR_OACONTROL_COUNTER_FORMAT_SHIFT 1
>>  #define  GEN12_OAR_OACONTROL_COUNTER_ENABLE       (1 << 0)
>>-#define GEN12_OACTXCONTROL _MMIO(0x2360)
>>+#define GEN12_OACTXCONTROL(base) _MMIO((base) + 0x360)
>>  #define GEN12_OAR_OASTATUS _MMIO(0x2968)
>>  /* Gen12 OAG unit */
>
>


* Re: [Intel-gfx] [PATCH 04/19] drm/i915/perf: Determine gen12 oa ctx offset at runtime
  2022-09-06 20:35     ` Umesh Nerlige Ramappa
@ 2022-09-08 18:32       ` Lionel Landwerlin
  2022-09-08 23:04         ` Umesh Nerlige Ramappa
  0 siblings, 1 reply; 84+ messages in thread
From: Lionel Landwerlin @ 2022-09-08 18:32 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On 06/09/2022 23:35, Umesh Nerlige Ramappa wrote:
> On Tue, Sep 06, 2022 at 10:48:50PM +0300, Lionel Landwerlin wrote:
>> On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote:
>>> Some SKUs of same gen12 platform may have different oactxctrl
>>> offsets. For gen12, determine oactxctrl offsets at runtime.
>>>
>>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>>> ---
>>>  drivers/gpu/drm/i915/i915_perf.c         | 149 ++++++++++++++++++-----
>>>  drivers/gpu/drm/i915/i915_perf_oa_regs.h |   2 +-
>>>  2 files changed, 120 insertions(+), 31 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
>>> b/drivers/gpu/drm/i915/i915_perf.c
>>> index 3526693d64fa..efa7eda83edd 100644
>>> --- a/drivers/gpu/drm/i915/i915_perf.c
>>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>>> @@ -1363,6 +1363,67 @@ static int gen12_get_render_context_id(struct 
>>> i915_perf_stream *stream)
>>>      return 0;
>>>  }
>>> +#define MI_OPCODE(x) (((x) >> 23) & 0x3f)
>>> +#define IS_MI_LRI_CMD(x) (MI_OPCODE(x) == MI_OPCODE(MI_INSTR(0x22, 
>>> 0)))
>>> +#define MI_LRI_LEN(x) (((x) & 0xff) + 1)
>>
>>
>> Maybe you want to put this in intel_gpu_commands.h
>>
>>
>>> +#define __valid_oactxctrl_offset(x) ((x) && (x) != U32_MAX)
>>> +static bool __find_reg_in_lri(u32 *state, u32 reg, u32 *offset)
>>> +{
>>> +    u32 idx = *offset;
>>> +    u32 len = MI_LRI_LEN(state[idx]) + idx;
>>> +
>>> +    idx++;
>>> +    for (; idx < len; idx += 2)
>>> +        if (state[idx] == reg)
>>> +            break;
>>> +
>>> +    *offset = idx;
>>> +    return state[idx] == reg;
>>> +}
>>> +
>>> +static u32 __context_image_offset(struct intel_context *ce, u32 reg)
>>> +{
>>> +    u32 offset, len = (ce->engine->context_size - PAGE_SIZE) / 4;
>>> +    u32 *state = ce->lrc_reg_state;
>>> +
>>> +    for (offset = 0; offset < len; ) {
>>> +        if (IS_MI_LRI_CMD(state[offset])) {
>>
>> I'm a bit concerned you might find other matches with this.
>>
>> Because let's say you run into a 3DSTATE_SUBSLICE_HASH_TABLE 
>> instruction, you'll iterate the instruction dword by dword because 
>> you don't know how to read its length and skip to the next one.
>>
>> Now some of the fields can be programmed from userspace to look like 
>> an MI_LRI header, so you start to read data in the wrong way.
>>
>>
>> Unfortunately I don't have a better solution. My only ask is that you 
>> make __find_reg_in_lri() take the context image size in parameter so 
>> it NEVER goes over the the context image.
>>
>>
>> To limit the risk you should run this function only one at driver 
>> initialization and store the found offset.
>>
>
> Hmm, didn't know that there may be non-LRI commands in the context 
> image or user could add to the context image somehow. Does using the 
> context image size alone address these issues?
>
> Even after including the size in the logic, any reason you think we 
> would be much more safer to do this from init? Is it because context 
> image is not touched by user yet?


The format of the image (commands in there and their offset) is fixed 
per HW generation.

Only the data in each of the commands will vary per context.

In the case of MI_LRI, the register offsets are the same for all
contexts, but the values programmed will vary per context.

So executing once should be enough to find the right offset, rather
than every time we open the i915-perf stream.


I think once you have the logic to make sure you never read outside the 
image it should be alright.
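
For illustration, here is a minimal userspace sketch of a bounds-checked LRI scan along these lines. It is not the driver code: the function names, the explicit count parameter (image size in dwords) and the clamping behaviour are assumptions made for the example. Only the MI_LRI header decode mirrors the macros in the quoted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Header decode, mirroring the macros in the quoted patch. */
#define MI_OPCODE(x)     (((x) >> 23) & 0x3f)
#define IS_MI_LRI_CMD(x) (MI_OPCODE(x) == 0x22)
#define MI_LRI_LEN(x)    (((x) & 0xff) + 1)

/*
 * Hypothetical bounded variant of __find_reg_in_lri(). 'count' is the
 * number of dwords in 'state'; no index >= count is ever read.
 * On a hit, *offset is the index of the register slot. On a miss,
 * *offset is moved past the command, clamped to 'count'.
 */
static bool find_reg_in_lri_bounded(const uint32_t *state, uint32_t count,
                                    uint32_t reg, uint32_t *offset)
{
    uint32_t idx = *offset;
    uint32_t remaining, payload, end;

    assert(state && offset && idx < count);

    remaining = count - idx - 1;          /* dwords left after the header */
    payload = MI_LRI_LEN(state[idx]);     /* dwords claimed by the header */
    if (payload > remaining)
        payload = remaining;              /* never trust the header blindly */
    end = idx + 1 + payload;

    /* Payload is (register, value) pairs; only look at register slots. */
    for (idx++; idx < end; idx += 2) {
        if (state[idx] == reg) {
            *offset = idx;
            return true;
        }
    }

    *offset = end;
    return false;
}

/* Walk the image; returns the dword index of 'reg' or UINT32_MAX. */
static uint32_t context_image_offset(const uint32_t *state, uint32_t count,
                                     uint32_t reg)
{
    uint32_t offset = 0;

    while (offset < count) {
        if (IS_MI_LRI_CMD(state[offset])) {
            if (find_reg_in_lri_bounded(state, count, reg, &offset))
                return offset;
        } else {
            offset++;
        }
    }

    return UINT32_MAX;
}
```

Note that bounding the walk only guarantees that nothing outside the image is read. It does not rule out a false match on data that merely looks like an MI_LRI header.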


-Lionel


>
> Thanks,
> Umesh
>
>>
>> Thanks,
>>
>>
>> -Lionel
>>
>>
>>> +            if (__find_reg_in_lri(state, reg, &offset))
>>> +                break;
>>> +        } else {
>>> +            offset++;
>>> +        }
>>> +    }
>>> +
>>> +    return offset < len ? offset : U32_MAX;
>>> +}
>>> +
>>> +static int __set_oa_ctx_ctrl_offset(struct intel_context *ce)
>>> +{
>>> +    i915_reg_t reg = GEN12_OACTXCONTROL(ce->engine->mmio_base);
>>> +    struct i915_perf *perf = &ce->engine->i915->perf;
>>> +    u32 saved_offset = perf->ctx_oactxctrl_offset;
>>> +    u32 offset;
>>> +
>>> +    /* Do this only once. Failure is stored as offset of U32_MAX */
>>> +    if (saved_offset)
>>> +        return 0;
>>> +
>>> +    offset = __context_image_offset(ce, i915_mmio_reg_offset(reg));
>>> +    perf->ctx_oactxctrl_offset = offset;
>>> +
>>> +    drm_dbg(&ce->engine->i915->drm,
>>> +        "%s oa ctx control at 0x%08x dword offset\n",
>>> +        ce->engine->name, offset);
>>> +
>>> +    return __valid_oactxctrl_offset(offset) ? 0 : -ENODEV;
>>> +}
>>> +
>>> +static bool engine_supports_mi_query(struct intel_engine_cs *engine)
>>> +{
>>> +    return engine->class == RENDER_CLASS;
>>> +}
>>> +
>>>  /**
>>>   * oa_get_render_ctx_id - determine and hold ctx hw id
>>>   * @stream: An i915-perf stream opened for OA metrics
>>> @@ -1382,6 +1443,17 @@ static int oa_get_render_ctx_id(struct 
>>> i915_perf_stream *stream)
>>>      if (IS_ERR(ce))
>>>          return PTR_ERR(ce);
>>> +    if (engine_supports_mi_query(stream->engine)) {
>>> +        ret = __set_oa_ctx_ctrl_offset(ce);
>>> +        if (ret && !(stream->sample_flags & SAMPLE_OA_REPORT)) {
>>> +            intel_context_unpin(ce);
>>> +            drm_err(&stream->perf->i915->drm,
>>> +                "Enabling perf query failed for %s\n",
>>> +                stream->engine->name);
>>> +            return ret;
>>> +        }
>>> +    }
>>> +
>>>      switch (GRAPHICS_VER(ce->engine->i915)) {
>>>      case 7: {
>>>          /*
>>> @@ -2412,10 +2484,11 @@ static int 
>>> gen12_configure_oar_context(struct i915_perf_stream *stream,
>>>      int err;
>>>      struct intel_context *ce = stream->pinned_ctx;
>>>      u32 format = stream->oa_buffer.format;
>>> +    u32 offset = stream->perf->ctx_oactxctrl_offset;
>>>      struct flex regs_context[] = {
>>>          {
>>>              GEN8_OACTXCONTROL,
>>> -            stream->perf->ctx_oactxctrl_offset + 1,
>>> +            offset + 1,
>>>              active ? GEN8_OA_COUNTER_RESUME : 0,
>>>          },
>>>      };
>>> @@ -2440,15 +2513,18 @@ static int 
>>> gen12_configure_oar_context(struct i915_perf_stream *stream,
>>>          },
>>>      };
>>> -    /* Modify the context image of pinned context with regs_context*/
>>> -    err = intel_context_lock_pinned(ce);
>>> -    if (err)
>>> -        return err;
>>> +    /* Modify the context image of pinned context with regs_context */
>>> +    if (__valid_oactxctrl_offset(offset)) {
>>> +        err = intel_context_lock_pinned(ce);
>>> +        if (err)
>>> +            return err;
>>> -    err = gen8_modify_context(ce, regs_context, 
>>> ARRAY_SIZE(regs_context));
>>> -    intel_context_unlock_pinned(ce);
>>> -    if (err)
>>> -        return err;
>>> +        err = gen8_modify_context(ce, regs_context,
>>> +                      ARRAY_SIZE(regs_context));
>>> +        intel_context_unlock_pinned(ce);
>>> +        if (err)
>>> +            return err;
>>> +    }
>>>      /* Apply regs_lri using LRI with pinned context */
>>>      return gen8_modify_self(ce, regs_lri, ARRAY_SIZE(regs_lri), 
>>> active);
>>> @@ -2570,6 +2646,7 @@ lrc_configure_all_contexts(struct 
>>> i915_perf_stream *stream,
>>>                 const struct i915_oa_config *oa_config,
>>>                 struct i915_active *active)
>>>  {
>>> +    u32 ctx_oactxctrl = stream->perf->ctx_oactxctrl_offset;
>>>      /* The MMIO offsets for Flex EU registers aren't contiguous */
>>>      const u32 ctx_flexeu0 = stream->perf->ctx_flexeu0_offset;
>>>  #define ctx_flexeuN(N) (ctx_flexeu0 + 2 * (N) + 1)
>>> @@ -2580,7 +2657,7 @@ lrc_configure_all_contexts(struct 
>>> i915_perf_stream *stream,
>>>          },
>>>          {
>>>              GEN8_OACTXCONTROL,
>>> -            stream->perf->ctx_oactxctrl_offset + 1,
>>> +            ctx_oactxctrl + 1,
>>>          },
>>>          { EU_PERF_CNTL0, ctx_flexeuN(0) },
>>>          { EU_PERF_CNTL1, ctx_flexeuN(1) },
>>> @@ -4551,6 +4628,37 @@ static void oa_init_supported_formats(struct 
>>> i915_perf *perf)
>>>      }
>>>  }
>>> +static void i915_perf_init_info(struct drm_i915_private *i915)
>>> +{
>>> +    struct i915_perf *perf = &i915->perf;
>>> +
>>> +    switch (GRAPHICS_VER(i915)) {
>>> +    case 8:
>>> +        perf->ctx_oactxctrl_offset = 0x120;
>>> +        perf->ctx_flexeu0_offset = 0x2ce;
>>> +        perf->gen8_valid_ctx_bit = BIT(25);
>>> +        break;
>>> +    case 9:
>>> +        perf->ctx_oactxctrl_offset = 0x128;
>>> +        perf->ctx_flexeu0_offset = 0x3de;
>>> +        perf->gen8_valid_ctx_bit = BIT(16);
>>> +        break;
>>> +    case 11:
>>> +        perf->ctx_oactxctrl_offset = 0x124;
>>> +        perf->ctx_flexeu0_offset = 0x78e;
>>> +        perf->gen8_valid_ctx_bit = BIT(16);
>>> +        break;
>>> +    case 12:
>>> +        /*
>>> +         * Calculate offset at runtime in oa_pin_context for gen12 and
>>> +         * cache the value in perf->ctx_oactxctrl_offset.
>>> +         */
>>> +        break;
>>> +    default:
>>> +        MISSING_CASE(GRAPHICS_VER(i915));
>>> +    }
>>> +}
>>> +
>>>  /**
>>>   * i915_perf_init - initialize i915-perf state on module bind
>>>   * @i915: i915 device instance
>>> @@ -4589,6 +4697,7 @@ void i915_perf_init(struct drm_i915_private 
>>> *i915)
>>>           * execlist mode by default.
>>>           */
>>>          perf->ops.read = gen8_oa_read;
>>> +        i915_perf_init_info(i915);
>>>          if (IS_GRAPHICS_VER(i915, 8, 9)) {
>>>              perf->ops.is_valid_b_counter_reg =
>>> @@ -4608,18 +4717,6 @@ void i915_perf_init(struct drm_i915_private 
>>> *i915)
>>>              perf->ops.enable_metric_set = gen8_enable_metric_set;
>>>              perf->ops.disable_metric_set = gen8_disable_metric_set;
>>>              perf->ops.oa_hw_tail_read = gen8_oa_hw_tail_read;
>>> -
>>> -            if (GRAPHICS_VER(i915) == 8) {
>>> -                perf->ctx_oactxctrl_offset = 0x120;
>>> -                perf->ctx_flexeu0_offset = 0x2ce;
>>> -
>>> -                perf->gen8_valid_ctx_bit = BIT(25);
>>> -            } else {
>>> -                perf->ctx_oactxctrl_offset = 0x128;
>>> -                perf->ctx_flexeu0_offset = 0x3de;
>>> -
>>> -                perf->gen8_valid_ctx_bit = BIT(16);
>>> -            }
>>>          } else if (GRAPHICS_VER(i915) == 11) {
>>>              perf->ops.is_valid_b_counter_reg =
>>>                  gen7_is_valid_b_counter_addr;
>>> @@ -4633,11 +4730,6 @@ void i915_perf_init(struct drm_i915_private 
>>> *i915)
>>>              perf->ops.enable_metric_set = gen8_enable_metric_set;
>>>              perf->ops.disable_metric_set = gen11_disable_metric_set;
>>>              perf->ops.oa_hw_tail_read = gen8_oa_hw_tail_read;
>>> -
>>> -            perf->ctx_oactxctrl_offset = 0x124;
>>> -            perf->ctx_flexeu0_offset = 0x78e;
>>> -
>>> -            perf->gen8_valid_ctx_bit = BIT(16);
>>>          } else if (GRAPHICS_VER(i915) == 12) {
>>>              perf->ops.is_valid_b_counter_reg =
>>>                  gen12_is_valid_b_counter_addr;
>>> @@ -4651,9 +4743,6 @@ void i915_perf_init(struct drm_i915_private 
>>> *i915)
>>>              perf->ops.enable_metric_set = gen12_enable_metric_set;
>>>              perf->ops.disable_metric_set = gen12_disable_metric_set;
>>>              perf->ops.oa_hw_tail_read = gen12_oa_hw_tail_read;
>>> -
>>> -            perf->ctx_flexeu0_offset = 0;
>>> -            perf->ctx_oactxctrl_offset = 0x144;
>>>          }
>>>      }
>>> diff --git a/drivers/gpu/drm/i915/i915_perf_oa_regs.h 
>>> b/drivers/gpu/drm/i915/i915_perf_oa_regs.h
>>> index f31c9f13a9fc..0ef3562ff4aa 100644
>>> --- a/drivers/gpu/drm/i915/i915_perf_oa_regs.h
>>> +++ b/drivers/gpu/drm/i915/i915_perf_oa_regs.h
>>> @@ -97,7 +97,7 @@
>>>  #define  GEN12_OAR_OACONTROL_COUNTER_FORMAT_SHIFT 1
>>>  #define  GEN12_OAR_OACONTROL_COUNTER_ENABLE       (1 << 0)
>>> -#define GEN12_OACTXCONTROL _MMIO(0x2360)
>>> +#define GEN12_OACTXCONTROL(base) _MMIO((base) + 0x360)
>>>  #define GEN12_OAR_OASTATUS _MMIO(0x2968)
>>>  /* Gen12 OAG unit */
>>
>>



* Re: [Intel-gfx] [PATCH 04/19] drm/i915/perf: Determine gen12 oa ctx offset at runtime
  2022-09-08 18:32       ` Lionel Landwerlin
@ 2022-09-08 23:04         ` Umesh Nerlige Ramappa
  0 siblings, 0 replies; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-08 23:04 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

On Thu, Sep 08, 2022 at 09:32:12PM +0300, Lionel Landwerlin wrote:
>On 06/09/2022 23:35, Umesh Nerlige Ramappa wrote:
>>On Tue, Sep 06, 2022 at 10:48:50PM +0300, Lionel Landwerlin wrote:
>>>On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote:
>>>>Some SKUs of same gen12 platform may have different oactxctrl
>>>>offsets. For gen12, determine oactxctrl offsets at runtime.
>>>>
>>>>Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>>>>---
>>>> drivers/gpu/drm/i915/i915_perf.c         | 149 ++++++++++++++++++-----
>>>> drivers/gpu/drm/i915/i915_perf_oa_regs.h |   2 +-
>>>> 2 files changed, 120 insertions(+), 31 deletions(-)
>>>>
>>>>diff --git a/drivers/gpu/drm/i915/i915_perf.c 
>>>>b/drivers/gpu/drm/i915/i915_perf.c
>>>>index 3526693d64fa..efa7eda83edd 100644
>>>>--- a/drivers/gpu/drm/i915/i915_perf.c
>>>>+++ b/drivers/gpu/drm/i915/i915_perf.c
>>>>@@ -1363,6 +1363,67 @@ static int 
>>>>gen12_get_render_context_id(struct i915_perf_stream *stream)
>>>>     return 0;
>>>> }
>>>>+#define MI_OPCODE(x) (((x) >> 23) & 0x3f)
>>>>+#define IS_MI_LRI_CMD(x) (MI_OPCODE(x) == 
>>>>MI_OPCODE(MI_INSTR(0x22, 0)))
>>>>+#define MI_LRI_LEN(x) (((x) & 0xff) + 1)
>>>
>>>
>>>Maybe you want to put this in intel_gpu_commands.h
>>>
>>>
>>>>+#define __valid_oactxctrl_offset(x) ((x) && (x) != U32_MAX)
>>>>+static bool __find_reg_in_lri(u32 *state, u32 reg, u32 *offset)
>>>>+{
>>>>+    u32 idx = *offset;
>>>>+    u32 len = MI_LRI_LEN(state[idx]) + idx;
>>>>+
>>>>+    idx++;
>>>>+    for (; idx < len; idx += 2)
>>>>+        if (state[idx] == reg)
>>>>+            break;
>>>>+
>>>>+    *offset = idx;
>>>>+    return state[idx] == reg;
>>>>+}
>>>>+
>>>>+static u32 __context_image_offset(struct intel_context *ce, u32 reg)
>>>>+{
>>>>+    u32 offset, len = (ce->engine->context_size - PAGE_SIZE) / 4;
>>>>+    u32 *state = ce->lrc_reg_state;
>>>>+
>>>>+    for (offset = 0; offset < len; ) {
>>>>+        if (IS_MI_LRI_CMD(state[offset])) {
>>>
>>>I'm a bit concerned you might find other matches with this.
>>>
>>>Because let's say you run into a 3DSTATE_SUBSLICE_HASH_TABLE 
>>>instruction, you'll iterate the instruction dword by dword because 
>>>you don't know how to read its length and skip to the next one.
>>>
>>>Now some of the fields can be programmed from userspace to look 
>>>like an MI_LRI header, so you start to read data in the wrong way.
>>>
>>>
>>>Unfortunately I don't have a better solution. My only ask is that 
>>>you make __find_reg_in_lri() take the context image size in 
>>>parameter so it NEVER goes over the the context image.
>>>
>>>
>>>To limit the risk you should run this function only one at driver 
>>>initialization and store the found offset.
>>>
>>
>>Hmm, didn't know that there may be non-LRI commands in the context 
>>image or user could add to the context image somehow. Does using the 
>>context image size alone address these issues?
>>
>>Even after including the size in the logic, any reason you think we 
>>would be much more safer to do this from init? Is it because context 
>>image is not touched by user yet?
>
>
>The format of the image (commands in there and their offset) is fixed 
>per HW generation.
>
>Only the data in each of the commands will vary per context.
>
>In the case of MI_LRI, the register offsets are the same for all
>contexts, but the values programmed will vary per context.
>
>So executing once should be enough to find the right offset, rather
>than every time we open the i915-perf stream.
>

In the current logic, the context image is traversed only once per
driver load (even though that first traversal happens when a stream is
opened). See saved_offset below.
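
As a rough userspace sketch of that caching scheme (0 means "not scanned yet", U32_MAX means "scanned and failed"): the struct, the callback type and the fake scanner below are hypothetical stand-ins for illustration, not driver code. One deliberate difference from the posted patch, which returns 0 whenever a cached value exists: this sketch reports -ENODEV again when the cached value is the failure marker.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the one field of struct i915_perf used here. */
struct perf_cache {
    uint32_t ctx_oactxctrl_offset; /* 0: not scanned, UINT32_MAX: scan failed */
};

/* Hypothetical callback standing in for the context image walk. */
typedef uint32_t (*scan_fn)(void *data);

/* Hypothetical scanner: returns a fixed result and counts its calls. */
struct fake_scan {
    uint32_t result;
    unsigned int calls;
};

static uint32_t fake_scan_run(void *data)
{
    struct fake_scan *s = data;

    s->calls++;
    return s->result;
}

static bool valid_oactxctrl_offset(uint32_t x)
{
    return x && x != UINT32_MAX;
}

/* Run the scan at most once; later calls only consult the cached value. */
static int set_oa_ctx_ctrl_offset_once(struct perf_cache *perf,
                                       scan_fn scan, void *data)
{
    if (!perf->ctx_oactxctrl_offset)
        perf->ctx_oactxctrl_offset = scan(data);

    return valid_oactxctrl_offset(perf->ctx_oactxctrl_offset) ? 0 : -ENODEV;
}
```

With this shape it does not matter whether the first caller is driver init or the first stream open: the walk runs once either way.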

>
>I think once you have the logic to make sure you never read outside 
>the image it should be alright.

OK, I will make sure that __find_reg_in_lri() does not read past the end
of the context image.

Thanks,
Umesh

>
>
>-Lionel
>
>
>>
>>Thanks,
>>Umesh
>>
>>>
>>>Thanks,
>>>
>>>
>>>-Lionel
>>>
>>>
>>>>+            if (__find_reg_in_lri(state, reg, &offset))
>>>>+                break;
>>>>+        } else {
>>>>+            offset++;
>>>>+        }
>>>>+    }
>>>>+
>>>>+    return offset < len ? offset : U32_MAX;
>>>>+}
>>>>+
>>>>+static int __set_oa_ctx_ctrl_offset(struct intel_context *ce)
>>>>+{
>>>>+    i915_reg_t reg = GEN12_OACTXCONTROL(ce->engine->mmio_base);
>>>>+    struct i915_perf *perf = &ce->engine->i915->perf;
>>>>+    u32 saved_offset = perf->ctx_oactxctrl_offset;
>>>>+    u32 offset;
>>>>+
>>>>+    /* Do this only once. Failure is stored as offset of U32_MAX */
>>>>+    if (saved_offset)
>>>>+        return 0;
>>>>+
>>>>+    offset = __context_image_offset(ce, i915_mmio_reg_offset(reg));
>>>>+    perf->ctx_oactxctrl_offset = offset;
>>>>+
>>>>+    drm_dbg(&ce->engine->i915->drm,
>>>>+        "%s oa ctx control at 0x%08x dword offset\n",
>>>>+        ce->engine->name, offset);
>>>>+
>>>>+    return __valid_oactxctrl_offset(offset) ? 0 : -ENODEV;
>>>>+}
>>>>+
>>>>+static bool engine_supports_mi_query(struct intel_engine_cs *engine)
>>>>+{
>>>>+    return engine->class == RENDER_CLASS;
>>>>+}
>>>>+
>>>> /**
>>>>  * oa_get_render_ctx_id - determine and hold ctx hw id
>>>>  * @stream: An i915-perf stream opened for OA metrics
>>>>@@ -1382,6 +1443,17 @@ static int oa_get_render_ctx_id(struct 
>>>>i915_perf_stream *stream)
>>>>     if (IS_ERR(ce))
>>>>         return PTR_ERR(ce);
>>>>+    if (engine_supports_mi_query(stream->engine)) {
>>>>+        ret = __set_oa_ctx_ctrl_offset(ce);
>>>>+        if (ret && !(stream->sample_flags & SAMPLE_OA_REPORT)) {
>>>>+            intel_context_unpin(ce);
>>>>+            drm_err(&stream->perf->i915->drm,
>>>>+                "Enabling perf query failed for %s\n",
>>>>+                stream->engine->name);
>>>>+            return ret;
>>>>+        }
>>>>+    }
>>>>+
>>>>     switch (GRAPHICS_VER(ce->engine->i915)) {
>>>>     case 7: {
>>>>         /*
>>>>@@ -2412,10 +2484,11 @@ static int 
>>>>gen12_configure_oar_context(struct i915_perf_stream *stream,
>>>>     int err;
>>>>     struct intel_context *ce = stream->pinned_ctx;
>>>>     u32 format = stream->oa_buffer.format;
>>>>+    u32 offset = stream->perf->ctx_oactxctrl_offset;
>>>>     struct flex regs_context[] = {
>>>>         {
>>>>             GEN8_OACTXCONTROL,
>>>>-            stream->perf->ctx_oactxctrl_offset + 1,
>>>>+            offset + 1,
>>>>             active ? GEN8_OA_COUNTER_RESUME : 0,
>>>>         },
>>>>     };
>>>>@@ -2440,15 +2513,18 @@ static int 
>>>>gen12_configure_oar_context(struct i915_perf_stream *stream,
>>>>         },
>>>>     };
>>>>-    /* Modify the context image of pinned context with regs_context*/
>>>>-    err = intel_context_lock_pinned(ce);
>>>>-    if (err)
>>>>-        return err;
>>>>+    /* Modify the context image of pinned context with regs_context */
>>>>+    if (__valid_oactxctrl_offset(offset)) {
>>>>+        err = intel_context_lock_pinned(ce);
>>>>+        if (err)
>>>>+            return err;
>>>>-    err = gen8_modify_context(ce, regs_context, 
>>>>ARRAY_SIZE(regs_context));
>>>>-    intel_context_unlock_pinned(ce);
>>>>-    if (err)
>>>>-        return err;
>>>>+        err = gen8_modify_context(ce, regs_context,
>>>>+                      ARRAY_SIZE(regs_context));
>>>>+        intel_context_unlock_pinned(ce);
>>>>+        if (err)
>>>>+            return err;
>>>>+    }
>>>>     /* Apply regs_lri using LRI with pinned context */
>>>>     return gen8_modify_self(ce, regs_lri, ARRAY_SIZE(regs_lri), 
>>>>active);
>>>>@@ -2570,6 +2646,7 @@ lrc_configure_all_contexts(struct 
>>>>i915_perf_stream *stream,
>>>>                const struct i915_oa_config *oa_config,
>>>>                struct i915_active *active)
>>>> {
>>>>+    u32 ctx_oactxctrl = stream->perf->ctx_oactxctrl_offset;
>>>>     /* The MMIO offsets for Flex EU registers aren't contiguous */
>>>>     const u32 ctx_flexeu0 = stream->perf->ctx_flexeu0_offset;
>>>> #define ctx_flexeuN(N) (ctx_flexeu0 + 2 * (N) + 1)
>>>>@@ -2580,7 +2657,7 @@ lrc_configure_all_contexts(struct 
>>>>i915_perf_stream *stream,
>>>>         },
>>>>         {
>>>>             GEN8_OACTXCONTROL,
>>>>-            stream->perf->ctx_oactxctrl_offset + 1,
>>>>+            ctx_oactxctrl + 1,
>>>>         },
>>>>         { EU_PERF_CNTL0, ctx_flexeuN(0) },
>>>>         { EU_PERF_CNTL1, ctx_flexeuN(1) },
>>>>@@ -4551,6 +4628,37 @@ static void 
>>>>oa_init_supported_formats(struct i915_perf *perf)
>>>>     }
>>>> }
>>>>+static void i915_perf_init_info(struct drm_i915_private *i915)
>>>>+{
>>>>+    struct i915_perf *perf = &i915->perf;
>>>>+
>>>>+    switch (GRAPHICS_VER(i915)) {
>>>>+    case 8:
>>>>+        perf->ctx_oactxctrl_offset = 0x120;
>>>>+        perf->ctx_flexeu0_offset = 0x2ce;
>>>>+        perf->gen8_valid_ctx_bit = BIT(25);
>>>>+        break;
>>>>+    case 9:
>>>>+        perf->ctx_oactxctrl_offset = 0x128;
>>>>+        perf->ctx_flexeu0_offset = 0x3de;
>>>>+        perf->gen8_valid_ctx_bit = BIT(16);
>>>>+        break;
>>>>+    case 11:
>>>>+        perf->ctx_oactxctrl_offset = 0x124;
>>>>+        perf->ctx_flexeu0_offset = 0x78e;
>>>>+        perf->gen8_valid_ctx_bit = BIT(16);
>>>>+        break;
>>>>+    case 12:
>>>>+        /*
>>>>+         * Calculate offset at runtime in oa_pin_context for gen12 and
>>>>+         * cache the value in perf->ctx_oactxctrl_offset.
>>>>+         */
>>>>+        break;
>>>>+    default:
>>>>+        MISSING_CASE(GRAPHICS_VER(i915));
>>>>+    }
>>>>+}
>>>>+
>>>> /**
>>>>  * i915_perf_init - initialize i915-perf state on module bind
>>>>  * @i915: i915 device instance
>>>>@@ -4589,6 +4697,7 @@ void i915_perf_init(struct 
>>>>drm_i915_private *i915)
>>>>          * execlist mode by default.
>>>>          */
>>>>         perf->ops.read = gen8_oa_read;
>>>>+        i915_perf_init_info(i915);
>>>>         if (IS_GRAPHICS_VER(i915, 8, 9)) {
>>>>             perf->ops.is_valid_b_counter_reg =
>>>>@@ -4608,18 +4717,6 @@ void i915_perf_init(struct 
>>>>drm_i915_private *i915)
>>>>             perf->ops.enable_metric_set = gen8_enable_metric_set;
>>>>             perf->ops.disable_metric_set = gen8_disable_metric_set;
>>>>             perf->ops.oa_hw_tail_read = gen8_oa_hw_tail_read;
>>>>-
>>>>-            if (GRAPHICS_VER(i915) == 8) {
>>>>-                perf->ctx_oactxctrl_offset = 0x120;
>>>>-                perf->ctx_flexeu0_offset = 0x2ce;
>>>>-
>>>>-                perf->gen8_valid_ctx_bit = BIT(25);
>>>>-            } else {
>>>>-                perf->ctx_oactxctrl_offset = 0x128;
>>>>-                perf->ctx_flexeu0_offset = 0x3de;
>>>>-
>>>>-                perf->gen8_valid_ctx_bit = BIT(16);
>>>>-            }
>>>>         } else if (GRAPHICS_VER(i915) == 11) {
>>>>             perf->ops.is_valid_b_counter_reg =
>>>>                 gen7_is_valid_b_counter_addr;
>>>>@@ -4633,11 +4730,6 @@ void i915_perf_init(struct 
>>>>drm_i915_private *i915)
>>>>             perf->ops.enable_metric_set = gen8_enable_metric_set;
>>>>             perf->ops.disable_metric_set = gen11_disable_metric_set;
>>>>             perf->ops.oa_hw_tail_read = gen8_oa_hw_tail_read;
>>>>-
>>>>-            perf->ctx_oactxctrl_offset = 0x124;
>>>>-            perf->ctx_flexeu0_offset = 0x78e;
>>>>-
>>>>-            perf->gen8_valid_ctx_bit = BIT(16);
>>>>         } else if (GRAPHICS_VER(i915) == 12) {
>>>>             perf->ops.is_valid_b_counter_reg =
>>>>                 gen12_is_valid_b_counter_addr;
>>>>@@ -4651,9 +4743,6 @@ void i915_perf_init(struct 
>>>>drm_i915_private *i915)
>>>>             perf->ops.enable_metric_set = gen12_enable_metric_set;
>>>>             perf->ops.disable_metric_set = gen12_disable_metric_set;
>>>>             perf->ops.oa_hw_tail_read = gen12_oa_hw_tail_read;
>>>>-
>>>>-            perf->ctx_flexeu0_offset = 0;
>>>>-            perf->ctx_oactxctrl_offset = 0x144;
>>>>         }
>>>>     }
>>>>diff --git a/drivers/gpu/drm/i915/i915_perf_oa_regs.h 
>>>>b/drivers/gpu/drm/i915/i915_perf_oa_regs.h
>>>>index f31c9f13a9fc..0ef3562ff4aa 100644
>>>>--- a/drivers/gpu/drm/i915/i915_perf_oa_regs.h
>>>>+++ b/drivers/gpu/drm/i915/i915_perf_oa_regs.h
>>>>@@ -97,7 +97,7 @@
>>>> #define  GEN12_OAR_OACONTROL_COUNTER_FORMAT_SHIFT 1
>>>> #define  GEN12_OAR_OACONTROL_COUNTER_ENABLE       (1 << 0)
>>>>-#define GEN12_OACTXCONTROL _MMIO(0x2360)
>>>>+#define GEN12_OACTXCONTROL(base) _MMIO((base) + 0x360)
>>>> #define GEN12_OAR_OASTATUS _MMIO(0x2968)
>>>> /* Gen12 OAG unit */
>>>
>>>
>


* Re: [Intel-gfx] [PATCH 01/19] drm/i915/perf: Fix OA filtering logic for GuC mode
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 01/19] drm/i915/perf: Fix OA filtering logic for GuC mode Umesh Nerlige Ramappa
  2022-09-06 14:33   ` Lionel Landwerlin
@ 2022-09-09 23:47   ` Dixit, Ashutosh
  2022-09-13  3:08     ` Dixit, Ashutosh
                       ` (2 more replies)
  1 sibling, 3 replies; 84+ messages in thread
From: Dixit, Ashutosh @ 2022-09-09 23:47 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Tue, 23 Aug 2022 13:41:37 -0700, Umesh Nerlige Ramappa wrote:
>

Hi Umesh,

> With GuC mode of submission, GuC is in control of defining the context id field
> that is part of the OA reports. To filter reports, UMD and KMD must know what sw
> context id was chosen by GuC. There is not interface between KMD and GuC to
> determine this, so read the upper-dword of EXECLIST_STATUS to filter/squash OA
> reports for the specific context.

Do you think it is worth defining an interface for GuC to return the sw
ctx_id it will be using for a ctx, say at ctx registration time?

The scheme implemented in this patch to read the ctx_id is certainly very
clever, at least to me. But as Lionel was saying, is it an agreed-upon
immutable interface? If it is, we can go with this patch.

(Though even then we will need to maintain this code, even if in the future
GuC FW is changed to return the ctx_id, in order to preserve backwards
compatibility with previous GuC versions. So maybe better to have a real
interface between GuC and KMD earlier rather than later?)

Also a couple of general comments below.

>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_lrc.h |   2 +
>  drivers/gpu/drm/i915/i915_perf.c    | 141 ++++++++++++++++++++++++----
>  2 files changed, 124 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.h b/drivers/gpu/drm/i915/gt/intel_lrc.h
> index a390f0813c8b..7111bae759f3 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.h
> @@ -110,6 +110,8 @@ enum {
>  #define XEHP_SW_CTX_ID_WIDTH			16
>  #define XEHP_SW_COUNTER_SHIFT			58
>  #define XEHP_SW_COUNTER_WIDTH			6
> +#define GEN12_GUC_SW_CTX_ID_SHIFT		39
> +#define GEN12_GUC_SW_CTX_ID_WIDTH		16
>
>  static inline void lrc_runtime_start(struct intel_context *ce)
>  {
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index f3c23fe9ad9c..735244a3aedd 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -1233,6 +1233,125 @@ static struct intel_context *oa_pin_context(struct i915_perf_stream *stream)
>	return stream->pinned_ctx;
>  }
>
> +static int
> +__store_reg_to_mem(struct i915_request *rq, i915_reg_t reg, u32 ggtt_offset)
> +{
> +	u32 *cs, cmd;
> +
> +	cmd = MI_STORE_REGISTER_MEM | MI_SRM_LRM_GLOBAL_GTT;
> +	if (GRAPHICS_VER(rq->engine->i915) >= 8)
> +		cmd++;
> +
> +	cs = intel_ring_begin(rq, 4);
> +	if (IS_ERR(cs))
> +		return PTR_ERR(cs);
> +
> +	*cs++ = cmd;
> +	*cs++ = i915_mmio_reg_offset(reg);
> +	*cs++ = ggtt_offset;
> +	*cs++ = 0;
> +
> +	intel_ring_advance(rq, cs);
> +
> +	return 0;
> +}
> +
> +static int
> +__read_reg(struct intel_context *ce, i915_reg_t reg, u32 ggtt_offset)
> +{
> +	struct i915_request *rq;
> +	int err;
> +
> +	rq = i915_request_create(ce);
> +	if (IS_ERR(rq))
> +		return PTR_ERR(rq);
> +
> +	i915_request_get(rq);
> +
> +	err = __store_reg_to_mem(rq, reg, ggtt_offset);
> +
> +	i915_request_add(rq);
> +	if (!err && i915_request_wait(rq, 0, HZ / 2) < 0)
> +		err = -ETIME;
> +
> +	i915_request_put(rq);
> +
> +	return err;
> +}
> +
> +static int
> +gen12_guc_sw_ctx_id(struct intel_context *ce, u32 *ctx_id)
> +{
> +	struct i915_vma *scratch;
> +	u32 *val;
> +	int err;
> +
> +	scratch = __vm_create_scratch_for_read_pinned(&ce->engine->gt->ggtt->vm, 4);
> +	if (IS_ERR(scratch))
> +		return PTR_ERR(scratch);
> +
> +	err = i915_vma_sync(scratch);
> +	if (err)
> +		goto err_scratch;
> +
> +	err = __read_reg(ce, RING_EXECLIST_STATUS_HI(ce->engine->mmio_base),
> +			 i915_ggtt_offset(scratch));

Actually RING_EXECLIST_STATUS_HI is an MMIO register, so it can be read using,
say, ENGINE_READ/intel_uncore_read. The only issue is how to read it while this
ctx is scheduled, which is cleverly solved by the scheme above. But I am not
sure if there is any other, simpler way to do it.

> +	if (err)
> +		goto err_scratch;
> +
> +	val = i915_gem_object_pin_map_unlocked(scratch->obj, I915_MAP_WB);
> +	if (IS_ERR(val)) {
> +		err = PTR_ERR(val);
> +		goto err_scratch;
> +	}
> +
> +	*ctx_id = *val;
> +	i915_gem_object_unpin_map(scratch->obj);
> +
> +err_scratch:
> +	i915_vma_unpin_and_release(&scratch, 0);
> +	return err;
> +}
> +
> +/*
> + * For execlist mode of submission, pick an unused context id
> + * 0 - (NUM_CONTEXT_TAG -1) are used by other contexts
> + * XXX_MAX_CONTEXT_HW_ID is used by idle context
> + *
> + * For GuC mode of submission read context id from the upper dword of the
> + * EXECLIST_STATUS register.
> + */
> +static int gen12_get_render_context_id(struct i915_perf_stream *stream)
> +{
> +	u32 ctx_id, mask;
> +	int ret;
> +
> +	if (intel_engine_uses_guc(stream->engine)) {
> +		ret = gen12_guc_sw_ctx_id(stream->pinned_ctx, &ctx_id);
> +		if (ret)
> +			return ret;
> +
> +		mask = ((1U << GEN12_GUC_SW_CTX_ID_WIDTH) - 1) <<
> +			(GEN12_GUC_SW_CTX_ID_SHIFT - 32);
> +	} else if (GRAPHICS_VER_FULL(stream->engine->i915) >= IP_VER(12, 50)) {
> +		ctx_id = (XEHP_MAX_CONTEXT_HW_ID - 1) <<
> +			(XEHP_SW_CTX_ID_SHIFT - 32);
> +
> +		mask = ((1U << XEHP_SW_CTX_ID_WIDTH) - 1) <<
> +			(XEHP_SW_CTX_ID_SHIFT - 32);
> +	} else {
> +		ctx_id = (GEN12_MAX_CONTEXT_HW_ID - 1) <<
> +			 (GEN11_SW_CTX_ID_SHIFT - 32);
> +
> +		mask = ((1U << GEN11_SW_CTX_ID_WIDTH) - 1) <<
> +			(GEN11_SW_CTX_ID_SHIFT - 32);

Previously I missed that these ctx_id's for non-GuC cases are just
constants. How does it work in these cases?
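
For reference, the mask arithmetic quoted above can be checked in isolation.
The following is only a standalone sketch, not driver code: the shift and
width values are copied from the intel_lrc.h hunk in this patch, while
sw_ctx_id_mask() and sw_ctx_id_from_status_hi() are hypothetical helper names
introduced here for illustration. The quoted hunk does not show whether the
driver compares the masked or the shifted value, so the shift in the second
helper is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the intel_lrc.h hunk quoted above. */
#define GEN12_GUC_SW_CTX_ID_SHIFT	39
#define GEN12_GUC_SW_CTX_ID_WIDTH	16

/*
 * Hypothetical helper: mask of a field described by a 64-bit shift/width
 * pair, expressed relative to the upper dword (hence shift - 32).
 */
static inline uint32_t sw_ctx_id_mask(unsigned int shift, unsigned int width)
{
	assert(shift >= 32 && width > 0 && (shift - 32) + width <= 32);
	return (uint32_t)(((1ULL << width) - 1) << (shift - 32));
}

/* Hypothetical helper: extract the id field from an upper-dword value. */
static inline uint32_t sw_ctx_id_from_status_hi(uint32_t status_hi)
{
	uint32_t mask = sw_ctx_id_mask(GEN12_GUC_SW_CTX_ID_SHIFT,
				       GEN12_GUC_SW_CTX_ID_WIDTH);

	return (status_hi & mask) >> (GEN12_GUC_SW_CTX_ID_SHIFT - 32);
}
```

With shift 39 and width 16 the field occupies bits 7..22 of the upper dword,
so any bits outside that range in the value read back are ignored.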

Thanks.
--
Ashutosh

* Re: [Intel-gfx] [PATCH 01/19] drm/i915/perf: Fix OA filtering logic for GuC mode
  2022-09-09 23:47   ` Dixit, Ashutosh
@ 2022-09-13  3:08     ` Dixit, Ashutosh
  2022-09-14 23:37       ` Umesh Nerlige Ramappa
  2022-09-14 23:36     ` Umesh Nerlige Ramappa
  2022-09-22  3:44     ` Dixit, Ashutosh
  2 siblings, 1 reply; 84+ messages in thread
From: Dixit, Ashutosh @ 2022-09-13  3:08 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Fri, 09 Sep 2022 16:47:36 -0700, Dixit, Ashutosh wrote:
>
> On Tue, 23 Aug 2022 13:41:37 -0700, Umesh Nerlige Ramappa wrote:
> >
>
> Hi Umesh,
>
> > With GuC mode of submission, GuC is in control of defining the context id field
> > that is part of the OA reports. To filter reports, UMD and KMD must know what sw
> > context id was chosen by GuC. There is not interface between KMD and GuC to
> > determine this, so read the upper-dword of EXECLIST_STATUS to filter/squash OA
> > reports for the specific context.
>
> Do you think it is worth defining an interface for GuC to return the sw
> ctx_id it will be using for a ctx, say at ctx registration time?

Umesh, I came across these in GuC documentation:

guc_pcv1_context_parameters_set_h2g_data_t::context_id
guc_pcv2_context_parameters_set_h2g_data_t::context_id

Also in the code we have in prepare_context_registration_info_v70 'ctx_id =
ce->guc_id.id' which seems to be assigned in new_guc_id. So wondering if
this is what we need and we already have it?

Thanks.
--
Ashutosh

* Re: [Intel-gfx] [PATCH 02/19] drm/i915/perf: Add OA formats for DG2
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 02/19] drm/i915/perf: Add OA formats for DG2 Umesh Nerlige Ramappa
  2022-09-06 19:35   ` Lionel Landwerlin
@ 2022-09-13 15:40   ` Dixit, Ashutosh
  2022-09-14 20:54     ` Umesh Nerlige Ramappa
  1 sibling, 1 reply; 84+ messages in thread
From: Dixit, Ashutosh @ 2022-09-13 15:40 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Tue, 23 Aug 2022 13:41:38 -0700, Umesh Nerlige Ramappa wrote:
>
> Add new OA formats for DG2.

Should we change the patch title and commit message a bit to 'Add OAR and
OAG formats for DG2'?

> Some of the newer OA formats are not
> multples of 64 bytes and are not powers of 2. For those formats, adjust
> hw_tail accordingly when checking for new reports.
>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramampa@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_perf.c | 63 ++++++++++++++++++++------------
>  include/uapi/drm/i915_drm.h      |  6 +++
>  2 files changed, 46 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 735244a3aedd..c8331b549d31 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -306,7 +306,8 @@ static u32 i915_oa_max_sample_rate = 100000;
>
>  /* XXX: beware if future OA HW adds new report formats that the current
>   * code assumes all reports have a power-of-two size and ~(size - 1) can
> - * be used as a mask to align the OA tail pointer.
> + * be used as a mask to align the OA tail pointer. In some of the
> + * formats, R is used to denote reserved field.
>   */
>  static const struct i915_oa_format oa_formats[I915_OA_FORMAT_MAX] = {
>	[I915_OA_FORMAT_A13]	    = { 0, 64 },
> @@ -320,6 +321,10 @@ static const struct i915_oa_format oa_formats[I915_OA_FORMAT_MAX] = {
>	[I915_OA_FORMAT_A12]		    = { 0, 64 },
>	[I915_OA_FORMAT_A12_B8_C8]	    = { 2, 128 },
>	[I915_OA_FORMAT_A32u40_A4u32_B8_C8] = { 5, 256 },
> +	[I915_OAR_FORMAT_A32u40_A4u32_B8_C8]    = { 5, 256 },
> +	[I915_OA_FORMAT_A24u40_A14u32_B8_C8]    = { 5, 256 },
> +	[I915_OAR_FORMAT_A36u64_B8_C8]		= { 1, 384 },
> +	[I915_OA_FORMAT_A38u64_R2u64_B8_C8]	= { 1, 448 },

Isn't the size for this last one 416 (or 400)? Bspec: 52198. Unless the
size has to be a multiple of 64?

Looks like Lionel's R-b is not showing up on Patchwork, might need to be
manually added. For now this is:

Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com>

* Re: [Intel-gfx] [PATCH 05/19] drm/i915/perf: Enable commands per clock reporting in OA
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 05/19] drm/i915/perf: Enable commands per clock reporting in OA Umesh Nerlige Ramappa
  2022-09-06 19:51   ` Lionel Landwerlin
@ 2022-09-14  0:19   ` Dixit, Ashutosh
  2022-09-15  0:04     ` Umesh Nerlige Ramappa
  1 sibling, 1 reply; 84+ messages in thread
From: Dixit, Ashutosh @ 2022-09-14  0:19 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Tue, 23 Aug 2022 13:41:41 -0700, Umesh Nerlige Ramappa wrote:
>

Hi Umesh,

> XEHPSDV and DG2 provide a way to configure bytes per clock vs commands
> per clock reporting. Enable command per clock setting on enabling OA.

What is the reason for selecting commands per clock vs bytes per clock?
Also probably mention Bspec: 51762 in the commit message too.

> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index efa7eda83edd..6fc4f0d8fc5a 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -2745,10 +2745,12 @@ static int
>  gen12_enable_metric_set(struct i915_perf_stream *stream,
>			struct i915_active *active)
>  {
> +	struct drm_i915_private *i915 = stream->perf->i915;
>	struct intel_uncore *uncore = stream->uncore;
>	struct i915_oa_config *oa_config = stream->oa_config;
>	bool periodic = stream->periodic;
>	u32 period_exponent = stream->period_exponent;
> +	u32 sqcnt1;
>	int ret;
>
>	intel_uncore_write(uncore, GEN12_OAG_OA_DEBUG,
> @@ -2767,6 +2769,16 @@ gen12_enable_metric_set(struct i915_perf_stream *stream,
>			    (period_exponent << GEN12_OAG_OAGLBCTXCTRL_TIMER_PERIOD_SHIFT))
>			    : 0);
>
> +	/*
> +	 * Initialize Super Queue Internal Cnt Register
> +	 * Set PMON Enable in order to collect valid metrics.
> +	 * Enable commands per clock reporting in OA for XEHPSDV onward.
> +	 */
> +	sqcnt1 = GEN12_SQCNT1_PMON_ENABLE |
> +		 (HAS_OA_BPC_REPORTING(i915) ? GEN12_SQCNT1_OABPC : 0);

Also, from Bspec: 0 = Unitsof4cmd and 1 = Unitsof128B, so it looks like bit 29
should be set to 0 for the commands per clock setting? Or am I wrong?
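
To make the effect of the two rmw calls in this patch easier to see, here is a
small standalone model. It is a sketch only, under the assumption that
intel_uncore_rmw(uncore, reg, clear, set) clears the 'clear' bits and then ORs
in the 'set' bits. The bit positions are copied from the i915_perf_oa_regs.h
hunk; model_rmw() and sqcnt1_bits() are hypothetical names, and has_bpc stands
in for HAS_OA_BPC_REPORTING(). It does not settle which value of bit 29 Bspec
defines as commands per clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions copied from the i915_perf_oa_regs.h hunk in this patch. */
#define SQCNT1_PMON_ENABLE	(1u << 30)
#define SQCNT1_OABPC		(1u << 29)

/* Model of a read-modify-write: drop the 'clear' bits, then OR in 'set'. */
static inline uint32_t model_rmw(uint32_t old, uint32_t clear, uint32_t set)
{
	assert((clear & set) == 0);	/* a bit is either cleared or set */
	return (old & ~clear) | set;
}

/* Bits the patch touches; has_bpc stands in for HAS_OA_BPC_REPORTING(). */
static inline uint32_t sqcnt1_bits(bool has_bpc)
{
	return SQCNT1_PMON_ENABLE | (has_bpc ? SQCNT1_OABPC : 0);
}
```

In this model the enable path is model_rmw(old, 0, sqcnt1_bits(has_bpc)), which
sets bit 29 whenever has_bpc is true, and the disable path is
model_rmw(old, sqcnt1_bits(has_bpc), 0), which clears the same bits again.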

> +
> +	intel_uncore_rmw(uncore, GEN12_SQCNT1, 0, sqcnt1);
> +
>	/*
>	 * Update all contexts prior writing the mux configurations as we need
>	 * to make sure all slices/subslices are ON before writing to NOA
> @@ -2816,6 +2828,8 @@ static void gen11_disable_metric_set(struct i915_perf_stream *stream)
>  static void gen12_disable_metric_set(struct i915_perf_stream *stream)
>  {
>	struct intel_uncore *uncore = stream->uncore;
> +	struct drm_i915_private *i915 = stream->perf->i915;
> +	u32 sqcnt1;
>
>	/* Reset all contexts' slices/subslices configurations. */
>	gen12_configure_all_contexts(stream, NULL, NULL);
> @@ -2826,6 +2840,12 @@ static void gen12_disable_metric_set(struct i915_perf_stream *stream)
>
>	/* Make sure we disable noa to save power. */
>	intel_uncore_rmw(uncore, RPM_CONFIG1, GEN10_GT_NOA_ENABLE, 0);
> +
> +	sqcnt1 = GEN12_SQCNT1_PMON_ENABLE |
> +		 (HAS_OA_BPC_REPORTING(i915) ? GEN12_SQCNT1_OABPC : 0);
> +
> +	/* Reset PMON Enable to save power. */
> +	intel_uncore_rmw(uncore, GEN12_SQCNT1, sqcnt1, 0);
>  }
>
>  static void gen7_oa_enable(struct i915_perf_stream *stream)
> diff --git a/drivers/gpu/drm/i915/i915_perf_oa_regs.h b/drivers/gpu/drm/i915/i915_perf_oa_regs.h
> index 0ef3562ff4aa..381d94101610 100644
> --- a/drivers/gpu/drm/i915/i915_perf_oa_regs.h
> +++ b/drivers/gpu/drm/i915/i915_perf_oa_regs.h
> @@ -134,4 +134,8 @@
>  #define GDT_CHICKEN_BITS    _MMIO(0x9840)
>  #define   GT_NOA_ENABLE	    0x00000080
>
> +#define GEN12_SQCNT1				_MMIO(0x8718)
> +#define   GEN12_SQCNT1_PMON_ENABLE		REG_BIT(30)
> +#define   GEN12_SQCNT1_OABPC			REG_BIT(29)
> +
>  #endif /* __INTEL_PERF_OA_REGS__ */

* Re: [Intel-gfx] [PATCH 06/19] drm/i915/perf: Use helpers to process reports w.r.t. OA buffer size
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 06/19] drm/i915/perf: Use helpers to process reports w.r.t. OA buffer size Umesh Nerlige Ramappa
@ 2022-09-14 16:04   ` Dixit, Ashutosh
  2022-09-14 18:19     ` Umesh Nerlige Ramappa
  0 siblings, 1 reply; 84+ messages in thread
From: Dixit, Ashutosh @ 2022-09-14 16:04 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Tue, 23 Aug 2022 13:41:42 -0700, Umesh Nerlige Ramappa wrote:
>

Hi Umesh,

> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 6fc4f0d8fc5a..bbf1c574f393 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -385,6 +385,21 @@ static struct ctl_table_header *sysctl_header;
>
>  static enum hrtimer_restart oa_poll_check_timer_cb(struct hrtimer *hrtimer);
>
> +static u32 _oa_taken(struct i915_perf_stream * stream, u32 tail, u32 head)

nit: no space between * and stream.

> +{
> +	u32 size = stream->oa_buffer.vma->size;
> +
> +	return tail >= head ? tail - head : size - (head - tail);
> +}

If we are doing this we should probably eliminate references to OA_TAKEN
which serves an identical purpose (I think there is one remaining
reference) and also delete OA_TAKEN #define.

> +
> +static u32 _rewind_tail(struct i915_perf_stream * stream, u32 relative_hw_tail,
> +			u32 rewind_delta)
> +{
> +	return rewind_delta > relative_hw_tail ?
> +	       stream->oa_buffer.vma->size - (rewind_delta - relative_hw_tail) :
> +	       relative_hw_tail - rewind_delta;
> +}

Also are we really saying here that we are supporting non-power-of-2 OA
buffer sizes? Because if we stayed with power-of-2 sizes, the expressions
above would be nice and elegant and actually closer to the previous code being
changed in this patch. For example:

#include <linux/circ_buf.h>

static u32 _oa_taken(struct i915_perf_stream *stream, u32 tail, u32 head)
{
	return CIRC_CNT(tail, head, stream->oa_buffer.vma->size);
}

static u32 _rewind_tail(struct i915_perf_stream *stream, u32 relative_hw_tail,
			u32 rewind_delta)
{
	return CIRC_CNT(relative_hw_tail, rewind_delta, stream->oa_buffer.vma->size);
}

Note that for power-of-2 sizes the two functions above are identical, but we
should keep them separate for clarity (as is done in the patch) since they
serve two different purposes in the OA code.

Also another assumption in the code seems to be:

	stream->oa_buffer.vma->size == OA_BUFFER_SIZE

which I am pretty sure will not hold for arbitrary non-power-of-2 OA buffer
sizes? So we might as well stick with power-of-2 sizes and change later in
a separate patch only if needed?
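
As a sanity check of the equivalence claimed above, here is a standalone
sketch that compares the branching form from the patch with the masked form.
CIRC_CNT is redefined locally with the same body as in <linux/circ_buf.h> so
the snippet builds outside the kernel; taken_generic() and taken_pow2() are
hypothetical names, not functions from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Same definition as CIRC_CNT() in <linux/circ_buf.h>. */
#define CIRC_CNT(head, tail, size) (((head) - (tail)) & ((size) - 1))

/* Branching form from the patch; works for any buffer size. */
static inline uint32_t taken_generic(uint32_t tail, uint32_t head, uint32_t size)
{
	return tail >= head ? tail - head : size - (head - tail);
}

/* Masked form; only valid when size is a power of 2. */
static inline uint32_t taken_pow2(uint32_t tail, uint32_t head, uint32_t size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return CIRC_CNT(tail, head, size);
}
```

For a power-of-2 size the two forms agree, including when tail has wrapped
below head. For a size such as 10 the masked form gives a different answer
than the branching form, which is why the mask only works if we stay with
power-of-2 buffers.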

Thanks.
--
Ashutosh

> +
>  void i915_oa_config_release(struct kref *ref)
>  {
>	struct i915_oa_config *oa_config =
> @@ -487,12 +502,14 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>	 * sizes need not be integral multiples or 64 or powers of 2.
>	 * Compute potentially partially landed report in the OA buffer
>	 */
> -	partial_report_size = OA_TAKEN(hw_tail, stream->oa_buffer.tail);
> +	partial_report_size =
> +		_oa_taken(stream, hw_tail, stream->oa_buffer.tail);
>	partial_report_size %= report_size;
>
>	/* Subtract partial amount off the tail */
> -	hw_tail = gtt_offset + ((hw_tail - partial_report_size) &
> -				(stream->oa_buffer.vma->size - 1));
> +	hw_tail = gtt_offset + _rewind_tail(stream,
> +					    hw_tail - gtt_offset,
> +					    partial_report_size);
>
>	now = ktime_get_mono_fast_ns();
>
> @@ -527,16 +544,16 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>		 * memory in the order they were written to.
>		 * If not : (╯°□°)╯︵ ┻━┻
>		 */
> -		while (OA_TAKEN(tail, aged_tail) >= report_size) {
> +		while (_oa_taken(stream, tail, aged_tail) >= report_size) {
>			u32 *report32 = (void *)(stream->oa_buffer.vaddr + tail);
>
>			if (report32[0] != 0 || report32[1] != 0)
>				break;
>
> -			tail = (tail - report_size) & (OA_BUFFER_SIZE - 1);
> +			tail = _rewind_tail(stream, tail, report_size);
>		}
>
> -		if (OA_TAKEN(hw_tail, tail) > report_size &&
> +		if (_oa_taken(stream, hw_tail, tail) > report_size &&
>		    __ratelimit(&stream->perf->tail_pointer_race))
>			DRM_NOTE("unlanded report(s) head=0x%x "
>				 "tail=0x%x hw_tail=0x%x\n",
> @@ -547,8 +564,9 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>		stream->oa_buffer.aging_timestamp = now;
>	}
>
> -	pollin = OA_TAKEN(stream->oa_buffer.tail - gtt_offset,
> -			  stream->oa_buffer.head - gtt_offset) >= report_size;
> +	pollin = _oa_taken(stream,
> +			   stream->oa_buffer.tail,
> +			   stream->oa_buffer.head) >= report_size;
>
>	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
>
> @@ -679,11 +697,9 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>	int report_size = stream->oa_buffer.format_size;
>	u8 *oa_buf_base = stream->oa_buffer.vaddr;
>	u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
> -	u32 mask = (OA_BUFFER_SIZE - 1);
>	size_t start_offset = *offset;
>	unsigned long flags;
> -	u32 head, tail;
> -	u32 taken;
> +	u32 head, tail, size;
>	int ret = 0;
>
>	if (drm_WARN_ON(&uncore->i915->drm, !stream->enabled))
> @@ -693,6 +709,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>
>	head = stream->oa_buffer.head;
>	tail = stream->oa_buffer.tail;
> +	size = stream->oa_buffer.vma->size;
>
>	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
>
> @@ -711,16 +728,15 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>	 * all a power of two).
>	 */
>	if (drm_WARN_ONCE(&uncore->i915->drm,
> -			  head > stream->oa_buffer.vma->size ||
> -			  tail > stream->oa_buffer.vma->size,
> +			  head > size || tail > size,
>			  "Inconsistent OA buffer pointers: head = %u, tail = %u\n",
>			  head, tail))
>		return -EIO;
>
>
>	for (/* none */;
> -	     (taken = OA_TAKEN(tail, head));
> -	     head = (head + report_size) & mask) {
> +	     _oa_taken(stream, tail, head);
> +	     head = (head + report_size) % size) {
>		u8 *report = oa_buf_base + head;
>		u32 *report32 = (void *)report;
>		u32 ctx_id;
> --
> 2.25.1
>

* Re: [Intel-gfx] [PATCH 06/19] drm/i915/perf: Use helpers to process reports w.r.t. OA buffer size
  2022-09-14 16:04   ` Dixit, Ashutosh
@ 2022-09-14 18:19     ` Umesh Nerlige Ramappa
  2022-09-14 19:07       ` Dixit, Ashutosh
  0 siblings, 1 reply; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-14 18:19 UTC (permalink / raw)
  To: Dixit, Ashutosh; +Cc: intel-gfx

On Wed, Sep 14, 2022 at 09:04:10AM -0700, Dixit, Ashutosh wrote:
>On Tue, 23 Aug 2022 13:41:42 -0700, Umesh Nerlige Ramappa wrote:
>>
>
>Hi Umesh,
>
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>> index 6fc4f0d8fc5a..bbf1c574f393 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -385,6 +385,21 @@ static struct ctl_table_header *sysctl_header;
>>
>>  static enum hrtimer_restart oa_poll_check_timer_cb(struct hrtimer *hrtimer);
>>
>> +static u32 _oa_taken(struct i915_perf_stream * stream, u32 tail, u32 head)
>
>nit: no space between * and stream.
>
>> +{
>> +	u32 size = stream->oa_buffer.vma->size;
>> +
>> +	return tail >= head ? tail - head : size - (head - tail);
>> +}
>
>If we are doing this we should probably eliminate references to OA_TAKEN
>which serves an identical purpose (I think there is one remaining
>reference) and also delete OA_TAKEN #define.
>
>> +
>> +static u32 _rewind_tail(struct i915_perf_stream * stream, u32 relative_hw_tail,
>> +			u32 rewind_delta)
>> +{
>> +	return rewind_delta > relative_hw_tail ?
>> +	       stream->oa_buffer.vma->size - (rewind_delta - relative_hw_tail) :
>> +	       relative_hw_tail - rewind_delta;
>> +}
>
>Also are we really saying here that we are supporting non-power-of-2 OA
>buffer sizes? Because if we stayed with power-of-2 sizes the expression
>above are nice and elegant and actually closer to the previous code being
>changed in this patch. For example:
>
>#include <linux/circ_buf.h>
>
>static u32 _oa_taken(struct i915_perf_stream *stream, u32 tail, u32 head)
>{
>	return CIRC_CNT(tail, head, stream->oa_buffer.vma->size);
>}
>
>static u32 _rewind_tail(struct i915_perf_stream *stream, u32 relative_hw_tail,
>       			u32 rewind_delta)
>{
>	return CIRC_CNT(relative_hw_tail, rewind_delta, stream->oa_buffer.vma->size);
>}
>
>Note that for power-of-2 sizes the two functions above are identical but we
>should keep them separate for clarity (as is done in the patch) since they
>are serving two different functions in the OA code.
>
>Also another assumption in the code seems to be:
>
>	stream->oa_buffer.vma->size == OA_BUFFER_SIZE
>
>which I am pretty sure will not hold for arbitrary non-power-of-2 OA buffer
>sizes? So we might as well stick with power-of-2 sizes and change later in
>a separate patch only if needed?

Most changes here are related to the OA buffer size issue, and that is
specific to xehpsdv, where the size is not a power of 2. I am thinking of
dropping these changes in the next revision since DG2 is fixed and its OA
buffer sizes are powers of 2.

Thanks,
Umesh

>
>Thanks.
>--
>Ashutosh

* Re: [Intel-gfx] [PATCH 08/19] drm/i915/perf: Move gt-specific data from i915->perf to gt->perf
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 08/19] drm/i915/perf: Move gt-specific data from i915->perf to gt->perf Umesh Nerlige Ramappa
  2022-09-06 19:54   ` Lionel Landwerlin
@ 2022-09-14 18:20   ` Dixit, Ashutosh
  1 sibling, 0 replies; 84+ messages in thread
From: Dixit, Ashutosh @ 2022-09-14 18:20 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Tue, 23 Aug 2022 13:41:44 -0700, Umesh Nerlige Ramappa wrote:
>
> Make perf part of gt as the OAG buffer is specific to a gt. The refactor
> eventually simplifies programming the right OA buffer and the right HW
> registers when supporting multiple gts.

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>

* Re: [Intel-gfx] [PATCH 09/19] drm/i915/perf: Replace gt->perf.lock with stream->lock for file ops
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 09/19] drm/i915/perf: Replace gt->perf.lock with stream->lock for file ops Umesh Nerlige Ramappa
@ 2022-09-14 19:04   ` Dixit, Ashutosh
  0 siblings, 0 replies; 84+ messages in thread
From: Dixit, Ashutosh @ 2022-09-14 19:04 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Tue, 23 Aug 2022 13:41:45 -0700, Umesh Nerlige Ramappa wrote:
>
> With multi-gt, user can access multiple OA buffers concurrently. Use
> stream->lock instead of gt->perf.lock to serialize file operations.

Ok, will come in handy for multiple streams per gt:

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>

* Re: [Intel-gfx] [PATCH 06/19] drm/i915/perf: Use helpers to process reports w.r.t. OA buffer size
  2022-09-14 18:19     ` Umesh Nerlige Ramappa
@ 2022-09-14 19:07       ` Dixit, Ashutosh
  0 siblings, 0 replies; 84+ messages in thread
From: Dixit, Ashutosh @ 2022-09-14 19:07 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Wed, 14 Sep 2022 11:19:30 -0700, Umesh Nerlige Ramappa wrote:
>
> On Wed, Sep 14, 2022 at 09:04:10AM -0700, Dixit, Ashutosh wrote:
> > On Tue, 23 Aug 2022 13:41:42 -0700, Umesh Nerlige Ramappa wrote:
> >>
> >
> > Hi Umesh,
> >
> >> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> >> index 6fc4f0d8fc5a..bbf1c574f393 100644
> >> --- a/drivers/gpu/drm/i915/i915_perf.c
> >> +++ b/drivers/gpu/drm/i915/i915_perf.c
> >> @@ -385,6 +385,21 @@ static struct ctl_table_header *sysctl_header;
> >>
> >>  static enum hrtimer_restart oa_poll_check_timer_cb(struct hrtimer *hrtimer);
> >>
> >> +static u32 _oa_taken(struct i915_perf_stream * stream, u32 tail, u32 head)
> >
> > nit: no space between * and stream.
> >
> >> +{
> >> +	u32 size = stream->oa_buffer.vma->size;
> >> +
> >> +	return tail >= head ? tail - head : size - (head - tail);
> >> +}
> >
> > If we are doing this we should probably eliminate references to OA_TAKEN
> > which serves an identical purpose (I think there is one remaining
> > reference) and also delete OA_TAKEN #define.
> >
> >> +
> >> +static u32 _rewind_tail(struct i915_perf_stream * stream, u32 relative_hw_tail,
> >> +			u32 rewind_delta)
> >> +{
> >> +	return rewind_delta > relative_hw_tail ?
> >> +	       stream->oa_buffer.vma->size - (rewind_delta - relative_hw_tail) :
> >> +	       relative_hw_tail - rewind_delta;
> >> +}
> >
> > Also are we really saying here that we are supporting non-power-of-2 OA
> > buffer sizes? Because if we stayed with power-of-2 sizes the expression
> > above are nice and elegant and actually closer to the previous code being
> > changed in this patch. For example:
> >
> > #include <linux/circ_buf.h>
> >
> > static u32 _oa_taken(struct i915_perf_stream *stream, u32 tail, u32 head)
> > {
> >	return CIRC_CNT(tail, head, stream->oa_buffer.vma->size);
> > }
> >
> > static u32 _rewind_tail(struct i915_perf_stream *stream, u32 relative_hw_tail,
> >				u32 rewind_delta)
> > {
> >	return CIRC_CNT(relative_hw_tail, rewind_delta, stream->oa_buffer.vma->size);
> > }
> >
> > Note that for power-of-2 sizes the two functions above are identical but we
> > should keep them separate for clarity (as is done in the patch) since they
> > are serving two different functions in the OA code.
> >
> > Also another assumption in the code seems to be:
> >
> >	stream->oa_buffer.vma->size == OA_BUFFER_SIZE
> >
> > which I am pretty sure will not hold for arbitrary non-power-of-2 OA buffer
> > sizes? So we might as well stick with power-of-2 sizes and change later in
> > a separate patch only if needed?
>
> Most changes here are related to the OA buffer size issue and that is
> specific to xehpsdv where the size is not a power of 2. I am thinking of
> dropping these changes in the next revision since DG2 is fixed and OA
> buffer sizes are power of 2.

In the code, stream->oa_buffer.vma->size and OA_BUFFER_SIZE are both used.
If we want to clean that up and only use stream->oa_buffer.vma->size, we
could still do something like I suggested with just power-of-2 sizes and
keep this patch. If we ever have to support non-power-of-2 sizes in the
future, we'll just need to change the _oa_taken and _rewind_tail
functions. Anyway, your call.

Thanks.
--
Ashutosh

* Re: [Intel-gfx] [PATCH 11/19] drm/i915/perf: Store a pointer to oa_format in oa_buffer
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 11/19] drm/i915/perf: Store a pointer to oa_format in oa_buffer Umesh Nerlige Ramappa
  2022-09-06 19:56   ` Lionel Landwerlin
@ 2022-09-14 20:43   ` Dixit, Ashutosh
  1 sibling, 0 replies; 84+ messages in thread
From: Dixit, Ashutosh @ 2022-09-14 20:43 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Tue, 23 Aug 2022 13:41:47 -0700, Umesh Nerlige Ramappa wrote:
>
> @@ -3184,15 +3184,12 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
>	stream->sample_flags = props->sample_flags;
>	stream->sample_size += format_size;
>
> -	stream->oa_buffer.format_size = format_size;
> -	if (drm_WARN_ON(&i915->drm, stream->oa_buffer.format_size == 0))
> +	stream->oa_buffer.format = &perf->oa_formats[props->oa_format];
> +	if (drm_WARN_ON(&i915->drm, stream->oa_buffer.format->size == 0))
>		return -EINVAL;

I would also move these 3 lines before the two lines on the top, eliminate
the format_size variable and assignment and just use
stream->oa_buffer.format->size. Otherwise:

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>

>	stream->hold_preemption = props->hold_preemption;
>
> -	stream->oa_buffer.format =
> -		perf->oa_formats[props->oa_format].format;
> -
>	stream->periodic = props->oa_periodic;
>	if (stream->periodic)
>		stream->period_exponent = props->oa_period_exponent;
> diff --git a/drivers/gpu/drm/i915/i915_perf_types.h b/drivers/gpu/drm/i915/i915_perf_types.h
> index dc9bfd8086cf..e0c96b44eda8 100644
> --- a/drivers/gpu/drm/i915/i915_perf_types.h
> +++ b/drivers/gpu/drm/i915/i915_perf_types.h
> @@ -250,11 +250,10 @@ struct i915_perf_stream {
>	 * @oa_buffer: State of the OA buffer.
>	 */
>	struct {
> +		const struct i915_oa_format *format;
>		struct i915_vma *vma;
>		u8 *vaddr;
>		u32 last_ctx_id;
> -		int format;
> -		int format_size;
>		int size_exponent;
>

* Re: [Intel-gfx] [PATCH 02/19] drm/i915/perf: Add OA formats for DG2
  2022-09-13 15:40   ` Dixit, Ashutosh
@ 2022-09-14 20:54     ` Umesh Nerlige Ramappa
  2022-09-14 21:16       ` Dixit, Ashutosh
  0 siblings, 1 reply; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-14 20:54 UTC (permalink / raw)
  To: Dixit, Ashutosh; +Cc: intel-gfx

On Tue, Sep 13, 2022 at 08:40:22AM -0700, Dixit, Ashutosh wrote:
>On Tue, 23 Aug 2022 13:41:38 -0700, Umesh Nerlige Ramappa wrote:
>>
>> Add new OA formats for DG2.
>
>Should we change the patch title and commit message a bit to 'Add OAR and
>OAG formats for DG2'?

Hmm, I assumed OAR was also part of TGL, but looks like it's not. I can 
change the title as suggested.

>
>> Some of the newer OA formats are not
>> multples of 64 bytes and are not powers of 2. For those formats, adjust
>> hw_tail accordingly when checking for new reports.
>>
>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramampa@intel.com>
>> ---
>>  drivers/gpu/drm/i915/i915_perf.c | 63 ++++++++++++++++++++------------
>>  include/uapi/drm/i915_drm.h      |  6 +++
>>  2 files changed, 46 insertions(+), 23 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>> index 735244a3aedd..c8331b549d31 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -306,7 +306,8 @@ static u32 i915_oa_max_sample_rate = 100000;
>>
>>  /* XXX: beware if future OA HW adds new report formats that the current
>>   * code assumes all reports have a power-of-two size and ~(size - 1) can
>> - * be used as a mask to align the OA tail pointer.
>> + * be used as a mask to align the OA tail pointer. In some of the
>> + * formats, R is used to denote reserved field.
>>   */
>>  static const struct i915_oa_format oa_formats[I915_OA_FORMAT_MAX] = {
>>	[I915_OA_FORMAT_A13]	    = { 0, 64 },
>> @@ -320,6 +321,10 @@ static const struct i915_oa_format oa_formats[I915_OA_FORMAT_MAX] = {
>>	[I915_OA_FORMAT_A12]		    = { 0, 64 },
>>	[I915_OA_FORMAT_A12_B8_C8]	    = { 2, 128 },
>>	[I915_OA_FORMAT_A32u40_A4u32_B8_C8] = { 5, 256 },
>> +	[I915_OAR_FORMAT_A32u40_A4u32_B8_C8]    = { 5, 256 },
>> +	[I915_OA_FORMAT_A24u40_A14u32_B8_C8]    = { 5, 256 },
>> +	[I915_OAR_FORMAT_A36u64_B8_C8]		= { 1, 384 },
>> +	[I915_OA_FORMAT_A38u64_R2u64_B8_C8]	= { 1, 448 },
>
>Isn't the size for this last one 416 (or 400)? Bspec: 52198. Unless the
>size has to be a multiple of 64?

The format size is a multiple of 64 bytes, so it is rounded up.
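
The rounding can be checked with a small standalone sketch. The sizes 416 and
400 are the two candidates raised in the review above; round_up_pow2() is a
hypothetical helper written for this example, not the kernel's round_up()
macro.

```c
#include <assert.h>
#include <stdint.h>

/* Round 'size' up to the next multiple of 'align' (a power of 2). */
static inline uint32_t round_up_pow2(uint32_t size, uint32_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (size + align - 1) & ~(align - 1);
}
```

Both round_up_pow2(416, 64) and round_up_pow2(400, 64) come out as 448, which
matches the table entry, while 256 and 384 are already multiples of 64 and
stay unchanged.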

>
>Looks like Lionel's R-b is not showing up on Patchwork, might need to be
>manually added. For now this is:
>
>Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com>

Thanks,
Umesh

* Re: [Intel-gfx] [PATCH 02/19] drm/i915/perf: Add OA formats for DG2
  2022-09-14 20:54     ` Umesh Nerlige Ramappa
@ 2022-09-14 21:16       ` Dixit, Ashutosh
  0 siblings, 0 replies; 84+ messages in thread
From: Dixit, Ashutosh @ 2022-09-14 21:16 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Wed, 14 Sep 2022 13:54:34 -0700, Umesh Nerlige Ramappa wrote:
>
> On Tue, Sep 13, 2022 at 08:40:22AM -0700, Dixit, Ashutosh wrote:
> > On Tue, 23 Aug 2022 13:41:38 -0700, Umesh Nerlige Ramappa wrote:
> >>
> >> Add new OA formats for DG2.
> >
> > Should we change the patch title and commit message a bit to 'Add OAR and
> > OAG formats for DG2'?
>
> Hmm, I assumed OAR was also part of TGL, but looks like it's not. I can
> change the title as suggested.

By 'Add OAR and OAG formats for DG2' I meant we are only adding OAR and OAG
formats and not including other DG2 formats ;)

* Re: [Intel-gfx] [PATCH 01/19] drm/i915/perf: Fix OA filtering logic for GuC mode
  2022-09-06 18:39       ` Lionel Landwerlin
@ 2022-09-14 22:26         ` Umesh Nerlige Ramappa
  2022-09-14 23:13           ` Umesh Nerlige Ramappa
  0 siblings, 1 reply; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-14 22:26 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

On Tue, Sep 06, 2022 at 09:39:33PM +0300, Lionel Landwerlin wrote:
>On 06/09/2022 20:39, Umesh Nerlige Ramappa wrote:
>>On Tue, Sep 06, 2022 at 05:33:00PM +0300, Lionel Landwerlin wrote:
>>>On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote:
>>>>With GuC mode of submission, GuC is in control of defining the 
>>>>context id field
>>>>that is part of the OA reports. To filter reports, UMD and KMD 
>>>>must know what sw
>>>>context id was chosen by GuC. There is no interface between KMD 
>>>>and GuC to
>>>>determine this, so read the upper-dword of EXECLIST_STATUS to 
>>>>filter/squash OA
>>>>reports for the specific context.
>>>>
>>>>Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>>>
>>>
>>>I assume you checked with GuC that this doesn't change as the 
>>>context is running?
>>
>>Correct.
>>
>>>
>>>With i915/execlist submission mode, we had to ask i915 to pin the 
>>>sw_id/ctx_id.
>>>
>>
>>From GuC perspective, the context id can change once KMD 
>>de-registers the context and that will not happen while the context 
>>is in use.
>>
>>Thanks,
>>Umesh
>
>
>Thanks Umesh,
>
>
>Maybe I should have been more precise in my question :
>
>
>Can the ID change while the i915-perf stream is opened?
>
>Because the ID not changing while the context is running makes sense.
>
>But since the number of available IDs is limited to 2k or something on 
>Gfx12, it's possible the GuC has to reuse IDs if too many apps want to 
>run during the period of time while i915-perf is active and filtering.
>

The available GuC ids are 64k, with 4k reserved for multi-lrc, so GuC may
have to reuse ids once 60k ids are used up.

Thanks,
Umesh

>
>-Lionel
>
>
>>
>>>
>>>If that's not the case then filtering is broken.
>>>
>>>
>>>-Lionel
>>>
>>>
>>>>---
>>>> drivers/gpu/drm/i915/gt/intel_lrc.h |   2 +
>>>> drivers/gpu/drm/i915/i915_perf.c    | 141 ++++++++++++++++++++++++----
>>>> 2 files changed, 124 insertions(+), 19 deletions(-)
>>>>
>>>>diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.h 
>>>>b/drivers/gpu/drm/i915/gt/intel_lrc.h
>>>>index a390f0813c8b..7111bae759f3 100644
>>>>--- a/drivers/gpu/drm/i915/gt/intel_lrc.h
>>>>+++ b/drivers/gpu/drm/i915/gt/intel_lrc.h
>>>>@@ -110,6 +110,8 @@ enum {
>>>> #define XEHP_SW_CTX_ID_WIDTH            16
>>>> #define XEHP_SW_COUNTER_SHIFT            58
>>>> #define XEHP_SW_COUNTER_WIDTH            6
>>>>+#define GEN12_GUC_SW_CTX_ID_SHIFT        39
>>>>+#define GEN12_GUC_SW_CTX_ID_WIDTH        16
>>>> static inline void lrc_runtime_start(struct intel_context *ce)
>>>> {
>>>>diff --git a/drivers/gpu/drm/i915/i915_perf.c 
>>>>b/drivers/gpu/drm/i915/i915_perf.c
>>>>index f3c23fe9ad9c..735244a3aedd 100644
>>>>--- a/drivers/gpu/drm/i915/i915_perf.c
>>>>+++ b/drivers/gpu/drm/i915/i915_perf.c
>>>>@@ -1233,6 +1233,125 @@ static struct intel_context 
>>>>*oa_pin_context(struct i915_perf_stream *stream)
>>>>     return stream->pinned_ctx;
>>>> }
>>>>+static int
>>>>+__store_reg_to_mem(struct i915_request *rq, i915_reg_t reg, u32 
>>>>ggtt_offset)
>>>>+{
>>>>+    u32 *cs, cmd;
>>>>+
>>>>+    cmd = MI_STORE_REGISTER_MEM | MI_SRM_LRM_GLOBAL_GTT;
>>>>+    if (GRAPHICS_VER(rq->engine->i915) >= 8)
>>>>+        cmd++;
>>>>+
>>>>+    cs = intel_ring_begin(rq, 4);
>>>>+    if (IS_ERR(cs))
>>>>+        return PTR_ERR(cs);
>>>>+
>>>>+    *cs++ = cmd;
>>>>+    *cs++ = i915_mmio_reg_offset(reg);
>>>>+    *cs++ = ggtt_offset;
>>>>+    *cs++ = 0;
>>>>+
>>>>+    intel_ring_advance(rq, cs);
>>>>+
>>>>+    return 0;
>>>>+}
>>>>+
>>>>+static int
>>>>+__read_reg(struct intel_context *ce, i915_reg_t reg, u32 ggtt_offset)
>>>>+{
>>>>+    struct i915_request *rq;
>>>>+    int err;
>>>>+
>>>>+    rq = i915_request_create(ce);
>>>>+    if (IS_ERR(rq))
>>>>+        return PTR_ERR(rq);
>>>>+
>>>>+    i915_request_get(rq);
>>>>+
>>>>+    err = __store_reg_to_mem(rq, reg, ggtt_offset);
>>>>+
>>>>+    i915_request_add(rq);
>>>>+    if (!err && i915_request_wait(rq, 0, HZ / 2) < 0)
>>>>+        err = -ETIME;
>>>>+
>>>>+    i915_request_put(rq);
>>>>+
>>>>+    return err;
>>>>+}
>>>>+
>>>>+static int
>>>>+gen12_guc_sw_ctx_id(struct intel_context *ce, u32 *ctx_id)
>>>>+{
>>>>+    struct i915_vma *scratch;
>>>>+    u32 *val;
>>>>+    int err;
>>>>+
>>>>+    scratch = 
>>>>__vm_create_scratch_for_read_pinned(&ce->engine->gt->ggtt->vm, 
>>>>4);
>>>>+    if (IS_ERR(scratch))
>>>>+        return PTR_ERR(scratch);
>>>>+
>>>>+    err = i915_vma_sync(scratch);
>>>>+    if (err)
>>>>+        goto err_scratch;
>>>>+
>>>>+    err = __read_reg(ce, 
>>>>RING_EXECLIST_STATUS_HI(ce->engine->mmio_base),
>>>>+             i915_ggtt_offset(scratch));
>>>>+    if (err)
>>>>+        goto err_scratch;
>>>>+
>>>>+    val = i915_gem_object_pin_map_unlocked(scratch->obj, I915_MAP_WB);
>>>>+    if (IS_ERR(val)) {
>>>>+        err = PTR_ERR(val);
>>>>+        goto err_scratch;
>>>>+    }
>>>>+
>>>>+    *ctx_id = *val;
>>>>+    i915_gem_object_unpin_map(scratch->obj);
>>>>+
>>>>+err_scratch:
>>>>+    i915_vma_unpin_and_release(&scratch, 0);
>>>>+    return err;
>>>>+}
>>>>+
>>>>+/*
>>>>+ * For execlist mode of submission, pick an unused context id
>>>>+ * 0 - (NUM_CONTEXT_TAG -1) are used by other contexts
>>>>+ * XXX_MAX_CONTEXT_HW_ID is used by idle context
>>>>+ *
>>>>+ * For GuC mode of submission read context id from the upper 
>>>>dword of the
>>>>+ * EXECLIST_STATUS register.
>>>>+ */
>>>>+static int gen12_get_render_context_id(struct i915_perf_stream 
>>>>*stream)
>>>>+{
>>>>+    u32 ctx_id, mask;
>>>>+    int ret;
>>>>+
>>>>+    if (intel_engine_uses_guc(stream->engine)) {
>>>>+        ret = gen12_guc_sw_ctx_id(stream->pinned_ctx, &ctx_id);
>>>>+        if (ret)
>>>>+            return ret;
>>>>+
>>>>+        mask = ((1U << GEN12_GUC_SW_CTX_ID_WIDTH) - 1) <<
>>>>+            (GEN12_GUC_SW_CTX_ID_SHIFT - 32);
>>>>+    } else if (GRAPHICS_VER_FULL(stream->engine->i915) >= 
>>>>IP_VER(12, 50)) {
>>>>+        ctx_id = (XEHP_MAX_CONTEXT_HW_ID - 1) <<
>>>>+            (XEHP_SW_CTX_ID_SHIFT - 32);
>>>>+
>>>>+        mask = ((1U << XEHP_SW_CTX_ID_WIDTH) - 1) <<
>>>>+            (XEHP_SW_CTX_ID_SHIFT - 32);
>>>>+    } else {
>>>>+        ctx_id = (GEN12_MAX_CONTEXT_HW_ID - 1) <<
>>>>+             (GEN11_SW_CTX_ID_SHIFT - 32);
>>>>+
>>>>+        mask = ((1U << GEN11_SW_CTX_ID_WIDTH) - 1) <<
>>>>+            (GEN11_SW_CTX_ID_SHIFT - 32);
>>>>+    }
>>>>+    stream->specific_ctx_id = ctx_id & mask;
>>>>+    stream->specific_ctx_id_mask = mask;
>>>>+
>>>>+    return 0;
>>>>+}
>>>>+
>>>> /**
>>>>  * oa_get_render_ctx_id - determine and hold ctx hw id
>>>>  * @stream: An i915-perf stream opened for OA metrics
>>>>@@ -1246,6 +1365,7 @@ static struct intel_context 
>>>>*oa_pin_context(struct i915_perf_stream *stream)
>>>> static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
>>>> {
>>>>     struct intel_context *ce;
>>>>+    int ret = 0;
>>>>     ce = oa_pin_context(stream);
>>>>     if (IS_ERR(ce))
>>>>@@ -1292,24 +1412,7 @@ static int oa_get_render_ctx_id(struct 
>>>>i915_perf_stream *stream)
>>>>     case 11:
>>>>     case 12:
>>>>-        if (GRAPHICS_VER_FULL(ce->engine->i915) >= IP_VER(12, 50)) {
>>>>-            stream->specific_ctx_id_mask =
>>>>-                ((1U << XEHP_SW_CTX_ID_WIDTH) - 1) <<
>>>>-                (XEHP_SW_CTX_ID_SHIFT - 32);
>>>>-            stream->specific_ctx_id =
>>>>-                (XEHP_MAX_CONTEXT_HW_ID - 1) <<
>>>>-                (XEHP_SW_CTX_ID_SHIFT - 32);
>>>>-        } else {
>>>>-            stream->specific_ctx_id_mask =
>>>>-                ((1U << GEN11_SW_CTX_ID_WIDTH) - 1) << 
>>>>(GEN11_SW_CTX_ID_SHIFT - 32);
>>>>-            /*
>>>>-             * Pick an unused context id
>>>>-             * 0 - BITS_PER_LONG are used by other contexts
>>>>-             * GEN12_MAX_CONTEXT_HW_ID (0x7ff) is used by idle context
>>>>-             */
>>>>-            stream->specific_ctx_id =
>>>>-                (GEN12_MAX_CONTEXT_HW_ID - 1) << 
>>>>(GEN11_SW_CTX_ID_SHIFT - 32);
>>>>-        }
>>>>+        ret = gen12_get_render_context_id(stream);
>>>>         break;
>>>>     default:
>>>>@@ -1323,7 +1426,7 @@ static int oa_get_render_ctx_id(struct 
>>>>i915_perf_stream *stream)
>>>>         stream->specific_ctx_id,
>>>>         stream->specific_ctx_id_mask);
>>>>-    return 0;
>>>>+    return ret;
>>>> }
>>>> /**
>>>
>>>
>


* Re: [Intel-gfx] [PATCH 01/19] drm/i915/perf: Fix OA filtering logic for GuC mode
  2022-09-14 22:26         ` Umesh Nerlige Ramappa
@ 2022-09-14 23:13           ` Umesh Nerlige Ramappa
  2022-09-15 22:49             ` Umesh Nerlige Ramappa
  2022-09-22 11:05             ` Lionel Landwerlin
  0 siblings, 2 replies; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-14 23:13 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

On Wed, Sep 14, 2022 at 03:26:15PM -0700, Umesh Nerlige Ramappa wrote:
>On Tue, Sep 06, 2022 at 09:39:33PM +0300, Lionel Landwerlin wrote:
>>On 06/09/2022 20:39, Umesh Nerlige Ramappa wrote:
>>>On Tue, Sep 06, 2022 at 05:33:00PM +0300, Lionel Landwerlin wrote:
>>>>On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote:
>>>>>With GuC mode of submission, GuC is in control of defining the 
>>>>>context id field
>>>>>that is part of the OA reports. To filter reports, UMD and KMD 
>>>>>must know what sw
>>>>>context id was chosen by GuC. There is no interface between 
>>>>>KMD and GuC to
>>>>>determine this, so read the upper-dword of EXECLIST_STATUS to 
>>>>>filter/squash OA
>>>>>reports for the specific context.
>>>>>
>>>>>Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>>>>
>>>>
>>>>I assume you checked with GuC that this doesn't change as the 
>>>>context is running?
>>>
>>>Correct.
>>>
>>>>
>>>>With i915/execlist submission mode, we had to ask i915 to pin 
>>>>the sw_id/ctx_id.
>>>>
>>>
>>>From GuC perspective, the context id can change once KMD 
>>>de-registers the context and that will not happen while the 
>>>context is in use.
>>>
>>>Thanks,
>>>Umesh
>>
>>
>>Thanks Umesh,
>>
>>
>>Maybe I should have been more precise in my question :
>>
>>
>>Can the ID change while the i915-perf stream is opened?
>>
>>Because the ID not changing while the context is running makes sense.
>>
>>But since the number of available IDs is limited to 2k or something 
>>on Gfx12, it's possible the GuC has to reuse IDs if too many apps 
>>want to run during the period of time while i915-perf is active and 
>>filtering.
>>
>
>available guc ids are 64k with 4k reserved for multi-lrc, so GuC may 
>have to reuse ids once 60k ids are used up.

I spoke to the GuC team again: if there are a lot of contexts (> 60K)
running, the context id may be recycled. In that case, the capture would
be broken. I will track this as a separate JIRA and follow up on a
solution.
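
To make the failure mode concrete, here is a minimal sketch of mask-based report filtering. The shift and width are the GEN12_GUC_SW_CTX_ID_* values from the patch; the function names are hypothetical, and the assumption that a report's context-id dword is compared after masking is mine, not a statement about the exact i915 code path:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values from the patch under review (intel_lrc.h hunk). */
#define GEN12_GUC_SW_CTX_ID_SHIFT 39
#define GEN12_GUC_SW_CTX_ID_WIDTH 16

/* Mask for the sw context id within the upper dword of EXECLIST_STATUS. */
static uint32_t guc_ctx_id_mask(void)
{
	return ((1U << GEN12_GUC_SW_CTX_ID_WIDTH) - 1) <<
	       (GEN12_GUC_SW_CTX_ID_SHIFT - 32);
}

/*
 * Hypothetical filter: a report is attributed to the monitored context when
 * its masked id equals the id captured at stream open.
 */
static bool oa_report_matches(uint32_t report_ctx_id, uint32_t specific_ctx_id)
{
	uint32_t mask = guc_ctx_id_mask();

	/* The stored id is expected to be pre-masked. */
	assert((specific_ctx_id & ~mask) == 0);

	return (report_ctx_id & mask) == specific_ctx_id;
}
```

Because the comparison looks only at the masked id, a second context that is handed the same id after recycling would pass this check as well, which is why the capture would be broken.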

From an OA use-case perspective, are we interested in monitoring just one
hardware context? If we make sure this context is not stolen, are we
good?

Thanks,
Umesh

>
>Thanks,
>Umesh
>
>>
>>-Lionel
>>
>>
>>>
>>>>
>>>>If that's not the case then filtering is broken.
>>>>
>>>>
>>>>-Lionel
>>>>
>>>>
>>>>
>>>>
>>


* Re: [Intel-gfx] [PATCH 01/19] drm/i915/perf: Fix OA filtering logic for GuC mode
  2022-09-09 23:47   ` Dixit, Ashutosh
  2022-09-13  3:08     ` Dixit, Ashutosh
@ 2022-09-14 23:36     ` Umesh Nerlige Ramappa
  2022-09-22  3:44     ` Dixit, Ashutosh
  2 siblings, 0 replies; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-14 23:36 UTC (permalink / raw)
  To: Dixit, Ashutosh; +Cc: intel-gfx

On Fri, Sep 09, 2022 at 04:47:36PM -0700, Dixit, Ashutosh wrote:
>On Tue, 23 Aug 2022 13:41:37 -0700, Umesh Nerlige Ramappa wrote:
>>
>
>Hi Umesh,
>
>> With GuC mode of submission, GuC is in control of defining the context id field
>> that is part of the OA reports. To filter reports, UMD and KMD must know what sw
>> context id was chosen by GuC. There is no interface between KMD and GuC to
>> determine this, so read the upper-dword of EXECLIST_STATUS to filter/squash OA
>> reports for the specific context.
>
>Do you think it is worth defining an interface for GuC to return the sw
>ctx_id it will be using for a ctx, say at ctx registration time?
>
>The scheme implemented in this patch to read the ctx_id is certainly very
>clever, at least to me. But as Lionel was saying, is it an agreed-upon
>immutable interface? If it is, we can go with this patch.
>
>(Though even then we will need to maintain this code even if in the future
>GuC FW is changed to return the ctx_id in order to preserve backwards
>compatibility with previous GuC versions. So maybe better to have a real
>interface between GuC and KMD earlier rather than later?).

Agreed. Ideally this should be obtained from GuC and properly
synchronized with KMD, or GuC should provide a way to pin the context id
for such cases so that the id is not stolen/unpinned. Either way, we need
to follow this up as a JIRA.

I may drop this patch and add a message that OA buffer filtering may be
broken if a gem context is passed.

>
>Also a couple of general comments below.
>
>>
>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>> ---
>>  drivers/gpu/drm/i915/gt/intel_lrc.h |   2 +
>>  drivers/gpu/drm/i915/i915_perf.c    | 141 ++++++++++++++++++++++++----
>>  2 files changed, 124 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.h b/drivers/gpu/drm/i915/gt/intel_lrc.h
>> index a390f0813c8b..7111bae759f3 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.h
>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.h
>> @@ -110,6 +110,8 @@ enum {
>>  #define XEHP_SW_CTX_ID_WIDTH			16
>>  #define XEHP_SW_COUNTER_SHIFT			58
>>  #define XEHP_SW_COUNTER_WIDTH			6
>> +#define GEN12_GUC_SW_CTX_ID_SHIFT		39
>> +#define GEN12_GUC_SW_CTX_ID_WIDTH		16
>>
>>  static inline void lrc_runtime_start(struct intel_context *ce)
>>  {
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>> index f3c23fe9ad9c..735244a3aedd 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -1233,6 +1233,125 @@ static struct intel_context *oa_pin_context(struct i915_perf_stream *stream)
>>	return stream->pinned_ctx;
>>  }
>>
>> +static int
>> +__store_reg_to_mem(struct i915_request *rq, i915_reg_t reg, u32 ggtt_offset)
>> +{
>> +	u32 *cs, cmd;
>> +
>> +	cmd = MI_STORE_REGISTER_MEM | MI_SRM_LRM_GLOBAL_GTT;
>> +	if (GRAPHICS_VER(rq->engine->i915) >= 8)
>> +		cmd++;
>> +
>> +	cs = intel_ring_begin(rq, 4);
>> +	if (IS_ERR(cs))
>> +		return PTR_ERR(cs);
>> +
>> +	*cs++ = cmd;
>> +	*cs++ = i915_mmio_reg_offset(reg);
>> +	*cs++ = ggtt_offset;
>> +	*cs++ = 0;
>> +
>> +	intel_ring_advance(rq, cs);
>> +
>> +	return 0;
>> +}
>> +
>> +static int
>> +__read_reg(struct intel_context *ce, i915_reg_t reg, u32 ggtt_offset)
>> +{
>> +	struct i915_request *rq;
>> +	int err;
>> +
>> +	rq = i915_request_create(ce);
>> +	if (IS_ERR(rq))
>> +		return PTR_ERR(rq);
>> +
>> +	i915_request_get(rq);
>> +
>> +	err = __store_reg_to_mem(rq, reg, ggtt_offset);
>> +
>> +	i915_request_add(rq);
>> +	if (!err && i915_request_wait(rq, 0, HZ / 2) < 0)
>> +		err = -ETIME;
>> +
>> +	i915_request_put(rq);
>> +
>> +	return err;
>> +}
>> +
>> +static int
>> +gen12_guc_sw_ctx_id(struct intel_context *ce, u32 *ctx_id)
>> +{
>> +	struct i915_vma *scratch;
>> +	u32 *val;
>> +	int err;
>> +
>> +	scratch = __vm_create_scratch_for_read_pinned(&ce->engine->gt->ggtt->vm, 4);
>> +	if (IS_ERR(scratch))
>> +		return PTR_ERR(scratch);
>> +
>> +	err = i915_vma_sync(scratch);
>> +	if (err)
>> +		goto err_scratch;
>> +
>> +	err = __read_reg(ce, RING_EXECLIST_STATUS_HI(ce->engine->mmio_base),
>> +			 i915_ggtt_offset(scratch));
>
>Actually the RING_EXECLIST_STATUS_HI is MMIO so can be read using say
>ENGINE_READ/intel_uncore_read. The only issue is how to read it when this
>ctx is scheduled which is cleverly solved by the scheme above. But I am not
>sure if there is any other simpler way to do it.
>
>> +	if (err)
>> +		goto err_scratch;
>> +
>> +	val = i915_gem_object_pin_map_unlocked(scratch->obj, I915_MAP_WB);
>> +	if (IS_ERR(val)) {
>> +		err = PTR_ERR(val);
>> +		goto err_scratch;
>> +	}
>> +
>> +	*ctx_id = *val;
>> +	i915_gem_object_unpin_map(scratch->obj);
>> +
>> +err_scratch:
>> +	i915_vma_unpin_and_release(&scratch, 0);
>> +	return err;
>> +}
>> +
>> +/*
>> + * For execlist mode of submission, pick an unused context id
>> + * 0 - (NUM_CONTEXT_TAG -1) are used by other contexts
>> + * XXX_MAX_CONTEXT_HW_ID is used by idle context
>> + *
>> + * For GuC mode of submission read context id from the upper dword of the
>> + * EXECLIST_STATUS register.
>> + */
>> +static int gen12_get_render_context_id(struct i915_perf_stream *stream)
>> +{
>> +	u32 ctx_id, mask;
>> +	int ret;
>> +
>> +	if (intel_engine_uses_guc(stream->engine)) {
>> +		ret = gen12_guc_sw_ctx_id(stream->pinned_ctx, &ctx_id);
>> +		if (ret)
>> +			return ret;
>> +
>> +		mask = ((1U << GEN12_GUC_SW_CTX_ID_WIDTH) - 1) <<
>> +			(GEN12_GUC_SW_CTX_ID_SHIFT - 32);
>> +	} else if (GRAPHICS_VER_FULL(stream->engine->i915) >= IP_VER(12, 50)) {
>> +		ctx_id = (XEHP_MAX_CONTEXT_HW_ID - 1) <<
>> +			(XEHP_SW_CTX_ID_SHIFT - 32);
>> +
>> +		mask = ((1U << XEHP_SW_CTX_ID_WIDTH) - 1) <<
>> +			(XEHP_SW_CTX_ID_SHIFT - 32);
>> +	} else {
>> +		ctx_id = (GEN12_MAX_CONTEXT_HW_ID - 1) <<
>> +			 (GEN11_SW_CTX_ID_SHIFT - 32);
>> +
>> +		mask = ((1U << GEN11_SW_CTX_ID_WIDTH) - 1) <<
>> +			(GEN11_SW_CTX_ID_SHIFT - 32);
>
>Previously I missed that these ctx_id's for non-GuC cases are just
>constants. How does it work in these cases?

In those cases we use a fixed id for the OA use case:

in gen12_get_render_context_id()
stream->specific_ctx_id = ctx_id & mask

in oa_get_render_ctx_id()
ce->tag = stream->specific_ctx_id;

in __execlists_schedule_in()
ce->lrc.ccid = ce->tag;
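
As a worked example of the fixed-id computation for the pre-XEHP execlists branch: the three constants below are assumptions that do not appear in this thread (only the 0x7ff value is hinted at by the removed comment in the patch), and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, not shown in this thread; verify against the i915 headers. */
#define GEN11_SW_CTX_ID_SHIFT   37
#define GEN11_SW_CTX_ID_WIDTH   11
#define GEN12_MAX_CONTEXT_HW_ID 0x7ffU

/* Hypothetical container for the id/mask pair stored on the stream. */
struct fixed_oa_id {
	uint32_t id;
	uint32_t mask;
};

/* Mirrors the final else-branch of gen12_get_render_context_id(). */
static struct fixed_oa_id execlists_fixed_oa_id(void)
{
	struct fixed_oa_id r;

	r.mask = ((1U << GEN11_SW_CTX_ID_WIDTH) - 1) <<
		 (GEN11_SW_CTX_ID_SHIFT - 32);
	r.id = ((GEN12_MAX_CONTEXT_HW_ID - 1) <<
		(GEN11_SW_CTX_ID_SHIFT - 32)) & r.mask;

	/* The id must fit entirely within the masked field. */
	assert((r.id & ~r.mask) == 0);

	return r;
}
```

Under those assumed constants the mask is 0xffe0 and the fixed id is 0xffc0; that id is what would end up in ce->tag and then in ce->lrc.ccid.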

Thanks,
Umesh

>
>Thanks.
>--
>Ashutosh


* Re: [Intel-gfx] [PATCH 01/19] drm/i915/perf: Fix OA filtering logic for GuC mode
  2022-09-13  3:08     ` Dixit, Ashutosh
@ 2022-09-14 23:37       ` Umesh Nerlige Ramappa
  0 siblings, 0 replies; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-14 23:37 UTC (permalink / raw)
  To: Dixit, Ashutosh; +Cc: intel-gfx

On Mon, Sep 12, 2022 at 08:08:33PM -0700, Dixit, Ashutosh wrote:
>On Fri, 09 Sep 2022 16:47:36 -0700, Dixit, Ashutosh wrote:
>>
>> On Tue, 23 Aug 2022 13:41:37 -0700, Umesh Nerlige Ramappa wrote:
>> >
>>
>> Hi Umesh,
>>
>> > With GuC mode of submission, GuC is in control of defining the context id field
>> > that is part of the OA reports. To filter reports, UMD and KMD must know what sw
>> > context id was chosen by GuC. There is no interface between KMD and GuC to
>> > determine this, so read the upper-dword of EXECLIST_STATUS to filter/squash OA
>> > reports for the specific context.
>>
>> Do you think it is worth defining an interface for GuC to return the sw
>> ctx_id it will be using for a ctx, say at ctx registration time?
>
>Umesh, I came across these in GuC documentation:
>
>guc_pcv1_context_parameters_set_h2g_data_t::context_id
>guc_pcv2_context_parameters_set_h2g_data_t::context_id
>
>Also in the code we have in prepare_context_registration_info_v70 'ctx_id =
>ce->guc_id.id' which seems to be assigned in new_guc_id. So wondering if
>this is what we need and we already have it?

This id is different from what GuC programs into the lrca.

Thanks,
Umesh
>
>Thanks.
>--
>Ashutosh


* Re: [Intel-gfx] [PATCH 05/19] drm/i915/perf: Enable commands per clock reporting in OA
  2022-09-14  0:19   ` Dixit, Ashutosh
@ 2022-09-15  0:04     ` Umesh Nerlige Ramappa
  0 siblings, 0 replies; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-15  0:04 UTC (permalink / raw)
  To: Dixit, Ashutosh; +Cc: intel-gfx

On Tue, Sep 13, 2022 at 05:19:24PM -0700, Dixit, Ashutosh wrote:
>On Tue, 23 Aug 2022 13:41:41 -0700, Umesh Nerlige Ramappa wrote:
>>
>
>Hi Umesh,
>
>> XEHPSDV and DG2 provide a way to configure bytes per clock vs commands
>> per clock reporting. Enable command per clock setting on enabling OA.

This should read: Enable bytes per clock setting
>
>What is the reason for selecting commands per clock vs bytes per clock?
>Also probably mention Bspec: 51762 in the commit message too.

It's a default configuration used to interpret the A36/A37 counters; see
Bspec: 52201.

>
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>> index efa7eda83edd..6fc4f0d8fc5a 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -2745,10 +2745,12 @@ static int
>>  gen12_enable_metric_set(struct i915_perf_stream *stream,
>>			struct i915_active *active)
>>  {
>> +	struct drm_i915_private *i915 = stream->perf->i915;
>>	struct intel_uncore *uncore = stream->uncore;
>>	struct i915_oa_config *oa_config = stream->oa_config;
>>	bool periodic = stream->periodic;
>>	u32 period_exponent = stream->period_exponent;
>> +	u32 sqcnt1;
>>	int ret;
>>
>>	intel_uncore_write(uncore, GEN12_OAG_OA_DEBUG,
>> @@ -2767,6 +2769,16 @@ gen12_enable_metric_set(struct i915_perf_stream *stream,
>>			    (period_exponent << GEN12_OAG_OAGLBCTXCTRL_TIMER_PERIOD_SHIFT))
>>			    : 0);
>>
>> +	/*
>> +	 * Initialize Super Queue Internal Cnt Register
>> +	 * Set PMON Enable in order to collect valid metrics.
>> +	 * Enable commands per clock reporting in OA for XEHPSDV onward.
>> +	 */
>> +	sqcnt1 = GEN12_SQCNT1_PMON_ENABLE |
>> +		 (HAS_OA_BPC_REPORTING(i915) ? GEN12_SQCNT1_OABPC : 0);
>
>Also from Bspec 0:Unitsof4cmd and 1:Unitsof128B so looks like bit 29 should
>be set to 0 for commands per clock setting? Or I am wrong?

I know bit 29 has to be set for DG2. I think the commit message is
wrong. Nice catch, thanks.
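
For reference, a small sketch of what the enable and disable paths do to SQCNT1. The bit positions are from the patch; `REG_BIT` here is a simplified stand-in for the kernel macro, and `rmw()` is a hypothetical model of the clear-then-set semantics of intel_uncore_rmw() operating on a plain value rather than on hardware:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's REG_BIT(). */
#define REG_BIT(n) (1U << (n))

/* Bit definitions from the patch (i915_perf_oa_regs.h hunk). */
#define GEN12_SQCNT1_PMON_ENABLE REG_BIT(30)
#define GEN12_SQCNT1_OABPC       REG_BIT(29) /* 1: units of 128B per the review */

/* Hypothetical model of a read-modify-write: clear bits first, then set. */
static uint32_t rmw(uint32_t old, uint32_t clear, uint32_t set)
{
	return (old & ~clear) | set;
}

/* Bits toggled by the patch, depending on HAS_OA_BPC_REPORTING(). */
static uint32_t sqcnt1_bits(int has_oa_bpc_reporting)
{
	return GEN12_SQCNT1_PMON_ENABLE |
	       (has_oa_bpc_reporting ? GEN12_SQCNT1_OABPC : 0);
}
```

On a platform with BPC reporting, enabling sets 0x60000000 (bits 30 and 29, i.e. PMON enable plus bytes-per-clock units), and the disable path clears the same bits while leaving the rest of the register untouched.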

>
>> +
>> +	intel_uncore_rmw(uncore, GEN12_SQCNT1, 0, sqcnt1);
>> +
>>	/*
>>	 * Update all contexts prior writing the mux configurations as we need
>>	 * to make sure all slices/subslices are ON before writing to NOA
>> @@ -2816,6 +2828,8 @@ static void gen11_disable_metric_set(struct i915_perf_stream *stream)
>>  static void gen12_disable_metric_set(struct i915_perf_stream *stream)
>>  {
>>	struct intel_uncore *uncore = stream->uncore;
>> +	struct drm_i915_private *i915 = stream->perf->i915;
>> +	u32 sqcnt1;
>>
>>	/* Reset all contexts' slices/subslices configurations. */
>>	gen12_configure_all_contexts(stream, NULL, NULL);
>> @@ -2826,6 +2840,12 @@ static void gen12_disable_metric_set(struct i915_perf_stream *stream)
>>
>>	/* Make sure we disable noa to save power. */
>>	intel_uncore_rmw(uncore, RPM_CONFIG1, GEN10_GT_NOA_ENABLE, 0);
>> +
>> +	sqcnt1 = GEN12_SQCNT1_PMON_ENABLE |
>> +		 (HAS_OA_BPC_REPORTING(i915) ? GEN12_SQCNT1_OABPC : 0);
>> +
>> +	/* Reset PMON Enable to save power. */
>> +	intel_uncore_rmw(uncore, GEN12_SQCNT1, sqcnt1, 0);
>>  }
>>
>>  static void gen7_oa_enable(struct i915_perf_stream *stream)
>> diff --git a/drivers/gpu/drm/i915/i915_perf_oa_regs.h b/drivers/gpu/drm/i915/i915_perf_oa_regs.h
>> index 0ef3562ff4aa..381d94101610 100644
>> --- a/drivers/gpu/drm/i915/i915_perf_oa_regs.h
>> +++ b/drivers/gpu/drm/i915/i915_perf_oa_regs.h
>> @@ -134,4 +134,8 @@
>>  #define GDT_CHICKEN_BITS    _MMIO(0x9840)
>>  #define   GT_NOA_ENABLE	    0x00000080
>>
>> +#define GEN12_SQCNT1				_MMIO(0x8718)
>> +#define   GEN12_SQCNT1_PMON_ENABLE		REG_BIT(30)
>> +#define   GEN12_SQCNT1_OABPC			REG_BIT(29)
>> +
>>  #endif /* __INTEL_PERF_OA_REGS__ */


* Re: [Intel-gfx] [PATCH 01/19] drm/i915/perf: Fix OA filtering logic for GuC mode
  2022-09-14 23:13           ` Umesh Nerlige Ramappa
@ 2022-09-15 22:49             ` Umesh Nerlige Ramappa
  2022-09-20  3:22               ` Dixit, Ashutosh
  2022-09-22 11:05             ` Lionel Landwerlin
  1 sibling, 1 reply; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-15 22:49 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

On Wed, Sep 14, 2022 at 04:13:41PM -0700, Umesh Nerlige Ramappa wrote:
>On Wed, Sep 14, 2022 at 03:26:15PM -0700, Umesh Nerlige Ramappa wrote:
>>On Tue, Sep 06, 2022 at 09:39:33PM +0300, Lionel Landwerlin wrote:
>>>On 06/09/2022 20:39, Umesh Nerlige Ramappa wrote:
>>>>On Tue, Sep 06, 2022 at 05:33:00PM +0300, Lionel Landwerlin wrote:
>>>>>On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote:
>>>>>>With GuC mode of submission, GuC is in control of defining 
>>>>>>the context id field
>>>>>>that is part of the OA reports. To filter reports, UMD and 
>>>>>>KMD must know what sw
>>>>>>context id was chosen by GuC. There is no interface between 
>>>>>>KMD and GuC to
>>>>>>determine this, so read the upper-dword of EXECLIST_STATUS 
>>>>>>to filter/squash OA
>>>>>>reports for the specific context.
>>>>>>
>>>>>>Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>>>>>
>>>>>
>>>>>I assume you checked with GuC that this doesn't change as the 
>>>>>context is running?
>>>>
>>>>Correct.
>>>>
>>>>>
>>>>>With i915/execlist submission mode, we had to ask i915 to pin 
>>>>>the sw_id/ctx_id.
>>>>>
>>>>
>>>>From GuC perspective, the context id can change once KMD 
>>>>de-registers the context and that will not happen while the 
>>>>context is in use.
>>>>
>>>>Thanks,
>>>>Umesh
>>>
>>>
>>>Thanks Umesh,
>>>
>>>
>>>Maybe I should have been more precise in my question :
>>>
>>>
>>>Can the ID change while the i915-perf stream is opened?
>>>
>>>Because the ID not changing while the context is running makes sense.
>>>
>>>But since the number of available IDs is limited to 2k or 
>>>something on Gfx12, it's possible the GuC has to reuse IDs if too 
>>>many apps want to run during the period of time while i915-perf is 
>>>active and filtering.
>>>
>>
>>available guc ids are 64k with 4k reserved for multi-lrc, so GuC may 
>>have to reuse ids once 60k ids are used up.
>
>Spoke to the GuC team again and if there are a lot of contexts (> 60K) 
>running, there is a possibility of the context id being recycled. In 
>that case, the capture would be broken. I would track this as a 
>separate JIRA and follow up on a solution.
>
>From OA use case perspective, are we interested in monitoring just one 
>hardware context? If we make sure this context is not stolen, are we 
>good?
>

+ John

Based on John's inputs - if a context is pinned, then KMD does not steal 
its id. It would just look for something else or wait for a context to 
be available (pin count 0 I believe).

Since we pin the context for the duration of the OA use case, we should 
be good here.

Thanks,
Umesh

>Thanks,
>Umesh
>
>>
>>Thanks,
>>Umesh
>>
>>>
>>>-Lionel
>>>
>>>
>>>>
>>>>>
>>>>>If that's not the case then filtering is broken.
>>>>>
>>>>>
>>>>>-Lionel
>>>>>
>>>>>


* Re: [Intel-gfx] [PATCH 12/19] drm/i915/perf: Parse 64bit report header formats correctly
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 12/19] drm/i915/perf: Parse 64bit report header formats correctly Umesh Nerlige Ramappa
@ 2022-09-16  0:47   ` Dixit, Ashutosh
  0 siblings, 0 replies; 84+ messages in thread
From: Dixit, Ashutosh @ 2022-09-16  0:47 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Tue, 23 Aug 2022 13:41:48 -0700, Umesh Nerlige Ramappa wrote:
>

Hi Umesh,

> @@ -740,23 +802,19 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>		u8 *report = oa_buf_base + head;
>		u32 *report32 = (void *)report;
>		u32 ctx_id;
> -		u32 reason;
> +		u64 reason;
>
>		/*
>		 * The reason field includes flags identifying what
>		 * triggered this specific report (mostly timer
>		 * triggered or e.g. due to a context switch).
>		 *
> -		 * This field is never expected to be zero so we can
> -		 * check that the report isn't invalid before copying
> -		 * it to userspace...
> +		 * In MMIO triggered reports, some platforms do not set the
> +		 * reason bit in this field and it is valid to have a reason
> +		 * field of zero.
>		 */
> -		reason = ((report32[0] >> OAREPORT_REASON_SHIFT) &
> -			  (GRAPHICS_VER(stream->perf->i915) == 12 ?
> -			   OAREPORT_REASON_MASK_EXTENDED :
> -			   OAREPORT_REASON_MASK));
> -
> -		ctx_id = report32[2] & stream->specific_ctx_id_mask;
> +		reason = oa_report_reason(stream, report);
> +		ctx_id = oa_context_id(stream, report32);
>
>		/*
>		 * Squash whatever is in the CTX_ID field if it's marked as
> @@ -766,9 +824,10 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>		 * Note: that we don't clear the valid_ctx_bit so userspace can
>		 * understand that the ID has been squashed by the kernel.
>		 */
> -		if (!(report32[0] & stream->perf->gen8_valid_ctx_bit) &&
> -		    GRAPHICS_VER(stream->perf->i915) <= 11)
> -			ctx_id = report32[2] = INVALID_CTX_ID;
> +		if (oa_report_ctx_invalid(stream, report)) {
> +			ctx_id = INVALID_CTX_ID;
> +			oa_context_id_squash(stream, report32);
> +		}
>
>		/*
>		 * NB: For Gen 8 the OA unit no longer supports clock gating
> @@ -812,7 +871,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>			 */
>			if (stream->ctx &&
>			    stream->specific_ctx_id != ctx_id) {
> -				report32[2] = INVALID_CTX_ID;
> +				oa_context_id_squash(stream, report32);
>			}
>
>			ret = append_oa_sample(stream, buf, count, offset,
> @@ -824,11 +883,11 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>		}
>
>		/*
> -		 * Clear out the first 2 dword as a mean to detect unlanded
> +		 * Clear out the report id and timestamp as a means to detect unlanded
>		 * reports.
>		 */
> -		report32[0] = 0;
> -		report32[1] = 0;
> +		oa_report_id_clear(stream, report32);
> +		oa_timestamp_clear(stream, report32);

Because we now have these new functions, why do we now need two pointers
report and report32 pointing to the same location? I think we can just have
a single 'void *report' which we can pass into all these functions,
correct?

With this change, this is:

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
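
The single-pointer suggestion above could look roughly like the sketch
below. This is not the patch's code: report_id() and report_id_clear() are
hypothetical helper names, HDR_64_BIT is assumed as the counterpart of the
HDR_32_BIT value that appears later in this series, and the report id is
assumed to sit at the start of the report.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical header width selector; only HDR_32_BIT appears in the series. */
enum hdr_type { HDR_32_BIT, HDR_64_BIT };

/*
 * Read the report id through one untyped pointer. memcpy() avoids
 * casting the same buffer to both u8 * and u32 *.
 */
static uint64_t report_id(enum hdr_type hdr, const void *report)
{
	if (hdr == HDR_64_BIT) {
		uint64_t id;

		memcpy(&id, report, sizeof(id));
		return id;
	} else {
		uint32_t id;

		memcpy(&id, report, sizeof(id));
		return id;
	}
}

/* Zero the report id field, whose width depends on the header type. */
static void report_id_clear(enum hdr_type hdr, void *report)
{
	memset(report, 0,
	       hdr == HDR_64_BIT ? sizeof(uint64_t) : sizeof(uint32_t));
}
```

With helpers shaped like this, the caller only needs one 'void *report'
and the width decision stays inside the helpers.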


* Re: [Intel-gfx] [PATCH 13/19] drm/i915/perf: Add Wa_16010703925:dg2
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 13/19] drm/i915/perf: Add Wa_16010703925:dg2 Umesh Nerlige Ramappa
@ 2022-09-16  1:08   ` Dixit, Ashutosh
  0 siblings, 0 replies; 84+ messages in thread
From: Dixit, Ashutosh @ 2022-09-16  1:08 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Tue, 23 Aug 2022 13:41:49 -0700, Umesh Nerlige Ramappa wrote:
>
> On DG2 A0, the OAR report format is buggy. Workaround is to not use it
> for A0. For A0, remove the OAR format from the bitmask of supported
> formats.

Are we going to support A0 upstream? If we are, this is:

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>

>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_perf.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 167e7355980a..a28f07923d8f 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -4741,6 +4741,11 @@ static void oa_init_supported_formats(struct i915_perf *perf)
>	default:
>		MISSING_CASE(platform);
>	}
> +
> +	if (IS_DG2_GRAPHICS_STEP(i915, G10, STEP_A0, STEP_B0)) {
> +		/* Wa_16010703925:dg2 */
> +		clear_bit(I915_OAR_FORMAT_A36u64_B8_C8, perf->format_mask);
> +	}
>  }
>
>  static void i915_perf_init_info(struct drm_i915_private *i915)
> --
> 2.25.1
>


* Re: [Intel-gfx] [PATCH 14/19] drm/i915/perf: Add Wa_1608133521:dg2
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 14/19] drm/i915/perf: Add Wa_1608133521:dg2 Umesh Nerlige Ramappa
  2022-08-29 14:04   ` Jani Nikula
@ 2022-09-16  1:21   ` Dixit, Ashutosh
  2022-09-16 18:19     ` Umesh Nerlige Ramappa
  1 sibling, 1 reply; 84+ messages in thread
From: Dixit, Ashutosh @ 2022-09-16  1:21 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Tue, 23 Aug 2022 13:41:50 -0700, Umesh Nerlige Ramappa wrote:
>
> DG2 introduces 64 bit counters and OA reports that have 64 bit values
> for fields in the report header - report_id, timestamp, context_id and
> gpu ticks. i915 uses report_id, timestamp and context_id to check for
> valid reports.
>
> In some DG2 variants, only the lower dwords for timestamp, report_id and
> context_id are accessible. Add workaround for such reports.

Once again, if we are productizing A-step or it is going to be in CI
upstream, this is:

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>

> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_perf.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index a28f07923d8f..a858ce57e465 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -310,7 +310,7 @@ static u32 i915_oa_max_sample_rate = 100000;
>   * be used as a mask to align the OA tail pointer. In some of the
>   * formats, R is used to denote reserved field.
>   */
> -static const struct i915_oa_format oa_formats[I915_OA_FORMAT_MAX] = {
> +static struct i915_oa_format oa_formats[I915_OA_FORMAT_MAX] = {
>	[I915_OA_FORMAT_A13]	    = { 0, 64 },
>	[I915_OA_FORMAT_A29]	    = { 1, 128 },
>	[I915_OA_FORMAT_A13_B8_C8]  = { 2, 128 },
> @@ -4746,6 +4746,13 @@ static void oa_init_supported_formats(struct i915_perf *perf)
>		/* Wa_16010703925:dg2 */
>		clear_bit(I915_OAR_FORMAT_A36u64_B8_C8, perf->format_mask);
>	}
> +
> +	if (IS_DG2_GRAPHICS_STEP(i915, G10, STEP_A0, STEP_B0) ||
> +	    IS_DG2_GRAPHICS_STEP(i915, G11, STEP_A0, STEP_FOREVER)) {
> +		/* Wa_1608133521:dg2 */
> +		oa_formats[I915_OAR_FORMAT_A36u64_B8_C8].header = HDR_32_BIT;
> +		oa_formats[I915_OA_FORMAT_A38u64_R2u64_B8_C8].header = HDR_32_BIT;
> +	}
>  }
>
>  static void i915_perf_init_info(struct drm_i915_private *i915)
> --
> 2.25.1
>


* Re: [Intel-gfx] [PATCH 15/19] drm/i915/perf: Add Wa_1508761755:dg2
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 15/19] drm/i915/perf: Add Wa_1508761755:dg2 Umesh Nerlige Ramappa
@ 2022-09-16  1:34   ` Dixit, Ashutosh
  0 siblings, 0 replies; 84+ messages in thread
From: Dixit, Ashutosh @ 2022-09-16  1:34 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Tue, 23 Aug 2022 13:41:51 -0700, Umesh Nerlige Ramappa wrote:
>
> Disable Clock gating in EU when gathering the events so that EU events
> are not lost.

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>


* Re: [Intel-gfx] [PATCH 16/19] drm/i915/perf: Apply Wa_18013179988
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 16/19] drm/i915/perf: Apply Wa_18013179988 Umesh Nerlige Ramappa
@ 2022-09-16  5:16   ` Dixit, Ashutosh
  2022-09-16 15:22     ` Dixit, Ashutosh
  2022-09-16 18:56     ` Umesh Nerlige Ramappa
  0 siblings, 2 replies; 84+ messages in thread
From: Dixit, Ashutosh @ 2022-09-16  5:16 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Tue, 23 Aug 2022 13:41:52 -0700, Umesh Nerlige Ramappa wrote:
>

Hi Umesh,

> OA reports in the OA buffer contain an OA timestamp field that helps
> user calculate delta between 2 OA reports. The calculation relies on the
> CS timestamp frequency to convert the timestamp value to nanoseconds.
> The CS timestamp frequency is a function of the CTC_SHIFT value in
> RPM_CONFIG0.
>
> In DG2, OA unit assumes that the CTC_SHIFT is 3, instead of using the
> actual value from RPM_CONFIG0. At the user level, this results in an
> error in calculating delta between 2 OA reports since the OA timestamp
> is not shifted in the same manner as CS timestamp.
>
> To resolve this, return actual OA timestamp frequency to the user in
> i915_getparam_ioctl.

Rather than exposing actual OA timestamp frequency to userspace (with the
corresponding uapi change, especially if it's only DG2 and not all future
products) questions about a couple of other options:

Option 1. Can we set CTC_SHIFT in RPM_CONFIG0 to 3, so change GT freq to be the
          same as OA freq :-)

   The HSD seems to mention this:
   Is setting CTC SHIFT to 0b11 on driver init an acceptable W/A?
   Note: Changing the shift setting on live driver may break apps that are
   currently running (including desktop manager).

Option 2. Is it possible to correct the timestamps in OA report headers to
          compensate for the difference between OA and GT frequencies (say when
          copying OA data to userspace)?

	  Though not sure if this is preferable to having userspace do this.

A couple of minor optional nits on that patch below too.

> +u32 i915_perf_oa_timestamp_frequency(struct drm_i915_private *i915)
> +{
> +	/* Wa_18013179988:dg2 */
> +	if (IS_DG2(i915)) {
> +		intel_wakeref_t wakeref;
> +		u32 reg, shift;
> +
> +		with_intel_runtime_pm(to_gt(i915)->uncore->rpm, wakeref)
> +			reg = intel_uncore_read(to_gt(i915)->uncore, RPM_CONFIG0);
> +
> +		shift = (reg & GEN10_RPM_CONFIG0_CTC_SHIFT_PARAMETER_MASK) >>
> +			 GEN10_RPM_CONFIG0_CTC_SHIFT_PARAMETER_SHIFT;

This can be:
		shift = REG_FIELD_GET(GEN10_RPM_CONFIG0_CTC_SHIFT_PARAMETER_MASK, reg);
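
To illustrate why the two forms are interchangeable, here is a small
userspace sketch. The mask and shift values are stand-ins chosen for the
example (the real ones live in the i915 register headers), field_get() is
a hypothetical stand-in for a field-get helper, and __builtin_ctz() is a
GCC/Clang builtin used to find the lowest set bit of the mask.

```c
#include <stdint.h>

/* Stand-in field definition: a 2-bit field starting at bit 1. */
#define EXAMPLE_CTC_SHIFT_MASK	(0x3u << 1)
#define EXAMPLE_CTC_SHIFT_SHIFT	1

/* Derive the shift from the mask, so only the mask has to be passed. */
static uint32_t field_get(uint32_t mask, uint32_t reg)
{
	return (reg & mask) >> __builtin_ctz(mask);
}

/* Open-coded variant that needs both the mask and the shift macro. */
static uint32_t field_get_open_coded(uint32_t reg)
{
	return (reg & EXAMPLE_CTC_SHIFT_MASK) >> EXAMPLE_CTC_SHIFT_SHIFT;
}
```

Both functions return the same value for any register contents; the
mask-only form just removes the chance of pairing a mask with the wrong
shift.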

>  static u64 oa_exponent_to_ns(struct i915_perf *perf, int exponent)
>  {
> -	return intel_gt_clock_interval_to_ns(to_gt(perf->i915),
> -					     2ULL << exponent);
> +	u64 nom = (2ULL << exponent) * NSEC_PER_SEC;
> +	u32 den = i915_perf_oa_timestamp_frequency(perf->i915);
> +
> +	return div_u64(nom + den - 1, den);

div_u64_roundup?
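
For reference, the rounding in the hunk above is plain ceiling division.
A minimal userspace sketch follows; div_round_up_u64() and
exponent_to_ns() are illustrative names, not the kernel helpers, and the
frequency is passed in as a parameter rather than read from hardware.

```c
#include <stdint.h>

#define EXAMPLE_NSEC_PER_SEC 1000000000ULL

/* Ceiling division: add (den - 1) before dividing so remainders round up. */
static uint64_t div_round_up_u64(uint64_t nom, uint32_t den)
{
	return (nom + den - 1) / den;
}

/*
 * Period in ns for an OA exponent: the period is 2^(exponent + 1)
 * timestamp ticks, converted with the timestamp frequency in Hz.
 */
static uint64_t exponent_to_ns(int exponent, uint32_t freq_hz)
{
	uint64_t nom = (2ULL << exponent) * EXAMPLE_NSEC_PER_SEC;

	return div_round_up_u64(nom, freq_hz);
}
```

Rounding up means the reported period is never shorter than the real one.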

Thanks.
--
Ashutosh


* Re: [Intel-gfx] [PATCH 17/19] drm/i915/perf: Save/restore EU flex counters across reset
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 17/19] drm/i915/perf: Save/restore EU flex counters across reset Umesh Nerlige Ramappa
@ 2022-09-16  5:40   ` Dixit, Ashutosh
  0 siblings, 0 replies; 84+ messages in thread
From: Dixit, Ashutosh @ 2022-09-16  5:40 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Tue, 23 Aug 2022 13:41:53 -0700, Umesh Nerlige Ramappa wrote:
>
> If a drm client is killed, then hw contexts used by the client are reset
> immediately. This reset clears the EU flex counter configuration. If an
> OA use case is running in parallel, it would start seeing zeroed eu
> counter values following the reset even if the drm client is restarted.
> Save/restore the EU flex counter config so that the EU counters can be
> monitored continuously across resets.

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>

Not sure if this needs to be done for non-GuC (execlists) too? Anyway
that's a later patch.

> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> index 74cbe8eaf531..3e152219fcb2 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> @@ -375,6 +375,14 @@ static int guc_mmio_regset_init(struct temp_regset *regset,
>	for (i = 0; i < GEN9_LNCFCMOCS_REG_COUNT; i++)
>		ret |= GUC_MMIO_REG_ADD(gt, regset, GEN9_LNCFCMOCS(i), false);
>
> +	ret |= GUC_MMIO_REG_ADD(gt, regset, EU_PERF_CNTL0, false);
> +	ret |= GUC_MMIO_REG_ADD(gt, regset, EU_PERF_CNTL1, false);
> +	ret |= GUC_MMIO_REG_ADD(gt, regset, EU_PERF_CNTL2, false);
> +	ret |= GUC_MMIO_REG_ADD(gt, regset, EU_PERF_CNTL3, false);
> +	ret |= GUC_MMIO_REG_ADD(gt, regset, EU_PERF_CNTL4, false);
> +	ret |= GUC_MMIO_REG_ADD(gt, regset, EU_PERF_CNTL5, false);
> +	ret |= GUC_MMIO_REG_ADD(gt, regset, EU_PERF_CNTL6, false);
> +
>	return ret ? -1 : 0;
>  }
>
> --
> 2.25.1
>


* Re: [Intel-gfx] [PATCH 16/19] drm/i915/perf: Apply Wa_18013179988
  2022-09-16  5:16   ` Dixit, Ashutosh
@ 2022-09-16 15:22     ` Dixit, Ashutosh
  2022-09-16 19:04       ` Umesh Nerlige Ramappa
  2022-09-16 18:56     ` Umesh Nerlige Ramappa
  1 sibling, 1 reply; 84+ messages in thread
From: Dixit, Ashutosh @ 2022-09-16 15:22 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Thu, 15 Sep 2022 22:16:30 -0700, Dixit, Ashutosh wrote:
>
> On Tue, 23 Aug 2022 13:41:52 -0700, Umesh Nerlige Ramappa wrote:
> >
>
> Hi Umesh,
>
> > OA reports in the OA buffer contain an OA timestamp field that helps
> > user calculate delta between 2 OA reports. The calculation relies on the
> > CS timestamp frequency to convert the timestamp value to nanoseconds.
> > The CS timestamp frequency is a function of the CTC_SHIFT value in
> > RPM_CONFIG0.
> >
> > In DG2, OA unit assumes that the CTC_SHIFT is 3, instead of using the
> > actual value from RPM_CONFIG0. At the user level, this results in an
> > error in calculating delta between 2 OA reports since the OA timestamp
> > is not shifted in the same manner as CS timestamp.
> >
> > To resolve this, return actual OA timestamp frequency to the user in
> > i915_getparam_ioctl.
>
> Rather than exposing actual OA timestamp frequency to userspace (with the
> corresponding uapi change, especially if it's only DG2 and not all future
> products) questions about a couple of other options:
>
> Option 1. Can we set CTC_SHIFT in RPM_CONFIG0 to 3, so change GT freq to be the
>           same as OA freq :-)
>
>    The HSD seems to mention this:
>    Is setting CTC SHIFT to 0b11 on driver init an acceptable W/A?
>    Note: Changing the shift setting on live driver may break apps that are
>    currently running (including desktop manager).
>
> Option 2. Is it possible to correct the timestamps in OA report headers to
>           compensate for the difference between OA and GT frequencies (say when
>           copying OA data to userspace)?
>
>	  Though not sure if this is preferable to having userspace do this.

Also do we need input from userland on this patch? UMDs might need to
assess the impact of having different GT and OA frequencies at their end
since they consume OA data?

Thanks.
--
Ashutosh


* Re: [Intel-gfx] [PATCH 14/19] drm/i915/perf: Add Wa_1608133521:dg2
  2022-09-16  1:21   ` Dixit, Ashutosh
@ 2022-09-16 18:19     ` Umesh Nerlige Ramappa
  0 siblings, 0 replies; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-16 18:19 UTC (permalink / raw)
  To: Dixit, Ashutosh; +Cc: intel-gfx

On Thu, Sep 15, 2022 at 06:21:55PM -0700, Dixit, Ashutosh wrote:
>On Tue, 23 Aug 2022 13:41:50 -0700, Umesh Nerlige Ramappa wrote:
>>
>> DG2 introduces 64 bit counters and OA reports that have 64 bit values
>> for fields in the report header - report_id, timestamp, context_id and
>> gpu ticks. i915 uses report_id, timestamp and context_id to check for
>> valid reports.
>>
>> In some DG2 variants, only the lower dwords for timestamp, report_id and
>> context_id are accessible. Add workaround for such reports.
>
>Once again, if we are productizing A-step or it is going to be in CI
>upstream, this is:

No, we are not. I am dropping A0 specific fixes from this series in the 
next revision. Doing so will also simplify implementing Jani's comment 
here to have a 'per variant const oa format array'.

Thanks,
Umesh
>
>Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
>
>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>> ---
>>  drivers/gpu/drm/i915/i915_perf.c | 9 ++++++++-
>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>> index a28f07923d8f..a858ce57e465 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -310,7 +310,7 @@ static u32 i915_oa_max_sample_rate = 100000;
>>   * be used as a mask to align the OA tail pointer. In some of the
>>   * formats, R is used to denote reserved field.
>>   */
>> -static const struct i915_oa_format oa_formats[I915_OA_FORMAT_MAX] = {
>> +static struct i915_oa_format oa_formats[I915_OA_FORMAT_MAX] = {
>>	[I915_OA_FORMAT_A13]	    = { 0, 64 },
>>	[I915_OA_FORMAT_A29]	    = { 1, 128 },
>>	[I915_OA_FORMAT_A13_B8_C8]  = { 2, 128 },
>> @@ -4746,6 +4746,13 @@ static void oa_init_supported_formats(struct i915_perf *perf)
>>		/* Wa_16010703925:dg2 */
>>		clear_bit(I915_OAR_FORMAT_A36u64_B8_C8, perf->format_mask);
>>	}
>> +
>> +	if (IS_DG2_GRAPHICS_STEP(i915, G10, STEP_A0, STEP_B0) ||
>> +	    IS_DG2_GRAPHICS_STEP(i915, G11, STEP_A0, STEP_FOREVER)) {
>> +		/* Wa_1608133521:dg2 */
>> +		oa_formats[I915_OAR_FORMAT_A36u64_B8_C8].header = HDR_32_BIT;
>> +		oa_formats[I915_OA_FORMAT_A38u64_R2u64_B8_C8].header = HDR_32_BIT;
>> +	}
>>  }
>>
>>  static void i915_perf_init_info(struct drm_i915_private *i915)
>> --
>> 2.25.1
>>


* Re: [Intel-gfx] [PATCH 16/19] drm/i915/perf: Apply Wa_18013179988
  2022-09-16  5:16   ` Dixit, Ashutosh
  2022-09-16 15:22     ` Dixit, Ashutosh
@ 2022-09-16 18:56     ` Umesh Nerlige Ramappa
  2022-09-16 19:57       ` Dixit, Ashutosh
  1 sibling, 1 reply; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-16 18:56 UTC (permalink / raw)
  To: Dixit, Ashutosh; +Cc: intel-gfx

On Thu, Sep 15, 2022 at 10:16:30PM -0700, Dixit, Ashutosh wrote:
>On Tue, 23 Aug 2022 13:41:52 -0700, Umesh Nerlige Ramappa wrote:
>>
>
>Hi Umesh,
>
>> OA reports in the OA buffer contain an OA timestamp field that helps
>> user calculate delta between 2 OA reports. The calculation relies on the
>> CS timestamp frequency to convert the timestamp value to nanoseconds.
>> The CS timestamp frequency is a function of the CTC_SHIFT value in
>> RPM_CONFIG0.
>>
>> In DG2, OA unit assumes that the CTC_SHIFT is 3, instead of using the
>> actual value from RPM_CONFIG0. At the user level, this results in an
>> error in calculating delta between 2 OA reports since the OA timestamp
>> is not shifted in the same manner as CS timestamp.
>>
>> To resolve this, return actual OA timestamp frequency to the user in
>> i915_getparam_ioctl.
>
>Rather than exposing actual OA timestamp frequency to userspace (with the
>corresponding uapi change, especially if it's only DG2 and not all future
>products) questions about a couple of other options:
>
>Option 1. Can we set CTC_SHIFT in RPM_CONFIG0 to 3, so change GT freq to be the
>          same as OA freq :-)
>
>   The HSD seems to mention this:
>   Is setting CTC SHIFT to 0b11 on driver init an acceptable W/A?
>   Note: Changing the shift setting on live driver may break apps that are
>   currently running (including desktop manager).
>
>Option 2. Is it possible to correct the timestamps in OA report headers to
>          compensate for the difference between OA and GT frequencies (say when
>          copying OA data to userspace)?
>
>	  Though not sure if this is preferable to having userspace do this.

It does affect other platforms too. There's no guarantee on what the 
CTC_SHIFT value would be for different platforms, so user would have to 
at least query that somehow (maybe from i915). It's simpler for user to 
use the exported OA frequency since it is also backwards compatible.
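
As a rough illustration of the userspace side, the sketch below converts
the delta between two OA timestamps to nanoseconds using a frequency in
Hz. It assumes 32-bit OA timestamps and assumes the frequency has already
been obtained from the kernel; oa_delta_ns() is a hypothetical name and
the query itself is not shown.

```c
#include <stdint.h>

/*
 * Convert the delta between two OA timestamps to nanoseconds.
 * Unsigned 32-bit subtraction gives the right tick count even if the
 * timestamp wrapped once between the two reports.
 */
static uint64_t oa_delta_ns(uint32_t ts_start, uint32_t ts_end,
			    uint64_t oa_freq_hz)
{
	uint32_t ticks = ts_end - ts_start;

	/* ticks < 2^32, so ticks * 10^9 fits in 64 bits. */
	return (uint64_t)ticks * 1000000000ULL / oa_freq_hz;
}
```

If the OA unit and the CS tick at different rates, passing the CS
frequency here gives a wrong delta, which is the error the commit message
describes.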

https://patchwork.freedesktop.org/patch/498917/?series=107633&rev=3 is 
consumed by GPUvis. That reminds me, I should include the UMD links for 
the patches with uapi changes.

>
>A couple of minor optional nits on that patch below too.
>
>> +u32 i915_perf_oa_timestamp_frequency(struct drm_i915_private *i915)
>> +{
>> +	/* Wa_18013179988:dg2 */
>> +	if (IS_DG2(i915)) {
>> +		intel_wakeref_t wakeref;
>> +		u32 reg, shift;
>> +
>> +		with_intel_runtime_pm(to_gt(i915)->uncore->rpm, wakeref)
>> +			reg = intel_uncore_read(to_gt(i915)->uncore, RPM_CONFIG0);
>> +
>> +		shift = (reg & GEN10_RPM_CONFIG0_CTC_SHIFT_PARAMETER_MASK) >>
>> +			 GEN10_RPM_CONFIG0_CTC_SHIFT_PARAMETER_SHIFT;
>
>This can be:
>		shift = REG_FIELD_GET(GEN10_RPM_CONFIG0_CTC_SHIFT_PARAMETER_MASK, reg);

sure, will change

>
>>  static u64 oa_exponent_to_ns(struct i915_perf *perf, int exponent)
>>  {
>> -	return intel_gt_clock_interval_to_ns(to_gt(perf->i915),
>> -					     2ULL << exponent);
>> +	u64 nom = (2ULL << exponent) * NSEC_PER_SEC;
>> +	u32 den = i915_perf_oa_timestamp_frequency(perf->i915);
>> +
>> +	return div_u64(nom + den - 1, den);
>
>div_u64_roundup?

True, but that is a static function in intel_gt_clock_utils.c. I 
didn't think there were enough users to export it.

Thanks,
Umesh


>
>Thanks.
>--
>Ashutosh


* Re: [Intel-gfx] [PATCH 16/19] drm/i915/perf: Apply Wa_18013179988
  2022-09-16 15:22     ` Dixit, Ashutosh
@ 2022-09-16 19:04       ` Umesh Nerlige Ramappa
  0 siblings, 0 replies; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-16 19:04 UTC (permalink / raw)
  To: Dixit, Ashutosh; +Cc: intel-gfx

On Fri, Sep 16, 2022 at 08:22:40AM -0700, Dixit, Ashutosh wrote:
>On Thu, 15 Sep 2022 22:16:30 -0700, Dixit, Ashutosh wrote:
>>
>> On Tue, 23 Aug 2022 13:41:52 -0700, Umesh Nerlige Ramappa wrote:
>> >
>>
>> Hi Umesh,
>>
>> > OA reports in the OA buffer contain an OA timestamp field that helps
>> > user calculate delta between 2 OA reports. The calculation relies on the
>> > CS timestamp frequency to convert the timestamp value to nanoseconds.
>> > The CS timestamp frequency is a function of the CTC_SHIFT value in
>> > RPM_CONFIG0.
>> >
>> > In DG2, OA unit assumes that the CTC_SHIFT is 3, instead of using the
>> > actual value from RPM_CONFIG0. At the user level, this results in an
>> > error in calculating delta between 2 OA reports since the OA timestamp
>> > is not shifted in the same manner as CS timestamp.
>> >
>> > To resolve this, return actual OA timestamp frequency to the user in
>> > i915_getparam_ioctl.
>>
>> Rather than exposing actual OA timestamp frequency to userspace (with the
>> corresponding uapi change, especially if it's only DG2 and not all future
>> products) questions about a couple of other options:
>>
>> Option 1. Can we set CTC_SHIFT in RPM_CONFIG0 to 3, so change GT freq to be the
>>           same as OA freq :-)
>>
>>    The HSD seems to mention this:
>>    Is setting CTC SHIFT to 0b11 on driver init an acceptable W/A?
>>    Note: Changing the shift setting on live driver may break apps that are
>>    currently running (including desktop manager).
>>
>> Option 2. Is it possible to correct the timestamps in OA report headers to
>>           compensate for the difference between OA and GT frequencies (say when
>>           copying OA data to userspace)?
>>
>>	  Though not sure if this is preferable to having userspace do this.
>
>Also do we need input from userland on this patch? UMDs might need to
>assess the impact of having different GT and OA frequencies at their end
>since they consume OA data?

Lionel is aware of the change and I believe he has some patches to 
consume this API for the GPUvis support, but we need an Ack from 
Joonas/maintainer to merge any uapi changes. I will add them to the next 
revision.

Thanks,
Umesh

>
>Thanks.
>--
>Ashutosh


* Re: [Intel-gfx] [PATCH 16/19] drm/i915/perf: Apply Wa_18013179988
  2022-09-16 18:56     ` Umesh Nerlige Ramappa
@ 2022-09-16 19:57       ` Dixit, Ashutosh
  2022-09-16 20:25         ` Umesh Nerlige Ramappa
  0 siblings, 1 reply; 84+ messages in thread
From: Dixit, Ashutosh @ 2022-09-16 19:57 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Fri, 16 Sep 2022 11:56:04 -0700, Umesh Nerlige Ramappa wrote:
>
> On Thu, Sep 15, 2022 at 10:16:30PM -0700, Dixit, Ashutosh wrote:
> > On Tue, 23 Aug 2022 13:41:52 -0700, Umesh Nerlige Ramappa wrote:
> >>
> >
> > Hi Umesh,
> >
> >> OA reports in the OA buffer contain an OA timestamp field that helps
> >> user calculate delta between 2 OA reports. The calculation relies on the
> >> CS timestamp frequency to convert the timestamp value to nanoseconds.
> >> The CS timestamp frequency is a function of the CTC_SHIFT value in
> >> RPM_CONFIG0.
> >>
> >> In DG2, OA unit assumes that the CTC_SHIFT is 3, instead of using the
> >> actual value from RPM_CONFIG0. At the user level, this results in an
> >> error in calculating delta between 2 OA reports since the OA timestamp
> >> is not shifted in the same manner as CS timestamp.
> >>
> >> To resolve this, return actual OA timestamp frequency to the user in
> >> i915_getparam_ioctl.
> >
> > Rather than exposing actual OA timestamp frequency to userspace (with the
> > corresponding uapi change, especially if it's only DG2 and not all future
> > products) questions about a couple of other options:
> >
> > Option 1. Can we set CTC_SHIFT in RPM_CONFIG0 to 3, so change GT freq to be the
> >          same as OA freq :-)
> >
> >   The HSD seems to mention this:
> >   Is setting CTC SHIFT to 0b11 on driver init an acceptable W/A?
> >   Note: Changing the shift setting on live driver may break apps that are
> >   currently running (including desktop manager).
> >
> > Option 2. Is it possible to correct the timestamps in OA report headers to
> >          compensate for the difference between OA and GT frequencies (say when
> >          copying OA data to userspace)?
> >
> >	  Though not sure if this is preferable to having userspace do this.
>
> It does affect other platforms too. There's no guarantee on what the
> CTC_SHIFT value would be for different platforms, so user would have to at
> least query that somehow (maybe from i915). It's simpler for user to use
> the exported OA frequency since it is also backwards compatible.

Is Option 2 above feasible since it would stop propagating the change to
various UMDs?

> https://patchwork.freedesktop.org/patch/498917/?series=107633&rev=3 is
> consumed by GPUvis. That reminds me, I should include the UMD links for the
> patches with uapi changes.

I was thinking more about UMDs which analyze OA data and which until now
have probably assumed OA freq == GT freq and will now have to drop that
assumption. So I am not sure how widespread these changes would be in
the (multiple different?) UMD(s).

Thanks.
--
Ashutosh


* Re: [Intel-gfx] [PATCH 16/19] drm/i915/perf: Apply Wa_18013179988
  2022-09-16 19:57       ` Dixit, Ashutosh
@ 2022-09-16 20:25         ` Umesh Nerlige Ramappa
  2022-09-16 21:00           ` Dixit, Ashutosh
  0 siblings, 1 reply; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-16 20:25 UTC (permalink / raw)
  To: Dixit, Ashutosh; +Cc: intel-gfx

On Fri, Sep 16, 2022 at 12:57:19PM -0700, Dixit, Ashutosh wrote:
>On Fri, 16 Sep 2022 11:56:04 -0700, Umesh Nerlige Ramappa wrote:
>>
>> On Thu, Sep 15, 2022 at 10:16:30PM -0700, Dixit, Ashutosh wrote:
>> > On Tue, 23 Aug 2022 13:41:52 -0700, Umesh Nerlige Ramappa wrote:
>> >>
>> >
>> > Hi Umesh,
>> >
>> >> OA reports in the OA buffer contain an OA timestamp field that helps
>> >> user calculate delta between 2 OA reports. The calculation relies on the
>> >> CS timestamp frequency to convert the timestamp value to nanoseconds.
>> >> The CS timestamp frequency is a function of the CTC_SHIFT value in
>> >> RPM_CONFIG0.
>> >>
>> >> In DG2, OA unit assumes that the CTC_SHIFT is 3, instead of using the
>> >> actual value from RPM_CONFIG0. At the user level, this results in an
>> >> error in calculating delta between 2 OA reports since the OA timestamp
>> >> is not shifted in the same manner as CS timestamp.
>> >>
>> >> To resolve this, return actual OA timestamp frequency to the user in
>> >> i915_getparam_ioctl.
>> >
>> > Rather than exposing actual OA timestamp frequency to userspace (with the
>> > corresponding uapi change, especially if it's only DG2 and not all future
>> > products) questions about a couple of other options:
>> >
>> > Option 1. Can we set CTC_SHIFT in RPM_CONFIG0 to 3, so change GT freq to be the
>> >          same as OA freq :-)
>> >
>> >   The HSD seems to mention this:
>> >   Is setting CTC SHIFT to 0b11 on driver init an acceptable W/A?
>> >   Note: Changing the shift setting on live driver may break apps that are
>> >   currently running (including desktop manager).
>> >
>> > Option 2. Is it possible to correct the timestamps in OA report headers to
>> >          compensate for the difference between OA and GT frequencies (say when
>> >          copying OA data to userspace)?
>> >
>> >	  Though not sure if this is preferable to having userspace do this.
>>
>> It does affect other platforms too. There's no guarantee on what the
>> CTC_SHIFT value would be for different platforms, so user would have to at
>> least query that somehow (maybe from i915). It's simpler for user to use
>> the exported OA frequency since it is also backwards compatible.
>
>Is Option 2 above feasible since it would stop propagating the change to
>various UMDs?

Hmm, there is logic today that squashes context ids when doing oa buffer
filtering, but it does that on selective reports (i.e. if a gem_context 
is passed).

For this issue: for a 16MB OA buffer with 256 byte reports, that would 
be an additional write of 262144 bytes in the kmd (to smem). For 20us 
sampled OA reports, it would be approx. 195 KB/s. Shouldn't be too much. 
Only 2 concerns:

- the mmapped use case may break, but I don't see that being upstreamed.  
   We may have divergent solutions for upstream and internal.
- blocking/polling tests in IGT will be sensitive to this change on some 
   platforms and may need to be bolstered.
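
The numbers above can be reproduced with a short calculation. This is only a
sketch: it assumes the fixup rewrites one 32-bit (4-byte) timestamp per
report, which is what the 262144 and ~195 KB/s figures imply, and the
function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one 32-bit timestamp is rewritten per OA report. */
#define TS_BYTES_PER_REPORT 4u

/* Bytes rewritten to patch every report in one full OA buffer. */
static inline uint64_t oa_fixup_bytes_per_buffer(uint64_t buf_bytes,
						 uint32_t report_bytes)
{
	assert(report_bytes != 0);
	return (buf_bytes / report_bytes) * TS_BYTES_PER_REPORT;
}

/* Bytes rewritten per second when one report is sampled every period_us. */
static inline uint64_t oa_fixup_bytes_per_sec(uint32_t period_us)
{
	assert(period_us != 0);
	return (uint64_t)(1000000u / period_us) * TS_BYTES_PER_REPORT;
}
```

With a 16MB buffer and 256 byte reports the first helper gives 65536 reports
times 4 bytes; with a 20us period the second gives 50000 reports per second
times 4 bytes.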

I will give it a shot and get back,

Thanks,
Umesh

>
>> https://patchwork.freedesktop.org/patch/498917/?series=107633&rev=3 is
>> consumed by GPUvis. That reminds me, I should include the UMD links for the
>> patches with uapi changes.
>
>I was thinking more about UMD's which analyze OA data and who till now are
>probably assuming OA freq == GT freq and will now have to drop that
>assumption. So not sure how widespread would be these changes in
>the (multiple different?) UMD(s).
>
>Thanks.
>--
>Ashutosh


* Re: [Intel-gfx] [PATCH 16/19] drm/i915/perf: Apply Wa_18013179988
  2022-09-16 20:25         ` Umesh Nerlige Ramappa
@ 2022-09-16 21:00           ` Dixit, Ashutosh
  2022-09-19 21:21             ` Umesh Nerlige Ramappa
  0 siblings, 1 reply; 84+ messages in thread
From: Dixit, Ashutosh @ 2022-09-16 21:00 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Fri, 16 Sep 2022 13:25:17 -0700, Umesh Nerlige Ramappa wrote:
>
> On Fri, Sep 16, 2022 at 12:57:19PM -0700, Dixit, Ashutosh wrote:
> > On Fri, 16 Sep 2022 11:56:04 -0700, Umesh Nerlige Ramappa wrote:
> >>
> >> On Thu, Sep 15, 2022 at 10:16:30PM -0700, Dixit, Ashutosh wrote:
> >> > On Tue, 23 Aug 2022 13:41:52 -0700, Umesh Nerlige Ramappa wrote:
> >> >>
> >> >
> >> > Hi Umesh,
> >> >
> >> >> OA reports in the OA buffer contain an OA timestamp field that helps
> >> >> user calculate delta between 2 OA reports. The calculation relies on the
> >> >> CS timestamp frequency to convert the timestamp value to nanoseconds.
> >> >> The CS timestamp frequency is a function of the CTC_SHIFT value in
> >> >> RPM_CONFIG0.
> >> >>
> >> >> In DG2, OA unit assumes that the CTC_SHIFT is 3, instead of using the
> >> >> actual value from RPM_CONFIG0. At the user level, this results in an
> >> >> error in calculating delta between 2 OA reports since the OA timestamp
> >> >> is not shifted in the same manner as CS timestamp.
> >> >>
> >> >> To resolve this, return actual OA timestamp frequency to the user in
> >> >> i915_getparam_ioctl.
> >> >
> >> > Rather than exposing actual OA timestamp frequency to userspace (with the
> >> > corresponding uapi change, especially if it's only DG2 and not all future
> >> > products) questions about a couple of other options:
> >> >
> >> > Option 1. Can we set CTC_SHIFT in RPM_CONFIG0 to 3, so change GT freq to be the
> >> >          same as OA freq :-)
> >> >
> >> >   The HSD seems to mention this:
> >> >   Is setting CTC SHIFT to 0b11 on driver init an acceptable W/A?
> >> >   Note: Changing the shift setting on live driver may break apps that are
> >> >   currently running (including desktop manager).
> >> >
> >> > Option 2. Is it possible to correct the timestamps in OA report headers to
> >> >          compensate for the difference between OA and GT frequencies (say when
> >> >          copying OA data to userspace)?
> >> >
> >> >	  Though not sure if this is preferable to having userspace do this.
> >>
> >> It does affect other platforms too. There's no guarantee on what the
> >> CTC_SHIFT value would be for different platforms, so user would have to at
> >> least query that somehow (maybe from i915). It's simpler for user to use
> >> the exported OA frequency since it is also backwards compatible.
> >
> > Is Option 2 above feasible since it would stop propagating the change to
> > various UMD's?
>
> Hmm, there is logic today that squashes context ids when doing oa buffer
> filtering, but it does that on selective reports (i.e. if a gem_context is
> passed).
>
> For this issue: for a 16MB OA buffer with 256 byte reports, that would be
> an additional write of 262144 bytes in the kmd (to smem). For 20us sampled OA
> reports, it would be approx. 195 KB/s. Shouldn't be too much. Only 2
> concerns:
>
> - the mmapped use case may break, but I don't see that being upstreamed.
> We may have divergent solutions for upstream and internal.
> - blocking/polling tests in IGT will be sensitive to this change on some
> platforms and may need to be bolstered.

If this correction/compensation in the kernel works out, we could do the
following for internal too:

* For the non-mmapped case, do the correction in the kernel and expose OA freq
  == GT freq (in the getparam ioctl)
* For the mmapped case, expose the actual OA freq (!= GT freq)

This will restrict the divergence to the mmapped case (which we will
probably not be able to upstream).

>
> I will give it a shot and get back,
>
> Thanks,
> Umesh
>
> >
> >> https://patchwork.freedesktop.org/patch/498917/?series=107633&rev=3 is
> >> consumed by GPUvis. That reminds me, I should include the UMD links for the
> >> patches with uapi changes.
> >
> > I was thinking more about UMD's which analyze OA data and who till now are
> > probably assuming OA freq == GT freq and will now have to drop that
> > assumption. So not sure how widespread would be these changes in
> > the (multiple different?) UMD(s).
> >
> > Thanks.
> > --
> > Ashutosh


* Re: [Intel-gfx] [PATCH 18/19] drm/i915/guc: Support OA when Wa_16011777198 is enabled
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 18/19] drm/i915/guc: Support OA when Wa_16011777198 is enabled Umesh Nerlige Ramappa
@ 2022-09-16 21:41   ` Dixit, Ashutosh
  2022-09-16 21:48     ` Umesh Nerlige Ramappa
  0 siblings, 1 reply; 84+ messages in thread
From: Dixit, Ashutosh @ 2022-09-16 21:41 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Tue, 23 Aug 2022 13:41:54 -0700, Umesh Nerlige Ramappa wrote:
>
> From: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
>
> There is a w/a to reset RCS/CCS before it goes into RC6. This breaks
> OA. Fix it by disabling RC6.

Need to mention DG2 in the commit message?

> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
> ---
>  .../drm/i915/gt/uc/abi/guc_actions_slpc_abi.h |  9 ++++
>  drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c   | 45 +++++++++++++++++++
>  drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h   |  2 +
>  drivers/gpu/drm/i915/i915_perf.c              | 29 ++++++++++++
>  4 files changed, 85 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_slpc_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_slpc_abi.h
> index 4c840a2639dc..811add10c30d 100644
> --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_slpc_abi.h
> +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_slpc_abi.h
> @@ -128,6 +128,15 @@ enum slpc_media_ratio_mode {
>	SLPC_MEDIA_RATIO_MODE_FIXED_ONE_TO_TWO = 2,
>  };
>
> +enum slpc_gucrc_mode {
> +	SLPC_GUCRC_MODE_HW = 0,
> +	SLPC_GUCRC_MODE_GUCRC_NO_RC6 = 1,
> +	SLPC_GUCRC_MODE_GUCRC_STATIC_TIMEOUT = 2,
> +	SLPC_GUCRC_MODE_GUCRC_DYNAMIC_HYSTERESIS = 3,
> +
> +	SLPC_GUCRC_MODE_MAX,
> +};
> +
>  enum slpc_event_id {
>	SLPC_EVENT_RESET = 0,
>	SLPC_EVENT_SHUTDOWN = 1,
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
> index e1fa1f32f29e..23989f5452a7 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
> @@ -642,6 +642,51 @@ static void slpc_get_rp_values(struct intel_guc_slpc *slpc)
>		slpc->boost_freq = slpc->rp0_freq;
>  }
>
> +/**
> + * intel_guc_slpc_override_gucrc_mode() - override GUCRC mode
> + * @slpc: pointer to intel_guc_slpc.
> + * @mode: new value of the mode.
> + *
> + * This function will override the GUCRC mode.
> + *
> + * Return: 0 on success, non-zero error code on failure.
> + */
> +int intel_guc_slpc_override_gucrc_mode(struct intel_guc_slpc *slpc, u32 mode)
> +{
> +	int ret;
> +	struct drm_i915_private *i915 = slpc_to_i915(slpc);
> +	intel_wakeref_t wakeref;
> +
> +	if (mode >= SLPC_GUCRC_MODE_MAX)
> +		return -EINVAL;
> +
> +	wakeref = intel_runtime_pm_get(&i915->runtime_pm);
> +
> +	ret = slpc_set_param(slpc, SLPC_PARAM_PWRGATE_RC_MODE, mode);
> +	if (ret)
> +		drm_err(&i915->drm,
> +			"Override gucrc mode %d failed %d\n",
> +			mode, ret);
> +
> +	intel_runtime_pm_put(&i915->runtime_pm, wakeref);

nit but I think let's switch to with_intel_runtime_pm() since all other
slpc functions use that.
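
A minimal userspace sketch of the scoped pattern being suggested. Everything
prefixed with fake_ is a hypothetical stand-in, not i915 code: the macro is
only modelled on the for-loop shape of with_intel_runtime_pm(), and
fake_set_param() takes the place of slpc_set_param().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for the slpc/runtime-PM state. */
struct fake_slpc {
	int pm_refs;       /* outstanding runtime-PM references */
	int set_param_ret; /* error the mocked set_param should return, 0 for success */
	unsigned int mode; /* last mode successfully written */
};

typedef unsigned long fake_wakeref_t;

static inline fake_wakeref_t fake_rpm_get(struct fake_slpc *slpc)
{
	slpc->pm_refs++;
	return 1; /* non-zero cookie means the reference is held */
}

static inline void fake_rpm_put(struct fake_slpc *slpc, fake_wakeref_t wf)
{
	(void)wf;
	slpc->pm_refs--;
}

/* Runs the body exactly once with the reference held, then drops it. */
#define with_fake_rpm(slpc, wf) \
	for ((wf) = fake_rpm_get(slpc); (wf); \
	     fake_rpm_put((slpc), (wf)), (wf) = 0)

static inline int fake_set_param(struct fake_slpc *slpc, unsigned int mode)
{
	if (slpc->set_param_ret)
		return slpc->set_param_ret;
	slpc->mode = mode;
	return 0;
}

#define FAKE_GUCRC_MODE_MAX 4u

static inline int fake_override_gucrc_mode(struct fake_slpc *slpc,
					   unsigned int mode)
{
	fake_wakeref_t wakeref;
	int ret = 0;

	if (mode >= FAKE_GUCRC_MODE_MAX)
		return -EINVAL;

	with_fake_rpm(slpc, wakeref) {
		ret = fake_set_param(slpc, mode);
		if (ret)
			fprintf(stderr, "Override gucrc mode %u failed %d\n",
				mode, ret);
	}

	/* The scoped block must have released its reference by now. */
	assert(slpc->pm_refs == 0);
	return ret;
}
```

The point of the scoped form is that the get/put pair cannot be separated by
a later edit, which is presumably why the other slpc functions use it.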

> +
> +	return ret;
> +}
> +
> +int intel_guc_slpc_unset_gucrc_mode(struct intel_guc_slpc *slpc)
> +{
> +	struct drm_i915_private *i915 = slpc_to_i915(slpc);
> +	int ret = 0;
> +
> +	ret = slpc_unset_param(slpc, SLPC_PARAM_PWRGATE_RC_MODE);

Looks like slpc_unset_param() is not present so that needs to be added to
the patch too, otherwise probably doesn't even compile.

> +	if (ret)
> +		drm_err(&i915->drm,
> +			"Unsetting gucrc mode failed %d\n",
> +			ret);
> +
> +	return ret;
> +}
> +
>  /*
>   * intel_guc_slpc_enable() - Start SLPC
>   * @slpc: pointer to intel_guc_slpc.
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h
> index 82a98f78f96c..ccf483730d9d 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h
> @@ -42,5 +42,7 @@ int intel_guc_slpc_set_media_ratio_mode(struct intel_guc_slpc *slpc, u32 val);
>  void intel_guc_pm_intrmsk_enable(struct intel_gt *gt);
>  void intel_guc_slpc_boost(struct intel_guc_slpc *slpc);
>  void intel_guc_slpc_dec_waiters(struct intel_guc_slpc *slpc);
> +int intel_guc_slpc_unset_gucrc_mode(struct intel_guc_slpc *slpc);
> +int intel_guc_slpc_override_gucrc_mode(struct intel_guc_slpc *slpc, u32 mode);
>
>  #endif
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 132c2ce8b33b..ce1b6ad4d107 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -208,6 +208,7 @@
>  #include "gt/intel_lrc.h"
>  #include "gt/intel_lrc_reg.h"
>  #include "gt/intel_ring.h"
> +#include "gt/uc/intel_guc_slpc.h"
>
>  #include "i915_drv.h"
>  #include "i915_file_private.h"
> @@ -1651,6 +1652,16 @@ static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
>
>	free_oa_buffer(stream);
>
> +	/*
> +	 * Wa_16011777198:dg2: Unset the override of GUCRC mode to enable rc6.
> +	 */
> +	if (intel_guc_slpc_is_used(&gt->uc.guc) &&
> +	    intel_uc_uses_guc_rc(&gt->uc) &&
> +	    (IS_DG2_GRAPHICS_STEP(gt->i915, G10, STEP_A0, STEP_C0) ||
> +	     IS_DG2_GRAPHICS_STEP(gt->i915, G11, STEP_A0, STEP_B0)))

Do these steppings need to be tweaked? Otherwise OK as is.


* Re: [Intel-gfx] [PATCH 18/19] drm/i915/guc: Support OA when Wa_16011777198 is enabled
  2022-09-16 21:41   ` Dixit, Ashutosh
@ 2022-09-16 21:48     ` Umesh Nerlige Ramappa
  0 siblings, 0 replies; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-16 21:48 UTC (permalink / raw)
  To: Dixit, Ashutosh; +Cc: intel-gfx

On Fri, Sep 16, 2022 at 02:41:01PM -0700, Dixit, Ashutosh wrote:
>On Tue, 23 Aug 2022 13:41:54 -0700, Umesh Nerlige Ramappa wrote:
>>
>> From: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
>>
>> There is a w/a to reset RCS/CCS before it goes into RC6. This breaks
>> OA. Fix it by disabling RC6.
>
>Need to mention DG2 in the commit message?
>

will do

>> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
>> ---
>>  .../drm/i915/gt/uc/abi/guc_actions_slpc_abi.h |  9 ++++
>>  drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c   | 45 +++++++++++++++++++
>>  drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h   |  2 +
>>  drivers/gpu/drm/i915/i915_perf.c              | 29 ++++++++++++
>>  4 files changed, 85 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_slpc_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_slpc_abi.h
>> index 4c840a2639dc..811add10c30d 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_slpc_abi.h
>> +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_slpc_abi.h
>> @@ -128,6 +128,15 @@ enum slpc_media_ratio_mode {
>>	SLPC_MEDIA_RATIO_MODE_FIXED_ONE_TO_TWO = 2,
>>  };
>>
>> +enum slpc_gucrc_mode {
>> +	SLPC_GUCRC_MODE_HW = 0,
>> +	SLPC_GUCRC_MODE_GUCRC_NO_RC6 = 1,
>> +	SLPC_GUCRC_MODE_GUCRC_STATIC_TIMEOUT = 2,
>> +	SLPC_GUCRC_MODE_GUCRC_DYNAMIC_HYSTERESIS = 3,
>> +
>> +	SLPC_GUCRC_MODE_MAX,
>> +};
>> +
>>  enum slpc_event_id {
>>	SLPC_EVENT_RESET = 0,
>>	SLPC_EVENT_SHUTDOWN = 1,
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
>> index e1fa1f32f29e..23989f5452a7 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
>> @@ -642,6 +642,51 @@ static void slpc_get_rp_values(struct intel_guc_slpc *slpc)
>>		slpc->boost_freq = slpc->rp0_freq;
>>  }
>>
>> +/**
>> + * intel_guc_slpc_override_gucrc_mode() - override GUCRC mode
>> + * @slpc: pointer to intel_guc_slpc.
>> + * @mode: new value of the mode.
>> + *
>> + * This function will override the GUCRC mode.
>> + *
>> + * Return: 0 on success, non-zero error code on failure.
>> + */
>> +int intel_guc_slpc_override_gucrc_mode(struct intel_guc_slpc *slpc, u32 mode)
>> +{
>> +	int ret;
>> +	struct drm_i915_private *i915 = slpc_to_i915(slpc);
>> +	intel_wakeref_t wakeref;
>> +
>> +	if (mode >= SLPC_GUCRC_MODE_MAX)
>> +		return -EINVAL;
>> +
>> +	wakeref = intel_runtime_pm_get(&i915->runtime_pm);
>> +
>> +	ret = slpc_set_param(slpc, SLPC_PARAM_PWRGATE_RC_MODE, mode);
>> +	if (ret)
>> +		drm_err(&i915->drm,
>> +			"Override gucrc mode %d failed %d\n",
>> +			mode, ret);
>> +
>> +	intel_runtime_pm_put(&i915->runtime_pm, wakeref);
>
>nit but I think let's switch to with_intel_runtime_pm() since all other
>slpc functions use that.
>

will do
>> +
>> +	return ret;
>> +}
>> +
>> +int intel_guc_slpc_unset_gucrc_mode(struct intel_guc_slpc *slpc)
>> +{
>> +	struct drm_i915_private *i915 = slpc_to_i915(slpc);
>> +	int ret = 0;
>> +
>> +	ret = slpc_unset_param(slpc, SLPC_PARAM_PWRGATE_RC_MODE);
>
>Looks like slpc_unset_param() is not present so that needs to be added to
>the patch too, otherwise probably doesn't even compile.
>

yes, looks like a recent patch removed it. I will add it back here.

>> +	if (ret)
>> +		drm_err(&i915->drm,
>> +			"Unsetting gucrc mode failed %d\n",
>> +			ret);
>> +
>> +	return ret;
>> +}
>> +
>>  /*
>>   * intel_guc_slpc_enable() - Start SLPC
>>   * @slpc: pointer to intel_guc_slpc.
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h
>> index 82a98f78f96c..ccf483730d9d 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h
>> @@ -42,5 +42,7 @@ int intel_guc_slpc_set_media_ratio_mode(struct intel_guc_slpc *slpc, u32 val);
>>  void intel_guc_pm_intrmsk_enable(struct intel_gt *gt);
>>  void intel_guc_slpc_boost(struct intel_guc_slpc *slpc);
>>  void intel_guc_slpc_dec_waiters(struct intel_guc_slpc *slpc);
>> +int intel_guc_slpc_unset_gucrc_mode(struct intel_guc_slpc *slpc);
>> +int intel_guc_slpc_override_gucrc_mode(struct intel_guc_slpc *slpc, u32 mode);
>>
>>  #endif
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>> index 132c2ce8b33b..ce1b6ad4d107 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -208,6 +208,7 @@
>>  #include "gt/intel_lrc.h"
>>  #include "gt/intel_lrc_reg.h"
>>  #include "gt/intel_ring.h"
>> +#include "gt/uc/intel_guc_slpc.h"
>>
>>  #include "i915_drv.h"
>>  #include "i915_file_private.h"
>> @@ -1651,6 +1652,16 @@ static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
>>
>>	free_oa_buffer(stream);
>>
>> +	/*
>> +	 * Wa_16011777198:dg2: Unset the override of GUCRC mode to enable rc6.
>> +	 */
>> +	if (intel_guc_slpc_is_used(&gt->uc.guc) &&
>> +	    intel_uc_uses_guc_rc(&gt->uc) &&
>> +	    (IS_DG2_GRAPHICS_STEP(gt->i915, G10, STEP_A0, STEP_C0) ||
>> +	     IS_DG2_GRAPHICS_STEP(gt->i915, G11, STEP_A0, STEP_B0)))
>
>Do these steppings need to be tweaked? Otherwise OK as is.

Probably the G11 can be dropped, but I would leave it as is.

Thanks,
Umesh



* Re: [Intel-gfx] [PATCH 16/19] drm/i915/perf: Apply Wa_18013179988
  2022-09-16 21:00           ` Dixit, Ashutosh
@ 2022-09-19 21:21             ` Umesh Nerlige Ramappa
  2022-09-20  1:24               ` Dixit, Ashutosh
  0 siblings, 1 reply; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-19 21:21 UTC (permalink / raw)
  To: Dixit, Ashutosh; +Cc: intel-gfx

On Fri, Sep 16, 2022 at 02:00:19PM -0700, Dixit, Ashutosh wrote:
>On Fri, 16 Sep 2022 13:25:17 -0700, Umesh Nerlige Ramappa wrote:
>>
>> On Fri, Sep 16, 2022 at 12:57:19PM -0700, Dixit, Ashutosh wrote:
>> > On Fri, 16 Sep 2022 11:56:04 -0700, Umesh Nerlige Ramappa wrote:
>> >>
>> >> On Thu, Sep 15, 2022 at 10:16:30PM -0700, Dixit, Ashutosh wrote:
>> >> > On Tue, 23 Aug 2022 13:41:52 -0700, Umesh Nerlige Ramappa wrote:
>> >> >>
>> >> >
>> >> > Hi Umesh,
>> >> >
>> >> >> OA reports in the OA buffer contain an OA timestamp field that helps
>> >> >> user calculate delta between 2 OA reports. The calculation relies on the
>> >> >> CS timestamp frequency to convert the timestamp value to nanoseconds.
>> >> >> The CS timestamp frequency is a function of the CTC_SHIFT value in
>> >> >> RPM_CONFIG0.
>> >> >>
>> >> >> In DG2, OA unit assumes that the CTC_SHIFT is 3, instead of using the
>> >> >> actual value from RPM_CONFIG0. At the user level, this results in an
>> >> >> error in calculating delta between 2 OA reports since the OA timestamp
>> >> >> is not shifted in the same manner as CS timestamp.
>> >> >>
>> >> >> To resolve this, return actual OA timestamp frequency to the user in
>> >> >> i915_getparam_ioctl.
>> >> >
>> >> > Rather than exposing actual OA timestamp frequency to userspace (with the
>> >> > corresponding uapi change, especially if it's only DG2 and not all future
>> >> > products) questions about a couple of other options:
>> >> >
>> >> > Option 1. Can we set CTC_SHIFT in RPM_CONFIG0 to 3, so change GT freq to be the
>> >> >          same as OA freq :-)
>> >> >
>> >> >   The HSD seems to mention this:
>> >> >   Is setting CTC SHIFT to 0b11 on driver init an acceptable W/A?
>> >> >   Note: Changing the shift setting on live driver may break apps that are
>> >> >   currently running (including desktop manager).
>> >> >
>> >> > Option 2. Is it possible to correct the timestamps in OA report headers to
>> >> >          compensate for the difference between OA and GT frequencies (say when
>> >> >          copying OA data to userspace)?
>> >> >
>> >> >	  Though not sure if this is preferable to having userspace do this.
>> >>
>> >> It does affect other platforms too. There's no guarantee on what the
>> >> CTC_SHIFT value would be for different platforms, so user would have to at
>> >> least query that somehow (maybe from i915). It's simpler for user to use
>> >> the exported OA frequency since it is also backwards compatible.
>> >
>> > Is Option 2 above feasible since it would stop propagating the change to
>> > various UMD's?
>>
>> Hmm, there is logic today that squashes context ids when doing oa buffer
>> filtering, but it does that on selective reports (i.e. if a gem_context is
>> passed).
>>
>> For this issue: for a 16MB OA buffer with 256 byte reports, that would be
>> an additional write of 262144 bytes in the kmd (to smem). For 20us sampled OA
>> reports, it would be approx. 195 KB/s. Shouldn't be too much. Only 2
>> concerns:
>>
>> - the mmapped use case may break, but I don't see that being upstreamed.
>> We may have divergent solutions for upstream and internal.
>> - blocking/polling tests in IGT will be sensitive to this change on some
>> platforms and may need to be bolstered.
>
>If this correction/compensation in the kernel works out, we could do the
>following for internal too:
>
>* For the non-mmapped case, do the correction in the kernel and expose OA freq
>  == GT freq (in the getparam ioctl)
>* For the mmapped case, expose the actual OA freq (!= GT freq)
>
>This will restrict the divergence to the mmapped case (which we will
>probably not be able to upstream).
>
>>
>> I will give it a shot and get back,

We cannot tweak this in the OA report header since that will be out of 
sync with the counters in the report. The other issue here is that the 
bug also applies to MI_REPORT_PERF_COUNT, and KMD cannot do anything to 
fix that. I would think this interface is the clean way to do this.
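
A minimal sketch of what userspace would do with the exported value. It
assumes the OA timestamp frequency is the CS timestamp frequency shifted left
by (3 - CTC_SHIFT), which is how "OA unit assumes that the CTC_SHIFT is 3"
reads; the function names and the 32-bit timestamp width are assumptions for
illustration, not i915 uapi.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed relation: the OA unit ticks at the rate the CS timestamp would
 * have if CTC_SHIFT were 3. ctc_shift is a 2-bit field, so 0..3.
 */
static inline uint64_t oa_timestamp_freq_hz(uint64_t cs_freq_hz,
					    uint32_t ctc_shift)
{
	assert(ctc_shift <= 3);
	return cs_freq_hz << (3 - ctc_shift);
}

/*
 * Convert the delta between two 32-bit OA timestamps to nanoseconds.
 * Unsigned subtraction gives the right tick count across a single wrap.
 */
static inline uint64_t oa_delta_ns(uint32_t ts_start, uint32_t ts_end,
				   uint64_t oa_freq_hz)
{
	uint32_t ticks = ts_end - ts_start;

	assert(oa_freq_hz != 0);
	return (uint64_t)ticks * 1000000000ull / oa_freq_hz;
}
```

If CTC_SHIFT is already 3 the two frequencies coincide and existing users see
no change. Otherwise, dividing the OA tick delta by the CS frequency
overstates the elapsed time by a factor of 2^(3 - CTC_SHIFT), which is the
error described in the commit message.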

Thanks,
Umesh

>>
>> Thanks,
>> Umesh
>>
>> >
>> >> https://patchwork.freedesktop.org/patch/498917/?series=107633&rev=3 is
>> >> consumed by GPUvis. That reminds me, I should include the UMD links for the
>> >> patches with uapi changes.
>> >
>> > I was thinking more about UMD's which analyze OA data and who till now are
>> > probably assuming OA freq == GT freq and will now have to drop that
>> > assumption. So not sure how widespread would be these changes in
>> > the (multiple different?) UMD(s).
>> >
>> > Thanks.
>> > --
>> > Ashutosh


* Re: [Intel-gfx] [PATCH 03/19] drm/i915/perf: Fix noa wait predication for DG2
  2022-08-23 20:41 ` [Intel-gfx] [PATCH 03/19] drm/i915/perf: Fix noa wait predication " Umesh Nerlige Ramappa
@ 2022-09-20  0:35   ` Dixit, Ashutosh
  0 siblings, 0 replies; 84+ messages in thread
From: Dixit, Ashutosh @ 2022-09-20  0:35 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Tue, 23 Aug 2022 13:41:39 -0700, Umesh Nerlige Ramappa wrote:
>
> Predication for batch buffer commands changed in XEHPSDV.
> MI_BATCH_BUFFER_START predicates based on MI_SET_PREDICATE_RESULT
> register. The MI_SET_PREDICATE_RESULT register can only be modified
> with MI_SET_PREDICATE command. When configured, the MI_SET_PREDICATE
> command sets MI_SET_PREDICATE_RESULT based on bit 0 of
> MI_PREDICATE_RESULT_2. Use this to configure predication in noa_wait.

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>

> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_engine_regs.h |  1 +
>  drivers/gpu/drm/i915/i915_perf.c            | 24 +++++++++++++++++----
>  2 files changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_regs.h b/drivers/gpu/drm/i915/gt/intel_engine_regs.h
> index 889f0df3940b..25d23f3a4769 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_regs.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_regs.h
> @@ -200,6 +200,7 @@
>  #define RING_CONTEXT_STATUS_PTR(base)		_MMIO((base) + 0x3a0)
>  #define RING_CTX_TIMESTAMP(base)		_MMIO((base) + 0x3a8) /* gen8+ */
>  #define RING_PREDICATE_RESULT(base)		_MMIO((base) + 0x3b8)
> +#define MI_PREDICATE_RESULT_2_ENGINE(base)	_MMIO((base) + 0x3bc)
>  #define RING_FORCE_TO_NONPRIV(base, i)		_MMIO(((base) + 0x4D0) + (i) * 4)
>  #define   RING_FORCE_TO_NONPRIV_DENY		REG_BIT(30)
>  #define   RING_FORCE_TO_NONPRIV_ADDRESS_MASK	REG_GENMASK(25, 2)
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index c8331b549d31..3526693d64fa 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -286,6 +286,7 @@ static u32 i915_perf_stream_paranoid = true;
>  #define OAREPORT_REASON_CTX_SWITCH     (1<<3)
>  #define OAREPORT_REASON_CLK_RATIO      (1<<5)
>
> +#define HAS_MI_SET_PREDICATE(i915) (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50))
>
>  /* For sysctl proc_dointvec_minmax of i915_oa_max_sample_rate
>   *
> @@ -1766,6 +1767,9 @@ static int alloc_noa_wait(struct i915_perf_stream *stream)
>		DELTA_TARGET,
>		N_CS_GPR
>	};
> +	i915_reg_t mi_predicate_result = HAS_MI_SET_PREDICATE(i915) ?
> +					  MI_PREDICATE_RESULT_2_ENGINE(base) :
> +					  MI_PREDICATE_RESULT_1(RENDER_RING_BASE);
>
>	bo = i915_gem_object_create_internal(i915, 4096);
>	if (IS_ERR(bo)) {
> @@ -1803,7 +1807,7 @@ static int alloc_noa_wait(struct i915_perf_stream *stream)
>			stream, cs, true /* save */, CS_GPR(i),
>			INTEL_GT_SCRATCH_FIELD_PERF_CS_GPR + 8 * i, 2);
>	cs = save_restore_register(
> -		stream, cs, true /* save */, MI_PREDICATE_RESULT_1(RENDER_RING_BASE),
> +		stream, cs, true /* save */, mi_predicate_result,
>		INTEL_GT_SCRATCH_FIELD_PERF_PREDICATE_RESULT_1, 1);
>
>	/* First timestamp snapshot location. */
> @@ -1857,7 +1861,10 @@ static int alloc_noa_wait(struct i915_perf_stream *stream)
>	 */
>	*cs++ = MI_LOAD_REGISTER_REG | (3 - 2);
>	*cs++ = i915_mmio_reg_offset(CS_GPR(JUMP_PREDICATE));
> -	*cs++ = i915_mmio_reg_offset(MI_PREDICATE_RESULT_1(RENDER_RING_BASE));
> +	*cs++ = i915_mmio_reg_offset(mi_predicate_result);
> +
> +	if (HAS_MI_SET_PREDICATE(i915))
> +		*cs++ = MI_SET_PREDICATE | 1;
>
>	/* Restart from the beginning if we had timestamps roll over. */
>	*cs++ = (GRAPHICS_VER(i915) < 8 ?
> @@ -1867,6 +1874,9 @@ static int alloc_noa_wait(struct i915_perf_stream *stream)
>	*cs++ = i915_ggtt_offset(vma) + (ts0 - batch) * 4;
>	*cs++ = 0;
>
> +	if (HAS_MI_SET_PREDICATE(i915))
> +		*cs++ = MI_SET_PREDICATE;
> +
>	/*
>	 * Now add the diff between to previous timestamps and add it to :
>	 *      (((1 * << 64) - 1) - delay_ns)
> @@ -1894,7 +1904,10 @@ static int alloc_noa_wait(struct i915_perf_stream *stream)
>	 */
>	*cs++ = MI_LOAD_REGISTER_REG | (3 - 2);
>	*cs++ = i915_mmio_reg_offset(CS_GPR(JUMP_PREDICATE));
> -	*cs++ = i915_mmio_reg_offset(MI_PREDICATE_RESULT_1(RENDER_RING_BASE));
> +	*cs++ = i915_mmio_reg_offset(mi_predicate_result);
> +
> +	if (HAS_MI_SET_PREDICATE(i915))
> +		*cs++ = MI_SET_PREDICATE | 1;
>
>	/* Predicate the jump.  */
>	*cs++ = (GRAPHICS_VER(i915) < 8 ?
> @@ -1904,13 +1917,16 @@ static int alloc_noa_wait(struct i915_perf_stream *stream)
>	*cs++ = i915_ggtt_offset(vma) + (jump - batch) * 4;
>	*cs++ = 0;
>
> +	if (HAS_MI_SET_PREDICATE(i915))
> +		*cs++ = MI_SET_PREDICATE;
> +
>	/* Restore registers. */
>	for (i = 0; i < N_CS_GPR; i++)
>		cs = save_restore_register(
>			stream, cs, false /* restore */, CS_GPR(i),
>			INTEL_GT_SCRATCH_FIELD_PERF_CS_GPR + 8 * i, 2);
>	cs = save_restore_register(
> -		stream, cs, false /* restore */, MI_PREDICATE_RESULT_1(RENDER_RING_BASE),
> +		stream, cs, false /* restore */, mi_predicate_result,
>		INTEL_GT_SCRATCH_FIELD_PERF_PREDICATE_RESULT_1, 1);
>
>	/* And return to the ring. */
> --
> 2.25.1
>


* Re: [Intel-gfx] [PATCH 16/19] drm/i915/perf: Apply Wa_18013179988
  2022-09-19 21:21             ` Umesh Nerlige Ramappa
@ 2022-09-20  1:24               ` Dixit, Ashutosh
  0 siblings, 0 replies; 84+ messages in thread
From: Dixit, Ashutosh @ 2022-09-20  1:24 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Mon, 19 Sep 2022 14:21:07 -0700, Umesh Nerlige Ramappa wrote:
>
> On Fri, Sep 16, 2022 at 02:00:19PM -0700, Dixit, Ashutosh wrote:
> > On Fri, 16 Sep 2022 13:25:17 -0700, Umesh Nerlige Ramappa wrote:
> >>
> >> On Fri, Sep 16, 2022 at 12:57:19PM -0700, Dixit, Ashutosh wrote:
> >> > On Fri, 16 Sep 2022 11:56:04 -0700, Umesh Nerlige Ramappa wrote:
> >> >>
> >> >> On Thu, Sep 15, 2022 at 10:16:30PM -0700, Dixit, Ashutosh wrote:
> >> >> > On Tue, 23 Aug 2022 13:41:52 -0700, Umesh Nerlige Ramappa wrote:
> >> >> >>
> >> >> >
> >> >> > Hi Umesh,
> >> >> >
> >> >> >> OA reports in the OA buffer contain an OA timestamp field that helps
> >> >> >> user calculate delta between 2 OA reports. The calculation relies on the
> >> >> >> CS timestamp frequency to convert the timestamp value to nanoseconds.
> >> >> >> The CS timestamp frequency is a function of the CTC_SHIFT value in
> >> >> >> RPM_CONFIG0.
> >> >> >>
> >> >> >> In DG2, OA unit assumes that the CTC_SHIFT is 3, instead of using the
> >> >> >> actual value from RPM_CONFIG0. At the user level, this results in an
> >> >> >> error in calculating delta between 2 OA reports since the OA timestamp
> >> >> >> is not shifted in the same manner as CS timestamp.
> >> >> >>
> >> >> >> To resolve this, return actual OA timestamp frequency to the user in
> >> >> >> i915_getparam_ioctl.
> >> >> >
> >> >> > Rather than exposing actual OA timestamp frequency to userspace (with the
> >> >> > corresponding uapi change, especially if it's only DG2 and not all future
> >> >> > products) questions about a couple of other options:
> >> >> >
> >> >> > Option 1. Can we set CTC_SHIFT in RPM_CONFIG0 to 3, so change GT freq to be the
> >> >> >          same as OA freq :-)
> >> >> >
> >> >> >   The HSD seems to mention this:
> >> >> >   Is setting CTC SHIFT to 0b11 on driver init an acceptable W/A?
> >> >> >   Note: Changing the shift setting on live driver may break apps that are
> >> >> >   currently running (including desktop manager).
> >> >> >
> >> >> > Option 2. Is it possible to correct the timestamps in OA report headers to
> >> >> >          compensate for the difference between OA and GT frequencies (say when
> >> >> >          copying OA data to userspace)?
> >> >> >
> >> >> >	  Though not sure if this is preferable to having userspace do this.
> >> >>
> >> >> It does affect other platforms too. There's no guarantee on what the
> >> >> CTC_SHIFT value would be for different platforms, so user would have to at
> >> >> least query that somehow (maybe from i915). It's simpler for user to use
> >> >> the exported OA frequency since it is also backwards compatible.
> >> >
> >> > Is Option 2 above feasible since it would stop propagating the change to
> >> > various UMD's?
> >>
> >> Hmm, there is logic today that squashes context ids when doing oa buffer
> >> filtering, but it does that on selective reports (i.e. if a gem_context is
> >> passed).
> >>
> >> For this issue: for a 16MB OA buffer with 256 byte reports, that would be
> >> an additional write of 262144 bytes in the kmd (to smem). For 20us sampled OA
> >> reports, it would be approx. 195 KB/s. Shouldn't be too much. Only 2
> >> concerns:
> >>
> >> - the mmapped use case may break, but I don't see that being upstreamed.
> >> We may have divergent solutions for upstream and internal.
> >> - blocking/polling tests in IGT will be sensitive to this change on some
> >> platforms and may need to be bolstered.
> >
> > If this correction/compensation in the kernel works out, we could do the
> > following for internal too:
> >
> > * For the non-mmapped case, do the correction in the kernel and expose OA freq
> >  == GT freq (in the getparam ioctl)
> > * For the mmapped case, expose the actual OA freq (!= GT freq)
> >
> > This will restrict the divergence to the mmapped case (which we will
> > probably not be able to upstream).
> >
> >>
> >> I will give it a shot and get back,
>
> We cannot tweak this in the OA report header since that will be out of sync
> with the counters in the report. The other issue here is that the bug also
> applies to MI_REPORT_PERF_COUNT, and KMD cannot do anything to fix that. I
> would think this interface is the clean way to do this.

In that case this is:

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>

May still need a uapi change ack but that's separate.

>
> Thanks,
> Umesh
>
> >>
> >> Thanks,
> >> Umesh
> >>
> >> >
> >> >> https://patchwork.freedesktop.org/patch/498917/?series=107633&rev=3 is
> >> >> consumed by GPUvis. That reminds me, I should include the UMD links for the
> >> >> patches with uapi changes.
> >> >
> >> > I was thinking more about UMD's which analyze OA data and who till now are
> >> > probably assuming OA freq == GT freq and will now have to drop that
> >> > assumption. So not sure how widespread would be these changes in
> >> > the (multiple different?) UMD(s).
> >> >
> >> > Thanks.
> >> > --
> >> > Ashutosh


* Re: [Intel-gfx] [PATCH 01/19] drm/i915/perf: Fix OA filtering logic for GuC mode
  2022-09-15 22:49             ` Umesh Nerlige Ramappa
@ 2022-09-20  3:22               ` Dixit, Ashutosh
  2022-09-22  3:51                 ` Dixit, Ashutosh
  0 siblings, 1 reply; 84+ messages in thread
From: Dixit, Ashutosh @ 2022-09-20  3:22 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Thu, 15 Sep 2022 15:49:27 -0700, Umesh Nerlige Ramappa wrote:
>
> On Wed, Sep 14, 2022 at 04:13:41PM -0700, Umesh Nerlige Ramappa wrote:
> > On Wed, Sep 14, 2022 at 03:26:15PM -0700, Umesh Nerlige Ramappa wrote:
> >> On Tue, Sep 06, 2022 at 09:39:33PM +0300, Lionel Landwerlin wrote:
> >>> On 06/09/2022 20:39, Umesh Nerlige Ramappa wrote:
> >>>> On Tue, Sep 06, 2022 at 05:33:00PM +0300, Lionel Landwerlin wrote:
> >>>>> On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote:
> >>>>>> With GuC mode of submission, GuC is in control of defining the
> >>>>>> context id field
> >>>>>> that is part of the OA reports. To filter reports, UMD and KMD must
> >>>>>> know what sw
> >>>>>> context id was chosen by GuC. There is not interface between KMD and
> >>>>>> GuC to
> >>>>>> determine this, so read the upper-dword of EXECLIST_STATUS to
> >>>>>> filter/squash OA
> >>>>>> reports for the specific context.
> >>>>>>
> >>>>>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> >>>>>
> >>>>>
> >>>>> I assume you checked with GuC that this doesn't change as the context
> >>>>> is running?
> >>>>
> >>>> Correct.
> >>>>
> >>>>>
> >>>>> With i915/execlist submission mode, we had to ask i915 to pin the
> >>>>> sw_id/ctx_id.
> >>>>>
> >>>>
> >>>> From GuC perspective, the context id can change once KMD de-registers
> >>>> the context and that will not happen while the context is in use.
> >>>>
> >>>> Thanks,
> >>>> Umesh
> >>>
> >>>
> >>> Thanks Umesh,
> >>>
> >>>
> >>> Maybe I should have been more precise in my question :
> >>>
> >>>
> >>> Can the ID change while the i915-perf stream is opened?
> >>>
> >>> Because the ID not changing while the context is running makes sense.
> >>>
> >>> But since the number of available IDs is limited to 2k or something on
> >>> Gfx12, it's possible the GuC has to reuse IDs if too many apps want to
> >>> run during the period of time while i915-perf is active and filtering.
> >>>
> >>
> >> available guc ids are 64k with 4k reserved for multi-lrc, so GuC may
> >> have to reuse ids once 60k ids are used up.
> >
> > Spoke to the GuC team again and if there are a lot of contexts (> 60K)
> > running, there is a possibility of the context id being recycled. In that
> > case, the capture would be broken. I would track this as a separate JIRA
> > and follow up on a solution.
> >
> > From OA use case perspective, are we interested in monitoring just one
> > hardware context? If we make sure this context is not stolen, are we
> > good?
> >
>
> + John
>
> Based on John's inputs - if a context is pinned, then KMD does not steal
> it's id. It would just look for something else or wait for a context to be
> available (pin count 0 I believe).
>
> Since we pin the context for the duration of the OA use case, we should be
> good here.

Since this appears to be true, I am thinking of okaying this patch rather
than defining a new interface with GuC for this. Let me know if there are
any objections to this.

Thanks.
--
Ashutosh

> >>> -Lionel
> >>>
> >>>
> >>>>
> >>>>>
> >>>>> If that's not the case then filtering is broken.
> >>>>>
> >>>>>
> >>>>> -Lionel
> >>>>>
> >>>>>


* Re: [Intel-gfx] [PATCH 01/19] drm/i915/perf: Fix OA filtering logic for GuC mode
  2022-09-09 23:47   ` Dixit, Ashutosh
  2022-09-13  3:08     ` Dixit, Ashutosh
  2022-09-14 23:36     ` Umesh Nerlige Ramappa
@ 2022-09-22  3:44     ` Dixit, Ashutosh
  2022-09-22  3:49       ` Dixit, Ashutosh
  2 siblings, 1 reply; 84+ messages in thread
From: Dixit, Ashutosh @ 2022-09-22  3:44 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Fri, 09 Sep 2022 16:47:36 -0700, Dixit, Ashutosh wrote:
>
> On Tue, 23 Aug 2022 13:41:37 -0700, Umesh Nerlige Ramappa wrote:
> >
> > +/*
> > + * For execlist mode of submission, pick an unused context id
> > + * 0 - (NUM_CONTEXT_TAG -1) are used by other contexts
> > + * XXX_MAX_CONTEXT_HW_ID is used by idle context
> > + *
> > + * For GuC mode of submission read context id from the upper dword of the
> > + * EXECLIST_STATUS register.
> > + */
> > +static int gen12_get_render_context_id(struct i915_perf_stream *stream)
> > +{
> > +	u32 ctx_id, mask;
> > +	int ret;
> > +
> > +	if (intel_engine_uses_guc(stream->engine)) {
> > +		ret = gen12_guc_sw_ctx_id(stream->pinned_ctx, &ctx_id);
> > +		if (ret)
> > +			return ret;
> > +
> > +		mask = ((1U << GEN12_GUC_SW_CTX_ID_WIDTH) - 1) <<
> > +			(GEN12_GUC_SW_CTX_ID_SHIFT - 32);
> > +	} else if (GRAPHICS_VER_FULL(stream->engine->i915) >= IP_VER(12, 50)) {
> > +		ctx_id = (XEHP_MAX_CONTEXT_HW_ID - 1) <<
> > +			(XEHP_SW_CTX_ID_SHIFT - 32);
> > +
> > +		mask = ((1U << XEHP_SW_CTX_ID_WIDTH) - 1) <<
> > +			(XEHP_SW_CTX_ID_SHIFT - 32);
> > +	} else {
> > +		ctx_id = (GEN12_MAX_CONTEXT_HW_ID - 1) <<
> > +			 (GEN11_SW_CTX_ID_SHIFT - 32);
> > +
> > +		mask = ((1U << GEN11_SW_CTX_ID_WIDTH) - 1) <<
> > +			(GEN11_SW_CTX_ID_SHIFT - 32);
>
> Previously I missed that these ctx_id's for non-GuC cases are just
> constants. How does it work in these cases?

For the record, offline reply from Umesh for this question:

Looks like the SW context id is set to a unique value by the KMD for
execlist mode here - __execlists_schedule_in() as ccid. Later it is written
to the execlist port here (as lrc.desc) - execlists_submit_ports(). It's
just a unique value that the kmd determines. For OA we are setting a
ce->tag when the OA use case is active and it is used by
__execlists_schedule_in().

Related commit from Chris - 2935ed5339c49

I think the reason OA is setting it is that this value is not
assigned until __execlists_schedule_in() is called. For the OA context, this
may happen much later. The code that Chris has added just assigns a value
in OA and then uses it later in the __execlists_schedule_in() path.
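
To make the arithmetic concrete, here is a minimal sketch (not the driver
code) of how a context id field that lives in the upper dword of the 64-bit
descriptor becomes a 32-bit mask and match value. The shift of 39 and width
of 16 are the GEN12_GUC_SW_CTX_ID_* values from this patch. The helper names
are hypothetical, and the comparison in report_matches() is an assumption
about how the filter is applied to a report's context id dword.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field position within the 64-bit descriptor, values taken from the patch. */
#define GUC_SW_CTX_ID_SHIFT 39
#define GUC_SW_CTX_ID_WIDTH 16

/* Mask for a `width`-bit field at descriptor bit `shift`, as seen in the
 * upper dword (hence the "- 32"). Hypothetical helper.
 */
static uint32_t upper_dword_mask(unsigned int shift, unsigned int width)
{
	assert(shift >= 32 && width > 0 && shift + width <= 64);
	return (uint32_t)(((UINT64_C(1) << width) - 1) << (shift - 32));
}

/* Place a raw id into the field, again relative to the upper dword. */
static uint32_t upper_dword_field(uint32_t id, unsigned int shift,
				  unsigned int width)
{
	return (id << (shift - 32)) & upper_dword_mask(shift, width);
}

/* Assumed filter: mask the report's context id dword, compare to the
 * value stored when the stream was opened.
 */
static bool report_matches(uint32_t report_ctx_id, uint32_t specific_ctx_id,
			   uint32_t mask)
{
	return (report_ctx_id & mask) == specific_ctx_id;
}
```

With these values the mask is 0xffff << 7. Bits outside the field, such as
the low 7 bits of the upper dword, do not affect the match, which is why the
same helper shape covers both the constant execlist tag and the value read
back from EXECLIST_STATUS in GuC mode.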


* Re: [Intel-gfx] [PATCH 01/19] drm/i915/perf: Fix OA filtering logic for GuC mode
  2022-09-22  3:44     ` Dixit, Ashutosh
@ 2022-09-22  3:49       ` Dixit, Ashutosh
  0 siblings, 0 replies; 84+ messages in thread
From: Dixit, Ashutosh @ 2022-09-22  3:49 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Wed, 21 Sep 2022 20:44:57 -0700, Dixit, Ashutosh wrote:
>
> On Fri, 09 Sep 2022 16:47:36 -0700, Dixit, Ashutosh wrote:
> >
> > On Tue, 23 Aug 2022 13:41:37 -0700, Umesh Nerlige Ramappa wrote:
> > >
> > > +/*
> > > + * For execlist mode of submission, pick an unused context id
> > > + * 0 - (NUM_CONTEXT_TAG -1) are used by other contexts
> > > + * XXX_MAX_CONTEXT_HW_ID is used by idle context
> > > + *
> > > + * For GuC mode of submission read context id from the upper dword of the
> > > + * EXECLIST_STATUS register.
> > > + */
> > > +static int gen12_get_render_context_id(struct i915_perf_stream *stream)
> > > +{
> > > +	u32 ctx_id, mask;
> > > +	int ret;
> > > +
> > > +	if (intel_engine_uses_guc(stream->engine)) {
> > > +		ret = gen12_guc_sw_ctx_id(stream->pinned_ctx, &ctx_id);
> > > +		if (ret)
> > > +			return ret;
> > > +
> > > +		mask = ((1U << GEN12_GUC_SW_CTX_ID_WIDTH) - 1) <<
> > > +			(GEN12_GUC_SW_CTX_ID_SHIFT - 32);
> > > +	} else if (GRAPHICS_VER_FULL(stream->engine->i915) >= IP_VER(12, 50)) {
> > > +		ctx_id = (XEHP_MAX_CONTEXT_HW_ID - 1) <<
> > > +			(XEHP_SW_CTX_ID_SHIFT - 32);
> > > +
> > > +		mask = ((1U << XEHP_SW_CTX_ID_WIDTH) - 1) <<
> > > +			(XEHP_SW_CTX_ID_SHIFT - 32);
> > > +	} else {
> > > +		ctx_id = (GEN12_MAX_CONTEXT_HW_ID - 1) <<
> > > +			 (GEN11_SW_CTX_ID_SHIFT - 32);
> > > +
> > > +		mask = ((1U << GEN11_SW_CTX_ID_WIDTH) - 1) <<
> > > +			(GEN11_SW_CTX_ID_SHIFT - 32);
> >
> > Previously I missed that these ctx_id's for non-GuC cases are just
> > constants. How does it work in these cases?
>
> For the record, offline reply from Umesh for this question:
>
> Looks like the SW context id is set to a unique value by the KMD for
> execlist mode here - __execlists_schedule_in() as ccid. Later it is written
> to the execlist port here (as lrc.desc) - execlists_submit_ports(). It's
> just a unique value that the kmd determines. For OA we are setting a
> ce->tag when OA use case is active and it used by the
> __execlists_schedule_in().
>
> Related commit from Chris - 2935ed5339c49
>
> I think the reason why OA is setting it is because this value is not
> assigned until __execlists_schedule_in() is called. For OA context, this
> may happen much later. The code that Chris has added, just assigns a value
> in OA and then uses it later in the __execlists_schedule_in() path.

I would still think this should not be a constant value but something which
depends on the context or the context id. Anyway, since this is a
pre-existing issue not introduced in this patch, I will disregard this and
continue reviewing this patch.

Thanks.
--
Ashutosh


* Re: [Intel-gfx] [PATCH 01/19] drm/i915/perf: Fix OA filtering logic for GuC mode
  2022-09-20  3:22               ` Dixit, Ashutosh
@ 2022-09-22  3:51                 ` Dixit, Ashutosh
  0 siblings, 0 replies; 84+ messages in thread
From: Dixit, Ashutosh @ 2022-09-22  3:51 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Mon, 19 Sep 2022 20:22:40 -0700, Dixit, Ashutosh wrote:
>
> On Thu, 15 Sep 2022 15:49:27 -0700, Umesh Nerlige Ramappa wrote:
> >
> > On Wed, Sep 14, 2022 at 04:13:41PM -0700, Umesh Nerlige Ramappa wrote:
> > > On Wed, Sep 14, 2022 at 03:26:15PM -0700, Umesh Nerlige Ramappa wrote:
> > >> On Tue, Sep 06, 2022 at 09:39:33PM +0300, Lionel Landwerlin wrote:
> > >>> On 06/09/2022 20:39, Umesh Nerlige Ramappa wrote:
> > >>>> On Tue, Sep 06, 2022 at 05:33:00PM +0300, Lionel Landwerlin wrote:
> > >>>>> On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote:
> > >>>>>> With GuC mode of submission, GuC is in control of defining the
> > >>>>>> context id field
> > >>>>>> that is part of the OA reports. To filter reports, UMD and KMD must
> > >>>>>> know what sw
> > >>>>>> context id was chosen by GuC. There is not interface between KMD and
> > >>>>>> GuC to
> > >>>>>> determine this, so read the upper-dword of EXECLIST_STATUS to
> > >>>>>> filter/squash OA
> > >>>>>> reports for the specific context.
> > >>>>>>
> > >>>>>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> > >>>>>
> > >>>>>
> > >>>>> I assume you checked with GuC that this doesn't change as the context
> > >>>>> is running?
> > >>>>
> > >>>> Correct.
> > >>>>
> > >>>>>
> > >>>>> With i915/execlist submission mode, we had to ask i915 to pin the
> > >>>>> sw_id/ctx_id.
> > >>>>>
> > >>>>
> > >>>> From GuC perspective, the context id can change once KMD de-registers
> > >>>> the context and that will not happen while the context is in use.
> > >>>>
> > >>>> Thanks,
> > >>>> Umesh
> > >>>
> > >>>
> > >>> Thanks Umesh,
> > >>>
> > >>>
> > >>> Maybe I should have been more precise in my question :
> > >>>
> > >>>
> > >>> Can the ID change while the i915-perf stream is opened?
> > >>>
> > >>> Because the ID not changing while the context is running makes sense.
> > >>>
> > >>> But since the number of available IDs is limited to 2k or something on
> > >>> Gfx12, it's possible the GuC has to reuse IDs if too many apps want to
> > >>> run during the period of time while i915-perf is active and filtering.
> > >>>
> > >>
> > >> available guc ids are 64k with 4k reserved for multi-lrc, so GuC may
> > >> have to reuse ids once 60k ids are used up.
> > >
> > > Spoke to the GuC team again and if there are a lot of contexts (> 60K)
> > > running, there is a possibility of the context id being recycled. In that
> > > case, the capture would be broken. I would track this as a separate JIRA
> > > and follow up on a solution.
> > >
> > > From OA use case perspective, are we interested in monitoring just one
> > > hardware context? If we make sure this context is not stolen, are we
> > > good?
> > >
> >
> > + John
> >
> > Based on John's inputs - if a context is pinned, then KMD does not steal
> > it's id. It would just look for something else or wait for a context to be
> > available (pin count 0 I believe).
> >
> > Since we pin the context for the duration of the OA use case, we should be
> > good here.
>
> Since this appears to be true I am thinking of okay'ing this patch rather
> than define a new interface with GuC for this. Let me know if there are any
> objections about this.

With the above comments/assumptions this is:

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>


* Re: [Intel-gfx] [PATCH 01/19] drm/i915/perf: Fix OA filtering logic for GuC mode
  2022-09-14 23:13           ` Umesh Nerlige Ramappa
  2022-09-15 22:49             ` Umesh Nerlige Ramappa
@ 2022-09-22 11:05             ` Lionel Landwerlin
  1 sibling, 0 replies; 84+ messages in thread
From: Lionel Landwerlin @ 2022-09-22 11:05 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On 15/09/2022 02:13, Umesh Nerlige Ramappa wrote:
> On Wed, Sep 14, 2022 at 03:26:15PM -0700, Umesh Nerlige Ramappa wrote:
>> On Tue, Sep 06, 2022 at 09:39:33PM +0300, Lionel Landwerlin wrote:
>>> On 06/09/2022 20:39, Umesh Nerlige Ramappa wrote:
>>>> On Tue, Sep 06, 2022 at 05:33:00PM +0300, Lionel Landwerlin wrote:
>>>>> On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote:
>>>>>> With GuC mode of submission, GuC is in control of defining the 
>>>>>> context id field
>>>>>> that is part of the OA reports. To filter reports, UMD and KMD 
>>>>>> must know what sw
>>>>>> context id was chosen by GuC. There is not interface between KMD 
>>>>>> and GuC to
>>>>>> determine this, so read the upper-dword of EXECLIST_STATUS to 
>>>>>> filter/squash OA
>>>>>> reports for the specific context.
>>>>>>
>>>>>> Signed-off-by: Umesh Nerlige Ramappa 
>>>>>> <umesh.nerlige.ramappa@intel.com>
>>>>>
>>>>>
>>>>> I assume you checked with GuC that this doesn't change as the 
>>>>> context is running?
>>>>
>>>> Correct.
>>>>
>>>>>
>>>>> With i915/execlist submission mode, we had to ask i915 to pin the 
>>>>> sw_id/ctx_id.
>>>>>
>>>>
>>>> From GuC perspective, the context id can change once KMD 
>>>> de-registers the context and that will not happen while the context 
>>>> is in use.
>>>>
>>>> Thanks,
>>>> Umesh
>>>
>>>
>>> Thanks Umesh,
>>>
>>>
>>> Maybe I should have been more precise in my question :
>>>
>>>
>>> Can the ID change while the i915-perf stream is opened?
>>>
>>> Because the ID not changing while the context is running makes sense.
>>>
>>> But since the number of available IDs is limited to 2k or something 
>>> on Gfx12, it's possible the GuC has to reuse IDs if too many apps 
>>> want to run during the period of time while i915-perf is active and 
>>> filtering.
>>>
>>
>> available guc ids are 64k with 4k reserved for multi-lrc, so GuC may 
>> have to reuse ids once 60k ids are used up.
>
> Spoke to the GuC team again and if there are a lot of contexts (> 60K) 
> running, there is a possibility of the context id being recycled. In 
> that case, the capture would be broken. I would track this as a 
> separate JIRA and follow up on a solution.
>
> From OA use case perspective, are we interested in monitoring just one 
> hardware context? If we make sure this context is not stolen, are we 
> good?
> Thanks,
> Umesh


Yep, we only care about that one ID not changing.


Thanks,

-Lionel


>
>>
>> Thanks,
>> Umesh
>>
>>>
>>> -Lionel
>>>
>>>
>>>>
>>>>>
>>>>> If that's not the case then filtering is broken.
>>>>>
>>>>>
>>>>> -Lionel
>>>>>
>>>>>
>>>>>> ---
>>>>>>  drivers/gpu/drm/i915/gt/intel_lrc.h |   2 +
>>>>>>  drivers/gpu/drm/i915/i915_perf.c    | 141 
>>>>>> ++++++++++++++++++++++++----
>>>>>>  2 files changed, 124 insertions(+), 19 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.h 
>>>>>> b/drivers/gpu/drm/i915/gt/intel_lrc.h
>>>>>> index a390f0813c8b..7111bae759f3 100644
>>>>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.h
>>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.h
>>>>>> @@ -110,6 +110,8 @@ enum {
>>>>>>  #define XEHP_SW_CTX_ID_WIDTH            16
>>>>>>  #define XEHP_SW_COUNTER_SHIFT            58
>>>>>>  #define XEHP_SW_COUNTER_WIDTH            6
>>>>>> +#define GEN12_GUC_SW_CTX_ID_SHIFT        39
>>>>>> +#define GEN12_GUC_SW_CTX_ID_WIDTH        16
>>>>>>  static inline void lrc_runtime_start(struct intel_context *ce)
>>>>>>  {
>>>>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
>>>>>> b/drivers/gpu/drm/i915/i915_perf.c
>>>>>> index f3c23fe9ad9c..735244a3aedd 100644
>>>>>> --- a/drivers/gpu/drm/i915/i915_perf.c
>>>>>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>>>>>> @@ -1233,6 +1233,125 @@ static struct intel_context 
>>>>>> *oa_pin_context(struct i915_perf_stream *stream)
>>>>>>      return stream->pinned_ctx;
>>>>>>  }
>>>>>> +static int
>>>>>> +__store_reg_to_mem(struct i915_request *rq, i915_reg_t reg, u32 
>>>>>> ggtt_offset)
>>>>>> +{
>>>>>> +    u32 *cs, cmd;
>>>>>> +
>>>>>> +    cmd = MI_STORE_REGISTER_MEM | MI_SRM_LRM_GLOBAL_GTT;
>>>>>> +    if (GRAPHICS_VER(rq->engine->i915) >= 8)
>>>>>> +        cmd++;
>>>>>> +
>>>>>> +    cs = intel_ring_begin(rq, 4);
>>>>>> +    if (IS_ERR(cs))
>>>>>> +        return PTR_ERR(cs);
>>>>>> +
>>>>>> +    *cs++ = cmd;
>>>>>> +    *cs++ = i915_mmio_reg_offset(reg);
>>>>>> +    *cs++ = ggtt_offset;
>>>>>> +    *cs++ = 0;
>>>>>> +
>>>>>> +    intel_ring_advance(rq, cs);
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int
>>>>>> +__read_reg(struct intel_context *ce, i915_reg_t reg, u32 
>>>>>> ggtt_offset)
>>>>>> +{
>>>>>> +    struct i915_request *rq;
>>>>>> +    int err;
>>>>>> +
>>>>>> +    rq = i915_request_create(ce);
>>>>>> +    if (IS_ERR(rq))
>>>>>> +        return PTR_ERR(rq);
>>>>>> +
>>>>>> +    i915_request_get(rq);
>>>>>> +
>>>>>> +    err = __store_reg_to_mem(rq, reg, ggtt_offset);
>>>>>> +
>>>>>> +    i915_request_add(rq);
>>>>>> +    if (!err && i915_request_wait(rq, 0, HZ / 2) < 0)
>>>>>> +        err = -ETIME;
>>>>>> +
>>>>>> +    i915_request_put(rq);
>>>>>> +
>>>>>> +    return err;
>>>>>> +}
>>>>>> +
>>>>>> +static int
>>>>>> +gen12_guc_sw_ctx_id(struct intel_context *ce, u32 *ctx_id)
>>>>>> +{
>>>>>> +    struct i915_vma *scratch;
>>>>>> +    u32 *val;
>>>>>> +    int err;
>>>>>> +
>>>>>> +    scratch = 
>>>>>> __vm_create_scratch_for_read_pinned(&ce->engine->gt->ggtt->vm, 4);
>>>>>> +    if (IS_ERR(scratch))
>>>>>> +        return PTR_ERR(scratch);
>>>>>> +
>>>>>> +    err = i915_vma_sync(scratch);
>>>>>> +    if (err)
>>>>>> +        goto err_scratch;
>>>>>> +
>>>>>> +    err = __read_reg(ce, 
>>>>>> RING_EXECLIST_STATUS_HI(ce->engine->mmio_base),
>>>>>> +             i915_ggtt_offset(scratch));
>>>>>> +    if (err)
>>>>>> +        goto err_scratch;
>>>>>> +
>>>>>> +    val = i915_gem_object_pin_map_unlocked(scratch->obj, 
>>>>>> I915_MAP_WB);
>>>>>> +    if (IS_ERR(val)) {
>>>>>> +        err = PTR_ERR(val);
>>>>>> +        goto err_scratch;
>>>>>> +    }
>>>>>> +
>>>>>> +    *ctx_id = *val;
>>>>>> +    i915_gem_object_unpin_map(scratch->obj);
>>>>>> +
>>>>>> +err_scratch:
>>>>>> +    i915_vma_unpin_and_release(&scratch, 0);
>>>>>> +    return err;
>>>>>> +}
>>>>>> +
>>>>>> +/*
>>>>>> + * For execlist mode of submission, pick an unused context id
>>>>>> + * 0 - (NUM_CONTEXT_TAG -1) are used by other contexts
>>>>>> + * XXX_MAX_CONTEXT_HW_ID is used by idle context
>>>>>> + *
>>>>>> + * For GuC mode of submission read context id from the upper 
>>>>>> dword of the
>>>>>> + * EXECLIST_STATUS register.
>>>>>> + */
>>>>>> +static int gen12_get_render_context_id(struct i915_perf_stream 
>>>>>> *stream)
>>>>>> +{
>>>>>> +    u32 ctx_id, mask;
>>>>>> +    int ret;
>>>>>> +
>>>>>> +    if (intel_engine_uses_guc(stream->engine)) {
>>>>>> +        ret = gen12_guc_sw_ctx_id(stream->pinned_ctx, &ctx_id);
>>>>>> +        if (ret)
>>>>>> +            return ret;
>>>>>> +
>>>>>> +        mask = ((1U << GEN12_GUC_SW_CTX_ID_WIDTH) - 1) <<
>>>>>> +            (GEN12_GUC_SW_CTX_ID_SHIFT - 32);
>>>>>> +    } else if (GRAPHICS_VER_FULL(stream->engine->i915) >= 
>>>>>> IP_VER(12, 50)) {
>>>>>> +        ctx_id = (XEHP_MAX_CONTEXT_HW_ID - 1) <<
>>>>>> +            (XEHP_SW_CTX_ID_SHIFT - 32);
>>>>>> +
>>>>>> +        mask = ((1U << XEHP_SW_CTX_ID_WIDTH) - 1) <<
>>>>>> +            (XEHP_SW_CTX_ID_SHIFT - 32);
>>>>>> +    } else {
>>>>>> +        ctx_id = (GEN12_MAX_CONTEXT_HW_ID - 1) <<
>>>>>> +             (GEN11_SW_CTX_ID_SHIFT - 32);
>>>>>> +
>>>>>> +        mask = ((1U << GEN11_SW_CTX_ID_WIDTH) - 1) <<
>>>>>> +            (GEN11_SW_CTX_ID_SHIFT - 32);
>>>>>> +    }
>>>>>> +    stream->specific_ctx_id = ctx_id & mask;
>>>>>> +    stream->specific_ctx_id_mask = mask;
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>>  /**
>>>>>>   * oa_get_render_ctx_id - determine and hold ctx hw id
>>>>>>   * @stream: An i915-perf stream opened for OA metrics
>>>>>> @@ -1246,6 +1365,7 @@ static struct intel_context 
>>>>>> *oa_pin_context(struct i915_perf_stream *stream)
>>>>>>  static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
>>>>>>  {
>>>>>>      struct intel_context *ce;
>>>>>> +    int ret = 0;
>>>>>>      ce = oa_pin_context(stream);
>>>>>>      if (IS_ERR(ce))
>>>>>> @@ -1292,24 +1412,7 @@ static int oa_get_render_ctx_id(struct 
>>>>>> i915_perf_stream *stream)
>>>>>>      case 11:
>>>>>>      case 12:
>>>>>> -        if (GRAPHICS_VER_FULL(ce->engine->i915) >= IP_VER(12, 
>>>>>> 50)) {
>>>>>> -            stream->specific_ctx_id_mask =
>>>>>> -                ((1U << XEHP_SW_CTX_ID_WIDTH) - 1) <<
>>>>>> -                (XEHP_SW_CTX_ID_SHIFT - 32);
>>>>>> -            stream->specific_ctx_id =
>>>>>> -                (XEHP_MAX_CONTEXT_HW_ID - 1) <<
>>>>>> -                (XEHP_SW_CTX_ID_SHIFT - 32);
>>>>>> -        } else {
>>>>>> -            stream->specific_ctx_id_mask =
>>>>>> -                ((1U << GEN11_SW_CTX_ID_WIDTH) - 1) << 
>>>>>> (GEN11_SW_CTX_ID_SHIFT - 32);
>>>>>> -            /*
>>>>>> -             * Pick an unused context id
>>>>>> -             * 0 - BITS_PER_LONG are used by other contexts
>>>>>> -             * GEN12_MAX_CONTEXT_HW_ID (0x7ff) is used by idle 
>>>>>> context
>>>>>> -             */
>>>>>> -            stream->specific_ctx_id =
>>>>>> -                (GEN12_MAX_CONTEXT_HW_ID - 1) << 
>>>>>> (GEN11_SW_CTX_ID_SHIFT - 32);
>>>>>> -        }
>>>>>> +        ret = gen12_get_render_context_id(stream);
>>>>>>          break;
>>>>>>      default:
>>>>>> @@ -1323,7 +1426,7 @@ static int oa_get_render_ctx_id(struct 
>>>>>> i915_perf_stream *stream)
>>>>>>          stream->specific_ctx_id,
>>>>>>          stream->specific_ctx_id_mask);
>>>>>> -    return 0;
>>>>>> +    return ret;
>>>>>>  }
>>>>>>  /**
>>>>>
>>>>>
>>>



* [Intel-gfx] [PATCH 02/19] drm/i915/perf: Add OA formats for DG2
  2022-08-23  0:03 [Intel-gfx] [PATCH 00/19] Add DG2 OA support Umesh Nerlige Ramappa
@ 2022-08-23  0:03 ` Umesh Nerlige Ramappa
  0 siblings, 0 replies; 84+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-08-23  0:03 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit

Add new OA formats for DG2. Some of the newer OA formats are not
multiples of 64 bytes and are not powers of 2. For those formats, adjust
hw_tail accordingly when checking for new reports.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 63 ++++++++++++++++++++------------
 include/uapi/drm/i915_drm.h      |  6 +++
 2 files changed, 46 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 735244a3aedd..c8331b549d31 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -306,7 +306,8 @@ static u32 i915_oa_max_sample_rate = 100000;
 
 /* XXX: beware if future OA HW adds new report formats that the current
  * code assumes all reports have a power-of-two size and ~(size - 1) can
- * be used as a mask to align the OA tail pointer.
+ * be used as a mask to align the OA tail pointer. In some of the
+ * formats, R is used to denote a reserved field.
  */
 static const struct i915_oa_format oa_formats[I915_OA_FORMAT_MAX] = {
 	[I915_OA_FORMAT_A13]	    = { 0, 64 },
@@ -320,6 +321,10 @@ static const struct i915_oa_format oa_formats[I915_OA_FORMAT_MAX] = {
 	[I915_OA_FORMAT_A12]		    = { 0, 64 },
 	[I915_OA_FORMAT_A12_B8_C8]	    = { 2, 128 },
 	[I915_OA_FORMAT_A32u40_A4u32_B8_C8] = { 5, 256 },
+	[I915_OAR_FORMAT_A32u40_A4u32_B8_C8]    = { 5, 256 },
+	[I915_OA_FORMAT_A24u40_A14u32_B8_C8]    = { 5, 256 },
+	[I915_OAR_FORMAT_A36u64_B8_C8]		= { 1, 384 },
+	[I915_OA_FORMAT_A38u64_R2u64_B8_C8]	= { 1, 448 },
 };
 
 #define SAMPLE_OA_REPORT      (1<<0)
@@ -467,6 +472,7 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 	bool pollin;
 	u32 hw_tail;
 	u64 now;
+	u32 partial_report_size;
 
 	/* We have to consider the (unlikely) possibility that read() errors
 	 * could result in an OA buffer reset which might reset the head and
@@ -476,10 +482,16 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 
 	hw_tail = stream->perf->ops.oa_hw_tail_read(stream);
 
-	/* The tail pointer increases in 64 byte increments,
-	 * not in report_size steps...
+	/* The tail pointer increases in 64 byte increments, whereas report
+	 * sizes need not be integral multiples of 64 or powers of 2.
+	 * Compute potentially partially landed report in the OA buffer
 	 */
-	hw_tail &= ~(report_size - 1);
+	partial_report_size = OA_TAKEN(hw_tail, stream->oa_buffer.tail);
+	partial_report_size %= report_size;
+
+	/* Subtract partial amount off the tail */
+	hw_tail = gtt_offset + ((hw_tail - partial_report_size) &
+				(stream->oa_buffer.vma->size - 1));
 
 	now = ktime_get_mono_fast_ns();
 
@@ -601,6 +613,8 @@ static int append_oa_sample(struct i915_perf_stream *stream,
 {
 	int report_size = stream->oa_buffer.format_size;
 	struct drm_i915_perf_record_header header;
+	int report_size_partial;
+	u8 *oa_buf_end;
 
 	header.type = DRM_I915_PERF_RECORD_SAMPLE;
 	header.pad = 0;
@@ -614,7 +628,19 @@ static int append_oa_sample(struct i915_perf_stream *stream,
 		return -EFAULT;
 	buf += sizeof(header);
 
-	if (copy_to_user(buf, report, report_size))
+	oa_buf_end = stream->oa_buffer.vaddr +
+		     stream->oa_buffer.vma->size;
+	report_size_partial = oa_buf_end - report;
+
+	if (report_size_partial < report_size) {
+		if (copy_to_user(buf, report, report_size_partial))
+			return -EFAULT;
+		buf += report_size_partial;
+
+		if (copy_to_user(buf, stream->oa_buffer.vaddr,
+				 report_size - report_size_partial))
+			return -EFAULT;
+	} else if (copy_to_user(buf, report, report_size))
 		return -EFAULT;
 
 	(*offset) += header.size;
@@ -684,8 +710,8 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 	 * all a power of two).
 	 */
 	if (drm_WARN_ONCE(&uncore->i915->drm,
-			  head > OA_BUFFER_SIZE || head % report_size ||
-			  tail > OA_BUFFER_SIZE || tail % report_size,
+			  head > stream->oa_buffer.vma->size ||
+			  tail > stream->oa_buffer.vma->size,
 			  "Inconsistent OA buffer pointers: head = %u, tail = %u\n",
 			  head, tail))
 		return -EIO;
@@ -699,22 +725,6 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 		u32 ctx_id;
 		u32 reason;
 
-		/*
-		 * All the report sizes factor neatly into the buffer
-		 * size so we never expect to see a report split
-		 * between the beginning and end of the buffer.
-		 *
-		 * Given the initial alignment check a misalignment
-		 * here would imply a driver bug that would result
-		 * in an overrun.
-		 */
-		if (drm_WARN_ON(&uncore->i915->drm,
-				(OA_BUFFER_SIZE - head) < report_size)) {
-			drm_err(&uncore->i915->drm,
-				"Spurious OA head ptr: non-integral report offset\n");
-			break;
-		}
-
 		/*
 		 * The reason field includes flags identifying what
 		 * triggered this specific report (mostly timer
@@ -4513,6 +4523,13 @@ static void oa_init_supported_formats(struct i915_perf *perf)
 		oa_format_add(perf, I915_OA_FORMAT_C4_B8);
 		break;
 
+	case INTEL_DG2:
+		oa_format_add(perf, I915_OAR_FORMAT_A32u40_A4u32_B8_C8);
+		oa_format_add(perf, I915_OA_FORMAT_A24u40_A14u32_B8_C8);
+		oa_format_add(perf, I915_OAR_FORMAT_A36u64_B8_C8);
+		oa_format_add(perf, I915_OA_FORMAT_A38u64_R2u64_B8_C8);
+		break;
+
 	default:
 		MISSING_CASE(platform);
 	}
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 094f6e377793..9168412e0da8 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -2650,6 +2650,12 @@ enum drm_i915_oa_format {
 	I915_OA_FORMAT_A12_B8_C8,
 	I915_OA_FORMAT_A32u40_A4u32_B8_C8,
 
+	/* DG2 */
+	I915_OAR_FORMAT_A32u40_A4u32_B8_C8,
+	I915_OA_FORMAT_A24u40_A14u32_B8_C8,
+	I915_OAR_FORMAT_A36u64_B8_C8,
+	I915_OA_FORMAT_A38u64_R2u64_B8_C8,
+
 	I915_OA_FORMAT_MAX	    /* non-ABI */
 };
 
-- 
2.25.1

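For readers following the tail handling above, a minimal userspace sketch of
the two ideas in this patch may help: rewinding the hardware tail past a
partially landed report when the report size is not a power of two, and
copying out a report that wraps around the end of the ring. It is only an
illustration under stated assumptions. Offsets are relative to the start of
the buffer (the patch adds gtt_offset), the buffer size is a power of two,
memcpy stands in for copy_to_user, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes between head and tail in a ring of buf_size bytes (power of two). */
static uint32_t ring_taken(uint32_t tail, uint32_t head, uint32_t buf_size)
{
	return (tail - head) & (buf_size - 1);
}

/* The HW tail moves in 64 byte steps, so it may point into the middle of a
 * report. Drop the partial amount, measured from the last known aligned
 * tail, and wrap the result back into the ring.
 */
static uint32_t rewind_partial_report(uint32_t hw_tail, uint32_t last_tail,
				      uint32_t report_size, uint32_t buf_size)
{
	uint32_t partial;

	assert(report_size > 0 && buf_size > 0 &&
	       (buf_size & (buf_size - 1)) == 0);

	partial = ring_taken(hw_tail, last_tail, buf_size) % report_size;
	return (hw_tail - partial) & (buf_size - 1);
}

/* Copy one report starting at `offset`, splitting the copy in two when the
 * report runs past the end of the ring.
 */
static void copy_report_wrapped(uint8_t *dst, const uint8_t *ring,
				uint32_t offset, uint32_t report_size,
				uint32_t buf_size)
{
	uint32_t first = buf_size - offset;

	if (first < report_size) {
		memcpy(dst, ring + offset, first);
		memcpy(dst + first, ring, report_size - first);
	} else {
		memcpy(dst, ring + offset, report_size);
	}
}
```

For example, with a 448 byte report and a hardware tail at 512 bytes past the
last aligned tail, 64 bytes belong to a report that has not fully landed, so
the usable tail is rewound by 64. The split copy is needed only because such
report sizes no longer divide the buffer size evenly.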


Thread overview: 84+ messages
2022-08-23 20:41 [Intel-gfx] [PATCH 00/19] Add DG2 OA support Umesh Nerlige Ramappa
2022-08-23 20:41 ` [Intel-gfx] [PATCH 01/19] drm/i915/perf: Fix OA filtering logic for GuC mode Umesh Nerlige Ramappa
2022-09-06 14:33   ` Lionel Landwerlin
2022-09-06 17:39     ` Umesh Nerlige Ramappa
2022-09-06 18:39       ` Lionel Landwerlin
2022-09-14 22:26         ` Umesh Nerlige Ramappa
2022-09-14 23:13           ` Umesh Nerlige Ramappa
2022-09-15 22:49             ` Umesh Nerlige Ramappa
2022-09-20  3:22               ` Dixit, Ashutosh
2022-09-22  3:51                 ` Dixit, Ashutosh
2022-09-22 11:05             ` Lionel Landwerlin
2022-09-09 23:47   ` Dixit, Ashutosh
2022-09-13  3:08     ` Dixit, Ashutosh
2022-09-14 23:37       ` Umesh Nerlige Ramappa
2022-09-14 23:36     ` Umesh Nerlige Ramappa
2022-09-22  3:44     ` Dixit, Ashutosh
2022-09-22  3:49       ` Dixit, Ashutosh
2022-08-23 20:41 ` [Intel-gfx] [PATCH 02/19] drm/i915/perf: Add OA formats for DG2 Umesh Nerlige Ramappa
2022-09-06 19:35   ` Lionel Landwerlin
2022-09-06 19:46     ` Umesh Nerlige Ramappa
2022-09-06 19:59       ` Lionel Landwerlin
2022-09-13 15:40   ` Dixit, Ashutosh
2022-09-14 20:54     ` Umesh Nerlige Ramappa
2022-09-14 21:16       ` Dixit, Ashutosh
2022-08-23 20:41 ` [Intel-gfx] [PATCH 03/19] drm/i915/perf: Fix noa wait predication " Umesh Nerlige Ramappa
2022-09-20  0:35   ` Dixit, Ashutosh
2022-08-23 20:41 ` [Intel-gfx] [PATCH 04/19] drm/i915/perf: Determine gen12 oa ctx offset at runtime Umesh Nerlige Ramappa
2022-09-06 19:48   ` Lionel Landwerlin
2022-09-06 20:35     ` Umesh Nerlige Ramappa
2022-09-08 18:32       ` Lionel Landwerlin
2022-09-08 23:04         ` Umesh Nerlige Ramappa
2022-08-23 20:41 ` [Intel-gfx] [PATCH 05/19] drm/i915/perf: Enable commands per clock reporting in OA Umesh Nerlige Ramappa
2022-09-06 19:51   ` Lionel Landwerlin
2022-09-14  0:19   ` Dixit, Ashutosh
2022-09-15  0:04     ` Umesh Nerlige Ramappa
2022-08-23 20:41 ` [Intel-gfx] [PATCH 06/19] drm/i915/perf: Use helpers to process reports w.r.t. OA buffer size Umesh Nerlige Ramappa
2022-09-14 16:04   ` Dixit, Ashutosh
2022-09-14 18:19     ` Umesh Nerlige Ramappa
2022-09-14 19:07       ` Dixit, Ashutosh
2022-08-23 20:41 ` [Intel-gfx] [PATCH 07/19] drm/i915/perf: Simply use stream->ctx Umesh Nerlige Ramappa
2022-09-06 19:52   ` Lionel Landwerlin
2022-08-23 20:41 ` [Intel-gfx] [PATCH 08/19] drm/i915/perf: Move gt-specific data from i915->perf to gt->perf Umesh Nerlige Ramappa
2022-09-06 19:54   ` Lionel Landwerlin
2022-09-14 18:20   ` Dixit, Ashutosh
2022-08-23 20:41 ` [Intel-gfx] [PATCH 09/19] drm/i915/perf: Replace gt->perf.lock with stream->lock for file ops Umesh Nerlige Ramappa
2022-09-14 19:04   ` Dixit, Ashutosh
2022-08-23 20:41 ` [Intel-gfx] [PATCH 10/19] drm/i915/perf: Use gt-specific ggtt for OA and noa-wait buffers Umesh Nerlige Ramappa
2022-09-06 19:56   ` Lionel Landwerlin
2022-09-06 20:28     ` Umesh Nerlige Ramappa
2022-09-06 20:31       ` Lionel Landwerlin
2022-08-23 20:41 ` [Intel-gfx] [PATCH 11/19] drm/i915/perf: Store a pointer to oa_format in oa_buffer Umesh Nerlige Ramappa
2022-09-06 19:56   ` Lionel Landwerlin
2022-09-14 20:43   ` Dixit, Ashutosh
2022-08-23 20:41 ` [Intel-gfx] [PATCH 12/19] drm/i915/perf: Parse 64bit report header formats correctly Umesh Nerlige Ramappa
2022-09-16  0:47   ` Dixit, Ashutosh
2022-08-23 20:41 ` [Intel-gfx] [PATCH 13/19] drm/i915/perf: Add Wa_16010703925:dg2 Umesh Nerlige Ramappa
2022-09-16  1:08   ` Dixit, Ashutosh
2022-08-23 20:41 ` [Intel-gfx] [PATCH 14/19] drm/i915/perf: Add Wa_1608133521:dg2 Umesh Nerlige Ramappa
2022-08-29 14:04   ` Jani Nikula
2022-09-16  1:21   ` Dixit, Ashutosh
2022-09-16 18:19     ` Umesh Nerlige Ramappa
2022-08-23 20:41 ` [Intel-gfx] [PATCH 15/19] drm/i915/perf: Add Wa_1508761755:dg2 Umesh Nerlige Ramappa
2022-09-16  1:34   ` Dixit, Ashutosh
2022-08-23 20:41 ` [Intel-gfx] [PATCH 16/19] drm/i915/perf: Apply Wa_18013179988 Umesh Nerlige Ramappa
2022-09-16  5:16   ` Dixit, Ashutosh
2022-09-16 15:22     ` Dixit, Ashutosh
2022-09-16 19:04       ` Umesh Nerlige Ramappa
2022-09-16 18:56     ` Umesh Nerlige Ramappa
2022-09-16 19:57       ` Dixit, Ashutosh
2022-09-16 20:25         ` Umesh Nerlige Ramappa
2022-09-16 21:00           ` Dixit, Ashutosh
2022-09-19 21:21             ` Umesh Nerlige Ramappa
2022-09-20  1:24               ` Dixit, Ashutosh
2022-08-23 20:41 ` [Intel-gfx] [PATCH 17/19] drm/i915/perf: Save/restore EU flex counters across reset Umesh Nerlige Ramappa
2022-09-16  5:40   ` Dixit, Ashutosh
2022-08-23 20:41 ` [Intel-gfx] [PATCH 18/19] drm/i915/guc: Support OA when Wa_16011777198 is enabled Umesh Nerlige Ramappa
2022-09-16 21:41   ` Dixit, Ashutosh
2022-09-16 21:48     ` Umesh Nerlige Ramappa
2022-08-23 20:41 ` [Intel-gfx] [PATCH 19/19] drm/i915/perf: Enable OA for DG2 Umesh Nerlige Ramappa
2022-08-23 21:11 ` [Intel-gfx] [PATCH 02/19] drm/i915/perf: Add OA formats " Umesh Nerlige Ramappa
2022-08-23 21:12 ` [Intel-gfx] [PATCH 19/19] drm/i915/perf: Enable OA " Umesh Nerlige Ramappa
2022-08-23 22:07 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Add DG2 OA support (rev2) Patchwork
2022-08-23 22:07 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
  -- strict thread matches above, loose matches on Subject: below --
2022-08-23  0:03 [Intel-gfx] [PATCH 00/19] Add DG2 OA support Umesh Nerlige Ramappa
2022-08-23  0:03 ` [Intel-gfx] [PATCH 02/19] drm/i915/perf: Add OA formats for DG2 Umesh Nerlige Ramappa
