All of lore.kernel.org
 help / color / mirror / Atom feed
* [Intel-gfx] [PATCH v2 00/15] Add DG2 OA support
@ 2022-09-23 20:11 Umesh Nerlige Ramappa
  2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 01/15] drm/i915/perf: Fix OA filtering logic for GuC mode Umesh Nerlige Ramappa
                   ` (15 more replies)
  0 siblings, 16 replies; 31+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-23 20:11 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit, Joonas Lahtinen

Add OA format support for DG2 and various fixes for DG2.

This series has 2 uapi changes listed below:

1) drm/i915/perf: Add OAG and OAR formats for DG2

DG2 has new OA formats defined that can be selected by the
user. The UMD changes that are consumed by GPUvis are:
https://patchwork.freedesktop.org/patch/504456/?series=107633&rev=5

2) drm/i915/perf: Apply Wa_18013179988

DG2 has a bug where the OA timestamp does not tick at the CS timestamp
frequency. Instead it ticks at a multiple that is determined from the
CTC_SHIFT value in RPM_CONFIG. Since the timestamp is used by UMD to
make sense of all the counters in the report, expose the OA timestamp
frequency to the user. The interface is generic and applies to all
platforms. On platforms where the bug is not present, this returns the
CS timestamp frequency. UMD specific changes consumed by GPUvis are:
https://patchwork.freedesktop.org/patch/504464/?series=107633&rev=5

v2:
- Add review comments
- Update uapi changes in cover letter
- Drop patches for non-production platforms
drm/i915/perf: Use helpers to process reports w.r.t. OA buffer size
drm/i915/perf: Add Wa_16010703925:dg2

- Drop 64-bit OA format changes for now
drm/i915/perf: Parse 64bit report header formats correctly
drm/i915/perf: Add Wa_1608133521:dg2

Test-with: 20220823183036.5270-1-umesh.nerlige.ramappa@intel.com
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

Umesh Nerlige Ramappa (14):
  drm/i915/perf: Fix OA filtering logic for GuC mode
  drm/i915/perf: Add OAG and OAR formats for DG2
  drm/i915/perf: Fix noa wait predication for DG2
  drm/i915/perf: Determine gen12 oa ctx offset at runtime
  drm/i915/perf: Enable commands per clock reporting in OA
  drm/i915/perf: Simply use stream->ctx
  drm/i915/perf: Move gt-specific data from i915->perf to gt->perf
  drm/i915/perf: Replace gt->perf.lock with stream->lock for file ops
  drm/i915/perf: Use gt-specific ggtt for OA and noa-wait buffers
  drm/i915/perf: Store a pointer to oa_format in oa_buffer
  drm/i915/perf: Add Wa_1508761755:dg2
  drm/i915/perf: Apply Wa_18013179988
  drm/i915/perf: Save/restore EU flex counters across reset
  drm/i915/perf: Enable OA for DG2

Vinay Belgaumkar (1):
  drm/i915/guc: Support OA when Wa_16011777198 is enabled

 drivers/gpu/drm/i915/gt/intel_engine_regs.h   |   1 +
 drivers/gpu/drm/i915/gt/intel_gpu_commands.h  |   4 +
 drivers/gpu/drm/i915/gt/intel_gt_regs.h       |   1 +
 drivers/gpu/drm/i915/gt/intel_gt_types.h      |   3 +
 drivers/gpu/drm/i915/gt/intel_lrc.h           |   2 +
 drivers/gpu/drm/i915/gt/intel_sseu.c          |   4 +-
 .../drm/i915/gt/uc/abi/guc_actions_slpc_abi.h |   9 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c    |   8 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c   |  66 ++
 drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h   |   2 +
 drivers/gpu/drm/i915/i915_drv.h               |   3 +
 drivers/gpu/drm/i915/i915_getparam.c          |   3 +
 drivers/gpu/drm/i915/i915_pci.c               |   1 +
 drivers/gpu/drm/i915/i915_perf.c              | 564 ++++++++++++++----
 drivers/gpu/drm/i915/i915_perf.h              |   2 +
 drivers/gpu/drm/i915/i915_perf_oa_regs.h      |   6 +-
 drivers/gpu/drm/i915/i915_perf_types.h        |  47 +-
 drivers/gpu/drm/i915/intel_device_info.h      |   1 +
 drivers/gpu/drm/i915/selftests/i915_perf.c    |  16 +-
 include/uapi/drm/i915_drm.h                   |  10 +
 20 files changed, 606 insertions(+), 147 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Intel-gfx] [PATCH v2 01/15] drm/i915/perf: Fix OA filtering logic for GuC mode
  2022-09-23 20:11 [Intel-gfx] [PATCH v2 00/15] Add DG2 OA support Umesh Nerlige Ramappa
@ 2022-09-23 20:11 ` Umesh Nerlige Ramappa
  2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 02/15] drm/i915/perf: Add OAG and OAR formats for DG2 Umesh Nerlige Ramappa
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-23 20:11 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit, Joonas Lahtinen

With GuC mode of submission, GuC is in control of defining the context
id field that is part of the OA reports. To filter reports, UMD and KMD
must know what sw context id was chosen by GuC. There is not interface
between KMD and GuC to determine this, so read the upper-dword of
EXECLIST_STATUS to filter/squash OA reports for the specific context.

v2: Explain guc id stealing w.r.t OA use case

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_lrc.h |   2 +
 drivers/gpu/drm/i915/i915_perf.c    | 144 ++++++++++++++++++++++++----
 2 files changed, 127 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.h b/drivers/gpu/drm/i915/gt/intel_lrc.h
index a390f0813c8b..7111bae759f3 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.h
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.h
@@ -110,6 +110,8 @@ enum {
 #define XEHP_SW_CTX_ID_WIDTH			16
 #define XEHP_SW_COUNTER_SHIFT			58
 #define XEHP_SW_COUNTER_WIDTH			6
+#define GEN12_GUC_SW_CTX_ID_SHIFT		39
+#define GEN12_GUC_SW_CTX_ID_WIDTH		16
 
 static inline void lrc_runtime_start(struct intel_context *ce)
 {
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 0defbb43ceea..315662329be3 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1233,6 +1233,128 @@ static struct intel_context *oa_pin_context(struct i915_perf_stream *stream)
 	return stream->pinned_ctx;
 }
 
+static int
+__store_reg_to_mem(struct i915_request *rq, i915_reg_t reg, u32 ggtt_offset)
+{
+	u32 *cs, cmd;
+
+	cmd = MI_STORE_REGISTER_MEM | MI_SRM_LRM_GLOBAL_GTT;
+	if (GRAPHICS_VER(rq->engine->i915) >= 8)
+		cmd++;
+
+	cs = intel_ring_begin(rq, 4);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	*cs++ = cmd;
+	*cs++ = i915_mmio_reg_offset(reg);
+	*cs++ = ggtt_offset;
+	*cs++ = 0;
+
+	intel_ring_advance(rq, cs);
+
+	return 0;
+}
+
+static int
+__read_reg(struct intel_context *ce, i915_reg_t reg, u32 ggtt_offset)
+{
+	struct i915_request *rq;
+	int err;
+
+	rq = i915_request_create(ce);
+	if (IS_ERR(rq))
+		return PTR_ERR(rq);
+
+	i915_request_get(rq);
+
+	err = __store_reg_to_mem(rq, reg, ggtt_offset);
+
+	i915_request_add(rq);
+	if (!err && i915_request_wait(rq, 0, HZ / 2) < 0)
+		err = -ETIME;
+
+	i915_request_put(rq);
+
+	return err;
+}
+
+static int
+gen12_guc_sw_ctx_id(struct intel_context *ce, u32 *ctx_id)
+{
+	struct i915_vma *scratch;
+	u32 *val;
+	int err;
+
+	scratch = __vm_create_scratch_for_read_pinned(&ce->engine->gt->ggtt->vm, 4);
+	if (IS_ERR(scratch))
+		return PTR_ERR(scratch);
+
+	err = i915_vma_sync(scratch);
+	if (err)
+		goto err_scratch;
+
+	err = __read_reg(ce, RING_EXECLIST_STATUS_HI(ce->engine->mmio_base),
+			 i915_ggtt_offset(scratch));
+	if (err)
+		goto err_scratch;
+
+	val = i915_gem_object_pin_map_unlocked(scratch->obj, I915_MAP_WB);
+	if (IS_ERR(val)) {
+		err = PTR_ERR(val);
+		goto err_scratch;
+	}
+
+	*ctx_id = *val;
+	i915_gem_object_unpin_map(scratch->obj);
+
+err_scratch:
+	i915_vma_unpin_and_release(&scratch, 0);
+	return err;
+}
+
+/*
+ * For execlist mode of submission, pick an unused context id
+ * 0 - (NUM_CONTEXT_TAG -1) are used by other contexts
+ * XXX_MAX_CONTEXT_HW_ID is used by idle context
+ *
+ * For GuC mode of submission read context id from the upper dword of the
+ * EXECLIST_STATUS register. Note that we read this value only once and expect
+ * that the value stays fixed for the entire OA use case. There are cases where
+ * GuC KMD implementation may deregister a context to reuse it's context id, but
+ * we prevent that from happening to the OA context by pinning it.
+ */
+static int gen12_get_render_context_id(struct i915_perf_stream *stream)
+{
+	u32 ctx_id, mask;
+	int ret;
+
+	if (intel_engine_uses_guc(stream->engine)) {
+		ret = gen12_guc_sw_ctx_id(stream->pinned_ctx, &ctx_id);
+		if (ret)
+			return ret;
+
+		mask = ((1U << GEN12_GUC_SW_CTX_ID_WIDTH) - 1) <<
+			(GEN12_GUC_SW_CTX_ID_SHIFT - 32);
+	} else if (GRAPHICS_VER_FULL(stream->engine->i915) >= IP_VER(12, 50)) {
+		ctx_id = (XEHP_MAX_CONTEXT_HW_ID - 1) <<
+			(XEHP_SW_CTX_ID_SHIFT - 32);
+
+		mask = ((1U << XEHP_SW_CTX_ID_WIDTH) - 1) <<
+			(XEHP_SW_CTX_ID_SHIFT - 32);
+	} else {
+		ctx_id = (GEN12_MAX_CONTEXT_HW_ID - 1) <<
+			 (GEN11_SW_CTX_ID_SHIFT - 32);
+
+		mask = ((1U << GEN11_SW_CTX_ID_WIDTH) - 1) <<
+			(GEN11_SW_CTX_ID_SHIFT - 32);
+	}
+	stream->specific_ctx_id = ctx_id & mask;
+	stream->specific_ctx_id_mask = mask;
+
+	return 0;
+}
+
 /**
  * oa_get_render_ctx_id - determine and hold ctx hw id
  * @stream: An i915-perf stream opened for OA metrics
@@ -1246,6 +1368,7 @@ static struct intel_context *oa_pin_context(struct i915_perf_stream *stream)
 static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
 {
 	struct intel_context *ce;
+	int ret = 0;
 
 	ce = oa_pin_context(stream);
 	if (IS_ERR(ce))
@@ -1292,24 +1415,7 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
 
 	case 11:
 	case 12:
-		if (GRAPHICS_VER_FULL(ce->engine->i915) >= IP_VER(12, 50)) {
-			stream->specific_ctx_id_mask =
-				((1U << XEHP_SW_CTX_ID_WIDTH) - 1) <<
-				(XEHP_SW_CTX_ID_SHIFT - 32);
-			stream->specific_ctx_id =
-				(XEHP_MAX_CONTEXT_HW_ID - 1) <<
-				(XEHP_SW_CTX_ID_SHIFT - 32);
-		} else {
-			stream->specific_ctx_id_mask =
-				((1U << GEN11_SW_CTX_ID_WIDTH) - 1) << (GEN11_SW_CTX_ID_SHIFT - 32);
-			/*
-			 * Pick an unused context id
-			 * 0 - BITS_PER_LONG are used by other contexts
-			 * GEN12_MAX_CONTEXT_HW_ID (0x7ff) is used by idle context
-			 */
-			stream->specific_ctx_id =
-				(GEN12_MAX_CONTEXT_HW_ID - 1) << (GEN11_SW_CTX_ID_SHIFT - 32);
-		}
+		ret = gen12_get_render_context_id(stream);
 		break;
 
 	default:
@@ -1323,7 +1429,7 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
 		stream->specific_ctx_id,
 		stream->specific_ctx_id_mask);
 
-	return 0;
+	return ret;
 }
 
 /**
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Intel-gfx] [PATCH v2 02/15] drm/i915/perf: Add OAG and OAR formats for DG2
  2022-09-23 20:11 [Intel-gfx] [PATCH v2 00/15] Add DG2 OA support Umesh Nerlige Ramappa
  2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 01/15] drm/i915/perf: Fix OA filtering logic for GuC mode Umesh Nerlige Ramappa
@ 2022-09-23 20:11 ` Umesh Nerlige Ramappa
  2022-09-24  4:08   ` Dixit, Ashutosh
  2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 03/15] drm/i915/perf: Fix noa wait predication " Umesh Nerlige Ramappa
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 31+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-23 20:11 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit, Joonas Lahtinen

Add new OA formats for DG2. Some of the newer OA formats are not
multples of 64 bytes and are not powers of 2. For those formats, adjust
hw_tail accordingly when checking for new reports.

v2:
- Update commit title (Ashutosh)
- Coding style fixes (Lionel)
- 64 bit OA formats need UMD changes in GPUvis, drop for now and send in a
  separate series with UMD changes

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> #1
Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com> #1
---
 drivers/gpu/drm/i915/i915_perf.c | 7 +++++++
 include/uapi/drm/i915_drm.h      | 4 ++++
 2 files changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 315662329be3..41e9f620ee31 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -320,6 +320,8 @@ static const struct i915_oa_format oa_formats[I915_OA_FORMAT_MAX] = {
 	[I915_OA_FORMAT_A12]		    = { 0, 64 },
 	[I915_OA_FORMAT_A12_B8_C8]	    = { 2, 128 },
 	[I915_OA_FORMAT_A32u40_A4u32_B8_C8] = { 5, 256 },
+	[I915_OAR_FORMAT_A32u40_A4u32_B8_C8]    = { 5, 256 },
+	[I915_OA_FORMAT_A24u40_A14u32_B8_C8]    = { 5, 256 },
 };
 
 #define SAMPLE_OA_REPORT      (1<<0)
@@ -4517,6 +4519,11 @@ static void oa_init_supported_formats(struct i915_perf *perf)
 		oa_format_add(perf, I915_OA_FORMAT_C4_B8);
 		break;
 
+	case INTEL_DG2:
+		oa_format_add(perf, I915_OAR_FORMAT_A32u40_A4u32_B8_C8);
+		oa_format_add(perf, I915_OA_FORMAT_A24u40_A14u32_B8_C8);
+		break;
+
 	default:
 		MISSING_CASE(platform);
 	}
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 520ad2691a99..8b59590e06d4 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -2650,6 +2650,10 @@ enum drm_i915_oa_format {
 	I915_OA_FORMAT_A12_B8_C8,
 	I915_OA_FORMAT_A32u40_A4u32_B8_C8,
 
+	/* DG2 */
+	I915_OAR_FORMAT_A32u40_A4u32_B8_C8,
+	I915_OA_FORMAT_A24u40_A14u32_B8_C8,
+
 	I915_OA_FORMAT_MAX	    /* non-ABI */
 };
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Intel-gfx] [PATCH v2 03/15] drm/i915/perf: Fix noa wait predication for DG2
  2022-09-23 20:11 [Intel-gfx] [PATCH v2 00/15] Add DG2 OA support Umesh Nerlige Ramappa
  2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 01/15] drm/i915/perf: Fix OA filtering logic for GuC mode Umesh Nerlige Ramappa
  2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 02/15] drm/i915/perf: Add OAG and OAR formats for DG2 Umesh Nerlige Ramappa
@ 2022-09-23 20:11 ` Umesh Nerlige Ramappa
  2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 04/15] drm/i915/perf: Determine gen12 oa ctx offset at runtime Umesh Nerlige Ramappa
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-23 20:11 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit, Joonas Lahtinen

Predication for batch buffer commands changed in XEHPSDV.
MI_BATCH_BUFFER_START predicates based on MI_SET_PREDICATE_RESULT
register. The MI_SET_PREDICATE_RESULT register can only be modified
with MI_SET_PREDICATE command. When configured, the MI_SET_PREDICATE
command sets MI_SET_PREDICATE_RESULT based on bit 0 of
MI_PREDICATE_RESULT_2. Use this to configure predication in noa_wait.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine_regs.h |  1 +
 drivers/gpu/drm/i915/i915_perf.c            | 24 +++++++++++++++++----
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_regs.h b/drivers/gpu/drm/i915/gt/intel_engine_regs.h
index fe1a0d5fd4b1..ee3efd06ee54 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_regs.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_regs.h
@@ -201,6 +201,7 @@
 #define RING_CONTEXT_STATUS_PTR(base)		_MMIO((base) + 0x3a0)
 #define RING_CTX_TIMESTAMP(base)		_MMIO((base) + 0x3a8) /* gen8+ */
 #define RING_PREDICATE_RESULT(base)		_MMIO((base) + 0x3b8)
+#define MI_PREDICATE_RESULT_2_ENGINE(base)	_MMIO((base) + 0x3bc)
 #define RING_FORCE_TO_NONPRIV(base, i)		_MMIO(((base) + 0x4D0) + (i) * 4)
 #define   RING_FORCE_TO_NONPRIV_DENY		REG_BIT(30)
 #define   RING_FORCE_TO_NONPRIV_ADDRESS_MASK	REG_GENMASK(25, 2)
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 41e9f620ee31..cd57b5836386 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -286,6 +286,7 @@ static u32 i915_perf_stream_paranoid = true;
 #define OAREPORT_REASON_CTX_SWITCH     (1<<3)
 #define OAREPORT_REASON_CLK_RATIO      (1<<5)
 
+#define HAS_MI_SET_PREDICATE(i915) (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50))
 
 /* For sysctl proc_dointvec_minmax of i915_oa_max_sample_rate
  *
@@ -1762,6 +1763,9 @@ static int alloc_noa_wait(struct i915_perf_stream *stream)
 		DELTA_TARGET,
 		N_CS_GPR
 	};
+	i915_reg_t mi_predicate_result = HAS_MI_SET_PREDICATE(i915) ?
+					  MI_PREDICATE_RESULT_2_ENGINE(base) :
+					  MI_PREDICATE_RESULT_1(RENDER_RING_BASE);
 
 	bo = i915_gem_object_create_internal(i915, 4096);
 	if (IS_ERR(bo)) {
@@ -1799,7 +1803,7 @@ static int alloc_noa_wait(struct i915_perf_stream *stream)
 			stream, cs, true /* save */, CS_GPR(i),
 			INTEL_GT_SCRATCH_FIELD_PERF_CS_GPR + 8 * i, 2);
 	cs = save_restore_register(
-		stream, cs, true /* save */, MI_PREDICATE_RESULT_1(RENDER_RING_BASE),
+		stream, cs, true /* save */, mi_predicate_result,
 		INTEL_GT_SCRATCH_FIELD_PERF_PREDICATE_RESULT_1, 1);
 
 	/* First timestamp snapshot location. */
@@ -1853,7 +1857,10 @@ static int alloc_noa_wait(struct i915_perf_stream *stream)
 	 */
 	*cs++ = MI_LOAD_REGISTER_REG | (3 - 2);
 	*cs++ = i915_mmio_reg_offset(CS_GPR(JUMP_PREDICATE));
-	*cs++ = i915_mmio_reg_offset(MI_PREDICATE_RESULT_1(RENDER_RING_BASE));
+	*cs++ = i915_mmio_reg_offset(mi_predicate_result);
+
+	if (HAS_MI_SET_PREDICATE(i915))
+		*cs++ = MI_SET_PREDICATE | 1;
 
 	/* Restart from the beginning if we had timestamps roll over. */
 	*cs++ = (GRAPHICS_VER(i915) < 8 ?
@@ -1863,6 +1870,9 @@ static int alloc_noa_wait(struct i915_perf_stream *stream)
 	*cs++ = i915_ggtt_offset(vma) + (ts0 - batch) * 4;
 	*cs++ = 0;
 
+	if (HAS_MI_SET_PREDICATE(i915))
+		*cs++ = MI_SET_PREDICATE;
+
 	/*
 	 * Now add the diff between to previous timestamps and add it to :
 	 *      (((1 * << 64) - 1) - delay_ns)
@@ -1890,7 +1900,10 @@ static int alloc_noa_wait(struct i915_perf_stream *stream)
 	 */
 	*cs++ = MI_LOAD_REGISTER_REG | (3 - 2);
 	*cs++ = i915_mmio_reg_offset(CS_GPR(JUMP_PREDICATE));
-	*cs++ = i915_mmio_reg_offset(MI_PREDICATE_RESULT_1(RENDER_RING_BASE));
+	*cs++ = i915_mmio_reg_offset(mi_predicate_result);
+
+	if (HAS_MI_SET_PREDICATE(i915))
+		*cs++ = MI_SET_PREDICATE | 1;
 
 	/* Predicate the jump.  */
 	*cs++ = (GRAPHICS_VER(i915) < 8 ?
@@ -1900,13 +1913,16 @@ static int alloc_noa_wait(struct i915_perf_stream *stream)
 	*cs++ = i915_ggtt_offset(vma) + (jump - batch) * 4;
 	*cs++ = 0;
 
+	if (HAS_MI_SET_PREDICATE(i915))
+		*cs++ = MI_SET_PREDICATE;
+
 	/* Restore registers. */
 	for (i = 0; i < N_CS_GPR; i++)
 		cs = save_restore_register(
 			stream, cs, false /* restore */, CS_GPR(i),
 			INTEL_GT_SCRATCH_FIELD_PERF_CS_GPR + 8 * i, 2);
 	cs = save_restore_register(
-		stream, cs, false /* restore */, MI_PREDICATE_RESULT_1(RENDER_RING_BASE),
+		stream, cs, false /* restore */, mi_predicate_result,
 		INTEL_GT_SCRATCH_FIELD_PERF_PREDICATE_RESULT_1, 1);
 
 	/* And return to the ring. */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Intel-gfx] [PATCH v2 04/15] drm/i915/perf: Determine gen12 oa ctx offset at runtime
  2022-09-23 20:11 [Intel-gfx] [PATCH v2 00/15] Add DG2 OA support Umesh Nerlige Ramappa
                   ` (2 preceding siblings ...)
  2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 03/15] drm/i915/perf: Fix noa wait predication " Umesh Nerlige Ramappa
@ 2022-09-23 20:11 ` Umesh Nerlige Ramappa
  2022-09-27 23:24   ` Dixit, Ashutosh
  2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 05/15] drm/i915/perf: Enable commands per clock reporting in OA Umesh Nerlige Ramappa
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 31+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-23 20:11 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit, Joonas Lahtinen

Some SKUs of same gen12 platform may have different oactxctrl
offsets. For gen12, determine oactxctrl offsets at runtime.

v2: (Lionel)
- Move MI definitions to intel_gpu_commands.h
- Ensure __find_reg_in_lri does read past context image size

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gpu_commands.h |   4 +
 drivers/gpu/drm/i915/i915_perf.c             | 146 +++++++++++++++----
 drivers/gpu/drm/i915/i915_perf_oa_regs.h     |   2 +-
 3 files changed, 121 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
index d4e9702d3c8e..f50ea92910d9 100644
--- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
+++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
@@ -187,6 +187,10 @@
 #define   MI_BATCH_RESOURCE_STREAMER REG_BIT(10)
 #define   MI_BATCH_PREDICATE         REG_BIT(15) /* HSW+ on RCS only*/
 
+#define MI_OPCODE(x)		(((x) >> 23) & 0x3f)
+#define IS_MI_LRI_CMD(x)	(MI_OPCODE(x) == MI_OPCODE(MI_INSTR(0x22, 0)))
+#define MI_LRI_LEN(x)		(((x) & 0xff) + 1)
+
 /*
  * 3D instructions used by the kernel
  */
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index cd57b5836386..fb705f24705b 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1358,6 +1358,64 @@ static int gen12_get_render_context_id(struct i915_perf_stream *stream)
 	return 0;
 }
 
+#define __valid_oactxctrl_offset(x) ((x) && (x) != U32_MAX)
+static bool __find_reg_in_lri(u32 *state, u32 reg, u32 *offset, u32 end)
+{
+	u32 idx = *offset;
+	u32 len = min(MI_LRI_LEN(state[idx]) + idx, end);
+
+	idx++;
+	for (; idx < len; idx += 2)
+		if (state[idx] == reg)
+			break;
+
+	*offset = idx;
+	return state[idx] == reg;
+}
+
+static u32 __context_image_offset(struct intel_context *ce, u32 reg)
+{
+	u32 offset, len = (ce->engine->context_size - PAGE_SIZE) / 4;
+	u32 *state = ce->lrc_reg_state;
+
+	for (offset = 0; offset < len; ) {
+		if (IS_MI_LRI_CMD(state[offset])) {
+			if (__find_reg_in_lri(state, reg, &offset, len))
+				break;
+		} else {
+			offset++;
+		}
+	}
+
+	return offset < len ? offset : U32_MAX;
+}
+
+static int __set_oa_ctx_ctrl_offset(struct intel_context *ce)
+{
+	i915_reg_t reg = GEN12_OACTXCONTROL(ce->engine->mmio_base);
+	struct i915_perf *perf = &ce->engine->i915->perf;
+	u32 saved_offset = perf->ctx_oactxctrl_offset;
+	u32 offset;
+
+	/* Do this only once. Failure is stored as offset of U32_MAX */
+	if (saved_offset)
+		return 0;
+
+	offset = __context_image_offset(ce, i915_mmio_reg_offset(reg));
+	perf->ctx_oactxctrl_offset = offset;
+
+	drm_dbg(&ce->engine->i915->drm,
+		"%s oa ctx control at 0x%08x dword offset\n",
+		ce->engine->name, offset);
+
+	return __valid_oactxctrl_offset(offset) ? 0 : -ENODEV;
+}
+
+static bool engine_supports_mi_query(struct intel_engine_cs *engine)
+{
+	return engine->class == RENDER_CLASS;
+}
+
 /**
  * oa_get_render_ctx_id - determine and hold ctx hw id
  * @stream: An i915-perf stream opened for OA metrics
@@ -1377,6 +1435,17 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
 	if (IS_ERR(ce))
 		return PTR_ERR(ce);
 
+	if (engine_supports_mi_query(stream->engine)) {
+		ret = __set_oa_ctx_ctrl_offset(ce);
+		if (ret && !(stream->sample_flags & SAMPLE_OA_REPORT)) {
+			intel_context_unpin(ce);
+			drm_err(&stream->perf->i915->drm,
+				"Enabling perf query failed for %s\n",
+				stream->engine->name);
+			return ret;
+		}
+	}
+
 	switch (GRAPHICS_VER(ce->engine->i915)) {
 	case 7: {
 		/*
@@ -2408,10 +2477,11 @@ static int gen12_configure_oar_context(struct i915_perf_stream *stream,
 	int err;
 	struct intel_context *ce = stream->pinned_ctx;
 	u32 format = stream->oa_buffer.format;
+	u32 offset = stream->perf->ctx_oactxctrl_offset;
 	struct flex regs_context[] = {
 		{
 			GEN8_OACTXCONTROL,
-			stream->perf->ctx_oactxctrl_offset + 1,
+			offset + 1,
 			active ? GEN8_OA_COUNTER_RESUME : 0,
 		},
 	};
@@ -2436,15 +2506,18 @@ static int gen12_configure_oar_context(struct i915_perf_stream *stream,
 		},
 	};
 
-	/* Modify the context image of pinned context with regs_context*/
-	err = intel_context_lock_pinned(ce);
-	if (err)
-		return err;
+	/* Modify the context image of pinned context with regs_context */
+	if (__valid_oactxctrl_offset(offset)) {
+		err = intel_context_lock_pinned(ce);
+		if (err)
+			return err;
 
-	err = gen8_modify_context(ce, regs_context, ARRAY_SIZE(regs_context));
-	intel_context_unlock_pinned(ce);
-	if (err)
-		return err;
+		err = gen8_modify_context(ce, regs_context,
+					  ARRAY_SIZE(regs_context));
+		intel_context_unlock_pinned(ce);
+		if (err)
+			return err;
+	}
 
 	/* Apply regs_lri using LRI with pinned context */
 	return gen8_modify_self(ce, regs_lri, ARRAY_SIZE(regs_lri), active);
@@ -2566,6 +2639,7 @@ lrc_configure_all_contexts(struct i915_perf_stream *stream,
 			   const struct i915_oa_config *oa_config,
 			   struct i915_active *active)
 {
+	u32 ctx_oactxctrl = stream->perf->ctx_oactxctrl_offset;
 	/* The MMIO offsets for Flex EU registers aren't contiguous */
 	const u32 ctx_flexeu0 = stream->perf->ctx_flexeu0_offset;
 #define ctx_flexeuN(N) (ctx_flexeu0 + 2 * (N) + 1)
@@ -2576,7 +2650,7 @@ lrc_configure_all_contexts(struct i915_perf_stream *stream,
 		},
 		{
 			GEN8_OACTXCONTROL,
-			stream->perf->ctx_oactxctrl_offset + 1,
+			ctx_oactxctrl + 1,
 		},
 		{ EU_PERF_CNTL0, ctx_flexeuN(0) },
 		{ EU_PERF_CNTL1, ctx_flexeuN(1) },
@@ -4545,6 +4619,37 @@ static void oa_init_supported_formats(struct i915_perf *perf)
 	}
 }
 
+static void i915_perf_init_info(struct drm_i915_private *i915)
+{
+	struct i915_perf *perf = &i915->perf;
+
+	switch (GRAPHICS_VER(i915)) {
+	case 8:
+		perf->ctx_oactxctrl_offset = 0x120;
+		perf->ctx_flexeu0_offset = 0x2ce;
+		perf->gen8_valid_ctx_bit = BIT(25);
+		break;
+	case 9:
+		perf->ctx_oactxctrl_offset = 0x128;
+		perf->ctx_flexeu0_offset = 0x3de;
+		perf->gen8_valid_ctx_bit = BIT(16);
+		break;
+	case 11:
+		perf->ctx_oactxctrl_offset = 0x124;
+		perf->ctx_flexeu0_offset = 0x78e;
+		perf->gen8_valid_ctx_bit = BIT(16);
+		break;
+	case 12:
+		/*
+		 * Calculate offset at runtime in oa_pin_context for gen12 and
+		 * cache the value in perf->ctx_oactxctrl_offset.
+		 */
+		break;
+	default:
+		MISSING_CASE(GRAPHICS_VER(i915));
+	}
+}
+
 /**
  * i915_perf_init - initialize i915-perf state on module bind
  * @i915: i915 device instance
@@ -4583,6 +4688,7 @@ void i915_perf_init(struct drm_i915_private *i915)
 		 * execlist mode by default.
 		 */
 		perf->ops.read = gen8_oa_read;
+		i915_perf_init_info(i915);
 
 		if (IS_GRAPHICS_VER(i915, 8, 9)) {
 			perf->ops.is_valid_b_counter_reg =
@@ -4602,18 +4708,6 @@ void i915_perf_init(struct drm_i915_private *i915)
 			perf->ops.enable_metric_set = gen8_enable_metric_set;
 			perf->ops.disable_metric_set = gen8_disable_metric_set;
 			perf->ops.oa_hw_tail_read = gen8_oa_hw_tail_read;
-
-			if (GRAPHICS_VER(i915) == 8) {
-				perf->ctx_oactxctrl_offset = 0x120;
-				perf->ctx_flexeu0_offset = 0x2ce;
-
-				perf->gen8_valid_ctx_bit = BIT(25);
-			} else {
-				perf->ctx_oactxctrl_offset = 0x128;
-				perf->ctx_flexeu0_offset = 0x3de;
-
-				perf->gen8_valid_ctx_bit = BIT(16);
-			}
 		} else if (GRAPHICS_VER(i915) == 11) {
 			perf->ops.is_valid_b_counter_reg =
 				gen7_is_valid_b_counter_addr;
@@ -4627,11 +4721,6 @@ void i915_perf_init(struct drm_i915_private *i915)
 			perf->ops.enable_metric_set = gen8_enable_metric_set;
 			perf->ops.disable_metric_set = gen11_disable_metric_set;
 			perf->ops.oa_hw_tail_read = gen8_oa_hw_tail_read;
-
-			perf->ctx_oactxctrl_offset = 0x124;
-			perf->ctx_flexeu0_offset = 0x78e;
-
-			perf->gen8_valid_ctx_bit = BIT(16);
 		} else if (GRAPHICS_VER(i915) == 12) {
 			perf->ops.is_valid_b_counter_reg =
 				gen12_is_valid_b_counter_addr;
@@ -4645,9 +4734,6 @@ void i915_perf_init(struct drm_i915_private *i915)
 			perf->ops.enable_metric_set = gen12_enable_metric_set;
 			perf->ops.disable_metric_set = gen12_disable_metric_set;
 			perf->ops.oa_hw_tail_read = gen12_oa_hw_tail_read;
-
-			perf->ctx_flexeu0_offset = 0;
-			perf->ctx_oactxctrl_offset = 0x144;
 		}
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_perf_oa_regs.h b/drivers/gpu/drm/i915/i915_perf_oa_regs.h
index f31c9f13a9fc..0ef3562ff4aa 100644
--- a/drivers/gpu/drm/i915/i915_perf_oa_regs.h
+++ b/drivers/gpu/drm/i915/i915_perf_oa_regs.h
@@ -97,7 +97,7 @@
 #define  GEN12_OAR_OACONTROL_COUNTER_FORMAT_SHIFT 1
 #define  GEN12_OAR_OACONTROL_COUNTER_ENABLE       (1 << 0)
 
-#define GEN12_OACTXCONTROL _MMIO(0x2360)
+#define GEN12_OACTXCONTROL(base) _MMIO((base) + 0x360)
 #define GEN12_OAR_OASTATUS _MMIO(0x2968)
 
 /* Gen12 OAG unit */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Intel-gfx] [PATCH v2 05/15] drm/i915/perf: Enable commands per clock reporting in OA
  2022-09-23 20:11 [Intel-gfx] [PATCH v2 00/15] Add DG2 OA support Umesh Nerlige Ramappa
                   ` (3 preceding siblings ...)
  2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 04/15] drm/i915/perf: Determine gen12 oa ctx offset at runtime Umesh Nerlige Ramappa
@ 2022-09-23 20:11 ` Umesh Nerlige Ramappa
  2022-09-26 15:55   ` Dixit, Ashutosh
  2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 06/15] drm/i915/perf: Simply use stream->ctx Umesh Nerlige Ramappa
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 31+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-23 20:11 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit, Joonas Lahtinen

XEHPSDV and DG2 provide a way to configure bytes per clock vs commands
per clock reporting. Enable bytes per clock setting on enabling OA.

v2:
- Fix commit msg (Ashutosh)
- Fix checkpatch issues

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h          |  3 +++
 drivers/gpu/drm/i915/i915_pci.c          |  1 +
 drivers/gpu/drm/i915/i915_perf.c         | 20 ++++++++++++++++++++
 drivers/gpu/drm/i915/i915_perf_oa_regs.h |  4 ++++
 drivers/gpu/drm/i915/intel_device_info.h |  1 +
 5 files changed, 29 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 9f9372931fd2..ccd54ff54002 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -902,6 +902,9 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
 #define HAS_RUNTIME_PM(dev_priv) (INTEL_INFO(dev_priv)->has_runtime_pm)
 #define HAS_64BIT_RELOC(dev_priv) (INTEL_INFO(dev_priv)->has_64bit_reloc)
 
+#define HAS_OA_BPC_REPORTING(dev_priv) \
+	(INTEL_INFO(dev_priv)->has_oa_bpc_reporting)
+
 /*
  * Set this flag, when platform requires 64K GTT page sizes or larger for
  * device local memory access.
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 77e7df21f539..6b25e4cb6221 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -1022,6 +1022,7 @@ static const struct intel_device_info adl_p_info = {
 	.has_logical_ring_contexts = 1, \
 	.has_logical_ring_elsq = 1, \
 	.has_mslice_steering = 1, \
+	.has_oa_bpc_reporting = 1, \
 	.has_rc6 = 1, \
 	.has_reset_engine = 1, \
 	.has_rps = 1, \
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index fb705f24705b..657dff8bf395 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -2738,10 +2738,12 @@ static int
 gen12_enable_metric_set(struct i915_perf_stream *stream,
 			struct i915_active *active)
 {
+	struct drm_i915_private *i915 = stream->perf->i915;
 	struct intel_uncore *uncore = stream->uncore;
 	struct i915_oa_config *oa_config = stream->oa_config;
 	bool periodic = stream->periodic;
 	u32 period_exponent = stream->period_exponent;
+	u32 sqcnt1;
 	int ret;
 
 	intel_uncore_write(uncore, GEN12_OAG_OA_DEBUG,
@@ -2760,6 +2762,16 @@ gen12_enable_metric_set(struct i915_perf_stream *stream,
 			    (period_exponent << GEN12_OAG_OAGLBCTXCTRL_TIMER_PERIOD_SHIFT))
 			    : 0);
 
+	/*
+	 * Initialize Super Queue Internal Cnt Register
+	 * Set PMON Enable in order to collect valid metrics.
+	 * Enable commands per clock reporting in OA for XEHPSDV onward.
+	 */
+	sqcnt1 = GEN12_SQCNT1_PMON_ENABLE |
+		 (HAS_OA_BPC_REPORTING(i915) ? GEN12_SQCNT1_OABPC : 0);
+
+	intel_uncore_rmw(uncore, GEN12_SQCNT1, 0, sqcnt1);
+
 	/*
 	 * Update all contexts prior writing the mux configurations as we need
 	 * to make sure all slices/subslices are ON before writing to NOA
@@ -2809,6 +2821,8 @@ static void gen11_disable_metric_set(struct i915_perf_stream *stream)
 static void gen12_disable_metric_set(struct i915_perf_stream *stream)
 {
 	struct intel_uncore *uncore = stream->uncore;
+	struct drm_i915_private *i915 = stream->perf->i915;
+	u32 sqcnt1;
 
 	/* Reset all contexts' slices/subslices configurations. */
 	gen12_configure_all_contexts(stream, NULL, NULL);
@@ -2819,6 +2833,12 @@ static void gen12_disable_metric_set(struct i915_perf_stream *stream)
 
 	/* Make sure we disable noa to save power. */
 	intel_uncore_rmw(uncore, RPM_CONFIG1, GEN10_GT_NOA_ENABLE, 0);
+
+	sqcnt1 = GEN12_SQCNT1_PMON_ENABLE |
+		 (HAS_OA_BPC_REPORTING(i915) ? GEN12_SQCNT1_OABPC : 0);
+
+	/* Reset PMON Enable to save power. */
+	intel_uncore_rmw(uncore, GEN12_SQCNT1, sqcnt1, 0);
 }
 
 static void gen7_oa_enable(struct i915_perf_stream *stream)
diff --git a/drivers/gpu/drm/i915/i915_perf_oa_regs.h b/drivers/gpu/drm/i915/i915_perf_oa_regs.h
index 0ef3562ff4aa..381d94101610 100644
--- a/drivers/gpu/drm/i915/i915_perf_oa_regs.h
+++ b/drivers/gpu/drm/i915/i915_perf_oa_regs.h
@@ -134,4 +134,8 @@
 #define GDT_CHICKEN_BITS    _MMIO(0x9840)
 #define   GT_NOA_ENABLE	    0x00000080
 
+#define GEN12_SQCNT1				_MMIO(0x8718)
+#define   GEN12_SQCNT1_PMON_ENABLE		REG_BIT(30)
+#define   GEN12_SQCNT1_OABPC			REG_BIT(29)
+
 #endif /* __INTEL_PERF_OA_REGS__ */
diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
index 09b18910d3ab..1f7a842cd408 100644
--- a/drivers/gpu/drm/i915/intel_device_info.h
+++ b/drivers/gpu/drm/i915/intel_device_info.h
@@ -164,6 +164,7 @@ enum intel_ppgtt_type {
 	func(has_logical_ring_elsq); \
 	func(has_media_ratio_mode); \
 	func(has_mslice_steering); \
+	func(has_oa_bpc_reporting); \
 	func(has_one_eu_per_fuse_bit); \
 	func(has_pxp); \
 	func(has_rc6); \
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Intel-gfx] [PATCH v2 06/15] drm/i915/perf: Simply use stream->ctx
  2022-09-23 20:11 [Intel-gfx] [PATCH v2 00/15] Add DG2 OA support Umesh Nerlige Ramappa
                   ` (4 preceding siblings ...)
  2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 05/15] drm/i915/perf: Enable commands per clock reporting in OA Umesh Nerlige Ramappa
@ 2022-09-23 20:11 ` Umesh Nerlige Ramappa
  2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 07/15] drm/i915/perf: Move gt-specific data from i915->perf to gt->perf Umesh Nerlige Ramappa
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-23 20:11 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit, Joonas Lahtinen

Earlier code used exclusive_stream to check for user passed context.
Simplify this by accessing stream->ctx.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 657dff8bf395..2ec2da42a539 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -777,7 +777,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 		 * switches since it's not-uncommon for periodic samples to
 		 * identify a switch before any 'context switch' report.
 		 */
-		if (!stream->perf->exclusive_stream->ctx ||
+		if (!stream->ctx ||
 		    stream->specific_ctx_id == ctx_id ||
 		    stream->oa_buffer.last_ctx_id == stream->specific_ctx_id ||
 		    reason & OAREPORT_REASON_CTX_SWITCH) {
@@ -786,7 +786,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 			 * While filtering for a single context we avoid
 			 * leaking the IDs of other contexts.
 			 */
-			if (stream->perf->exclusive_stream->ctx &&
+			if (stream->ctx &&
 			    stream->specific_ctx_id != ctx_id) {
 				report32[2] = INVALID_CTX_ID;
 			}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Intel-gfx] [PATCH v2 07/15] drm/i915/perf: Move gt-specific data from i915->perf to gt->perf
  2022-09-23 20:11 [Intel-gfx] [PATCH v2 00/15] Add DG2 OA support Umesh Nerlige Ramappa
                   ` (5 preceding siblings ...)
  2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 06/15] drm/i915/perf: Simply use stream->ctx Umesh Nerlige Ramappa
@ 2022-09-23 20:11 ` Umesh Nerlige Ramappa
  2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 08/15] drm/i915/perf: Replace gt->perf.lock with stream->lock for file ops Umesh Nerlige Ramappa
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-23 20:11 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit, Joonas Lahtinen

Make perf part of gt as the OAG buffer is specific to a gt. The refactor
eventually simplifies programming the right OA buffer and the right HW
registers when supporting multiple gts.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt_types.h   |  3 +
 drivers/gpu/drm/i915/gt/intel_sseu.c       |  4 +-
 drivers/gpu/drm/i915/i915_perf.c           | 75 +++++++++++++---------
 drivers/gpu/drm/i915/i915_perf_types.h     | 39 +++++------
 drivers/gpu/drm/i915/selftests/i915_perf.c | 16 +++--
 5 files changed, 80 insertions(+), 57 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index f19c2de77ff6..9f653a347cad 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -20,6 +20,7 @@
 #include "intel_gsc.h"
 
 #include "i915_vma.h"
+#include "i915_perf_types.h"
 #include "intel_engine_types.h"
 #include "intel_gt_buffer_pool_types.h"
 #include "intel_hwconfig.h"
@@ -286,6 +287,8 @@ struct intel_gt {
 	/* sysfs defaults per gt */
 	struct gt_defaults defaults;
 	struct kobject *sysfs_defaults;
+
+	struct i915_perf_gt perf;
 };
 
 struct intel_gt_definition {
diff --git a/drivers/gpu/drm/i915/gt/intel_sseu.c b/drivers/gpu/drm/i915/gt/intel_sseu.c
index 66f21c735d54..6c6198a257ac 100644
--- a/drivers/gpu/drm/i915/gt/intel_sseu.c
+++ b/drivers/gpu/drm/i915/gt/intel_sseu.c
@@ -677,8 +677,8 @@ u32 intel_sseu_make_rpcs(struct intel_gt *gt,
 	 * If i915/perf is active, we want a stable powergating configuration
 	 * on the system. Use the configuration pinned by i915/perf.
 	 */
-	if (i915->perf.exclusive_stream)
-		req_sseu = &i915->perf.sseu;
+	if (gt->perf.exclusive_stream)
+		req_sseu = &gt->perf.sseu;
 
 	slices = hweight8(req_sseu->slice_mask);
 	subslices = hweight8(req_sseu->subslice_mask);
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 2ec2da42a539..8b238268ca55 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1553,8 +1553,9 @@ free_noa_wait(struct i915_perf_stream *stream)
 static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
 {
 	struct i915_perf *perf = stream->perf;
+	struct intel_gt *gt = stream->engine->gt;
 
-	if (WARN_ON(stream != perf->exclusive_stream))
+	if (WARN_ON(stream != gt->perf.exclusive_stream))
 		return;
 
 	/*
@@ -1563,7 +1564,7 @@ static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
 	 *
 	 * See i915_oa_init_reg_state() and lrc_configure_all_contexts()
 	 */
-	WRITE_ONCE(perf->exclusive_stream, NULL);
+	WRITE_ONCE(gt->perf.exclusive_stream, NULL);
 	perf->ops.disable_metric_set(stream);
 
 	free_oa_buffer(stream);
@@ -2556,10 +2557,11 @@ oa_configure_all_contexts(struct i915_perf_stream *stream,
 {
 	struct drm_i915_private *i915 = stream->perf->i915;
 	struct intel_engine_cs *engine;
+	struct intel_gt *gt = stream->engine->gt;
 	struct i915_gem_context *ctx, *cn;
 	int err;
 
-	lockdep_assert_held(&stream->perf->lock);
+	lockdep_assert_held(&gt->perf.lock);
 
 	/*
 	 * The OA register config is setup through the context image. This image
@@ -3080,6 +3082,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 {
 	struct drm_i915_private *i915 = stream->perf->i915;
 	struct i915_perf *perf = stream->perf;
+	struct intel_gt *gt;
 	int format_size;
 	int ret;
 
@@ -3088,6 +3091,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 			"OA engine not specified\n");
 		return -EINVAL;
 	}
+	gt = props->engine->gt;
 
 	/*
 	 * If the sysfs metrics/ directory wasn't registered for some
@@ -3118,7 +3122,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 	 * counter reports and marshal to the appropriate client
 	 * we currently only allow exclusive access
 	 */
-	if (perf->exclusive_stream) {
+	if (gt->perf.exclusive_stream) {
 		drm_dbg(&stream->perf->i915->drm,
 			"OA unit already in use\n");
 		return -EBUSY;
@@ -3198,8 +3202,8 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 
 	stream->ops = &i915_oa_stream_ops;
 
-	perf->sseu = props->sseu;
-	WRITE_ONCE(perf->exclusive_stream, stream);
+	stream->engine->gt->perf.sseu = props->sseu;
+	WRITE_ONCE(gt->perf.exclusive_stream, stream);
 
 	ret = i915_perf_stream_enable_sync(stream);
 	if (ret) {
@@ -3221,7 +3225,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 	return 0;
 
 err_enable:
-	WRITE_ONCE(perf->exclusive_stream, NULL);
+	WRITE_ONCE(gt->perf.exclusive_stream, NULL);
 	perf->ops.disable_metric_set(stream);
 
 	free_oa_buffer(stream);
@@ -3251,7 +3255,7 @@ void i915_oa_init_reg_state(const struct intel_context *ce,
 		return;
 
 	/* perf.exclusive_stream serialised by lrc_configure_all_contexts() */
-	stream = READ_ONCE(engine->i915->perf.exclusive_stream);
+	stream = READ_ONCE(engine->gt->perf.exclusive_stream);
 	if (stream && GRAPHICS_VER(stream->perf->i915) < 12)
 		gen8_update_reg_state_unlocked(ce, stream);
 }
@@ -3280,7 +3284,7 @@ static ssize_t i915_perf_read(struct file *file,
 			      loff_t *ppos)
 {
 	struct i915_perf_stream *stream = file->private_data;
-	struct i915_perf *perf = stream->perf;
+	struct intel_gt *gt = stream->engine->gt;
 	size_t offset = 0;
 	int ret;
 
@@ -3304,14 +3308,14 @@ static ssize_t i915_perf_read(struct file *file,
 			if (ret)
 				return ret;
 
-			mutex_lock(&perf->lock);
+			mutex_lock(&gt->perf.lock);
 			ret = stream->ops->read(stream, buf, count, &offset);
-			mutex_unlock(&perf->lock);
+			mutex_unlock(&gt->perf.lock);
 		} while (!offset && !ret);
 	} else {
-		mutex_lock(&perf->lock);
+		mutex_lock(&gt->perf.lock);
 		ret = stream->ops->read(stream, buf, count, &offset);
-		mutex_unlock(&perf->lock);
+		mutex_unlock(&gt->perf.lock);
 	}
 
 	/* We allow the poll checking to sometimes report false positive EPOLLIN
@@ -3358,7 +3362,7 @@ static enum hrtimer_restart oa_poll_check_timer_cb(struct hrtimer *hrtimer)
  * &i915_perf_stream_ops->poll_wait to call poll_wait() with a wait queue that
  * will be woken for new stream data.
  *
- * Note: The &perf->lock mutex has been taken to serialize
+ * Note: The &gt->perf.lock mutex has been taken to serialize
  * with any non-file-operation driver hooks.
  *
  * Returns: any poll events that are ready without sleeping
@@ -3399,12 +3403,12 @@ static __poll_t i915_perf_poll_locked(struct i915_perf_stream *stream,
 static __poll_t i915_perf_poll(struct file *file, poll_table *wait)
 {
 	struct i915_perf_stream *stream = file->private_data;
-	struct i915_perf *perf = stream->perf;
+	struct intel_gt *gt = stream->engine->gt;
 	__poll_t ret;
 
-	mutex_lock(&perf->lock);
+	mutex_lock(&gt->perf.lock);
 	ret = i915_perf_poll_locked(stream, file, wait);
-	mutex_unlock(&perf->lock);
+	mutex_unlock(&gt->perf.lock);
 
 	return ret;
 }
@@ -3503,7 +3507,7 @@ static long i915_perf_config_locked(struct i915_perf_stream *stream,
  * @cmd: the ioctl request
  * @arg: the ioctl data
  *
- * Note: The &perf->lock mutex has been taken to serialize
+ * Note: The &gt->perf.lock mutex has been taken to serialize
  * with any non-file-operation driver hooks.
  *
  * Returns: zero on success or a negative error code. Returns -EINVAL for
@@ -3543,12 +3547,12 @@ static long i915_perf_ioctl(struct file *file,
 			    unsigned long arg)
 {
 	struct i915_perf_stream *stream = file->private_data;
-	struct i915_perf *perf = stream->perf;
+	struct intel_gt *gt = stream->engine->gt;
 	long ret;
 
-	mutex_lock(&perf->lock);
+	mutex_lock(&gt->perf.lock);
 	ret = i915_perf_ioctl_locked(stream, cmd, arg);
-	mutex_unlock(&perf->lock);
+	mutex_unlock(&gt->perf.lock);
 
 	return ret;
 }
@@ -3560,7 +3564,7 @@ static long i915_perf_ioctl(struct file *file,
  * Frees all resources associated with the given i915 perf @stream, disabling
  * any associated data capture in the process.
  *
- * Note: The &perf->lock mutex has been taken to serialize
+ * Note: The &gt->perf.lock mutex has been taken to serialize
  * with any non-file-operation driver hooks.
  */
 static void i915_perf_destroy_locked(struct i915_perf_stream *stream)
@@ -3592,10 +3596,11 @@ static int i915_perf_release(struct inode *inode, struct file *file)
 {
 	struct i915_perf_stream *stream = file->private_data;
 	struct i915_perf *perf = stream->perf;
+	struct intel_gt *gt = stream->engine->gt;
 
-	mutex_lock(&perf->lock);
+	mutex_lock(&gt->perf.lock);
 	i915_perf_destroy_locked(stream);
-	mutex_unlock(&perf->lock);
+	mutex_unlock(&gt->perf.lock);
 
 	/* Release the reference the perf stream kept on the driver. */
 	drm_dev_put(&perf->i915->drm);
@@ -3628,7 +3633,7 @@ static const struct file_operations fops = {
  * See i915_perf_ioctl_open() for interface details.
  *
  * Implements further stream config validation and stream initialization on
- * behalf of i915_perf_open_ioctl() with the &perf->lock mutex
+ * behalf of i915_perf_open_ioctl() with the &gt->perf.lock mutex
  * taken to serialize with any non-file-operation driver hooks.
  *
  * Note: at this point the @props have only been validated in isolation and
@@ -4012,7 +4017,7 @@ static int read_properties_unlocked(struct i915_perf *perf,
  * mutex to avoid an awkward lockdep with mmap_lock.
  *
  * Most of the implementation details are handled by
- * i915_perf_open_ioctl_locked() after taking the &perf->lock
+ * i915_perf_open_ioctl_locked() after taking the &gt->perf.lock
  * mutex for serializing with any non-file-operation driver hooks.
  *
  * Return: A newly opened i915 Perf stream file descriptor or negative
@@ -4023,6 +4028,7 @@ int i915_perf_open_ioctl(struct drm_device *dev, void *data,
 {
 	struct i915_perf *perf = &to_i915(dev)->perf;
 	struct drm_i915_perf_open_param *param = data;
+	struct intel_gt *gt;
 	struct perf_open_properties props;
 	u32 known_open_flags;
 	int ret;
@@ -4049,9 +4055,11 @@ int i915_perf_open_ioctl(struct drm_device *dev, void *data,
 	if (ret)
 		return ret;
 
-	mutex_lock(&perf->lock);
+	gt = props.engine->gt;
+
+	mutex_lock(&gt->perf.lock);
 	ret = i915_perf_open_ioctl_locked(perf, param, &props, file);
-	mutex_unlock(&perf->lock);
+	mutex_unlock(&gt->perf.lock);
 
 	return ret;
 }
@@ -4067,6 +4075,7 @@ int i915_perf_open_ioctl(struct drm_device *dev, void *data,
 void i915_perf_register(struct drm_i915_private *i915)
 {
 	struct i915_perf *perf = &i915->perf;
+	struct intel_gt *gt = to_gt(i915);
 
 	if (!perf->i915)
 		return;
@@ -4075,13 +4084,13 @@ void i915_perf_register(struct drm_i915_private *i915)
 	 * i915_perf_open_ioctl(); considering that we register after
 	 * being exposed to userspace.
 	 */
-	mutex_lock(&perf->lock);
+	mutex_lock(&gt->perf.lock);
 
 	perf->metrics_kobj =
 		kobject_create_and_add("metrics",
 				       &i915->drm.primary->kdev->kobj);
 
-	mutex_unlock(&perf->lock);
+	mutex_unlock(&gt->perf.lock);
 }
 
 /**
@@ -4758,7 +4767,11 @@ void i915_perf_init(struct drm_i915_private *i915)
 	}
 
 	if (perf->ops.enable_metric_set) {
-		mutex_init(&perf->lock);
+		struct intel_gt *gt;
+		int i;
+
+		for_each_gt(gt, i915, i)
+			mutex_init(&gt->perf.lock);
 
 		/* Choose a representative limit */
 		oa_sample_rate_hard_limit = to_gt(i915)->clock_frequency / 2;
diff --git a/drivers/gpu/drm/i915/i915_perf_types.h b/drivers/gpu/drm/i915/i915_perf_types.h
index 05cb9a335a97..e888bfab478f 100644
--- a/drivers/gpu/drm/i915/i915_perf_types.h
+++ b/drivers/gpu/drm/i915/i915_perf_types.h
@@ -380,6 +380,26 @@ struct i915_oa_ops {
 	u32 (*oa_hw_tail_read)(struct i915_perf_stream *stream);
 };
 
+struct i915_perf_gt {
+	/*
+	 * Lock associated with anything below within this structure.
+	 */
+	struct mutex lock;
+
+	/**
+	 * @sseu: sseu configuration selected to run while perf is active,
+	 * applies to all contexts.
+	 */
+	struct intel_sseu sseu;
+
+	/*
+	 * @exclusive_stream: The stream currently using the OA unit. This is
+	 * sometimes accessed outside a syscall associated to its file
+	 * descriptor.
+	 */
+	struct i915_perf_stream *exclusive_stream;
+};
+
 struct i915_perf {
 	struct drm_i915_private *i915;
 
@@ -397,25 +417,6 @@ struct i915_perf {
 	 */
 	struct idr metrics_idr;
 
-	/*
-	 * Lock associated with anything below within this structure
-	 * except exclusive_stream.
-	 */
-	struct mutex lock;
-
-	/*
-	 * The stream currently using the OA unit. If accessed
-	 * outside a syscall associated to its file
-	 * descriptor.
-	 */
-	struct i915_perf_stream *exclusive_stream;
-
-	/**
-	 * @sseu: sseu configuration selected to run while perf is active,
-	 * applies to all contexts.
-	 */
-	struct intel_sseu sseu;
-
 	/**
 	 * For rate limiting any notifications of spurious
 	 * invalid OA reports
diff --git a/drivers/gpu/drm/i915/selftests/i915_perf.c b/drivers/gpu/drm/i915/selftests/i915_perf.c
index 429c6d73b159..24dde5531423 100644
--- a/drivers/gpu/drm/i915/selftests/i915_perf.c
+++ b/drivers/gpu/drm/i915/selftests/i915_perf.c
@@ -102,6 +102,12 @@ test_stream(struct i915_perf *perf)
 		I915_OA_FORMAT_A32u40_A4u32_B8_C8 : I915_OA_FORMAT_C4_B8,
 	};
 	struct i915_perf_stream *stream;
+	struct intel_gt *gt;
+
+	if (!props.engine)
+		return NULL;
+
+	gt = props.engine->gt;
 
 	if (!oa_config)
 		return NULL;
@@ -116,12 +122,12 @@ test_stream(struct i915_perf *perf)
 
 	stream->perf = perf;
 
-	mutex_lock(&perf->lock);
+	mutex_lock(&gt->perf.lock);
 	if (i915_oa_stream_init(stream, &param, &props)) {
 		kfree(stream);
 		stream =  NULL;
 	}
-	mutex_unlock(&perf->lock);
+	mutex_unlock(&gt->perf.lock);
 
 	i915_oa_config_put(oa_config);
 
@@ -130,11 +136,11 @@ test_stream(struct i915_perf *perf)
 
 static void stream_destroy(struct i915_perf_stream *stream)
 {
-	struct i915_perf *perf = stream->perf;
+	struct intel_gt *gt = stream->engine->gt;
 
-	mutex_lock(&perf->lock);
+	mutex_lock(&gt->perf.lock);
 	i915_perf_destroy_locked(stream);
-	mutex_unlock(&perf->lock);
+	mutex_unlock(&gt->perf.lock);
 }
 
 static int live_sanitycheck(void *arg)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Intel-gfx] [PATCH v2 08/15] drm/i915/perf: Replace gt->perf.lock with stream->lock for file ops
  2022-09-23 20:11 [Intel-gfx] [PATCH v2 00/15] Add DG2 OA support Umesh Nerlige Ramappa
                   ` (6 preceding siblings ...)
  2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 07/15] drm/i915/perf: Move gt-specific data from i915->perf to gt->perf Umesh Nerlige Ramappa
@ 2022-09-23 20:11 ` Umesh Nerlige Ramappa
  2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 09/15] drm/i915/perf: Use gt-specific ggtt for OA and noa-wait buffers Umesh Nerlige Ramappa
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-23 20:11 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit, Joonas Lahtinen

With multi-gt, user can access multiple OA buffers concurrently. Use
stream->lock instead of gt->perf.lock to serialize file operations.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c       | 31 ++++++++++++--------------
 drivers/gpu/drm/i915/i915_perf_types.h |  5 +++++
 2 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 8b238268ca55..42a258578a5f 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -3221,6 +3221,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 	stream->poll_check_timer.function = oa_poll_check_timer_cb;
 	init_waitqueue_head(&stream->poll_wq);
 	spin_lock_init(&stream->oa_buffer.ptr_lock);
+	mutex_init(&stream->lock);
 
 	return 0;
 
@@ -3284,7 +3285,6 @@ static ssize_t i915_perf_read(struct file *file,
 			      loff_t *ppos)
 {
 	struct i915_perf_stream *stream = file->private_data;
-	struct intel_gt *gt = stream->engine->gt;
 	size_t offset = 0;
 	int ret;
 
@@ -3308,14 +3308,14 @@ static ssize_t i915_perf_read(struct file *file,
 			if (ret)
 				return ret;
 
-			mutex_lock(&gt->perf.lock);
+			mutex_lock(&stream->lock);
 			ret = stream->ops->read(stream, buf, count, &offset);
-			mutex_unlock(&gt->perf.lock);
+			mutex_unlock(&stream->lock);
 		} while (!offset && !ret);
 	} else {
-		mutex_lock(&gt->perf.lock);
+		mutex_lock(&stream->lock);
 		ret = stream->ops->read(stream, buf, count, &offset);
-		mutex_unlock(&gt->perf.lock);
+		mutex_unlock(&stream->lock);
 	}
 
 	/* We allow the poll checking to sometimes report false positive EPOLLIN
@@ -3362,9 +3362,6 @@ static enum hrtimer_restart oa_poll_check_timer_cb(struct hrtimer *hrtimer)
  * &i915_perf_stream_ops->poll_wait to call poll_wait() with a wait queue that
  * will be woken for new stream data.
  *
- * Note: The &gt->perf.lock mutex has been taken to serialize
- * with any non-file-operation driver hooks.
- *
  * Returns: any poll events that are ready without sleeping
  */
 static __poll_t i915_perf_poll_locked(struct i915_perf_stream *stream,
@@ -3403,12 +3400,11 @@ static __poll_t i915_perf_poll_locked(struct i915_perf_stream *stream,
 static __poll_t i915_perf_poll(struct file *file, poll_table *wait)
 {
 	struct i915_perf_stream *stream = file->private_data;
-	struct intel_gt *gt = stream->engine->gt;
 	__poll_t ret;
 
-	mutex_lock(&gt->perf.lock);
+	mutex_lock(&stream->lock);
 	ret = i915_perf_poll_locked(stream, file, wait);
-	mutex_unlock(&gt->perf.lock);
+	mutex_unlock(&stream->lock);
 
 	return ret;
 }
@@ -3507,9 +3503,6 @@ static long i915_perf_config_locked(struct i915_perf_stream *stream,
  * @cmd: the ioctl request
  * @arg: the ioctl data
  *
- * Note: The &gt->perf.lock mutex has been taken to serialize
- * with any non-file-operation driver hooks.
- *
  * Returns: zero on success or a negative error code. Returns -EINVAL for
  * an unknown ioctl request.
  */
@@ -3547,12 +3540,11 @@ static long i915_perf_ioctl(struct file *file,
 			    unsigned long arg)
 {
 	struct i915_perf_stream *stream = file->private_data;
-	struct intel_gt *gt = stream->engine->gt;
 	long ret;
 
-	mutex_lock(&gt->perf.lock);
+	mutex_lock(&stream->lock);
 	ret = i915_perf_ioctl_locked(stream, cmd, arg);
-	mutex_unlock(&gt->perf.lock);
+	mutex_unlock(&stream->lock);
 
 	return ret;
 }
@@ -3598,6 +3590,11 @@ static int i915_perf_release(struct inode *inode, struct file *file)
 	struct i915_perf *perf = stream->perf;
 	struct intel_gt *gt = stream->engine->gt;
 
+	/*
+	 * Within this call, we know that the fd is being closed and we have no
+	 * other user of stream->lock. Use the perf lock to destroy the stream
+	 * here.
+	 */
 	mutex_lock(&gt->perf.lock);
 	i915_perf_destroy_locked(stream);
 	mutex_unlock(&gt->perf.lock);
diff --git a/drivers/gpu/drm/i915/i915_perf_types.h b/drivers/gpu/drm/i915/i915_perf_types.h
index e888bfab478f..dc9bfd8086cf 100644
--- a/drivers/gpu/drm/i915/i915_perf_types.h
+++ b/drivers/gpu/drm/i915/i915_perf_types.h
@@ -146,6 +146,11 @@ struct i915_perf_stream {
 	 */
 	struct intel_engine_cs *engine;
 
+	/*
+	 * Lock associated with operations on stream
+	 */
+	struct mutex lock;
+
 	/**
 	 * @sample_flags: Flags representing the `DRM_I915_PERF_PROP_SAMPLE_*`
 	 * properties given when opening a stream, representing the contents
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Intel-gfx] [PATCH v2 09/15] drm/i915/perf: Use gt-specific ggtt for OA and noa-wait buffers
  2022-09-23 20:11 [Intel-gfx] [PATCH v2 00/15] Add DG2 OA support Umesh Nerlige Ramappa
                   ` (7 preceding siblings ...)
  2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 08/15] drm/i915/perf: Replace gt->perf.lock with stream->lock for file ops Umesh Nerlige Ramappa
@ 2022-09-23 20:11 ` Umesh Nerlige Ramappa
  2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 10/15] drm/i915/perf: Store a pointer to oa_format in oa_buffer Umesh Nerlige Ramappa
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-23 20:11 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit, Joonas Lahtinen

User passes uabi engine class and instance to the perf OA interface. Use
gt corresponding to the engine to pin the buffers to the right ggtt.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 42a258578a5f..e875d1722802 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1742,6 +1742,7 @@ static void gen12_init_oa_buffer(struct i915_perf_stream *stream)
 static int alloc_oa_buffer(struct i915_perf_stream *stream)
 {
 	struct drm_i915_private *i915 = stream->perf->i915;
+	struct intel_gt *gt = stream->engine->gt;
 	struct drm_i915_gem_object *bo;
 	struct i915_vma *vma;
 	int ret;
@@ -1761,11 +1762,22 @@ static int alloc_oa_buffer(struct i915_perf_stream *stream)
 	i915_gem_object_set_cache_coherency(bo, I915_CACHE_LLC);
 
 	/* PreHSW required 512K alignment, HSW requires 16M */
-	vma = i915_gem_object_ggtt_pin(bo, NULL, 0, SZ_16M, 0);
+	vma = i915_vma_instance(bo, &gt->ggtt->vm, NULL);
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);
 		goto err_unref;
 	}
+
+	/*
+	 * PreHSW required 512K alignment.
+	 * HSW and onwards, align to requested size of OA buffer.
+	 */
+	ret = i915_vma_pin(vma, 0, SZ_16M, PIN_GLOBAL | PIN_HIGH);
+	if (ret) {
+		drm_err(&gt->i915->drm, "Failed to pin OA buffer %d\n", ret);
+		goto err_unref;
+	}
+
 	stream->oa_buffer.vma = vma;
 
 	stream->oa_buffer.vaddr =
@@ -1815,6 +1827,7 @@ static u32 *save_restore_register(struct i915_perf_stream *stream, u32 *cs,
 static int alloc_noa_wait(struct i915_perf_stream *stream)
 {
 	struct drm_i915_private *i915 = stream->perf->i915;
+	struct intel_gt *gt = stream->engine->gt;
 	struct drm_i915_gem_object *bo;
 	struct i915_vma *vma;
 	const u64 delay_ticks = 0xffffffffffffffff -
@@ -1855,12 +1868,16 @@ static int alloc_noa_wait(struct i915_perf_stream *stream)
 	 * multiple OA config BOs will have a jump to this address and it
 	 * needs to be fixed during the lifetime of the i915/perf stream.
 	 */
-	vma = i915_gem_object_ggtt_pin_ww(bo, &ww, NULL, 0, 0, PIN_HIGH);
+	vma = i915_vma_instance(bo, &gt->ggtt->vm, NULL);
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);
 		goto out_ww;
 	}
 
+	ret = i915_vma_pin_ww(vma, &ww, 0, 0, PIN_GLOBAL | PIN_HIGH);
+	if (ret)
+		goto out_ww;
+
 	batch = cs = i915_gem_object_pin_map(bo, I915_MAP_WB);
 	if (IS_ERR(batch)) {
 		ret = PTR_ERR(batch);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Intel-gfx] [PATCH v2 10/15] drm/i915/perf: Store a pointer to oa_format in oa_buffer
  2022-09-23 20:11 [Intel-gfx] [PATCH v2 00/15] Add DG2 OA support Umesh Nerlige Ramappa
                   ` (8 preceding siblings ...)
  2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 09/15] drm/i915/perf: Use gt-specific ggtt for OA and noa-wait buffers Umesh Nerlige Ramappa
@ 2022-09-23 20:11 ` Umesh Nerlige Ramappa
  2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 11/15] drm/i915/perf: Add Wa_1508761755:dg2 Umesh Nerlige Ramappa
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-23 20:11 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit, Joonas Lahtinen

DG2 introduces OA reports with 64 bit report header fields. Perf OA
would need more information about the OA format in order to process such
reports. Store all OA format info in oa_buffer instead of just the size
and format-id.

v2: Drop format_size variable (Ashutosh)

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c       | 30 +++++++++++---------------
 drivers/gpu/drm/i915/i915_perf_types.h |  3 +--
 2 files changed, 13 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index e875d1722802..bd886ba4b927 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -465,7 +465,7 @@ static u32 gen7_oa_hw_tail_read(struct i915_perf_stream *stream)
 static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 {
 	u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
-	int report_size = stream->oa_buffer.format_size;
+	int report_size = stream->oa_buffer.format->size;
 	unsigned long flags;
 	bool pollin;
 	u32 hw_tail;
@@ -602,7 +602,7 @@ static int append_oa_sample(struct i915_perf_stream *stream,
 			    size_t *offset,
 			    const u8 *report)
 {
-	int report_size = stream->oa_buffer.format_size;
+	int report_size = stream->oa_buffer.format->size;
 	struct drm_i915_perf_record_header header;
 
 	header.type = DRM_I915_PERF_RECORD_SAMPLE;
@@ -652,7 +652,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 				  size_t *offset)
 {
 	struct intel_uncore *uncore = stream->uncore;
-	int report_size = stream->oa_buffer.format_size;
+	int report_size = stream->oa_buffer.format->size;
 	u8 *oa_buf_base = stream->oa_buffer.vaddr;
 	u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
 	u32 mask = (OA_BUFFER_SIZE - 1);
@@ -946,7 +946,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 				  size_t *offset)
 {
 	struct intel_uncore *uncore = stream->uncore;
-	int report_size = stream->oa_buffer.format_size;
+	int report_size = stream->oa_buffer.format->size;
 	u8 *oa_buf_base = stream->oa_buffer.vaddr;
 	u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
 	u32 mask = (OA_BUFFER_SIZE - 1);
@@ -2494,7 +2494,7 @@ static int gen12_configure_oar_context(struct i915_perf_stream *stream,
 {
 	int err;
 	struct intel_context *ce = stream->pinned_ctx;
-	u32 format = stream->oa_buffer.format;
+	u32 format = stream->oa_buffer.format->format;
 	u32 offset = stream->perf->ctx_oactxctrl_offset;
 	struct flex regs_context[] = {
 		{
@@ -2867,7 +2867,7 @@ static void gen7_oa_enable(struct i915_perf_stream *stream)
 	u32 ctx_id = stream->specific_ctx_id;
 	bool periodic = stream->periodic;
 	u32 period_exponent = stream->period_exponent;
-	u32 report_format = stream->oa_buffer.format;
+	u32 report_format = stream->oa_buffer.format->format;
 
 	/*
 	 * Reset buf pointers so we don't forward reports from before now.
@@ -2893,7 +2893,7 @@ static void gen7_oa_enable(struct i915_perf_stream *stream)
 static void gen8_oa_enable(struct i915_perf_stream *stream)
 {
 	struct intel_uncore *uncore = stream->uncore;
-	u32 report_format = stream->oa_buffer.format;
+	u32 report_format = stream->oa_buffer.format->format;
 
 	/*
 	 * Reset buf pointers so we don't forward reports from before now.
@@ -2919,7 +2919,7 @@ static void gen8_oa_enable(struct i915_perf_stream *stream)
 static void gen12_oa_enable(struct i915_perf_stream *stream)
 {
 	struct intel_uncore *uncore = stream->uncore;
-	u32 report_format = stream->oa_buffer.format;
+	u32 report_format = stream->oa_buffer.format->format;
 
 	/*
 	 * If we don't want OA reports from the OA buffer, then we don't even
@@ -3100,7 +3100,6 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 	struct drm_i915_private *i915 = stream->perf->i915;
 	struct i915_perf *perf = stream->perf;
 	struct intel_gt *gt;
-	int format_size;
 	int ret;
 
 	if (!props->engine) {
@@ -3156,20 +3155,15 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 
 	stream->sample_size = sizeof(struct drm_i915_perf_record_header);
 
-	format_size = perf->oa_formats[props->oa_format].size;
+	stream->oa_buffer.format = &perf->oa_formats[props->oa_format];
+	if (drm_WARN_ON(&i915->drm, stream->oa_buffer.format->size == 0))
+		return -EINVAL;
 
 	stream->sample_flags = props->sample_flags;
-	stream->sample_size += format_size;
-
-	stream->oa_buffer.format_size = format_size;
-	if (drm_WARN_ON(&i915->drm, stream->oa_buffer.format_size == 0))
-		return -EINVAL;
+	stream->sample_size += stream->oa_buffer.format->size;
 
 	stream->hold_preemption = props->hold_preemption;
 
-	stream->oa_buffer.format =
-		perf->oa_formats[props->oa_format].format;
-
 	stream->periodic = props->oa_periodic;
 	if (stream->periodic)
 		stream->period_exponent = props->oa_period_exponent;
diff --git a/drivers/gpu/drm/i915/i915_perf_types.h b/drivers/gpu/drm/i915/i915_perf_types.h
index dc9bfd8086cf..e0c96b44eda8 100644
--- a/drivers/gpu/drm/i915/i915_perf_types.h
+++ b/drivers/gpu/drm/i915/i915_perf_types.h
@@ -250,11 +250,10 @@ struct i915_perf_stream {
 	 * @oa_buffer: State of the OA buffer.
 	 */
 	struct {
+		const struct i915_oa_format *format;
 		struct i915_vma *vma;
 		u8 *vaddr;
 		u32 last_ctx_id;
-		int format;
-		int format_size;
 		int size_exponent;
 
 		/**
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Intel-gfx] [PATCH v2 11/15] drm/i915/perf: Add Wa_1508761755:dg2
  2022-09-23 20:11 [Intel-gfx] [PATCH v2 00/15] Add DG2 OA support Umesh Nerlige Ramappa
                   ` (9 preceding siblings ...)
  2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 10/15] drm/i915/perf: Store a pointer to oa_format in oa_buffer Umesh Nerlige Ramappa
@ 2022-09-23 20:11 ` Umesh Nerlige Ramappa
  2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 12/15] drm/i915/perf: Apply Wa_18013179988 Umesh Nerlige Ramappa
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-23 20:11 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit, Joonas Lahtinen

Disable Clock gating in EU when gathering the events so that EU events
are not lost.

v2: Fix checkpatch issues

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt_regs.h |  1 +
 drivers/gpu/drm/i915/i915_perf.c        | 23 +++++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
index 2275ee47da95..8bb1694035a6 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
@@ -1131,6 +1131,7 @@
 #define   GEN12_DISABLE_EARLY_READ		REG_BIT(14)
 #define   GEN12_ENABLE_LARGE_GRF_MODE		REG_BIT(12)
 #define   GEN12_PUSH_CONST_DEREF_HOLD_DIS	REG_BIT(8)
+#define   GEN12_DISABLE_DOP_GATING              REG_BIT(0)
 
 #define RT_CTRL					_MMIO(0xe530)
 #define   DIS_NULL_QUERY			REG_BIT(10)
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index bd886ba4b927..15f3bc742126 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -2765,6 +2765,18 @@ gen12_enable_metric_set(struct i915_perf_stream *stream,
 	u32 sqcnt1;
 	int ret;
 
+	/*
+	 * Wa_1508761755:xehpsdv, dg2
+	 * EU NOA signals behave incorrectly if EU clock gating is enabled.
+	 * Disable thread stall DOP gating and EU DOP gating.
+	 */
+	if (IS_XEHPSDV(i915) || IS_DG2(i915)) {
+		intel_uncore_write(uncore, GEN8_ROW_CHICKEN,
+				   _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE));
+		intel_uncore_write(uncore, GEN7_ROW_CHICKEN2,
+				   _MASKED_BIT_ENABLE(GEN12_DISABLE_DOP_GATING));
+	}
+
 	intel_uncore_write(uncore, GEN12_OAG_OA_DEBUG,
 			   /* Disable clk ratio reports, like previous Gens. */
 			   _MASKED_BIT_ENABLE(GEN12_OAG_OA_DEBUG_DISABLE_CLK_RATIO_REPORTS |
@@ -2843,6 +2855,17 @@ static void gen12_disable_metric_set(struct i915_perf_stream *stream)
 	struct drm_i915_private *i915 = stream->perf->i915;
 	u32 sqcnt1;
 
+	/*
+	 * Wa_1508761755:xehpsdv, dg2
+	 * Enable thread stall DOP gating and EU DOP gating.
+	 */
+	if (IS_XEHPSDV(i915) || IS_DG2(i915)) {
+		intel_uncore_write(uncore, GEN8_ROW_CHICKEN,
+				   _MASKED_BIT_DISABLE(STALL_DOP_GATING_DISABLE));
+		intel_uncore_write(uncore, GEN7_ROW_CHICKEN2,
+				   _MASKED_BIT_DISABLE(GEN12_DISABLE_DOP_GATING));
+	}
+
 	/* Reset all contexts' slices/subslices configurations. */
 	gen12_configure_all_contexts(stream, NULL, NULL);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Intel-gfx] [PATCH v2 12/15] drm/i915/perf: Apply Wa_18013179988
  2022-09-23 20:11 [Intel-gfx] [PATCH v2 00/15] Add DG2 OA support Umesh Nerlige Ramappa
                   ` (10 preceding siblings ...)
  2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 11/15] drm/i915/perf: Add Wa_1508761755:dg2 Umesh Nerlige Ramappa
@ 2022-09-23 20:11 ` Umesh Nerlige Ramappa
  2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 13/15] drm/i915/perf: Save/restore EU flex counters across reset Umesh Nerlige Ramappa
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-23 20:11 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit, Joonas Lahtinen

OA reports in the OA buffer contain an OA timestamp field that helps
user calculate delta between 2 OA reports. The calculation relies on the
CS timestamp frequency to convert the timestamp value to nanoseconds.
The CS timestamp frequency is a function of the CTC_SHIFT value in
RPM_CONFIG0.

In DG2, OA unit assumes that the CTC_SHIFT is 3, instead of using the
actual value from RPM_CONFIG0. At the user level, this results in an
error in calculating delta between 2 OA reports since the OA timestamp
is not shifted in the same manner as CS timestamp. Also the periodicity
of the reports is different from what the user configured because of
mismatch in the CS and OA frequencies.

The issue also affects MI_REPORT_PERF_COUNT command.

To resolve this, return actual OA timestamp frequency to the user in
i915_getparam_ioctl, so that user can calculate the right OA exponent as
well as interpret the reports correctly.

v2:
- Use REG_FIELD_GET (Ashutosh)
- Update commit msg

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/i915/i915_getparam.c |  3 +++
 drivers/gpu/drm/i915/i915_perf.c     | 30 ++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/i915_perf.h     |  2 ++
 include/uapi/drm/i915_drm.h          |  6 ++++++
 4 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_getparam.c b/drivers/gpu/drm/i915/i915_getparam.c
index 342c8ca6414e..3047e80e1163 100644
--- a/drivers/gpu/drm/i915/i915_getparam.c
+++ b/drivers/gpu/drm/i915/i915_getparam.c
@@ -175,6 +175,9 @@ int i915_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_PARAM_PERF_REVISION:
 		value = i915_perf_ioctl_version();
 		break;
+	case I915_PARAM_OA_TIMESTAMP_FREQUENCY:
+		value = i915_perf_oa_timestamp_frequency(i915);
+		break;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", param->param);
 		return -EINVAL;
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 15f3bc742126..432d18b6cc0d 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -3098,6 +3098,30 @@ get_sseu_config(struct intel_sseu *out_sseu,
 	return i915_gem_user_to_context_sseu(engine->gt, drm_sseu, out_sseu);
 }
 
+/*
+ * OA timestamp frequency = CS timestamp frequency in most platforms. On some
+ * platforms OA unit ignores the CTC_SHIFT and the 2 timestamps differ. In such
+ * cases, return the adjusted CS timestamp frequency to the user.
+ */
+u32 i915_perf_oa_timestamp_frequency(struct drm_i915_private *i915)
+{
+	/* Wa_18013179988:dg2 */
+	if (IS_DG2(i915)) {
+		intel_wakeref_t wakeref;
+		u32 reg, shift;
+
+		with_intel_runtime_pm(to_gt(i915)->uncore->rpm, wakeref)
+			reg = intel_uncore_read(to_gt(i915)->uncore, RPM_CONFIG0);
+
+		shift = REG_FIELD_GET(GEN10_RPM_CONFIG0_CTC_SHIFT_PARAMETER_MASK,
+				      reg);
+
+		return to_gt(i915)->clock_frequency << (3 - shift);
+	}
+
+	return to_gt(i915)->clock_frequency;
+}
+
 /**
  * i915_oa_stream_init - validate combined props for OA stream and init
  * @stream: An i915 perf stream
@@ -3819,8 +3843,10 @@ i915_perf_open_ioctl_locked(struct i915_perf *perf,
 
 static u64 oa_exponent_to_ns(struct i915_perf *perf, int exponent)
 {
-	return intel_gt_clock_interval_to_ns(to_gt(perf->i915),
-					     2ULL << exponent);
+	u64 nom = (2ULL << exponent) * NSEC_PER_SEC;
+	u32 den = i915_perf_oa_timestamp_frequency(perf->i915);
+
+	return div_u64(nom + den - 1, den);
 }
 
 static __always_inline bool
diff --git a/drivers/gpu/drm/i915/i915_perf.h b/drivers/gpu/drm/i915/i915_perf.h
index 1d1329e5af3a..f96e09a4af04 100644
--- a/drivers/gpu/drm/i915/i915_perf.h
+++ b/drivers/gpu/drm/i915/i915_perf.h
@@ -57,4 +57,6 @@ static inline void i915_oa_config_put(struct i915_oa_config *oa_config)
 	kref_put(&oa_config->ref, i915_oa_config_release);
 }
 
+u32 i915_perf_oa_timestamp_frequency(struct drm_i915_private *i915);
+
 #endif /* __I915_PERF_H__ */
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 8b59590e06d4..3a714980b9f5 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -749,6 +749,12 @@ typedef struct drm_i915_irq_wait {
 /* Query if the kernel supports the I915_USERPTR_PROBE flag. */
 #define I915_PARAM_HAS_USERPTR_PROBE 56
 
+/*
+ * Frequency of the timestamps in OA reports. This used to be the same as the CS
+ * timestamp frequency, but differs on some platforms.
+ */
+#define I915_PARAM_OA_TIMESTAMP_FREQUENCY 57
+
 /* Must be kept compact -- no holes and well documented */
 
 /**
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Intel-gfx] [PATCH v2 13/15] drm/i915/perf: Save/restore EU flex counters across reset
  2022-09-23 20:11 [Intel-gfx] [PATCH v2 00/15] Add DG2 OA support Umesh Nerlige Ramappa
                   ` (11 preceding siblings ...)
  2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 12/15] drm/i915/perf: Apply Wa_18013179988 Umesh Nerlige Ramappa
@ 2022-09-23 20:11 ` Umesh Nerlige Ramappa
  2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 14/15] drm/i915/guc: Support OA when Wa_16011777198 is enabled Umesh Nerlige Ramappa
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-23 20:11 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit, Joonas Lahtinen

If a drm client is killed, then hw contexts used by the client are reset
immediately. This reset clears the EU flex counter configuration. If an
OA use case is running in parallel, it would start seeing zeroed eu
counter values following the reset even if the drm client is restarted.
Save/restore the EU flex counter config so that the EU counters can be
monitored continuously across resets.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index 74cbe8eaf531..3e152219fcb2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -375,6 +375,14 @@ static int guc_mmio_regset_init(struct temp_regset *regset,
 	for (i = 0; i < GEN9_LNCFCMOCS_REG_COUNT; i++)
 		ret |= GUC_MMIO_REG_ADD(gt, regset, GEN9_LNCFCMOCS(i), false);
 
+	ret |= GUC_MMIO_REG_ADD(gt, regset, EU_PERF_CNTL0, false);
+	ret |= GUC_MMIO_REG_ADD(gt, regset, EU_PERF_CNTL1, false);
+	ret |= GUC_MMIO_REG_ADD(gt, regset, EU_PERF_CNTL2, false);
+	ret |= GUC_MMIO_REG_ADD(gt, regset, EU_PERF_CNTL3, false);
+	ret |= GUC_MMIO_REG_ADD(gt, regset, EU_PERF_CNTL4, false);
+	ret |= GUC_MMIO_REG_ADD(gt, regset, EU_PERF_CNTL5, false);
+	ret |= GUC_MMIO_REG_ADD(gt, regset, EU_PERF_CNTL6, false);
+
 	return ret ? -1 : 0;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Intel-gfx] [PATCH v2 14/15] drm/i915/guc: Support OA when Wa_16011777198 is enabled
  2022-09-23 20:11 [Intel-gfx] [PATCH v2 00/15] Add DG2 OA support Umesh Nerlige Ramappa
                   ` (12 preceding siblings ...)
  2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 13/15] drm/i915/perf: Save/restore EU flex counters across reset Umesh Nerlige Ramappa
@ 2022-09-23 20:11 ` Umesh Nerlige Ramappa
  2022-09-26 15:56   ` Dixit, Ashutosh
  2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 15/15] drm/i915/perf: Enable OA for DG2 Umesh Nerlige Ramappa
  2022-09-23 21:44 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Add DG2 OA support (rev3) Patchwork
  15 siblings, 1 reply; 31+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-23 20:11 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit, Joonas Lahtinen

From: Vinay Belgaumkar <vinay.belgaumkar@intel.com>

On DG2, a w/a resets RCS/CCS before it goes into RC6. This breaks OA
since OA does not expect engine resets during its use. Fix it by
disabling RC6.

v2: (Ashutosh)
- Bring back slpc_unset_param helper
- Update commit msg
- Use with_intel_runtime_pm helper for set/unset

Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 .../drm/i915/gt/uc/abi/guc_actions_slpc_abi.h |  9 +++
 drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c   | 66 +++++++++++++++++++
 drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h   |  2 +
 drivers/gpu/drm/i915/i915_perf.c              | 29 ++++++++
 4 files changed, 106 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_slpc_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_slpc_abi.h
index 4c840a2639dc..811add10c30d 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_slpc_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_slpc_abi.h
@@ -128,6 +128,15 @@ enum slpc_media_ratio_mode {
 	SLPC_MEDIA_RATIO_MODE_FIXED_ONE_TO_TWO = 2,
 };
 
+enum slpc_gucrc_mode {
+	SLPC_GUCRC_MODE_HW = 0,
+	SLPC_GUCRC_MODE_GUCRC_NO_RC6 = 1,
+	SLPC_GUCRC_MODE_GUCRC_STATIC_TIMEOUT = 2,
+	SLPC_GUCRC_MODE_GUCRC_DYNAMIC_HYSTERESIS = 3,
+
+	SLPC_GUCRC_MODE_MAX,
+};
+
 enum slpc_event_id {
 	SLPC_EVENT_RESET = 0,
 	SLPC_EVENT_SHUTDOWN = 1,
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
index fdd895f73f9f..b3a4fb9e021f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
@@ -137,6 +137,17 @@ static int guc_action_slpc_set_param(struct intel_guc *guc, u8 id, u32 value)
 	return ret > 0 ? -EPROTO : ret;
 }
 
+static int guc_action_slpc_unset_param(struct intel_guc *guc, u8 id)
+{
+	u32 request[] = {
+		GUC_ACTION_HOST2GUC_PC_SLPC_REQUEST,
+		SLPC_EVENT(SLPC_EVENT_PARAMETER_UNSET, 1),
+		id,
+	};
+
+	return intel_guc_send(guc, request, ARRAY_SIZE(request));
+}
+
 static bool slpc_is_running(struct intel_guc_slpc *slpc)
 {
 	return slpc_get_state(slpc) == SLPC_GLOBAL_STATE_RUNNING;
@@ -190,6 +201,15 @@ static int slpc_set_param(struct intel_guc_slpc *slpc, u8 id, u32 value)
 	return ret;
 }
 
+static int slpc_unset_param(struct intel_guc_slpc *slpc, u8 id)
+{
+	struct intel_guc *guc = slpc_to_guc(slpc);
+
+	GEM_BUG_ON(id >= SLPC_MAX_PARAM);
+
+	return guc_action_slpc_unset_param(guc, id);
+}
+
 static int slpc_force_min_freq(struct intel_guc_slpc *slpc, u32 freq)
 {
 	struct drm_i915_private *i915 = slpc_to_i915(slpc);
@@ -610,6 +630,52 @@ static void slpc_get_rp_values(struct intel_guc_slpc *slpc)
 		slpc->boost_freq = slpc->rp0_freq;
 }
 
+/**
+ * intel_guc_slpc_override_gucrc_mode() - override GUCRC mode
+ * @slpc: pointer to intel_guc_slpc.
+ * @mode: new value of the mode.
+ *
+ * This function will override the GUCRC mode.
+ *
+ * Return: 0 on success, non-zero error code on failure.
+ */
+int intel_guc_slpc_override_gucrc_mode(struct intel_guc_slpc *slpc, u32 mode)
+{
+	int ret;
+	struct drm_i915_private *i915 = slpc_to_i915(slpc);
+	intel_wakeref_t wakeref;
+
+	if (mode >= SLPC_GUCRC_MODE_MAX)
+		return -EINVAL;
+
+	with_intel_runtime_pm(&i915->runtime_pm, wakeref) {
+		ret = slpc_set_param(slpc, SLPC_PARAM_PWRGATE_RC_MODE, mode);
+		if (ret)
+			drm_err(&i915->drm,
+				"Override gucrc mode %d failed %d\n",
+				mode, ret);
+	}
+
+	return ret;
+}
+
+int intel_guc_slpc_unset_gucrc_mode(struct intel_guc_slpc *slpc)
+{
+	struct drm_i915_private *i915 = slpc_to_i915(slpc);
+	intel_wakeref_t wakeref;
+	int ret = 0;
+
+	with_intel_runtime_pm(&i915->runtime_pm, wakeref) {
+		ret = slpc_unset_param(slpc, SLPC_PARAM_PWRGATE_RC_MODE);
+		if (ret)
+			drm_err(&i915->drm,
+				"Unsetting gucrc mode failed %d\n",
+				ret);
+	}
+
+	return ret;
+}
+
 /*
  * intel_guc_slpc_enable() - Start SLPC
  * @slpc: pointer to intel_guc_slpc.
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h
index 82a98f78f96c..ccf483730d9d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h
@@ -42,5 +42,7 @@ int intel_guc_slpc_set_media_ratio_mode(struct intel_guc_slpc *slpc, u32 val);
 void intel_guc_pm_intrmsk_enable(struct intel_gt *gt);
 void intel_guc_slpc_boost(struct intel_guc_slpc *slpc);
 void intel_guc_slpc_dec_waiters(struct intel_guc_slpc *slpc);
+int intel_guc_slpc_unset_gucrc_mode(struct intel_guc_slpc *slpc);
+int intel_guc_slpc_override_gucrc_mode(struct intel_guc_slpc *slpc, u32 mode);
 
 #endif
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 432d18b6cc0d..80016f860406 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -208,6 +208,7 @@
 #include "gt/intel_lrc.h"
 #include "gt/intel_lrc_reg.h"
 #include "gt/intel_ring.h"
+#include "gt/uc/intel_guc_slpc.h"
 
 #include "i915_drv.h"
 #include "i915_file_private.h"
@@ -1569,6 +1570,16 @@ static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
 
 	free_oa_buffer(stream);
 
+	/*
+	 * Wa_16011777198:dg2: Unset the override of GUCRC mode to enable rc6.
+	 */
+	if (intel_guc_slpc_is_used(&gt->uc.guc) &&
+	    intel_uc_uses_guc_rc(&gt->uc) &&
+	    (IS_DG2_GRAPHICS_STEP(gt->i915, G10, STEP_A0, STEP_C0) ||
+	     IS_DG2_GRAPHICS_STEP(gt->i915, G11, STEP_A0, STEP_B0)))
+		drm_WARN_ON(&gt->i915->drm,
+			    intel_guc_slpc_unset_gucrc_mode(&gt->uc.guc.slpc));
+
 	intel_uncore_forcewake_put(stream->uncore, FORCEWAKE_ALL);
 	intel_engine_pm_put(stream->engine);
 
@@ -3254,6 +3265,24 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 	intel_engine_pm_get(stream->engine);
 	intel_uncore_forcewake_get(stream->uncore, FORCEWAKE_ALL);
 
+	/*
+	 * Wa_16011777198:dg2: GuC resets render as part of the Wa. This causes
+	 * OA to lose the configuration state. Prevent this by overriding GUCRC
+	 * mode.
+	 */
+	if (intel_guc_slpc_is_used(&gt->uc.guc) &&
+	    intel_uc_uses_guc_rc(&gt->uc) &&
+	    (IS_DG2_GRAPHICS_STEP(gt->i915, G10, STEP_A0, STEP_C0) ||
+	     IS_DG2_GRAPHICS_STEP(gt->i915, G11, STEP_A0, STEP_B0))) {
+		ret = intel_guc_slpc_override_gucrc_mode(&gt->uc.guc.slpc,
+							 SLPC_GUCRC_MODE_GUCRC_NO_RC6);
+		if (ret) {
+			drm_dbg(&stream->perf->i915->drm,
+				"Unable to override gucrc mode\n");
+			goto err_config;
+		}
+	}
+
 	ret = alloc_oa_buffer(stream);
 	if (ret)
 		goto err_oa_buf_alloc;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Intel-gfx] [PATCH v2 15/15] drm/i915/perf: Enable OA for DG2
  2022-09-23 20:11 [Intel-gfx] [PATCH v2 00/15] Add DG2 OA support Umesh Nerlige Ramappa
                   ` (13 preceding siblings ...)
  2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 14/15] drm/i915/guc: Support OA when Wa_16011777198 is enabled Umesh Nerlige Ramappa
@ 2022-09-23 20:11 ` Umesh Nerlige Ramappa
  2022-09-23 21:44 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Add DG2 OA support (rev3) Patchwork
  15 siblings, 0 replies; 31+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-23 20:11 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit, Joonas Lahtinen

OA was disabled for DG2 as support was missing. Enable it back now.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 80016f860406..b62508e662a3 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -4778,12 +4778,6 @@ void i915_perf_init(struct drm_i915_private *i915)
 {
 	struct i915_perf *perf = &i915->perf;
 
-	/* XXX const struct i915_perf_ops! */
-
-	/* i915_perf is not enabled for DG2 yet */
-	if (IS_DG2(i915))
-		return;
-
 	perf->oa_formats = oa_formats;
 	if (IS_HASWELL(i915)) {
 		perf->ops.is_valid_b_counter_reg = gen7_is_valid_b_counter_addr;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Add DG2 OA support (rev3)
  2022-09-23 20:11 [Intel-gfx] [PATCH v2 00/15] Add DG2 OA support Umesh Nerlige Ramappa
                   ` (14 preceding siblings ...)
  2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 15/15] drm/i915/perf: Enable OA for DG2 Umesh Nerlige Ramappa
@ 2022-09-23 21:44 ` Patchwork
  15 siblings, 0 replies; 31+ messages in thread
From: Patchwork @ 2022-09-23 21:44 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

== Series Details ==

Series: Add DG2 OA support (rev3)
URL   : https://patchwork.freedesktop.org/series/107584/
State : warning

== Summary ==

Error: dim checkpatch failed
7e3d54fda966 drm/i915/perf: Fix OA filtering logic for GuC mode
cf4391110d50 drm/i915/perf: Add OAG and OAR formats for DG2
3b1ee9fc1bf9 drm/i915/perf: Fix noa wait predication for DG2
41fefad68a6d drm/i915/perf: Determine gen12 oa ctx offset at runtime
-:38: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'x' - possible side-effects?
#38: FILE: drivers/gpu/drm/i915/i915_perf.c:1361:
+#define __valid_oactxctrl_offset(x) ((x) && (x) != U32_MAX)

total: 0 errors, 0 warnings, 1 checks, 234 lines checked
a3f78042ac31 drm/i915/perf: Enable commands per clock reporting in OA
a5a6c6b84c76 drm/i915/perf: Simply use stream->ctx
4f989fb92c02 drm/i915/perf: Move gt-specific data from i915->perf to gt->perf
2c50c8d01880 drm/i915/perf: Replace gt->perf.lock with stream->lock for file ops
3af70ab98a20 drm/i915/perf: Use gt-specific ggtt for OA and noa-wait buffers
0d2b0f1fca68 drm/i915/perf: Store a pointer to oa_format in oa_buffer
7418a83a2f3e drm/i915/perf: Add Wa_1508761755:dg2
fad73b7c1434 drm/i915/perf: Apply Wa_18013179988
fe1388485698 drm/i915/perf: Save/restore EU flex counters across reset
9decc23433d8 drm/i915/guc: Support OA when Wa_16011777198 is enabled
7556c283eabe drm/i915/perf: Enable OA for DG2



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Intel-gfx] [PATCH v2 02/15] drm/i915/perf: Add OAG and OAR formats for DG2
  2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 02/15] drm/i915/perf: Add OAG and OAR formats for DG2 Umesh Nerlige Ramappa
@ 2022-09-24  4:08   ` Dixit, Ashutosh
  2022-09-26 18:11     ` Umesh Nerlige Ramappa
  0 siblings, 1 reply; 31+ messages in thread
From: Dixit, Ashutosh @ 2022-09-24  4:08 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Fri, 23 Sep 2022 13:11:41 -0700, Umesh Nerlige Ramappa wrote:
>

Commit title probably now "Add 32 bit OAG and OAR formats for DG2"?

> Add new OA formats for DG2. Some of the newer OA formats are not
> multples of 64 bytes and are not powers of 2. For those formats, adjust
> hw_tail accordingly when checking for new reports.

We are not touching hw_tail now, should we delete this from the commit
message?

Thanks.
--
Ashutosh

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Intel-gfx] [PATCH v2 05/15] drm/i915/perf: Enable commands per clock reporting in OA
  2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 05/15] drm/i915/perf: Enable commands per clock reporting in OA Umesh Nerlige Ramappa
@ 2022-09-26 15:55   ` Dixit, Ashutosh
  0 siblings, 0 replies; 31+ messages in thread
From: Dixit, Ashutosh @ 2022-09-26 15:55 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Fri, 23 Sep 2022 13:11:44 -0700, Umesh Nerlige Ramappa wrote:
>
> XEHPSDV and DG2 provide a way to configure bytes per clock vs commands
> per clock reporting. Enable bytes per clock setting on enabling OA.

The commit title should also be changed to say bytes per clock instead of
commands per clock.

Also please add "Bspec: 51762" (and also maybe "Bspec: 52201") to the
commit message.

> @@ -2760,6 +2762,16 @@ gen12_enable_metric_set(struct i915_perf_stream *stream,
>			    (period_exponent << GEN12_OAG_OAGLBCTXCTRL_TIMER_PERIOD_SHIFT))
>			    : 0);
>
> +	/*
> +	 * Initialize Super Queue Internal Cnt Register
> +	 * Set PMON Enable in order to collect valid metrics.
> +	 * Enable commands per clock reporting in OA for XEHPSDV onward.

Here also say bytes per clock instead of commands per clock.

> +	 */
> +	sqcnt1 = GEN12_SQCNT1_PMON_ENABLE |
> +		 (HAS_OA_BPC_REPORTING(i915) ? GEN12_SQCNT1_OABPC : 0);
> +
> +	intel_uncore_rmw(uncore, GEN12_SQCNT1, 0, sqcnt1);
> +

With that, this patch is:

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Intel-gfx] [PATCH v2 14/15] drm/i915/guc: Support OA when Wa_16011777198 is enabled
  2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 14/15] drm/i915/guc: Support OA when Wa_16011777198 is enabled Umesh Nerlige Ramappa
@ 2022-09-26 15:56   ` Dixit, Ashutosh
  2022-09-26 18:19     ` Umesh Nerlige Ramappa
  0 siblings, 1 reply; 31+ messages in thread
From: Dixit, Ashutosh @ 2022-09-26 15:56 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Fri, 23 Sep 2022 13:11:53 -0700, Umesh Nerlige Ramappa wrote:
>
> From: Vinay Belgaumkar <vinay.belgaumkar@intel.com>

Hi Umesh/Vinay,

> @@ -3254,6 +3265,24 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
>	intel_engine_pm_get(stream->engine);
>	intel_uncore_forcewake_get(stream->uncore, FORCEWAKE_ALL);
>
> +	/*
> +	 * Wa_16011777198:dg2: GuC resets render as part of the Wa. This causes
> +	 * OA to lose the configuration state. Prevent this by overriding GUCRC
> +	 * mode.
> +	 */
> +	if (intel_guc_slpc_is_used(&gt->uc.guc) &&
> +	    intel_uc_uses_guc_rc(&gt->uc) &&

Is this condition above correct? E.g. what happens when:

a. GuCRC is used but SLPC is not used?
b. GuCRC is not used. Don't we need to disable RC6 in host based RC6
   control?

Do we need to worry about these cases?

Or if we always expect both GuCRC and SLPC to be used on DG2 then I think
let's get rid of these from the if condition and add a drm_err() if we see
these not being used and OA is being enabled on DG2?

Thanks.
--
Ashutosh

> +	    (IS_DG2_GRAPHICS_STEP(gt->i915, G10, STEP_A0, STEP_C0) ||
> +	     IS_DG2_GRAPHICS_STEP(gt->i915, G11, STEP_A0, STEP_B0))) {
> +		ret = intel_guc_slpc_override_gucrc_mode(&gt->uc.guc.slpc,
> +							 SLPC_GUCRC_MODE_GUCRC_NO_RC6);
> +		if (ret) {
> +			drm_dbg(&stream->perf->i915->drm,
> +				"Unable to override gucrc mode\n");
> +			goto err_config;
> +		}
> +	}
> +
>	ret = alloc_oa_buffer(stream);
>	if (ret)
>		goto err_oa_buf_alloc;
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Intel-gfx] [PATCH v2 02/15] drm/i915/perf: Add OAG and OAR formats for DG2
  2022-09-24  4:08   ` Dixit, Ashutosh
@ 2022-09-26 18:11     ` Umesh Nerlige Ramappa
  0 siblings, 0 replies; 31+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-26 18:11 UTC (permalink / raw)
  To: Dixit, Ashutosh; +Cc: intel-gfx

On Fri, Sep 23, 2022 at 09:08:28PM -0700, Dixit, Ashutosh wrote:
>On Fri, 23 Sep 2022 13:11:41 -0700, Umesh Nerlige Ramappa wrote:
>>
>
>Commit title probably now "Add 32 bit OAG and OAR formats for DG2"?

Yes, will update.

>
>> Add new OA formats for DG2. Some of the newer OA formats are not
>> multples of 64 bytes and are not powers of 2. For those formats, adjust
>> hw_tail accordingly when checking for new reports.
>
>We are not touching hw_tail now, should we delete this from the commit
>message?

will remove,

Thanks,
Umesh

>
>Thanks.
>--
>Ashutosh

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Intel-gfx] [PATCH v2 14/15] drm/i915/guc: Support OA when Wa_16011777198 is enabled
  2022-09-26 15:56   ` Dixit, Ashutosh
@ 2022-09-26 18:19     ` Umesh Nerlige Ramappa
  2022-09-26 21:17       ` Belgaumkar, Vinay
  0 siblings, 1 reply; 31+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-26 18:19 UTC (permalink / raw)
  To: Dixit, Ashutosh; +Cc: intel-gfx

On Mon, Sep 26, 2022 at 08:56:01AM -0700, Dixit, Ashutosh wrote:
>On Fri, 23 Sep 2022 13:11:53 -0700, Umesh Nerlige Ramappa wrote:
>>
>> From: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
>
>Hi Umesh/Vinay,
>
>> @@ -3254,6 +3265,24 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
>>	intel_engine_pm_get(stream->engine);
>>	intel_uncore_forcewake_get(stream->uncore, FORCEWAKE_ALL);
>>
>> +	/*
>> +	 * Wa_16011777198:dg2: GuC resets render as part of the Wa. This causes
>> +	 * OA to lose the configuration state. Prevent this by overriding GUCRC
>> +	 * mode.
>> +	 */
>> +	if (intel_guc_slpc_is_used(&gt->uc.guc) &&
>> +	    intel_uc_uses_guc_rc(&gt->uc) &&
>
>Is this condition above correct? E.g. what happens when:
>
>a. GuCRC is used but SLPC is not used?
>b. GuCRC is not used. Don't we need to disable RC6 in host based RC6
>   control?

When using host based rc6, existing OA code is using forcewake and a 
reference to engine_pm to prevent rc6. Other questions, directing to 
@Vinay.

Thanks,
Umesh

>
>Do we need to worry about these cases?
>
>Or if we always expect both GuCRC and SLPC to be used on DG2 then I think
>let's get rid of these from the if condition and add a drm_err() if we see
>these not being used and OA is being enabled on DG2?
>
>Thanks.
>--
>Ashutosh
>
>> +	    (IS_DG2_GRAPHICS_STEP(gt->i915, G10, STEP_A0, STEP_C0) ||
>> +	     IS_DG2_GRAPHICS_STEP(gt->i915, G11, STEP_A0, STEP_B0))) {
>> +		ret = intel_guc_slpc_override_gucrc_mode(&gt->uc.guc.slpc,
>> +							 SLPC_GUCRC_MODE_GUCRC_NO_RC6);
>> +		if (ret) {
>> +			drm_dbg(&stream->perf->i915->drm,
>> +				"Unable to override gucrc mode\n");
>> +			goto err_config;
>> +		}
>> +	}
>> +
>>	ret = alloc_oa_buffer(stream);
>>	if (ret)
>>		goto err_oa_buf_alloc;
>> --
>> 2.25.1
>>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Intel-gfx] [PATCH v2 14/15] drm/i915/guc: Support OA when Wa_16011777198 is enabled
  2022-09-26 18:19     ` Umesh Nerlige Ramappa
@ 2022-09-26 21:17       ` Belgaumkar, Vinay
  2022-09-26 23:28         ` Dixit, Ashutosh
  0 siblings, 1 reply; 31+ messages in thread
From: Belgaumkar, Vinay @ 2022-09-26 21:17 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa, Dixit, Ashutosh; +Cc: intel-gfx


On 9/26/2022 11:19 AM, Umesh Nerlige Ramappa wrote:
> On Mon, Sep 26, 2022 at 08:56:01AM -0700, Dixit, Ashutosh wrote:
>> On Fri, 23 Sep 2022 13:11:53 -0700, Umesh Nerlige Ramappa wrote:
>>>
>>> From: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
>>
>> Hi Umesh/Vinay,
>>
>>> @@ -3254,6 +3265,24 @@ static int i915_oa_stream_init(struct 
>>> i915_perf_stream *stream,
>>>     intel_engine_pm_get(stream->engine);
>>>     intel_uncore_forcewake_get(stream->uncore, FORCEWAKE_ALL);
>>>
>>> +    /*
>>> +     * Wa_16011777198:dg2: GuC resets render as part of the Wa. 
>>> This causes
>>> +     * OA to lose the configuration state. Prevent this by 
>>> overriding GUCRC
>>> +     * mode.
>>> +     */
>>> +    if (intel_guc_slpc_is_used(&gt->uc.guc) &&
>>> +        intel_uc_uses_guc_rc(&gt->uc) &&
>>
>> Is this condition above correct? E.g. what happens when:
>>
>> a. GuCRC is used but SLPC is not used?

Currently, we have no way to separate out GuC RC and SLPC. Both are on 
when guc submission is enabled/supported. So, the above check is kind of 
redundant anyways.

Thanks,

Vinay.

>> b. GuCRC is not used. Don't we need to disable RC6 in host based RC6
>>   control?
>
> When using host based rc6, existing OA code is using forcewake and a 
> reference to engine_pm to prevent rc6. Other questions, directing to 
> @Vinay.
>
> Thanks,
> Umesh
>
>>
>> Do we need to worry about these cases?
>>
>> Or if we always expect both GuCRC and SLPC to be used on DG2 then I 
>> think
>> let's get rid of these from the if condition and add a drm_err() if 
>> we see
>> these not being used and OA is being enabled on DG2?
>>
>> Thanks.
>> -- 
>> Ashutosh
>>
>>> + (IS_DG2_GRAPHICS_STEP(gt->i915, G10, STEP_A0, STEP_C0) ||
>>> +         IS_DG2_GRAPHICS_STEP(gt->i915, G11, STEP_A0, STEP_B0))) {
>>> +        ret = intel_guc_slpc_override_gucrc_mode(&gt->uc.guc.slpc,
>>> +                             SLPC_GUCRC_MODE_GUCRC_NO_RC6);
>>> +        if (ret) {
>>> +            drm_dbg(&stream->perf->i915->drm,
>>> +                "Unable to override gucrc mode\n");
>>> +            goto err_config;
>>> +        }
>>> +    }
>>> +
>>>     ret = alloc_oa_buffer(stream);
>>>     if (ret)
>>>         goto err_oa_buf_alloc;
>>> -- 
>>> 2.25.1
>>>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Intel-gfx] [PATCH v2 14/15] drm/i915/guc: Support OA when Wa_16011777198 is enabled
  2022-09-26 21:17       ` Belgaumkar, Vinay
@ 2022-09-26 23:28         ` Dixit, Ashutosh
  2022-09-27 16:11           ` Umesh Nerlige Ramappa
  0 siblings, 1 reply; 31+ messages in thread
From: Dixit, Ashutosh @ 2022-09-26 23:28 UTC (permalink / raw)
  To: Belgaumkar, Vinay; +Cc: intel-gfx

On Mon, 26 Sep 2022 14:17:21 -0700, Belgaumkar, Vinay wrote:
>
>
> On 9/26/2022 11:19 AM, Umesh Nerlige Ramappa wrote:
> > On Mon, Sep 26, 2022 at 08:56:01AM -0700, Dixit, Ashutosh wrote:
> >> On Fri, 23 Sep 2022 13:11:53 -0700, Umesh Nerlige Ramappa wrote:
> >>>
> >>> From: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
> >>
> >> Hi Umesh/Vinay,
> >>
> >>> @@ -3254,6 +3265,24 @@ static int i915_oa_stream_init(struct
> >>> i915_perf_stream *stream,
> >>>     intel_engine_pm_get(stream->engine);
> >>>     intel_uncore_forcewake_get(stream->uncore, FORCEWAKE_ALL);
> >>>
> >>> +    /*
> >>> +     * Wa_16011777198:dg2: GuC resets render as part of the Wa. This
> >>> causes
> >>> +     * OA to lose the configuration state. Prevent this by overriding
> >>> GUCRC
> >>> +     * mode.
> >>> +     */
> >>> +    if (intel_guc_slpc_is_used(&gt->uc.guc) &&
> >>> +        intel_uc_uses_guc_rc(&gt->uc) &&
> >>
> >> Is this condition above correct? E.g. what happens when:
> >>
> >> a. GuCRC is used but SLPC is not used?
>
> Currently, we have no way to separate out GuC RC and SLPC. Both are on when
> guc submission is enabled/supported. So, the above check is kind of
> redundant anyways.

That is why I was suggesting changing the if check to an assert or
drm_err. So looks like it will work with or without GuC RC, but if we are
using GuC RC we should be using SLPC. So:

	if (GuC_RC && !SLPC)
		drm_err();

> Thanks,
>
> Vinay.
>
> >> b. GuCRC is not used. Don't we need to disable RC6 in host based RC6
> >>   control?
> >
> > When using host based rc6, existing OA code is using forcewake and a
> > reference to engine_pm to prevent rc6. Other questions, directing to
> > @Vinay.
> >
> > Thanks,
> > Umesh
> >
> >>
> >> Do we need to worry about these cases?
> >>
> >> Or if we always expect both GuCRC and SLPC to be used on DG2 then I
> >> think
> >> let's get rid of these from the if condition and add a drm_err() if we
> >> see
> >> these not being used and OA is being enabled on DG2?
> >>
> >> Thanks.
> >> --
> >> Ashutosh
> >>
> >>> + (IS_DG2_GRAPHICS_STEP(gt->i915, G10, STEP_A0, STEP_C0) ||
> >>> +         IS_DG2_GRAPHICS_STEP(gt->i915, G11, STEP_A0, STEP_B0))) {
> >>> +        ret = intel_guc_slpc_override_gucrc_mode(&gt->uc.guc.slpc,
> >>> +                             SLPC_GUCRC_MODE_GUCRC_NO_RC6);
> >>> +        if (ret) {
> >>> +            drm_dbg(&stream->perf->i915->drm,
> >>> +                "Unable to override gucrc mode\n");
> >>> +            goto err_config;
> >>> +        }
> >>> +    }
> >>> +
> >>>     ret = alloc_oa_buffer(stream);
> >>>     if (ret)
> >>>         goto err_oa_buf_alloc;
> >>> --
> >>> 2.25.1
> >>>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Intel-gfx] [PATCH v2 14/15] drm/i915/guc: Support OA when Wa_16011777198 is enabled
  2022-09-26 23:28         ` Dixit, Ashutosh
@ 2022-09-27 16:11           ` Umesh Nerlige Ramappa
  2022-09-27 17:34             ` Dixit, Ashutosh
  0 siblings, 1 reply; 31+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-27 16:11 UTC (permalink / raw)
  To: Dixit, Ashutosh; +Cc: intel-gfx

On Mon, Sep 26, 2022 at 04:28:44PM -0700, Dixit, Ashutosh wrote:
>On Mon, 26 Sep 2022 14:17:21 -0700, Belgaumkar, Vinay wrote:
>>
>>
>> On 9/26/2022 11:19 AM, Umesh Nerlige Ramappa wrote:
>> > On Mon, Sep 26, 2022 at 08:56:01AM -0700, Dixit, Ashutosh wrote:
>> >> On Fri, 23 Sep 2022 13:11:53 -0700, Umesh Nerlige Ramappa wrote:
>> >>>
>> >>> From: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
>> >>
>> >> Hi Umesh/Vinay,
>> >>
>> >>> @@ -3254,6 +3265,24 @@ static int i915_oa_stream_init(struct
>> >>> i915_perf_stream *stream,
>> >>>     intel_engine_pm_get(stream->engine);
>> >>>     intel_uncore_forcewake_get(stream->uncore, FORCEWAKE_ALL);
>> >>>
>> >>> +    /*
>> >>> +     * Wa_16011777198:dg2: GuC resets render as part of the Wa. This
>> >>> causes
>> >>> +     * OA to lose the configuration state. Prevent this by overriding
>> >>> GUCRC
>> >>> +     * mode.
>> >>> +     */
>> >>> +    if (intel_guc_slpc_is_used(&gt->uc.guc) &&
>> >>> +        intel_uc_uses_guc_rc(&gt->uc) &&
>> >>
>> >> Is this condition above correct? E.g. what happens when:
>> >>
>> >> a. GuCRC is used but SLPC is not used?
>>
>> Currently, we have no way to separate out GuC RC and SLPC. Both are on when
>> guc submission is enabled/supported. So, the above check is kind of
>> redundant anyways.
>
>That is why I was suggesting changing the if check to an assert or
>drm_err. So looks like it will work with or without GuC RC, but if we are
>using GuC RC we should be using SLPC. So:
>
>	if (GuC_RC && !SLPC)
>		drm_err();

I am little confused. What's the ask here? Should I just use one of 
these conditions? i.e.

if (intel_guc_slpc_is_used(&gt->uc.guc))
...

Thanks,
Umesh

>
>> Thanks,
>>
>> Vinay.
>>
>> >> b. GuCRC is not used. Don't we need to disable RC6 in host based RC6
>> >>   control?
>> >
>> > When using host based rc6, existing OA code is using forcewake and a
>> > reference to engine_pm to prevent rc6. Other questions, directing to
>> > @Vinay.
>> >
>> > Thanks,
>> > Umesh
>> >
>> >>
>> >> Do we need to worry about these cases?
>> >>
>> >> Or if we always expect both GuCRC and SLPC to be used on DG2 then I
>> >> think
>> >> let's get rid of these from the if condition and add a drm_err() if we
>> >> see
>> >> these not being used and OA is being enabled on DG2?
>> >>
>> >> Thanks.
>> >> --
>> >> Ashutosh
>> >>
>> >>> + (IS_DG2_GRAPHICS_STEP(gt->i915, G10, STEP_A0, STEP_C0) ||
>> >>> +         IS_DG2_GRAPHICS_STEP(gt->i915, G11, STEP_A0, STEP_B0))) {
>> >>> +        ret = intel_guc_slpc_override_gucrc_mode(&gt->uc.guc.slpc,
>> >>> +                             SLPC_GUCRC_MODE_GUCRC_NO_RC6);
>> >>> +        if (ret) {
>> >>> +            drm_dbg(&stream->perf->i915->drm,
>> >>> +                "Unable to override gucrc mode\n");
>> >>> +            goto err_config;
>> >>> +        }
>> >>> +    }
>> >>> +
>> >>>     ret = alloc_oa_buffer(stream);
>> >>>     if (ret)
>> >>>         goto err_oa_buf_alloc;
>> >>> --
>> >>> 2.25.1
>> >>>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Intel-gfx] [PATCH v2 14/15] drm/i915/guc: Support OA when Wa_16011777198 is enabled
  2022-09-27 16:11           ` Umesh Nerlige Ramappa
@ 2022-09-27 17:34             ` Dixit, Ashutosh
  2022-09-27 17:51               ` Dixit, Ashutosh
  0 siblings, 1 reply; 31+ messages in thread
From: Dixit, Ashutosh @ 2022-09-27 17:34 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Tue, 27 Sep 2022 09:11:23 -0700, Umesh Nerlige Ramappa wrote:
>
> On Mon, Sep 26, 2022 at 04:28:44PM -0700, Dixit, Ashutosh wrote:
> > On Mon, 26 Sep 2022 14:17:21 -0700, Belgaumkar, Vinay wrote:
> >>
> >>
> >> On 9/26/2022 11:19 AM, Umesh Nerlige Ramappa wrote:
> >> > On Mon, Sep 26, 2022 at 08:56:01AM -0700, Dixit, Ashutosh wrote:
> >> >> On Fri, 23 Sep 2022 13:11:53 -0700, Umesh Nerlige Ramappa wrote:
> >> >>>
> >> >>> From: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
> >> >>
> >> >> Hi Umesh/Vinay,
> >> >>
> >> >>> @@ -3254,6 +3265,24 @@ static int i915_oa_stream_init(struct
> >> >>> i915_perf_stream *stream,
> >> >>>     intel_engine_pm_get(stream->engine);
> >> >>>     intel_uncore_forcewake_get(stream->uncore, FORCEWAKE_ALL);
> >> >>>
> >> >>> +    /*
> >> >>> +     * Wa_16011777198:dg2: GuC resets render as part of the Wa. This
> >> >>> causes
> >> >>> +     * OA to lose the configuration state. Prevent this by overriding
> >> >>> GUCRC
> >> >>> +     * mode.
> >> >>> +     */
> >> >>> +    if (intel_guc_slpc_is_used(&gt->uc.guc) &&
> >> >>> +        intel_uc_uses_guc_rc(&gt->uc) &&
> >> >>
> >> >> Is this condition above correct? E.g. what happens when:
> >> >>
> >> >> a. GuCRC is used but SLPC is not used?
> >>
> >> Currently, we have no way to separate out GuC RC and SLPC. Both are on when
> >> guc submission is enabled/supported. So, the above check is kind of
> >> redundant anyways.
> >
> > That is why I was suggesting changing the if check to an assert or
> > drm_err. So looks like it will work with or without GuC RC, but if we are
> > using GuC RC we should be using SLPC. So:
> >
> >	if (GuC_RC && !SLPC)
> >		drm_err();
>
> I am little confused. What's the ask here? Should I just use one of these
> conditions? i.e.
>
> if (intel_guc_slpc_is_used(&gt->uc.guc))

I think let's just use intel_uc_uses_guc_rc and drop
intel_guc_slpc_is_used. I am assuming if SLPC is somehow not on and we try
to use SLPC set/unset param, some kind of error will be reported.


> ...
>
> Thanks,
> Umesh
>
> >
> >> Thanks,
> >>
> >> Vinay.
> >>
> >> >> b. GuCRC is not used. Don't we need to disable RC6 in host based RC6
> >> >>   control?
> >> >
> >> > When using host based rc6, existing OA code is using forcewake and a
> >> > reference to engine_pm to prevent rc6. Other questions, directing to
> >> > @Vinay.
> >> >
> >> > Thanks,
> >> > Umesh
> >> >
> >> >>
> >> >> Do we need to worry about these cases?
> >> >>
> >> >> Or if we always expect both GuCRC and SLPC to be used on DG2 then I
> >> >> think
> >> >> let's get rid of these from the if condition and add a drm_err() if we
> >> >> see
> >> >> these not being used and OA is being enabled on DG2?
> >> >>
> >> >> Thanks.
> >> >> --
> >> >> Ashutosh
> >> >>
> >> >>> + (IS_DG2_GRAPHICS_STEP(gt->i915, G10, STEP_A0, STEP_C0) ||
> >> >>> +         IS_DG2_GRAPHICS_STEP(gt->i915, G11, STEP_A0, STEP_B0))) {
> >> >>> +        ret = intel_guc_slpc_override_gucrc_mode(&gt->uc.guc.slpc,
> >> >>> +                             SLPC_GUCRC_MODE_GUCRC_NO_RC6);
> >> >>> +        if (ret) {
> >> >>> +            drm_dbg(&stream->perf->i915->drm,
> >> >>> +                "Unable to override gucrc mode\n");
> >> >>> +            goto err_config;
> >> >>> +        }
> >> >>> +    }
> >> >>> +
> >> >>>     ret = alloc_oa_buffer(stream);
> >> >>>     if (ret)
> >> >>>         goto err_oa_buf_alloc;
> >> >>> --
> >> >>> 2.25.1
> >> >>>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Intel-gfx] [PATCH v2 14/15] drm/i915/guc: Support OA when Wa_16011777198 is enabled
  2022-09-27 17:34             ` Dixit, Ashutosh
@ 2022-09-27 17:51               ` Dixit, Ashutosh
  0 siblings, 0 replies; 31+ messages in thread
From: Dixit, Ashutosh @ 2022-09-27 17:51 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Tue, 27 Sep 2022 10:34:52 -0700, Dixit, Ashutosh wrote:
>
> On Tue, 27 Sep 2022 09:11:23 -0700, Umesh Nerlige Ramappa wrote:
> >
> > On Mon, Sep 26, 2022 at 04:28:44PM -0700, Dixit, Ashutosh wrote:
> > > On Mon, 26 Sep 2022 14:17:21 -0700, Belgaumkar, Vinay wrote:
> > >>
> > >>
> > >> On 9/26/2022 11:19 AM, Umesh Nerlige Ramappa wrote:
> > >> > On Mon, Sep 26, 2022 at 08:56:01AM -0700, Dixit, Ashutosh wrote:
> > >> >> On Fri, 23 Sep 2022 13:11:53 -0700, Umesh Nerlige Ramappa wrote:
> > >> >>>
> > >> >>> From: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
> > >> >>
> > >> >> Hi Umesh/Vinay,
> > >> >>
> > >> >>> @@ -3254,6 +3265,24 @@ static int i915_oa_stream_init(struct
> > >> >>> i915_perf_stream *stream,
> > >> >>>     intel_engine_pm_get(stream->engine);
> > >> >>>     intel_uncore_forcewake_get(stream->uncore, FORCEWAKE_ALL);
> > >> >>>
> > >> >>> +    /*
> > >> >>> +     * Wa_16011777198:dg2: GuC resets render as part of the Wa. This
> > >> >>> causes
> > >> >>> +     * OA to lose the configuration state. Prevent this by overriding
> > >> >>> GUCRC
> > >> >>> +     * mode.
> > >> >>> +     */
> > >> >>> +    if (intel_guc_slpc_is_used(&gt->uc.guc) &&
> > >> >>> +        intel_uc_uses_guc_rc(&gt->uc) &&
> > >> >>
> > >> >> Is this condition above correct? E.g. what happens when:
> > >> >>
> > >> >> a. GuCRC is used but SLPC is not used?
> > >>
> > >> Currently, we have no way to separate out GuC RC and SLPC. Both are on when
> > >> guc submission is enabled/supported. So, the above check is kind of
> > >> redundant anyways.
> > >
> > > That is why I was suggesting changing the if check to an assert or
> > > drm_err. So looks like it will work with or without GuC RC, but if we are
> > > using GuC RC we should be using SLPC. So:
> > >
> > >	if (GuC_RC && !SLPC)
> > >		drm_err();
> >
> > I am little confused. What's the ask here? Should I just use one of these
> > conditions? i.e.
> >
> > if (intel_guc_slpc_is_used(&gt->uc.guc))
>
> I think let's just use intel_uc_uses_guc_rc and drop
> intel_guc_slpc_is_used. I am assuming if SLPC is somehow not on and we try
> to use SLPC set/unset param, some kind of error will be reported.

With the above this patch is also:

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>

>
>
> > ...
> >
> > Thanks,
> > Umesh
> >
> > >
> > >> Thanks,
> > >>
> > >> Vinay.
> > >>
> > >> >> b. GuCRC is not used. Don't we need to disable RC6 in host based RC6
> > >> >>   control?
> > >> >
> > >> > When using host based rc6, existing OA code is using forcewake and a
> > >> > reference to engine_pm to prevent rc6. Other questions, directing to
> > >> > @Vinay.
> > >> >
> > >> > Thanks,
> > >> > Umesh
> > >> >
> > >> >>
> > >> >> Do we need to worry about these cases?
> > >> >>
> > >> >> Or if we always expect both GuCRC and SLPC to be used on DG2 then I
> > >> >> think
> > >> >> let's get rid of these from the if condition and add a drm_err() if we
> > >> >> see
> > >> >> these not being used and OA is being enabled on DG2?
> > >> >>
> > >> >> Thanks.
> > >> >> --
> > >> >> Ashutosh
> > >> >>
> > >> >>> + (IS_DG2_GRAPHICS_STEP(gt->i915, G10, STEP_A0, STEP_C0) ||
> > >> >>> +         IS_DG2_GRAPHICS_STEP(gt->i915, G11, STEP_A0, STEP_B0))) {
> > >> >>> +        ret = intel_guc_slpc_override_gucrc_mode(&gt->uc.guc.slpc,
> > >> >>> +                             SLPC_GUCRC_MODE_GUCRC_NO_RC6);
> > >> >>> +        if (ret) {
> > >> >>> +            drm_dbg(&stream->perf->i915->drm,
> > >> >>> +                "Unable to override gucrc mode\n");
> > >> >>> +            goto err_config;
> > >> >>> +        }
> > >> >>> +    }
> > >> >>> +
> > >> >>>     ret = alloc_oa_buffer(stream);
> > >> >>>     if (ret)
> > >> >>>         goto err_oa_buf_alloc;
> > >> >>> --
> > >> >>> 2.25.1
> > >> >>>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Intel-gfx] [PATCH v2 04/15] drm/i915/perf: Determine gen12 oa ctx offset at runtime
  2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 04/15] drm/i915/perf: Determine gen12 oa ctx offset at runtime Umesh Nerlige Ramappa
@ 2022-09-27 23:24   ` Dixit, Ashutosh
  2022-09-30 21:42     ` Umesh Nerlige Ramappa
  0 siblings, 1 reply; 31+ messages in thread
From: Dixit, Ashutosh @ 2022-09-27 23:24 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Fri, 23 Sep 2022 13:11:43 -0700, Umesh Nerlige Ramappa wrote:
>

Hi Umesh,

> Some SKUs of same gen12 platform may have different oactxctrl
> offsets. For gen12, determine oactxctrl offsets at runtime.

So seems we are writing code to extract static information for products
just because it is not documented?

>
> v2: (Lionel)
> - Move MI definitions to intel_gpu_commands.h
> - Ensure __find_reg_in_lri does read past context image size
>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_gpu_commands.h |   4 +
>  drivers/gpu/drm/i915/i915_perf.c             | 146 +++++++++++++++----
>  drivers/gpu/drm/i915/i915_perf_oa_regs.h     |   2 +-
>  3 files changed, 121 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> index d4e9702d3c8e..f50ea92910d9 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> @@ -187,6 +187,10 @@
>  #define   MI_BATCH_RESOURCE_STREAMER REG_BIT(10)
>  #define   MI_BATCH_PREDICATE         REG_BIT(15) /* HSW+ on RCS only*/
>
> +#define MI_OPCODE(x)		(((x) >> 23) & 0x3f)
> +#define IS_MI_LRI_CMD(x)	(MI_OPCODE(x) == MI_OPCODE(MI_INSTR(0x22, 0)))
> +#define MI_LRI_LEN(x)		(((x) & 0xff) + 1)
> +
>  /*
>   * 3D instructions used by the kernel
>   */
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index cd57b5836386..fb705f24705b 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -1358,6 +1358,64 @@ static int gen12_get_render_context_id(struct i915_perf_stream *stream)
>	return 0;
>  }
>
> +#define __valid_oactxctrl_offset(x) ((x) && (x) != U32_MAX)
> +static bool __find_reg_in_lri(u32 *state, u32 reg, u32 *offset, u32 end)
> +{
> +	u32 idx = *offset;
> +	u32 len = min(MI_LRI_LEN(state[idx]) + idx, end);

The code below works only if MI_LRI_LEN() is an even number (because of
'idx += 2', which I think should always be the case but not sure if we it
makes sense to add an assert etc.

> +
> +	idx++;
> +	for (; idx < len; idx += 2)
> +		if (state[idx] == reg)
> +			break;
> +
> +	*offset = idx;
> +	return state[idx] == reg;

I think this can be a bug if 'idx > len' but 'state[idx] == reg'. So we
need to do something like:

static bool __find_reg_in_lri(u32 *state, u32 reg, u32 *offset, u32 end)
{
	u32 idx = *offset;
	u32 len = min(MI_LRI_LEN(state[idx]) + idx, end);
	bool found = false;

	idx++;
	for (; idx < len; idx += 2)
		if (state[idx] == reg)
			found = true;

	*offset = idx;
	return found;

> +}
> +
> +static u32 __context_image_offset(struct intel_context *ce, u32 reg)
> +{
> +	u32 offset, len = (ce->engine->context_size - PAGE_SIZE) / 4;
> +	u32 *state = ce->lrc_reg_state;
> +
> +	for (offset = 0; offset < len; ) {
> +		if (IS_MI_LRI_CMD(state[offset])) {
> +			if (__find_reg_in_lri(state, reg, &offset, len))
> +				break;
> +		} else {
> +			offset++;
> +		}
> +	}
> +
> +	return offset < len ? offset : U32_MAX;
> +}
> +
> +static int __set_oa_ctx_ctrl_offset(struct intel_context *ce)

I have seen people complain about unnecessary double underscores in front
of function names ;-)

> +{
> +	i915_reg_t reg = GEN12_OACTXCONTROL(ce->engine->mmio_base);
> +	struct i915_perf *perf = &ce->engine->i915->perf;
> +	u32 saved_offset = perf->ctx_oactxctrl_offset;
> +	u32 offset;
> +
> +	/* Do this only once. Failure is stored as offset of U32_MAX */
> +	if (saved_offset)
> +		return 0;

But if saved_offset is U32_MAX we should be returning -ENODEV?

> +
> +	offset = __context_image_offset(ce, i915_mmio_reg_offset(reg));
> +	perf->ctx_oactxctrl_offset = offset;
> +
> +	drm_dbg(&ce->engine->i915->drm,
> +		"%s oa ctx control at 0x%08x dword offset\n",
> +		ce->engine->name, offset);
> +
> +	return __valid_oactxctrl_offset(offset) ? 0 : -ENODEV;
> +}
> +
> +static bool engine_supports_mi_query(struct intel_engine_cs *engine)
> +{
> +	return engine->class == RENDER_CLASS;
> +}
> +
>  /**
>   * oa_get_render_ctx_id - determine and hold ctx hw id
>   * @stream: An i915-perf stream opened for OA metrics
> @@ -1377,6 +1435,17 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
>	if (IS_ERR(ce))
>		return PTR_ERR(ce);
>
> +	if (engine_supports_mi_query(stream->engine)) {
> +		ret = __set_oa_ctx_ctrl_offset(ce);
> +		if (ret && !(stream->sample_flags & SAMPLE_OA_REPORT)) {

This is not a problem in SAMPLE_OA_REPORT case?

> +			intel_context_unpin(ce);
> +			drm_err(&stream->perf->i915->drm,
> +				"Enabling perf query failed for %s\n",
> +				stream->engine->name);
> +			return ret;
> +		}
> +	}
> +
>	switch (GRAPHICS_VER(ce->engine->i915)) {
>	case 7: {
>		/*
> @@ -2408,10 +2477,11 @@ static int gen12_configure_oar_context(struct i915_perf_stream *stream,
>	int err;
>	struct intel_context *ce = stream->pinned_ctx;
>	u32 format = stream->oa_buffer.format;
> +	u32 offset = stream->perf->ctx_oactxctrl_offset;
>	struct flex regs_context[] = {
>		{
>			GEN8_OACTXCONTROL,
> -			stream->perf->ctx_oactxctrl_offset + 1,
> +			offset + 1,
>			active ? GEN8_OA_COUNTER_RESUME : 0,
>		},
>	};
> @@ -2436,15 +2506,18 @@ static int gen12_configure_oar_context(struct i915_perf_stream *stream,
>		},
>	};
>
> -	/* Modify the context image of pinned context with regs_context*/
> -	err = intel_context_lock_pinned(ce);
> -	if (err)
> -		return err;
> +	/* Modify the context image of pinned context with regs_context */
> +	if (__valid_oactxctrl_offset(offset)) {

This check is not needed, if we didn't have valid a offset we should return
error from oa_get_render_ctx_id.

> +		err = intel_context_lock_pinned(ce);
> +		if (err)
> +			return err;
>
> -	err = gen8_modify_context(ce, regs_context, ARRAY_SIZE(regs_context));
> -	intel_context_unlock_pinned(ce);
> -	if (err)
> -		return err;
> +		err = gen8_modify_context(ce, regs_context,
> +					  ARRAY_SIZE(regs_context));
> +		intel_context_unlock_pinned(ce);
> +		if (err)
> +			return err;
> +	}

>
>	/* Apply regs_lri using LRI with pinned context */
>	return gen8_modify_self(ce, regs_lri, ARRAY_SIZE(regs_lri), active);
> @@ -2566,6 +2639,7 @@ lrc_configure_all_contexts(struct i915_perf_stream *stream,
>			   const struct i915_oa_config *oa_config,
>			   struct i915_active *active)
>  {
> +	u32 ctx_oactxctrl = stream->perf->ctx_oactxctrl_offset;
>	/* The MMIO offsets for Flex EU registers aren't contiguous */
>	const u32 ctx_flexeu0 = stream->perf->ctx_flexeu0_offset;
>  #define ctx_flexeuN(N) (ctx_flexeu0 + 2 * (N) + 1)
> @@ -2576,7 +2650,7 @@ lrc_configure_all_contexts(struct i915_perf_stream *stream,
>		},
>		{
>			GEN8_OACTXCONTROL,
> -			stream->perf->ctx_oactxctrl_offset + 1,
> +			ctx_oactxctrl + 1,
>		},
>		{ EU_PERF_CNTL0, ctx_flexeuN(0) },
>		{ EU_PERF_CNTL1, ctx_flexeuN(1) },
> @@ -4545,6 +4619,37 @@ static void oa_init_supported_formats(struct i915_perf *perf)
>	}
>  }
>
> +static void i915_perf_init_info(struct drm_i915_private *i915)
> +{
> +	struct i915_perf *perf = &i915->perf;
> +
> +	switch (GRAPHICS_VER(i915)) {
> +	case 8:
> +		perf->ctx_oactxctrl_offset = 0x120;
> +		perf->ctx_flexeu0_offset = 0x2ce;
> +		perf->gen8_valid_ctx_bit = BIT(25);
> +		break;
> +	case 9:
> +		perf->ctx_oactxctrl_offset = 0x128;
> +		perf->ctx_flexeu0_offset = 0x3de;
> +		perf->gen8_valid_ctx_bit = BIT(16);
> +		break;
> +	case 11:
> +		perf->ctx_oactxctrl_offset = 0x124;
> +		perf->ctx_flexeu0_offset = 0x78e;
> +		perf->gen8_valid_ctx_bit = BIT(16);
> +		break;
> +	case 12:
> +		/*
> +		 * Calculate offset at runtime in oa_pin_context for gen12 and
> +		 * cache the value in perf->ctx_oactxctrl_offset.
> +		 */

What about ctx_flexeu0_offset and gen8_valid_ctx_bit for Gen12+?

Thanks.
--
Ashutosh

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Intel-gfx] [PATCH v2 04/15] drm/i915/perf: Determine gen12 oa ctx offset at runtime
  2022-09-27 23:24   ` Dixit, Ashutosh
@ 2022-09-30 21:42     ` Umesh Nerlige Ramappa
  2022-09-30 23:09       ` Dixit, Ashutosh
  0 siblings, 1 reply; 31+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-09-30 21:42 UTC (permalink / raw)
  To: Dixit, Ashutosh; +Cc: intel-gfx

On Tue, Sep 27, 2022 at 04:24:01PM -0700, Dixit, Ashutosh wrote:
>On Fri, 23 Sep 2022 13:11:43 -0700, Umesh Nerlige Ramappa wrote:
>>
>
>Hi Umesh,
>
>> Some SKUs of same gen12 platform may have different oactxctrl
>> offsets. For gen12, determine oactxctrl offsets at runtime.
>
>So seems we are writing code to extract static information for products
>just because it is not documented?

Documented but never accurate. The only accurate values are obtained 
with real hardware. In addition DG2 has different offsets for different 
variants, so this patch.

>
>>
>> v2: (Lionel)
>> - Move MI definitions to intel_gpu_commands.h
>> - Ensure __find_reg_in_lri does read past context image size
>>
>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>> ---
>>  drivers/gpu/drm/i915/gt/intel_gpu_commands.h |   4 +
>>  drivers/gpu/drm/i915/i915_perf.c             | 146 +++++++++++++++----
>>  drivers/gpu/drm/i915/i915_perf_oa_regs.h     |   2 +-
>>  3 files changed, 121 insertions(+), 31 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
>> index d4e9702d3c8e..f50ea92910d9 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
>> +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
>> @@ -187,6 +187,10 @@
>>  #define   MI_BATCH_RESOURCE_STREAMER REG_BIT(10)
>>  #define   MI_BATCH_PREDICATE         REG_BIT(15) /* HSW+ on RCS only*/
>>
>> +#define MI_OPCODE(x)		(((x) >> 23) & 0x3f)
>> +#define IS_MI_LRI_CMD(x)	(MI_OPCODE(x) == MI_OPCODE(MI_INSTR(0x22, 0)))
>> +#define MI_LRI_LEN(x)		(((x) & 0xff) + 1)
>> +
>>  /*
>>   * 3D instructions used by the kernel
>>   */
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>> index cd57b5836386..fb705f24705b 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -1358,6 +1358,64 @@ static int gen12_get_render_context_id(struct i915_perf_stream *stream)
>>	return 0;
>>  }
>>
>> +#define __valid_oactxctrl_offset(x) ((x) && (x) != U32_MAX)
>> +static bool __find_reg_in_lri(u32 *state, u32 reg, u32 *offset, u32 end)
>> +{
>> +	u32 idx = *offset;
>> +	u32 len = min(MI_LRI_LEN(state[idx]) + idx, end);
>
>The code below works only if MI_LRI_LEN() is an even number (because of
>'idx += 2', which I think should always be the case but not sure if we it
>makes sense to add an assert etc.
>
>> +
>> +	idx++;
>> +	for (; idx < len; idx += 2)
>> +		if (state[idx] == reg)
>> +			break;
>> +
>> +	*offset = idx;
>> +	return state[idx] == reg;
>
>I think this can be a bug if 'idx > len' but 'state[idx] == reg'. So we
>need to do something like:
>
>static bool __find_reg_in_lri(u32 *state, u32 reg, u32 *offset, u32 end)
>{
>	u32 idx = *offset;
>	u32 len = min(MI_LRI_LEN(state[idx]) + idx, end);
>	bool found = false;
>
>	idx++;
>	for (; idx < len; idx += 2)
>		if (state[idx] == reg)
>			found = true;
>
>	*offset = idx;
>	return found;
>

Agree, will add found.

>> +}
>> +
>> +static u32 __context_image_offset(struct intel_context *ce, u32 reg)
>> +{
>> +	u32 offset, len = (ce->engine->context_size - PAGE_SIZE) / 4;
>> +	u32 *state = ce->lrc_reg_state;
>> +
>> +	for (offset = 0; offset < len; ) {
>> +		if (IS_MI_LRI_CMD(state[offset])) {
>> +			if (__find_reg_in_lri(state, reg, &offset, len))
>> +				break;
>> +		} else {
>> +			offset++;
>> +		}
>> +	}
>> +
>> +	return offset < len ? offset : U32_MAX;
>> +}
>> +
>> +static int __set_oa_ctx_ctrl_offset(struct intel_context *ce)
>
>I have seen people complain about unnecessary double underscores in front
>of function names ;-)

will remove/change to oa_*.

>
>> +{
>> +	i915_reg_t reg = GEN12_OACTXCONTROL(ce->engine->mmio_base);
>> +	struct i915_perf *perf = &ce->engine->i915->perf;
>> +	u32 saved_offset = perf->ctx_oactxctrl_offset;
>> +	u32 offset;
>> +
>> +	/* Do this only once. Failure is stored as offset of U32_MAX */
>> +	if (saved_offset)
>> +		return 0;
>
>But if saved_offset is U32_MAX we should be returning -ENODEV?

correct, the above if block should be:

if (__valid_oactxctrl_offset(offset))
	return 0;

if (saved_offset == U32_MAX)
	return -ENODEV;

>
>> +
>> +	offset = __context_image_offset(ce, i915_mmio_reg_offset(reg));
>> +	perf->ctx_oactxctrl_offset = offset;
>> +
>> +	drm_dbg(&ce->engine->i915->drm,
>> +		"%s oa ctx control at 0x%08x dword offset\n",
>> +		ce->engine->name, offset);
>> +
>> +	return __valid_oactxctrl_offset(offset) ? 0 : -ENODEV;
>> +}
>> +
>> +static bool engine_supports_mi_query(struct intel_engine_cs *engine)
>> +{
>> +	return engine->class == RENDER_CLASS;
>> +}
>> +
>>  /**
>>   * oa_get_render_ctx_id - determine and hold ctx hw id
>>   * @stream: An i915-perf stream opened for OA metrics
>> @@ -1377,6 +1435,17 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
>>	if (IS_ERR(ce))
>>		return PTR_ERR(ce);
>>
>> +	if (engine_supports_mi_query(stream->engine)) {
>> +		ret = __set_oa_ctx_ctrl_offset(ce);
>> +		if (ret && !(stream->sample_flags & SAMPLE_OA_REPORT)) {
>
>This is not a problem in SAMPLE_OA_REPORT case?

SAMPLE_OA_REPORT is OAG use case.

Actually, I did not know how to treat this condition. The current 
interface will configure both OAR and OAG. If we have an error 
configuring OAR, should we fail or let the OAG use case work?

I am now leaning towards failing this unconditionally. Thoughts?

>
>> +			intel_context_unpin(ce);
>> +			drm_err(&stream->perf->i915->drm,
>> +				"Enabling perf query failed for %s\n",
>> +				stream->engine->name);
>> +			return ret;
>> +		}
>> +	}
>> +
>>	switch (GRAPHICS_VER(ce->engine->i915)) {
>>	case 7: {
>>		/*
>> @@ -2408,10 +2477,11 @@ static int gen12_configure_oar_context(struct i915_perf_stream *stream,
>>	int err;
>>	struct intel_context *ce = stream->pinned_ctx;
>>	u32 format = stream->oa_buffer.format;
>> +	u32 offset = stream->perf->ctx_oactxctrl_offset;
>>	struct flex regs_context[] = {
>>		{
>>			GEN8_OACTXCONTROL,
>> -			stream->perf->ctx_oactxctrl_offset + 1,
>> +			offset + 1,
>>			active ? GEN8_OA_COUNTER_RESUME : 0,
>>		},
>>	};
>> @@ -2436,15 +2506,18 @@ static int gen12_configure_oar_context(struct i915_perf_stream *stream,
>>		},
>>	};
>>
>> -	/* Modify the context image of pinned context with regs_context*/
>> -	err = intel_context_lock_pinned(ce);
>> -	if (err)
>> -		return err;
>> +	/* Modify the context image of pinned context with regs_context */
>> +	if (__valid_oactxctrl_offset(offset)) {
>
>This check is not needed, if we didn't have valid a offset we should return
>error from oa_get_render_ctx_id.

Hmm, I guess so.

>
>> +		err = intel_context_lock_pinned(ce);
>> +		if (err)
>> +			return err;
>>
>> -	err = gen8_modify_context(ce, regs_context, ARRAY_SIZE(regs_context));
>> -	intel_context_unlock_pinned(ce);
>> -	if (err)
>> -		return err;
>> +		err = gen8_modify_context(ce, regs_context,
>> +					  ARRAY_SIZE(regs_context));
>> +		intel_context_unlock_pinned(ce);
>> +		if (err)
>> +			return err;
>> +	}
>
>>
>>	/* Apply regs_lri using LRI with pinned context */
>>	return gen8_modify_self(ce, regs_lri, ARRAY_SIZE(regs_lri), active);
>> @@ -2566,6 +2639,7 @@ lrc_configure_all_contexts(struct i915_perf_stream *stream,
>>			   const struct i915_oa_config *oa_config,
>>			   struct i915_active *active)
>>  {
>> +	u32 ctx_oactxctrl = stream->perf->ctx_oactxctrl_offset;
>>	/* The MMIO offsets for Flex EU registers aren't contiguous */
>>	const u32 ctx_flexeu0 = stream->perf->ctx_flexeu0_offset;
>>  #define ctx_flexeuN(N) (ctx_flexeu0 + 2 * (N) + 1)
>> @@ -2576,7 +2650,7 @@ lrc_configure_all_contexts(struct i915_perf_stream *stream,
>>		},
>>		{
>>			GEN8_OACTXCONTROL,
>> -			stream->perf->ctx_oactxctrl_offset + 1,
>> +			ctx_oactxctrl + 1,
>>		},
>>		{ EU_PERF_CNTL0, ctx_flexeuN(0) },
>>		{ EU_PERF_CNTL1, ctx_flexeuN(1) },
>> @@ -4545,6 +4619,37 @@ static void oa_init_supported_formats(struct i915_perf *perf)
>>	}
>>  }
>>
>> +static void i915_perf_init_info(struct drm_i915_private *i915)
>> +{
>> +	struct i915_perf *perf = &i915->perf;
>> +
>> +	switch (GRAPHICS_VER(i915)) {
>> +	case 8:
>> +		perf->ctx_oactxctrl_offset = 0x120;
>> +		perf->ctx_flexeu0_offset = 0x2ce;
>> +		perf->gen8_valid_ctx_bit = BIT(25);
>> +		break;
>> +	case 9:
>> +		perf->ctx_oactxctrl_offset = 0x128;
>> +		perf->ctx_flexeu0_offset = 0x3de;
>> +		perf->gen8_valid_ctx_bit = BIT(16);
>> +		break;
>> +	case 11:
>> +		perf->ctx_oactxctrl_offset = 0x124;
>> +		perf->ctx_flexeu0_offset = 0x78e;
>> +		perf->gen8_valid_ctx_bit = BIT(16);
>> +		break;
>> +	case 12:
>> +		/*
>> +		 * Calculate offset at runtime in oa_pin_context for gen12 and
>> +		 * cache the value in perf->ctx_oactxctrl_offset.
>> +		 */
>
>What about ctx_flexeu0_offset and gen8_valid_ctx_bit for Gen12+?

For gen12 ctx_flexeu0_offset are no longer part of the context image, so 
not present.

gen8_valid_ctx_bit is also not defined for gen12.

Thanks,
Umesh

>
>Thanks.
>--
>Ashutosh

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Intel-gfx] [PATCH v2 04/15] drm/i915/perf: Determine gen12 oa ctx offset at runtime
  2022-09-30 21:42     ` Umesh Nerlige Ramappa
@ 2022-09-30 23:09       ` Dixit, Ashutosh
  2022-10-08  1:06         ` Umesh Nerlige Ramappa
  0 siblings, 1 reply; 31+ messages in thread
From: Dixit, Ashutosh @ 2022-09-30 23:09 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Fri, 30 Sep 2022 14:42:07 -0700, Umesh Nerlige Ramappa wrote:
>
> >> +static int __set_oa_ctx_ctrl_offset(struct intel_context *ce)
> >
> > I have seen people complain about unnecessary double underscores in front
> > of function names ;-)
>
> will remove/change to oa_*.
>
> >
> >> +{
> >> +	i915_reg_t reg = GEN12_OACTXCONTROL(ce->engine->mmio_base);
> >> +	struct i915_perf *perf = &ce->engine->i915->perf;
> >> +	u32 saved_offset = perf->ctx_oactxctrl_offset;
> >> +	u32 offset;
> >> +
> >> +	/* Do this only once. Failure is stored as offset of U32_MAX */
> >> +	if (saved_offset)
> >> +		return 0;
> >
> > But if saved_offset is U32_MAX we should be returning -ENODEV?
>
> correct, the above if block should be:
>
> if (__valid_oactxctrl_offset(offset))
>	return 0;
>
> if (saved_offset == U32_MAX)
>	return -ENODEV;

I would just do:

	u32 offset = perf->ctx_oactxctrl_offset;

	if (offset)
		goto exit;

	...
exit:
	return __valid_oactxctrl_offset(offset) ? 0 : -ENODEV;
}

>
> >
> >> +
> >> +	offset = __context_image_offset(ce, i915_mmio_reg_offset(reg));
> >> +	perf->ctx_oactxctrl_offset = offset;
> >> +
> >> +	drm_dbg(&ce->engine->i915->drm,
> >> +		"%s oa ctx control at 0x%08x dword offset\n",
> >> +		ce->engine->name, offset);
> >> +
> >> +	return __valid_oactxctrl_offset(offset) ? 0 : -ENODEV;
> >> +}
> >> +
> >> +static bool engine_supports_mi_query(struct intel_engine_cs *engine)
> >> +{
> >> +	return engine->class == RENDER_CLASS;
> >> +}
> >> +
> >>  /**
> >>   * oa_get_render_ctx_id - determine and hold ctx hw id
> >>   * @stream: An i915-perf stream opened for OA metrics
> >> @@ -1377,6 +1435,17 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
> >>	if (IS_ERR(ce))
> >>		return PTR_ERR(ce);
> >>
> >> +	if (engine_supports_mi_query(stream->engine)) {
> >> +		ret = __set_oa_ctx_ctrl_offset(ce);
> >> +		if (ret && !(stream->sample_flags & SAMPLE_OA_REPORT)) {
> >
> > This is not a problem in SAMPLE_OA_REPORT case?
>
> SAMPLE_OA_REPORT is OAG use case.
>
> Actually, I did not know how to treat this condition. The current interface
> will configure both OAR and OAG. If we have an error configuring OAR,
> should we fail or let the OAG use case work?
>
> I am now leaning towards failing this unconditionally. Thoughts?

Sorry I didn't follow. What does the oa_ctx_ctrl_offset in the context
image do or control? Looks like oa_ctx_ctrl register controls the OA HW
timer which dumps data into the OA buffer so should be programmed correctly
for OAG (and possibly also for OAR). So maybe letting this fail
unconditionally is correct? But I am not sure.

Thanks.
--
Ashutosh

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Intel-gfx] [PATCH v2 04/15] drm/i915/perf: Determine gen12 oa ctx offset at runtime
  2022-09-30 23:09       ` Dixit, Ashutosh
@ 2022-10-08  1:06         ` Umesh Nerlige Ramappa
  0 siblings, 0 replies; 31+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-10-08  1:06 UTC (permalink / raw)
  To: Dixit, Ashutosh; +Cc: intel-gfx

On Fri, Sep 30, 2022 at 04:09:36PM -0700, Dixit, Ashutosh wrote:
>On Fri, 30 Sep 2022 14:42:07 -0700, Umesh Nerlige Ramappa wrote:
>>
>> >> +static int __set_oa_ctx_ctrl_offset(struct intel_context *ce)
>> >
>> > I have seen people complain about unnecessary double underscores in front
>> > of function names ;-)
>>
>> will remove/change to oa_*.
>>
>> >
>> >> +{
>> >> +	i915_reg_t reg = GEN12_OACTXCONTROL(ce->engine->mmio_base);
>> >> +	struct i915_perf *perf = &ce->engine->i915->perf;
>> >> +	u32 saved_offset = perf->ctx_oactxctrl_offset;
>> >> +	u32 offset;
>> >> +
>> >> +	/* Do this only once. Failure is stored as offset of U32_MAX */
>> >> +	if (saved_offset)
>> >> +		return 0;
>> >
>> > But if saved_offset is U32_MAX we should be returning -ENODEV?
>>
>> correct, the above if block should be:
>>
>> if (__valid_oactxctrl_offset(offset))
>>	return 0;
>>
>> if (saved_offset == U32_MAX)
>>	return -ENODEV;
>
>I would just do:
>
>	u32 offset = perf->ctx_oactxctrl_offset;
>
>	if (offset)
>		goto exit;
>
>	...
>exit:
>	return __valid_oactxctrl_offset(offset) ? 0 : -ENODEV;
>}
>
>>
>> >
>> >> +
>> >> +	offset = __context_image_offset(ce, i915_mmio_reg_offset(reg));
>> >> +	perf->ctx_oactxctrl_offset = offset;
>> >> +
>> >> +	drm_dbg(&ce->engine->i915->drm,
>> >> +		"%s oa ctx control at 0x%08x dword offset\n",
>> >> +		ce->engine->name, offset);
>> >> +
>> >> +	return __valid_oactxctrl_offset(offset) ? 0 : -ENODEV;
>> >> +}
>> >> +
>> >> +static bool engine_supports_mi_query(struct intel_engine_cs *engine)
>> >> +{
>> >> +	return engine->class == RENDER_CLASS;
>> >> +}
>> >> +
>> >>  /**
>> >>   * oa_get_render_ctx_id - determine and hold ctx hw id
>> >>   * @stream: An i915-perf stream opened for OA metrics
>> >> @@ -1377,6 +1435,17 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
>> >>	if (IS_ERR(ce))
>> >>		return PTR_ERR(ce);
>> >>
>> >> +	if (engine_supports_mi_query(stream->engine)) {
>> >> +		ret = __set_oa_ctx_ctrl_offset(ce);
>> >> +		if (ret && !(stream->sample_flags & SAMPLE_OA_REPORT)) {
>> >
>> > This is not a problem in SAMPLE_OA_REPORT case?
>>
>> SAMPLE_OA_REPORT is OAG use case.
>>
>> Actually, I did not know how to treat this condition. The current interface
>> will configure both OAR and OAG. If we have an error configuring OAR,
>> should we fail or let the OAG use case work?
>>
>> I am now leaning towards failing this unconditionally. Thoughts?
>
>Sorry I didn't follow. What does the oa_ctx_ctrl_offset in the context
>image do or control? Looks like oa_ctx_ctrl register controls the OA HW
>timer which dumps data into the OA buffer so should be programmed correctly
>for OAG (and possibly also for OAR).

It controls TIMER for OAG. I think this is a bug. I will post a separate 
patch for this since there is some inconsistency in how OAR vs OAG is 
handled in this code. In short:

- we need to either enable OAG always OR disable all reports into OAG 
   based on SAMPLE_OA/stream->periodic config
- oa_ctx_ctrl_offset should accordingly program the timer and 
   periodicity bits here.

>So maybe letting this fail
>unconditionally is correct? But I am not sure.

Agree on returning failure if ret is error.

Thanks,
Umesh

>
>Thanks.
>--
>Ashutosh

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2022-10-08  1:07 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-23 20:11 [Intel-gfx] [PATCH v2 00/15] Add DG2 OA support Umesh Nerlige Ramappa
2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 01/15] drm/i915/perf: Fix OA filtering logic for GuC mode Umesh Nerlige Ramappa
2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 02/15] drm/i915/perf: Add OAG and OAR formats for DG2 Umesh Nerlige Ramappa
2022-09-24  4:08   ` Dixit, Ashutosh
2022-09-26 18:11     ` Umesh Nerlige Ramappa
2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 03/15] drm/i915/perf: Fix noa wait predication " Umesh Nerlige Ramappa
2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 04/15] drm/i915/perf: Determine gen12 oa ctx offset at runtime Umesh Nerlige Ramappa
2022-09-27 23:24   ` Dixit, Ashutosh
2022-09-30 21:42     ` Umesh Nerlige Ramappa
2022-09-30 23:09       ` Dixit, Ashutosh
2022-10-08  1:06         ` Umesh Nerlige Ramappa
2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 05/15] drm/i915/perf: Enable commands per clock reporting in OA Umesh Nerlige Ramappa
2022-09-26 15:55   ` Dixit, Ashutosh
2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 06/15] drm/i915/perf: Simply use stream->ctx Umesh Nerlige Ramappa
2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 07/15] drm/i915/perf: Move gt-specific data from i915->perf to gt->perf Umesh Nerlige Ramappa
2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 08/15] drm/i915/perf: Replace gt->perf.lock with stream->lock for file ops Umesh Nerlige Ramappa
2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 09/15] drm/i915/perf: Use gt-specific ggtt for OA and noa-wait buffers Umesh Nerlige Ramappa
2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 10/15] drm/i915/perf: Store a pointer to oa_format in oa_buffer Umesh Nerlige Ramappa
2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 11/15] drm/i915/perf: Add Wa_1508761755:dg2 Umesh Nerlige Ramappa
2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 12/15] drm/i915/perf: Apply Wa_18013179988 Umesh Nerlige Ramappa
2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 13/15] drm/i915/perf: Save/restore EU flex counters across reset Umesh Nerlige Ramappa
2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 14/15] drm/i915/guc: Support OA when Wa_16011777198 is enabled Umesh Nerlige Ramappa
2022-09-26 15:56   ` Dixit, Ashutosh
2022-09-26 18:19     ` Umesh Nerlige Ramappa
2022-09-26 21:17       ` Belgaumkar, Vinay
2022-09-26 23:28         ` Dixit, Ashutosh
2022-09-27 16:11           ` Umesh Nerlige Ramappa
2022-09-27 17:34             ` Dixit, Ashutosh
2022-09-27 17:51               ` Dixit, Ashutosh
2022-09-23 20:11 ` [Intel-gfx] [PATCH v2 15/15] drm/i915/perf: Enable OA for DG2 Umesh Nerlige Ramappa
2022-09-23 21:44 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Add DG2 OA support (rev3) Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.