All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/8] Enable triggered perf query for Xe_HP
@ 2021-08-30 19:38 ` Umesh Nerlige Ramappa
  0 siblings, 0 replies; 26+ messages in thread
From: Umesh Nerlige Ramappa @ 2021-08-30 19:38 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
  Cc: dri-devel, daniel.vetter, Joonas Lahtinen, jason

This is a revival of the patch series to support triggered perf query reports
from here - https://patchwork.freedesktop.org/series/83831/

The patches were not pushed earlier because corresponding UMD changes were
missing. This revival addresses UMD changes in GPUvis for this series. GPUvis
uses the perf library in igt-gpu-tools. Changes to the library are here -
https://patchwork.freedesktop.org/series/93355/

GPUvis changes will be posted as a PR once the above library and kernel changes
are pushed.

Summary of the feature:

Current platforms provide MI_REPORT_PERF_COUNT to query a snapshot of perf
counters from a batch. This mechanism does not have consistent support on all
engines for newer platforms. To support perf query, all new platforms use a
mechanism to trigger OA report snapshots into the OA buffer by writing to a HW
register from a batch. To lookup this report in the OA buffer quickly, the OA
buffer is mmapped into the user space.

This series implements the new query mechanism.

v2: Fix BAT failure (Umesh)
v3: Fix selftest (Umesh)
v4: Update uapi comment (Umesh)

Test-with: 20210830193337.15260-1-umesh.nerlige.ramappa@intel.com
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

Chris Wilson (3):
  drm/i915/gt: Refactor _wa_add to reuse wa_index and wa_list_grow
  drm/i915/gt: Check for conflicting RING_NONPRIV
  drm/i915/gt: Enable dynamic adjustment of RING_NONPRIV

Piotr Maciejewski (1):
  drm/i915/perf: Ensure observation logic is not clock gated

Umesh Nerlige Ramappa (4):
  drm/i915/gt: Lock intel_engine_apply_whitelist with uncore->lock
  drm/i915/perf: Whitelist OA report trigger registers
  drm/i915/perf: Whitelist OA counter and buffer registers
  drm/i915/perf: Map OA buffer to user space for gen12 performance query

 drivers/gpu/drm/i915/gem/i915_gem_mman.c      |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_mman.h      |   2 +
 drivers/gpu/drm/i915/gt/intel_workarounds.c   | 269 +++++++++++++-----
 drivers/gpu/drm/i915/gt/intel_workarounds.h   |   7 +
 .../gpu/drm/i915/gt/selftest_workarounds.c    | 267 +++++++++++++++++
 drivers/gpu/drm/i915/i915_perf.c              | 228 ++++++++++++++-
 drivers/gpu/drm/i915/i915_perf_types.h        |   8 +
 drivers/gpu/drm/i915/i915_reg.h               |  30 +-
 include/uapi/drm/i915_drm.h                   |  33 +++
 9 files changed, 758 insertions(+), 88 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Intel-gfx] [PATCH 0/8] Enable triggered perf query for Xe_HP
@ 2021-08-30 19:38 ` Umesh Nerlige Ramappa
  0 siblings, 0 replies; 26+ messages in thread
From: Umesh Nerlige Ramappa @ 2021-08-30 19:38 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
  Cc: dri-devel, daniel.vetter, Joonas Lahtinen, jason

This is a revival of the patch series to support triggered perf query reports
from here - https://patchwork.freedesktop.org/series/83831/

The patches were not pushed earlier because corresponding UMD changes were
missing. This revival addresses UMD changes in GPUvis for this series. GPUvis
uses the perf library in igt-gpu-tools. Changes to the library are here -
https://patchwork.freedesktop.org/series/93355/

GPUvis changes will be posted as a PR once the above library and kernel changes
are pushed.

Summary of the feature:

Current platforms provide MI_REPORT_PERF_COUNT to query a snapshot of perf
counters from a batch. This mechanism does not have consistent support on all
engines for newer platforms. To support perf query, all new platforms use a
mechanism to trigger OA report snapshots into the OA buffer by writing to a HW
register from a batch. To lookup this report in the OA buffer quickly, the OA
buffer is mmapped into the user space.

This series implements the new query mechanism.

v2: Fix BAT failure (Umesh)
v3: Fix selftest (Umesh)
v4: Update uapi comment (Umesh)

Test-with: 20210830193337.15260-1-umesh.nerlige.ramappa@intel.com
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

Chris Wilson (3):
  drm/i915/gt: Refactor _wa_add to reuse wa_index and wa_list_grow
  drm/i915/gt: Check for conflicting RING_NONPRIV
  drm/i915/gt: Enable dynamic adjustment of RING_NONPRIV

Piotr Maciejewski (1):
  drm/i915/perf: Ensure observation logic is not clock gated

Umesh Nerlige Ramappa (4):
  drm/i915/gt: Lock intel_engine_apply_whitelist with uncore->lock
  drm/i915/perf: Whitelist OA report trigger registers
  drm/i915/perf: Whitelist OA counter and buffer registers
  drm/i915/perf: Map OA buffer to user space for gen12 performance query

 drivers/gpu/drm/i915/gem/i915_gem_mman.c      |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_mman.h      |   2 +
 drivers/gpu/drm/i915/gt/intel_workarounds.c   | 269 +++++++++++++-----
 drivers/gpu/drm/i915/gt/intel_workarounds.h   |   7 +
 .../gpu/drm/i915/gt/selftest_workarounds.c    | 267 +++++++++++++++++
 drivers/gpu/drm/i915/i915_perf.c              | 228 ++++++++++++++-
 drivers/gpu/drm/i915/i915_perf_types.h        |   8 +
 drivers/gpu/drm/i915/i915_reg.h               |  30 +-
 include/uapi/drm/i915_drm.h                   |  33 +++
 9 files changed, 758 insertions(+), 88 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/8] drm/i915/gt: Lock intel_engine_apply_whitelist with uncore->lock
  2021-08-30 19:38 ` [Intel-gfx] " Umesh Nerlige Ramappa
@ 2021-08-30 19:38   ` Umesh Nerlige Ramappa
  -1 siblings, 0 replies; 26+ messages in thread
From: Umesh Nerlige Ramappa @ 2021-08-30 19:38 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
  Cc: dri-devel, daniel.vetter, Joonas Lahtinen, jason

Refactor intel_engine_apply_whitelist into locked and unlocked versions
so that a caller who already has the lock can apply whitelist.

v2: Fix sparse warning
v3: (Chris)
- Drop prefix and suffix for static function
- Use longest to shortest line ordering for variable declaration

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_workarounds.c | 46 ++++++++++++++-------
 1 file changed, 32 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index 94e1937f8d29..2a8cc0e2d1b1 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -1248,7 +1248,8 @@ void intel_gt_init_workarounds(struct drm_i915_private *i915)
 }
 
 static enum forcewake_domains
-wal_get_fw_for_rmw(struct intel_uncore *uncore, const struct i915_wa_list *wal)
+wal_get_fw(struct intel_uncore *uncore, const struct i915_wa_list *wal,
+	   unsigned int op)
 {
 	enum forcewake_domains fw = 0;
 	struct i915_wa *wa;
@@ -1257,8 +1258,7 @@ wal_get_fw_for_rmw(struct intel_uncore *uncore, const struct i915_wa_list *wal)
 	for (i = 0, wa = wal->list; i < wal->count; i++, wa++)
 		fw |= intel_uncore_forcewake_for_reg(uncore,
 						     wa->reg,
-						     FW_REG_READ |
-						     FW_REG_WRITE);
+						     op);
 
 	return fw;
 }
@@ -1289,7 +1289,7 @@ wa_list_apply(struct intel_gt *gt, const struct i915_wa_list *wal)
 	if (!wal->count)
 		return;
 
-	fw = wal_get_fw_for_rmw(uncore, wal);
+	fw = wal_get_fw(uncore, wal, FW_REG_READ | FW_REG_WRITE);
 
 	spin_lock_irqsave(&uncore->lock, flags);
 	intel_uncore_forcewake_get__locked(uncore, fw);
@@ -1328,7 +1328,7 @@ static bool wa_list_verify(struct intel_gt *gt,
 	unsigned int i;
 	bool ok = true;
 
-	fw = wal_get_fw_for_rmw(uncore, wal);
+	fw = wal_get_fw(uncore, wal, FW_REG_READ | FW_REG_WRITE);
 
 	spin_lock_irqsave(&uncore->lock, flags);
 	intel_uncore_forcewake_get__locked(uncore, fw);
@@ -1617,27 +1617,45 @@ void intel_engine_init_whitelist(struct intel_engine_cs *engine)
 	wa_init_finish(w);
 }
 
-void intel_engine_apply_whitelist(struct intel_engine_cs *engine)
+static void __engine_apply_whitelist(struct intel_engine_cs *engine)
 {
 	const struct i915_wa_list *wal = &engine->whitelist;
 	struct intel_uncore *uncore = engine->uncore;
 	const u32 base = engine->mmio_base;
+	enum forcewake_domains fw;
 	struct i915_wa *wa;
 	unsigned int i;
 
-	if (!wal->count)
-		return;
+	lockdep_assert_held(&uncore->lock);
+
+	fw = wal_get_fw(uncore, wal, FW_REG_WRITE);
+	intel_uncore_forcewake_get__locked(uncore, fw);
 
 	for (i = 0, wa = wal->list; i < wal->count; i++, wa++)
-		intel_uncore_write(uncore,
-				   RING_FORCE_TO_NONPRIV(base, i),
-				   i915_mmio_reg_offset(wa->reg));
+		intel_uncore_write_fw(uncore,
+				      RING_FORCE_TO_NONPRIV(base, i),
+				      i915_mmio_reg_offset(wa->reg));
 
 	/* And clear the rest just in case of garbage */
 	for (; i < RING_MAX_NONPRIV_SLOTS; i++)
-		intel_uncore_write(uncore,
-				   RING_FORCE_TO_NONPRIV(base, i),
-				   i915_mmio_reg_offset(RING_NOPID(base)));
+		intel_uncore_write_fw(uncore,
+				      RING_FORCE_TO_NONPRIV(base, i),
+				      i915_mmio_reg_offset(RING_NOPID(base)));
+
+	intel_uncore_forcewake_put__locked(uncore, fw);
+}
+
+void intel_engine_apply_whitelist(struct intel_engine_cs *engine)
+{
+	const struct i915_wa_list *wal = &engine->whitelist;
+	unsigned long flags;
+
+	if (!wal->count)
+		return;
+
+	spin_lock_irqsave(&engine->uncore->lock, flags);
+	__engine_apply_whitelist(engine);
+	spin_unlock_irqrestore(&engine->uncore->lock, flags);
 }
 
 static void
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Intel-gfx] [PATCH 1/8] drm/i915/gt: Lock intel_engine_apply_whitelist with uncore->lock
@ 2021-08-30 19:38   ` Umesh Nerlige Ramappa
  0 siblings, 0 replies; 26+ messages in thread
From: Umesh Nerlige Ramappa @ 2021-08-30 19:38 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
  Cc: dri-devel, daniel.vetter, Joonas Lahtinen, jason

Refactor intel_engine_apply_whitelist into locked and unlocked versions
so that a caller who already has the lock can apply whitelist.

v2: Fix sparse warning
v3: (Chris)
- Drop prefix and suffix for static function
- Use longest to shortest line ordering for variable declaration

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_workarounds.c | 46 ++++++++++++++-------
 1 file changed, 32 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index 94e1937f8d29..2a8cc0e2d1b1 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -1248,7 +1248,8 @@ void intel_gt_init_workarounds(struct drm_i915_private *i915)
 }
 
 static enum forcewake_domains
-wal_get_fw_for_rmw(struct intel_uncore *uncore, const struct i915_wa_list *wal)
+wal_get_fw(struct intel_uncore *uncore, const struct i915_wa_list *wal,
+	   unsigned int op)
 {
 	enum forcewake_domains fw = 0;
 	struct i915_wa *wa;
@@ -1257,8 +1258,7 @@ wal_get_fw_for_rmw(struct intel_uncore *uncore, const struct i915_wa_list *wal)
 	for (i = 0, wa = wal->list; i < wal->count; i++, wa++)
 		fw |= intel_uncore_forcewake_for_reg(uncore,
 						     wa->reg,
-						     FW_REG_READ |
-						     FW_REG_WRITE);
+						     op);
 
 	return fw;
 }
@@ -1289,7 +1289,7 @@ wa_list_apply(struct intel_gt *gt, const struct i915_wa_list *wal)
 	if (!wal->count)
 		return;
 
-	fw = wal_get_fw_for_rmw(uncore, wal);
+	fw = wal_get_fw(uncore, wal, FW_REG_READ | FW_REG_WRITE);
 
 	spin_lock_irqsave(&uncore->lock, flags);
 	intel_uncore_forcewake_get__locked(uncore, fw);
@@ -1328,7 +1328,7 @@ static bool wa_list_verify(struct intel_gt *gt,
 	unsigned int i;
 	bool ok = true;
 
-	fw = wal_get_fw_for_rmw(uncore, wal);
+	fw = wal_get_fw(uncore, wal, FW_REG_READ | FW_REG_WRITE);
 
 	spin_lock_irqsave(&uncore->lock, flags);
 	intel_uncore_forcewake_get__locked(uncore, fw);
@@ -1617,27 +1617,45 @@ void intel_engine_init_whitelist(struct intel_engine_cs *engine)
 	wa_init_finish(w);
 }
 
-void intel_engine_apply_whitelist(struct intel_engine_cs *engine)
+static void __engine_apply_whitelist(struct intel_engine_cs *engine)
 {
 	const struct i915_wa_list *wal = &engine->whitelist;
 	struct intel_uncore *uncore = engine->uncore;
 	const u32 base = engine->mmio_base;
+	enum forcewake_domains fw;
 	struct i915_wa *wa;
 	unsigned int i;
 
-	if (!wal->count)
-		return;
+	lockdep_assert_held(&uncore->lock);
+
+	fw = wal_get_fw(uncore, wal, FW_REG_WRITE);
+	intel_uncore_forcewake_get__locked(uncore, fw);
 
 	for (i = 0, wa = wal->list; i < wal->count; i++, wa++)
-		intel_uncore_write(uncore,
-				   RING_FORCE_TO_NONPRIV(base, i),
-				   i915_mmio_reg_offset(wa->reg));
+		intel_uncore_write_fw(uncore,
+				      RING_FORCE_TO_NONPRIV(base, i),
+				      i915_mmio_reg_offset(wa->reg));
 
 	/* And clear the rest just in case of garbage */
 	for (; i < RING_MAX_NONPRIV_SLOTS; i++)
-		intel_uncore_write(uncore,
-				   RING_FORCE_TO_NONPRIV(base, i),
-				   i915_mmio_reg_offset(RING_NOPID(base)));
+		intel_uncore_write_fw(uncore,
+				      RING_FORCE_TO_NONPRIV(base, i),
+				      i915_mmio_reg_offset(RING_NOPID(base)));
+
+	intel_uncore_forcewake_put__locked(uncore, fw);
+}
+
+void intel_engine_apply_whitelist(struct intel_engine_cs *engine)
+{
+	const struct i915_wa_list *wal = &engine->whitelist;
+	unsigned long flags;
+
+	if (!wal->count)
+		return;
+
+	spin_lock_irqsave(&engine->uncore->lock, flags);
+	__engine_apply_whitelist(engine);
+	spin_unlock_irqrestore(&engine->uncore->lock, flags);
 }
 
 static void
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/8] drm/i915/gt: Refactor _wa_add to reuse wa_index and wa_list_grow
  2021-08-30 19:38 ` [Intel-gfx] " Umesh Nerlige Ramappa
@ 2021-08-30 19:38   ` Umesh Nerlige Ramappa
  -1 siblings, 0 replies; 26+ messages in thread
From: Umesh Nerlige Ramappa @ 2021-08-30 19:38 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
  Cc: dri-devel, daniel.vetter, Joonas Lahtinen, jason

From: Chris Wilson <chris.p.wilson@intel.com>

Switch the search and grow code of the _wa_add to use _wa_index and
_wa_list_grow.

Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_workarounds.c | 124 +++++++++++---------
 1 file changed, 71 insertions(+), 53 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index 2a8cc0e2d1b1..6928f250cafe 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -60,20 +60,19 @@ static void wa_init_start(struct i915_wa_list *wal, const char *name, const char
 
 #define WA_LIST_CHUNK (1 << 4)
 
-static void wa_init_finish(struct i915_wa_list *wal)
+static void wa_trim(struct i915_wa_list *wal, gfp_t gfp)
 {
+	struct i915_wa *list;
+
 	/* Trim unused entries. */
-	if (!IS_ALIGNED(wal->count, WA_LIST_CHUNK)) {
-		struct i915_wa *list = kmemdup(wal->list,
-					       wal->count * sizeof(*list),
-					       GFP_KERNEL);
-
-		if (list) {
-			kfree(wal->list);
-			wal->list = list;
-		}
-	}
+	list = krealloc(wal->list, wal->count * sizeof(*list), gfp);
+	if (list)
+		wal->list = list;
+}
 
+static void wa_init_finish(struct i915_wa_list *wal)
+{
+	wa_trim(wal, GFP_KERNEL);
 	if (!wal->count)
 		return;
 
@@ -81,57 +80,60 @@ static void wa_init_finish(struct i915_wa_list *wal)
 			 wal->wa_count, wal->name, wal->engine_name);
 }
 
-static void _wa_add(struct i915_wa_list *wal, const struct i915_wa *wa)
+static int wa_index(struct i915_wa_list *wal, i915_reg_t reg)
 {
-	unsigned int addr = i915_mmio_reg_offset(wa->reg);
-	unsigned int start = 0, end = wal->count;
-	const unsigned int grow = WA_LIST_CHUNK;
-	struct i915_wa *wa_;
+	unsigned int addr = i915_mmio_reg_offset(reg);
+	int start = 0, end = wal->count;
 
-	GEM_BUG_ON(!is_power_of_2(grow));
+	/* addr and wal->list[].reg, both include the R/W flags */
+	while (start < end) {
+		unsigned int mid = start + (end - start) / 2;
 
-	if (IS_ALIGNED(wal->count, grow)) { /* Either uninitialized or full. */
-		struct i915_wa *list;
+		if (i915_mmio_reg_offset(wal->list[mid].reg) < addr)
+			start = mid + 1;
+		else if (i915_mmio_reg_offset(wal->list[mid].reg) > addr)
+			end = mid;
+		else
+			return mid;
+	}
 
-		list = kmalloc_array(ALIGN(wal->count + 1, grow), sizeof(*wa),
-				     GFP_KERNEL);
-		if (!list) {
-			DRM_ERROR("No space for workaround init!\n");
-			return;
-		}
+	return -ENOENT;
+}
 
-		if (wal->list) {
-			memcpy(list, wal->list, sizeof(*wa) * wal->count);
-			kfree(wal->list);
-		}
+static int wa_list_grow(struct i915_wa_list *wal, size_t count, gfp_t gfp)
+{
+	struct i915_wa *list;
 
-		wal->list = list;
-	}
+	list = krealloc(wal->list, count * sizeof(*list), gfp);
+	if (!list)
+		return -ENOMEM;
 
-	while (start < end) {
-		unsigned int mid = start + (end - start) / 2;
+	wal->list = list;
+	return 0;
+}
 
-		if (i915_mmio_reg_offset(wal->list[mid].reg) < addr) {
-			start = mid + 1;
-		} else if (i915_mmio_reg_offset(wal->list[mid].reg) > addr) {
-			end = mid;
-		} else {
-			wa_ = &wal->list[mid];
-
-			if ((wa->clr | wa_->clr) && !(wa->clr & ~wa_->clr)) {
-				DRM_ERROR("Discarding overwritten w/a for reg %04x (clear: %08x, set: %08x)\n",
-					  i915_mmio_reg_offset(wa_->reg),
-					  wa_->clr, wa_->set);
-
-				wa_->set &= ~wa->clr;
-			}
-
-			wal->wa_count++;
-			wa_->set |= wa->set;
-			wa_->clr |= wa->clr;
-			wa_->read |= wa->read;
-			return;
+static void __wa_add(struct i915_wa_list *wal, const struct i915_wa *wa)
+{
+	struct i915_wa *wa_;
+	int index;
+
+	index = wa_index(wal, wa->reg);
+	if (index >= 0) {
+		wa_ = &wal->list[index];
+
+		if ((wa->clr | wa_->clr) && !(wa->clr & ~wa_->clr)) {
+			DRM_ERROR("Discarding overwritten w/a for reg %04x (clear: %08x, set: %08x)\n",
+				  i915_mmio_reg_offset(wa_->reg),
+				  wa_->clr, wa_->set);
+
+			wa_->set &= ~wa->clr;
 		}
+
+		wal->wa_count++;
+		wa_->set |= wa->set;
+		wa_->clr |= wa->clr;
+		wa_->read |= wa->read;
+		return;
 	}
 
 	wal->wa_count++;
@@ -149,6 +151,22 @@ static void _wa_add(struct i915_wa_list *wal, const struct i915_wa *wa)
 	}
 }
 
+static void _wa_add(struct i915_wa_list *wal, const struct i915_wa *wa)
+{
+	const unsigned int grow = WA_LIST_CHUNK;
+
+	GEM_BUG_ON(!is_power_of_2(grow));
+
+	if (IS_ALIGNED(wal->count, grow) && /* Either uninitialized or full. */
+	    wa_list_grow(wal, ALIGN(wal->count + 1, grow), GFP_KERNEL)) {
+		DRM_ERROR("Unable to store w/a for reg %04x\n",
+			  i915_mmio_reg_offset(wa->reg));
+		return;
+	}
+
+	__wa_add(wal, wa);
+}
+
 static void wa_add(struct i915_wa_list *wal, i915_reg_t reg,
 		   u32 clear, u32 set, u32 read_mask, bool masked_reg)
 {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Intel-gfx] [PATCH 2/8] drm/i915/gt: Refactor _wa_add to reuse wa_index and wa_list_grow
@ 2021-08-30 19:38   ` Umesh Nerlige Ramappa
  0 siblings, 0 replies; 26+ messages in thread
From: Umesh Nerlige Ramappa @ 2021-08-30 19:38 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
  Cc: dri-devel, daniel.vetter, Joonas Lahtinen, jason

From: Chris Wilson <chris.p.wilson@intel.com>

Switch the search and grow code of the _wa_add to use _wa_index and
_wa_list_grow.

Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_workarounds.c | 124 +++++++++++---------
 1 file changed, 71 insertions(+), 53 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index 2a8cc0e2d1b1..6928f250cafe 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -60,20 +60,19 @@ static void wa_init_start(struct i915_wa_list *wal, const char *name, const char
 
 #define WA_LIST_CHUNK (1 << 4)
 
-static void wa_init_finish(struct i915_wa_list *wal)
+static void wa_trim(struct i915_wa_list *wal, gfp_t gfp)
 {
+	struct i915_wa *list;
+
 	/* Trim unused entries. */
-	if (!IS_ALIGNED(wal->count, WA_LIST_CHUNK)) {
-		struct i915_wa *list = kmemdup(wal->list,
-					       wal->count * sizeof(*list),
-					       GFP_KERNEL);
-
-		if (list) {
-			kfree(wal->list);
-			wal->list = list;
-		}
-	}
+	list = krealloc(wal->list, wal->count * sizeof(*list), gfp);
+	if (list)
+		wal->list = list;
+}
 
+static void wa_init_finish(struct i915_wa_list *wal)
+{
+	wa_trim(wal, GFP_KERNEL);
 	if (!wal->count)
 		return;
 
@@ -81,57 +80,60 @@ static void wa_init_finish(struct i915_wa_list *wal)
 			 wal->wa_count, wal->name, wal->engine_name);
 }
 
-static void _wa_add(struct i915_wa_list *wal, const struct i915_wa *wa)
+static int wa_index(struct i915_wa_list *wal, i915_reg_t reg)
 {
-	unsigned int addr = i915_mmio_reg_offset(wa->reg);
-	unsigned int start = 0, end = wal->count;
-	const unsigned int grow = WA_LIST_CHUNK;
-	struct i915_wa *wa_;
+	unsigned int addr = i915_mmio_reg_offset(reg);
+	int start = 0, end = wal->count;
 
-	GEM_BUG_ON(!is_power_of_2(grow));
+	/* addr and wal->list[].reg, both include the R/W flags */
+	while (start < end) {
+		unsigned int mid = start + (end - start) / 2;
 
-	if (IS_ALIGNED(wal->count, grow)) { /* Either uninitialized or full. */
-		struct i915_wa *list;
+		if (i915_mmio_reg_offset(wal->list[mid].reg) < addr)
+			start = mid + 1;
+		else if (i915_mmio_reg_offset(wal->list[mid].reg) > addr)
+			end = mid;
+		else
+			return mid;
+	}
 
-		list = kmalloc_array(ALIGN(wal->count + 1, grow), sizeof(*wa),
-				     GFP_KERNEL);
-		if (!list) {
-			DRM_ERROR("No space for workaround init!\n");
-			return;
-		}
+	return -ENOENT;
+}
 
-		if (wal->list) {
-			memcpy(list, wal->list, sizeof(*wa) * wal->count);
-			kfree(wal->list);
-		}
+static int wa_list_grow(struct i915_wa_list *wal, size_t count, gfp_t gfp)
+{
+	struct i915_wa *list;
 
-		wal->list = list;
-	}
+	list = krealloc(wal->list, count * sizeof(*list), gfp);
+	if (!list)
+		return -ENOMEM;
 
-	while (start < end) {
-		unsigned int mid = start + (end - start) / 2;
+	wal->list = list;
+	return 0;
+}
 
-		if (i915_mmio_reg_offset(wal->list[mid].reg) < addr) {
-			start = mid + 1;
-		} else if (i915_mmio_reg_offset(wal->list[mid].reg) > addr) {
-			end = mid;
-		} else {
-			wa_ = &wal->list[mid];
-
-			if ((wa->clr | wa_->clr) && !(wa->clr & ~wa_->clr)) {
-				DRM_ERROR("Discarding overwritten w/a for reg %04x (clear: %08x, set: %08x)\n",
-					  i915_mmio_reg_offset(wa_->reg),
-					  wa_->clr, wa_->set);
-
-				wa_->set &= ~wa->clr;
-			}
-
-			wal->wa_count++;
-			wa_->set |= wa->set;
-			wa_->clr |= wa->clr;
-			wa_->read |= wa->read;
-			return;
+static void __wa_add(struct i915_wa_list *wal, const struct i915_wa *wa)
+{
+	struct i915_wa *wa_;
+	int index;
+
+	index = wa_index(wal, wa->reg);
+	if (index >= 0) {
+		wa_ = &wal->list[index];
+
+		if ((wa->clr | wa_->clr) && !(wa->clr & ~wa_->clr)) {
+			DRM_ERROR("Discarding overwritten w/a for reg %04x (clear: %08x, set: %08x)\n",
+				  i915_mmio_reg_offset(wa_->reg),
+				  wa_->clr, wa_->set);
+
+			wa_->set &= ~wa->clr;
 		}
+
+		wal->wa_count++;
+		wa_->set |= wa->set;
+		wa_->clr |= wa->clr;
+		wa_->read |= wa->read;
+		return;
 	}
 
 	wal->wa_count++;
@@ -149,6 +151,22 @@ static void _wa_add(struct i915_wa_list *wal, const struct i915_wa *wa)
 	}
 }
 
+static void _wa_add(struct i915_wa_list *wal, const struct i915_wa *wa)
+{
+	const unsigned int grow = WA_LIST_CHUNK;
+
+	GEM_BUG_ON(!is_power_of_2(grow));
+
+	if (IS_ALIGNED(wal->count, grow) && /* Either uninitialized or full. */
+	    wa_list_grow(wal, ALIGN(wal->count + 1, grow), GFP_KERNEL)) {
+		DRM_ERROR("Unable to store w/a for reg %04x\n",
+			  i915_mmio_reg_offset(wa->reg));
+		return;
+	}
+
+	__wa_add(wal, wa);
+}
+
 static void wa_add(struct i915_wa_list *wal, i915_reg_t reg,
 		   u32 clear, u32 set, u32 read_mask, bool masked_reg)
 {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 3/8] drm/i915/gt: Check for conflicting RING_NONPRIV
  2021-08-30 19:38 ` [Intel-gfx] " Umesh Nerlige Ramappa
@ 2021-08-30 19:38   ` Umesh Nerlige Ramappa
  -1 siblings, 0 replies; 26+ messages in thread
From: Umesh Nerlige Ramappa @ 2021-08-30 19:38 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
  Cc: dri-devel, daniel.vetter, Joonas Lahtinen, jason

From: Chris Wilson <chris@chris-wilson.co.uk>

Strip the encoded bits from the register offset so that we only use the
address for looking up the RING_NONPRIV entry.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_workarounds.c | 66 +++++++++++++--------
 1 file changed, 42 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index 6928f250cafe..df452a718200 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -80,18 +80,44 @@ static void wa_init_finish(struct i915_wa_list *wal)
 			 wal->wa_count, wal->name, wal->engine_name);
 }
 
+static u32 reg_offset(i915_reg_t reg)
+{
+	return i915_mmio_reg_offset(reg) & RING_FORCE_TO_NONPRIV_ADDRESS_MASK;
+}
+
+static u32 reg_flags(i915_reg_t reg)
+{
+	return i915_mmio_reg_offset(reg) & ~RING_FORCE_TO_NONPRIV_ADDRESS_MASK;
+}
+
+__maybe_unused
+static bool is_nonpriv_flags_valid(u32 flags)
+{
+	/* Check only valid flag bits are set */
+	if (flags & ~RING_FORCE_TO_NONPRIV_MASK_VALID)
+		return false;
+
+	/* NB: Only 3 out of 4 enum values are valid for access field */
+	if ((flags & RING_FORCE_TO_NONPRIV_ACCESS_MASK) ==
+	    RING_FORCE_TO_NONPRIV_ACCESS_INVALID)
+		return false;
+
+	return true;
+}
+
 static int wa_index(struct i915_wa_list *wal, i915_reg_t reg)
 {
-	unsigned int addr = i915_mmio_reg_offset(reg);
 	int start = 0, end = wal->count;
+	u32 addr = reg_offset(reg);
 
 	/* addr and wal->list[].reg, both include the R/W flags */
 	while (start < end) {
 		unsigned int mid = start + (end - start) / 2;
+		u32 pos = reg_offset(wal->list[mid].reg);
 
-		if (i915_mmio_reg_offset(wal->list[mid].reg) < addr)
+		if (pos < addr)
 			start = mid + 1;
-		else if (i915_mmio_reg_offset(wal->list[mid].reg) > addr)
+		else if (pos > addr)
 			end = mid;
 		else
 			return mid;
@@ -117,13 +143,22 @@ static void __wa_add(struct i915_wa_list *wal, const struct i915_wa *wa)
 	struct i915_wa *wa_;
 	int index;
 
+	GEM_BUG_ON(!is_nonpriv_flags_valid(reg_flags(wa->reg)));
+
 	index = wa_index(wal, wa->reg);
 	if (index >= 0) {
 		wa_ = &wal->list[index];
 
+		if (i915_mmio_reg_offset(wa->reg) !=
+		    i915_mmio_reg_offset(wa_->reg)) {
+			DRM_ERROR("Discarding incompatible w/a for reg %04x\n",
+				  reg_offset(wa->reg));
+			return;
+		}
+
 		if ((wa->clr | wa_->clr) && !(wa->clr & ~wa_->clr)) {
 			DRM_ERROR("Discarding overwritten w/a for reg %04x (clear: %08x, set: %08x)\n",
-				  i915_mmio_reg_offset(wa_->reg),
+				  reg_offset(wa_->reg),
 				  wa_->clr, wa_->set);
 
 			wa_->set &= ~wa->clr;
@@ -141,10 +176,8 @@ static void __wa_add(struct i915_wa_list *wal, const struct i915_wa *wa)
 	*wa_ = *wa;
 
 	while (wa_-- > wal->list) {
-		GEM_BUG_ON(i915_mmio_reg_offset(wa_[0].reg) ==
-			   i915_mmio_reg_offset(wa_[1].reg));
-		if (i915_mmio_reg_offset(wa_[1].reg) >
-		    i915_mmio_reg_offset(wa_[0].reg))
+		GEM_BUG_ON(reg_offset(wa_[0].reg) == reg_offset(wa_[1].reg));
+		if (reg_offset(wa_[1].reg) > reg_offset(wa_[0].reg))
 			break;
 
 		swap(wa_[1], wa_[0]);
@@ -160,7 +193,7 @@ static void _wa_add(struct i915_wa_list *wal, const struct i915_wa *wa)
 	if (IS_ALIGNED(wal->count, grow) && /* Either uninitialized or full. */
 	    wa_list_grow(wal, ALIGN(wal->count + 1, grow), GFP_KERNEL)) {
 		DRM_ERROR("Unable to store w/a for reg %04x\n",
-			  i915_mmio_reg_offset(wa->reg));
+			  reg_offset(wa->reg));
 		return;
 	}
 
@@ -1367,21 +1400,6 @@ bool intel_gt_verify_workarounds(struct intel_gt *gt, const char *from)
 	return wa_list_verify(gt, &gt->i915->gt_wa_list, from);
 }
 
-__maybe_unused
-static bool is_nonpriv_flags_valid(u32 flags)
-{
-	/* Check only valid flag bits are set */
-	if (flags & ~RING_FORCE_TO_NONPRIV_MASK_VALID)
-		return false;
-
-	/* NB: Only 3 out of 4 enum values are valid for access field */
-	if ((flags & RING_FORCE_TO_NONPRIV_ACCESS_MASK) ==
-	    RING_FORCE_TO_NONPRIV_ACCESS_INVALID)
-		return false;
-
-	return true;
-}
-
 static void
 whitelist_reg_ext(struct i915_wa_list *wal, i915_reg_t reg, u32 flags)
 {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Intel-gfx] [PATCH 3/8] drm/i915/gt: Check for conflicting RING_NONPRIV
@ 2021-08-30 19:38   ` Umesh Nerlige Ramappa
  0 siblings, 0 replies; 26+ messages in thread
From: Umesh Nerlige Ramappa @ 2021-08-30 19:38 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
  Cc: dri-devel, daniel.vetter, Joonas Lahtinen, jason

From: Chris Wilson <chris@chris-wilson.co.uk>

Strip the encoded bits from the register offset so that we only use the
address for looking up the RING_NONPRIV entry.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_workarounds.c | 66 +++++++++++++--------
 1 file changed, 42 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index 6928f250cafe..df452a718200 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -80,18 +80,44 @@ static void wa_init_finish(struct i915_wa_list *wal)
 			 wal->wa_count, wal->name, wal->engine_name);
 }
 
+static u32 reg_offset(i915_reg_t reg)
+{
+	return i915_mmio_reg_offset(reg) & RING_FORCE_TO_NONPRIV_ADDRESS_MASK;
+}
+
+static u32 reg_flags(i915_reg_t reg)
+{
+	return i915_mmio_reg_offset(reg) & ~RING_FORCE_TO_NONPRIV_ADDRESS_MASK;
+}
+
+__maybe_unused
+static bool is_nonpriv_flags_valid(u32 flags)
+{
+	/* Check only valid flag bits are set */
+	if (flags & ~RING_FORCE_TO_NONPRIV_MASK_VALID)
+		return false;
+
+	/* NB: Only 3 out of 4 enum values are valid for access field */
+	if ((flags & RING_FORCE_TO_NONPRIV_ACCESS_MASK) ==
+	    RING_FORCE_TO_NONPRIV_ACCESS_INVALID)
+		return false;
+
+	return true;
+}
+
 static int wa_index(struct i915_wa_list *wal, i915_reg_t reg)
 {
-	unsigned int addr = i915_mmio_reg_offset(reg);
 	int start = 0, end = wal->count;
+	u32 addr = reg_offset(reg);
 
 	/* addr and wal->list[].reg, both include the R/W flags */
 	while (start < end) {
 		unsigned int mid = start + (end - start) / 2;
+		u32 pos = reg_offset(wal->list[mid].reg);
 
-		if (i915_mmio_reg_offset(wal->list[mid].reg) < addr)
+		if (pos < addr)
 			start = mid + 1;
-		else if (i915_mmio_reg_offset(wal->list[mid].reg) > addr)
+		else if (pos > addr)
 			end = mid;
 		else
 			return mid;
@@ -117,13 +143,22 @@ static void __wa_add(struct i915_wa_list *wal, const struct i915_wa *wa)
 	struct i915_wa *wa_;
 	int index;
 
+	GEM_BUG_ON(!is_nonpriv_flags_valid(reg_flags(wa->reg)));
+
 	index = wa_index(wal, wa->reg);
 	if (index >= 0) {
 		wa_ = &wal->list[index];
 
+		if (i915_mmio_reg_offset(wa->reg) !=
+		    i915_mmio_reg_offset(wa_->reg)) {
+			DRM_ERROR("Discarding incompatible w/a for reg %04x\n",
+				  reg_offset(wa->reg));
+			return;
+		}
+
 		if ((wa->clr | wa_->clr) && !(wa->clr & ~wa_->clr)) {
 			DRM_ERROR("Discarding overwritten w/a for reg %04x (clear: %08x, set: %08x)\n",
-				  i915_mmio_reg_offset(wa_->reg),
+				  reg_offset(wa_->reg),
 				  wa_->clr, wa_->set);
 
 			wa_->set &= ~wa->clr;
@@ -141,10 +176,8 @@ static void __wa_add(struct i915_wa_list *wal, const struct i915_wa *wa)
 	*wa_ = *wa;
 
 	while (wa_-- > wal->list) {
-		GEM_BUG_ON(i915_mmio_reg_offset(wa_[0].reg) ==
-			   i915_mmio_reg_offset(wa_[1].reg));
-		if (i915_mmio_reg_offset(wa_[1].reg) >
-		    i915_mmio_reg_offset(wa_[0].reg))
+		GEM_BUG_ON(reg_offset(wa_[0].reg) == reg_offset(wa_[1].reg));
+		if (reg_offset(wa_[1].reg) > reg_offset(wa_[0].reg))
 			break;
 
 		swap(wa_[1], wa_[0]);
@@ -160,7 +193,7 @@ static void _wa_add(struct i915_wa_list *wal, const struct i915_wa *wa)
 	if (IS_ALIGNED(wal->count, grow) && /* Either uninitialized or full. */
 	    wa_list_grow(wal, ALIGN(wal->count + 1, grow), GFP_KERNEL)) {
 		DRM_ERROR("Unable to store w/a for reg %04x\n",
-			  i915_mmio_reg_offset(wa->reg));
+			  reg_offset(wa->reg));
 		return;
 	}
 
@@ -1367,21 +1400,6 @@ bool intel_gt_verify_workarounds(struct intel_gt *gt, const char *from)
 	return wa_list_verify(gt, &gt->i915->gt_wa_list, from);
 }
 
-__maybe_unused
-static bool is_nonpriv_flags_valid(u32 flags)
-{
-	/* Check only valid flag bits are set */
-	if (flags & ~RING_FORCE_TO_NONPRIV_MASK_VALID)
-		return false;
-
-	/* NB: Only 3 out of 4 enum values are valid for access field */
-	if ((flags & RING_FORCE_TO_NONPRIV_ACCESS_MASK) ==
-	    RING_FORCE_TO_NONPRIV_ACCESS_INVALID)
-		return false;
-
-	return true;
-}
-
 static void
 whitelist_reg_ext(struct i915_wa_list *wal, i915_reg_t reg, u32 flags)
 {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 4/8] drm/i915/gt: Enable dynamic adjustment of RING_NONPRIV
  2021-08-30 19:38 ` [Intel-gfx] " Umesh Nerlige Ramappa
@ 2021-08-30 19:38   ` Umesh Nerlige Ramappa
  -1 siblings, 0 replies; 26+ messages in thread
From: Umesh Nerlige Ramappa @ 2021-08-30 19:38 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
  Cc: dri-devel, daniel.vetter, Joonas Lahtinen, jason

From: Chris Wilson <chris@chris-wilson.co.uk>

The OA subsystem would like to enable its privileged clients access to
the OA registers from execbuf. This requires temporarily removing the
HW validation from those registers for the duration of the OA client,
for which we need to allow OA to dynamically adjust the set of
RING_NONPRIV.

Care must still be taken since the RING_NONPRIV are global, so any and
all contexts that run at the same time as the OA client, will also be
able to adjust the registers from their execbuf.

v2: Fix memmove size (Umesh)
v3: Update selftest (Umesh)
- Use ppgtt for results
- Use ww locking
- Prevent rc6. Whitelist configuration is saved/restored on rc6, so
  applying whitelist configuration with rc6 enabled leads to a race
  where the pwr ctx restored configuration conflicts with the most
  recently applied config in the selftest.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_workarounds.c   |  59 ++++
 drivers/gpu/drm/i915/gt/intel_workarounds.h   |   7 +
 .../gpu/drm/i915/gt/selftest_workarounds.c    | 267 ++++++++++++++++++
 3 files changed, 333 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index df452a718200..c1ec09162e66 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -200,6 +200,18 @@ static void _wa_add(struct i915_wa_list *wal, const struct i915_wa *wa)
 	__wa_add(wal, wa);
 }
 
+static void _wa_del(struct i915_wa_list *wal, i915_reg_t reg)
+{
+	struct i915_wa *wa = wal->list;
+	int index;
+
+	index = wa_index(wal, reg);
+	if (GEM_DEBUG_WARN_ON(index < 0))
+		return;
+
+	memmove(wa + index, wa + index + 1, (--wal->count - index) * sizeof(*wa));
+}
+
 static void wa_add(struct i915_wa_list *wal, i915_reg_t reg,
 		   u32 clear, u32 set, u32 read_mask, bool masked_reg)
 {
@@ -2152,6 +2164,53 @@ void intel_engine_init_workarounds(struct intel_engine_cs *engine)
 	wa_init_finish(wal);
 }
 
+int intel_engine_allow_user_register_access(struct intel_engine_cs *engine,
+					    const i915_reg_t *reg,
+					    unsigned int count)
+{
+	struct i915_wa_list *wal = &engine->whitelist;
+	unsigned long flags;
+	int err;
+
+	if (GEM_DEBUG_WARN_ON(wal->count + count >= RING_MAX_NONPRIV_SLOTS))
+		return -ENOSPC;
+
+	spin_lock_irqsave(&engine->uncore->lock, flags);
+
+	err = wa_list_grow(wal, wal->count + count, GFP_ATOMIC | __GFP_NOWARN);
+	if (err)
+		goto out;
+
+	while (count--) {
+		struct i915_wa wa = { .reg = *reg++ };
+
+		__wa_add(wal, &wa);
+	}
+
+	__engine_apply_whitelist(engine);
+
+out:
+	spin_unlock_irqrestore(&engine->uncore->lock, flags);
+	return err;
+}
+
+void intel_engine_deny_user_register_access(struct intel_engine_cs *engine,
+					    const i915_reg_t *reg,
+					    unsigned int count)
+{
+	struct i915_wa_list *wal = &engine->whitelist;
+	unsigned long flags;
+
+	spin_lock_irqsave(&engine->uncore->lock, flags);
+
+	while (count--)
+		_wa_del(wal, *reg++);
+
+	__engine_apply_whitelist(engine);
+
+	spin_unlock_irqrestore(&engine->uncore->lock, flags);
+}
+
 void intel_engine_apply_workarounds(struct intel_engine_cs *engine)
 {
 	wa_list_apply(engine->gt, &engine->wa_list);
diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.h b/drivers/gpu/drm/i915/gt/intel_workarounds.h
index 15abb68b6c00..3c50390e3a7f 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.h
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.h
@@ -36,4 +36,11 @@ void intel_engine_apply_workarounds(struct intel_engine_cs *engine);
 int intel_engine_verify_workarounds(struct intel_engine_cs *engine,
 				    const char *from);
 
+int intel_engine_allow_user_register_access(struct intel_engine_cs *engine,
+					    const i915_reg_t *reg,
+					    unsigned int count);
+void intel_engine_deny_user_register_access(struct intel_engine_cs *engine,
+					    const i915_reg_t *reg,
+					    unsigned int count);
+
 #endif
diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
index e623ac45f4aa..ce91fad9075f 100644
--- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
@@ -1177,6 +1177,272 @@ static int live_isolated_whitelist(void *arg)
 	return err;
 }
 
+static int rmw_reg(struct intel_engine_cs *engine, const i915_reg_t reg)
+{
+	const u32 values[] = {
+		0x00000000,
+		0x01010101,
+		0x10100101,
+		0x03030303,
+		0x30300303,
+		0x05050505,
+		0x50500505,
+		0x0f0f0f0f,
+		0xf00ff00f,
+		0x10101010,
+		0xf0f01010,
+		0x30303030,
+		0xa0a03030,
+		0x50505050,
+		0xc0c05050,
+		0xf0f0f0f0,
+		0x11111111,
+		0x33333333,
+		0x55555555,
+		0x0000ffff,
+		0x00ff00ff,
+		0xff0000ff,
+		0xffff00ff,
+		0xffffffff,
+	};
+	struct i915_vma *vma, *batch;
+	struct i915_gem_ww_ctx ww;
+	struct intel_context *ce;
+	struct i915_request *rq;
+	u32 srm, lrm, idx;
+	u32 *cs, *results;
+	u64 addr;
+	int err;
+	int sz;
+	int v;
+
+	ce = intel_context_create(engine);
+	if (IS_ERR(ce))
+		return PTR_ERR(ce);
+
+	sz = (2 * ARRAY_SIZE(values) + 1) * sizeof(u32);
+	vma = __vm_create_scratch_for_read_pinned(ce->vm, sz);
+	if (IS_ERR(vma)) {
+		err = PTR_ERR(vma);
+		goto out_context;
+	}
+
+	batch = create_batch(ce->vm);
+	if (IS_ERR(batch)) {
+		err = PTR_ERR(batch);
+		goto out_vma;
+	}
+
+	srm = MI_STORE_REGISTER_MEM;
+	lrm = MI_LOAD_REGISTER_MEM;
+	if (GRAPHICS_VER(ce->vm->i915) >= 8)
+		lrm++, srm++;
+
+	addr = vma->node.start;
+
+	i915_gem_ww_ctx_init(&ww, false);
+retry:
+	err = i915_gem_object_lock(vma->obj, &ww);
+	if (!err)
+		err = i915_gem_object_lock(batch->obj, &ww);
+	if (!err)
+		err = intel_context_pin_ww(ce, &ww);
+	if (err)
+		goto out_ww;
+
+	cs = i915_gem_object_pin_map(batch->obj, I915_MAP_WC);
+	if (IS_ERR(cs)) {
+		err = PTR_ERR(cs);
+		goto out_ctx;
+	}
+
+	results = i915_gem_object_pin_map(vma->obj, I915_MAP_WB);
+	if (IS_ERR(results)) {
+		err = PTR_ERR(results);
+		goto out_unpin_batch;
+	}
+
+	/* SRM original */
+	*cs++ = srm;
+	*cs++ = i915_mmio_reg_offset(reg);
+	*cs++ = lower_32_bits(addr);
+	*cs++ = upper_32_bits(addr);
+
+	idx = 1;
+	for (v = 0; v < ARRAY_SIZE(values); v++) {
+		/* LRI garbage */
+		*cs++ = MI_LOAD_REGISTER_IMM(1);
+		*cs++ = i915_mmio_reg_offset(reg);
+		*cs++ = values[v];
+
+		/* SRM result */
+		*cs++ = srm;
+		*cs++ = i915_mmio_reg_offset(reg);
+		*cs++ = lower_32_bits(addr + sizeof(u32) * idx);
+		*cs++ = upper_32_bits(addr + sizeof(u32) * idx);
+		idx++;
+	}
+	for (v = 0; v < ARRAY_SIZE(values); v++) {
+		/* LRI garbage */
+		*cs++ = MI_LOAD_REGISTER_IMM(1);
+		*cs++ = i915_mmio_reg_offset(reg);
+		*cs++ = ~values[v];
+
+		/* SRM result */
+		*cs++ = srm;
+		*cs++ = i915_mmio_reg_offset(reg);
+		*cs++ = lower_32_bits(addr + sizeof(u32) * idx);
+		*cs++ = upper_32_bits(addr + sizeof(u32) * idx);
+		idx++;
+	}
+
+	/* LRM original -- don't leave garbage in the context! */
+	*cs++ = lrm;
+	*cs++ = i915_mmio_reg_offset(reg);
+	*cs++ = lower_32_bits(addr);
+	*cs++ = upper_32_bits(addr);
+
+	*cs++ = MI_BATCH_BUFFER_END;
+
+	i915_gem_object_flush_map(batch->obj);
+	intel_gt_chipset_flush(engine->gt);
+
+	rq = i915_request_create(ce);
+	if (IS_ERR(rq)) {
+		err = PTR_ERR(rq);
+		goto out_unpin_vma;
+	}
+
+	if (engine->emit_init_breadcrumb) { /* Be nice if we hang */
+		err = engine->emit_init_breadcrumb(rq);
+		if (err)
+			goto err_request;
+	}
+
+	err = i915_request_await_object(rq, batch->obj, false);
+	if (err == 0)
+		err = i915_vma_move_to_active(batch, rq, 0);
+	if (err)
+		goto err_request;
+
+	err = i915_request_await_object(rq, vma->obj, true);
+	if (err == 0)
+		err = i915_vma_move_to_active(vma, rq, EXEC_OBJECT_WRITE);
+	if (err)
+		goto err_request;
+
+	err = engine->emit_bb_start(rq, batch->node.start, PAGE_SIZE, 0);
+	if (err)
+		goto err_request;
+
+err_request:
+	err = request_add_sync(rq, err);
+	if (err) {
+		pr_err("%s: Futzing %04x timedout; cancelling test\n",
+		       engine->name, i915_mmio_reg_offset(reg));
+		intel_gt_set_wedged(engine->gt);
+		goto out_unpin_vma;
+	}
+
+	for (v = 0, idx = 0; v < 2 * ARRAY_SIZE(values); v++) {
+		if (results[++idx] != results[0]) {
+			err = idx;
+			break;
+		}
+	}
+
+out_unpin_vma:
+	i915_gem_object_unpin_map(vma->obj);
+out_unpin_batch:
+	i915_gem_object_unpin_map(batch->obj);
+out_ctx:
+	intel_context_unpin(ce);
+out_ww:
+	if (err == -EDEADLK) {
+		err = i915_gem_ww_ctx_backoff(&ww);
+		if (!err)
+			goto retry;
+	}
+	i915_gem_ww_ctx_fini(&ww);
+	i915_vma_unpin_and_release(&batch, 0);
+out_vma:
+	i915_vma_unpin_and_release(&vma, 0);
+out_context:
+	intel_context_put(ce);
+	return err;
+}
+
+static int live_dynamic_whitelist(void *arg)
+{
+	struct intel_gt *gt = arg;
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	int err = 0;
+
+	if (GRAPHICS_VER(gt->i915) < 8)
+		return 0;
+
+	for_each_engine(engine, gt, id) {
+		const i915_reg_t reg = RING_MAX_IDLE(engine->mmio_base);
+
+		intel_engine_pm_get(engine);
+		intel_uncore_forcewake_get(engine->uncore, FORCEWAKE_ALL);
+
+		err = rmw_reg(engine, reg);
+		if (err < 0)
+			break;
+
+		if (err) {
+			pr_err("%s: Able to write to protected reg:%04x!\n",
+			       engine->name, i915_mmio_reg_offset(reg));
+			err = -EINVAL;
+			break;
+		}
+
+		err = intel_engine_allow_user_register_access(engine, &reg, 1);
+		if (err)
+			break;
+
+		err = rmw_reg(engine, reg);
+		intel_engine_deny_user_register_access(engine, &reg, 1);
+		if (err < 0)
+			break;
+
+		if (!err) {
+			pr_err("%s: Unable to write to allowed reg:%04x!\n",
+			       engine->name, i915_mmio_reg_offset(reg));
+			err = -EINVAL;
+			break;
+		}
+
+		err = rmw_reg(engine, reg);
+		if (err < 0)
+			break;
+
+		if (err) {
+			pr_err("%s: Able to write to denied reg:%04x!\n",
+			       engine->name, i915_mmio_reg_offset(reg));
+			err = -EINVAL;
+			break;
+		}
+
+		intel_uncore_forcewake_put(engine->uncore, FORCEWAKE_ALL);
+		intel_engine_pm_put(engine);
+
+		err = 0;
+	}
+
+	if (err) {
+		intel_uncore_forcewake_put(engine->uncore, FORCEWAKE_ALL);
+		intel_engine_pm_put(engine);
+	}
+
+	if (igt_flush_test(gt->i915))
+		err = -EIO;
+
+	return err;
+}
+
 static bool
 verify_wa_lists(struct intel_gt *gt, struct wa_lists *lists,
 		const char *str)
@@ -1383,6 +1649,7 @@ int intel_workarounds_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_dirty_whitelist),
 		SUBTEST(live_reset_whitelist),
 		SUBTEST(live_isolated_whitelist),
+		SUBTEST(live_dynamic_whitelist),
 		SUBTEST(live_gpu_reset_workarounds),
 		SUBTEST(live_engine_reset_workarounds),
 	};
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Intel-gfx] [PATCH 4/8] drm/i915/gt: Enable dynamic adjustment of RING_NONPRIV
@ 2021-08-30 19:38   ` Umesh Nerlige Ramappa
  0 siblings, 0 replies; 26+ messages in thread
From: Umesh Nerlige Ramappa @ 2021-08-30 19:38 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
  Cc: dri-devel, daniel.vetter, Joonas Lahtinen, jason

From: Chris Wilson <chris@chris-wilson.co.uk>

The OA subsystem would like to enable its privileged clients access to
the OA registers from execbuf. This requires temporarily removing the
HW validation from those registers for the duration of the OA client,
for which we need to allow OA to dynamically adjust the set of
RING_NONPRIV.

Care must still be taken since the RING_NONPRIV are global, so any and
all contexts that run at the same time as the OA client, will also be
able to adjust the registers from their execbuf.

v2: Fix memmove size (Umesh)
v3: Update selftest (Umesh)
- Use ppgtt for results
- Use ww locking
- Prevent rc6. Whitelist configuration is saved/restored on rc6, so
  applying whitelist configuration with rc6 enabled leads to a race
  where the pwr ctx restored configuration conflicts with the most
  recently applied config in the selftest.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_workarounds.c   |  59 ++++
 drivers/gpu/drm/i915/gt/intel_workarounds.h   |   7 +
 .../gpu/drm/i915/gt/selftest_workarounds.c    | 267 ++++++++++++++++++
 3 files changed, 333 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index df452a718200..c1ec09162e66 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -200,6 +200,18 @@ static void _wa_add(struct i915_wa_list *wal, const struct i915_wa *wa)
 	__wa_add(wal, wa);
 }
 
+static void _wa_del(struct i915_wa_list *wal, i915_reg_t reg)
+{
+	struct i915_wa *wa = wal->list;
+	int index;
+
+	index = wa_index(wal, reg);
+	if (GEM_DEBUG_WARN_ON(index < 0))
+		return;
+
+	memmove(wa + index, wa + index + 1, (--wal->count - index) * sizeof(*wa));
+}
+
 static void wa_add(struct i915_wa_list *wal, i915_reg_t reg,
 		   u32 clear, u32 set, u32 read_mask, bool masked_reg)
 {
@@ -2152,6 +2164,53 @@ void intel_engine_init_workarounds(struct intel_engine_cs *engine)
 	wa_init_finish(wal);
 }
 
+int intel_engine_allow_user_register_access(struct intel_engine_cs *engine,
+					    const i915_reg_t *reg,
+					    unsigned int count)
+{
+	struct i915_wa_list *wal = &engine->whitelist;
+	unsigned long flags;
+	int err;
+
+	if (GEM_DEBUG_WARN_ON(wal->count + count >= RING_MAX_NONPRIV_SLOTS))
+		return -ENOSPC;
+
+	spin_lock_irqsave(&engine->uncore->lock, flags);
+
+	err = wa_list_grow(wal, wal->count + count, GFP_ATOMIC | __GFP_NOWARN);
+	if (err)
+		goto out;
+
+	while (count--) {
+		struct i915_wa wa = { .reg = *reg++ };
+
+		__wa_add(wal, &wa);
+	}
+
+	__engine_apply_whitelist(engine);
+
+out:
+	spin_unlock_irqrestore(&engine->uncore->lock, flags);
+	return err;
+}
+
+void intel_engine_deny_user_register_access(struct intel_engine_cs *engine,
+					    const i915_reg_t *reg,
+					    unsigned int count)
+{
+	struct i915_wa_list *wal = &engine->whitelist;
+	unsigned long flags;
+
+	spin_lock_irqsave(&engine->uncore->lock, flags);
+
+	while (count--)
+		_wa_del(wal, *reg++);
+
+	__engine_apply_whitelist(engine);
+
+	spin_unlock_irqrestore(&engine->uncore->lock, flags);
+}
+
 void intel_engine_apply_workarounds(struct intel_engine_cs *engine)
 {
 	wa_list_apply(engine->gt, &engine->wa_list);
diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.h b/drivers/gpu/drm/i915/gt/intel_workarounds.h
index 15abb68b6c00..3c50390e3a7f 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.h
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.h
@@ -36,4 +36,11 @@ void intel_engine_apply_workarounds(struct intel_engine_cs *engine);
 int intel_engine_verify_workarounds(struct intel_engine_cs *engine,
 				    const char *from);
 
+int intel_engine_allow_user_register_access(struct intel_engine_cs *engine,
+					    const i915_reg_t *reg,
+					    unsigned int count);
+void intel_engine_deny_user_register_access(struct intel_engine_cs *engine,
+					    const i915_reg_t *reg,
+					    unsigned int count);
+
 #endif
diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
index e623ac45f4aa..ce91fad9075f 100644
--- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
@@ -1177,6 +1177,272 @@ static int live_isolated_whitelist(void *arg)
 	return err;
 }
 
+static int rmw_reg(struct intel_engine_cs *engine, const i915_reg_t reg)
+{
+	const u32 values[] = {
+		0x00000000,
+		0x01010101,
+		0x10100101,
+		0x03030303,
+		0x30300303,
+		0x05050505,
+		0x50500505,
+		0x0f0f0f0f,
+		0xf00ff00f,
+		0x10101010,
+		0xf0f01010,
+		0x30303030,
+		0xa0a03030,
+		0x50505050,
+		0xc0c05050,
+		0xf0f0f0f0,
+		0x11111111,
+		0x33333333,
+		0x55555555,
+		0x0000ffff,
+		0x00ff00ff,
+		0xff0000ff,
+		0xffff00ff,
+		0xffffffff,
+	};
+	struct i915_vma *vma, *batch;
+	struct i915_gem_ww_ctx ww;
+	struct intel_context *ce;
+	struct i915_request *rq;
+	u32 srm, lrm, idx;
+	u32 *cs, *results;
+	u64 addr;
+	int err;
+	int sz;
+	int v;
+
+	ce = intel_context_create(engine);
+	if (IS_ERR(ce))
+		return PTR_ERR(ce);
+
+	sz = (2 * ARRAY_SIZE(values) + 1) * sizeof(u32);
+	vma = __vm_create_scratch_for_read_pinned(ce->vm, sz);
+	if (IS_ERR(vma)) {
+		err = PTR_ERR(vma);
+		goto out_context;
+	}
+
+	batch = create_batch(ce->vm);
+	if (IS_ERR(batch)) {
+		err = PTR_ERR(batch);
+		goto out_vma;
+	}
+
+	srm = MI_STORE_REGISTER_MEM;
+	lrm = MI_LOAD_REGISTER_MEM;
+	if (GRAPHICS_VER(ce->vm->i915) >= 8)
+		lrm++, srm++;
+
+	addr = vma->node.start;
+
+	i915_gem_ww_ctx_init(&ww, false);
+retry:
+	err = i915_gem_object_lock(vma->obj, &ww);
+	if (!err)
+		err = i915_gem_object_lock(batch->obj, &ww);
+	if (!err)
+		err = intel_context_pin_ww(ce, &ww);
+	if (err)
+		goto out_ww;
+
+	cs = i915_gem_object_pin_map(batch->obj, I915_MAP_WC);
+	if (IS_ERR(cs)) {
+		err = PTR_ERR(cs);
+		goto out_ctx;
+	}
+
+	results = i915_gem_object_pin_map(vma->obj, I915_MAP_WB);
+	if (IS_ERR(results)) {
+		err = PTR_ERR(results);
+		goto out_unpin_batch;
+	}
+
+	/* SRM original */
+	*cs++ = srm;
+	*cs++ = i915_mmio_reg_offset(reg);
+	*cs++ = lower_32_bits(addr);
+	*cs++ = upper_32_bits(addr);
+
+	idx = 1;
+	for (v = 0; v < ARRAY_SIZE(values); v++) {
+		/* LRI garbage */
+		*cs++ = MI_LOAD_REGISTER_IMM(1);
+		*cs++ = i915_mmio_reg_offset(reg);
+		*cs++ = values[v];
+
+		/* SRM result */
+		*cs++ = srm;
+		*cs++ = i915_mmio_reg_offset(reg);
+		*cs++ = lower_32_bits(addr + sizeof(u32) * idx);
+		*cs++ = upper_32_bits(addr + sizeof(u32) * idx);
+		idx++;
+	}
+	for (v = 0; v < ARRAY_SIZE(values); v++) {
+		/* LRI garbage */
+		*cs++ = MI_LOAD_REGISTER_IMM(1);
+		*cs++ = i915_mmio_reg_offset(reg);
+		*cs++ = ~values[v];
+
+		/* SRM result */
+		*cs++ = srm;
+		*cs++ = i915_mmio_reg_offset(reg);
+		*cs++ = lower_32_bits(addr + sizeof(u32) * idx);
+		*cs++ = upper_32_bits(addr + sizeof(u32) * idx);
+		idx++;
+	}
+
+	/* LRM original -- don't leave garbage in the context! */
+	*cs++ = lrm;
+	*cs++ = i915_mmio_reg_offset(reg);
+	*cs++ = lower_32_bits(addr);
+	*cs++ = upper_32_bits(addr);
+
+	*cs++ = MI_BATCH_BUFFER_END;
+
+	i915_gem_object_flush_map(batch->obj);
+	intel_gt_chipset_flush(engine->gt);
+
+	rq = i915_request_create(ce);
+	if (IS_ERR(rq)) {
+		err = PTR_ERR(rq);
+		goto out_unpin_vma;
+	}
+
+	if (engine->emit_init_breadcrumb) { /* Be nice if we hang */
+		err = engine->emit_init_breadcrumb(rq);
+		if (err)
+			goto err_request;
+	}
+
+	err = i915_request_await_object(rq, batch->obj, false);
+	if (err == 0)
+		err = i915_vma_move_to_active(batch, rq, 0);
+	if (err)
+		goto err_request;
+
+	err = i915_request_await_object(rq, vma->obj, true);
+	if (err == 0)
+		err = i915_vma_move_to_active(vma, rq, EXEC_OBJECT_WRITE);
+	if (err)
+		goto err_request;
+
+	err = engine->emit_bb_start(rq, batch->node.start, PAGE_SIZE, 0);
+	if (err)
+		goto err_request;
+
+err_request:
+	err = request_add_sync(rq, err);
+	if (err) {
+		pr_err("%s: Futzing %04x timedout; cancelling test\n",
+		       engine->name, i915_mmio_reg_offset(reg));
+		intel_gt_set_wedged(engine->gt);
+		goto out_unpin_vma;
+	}
+
+	for (v = 0, idx = 0; v < 2 * ARRAY_SIZE(values); v++) {
+		if (results[++idx] != results[0]) {
+			err = idx;
+			break;
+		}
+	}
+
+out_unpin_vma:
+	i915_gem_object_unpin_map(vma->obj);
+out_unpin_batch:
+	i915_gem_object_unpin_map(batch->obj);
+out_ctx:
+	intel_context_unpin(ce);
+out_ww:
+	if (err == -EDEADLK) {
+		err = i915_gem_ww_ctx_backoff(&ww);
+		if (!err)
+			goto retry;
+	}
+	i915_gem_ww_ctx_fini(&ww);
+	i915_vma_unpin_and_release(&batch, 0);
+out_vma:
+	i915_vma_unpin_and_release(&vma, 0);
+out_context:
+	intel_context_put(ce);
+	return err;
+}
+
+static int live_dynamic_whitelist(void *arg)
+{
+	struct intel_gt *gt = arg;
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	int err = 0;
+
+	if (GRAPHICS_VER(gt->i915) < 8)
+		return 0;
+
+	for_each_engine(engine, gt, id) {
+		const i915_reg_t reg = RING_MAX_IDLE(engine->mmio_base);
+
+		intel_engine_pm_get(engine);
+		intel_uncore_forcewake_get(engine->uncore, FORCEWAKE_ALL);
+
+		err = rmw_reg(engine, reg);
+		if (err < 0)
+			break;
+
+		if (err) {
+			pr_err("%s: Able to write to protected reg:%04x!\n",
+			       engine->name, i915_mmio_reg_offset(reg));
+			err = -EINVAL;
+			break;
+		}
+
+		err = intel_engine_allow_user_register_access(engine, &reg, 1);
+		if (err)
+			break;
+
+		err = rmw_reg(engine, reg);
+		intel_engine_deny_user_register_access(engine, &reg, 1);
+		if (err < 0)
+			break;
+
+		if (!err) {
+			pr_err("%s: Unable to write to allowed reg:%04x!\n",
+			       engine->name, i915_mmio_reg_offset(reg));
+			err = -EINVAL;
+			break;
+		}
+
+		err = rmw_reg(engine, reg);
+		if (err < 0)
+			break;
+
+		if (err) {
+			pr_err("%s: Able to write to denied reg:%04x!\n",
+			       engine->name, i915_mmio_reg_offset(reg));
+			err = -EINVAL;
+			break;
+		}
+
+		intel_uncore_forcewake_put(engine->uncore, FORCEWAKE_ALL);
+		intel_engine_pm_put(engine);
+
+		err = 0;
+	}
+
+	if (err) {
+		intel_uncore_forcewake_put(engine->uncore, FORCEWAKE_ALL);
+		intel_engine_pm_put(engine);
+	}
+
+	if (igt_flush_test(gt->i915))
+		err = -EIO;
+
+	return err;
+}
+
 static bool
 verify_wa_lists(struct intel_gt *gt, struct wa_lists *lists,
 		const char *str)
@@ -1383,6 +1649,7 @@ int intel_workarounds_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_dirty_whitelist),
 		SUBTEST(live_reset_whitelist),
 		SUBTEST(live_isolated_whitelist),
+		SUBTEST(live_dynamic_whitelist),
 		SUBTEST(live_gpu_reset_workarounds),
 		SUBTEST(live_engine_reset_workarounds),
 	};
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Intel-gfx] [PATCH 5/8] drm/i915/perf: Ensure observation logic is not clock gated
  2021-08-30 19:38 ` [Intel-gfx] " Umesh Nerlige Ramappa
@ 2021-08-30 19:38   ` Umesh Nerlige Ramappa
  -1 siblings, 0 replies; 26+ messages in thread
From: Umesh Nerlige Ramappa @ 2021-08-30 19:38 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
  Cc: dri-devel, daniel.vetter, Joonas Lahtinen, jason

From: Piotr Maciejewski <piotr.maciejewski@intel.com>

A clock gating switch can control if the performance monitoring and
observation logic is enaled or not. Ensure that we enable the clocks.

v2: Separate code from other patches (Lionel)
v3: Reset PMON enable when disabling perf to save power (Lionel)
v4: Use intel_uncore_rmw and REG_BIT (Chris)

Fixes: 00a7f0d7155c ("drm/i915/tgl: Add perf support on TGL")
Signed-off-by: Piotr Maciejewski <piotr.maciejewski@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 9 +++++++++
 drivers/gpu/drm/i915/i915_reg.h  | 2 ++
 2 files changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 2f01b8c0284c..3ded6e7d8526 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -2553,6 +2553,12 @@ gen12_enable_metric_set(struct i915_perf_stream *stream,
 			    (period_exponent << GEN12_OAG_OAGLBCTXCTRL_TIMER_PERIOD_SHIFT))
 			    : 0);
 
+	/*
+	 * Initialize Super Queue Internal Cnt Register
+	 * Set PMON Enable in order to collect valid metrics.
+	 */
+	intel_uncore_rmw(uncore, GEN12_SQCNT1, 0, GEN12_SQCNT1_PMON_ENABLE);
+
 	/*
 	 * Update all contexts prior writing the mux configurations as we need
 	 * to make sure all slices/subslices are ON before writing to NOA
@@ -2612,6 +2618,9 @@ static void gen12_disable_metric_set(struct i915_perf_stream *stream)
 
 	/* Make sure we disable noa to save power. */
 	intel_uncore_rmw(uncore, RPM_CONFIG1, GEN10_GT_NOA_ENABLE, 0);
+
+	/* Reset PMON Enable to save power. */
+	intel_uncore_rmw(uncore, GEN12_SQCNT1, GEN12_SQCNT1_PMON_ENABLE, 0);
 }
 
 static void gen7_oa_enable(struct i915_perf_stream *stream)
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 40943dd5e0db..77ece19bda7e 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -718,6 +718,8 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define OABUFFER_SIZE_16M   (7 << 3)
 
 #define GEN12_OA_TLB_INV_CR _MMIO(0xceec)
+#define GEN12_SQCNT1 _MMIO(0x8718)
+#define  GEN12_SQCNT1_PMON_ENABLE REG_BIT(30)
 
 /* Gen12 OAR unit */
 #define GEN12_OAR_OACONTROL _MMIO(0x2960)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 5/8] drm/i915/perf: Ensure observation logic is not clock gated
@ 2021-08-30 19:38   ` Umesh Nerlige Ramappa
  0 siblings, 0 replies; 26+ messages in thread
From: Umesh Nerlige Ramappa @ 2021-08-30 19:38 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
  Cc: dri-devel, daniel.vetter, Joonas Lahtinen, jason

From: Piotr Maciejewski <piotr.maciejewski@intel.com>

A clock gating switch can control if the performance monitoring and
observation logic is enaled or not. Ensure that we enable the clocks.

v2: Separate code from other patches (Lionel)
v3: Reset PMON enable when disabling perf to save power (Lionel)
v4: Use intel_uncore_rmw and REG_BIT (Chris)

Fixes: 00a7f0d7155c ("drm/i915/tgl: Add perf support on TGL")
Signed-off-by: Piotr Maciejewski <piotr.maciejewski@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 9 +++++++++
 drivers/gpu/drm/i915/i915_reg.h  | 2 ++
 2 files changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 2f01b8c0284c..3ded6e7d8526 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -2553,6 +2553,12 @@ gen12_enable_metric_set(struct i915_perf_stream *stream,
 			    (period_exponent << GEN12_OAG_OAGLBCTXCTRL_TIMER_PERIOD_SHIFT))
 			    : 0);
 
+	/*
+	 * Initialize Super Queue Internal Cnt Register
+	 * Set PMON Enable in order to collect valid metrics.
+	 */
+	intel_uncore_rmw(uncore, GEN12_SQCNT1, 0, GEN12_SQCNT1_PMON_ENABLE);
+
 	/*
 	 * Update all contexts prior writing the mux configurations as we need
 	 * to make sure all slices/subslices are ON before writing to NOA
@@ -2612,6 +2618,9 @@ static void gen12_disable_metric_set(struct i915_perf_stream *stream)
 
 	/* Make sure we disable noa to save power. */
 	intel_uncore_rmw(uncore, RPM_CONFIG1, GEN10_GT_NOA_ENABLE, 0);
+
+	/* Reset PMON Enable to save power. */
+	intel_uncore_rmw(uncore, GEN12_SQCNT1, GEN12_SQCNT1_PMON_ENABLE, 0);
 }
 
 static void gen7_oa_enable(struct i915_perf_stream *stream)
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 40943dd5e0db..77ece19bda7e 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -718,6 +718,8 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define OABUFFER_SIZE_16M   (7 << 3)
 
 #define GEN12_OA_TLB_INV_CR _MMIO(0xceec)
+#define GEN12_SQCNT1 _MMIO(0x8718)
+#define  GEN12_SQCNT1_PMON_ENABLE REG_BIT(30)
 
 /* Gen12 OAR unit */
 #define GEN12_OAR_OACONTROL _MMIO(0x2960)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Intel-gfx] [PATCH 6/8] drm/i915/perf: Whitelist OA report trigger registers
  2021-08-30 19:38 ` [Intel-gfx] " Umesh Nerlige Ramappa
@ 2021-08-30 19:38   ` Umesh Nerlige Ramappa
  -1 siblings, 0 replies; 26+ messages in thread
From: Umesh Nerlige Ramappa @ 2021-08-30 19:38 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
  Cc: dri-devel, daniel.vetter, Joonas Lahtinen, jason

OA reports can be triggered into the OA buffer by writing into the
OAREPORTTRIG registers. Whitelist the registers to allow non-privileged
user to trigger reports.

Whitelist registers only if perf_stream_paranoid is set to 0. In
i915_perf_open_ioctl, this setting is checked and the whitelist is
enabled accordingly. On closing the perf fd, the whitelist is removed.

This ensures that the access to the whitelist is gated by
perf_stream_paranoid.

v2:
- Move related change to this patch (Lionel)
- Bump up perf revision (Lionel)

v3: Pardon whitelisted registers for selftest (Umesh)
v4: Document supported gens for the feature (Lionel)
v5: Whitelist registers only if perf_stream_paranoid is set to 0 (Jon)
v6: Move oa whitelist array to i915_perf (Chris)
v7: Fix OA writing beyond the wal->list memory (CI)
v8: Protect write to engine whitelist registers

v9: (Umesh)
- Use uncore->lock to protect write to forcepriv regs
- In case wal->count falls to zero on _wa_remove, make sure you still
  clear the registers. Remove wal->count check when applying whitelist.

v10: (Umesh)
- Split patches modifying intel_workarounds
- On some platforms there are no whitelisted regs. intel_engine_resume
  applies whitelist on these platforms too and the wal->count gates such
  platforms. Bring back the wal->count check.
- intel_engine_allow/deny_user_register_access modifies the engine
  whitelist and the wal->count. Use uncore->lock to serialize it with
  intel_engine_apply_whitelist.
- Grow the wal->list when adding whitelist registers after driver load.

v11:
- Fix memory leak in _wa_list_grow (Chris)
- Serialize kfree with engine resume using uncore->lock (Umesh)
- Grow the list only if wal->count is not aligned (Umesh)

Signed-off-by: Piotr Maciejewski <piotr.maciejewski@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c       | 79 +++++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_perf_types.h |  8 +++
 drivers/gpu/drm/i915/i915_reg.h        | 12 ++--
 3 files changed, 92 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 3ded6e7d8526..30f5025b2ff6 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1364,12 +1364,56 @@ free_noa_wait(struct i915_perf_stream *stream)
 	i915_vma_unpin_and_release(&stream->noa_wait, 0);
 }
 
+static const i915_reg_t gen9_oa_wl_regs[] = {
+	{ __OAREPORTTRIG2 | RING_FORCE_TO_NONPRIV_ACCESS_RW },
+	{ __OAREPORTTRIG6 | RING_FORCE_TO_NONPRIV_ACCESS_RW },
+};
+
+static const i915_reg_t gen12_oa_wl_regs[] = {
+	{ __GEN12_OAG_OAREPORTTRIG2 | RING_FORCE_TO_NONPRIV_ACCESS_RW },
+	{ __GEN12_OAG_OAREPORTTRIG6 | RING_FORCE_TO_NONPRIV_ACCESS_RW },
+};
+
+static int intel_engine_apply_oa_whitelist(struct i915_perf_stream *stream)
+{
+	struct intel_engine_cs *engine = stream->engine;
+	int ret;
+
+	if (i915_perf_stream_paranoid ||
+	    !(stream->sample_flags & SAMPLE_OA_REPORT) ||
+	    !stream->perf->oa_wl)
+		return 0;
+
+	ret = intel_engine_allow_user_register_access(engine,
+						      stream->perf->oa_wl,
+						      stream->perf->num_oa_wl);
+	if (ret < 0)
+		return ret;
+
+	stream->oa_whitelisted = true;
+	return 0;
+}
+
+static void intel_engine_remove_oa_whitelist(struct i915_perf_stream *stream)
+{
+	struct intel_engine_cs *engine = stream->engine;
+
+	if (!stream->oa_whitelisted)
+		return;
+
+	intel_engine_deny_user_register_access(engine,
+					       stream->perf->oa_wl,
+					       stream->perf->num_oa_wl);
+}
+
 static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
 {
 	struct i915_perf *perf = stream->perf;
 
 	BUG_ON(stream != perf->exclusive_stream);
 
+	intel_engine_remove_oa_whitelist(stream);
+
 	/*
 	 * Unset exclusive_stream first, it will be checked while disabling
 	 * the metric set on gen8+.
@@ -1465,7 +1509,8 @@ static void gen8_init_oa_buffer(struct i915_perf_stream *stream)
 	 *  bit."
 	 */
 	intel_uncore_write(uncore, GEN8_OABUFFER, gtt_offset |
-		   OABUFFER_SIZE_16M | GEN8_OABUFFER_MEM_SELECT_GGTT);
+			   OABUFFER_SIZE_16M | GEN8_OABUFFER_MEM_SELECT_GGTT |
+			   GEN7_OABUFFER_EDGE_TRIGGER);
 	intel_uncore_write(uncore, GEN8_OATAILPTR, gtt_offset & GEN8_OATAILPTR_MASK);
 
 	/* Mark that we need updated tail pointers to read from... */
@@ -1518,7 +1563,8 @@ static void gen12_init_oa_buffer(struct i915_perf_stream *stream)
 	 *  bit."
 	 */
 	intel_uncore_write(uncore, GEN12_OAG_OABUFFER, gtt_offset |
-			   OABUFFER_SIZE_16M | GEN8_OABUFFER_MEM_SELECT_GGTT);
+			   OABUFFER_SIZE_16M | GEN8_OABUFFER_MEM_SELECT_GGTT |
+			   GEN7_OABUFFER_EDGE_TRIGGER);
 	intel_uncore_write(uncore, GEN12_OAG_OATAILPTR,
 			   gtt_offset & GEN12_OAG_OATAILPTR_MASK);
 
@@ -3530,6 +3576,20 @@ i915_perf_open_ioctl_locked(struct i915_perf *perf,
 	if (!(param->flags & I915_PERF_FLAG_DISABLED))
 		i915_perf_enable_locked(stream);
 
+	/*
+	 * OA whitelist allows non-privileged access to some OA counters for
+	 * triggering reports into the OA buffer. This is only allowed if
+	 * perf_stream_paranoid is set to 0 by the sysadmin.
+	 *
+	 * We want to make sure this is almost the last thing we do before
+	 * returning the stream fd. If we do end up checking for errors in code
+	 * that follows this, we MUST call intel_engine_remove_oa_whitelist in
+	 * the error handling path to remove the whitelisted registers.
+	 */
+	ret = intel_engine_apply_oa_whitelist(stream);
+	if (ret < 0)
+		goto err_flags;
+
 	/* Take a reference on the driver that will be kept with stream_fd
 	 * until its release.
 	 */
@@ -4410,6 +4470,8 @@ void i915_perf_init(struct drm_i915_private *i915)
 				perf->ctx_flexeu0_offset = 0x3de;
 
 				perf->gen8_valid_ctx_bit = BIT(16);
+				perf->oa_wl = gen9_oa_wl_regs;
+				perf->num_oa_wl = ARRAY_SIZE(gen9_oa_wl_regs);
 			}
 		} else if (GRAPHICS_VER(i915) == 11) {
 			perf->ops.is_valid_b_counter_reg =
@@ -4428,6 +4490,9 @@ void i915_perf_init(struct drm_i915_private *i915)
 			perf->ctx_oactxctrl_offset = 0x124;
 			perf->ctx_flexeu0_offset = 0x78e;
 
+			perf->oa_wl = gen9_oa_wl_regs;
+			perf->num_oa_wl = ARRAY_SIZE(gen9_oa_wl_regs);
+
 			perf->gen8_valid_ctx_bit = BIT(16);
 		} else if (GRAPHICS_VER(i915) == 12) {
 			perf->ops.is_valid_b_counter_reg =
@@ -4445,6 +4510,9 @@ void i915_perf_init(struct drm_i915_private *i915)
 
 			perf->ctx_flexeu0_offset = 0;
 			perf->ctx_oactxctrl_offset = 0x144;
+
+			perf->oa_wl = gen12_oa_wl_regs;
+			perf->num_oa_wl = ARRAY_SIZE(gen12_oa_wl_regs);
 		}
 	}
 
@@ -4550,8 +4618,13 @@ int i915_perf_ioctl_version(void)
 	 *
 	 * 5: Add DRM_I915_PERF_PROP_POLL_OA_PERIOD parameter that controls the
 	 *    interval for the hrtimer used to check for OA data.
+	 *
+	 * 6: Whitelist OATRIGGER registers to allow user to trigger reports
+	 *    into the OA buffer. This applies only to gen8+. The feature can
+	 *    only be accessed if perf_stream_paranoid is set to 0 by privileged
+	 *    user.
 	 */
-	return 5;
+	return 6;
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
diff --git a/drivers/gpu/drm/i915/i915_perf_types.h b/drivers/gpu/drm/i915/i915_perf_types.h
index aa14354a5120..f5d3eece70d8 100644
--- a/drivers/gpu/drm/i915/i915_perf_types.h
+++ b/drivers/gpu/drm/i915/i915_perf_types.h
@@ -312,6 +312,11 @@ struct i915_perf_stream {
 	 * buffer should be checked for available data.
 	 */
 	u64 poll_oa_period;
+
+	/**
+	 * @oa_whitelisted: Indicates that the oa registers are whitelisted.
+	 */
+	bool oa_whitelisted;
 };
 
 /**
@@ -432,6 +437,9 @@ struct i915_perf {
 	u32 ctx_oactxctrl_offset;
 	u32 ctx_flexeu0_offset;
 
+	const i915_reg_t *oa_wl;
+	unsigned int num_oa_wl;
+
 	/**
 	 * The RPT_ID/reason field for Gen8+ includes a bit
 	 * to determine if the CTX ID in the report is valid
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 77ece19bda7e..c5c6adbe5b6f 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -894,7 +894,8 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define OAREPORTTRIG1_THRESHOLD_MASK 0xffff
 #define OAREPORTTRIG1_EDGE_LEVEL_TRIGGER_SELECT_MASK 0xffff0000 /* 0=level */
 
-#define OAREPORTTRIG2 _MMIO(0x2744)
+#define __OAREPORTTRIG2 0x2744
+#define OAREPORTTRIG2 _MMIO(__OAREPORTTRIG2)
 #define OAREPORTTRIG2_INVERT_A_0  (1 << 0)
 #define OAREPORTTRIG2_INVERT_A_1  (1 << 1)
 #define OAREPORTTRIG2_INVERT_A_2  (1 << 2)
@@ -947,7 +948,8 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define OAREPORTTRIG5_THRESHOLD_MASK 0xffff
 #define OAREPORTTRIG5_EDGE_LEVEL_TRIGGER_SELECT_MASK 0xffff0000 /* 0=level */
 
-#define OAREPORTTRIG6 _MMIO(0x2754)
+#define __OAREPORTTRIG6 0x2754
+#define OAREPORTTRIG6 _MMIO(__OAREPORTTRIG6)
 #define OAREPORTTRIG6_INVERT_A_0  (1 << 0)
 #define OAREPORTTRIG6_INVERT_A_1  (1 << 1)
 #define OAREPORTTRIG6_INVERT_A_2  (1 << 2)
@@ -1008,11 +1010,13 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 
 /* Same layout as OAREPORTTRIGX */
 #define GEN12_OAG_OAREPORTTRIG1 _MMIO(0xd920)
-#define GEN12_OAG_OAREPORTTRIG2 _MMIO(0xd924)
+#define __GEN12_OAG_OAREPORTTRIG2 0xd924
+#define GEN12_OAG_OAREPORTTRIG2 _MMIO(__GEN12_OAG_OAREPORTTRIG2)
 #define GEN12_OAG_OAREPORTTRIG3 _MMIO(0xd928)
 #define GEN12_OAG_OAREPORTTRIG4 _MMIO(0xd92c)
 #define GEN12_OAG_OAREPORTTRIG5 _MMIO(0xd930)
-#define GEN12_OAG_OAREPORTTRIG6 _MMIO(0xd934)
+#define __GEN12_OAG_OAREPORTTRIG6 0xd934
+#define GEN12_OAG_OAREPORTTRIG6 _MMIO(__GEN12_OAG_OAREPORTTRIG6)
 #define GEN12_OAG_OAREPORTTRIG7 _MMIO(0xd938)
 #define GEN12_OAG_OAREPORTTRIG8 _MMIO(0xd93c)
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 6/8] drm/i915/perf: Whitelist OA report trigger registers
@ 2021-08-30 19:38   ` Umesh Nerlige Ramappa
  0 siblings, 0 replies; 26+ messages in thread
From: Umesh Nerlige Ramappa @ 2021-08-30 19:38 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
  Cc: dri-devel, daniel.vetter, Joonas Lahtinen, jason

OA reports can be triggered into the OA buffer by writing into the
OAREPORTTRIG registers. Whitelist the registers to allow non-privileged
user to trigger reports.

Whitelist registers only if perf_stream_paranoid is set to 0. In
i915_perf_open_ioctl, this setting is checked and the whitelist is
enabled accordingly. On closing the perf fd, the whitelist is removed.

This ensures that the access to the whitelist is gated by
perf_stream_paranoid.

v2:
- Move related change to this patch (Lionel)
- Bump up perf revision (Lionel)

v3: Pardon whitelisted registers for selftest (Umesh)
v4: Document supported gens for the feature (Lionel)
v5: Whitelist registers only if perf_stream_paranoid is set to 0 (Jon)
v6: Move oa whitelist array to i915_perf (Chris)
v7: Fix OA writing beyond the wal->list memory (CI)
v8: Protect write to engine whitelist registers

v9: (Umesh)
- Use uncore->lock to protect write to forcepriv regs
- In case wal->count falls to zero on _wa_remove, make sure you still
  clear the registers. Remove wal->count check when applying whitelist.

v10: (Umesh)
- Split patches modifying intel_workarounds
- On some platforms there are no whitelisted regs. intel_engine_resume
  applies whitelist on these platforms too and the wal->count gates such
  platforms. Bring back the wal->count check.
- intel_engine_allow/deny_user_register_access modifies the engine
  whitelist and the wal->count. Use uncore->lock to serialize it with
  intel_engine_apply_whitelist.
- Grow the wal->list when adding whitelist registers after driver load.

v11:
- Fix memory leak in _wa_list_grow (Chris)
- Serialize kfree with engine resume using uncore->lock (Umesh)
- Grow the list only if wal->count is not aligned (Umesh)

Signed-off-by: Piotr Maciejewski <piotr.maciejewski@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c       | 79 +++++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_perf_types.h |  8 +++
 drivers/gpu/drm/i915/i915_reg.h        | 12 ++--
 3 files changed, 92 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 3ded6e7d8526..30f5025b2ff6 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1364,12 +1364,56 @@ free_noa_wait(struct i915_perf_stream *stream)
 	i915_vma_unpin_and_release(&stream->noa_wait, 0);
 }
 
+static const i915_reg_t gen9_oa_wl_regs[] = {
+	{ __OAREPORTTRIG2 | RING_FORCE_TO_NONPRIV_ACCESS_RW },
+	{ __OAREPORTTRIG6 | RING_FORCE_TO_NONPRIV_ACCESS_RW },
+};
+
+static const i915_reg_t gen12_oa_wl_regs[] = {
+	{ __GEN12_OAG_OAREPORTTRIG2 | RING_FORCE_TO_NONPRIV_ACCESS_RW },
+	{ __GEN12_OAG_OAREPORTTRIG6 | RING_FORCE_TO_NONPRIV_ACCESS_RW },
+};
+
+static int intel_engine_apply_oa_whitelist(struct i915_perf_stream *stream)
+{
+	struct intel_engine_cs *engine = stream->engine;
+	int ret;
+
+	if (i915_perf_stream_paranoid ||
+	    !(stream->sample_flags & SAMPLE_OA_REPORT) ||
+	    !stream->perf->oa_wl)
+		return 0;
+
+	ret = intel_engine_allow_user_register_access(engine,
+						      stream->perf->oa_wl,
+						      stream->perf->num_oa_wl);
+	if (ret < 0)
+		return ret;
+
+	stream->oa_whitelisted = true;
+	return 0;
+}
+
+static void intel_engine_remove_oa_whitelist(struct i915_perf_stream *stream)
+{
+	struct intel_engine_cs *engine = stream->engine;
+
+	if (!stream->oa_whitelisted)
+		return;
+
+	intel_engine_deny_user_register_access(engine,
+					       stream->perf->oa_wl,
+					       stream->perf->num_oa_wl);
+}
+
 static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
 {
 	struct i915_perf *perf = stream->perf;
 
 	BUG_ON(stream != perf->exclusive_stream);
 
+	intel_engine_remove_oa_whitelist(stream);
+
 	/*
 	 * Unset exclusive_stream first, it will be checked while disabling
 	 * the metric set on gen8+.
@@ -1465,7 +1509,8 @@ static void gen8_init_oa_buffer(struct i915_perf_stream *stream)
 	 *  bit."
 	 */
 	intel_uncore_write(uncore, GEN8_OABUFFER, gtt_offset |
-		   OABUFFER_SIZE_16M | GEN8_OABUFFER_MEM_SELECT_GGTT);
+			   OABUFFER_SIZE_16M | GEN8_OABUFFER_MEM_SELECT_GGTT |
+			   GEN7_OABUFFER_EDGE_TRIGGER);
 	intel_uncore_write(uncore, GEN8_OATAILPTR, gtt_offset & GEN8_OATAILPTR_MASK);
 
 	/* Mark that we need updated tail pointers to read from... */
@@ -1518,7 +1563,8 @@ static void gen12_init_oa_buffer(struct i915_perf_stream *stream)
 	 *  bit."
 	 */
 	intel_uncore_write(uncore, GEN12_OAG_OABUFFER, gtt_offset |
-			   OABUFFER_SIZE_16M | GEN8_OABUFFER_MEM_SELECT_GGTT);
+			   OABUFFER_SIZE_16M | GEN8_OABUFFER_MEM_SELECT_GGTT |
+			   GEN7_OABUFFER_EDGE_TRIGGER);
 	intel_uncore_write(uncore, GEN12_OAG_OATAILPTR,
 			   gtt_offset & GEN12_OAG_OATAILPTR_MASK);
 
@@ -3530,6 +3576,20 @@ i915_perf_open_ioctl_locked(struct i915_perf *perf,
 	if (!(param->flags & I915_PERF_FLAG_DISABLED))
 		i915_perf_enable_locked(stream);
 
+	/*
+	 * OA whitelist allows non-privileged access to some OA counters for
+	 * triggering reports into the OA buffer. This is only allowed if
+	 * perf_stream_paranoid is set to 0 by the sysadmin.
+	 *
+	 * We want to make sure this is almost the last thing we do before
+	 * returning the stream fd. If we do end up checking for errors in code
+	 * that follows this, we MUST call intel_engine_remove_oa_whitelist in
+	 * the error handling path to remove the whitelisted registers.
+	 */
+	ret = intel_engine_apply_oa_whitelist(stream);
+	if (ret < 0)
+		goto err_flags;
+
 	/* Take a reference on the driver that will be kept with stream_fd
 	 * until its release.
 	 */
@@ -4410,6 +4470,8 @@ void i915_perf_init(struct drm_i915_private *i915)
 				perf->ctx_flexeu0_offset = 0x3de;
 
 				perf->gen8_valid_ctx_bit = BIT(16);
+				perf->oa_wl = gen9_oa_wl_regs;
+				perf->num_oa_wl = ARRAY_SIZE(gen9_oa_wl_regs);
 			}
 		} else if (GRAPHICS_VER(i915) == 11) {
 			perf->ops.is_valid_b_counter_reg =
@@ -4428,6 +4490,9 @@ void i915_perf_init(struct drm_i915_private *i915)
 			perf->ctx_oactxctrl_offset = 0x124;
 			perf->ctx_flexeu0_offset = 0x78e;
 
+			perf->oa_wl = gen9_oa_wl_regs;
+			perf->num_oa_wl = ARRAY_SIZE(gen9_oa_wl_regs);
+
 			perf->gen8_valid_ctx_bit = BIT(16);
 		} else if (GRAPHICS_VER(i915) == 12) {
 			perf->ops.is_valid_b_counter_reg =
@@ -4445,6 +4510,9 @@ void i915_perf_init(struct drm_i915_private *i915)
 
 			perf->ctx_flexeu0_offset = 0;
 			perf->ctx_oactxctrl_offset = 0x144;
+
+			perf->oa_wl = gen12_oa_wl_regs;
+			perf->num_oa_wl = ARRAY_SIZE(gen12_oa_wl_regs);
 		}
 	}
 
@@ -4550,8 +4618,13 @@ int i915_perf_ioctl_version(void)
 	 *
 	 * 5: Add DRM_I915_PERF_PROP_POLL_OA_PERIOD parameter that controls the
 	 *    interval for the hrtimer used to check for OA data.
+	 *
+	 * 6: Whitelist OATRIGGER registers to allow user to trigger reports
+	 *    into the OA buffer. This applies only to gen8+. The feature can
+	 *    only be accessed if perf_stream_paranoid is set to 0 by privileged
+	 *    user.
 	 */
-	return 5;
+	return 6;
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
diff --git a/drivers/gpu/drm/i915/i915_perf_types.h b/drivers/gpu/drm/i915/i915_perf_types.h
index aa14354a5120..f5d3eece70d8 100644
--- a/drivers/gpu/drm/i915/i915_perf_types.h
+++ b/drivers/gpu/drm/i915/i915_perf_types.h
@@ -312,6 +312,11 @@ struct i915_perf_stream {
 	 * buffer should be checked for available data.
 	 */
 	u64 poll_oa_period;
+
+	/**
+	 * @oa_whitelisted: Indicates that the oa registers are whitelisted.
+	 */
+	bool oa_whitelisted;
 };
 
 /**
@@ -432,6 +437,9 @@ struct i915_perf {
 	u32 ctx_oactxctrl_offset;
 	u32 ctx_flexeu0_offset;
 
+	const i915_reg_t *oa_wl;
+	unsigned int num_oa_wl;
+
 	/**
 	 * The RPT_ID/reason field for Gen8+ includes a bit
 	 * to determine if the CTX ID in the report is valid
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 77ece19bda7e..c5c6adbe5b6f 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -894,7 +894,8 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define OAREPORTTRIG1_THRESHOLD_MASK 0xffff
 #define OAREPORTTRIG1_EDGE_LEVEL_TRIGGER_SELECT_MASK 0xffff0000 /* 0=level */
 
-#define OAREPORTTRIG2 _MMIO(0x2744)
+#define __OAREPORTTRIG2 0x2744
+#define OAREPORTTRIG2 _MMIO(__OAREPORTTRIG2)
 #define OAREPORTTRIG2_INVERT_A_0  (1 << 0)
 #define OAREPORTTRIG2_INVERT_A_1  (1 << 1)
 #define OAREPORTTRIG2_INVERT_A_2  (1 << 2)
@@ -947,7 +948,8 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define OAREPORTTRIG5_THRESHOLD_MASK 0xffff
 #define OAREPORTTRIG5_EDGE_LEVEL_TRIGGER_SELECT_MASK 0xffff0000 /* 0=level */
 
-#define OAREPORTTRIG6 _MMIO(0x2754)
+#define __OAREPORTTRIG6 0x2754
+#define OAREPORTTRIG6 _MMIO(__OAREPORTTRIG6)
 #define OAREPORTTRIG6_INVERT_A_0  (1 << 0)
 #define OAREPORTTRIG6_INVERT_A_1  (1 << 1)
 #define OAREPORTTRIG6_INVERT_A_2  (1 << 2)
@@ -1008,11 +1010,13 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 
 /* Same layout as OAREPORTTRIGX */
 #define GEN12_OAG_OAREPORTTRIG1 _MMIO(0xd920)
-#define GEN12_OAG_OAREPORTTRIG2 _MMIO(0xd924)
+#define __GEN12_OAG_OAREPORTTRIG2 0xd924
+#define GEN12_OAG_OAREPORTTRIG2 _MMIO(__GEN12_OAG_OAREPORTTRIG2)
 #define GEN12_OAG_OAREPORTTRIG3 _MMIO(0xd928)
 #define GEN12_OAG_OAREPORTTRIG4 _MMIO(0xd92c)
 #define GEN12_OAG_OAREPORTTRIG5 _MMIO(0xd930)
-#define GEN12_OAG_OAREPORTTRIG6 _MMIO(0xd934)
+#define __GEN12_OAG_OAREPORTTRIG6 0xd934
+#define GEN12_OAG_OAREPORTTRIG6 _MMIO(__GEN12_OAG_OAREPORTTRIG6)
 #define GEN12_OAG_OAREPORTTRIG7 _MMIO(0xd938)
 #define GEN12_OAG_OAREPORTTRIG8 _MMIO(0xd93c)
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 7/8] drm/i915/perf: Whitelist OA counter and buffer registers
  2021-08-30 19:38 ` [Intel-gfx] " Umesh Nerlige Ramappa
@ 2021-08-30 19:38   ` Umesh Nerlige Ramappa
  -1 siblings, 0 replies; 26+ messages in thread
From: Umesh Nerlige Ramappa @ 2021-08-30 19:38 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
  Cc: dri-devel, daniel.vetter, Joonas Lahtinen, jason

It is useful to have markers in the OA reports to identify triggered
reports. Whitelist some OA counters that can be used as markers.

A triggered report can be found faster if we can sample the HW tail and
head registers when the report was triggered. Whitelist OA buffer
specific registers.

v2:
- Bump up the perf revision (Lionel)
- Use indexing for counters (Lionel)
- Fix selftest for oa ticking register (Umesh)

v3: Pardon whitelisted registers for selftest (Umesh)

v4:
- Document whitelisted registers (Lionel)
- Fix live isolated whitelist for OA regs (Umesh)

v5:
- Free up whitelist slots. Remove GPU_TICKS and A20 counter (Piotr)
- Whitelist registers only if perf_stream_paranoid is set to 0 (Jon)

v6: Move oa whitelist array to i915_perf (Chris)

Signed-off-by: Piotr Maciejewski <piotr.maciejewski@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 18 +++++++++++++++++-
 drivers/gpu/drm/i915/i915_reg.h  | 16 ++++++++++++++--
 2 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 30f5025b2ff6..de3d1738aabe 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1367,11 +1367,19 @@ free_noa_wait(struct i915_perf_stream *stream)
 static const i915_reg_t gen9_oa_wl_regs[] = {
 	{ __OAREPORTTRIG2 | RING_FORCE_TO_NONPRIV_ACCESS_RW },
 	{ __OAREPORTTRIG6 | RING_FORCE_TO_NONPRIV_ACCESS_RW },
+	{ __OA_PERF_COUNTER_A(18) | (RING_FORCE_TO_NONPRIV_ACCESS_RW |
+				     RING_FORCE_TO_NONPRIV_RANGE_4) },
+	{ __GEN8_OASTATUS | (RING_FORCE_TO_NONPRIV_ACCESS_RD |
+			     RING_FORCE_TO_NONPRIV_RANGE_4) },
 };
 
 static const i915_reg_t gen12_oa_wl_regs[] = {
 	{ __GEN12_OAG_OAREPORTTRIG2 | RING_FORCE_TO_NONPRIV_ACCESS_RW },
 	{ __GEN12_OAG_OAREPORTTRIG6 | RING_FORCE_TO_NONPRIV_ACCESS_RW },
+	{ __GEN12_OAG_PERF_COUNTER_A(18) | (RING_FORCE_TO_NONPRIV_ACCESS_RW |
+					    RING_FORCE_TO_NONPRIV_RANGE_4) },
+	{ __GEN12_OAG_OASTATUS | (RING_FORCE_TO_NONPRIV_ACCESS_RD |
+				  RING_FORCE_TO_NONPRIV_RANGE_4) },
 };
 
 static int intel_engine_apply_oa_whitelist(struct i915_perf_stream *stream)
@@ -4623,8 +4631,16 @@ int i915_perf_ioctl_version(void)
 	 *    into the OA buffer. This applies only to gen8+. The feature can
 	 *    only be accessed if perf_stream_paranoid is set to 0 by privileged
 	 *    user.
+	 *
+	 * 7: Whitelist below OA registers for user to identify the location of
+	 *    triggered reports in the OA buffer. This applies only to gen8+.
+	 *    The feature can only be accessed if perf_stream_paranoid is set to
+	 *    0 by privileged user.
+	 *
+	 *    - OA buffer head/tail/status/buffer registers for read only
+	 *    - OA counters A18, A19, A20 for read/write
 	 */
-	return 6;
+	return 7;
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index c5c6adbe5b6f..b4c6bfc33a18 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -695,7 +695,8 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define  GEN7_OASTATUS2_HEAD_MASK           0xffffffc0
 #define  GEN7_OASTATUS2_MEM_SELECT_GGTT     (1 << 0) /* 0: PPGTT, 1: GGTT */
 
-#define GEN8_OASTATUS _MMIO(0x2b08)
+#define __GEN8_OASTATUS 0x2b08
+#define GEN8_OASTATUS _MMIO(__GEN8_OASTATUS)
 #define  GEN8_OASTATUS_TAIL_POINTER_WRAP    (1 << 17)
 #define  GEN8_OASTATUS_HEAD_POINTER_WRAP    (1 << 16)
 #define  GEN8_OASTATUS_OVERRUN_STATUS	    (1 << 3)
@@ -755,7 +756,8 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define  GEN12_OAG_OA_DEBUG_DISABLE_GO_1_0_REPORTS     (1 << 2)
 #define  GEN12_OAG_OA_DEBUG_DISABLE_CTX_SWITCH_REPORTS (1 << 1)
 
-#define GEN12_OAG_OASTATUS _MMIO(0xdafc)
+#define __GEN12_OAG_OASTATUS 0xdafc
+#define GEN12_OAG_OASTATUS _MMIO(__GEN12_OAG_OASTATUS)
 #define  GEN12_OAG_OASTATUS_COUNTER_OVERFLOW (1 << 2)
 #define  GEN12_OAG_OASTATUS_BUFFER_OVERFLOW  (1 << 1)
 #define  GEN12_OAG_OASTATUS_REPORT_LOST      (1 << 0)
@@ -998,6 +1000,16 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define OAREPORTTRIG8_NOA_SELECT_6_SHIFT    24
 #define OAREPORTTRIG8_NOA_SELECT_7_SHIFT    28
 
+/* Performance counters registers */
+#define __OA_PERF_COUNTER_A(idx)	(0x2800 + 8 * (idx))
+#define OA_PERF_COUNTER_A(idx)		_MMIO(__OA_PERF_COUNTER_A(idx))
+#define OA_PERF_COUNTER_A_UDW(idx)		_MMIO(__OA_PERF_COUNTER_A(idx) + 4)
+
+/* Gen12 Performance counters registers */
+#define __GEN12_OAG_PERF_COUNTER_A(idx)	(0xD980 + 8 * (idx))
+#define GEN12_OAG_PERF_COUNTER_A(idx)	_MMIO(__GEN12_OAG_PERF_COUNTER_A(idx))
+#define GEN12_OAG_PERF_COUNTER_A_UDW(idx) _MMIO(__GEN12_OAG_PERF_COUNTER_A(idx) + 4)
+
 /* Same layout as OASTARTTRIGX */
 #define GEN12_OAG_OASTARTTRIG1 _MMIO(0xd900)
 #define GEN12_OAG_OASTARTTRIG2 _MMIO(0xd904)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Intel-gfx] [PATCH 7/8] drm/i915/perf: Whitelist OA counter and buffer registers
@ 2021-08-30 19:38   ` Umesh Nerlige Ramappa
  0 siblings, 0 replies; 26+ messages in thread
From: Umesh Nerlige Ramappa @ 2021-08-30 19:38 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
  Cc: dri-devel, daniel.vetter, Joonas Lahtinen, jason

It is useful to have markers in the OA reports to identify triggered
reports. Whitelist some OA counters that can be used as markers.

A triggered report can be found faster if we can sample the HW tail and
head registers when the report was triggered. Whitelist OA buffer
specific registers.

v2:
- Bump up the perf revision (Lionel)
- Use indexing for counters (Lionel)
- Fix selftest for oa ticking register (Umesh)

v3: Pardon whitelisted registers for selftest (Umesh)

v4:
- Document whitelisted registers (Lionel)
- Fix live isolated whitelist for OA regs (Umesh)

v5:
- Free up whitelist slots. Remove GPU_TICKS and A20 counter (Piotr)
- Whitelist registers only if perf_stream_paranoid is set to 0 (Jon)

v6: Move oa whitelist array to i915_perf (Chris)

Signed-off-by: Piotr Maciejewski <piotr.maciejewski@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 18 +++++++++++++++++-
 drivers/gpu/drm/i915/i915_reg.h  | 16 ++++++++++++++--
 2 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 30f5025b2ff6..de3d1738aabe 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1367,11 +1367,19 @@ free_noa_wait(struct i915_perf_stream *stream)
 static const i915_reg_t gen9_oa_wl_regs[] = {
 	{ __OAREPORTTRIG2 | RING_FORCE_TO_NONPRIV_ACCESS_RW },
 	{ __OAREPORTTRIG6 | RING_FORCE_TO_NONPRIV_ACCESS_RW },
+	{ __OA_PERF_COUNTER_A(18) | (RING_FORCE_TO_NONPRIV_ACCESS_RW |
+				     RING_FORCE_TO_NONPRIV_RANGE_4) },
+	{ __GEN8_OASTATUS | (RING_FORCE_TO_NONPRIV_ACCESS_RD |
+			     RING_FORCE_TO_NONPRIV_RANGE_4) },
 };
 
 static const i915_reg_t gen12_oa_wl_regs[] = {
 	{ __GEN12_OAG_OAREPORTTRIG2 | RING_FORCE_TO_NONPRIV_ACCESS_RW },
 	{ __GEN12_OAG_OAREPORTTRIG6 | RING_FORCE_TO_NONPRIV_ACCESS_RW },
+	{ __GEN12_OAG_PERF_COUNTER_A(18) | (RING_FORCE_TO_NONPRIV_ACCESS_RW |
+					    RING_FORCE_TO_NONPRIV_RANGE_4) },
+	{ __GEN12_OAG_OASTATUS | (RING_FORCE_TO_NONPRIV_ACCESS_RD |
+				  RING_FORCE_TO_NONPRIV_RANGE_4) },
 };
 
 static int intel_engine_apply_oa_whitelist(struct i915_perf_stream *stream)
@@ -4623,8 +4631,16 @@ int i915_perf_ioctl_version(void)
 	 *    into the OA buffer. This applies only to gen8+. The feature can
 	 *    only be accessed if perf_stream_paranoid is set to 0 by privileged
 	 *    user.
+	 *
+	 * 7: Whitelist below OA registers for user to identify the location of
+	 *    triggered reports in the OA buffer. This applies only to gen8+.
+	 *    The feature can only be accessed if perf_stream_paranoid is set to
+	 *    0 by privileged user.
+	 *
+	 *    - OA buffer head/tail/status/buffer registers for read only
+	 *    - OA counters A18, A19, A20 for read/write
 	 */
-	return 6;
+	return 7;
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index c5c6adbe5b6f..b4c6bfc33a18 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -695,7 +695,8 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define  GEN7_OASTATUS2_HEAD_MASK           0xffffffc0
 #define  GEN7_OASTATUS2_MEM_SELECT_GGTT     (1 << 0) /* 0: PPGTT, 1: GGTT */
 
-#define GEN8_OASTATUS _MMIO(0x2b08)
+#define __GEN8_OASTATUS 0x2b08
+#define GEN8_OASTATUS _MMIO(__GEN8_OASTATUS)
 #define  GEN8_OASTATUS_TAIL_POINTER_WRAP    (1 << 17)
 #define  GEN8_OASTATUS_HEAD_POINTER_WRAP    (1 << 16)
 #define  GEN8_OASTATUS_OVERRUN_STATUS	    (1 << 3)
@@ -755,7 +756,8 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define  GEN12_OAG_OA_DEBUG_DISABLE_GO_1_0_REPORTS     (1 << 2)
 #define  GEN12_OAG_OA_DEBUG_DISABLE_CTX_SWITCH_REPORTS (1 << 1)
 
-#define GEN12_OAG_OASTATUS _MMIO(0xdafc)
+#define __GEN12_OAG_OASTATUS 0xdafc
+#define GEN12_OAG_OASTATUS _MMIO(__GEN12_OAG_OASTATUS)
 #define  GEN12_OAG_OASTATUS_COUNTER_OVERFLOW (1 << 2)
 #define  GEN12_OAG_OASTATUS_BUFFER_OVERFLOW  (1 << 1)
 #define  GEN12_OAG_OASTATUS_REPORT_LOST      (1 << 0)
@@ -998,6 +1000,16 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define OAREPORTTRIG8_NOA_SELECT_6_SHIFT    24
 #define OAREPORTTRIG8_NOA_SELECT_7_SHIFT    28
 
+/* Performance counters registers */
+#define __OA_PERF_COUNTER_A(idx)	(0x2800 + 8 * (idx))
+#define OA_PERF_COUNTER_A(idx)		_MMIO(__OA_PERF_COUNTER_A(idx))
+#define OA_PERF_COUNTER_A_UDW(idx)		_MMIO(__OA_PERF_COUNTER_A(idx) + 4)
+
+/* Gen12 Performance counters registers */
+#define __GEN12_OAG_PERF_COUNTER_A(idx)	(0xD980 + 8 * (idx))
+#define GEN12_OAG_PERF_COUNTER_A(idx)	_MMIO(__GEN12_OAG_PERF_COUNTER_A(idx))
+#define GEN12_OAG_PERF_COUNTER_A_UDW(idx) _MMIO(__GEN12_OAG_PERF_COUNTER_A(idx) + 4)
+
 /* Same layout as OASTARTTRIGX */
 #define GEN12_OAG_OASTARTTRIG1 _MMIO(0xd900)
 #define GEN12_OAG_OASTARTTRIG2 _MMIO(0xd904)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Intel-gfx] [PATCH 8/8] drm/i915/perf: Map OA buffer to user space for gen12 performance query
  2021-08-30 19:38 ` [Intel-gfx] " Umesh Nerlige Ramappa
@ 2021-08-30 19:38   ` Umesh Nerlige Ramappa
  -1 siblings, 0 replies; 26+ messages in thread
From: Umesh Nerlige Ramappa @ 2021-08-30 19:38 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
  Cc: dri-devel, daniel.vetter, Joonas Lahtinen, jason

i915 used to support time based sampling mode which is good for overall
system monitoring, but is not enough for query mode used to measure a
single draw call or dispatch. Gen9-Gen11 are using current i915 perf
implementation for query, but Gen12+ requires a new approach for query
based on triggered reports within oa buffer.

Triggering reports into the OA buffer is achieved by writing into a
a trigger register. Optionally an unused counter/register is set with a
marker value such that a triggered report can be identified in the OA
buffer. Reports are usually triggered at the start and end of work that
is measured.

Since OA buffer is large and queries can be frequent, an efficient way
to look for triggered reports is required. By knowing the current head
and tail offsets into the OA buffer, it is easier to determine the
locality of the reports of interest.

Current perf OA interface does not expose head/tail information to the
user and it filters out invalid reports before sending data to user.
Also considering limited size of user buffer used during a query,
creating a 1:1 copy of the OA buffer at the user space added undesired
complexity.

The solution was to map the OA buffer to user space provided

(1) that it is accessed from a privileged user.
(2) OA report filtering is not used.

These 2 conditions would satisfy the safety criteria that the current
perf interface addresses.

To enable the query:
- Add an ioctl to expose head and tail to the user
- Add an ioctl to return size and offset of the OA buffer
- Map the OA buffer to the user space

v2:
- Improve commit message (Chris)
- Do not mmap based on gem object filp. Instead, use perf_fd and support
  mmap syscall (Chris)
- Pass non-zero offset in mmap to enforce the right object is
  mapped (Chris)
- Do not expose gpu_address (Chris)
- Verify start and length of vma for page alignment (Lionel)
- Move SQNTL config out (Lionel)

v3: (Chris)
- Omit redundant checks
- Return VM_FAULT_SIGBUS is old stream is closed
- Maintain reference counts to stream in vm_open and vm_close
- Use switch to identify object to be mapped

v4: Call kref_put on closing perf fd (Chris)
v5:
- Strip access to OA buffer from unprivileged child of a privileged
  parent. Use VM_DONTCOPY
- Enforce MAP_PRIVATE by checking for VM_MAYSHARE

v6:
(Chris)
- Use len of -1 in unmap_mapping_range
- Don't use stream->oa_buffer.vma->obj in vm_fault_oa
- Use kernel block comment style
- do_mmap gets a reference to the file and puts it in do_munmap, so
  no need to maintain a reference to i915_perf_stream. Hence, remove
  vm_open/vm_close and stream->closed hooks/checks.
(Umesh)
- Do not allow mmap if SAMPLE_OA_REPORT is not set during
  i915_perf_open_ioctl.
- Drop ioctl returning head/tail since this information is already
  whitelisted. Remove hooks to read head register.

v7: (Chris)
- unmap before destroy
- change ioctl argument struct

v8: Documentation and more checks (Chris)
v9: Fix comment style (Umesh)
v10: Update uapi comment (Ashutosh)

Signed-off-by: Piotr Maciejewski <piotr.maciejewski@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gem/i915_gem_mman.c |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_mman.h |   2 +
 drivers/gpu/drm/i915/i915_perf.c         | 126 ++++++++++++++++++++++-
 include/uapi/drm/i915_drm.h              |  33 ++++++
 4 files changed, 161 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index 5130e8ed9564..84cdce2ee447 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -213,7 +213,7 @@ compute_partial_view(const struct drm_i915_gem_object *obj,
 	return view;
 }
 
-static vm_fault_t i915_error_to_vmf_fault(int err)
+vm_fault_t i915_error_to_vmf_fault(int err)
 {
 	switch (err) {
 	default:
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.h b/drivers/gpu/drm/i915/gem/i915_gem_mman.h
index efee9e0d2508..1190a3a228ea 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.h
@@ -29,4 +29,6 @@ void i915_gem_object_release_mmap_gtt(struct drm_i915_gem_object *obj);
 
 void i915_gem_object_release_mmap_offset(struct drm_i915_gem_object *obj);
 
+vm_fault_t i915_error_to_vmf_fault(int err);
+
 #endif
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index de3d1738aabe..1f8d4f3a2148 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -192,10 +192,12 @@
  */
 
 #include <linux/anon_inodes.h>
+#include <linux/mman.h>
 #include <linux/sizes.h>
 #include <linux/uuid.h>
 
 #include "gem/i915_gem_context.h"
+#include "gem/i915_gem_mman.h"
 #include "gt/intel_engine_pm.h"
 #include "gt/intel_engine_user.h"
 #include "gt/intel_execlists_submission.h"
@@ -3322,6 +3324,44 @@ static long i915_perf_config_locked(struct i915_perf_stream *stream,
 	return ret;
 }
 
+#define I915_PERF_OA_BUFFER_MMAP_OFFSET 1
+
+/**
+ * i915_perf_oa_buffer_info_locked - size and offset of the OA buffer
+ * @stream: i915 perf stream
+ * @cmd: ioctl command
+ * @arg: pointer to oa buffer info filled by this function.
+ */
+static int i915_perf_oa_buffer_info_locked(struct i915_perf_stream *stream,
+					   unsigned int cmd,
+					   unsigned long arg)
+{
+	struct drm_i915_perf_oa_buffer_info info;
+	void __user *output = (void __user *)arg;
+
+	if (i915_perf_stream_paranoid && !perfmon_capable()) {
+		DRM_DEBUG("Insufficient privileges to access OA buffer info\n");
+		return -EACCES;
+	}
+
+	if (_IOC_SIZE(cmd) != sizeof(info))
+		return -EINVAL;
+
+	if (copy_from_user(&info, output, sizeof(info)))
+		return -EFAULT;
+
+	if (info.type || info.flags || info.rsvd)
+		return -EINVAL;
+
+	info.size = stream->oa_buffer.vma->size;
+	info.offset = I915_PERF_OA_BUFFER_MMAP_OFFSET * PAGE_SIZE;
+
+	if (copy_to_user(output, &info, sizeof(info)))
+		return -EFAULT;
+
+	return 0;
+}
+
 /**
  * i915_perf_ioctl_locked - support ioctl() usage with i915 perf stream FDs
  * @stream: An i915 perf stream
@@ -3347,6 +3387,8 @@ static long i915_perf_ioctl_locked(struct i915_perf_stream *stream,
 		return 0;
 	case I915_PERF_IOCTL_CONFIG:
 		return i915_perf_config_locked(stream, arg);
+	case I915_PERF_IOCTL_GET_OA_BUFFER_INFO:
+		return i915_perf_oa_buffer_info_locked(stream, cmd, arg);
 	}
 
 	return -EINVAL;
@@ -3418,6 +3460,14 @@ static int i915_perf_release(struct inode *inode, struct file *file)
 	struct i915_perf_stream *stream = file->private_data;
 	struct i915_perf *perf = stream->perf;
 
+	/*
+	 * User could have multiple vmas from multiple mmaps. We want to zap
+	 * them all here. Note that a fresh fault cannot occur as the mmap holds
+	 * a reference to the stream via the vma->vm_file, so before user's
+	 * munmap, the stream cannot be destroyed.
+	 */
+	unmap_mapping_range(file->f_mapping, 0, -1, 1);
+
 	mutex_lock(&perf->lock);
 	i915_perf_destroy_locked(stream);
 	mutex_unlock(&perf->lock);
@@ -3428,6 +3478,75 @@ static int i915_perf_release(struct inode *inode, struct file *file)
 	return 0;
 }
 
+static vm_fault_t vm_fault_oa(struct vm_fault *vmf)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	struct i915_perf_stream *stream = vma->vm_private_data;
+	int err;
+
+	err = remap_io_sg(vma,
+			  vma->vm_start, vma->vm_end - vma->vm_start,
+			  stream->oa_buffer.vma->pages->sgl, -1);
+
+	return i915_error_to_vmf_fault(err);
+}
+
+static const struct vm_operations_struct vm_ops_oa = {
+	.fault = vm_fault_oa,
+};
+
+static int i915_perf_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct i915_perf_stream *stream = file->private_data;
+
+	/* mmap-ing OA buffer to user space MUST absolutely be privileged */
+	if (i915_perf_stream_paranoid && !perfmon_capable()) {
+		DRM_DEBUG("Insufficient privileges to map OA buffer\n");
+		return -EACCES;
+	}
+
+	switch (vma->vm_pgoff) {
+	/*
+	 * A non-zero offset ensures that we are mapping the right object. Also
+	 * leaves room for future objects added to this implementation.
+	 */
+	case I915_PERF_OA_BUFFER_MMAP_OFFSET:
+		if (!(stream->sample_flags & SAMPLE_OA_REPORT))
+			return -EINVAL;
+
+		if (vma->vm_end - vma->vm_start > stream->oa_buffer.vma->size)
+			return -EINVAL;
+
+		/*
+		 * Only support VM_READ. Enforce MAP_PRIVATE by checking for
+		 * VM_MAYSHARE.
+		 */
+		if (vma->vm_flags & (VM_WRITE | VM_EXEC |
+				     VM_SHARED | VM_MAYSHARE))
+			return -EINVAL;
+
+		vma->vm_flags &= ~(VM_MAYWRITE | VM_MAYEXEC);
+
+		/*
+		 * If the privileged parent forks and child drops root
+		 * privilege, we do not want the child to retain access to the
+		 * mapped OA buffer. Explicitly set VM_DONTCOPY to avoid such
+		 * cases.
+		 */
+		vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND |
+				 VM_DONTDUMP | VM_DONTCOPY;
+		break;
+
+	default:
+		return -EINVAL;
+	}
+
+	vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
+	vma->vm_private_data = stream;
+	vma->vm_ops = &vm_ops_oa;
+
+	return 0;
+}
 
 static const struct file_operations fops = {
 	.owner		= THIS_MODULE,
@@ -3440,6 +3559,7 @@ static const struct file_operations fops = {
 	 * to handle 32bits compatibility.
 	 */
 	.compat_ioctl   = i915_perf_ioctl,
+	.mmap		= i915_perf_mmap,
 };
 
 
@@ -4639,8 +4759,12 @@ int i915_perf_ioctl_version(void)
 	 *
 	 *    - OA buffer head/tail/status/buffer registers for read only
 	 *    - OA counters A18, A19, A20 for read/write
+	 *
+	 * 8: Added an option to map oa buffer at umd driver level and trigger
+	 *    oa reports within oa buffer from command buffer. See
+	 *    I915_PERF_IOCTL_GET_OA_BUFFER_INFO.
 	 */
-	return 7;
+	return 8;
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index bde5860b3686..2c17fe845604 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -2417,6 +2417,39 @@ struct drm_i915_perf_open_param {
  */
 #define I915_PERF_IOCTL_CONFIG	_IO('i', 0x2)
 
+/*
+ * Returns OA buffer properties to be used with mmap.
+ *
+ * This ioctl is available in perf revision 8.
+ */
+#define I915_PERF_IOCTL_GET_OA_BUFFER_INFO _IOWR('i', 0x3, struct drm_i915_perf_oa_buffer_info)
+
+/*
+ * OA buffer size and offset.
+ *
+ * OA output buffer
+ *   type: 0
+ *   flags: mbz
+ *
+ *   After querying the info, pass (size,offset) to mmap(),
+ *
+ *   mmap(0, info.size, PROT_READ, MAP_PRIVATE, perf_fd, info.offset).
+ *
+ *   Note that only a private (not shared between processes, or across fork())
+ *   read-only mmapping is allowed.
+ *
+ *   HW is continually writing data to the mapped  OA buffer and it conforms to
+ *   the OA format as specified by user config. The buffer provides reports that
+ *   have OA counters - A, B and C.
+ */
+struct drm_i915_perf_oa_buffer_info {
+	__u32 type;   /* in */
+	__u32 flags;  /* in */
+	__u64 size;   /* out */
+	__u64 offset; /* out */
+	__u64 rsvd;   /* mbz */
+};
+
 /*
  * Common to all i915 perf records
  */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 8/8] drm/i915/perf: Map OA buffer to user space for gen12 performance query
@ 2021-08-30 19:38   ` Umesh Nerlige Ramappa
  0 siblings, 0 replies; 26+ messages in thread
From: Umesh Nerlige Ramappa @ 2021-08-30 19:38 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
  Cc: dri-devel, daniel.vetter, Joonas Lahtinen, jason

i915 used to support time based sampling mode which is good for overall
system monitoring, but is not enough for query mode used to measure a
single draw call or dispatch. Gen9-Gen11 are using current i915 perf
implementation for query, but Gen12+ requires a new approach for query
based on triggered reports within oa buffer.

Triggering reports into the OA buffer is achieved by writing into a
a trigger register. Optionally an unused counter/register is set with a
marker value such that a triggered report can be identified in the OA
buffer. Reports are usually triggered at the start and end of work that
is measured.

Since OA buffer is large and queries can be frequent, an efficient way
to look for triggered reports is required. By knowing the current head
and tail offsets into the OA buffer, it is easier to determine the
locality of the reports of interest.

Current perf OA interface does not expose head/tail information to the
user and it filters out invalid reports before sending data to user.
Also considering limited size of user buffer used during a query,
creating a 1:1 copy of the OA buffer at the user space added undesired
complexity.

The solution was to map the OA buffer to user space provided

(1) that it is accessed from a privileged user.
(2) OA report filtering is not used.

These 2 conditions would satisfy the safety criteria that the current
perf interface addresses.

To enable the query:
- Add an ioctl to expose head and tail to the user
- Add an ioctl to return size and offset of the OA buffer
- Map the OA buffer to the user space

v2:
- Improve commit message (Chris)
- Do not mmap based on gem object filp. Instead, use perf_fd and support
  mmap syscall (Chris)
- Pass non-zero offset in mmap to enforce the right object is
  mapped (Chris)
- Do not expose gpu_address (Chris)
- Verify start and length of vma for page alignment (Lionel)
- Move SQNTL config out (Lionel)

v3: (Chris)
- Omit redundant checks
- Return VM_FAULT_SIGBUS is old stream is closed
- Maintain reference counts to stream in vm_open and vm_close
- Use switch to identify object to be mapped

v4: Call kref_put on closing perf fd (Chris)
v5:
- Strip access to OA buffer from unprivileged child of a privileged
  parent. Use VM_DONTCOPY
- Enforce MAP_PRIVATE by checking for VM_MAYSHARE

v6:
(Chris)
- Use len of -1 in unmap_mapping_range
- Don't use stream->oa_buffer.vma->obj in vm_fault_oa
- Use kernel block comment style
- do_mmap gets a reference to the file and puts it in do_munmap, so
  no need to maintain a reference to i915_perf_stream. Hence, remove
  vm_open/vm_close and stream->closed hooks/checks.
(Umesh)
- Do not allow mmap if SAMPLE_OA_REPORT is not set during
  i915_perf_open_ioctl.
- Drop ioctl returning head/tail since this information is already
  whitelisted. Remove hooks to read head register.

v7: (Chris)
- unmap before destroy
- change ioctl argument struct

v8: Documentation and more checks (Chris)
v9: Fix comment style (Umesh)
v10: Update uapi comment (Ashutosh)

Signed-off-by: Piotr Maciejewski <piotr.maciejewski@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gem/i915_gem_mman.c |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_mman.h |   2 +
 drivers/gpu/drm/i915/i915_perf.c         | 126 ++++++++++++++++++++++-
 include/uapi/drm/i915_drm.h              |  33 ++++++
 4 files changed, 161 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index 5130e8ed9564..84cdce2ee447 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -213,7 +213,7 @@ compute_partial_view(const struct drm_i915_gem_object *obj,
 	return view;
 }
 
-static vm_fault_t i915_error_to_vmf_fault(int err)
+vm_fault_t i915_error_to_vmf_fault(int err)
 {
 	switch (err) {
 	default:
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.h b/drivers/gpu/drm/i915/gem/i915_gem_mman.h
index efee9e0d2508..1190a3a228ea 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.h
@@ -29,4 +29,6 @@ void i915_gem_object_release_mmap_gtt(struct drm_i915_gem_object *obj);
 
 void i915_gem_object_release_mmap_offset(struct drm_i915_gem_object *obj);
 
+vm_fault_t i915_error_to_vmf_fault(int err);
+
 #endif
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index de3d1738aabe..1f8d4f3a2148 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -192,10 +192,12 @@
  */
 
 #include <linux/anon_inodes.h>
+#include <linux/mman.h>
 #include <linux/sizes.h>
 #include <linux/uuid.h>
 
 #include "gem/i915_gem_context.h"
+#include "gem/i915_gem_mman.h"
 #include "gt/intel_engine_pm.h"
 #include "gt/intel_engine_user.h"
 #include "gt/intel_execlists_submission.h"
@@ -3322,6 +3324,44 @@ static long i915_perf_config_locked(struct i915_perf_stream *stream,
 	return ret;
 }
 
+#define I915_PERF_OA_BUFFER_MMAP_OFFSET 1
+
+/**
+ * i915_perf_oa_buffer_info_locked - size and offset of the OA buffer
+ * @stream: i915 perf stream
+ * @cmd: ioctl command
+ * @arg: pointer to oa buffer info filled by this function.
+ */
+static int i915_perf_oa_buffer_info_locked(struct i915_perf_stream *stream,
+					   unsigned int cmd,
+					   unsigned long arg)
+{
+	struct drm_i915_perf_oa_buffer_info info;
+	void __user *output = (void __user *)arg;
+
+	if (i915_perf_stream_paranoid && !perfmon_capable()) {
+		DRM_DEBUG("Insufficient privileges to access OA buffer info\n");
+		return -EACCES;
+	}
+
+	if (_IOC_SIZE(cmd) != sizeof(info))
+		return -EINVAL;
+
+	if (copy_from_user(&info, output, sizeof(info)))
+		return -EFAULT;
+
+	if (info.type || info.flags || info.rsvd)
+		return -EINVAL;
+
+	info.size = stream->oa_buffer.vma->size;
+	info.offset = I915_PERF_OA_BUFFER_MMAP_OFFSET * PAGE_SIZE;
+
+	if (copy_to_user(output, &info, sizeof(info)))
+		return -EFAULT;
+
+	return 0;
+}
+
 /**
  * i915_perf_ioctl_locked - support ioctl() usage with i915 perf stream FDs
  * @stream: An i915 perf stream
@@ -3347,6 +3387,8 @@ static long i915_perf_ioctl_locked(struct i915_perf_stream *stream,
 		return 0;
 	case I915_PERF_IOCTL_CONFIG:
 		return i915_perf_config_locked(stream, arg);
+	case I915_PERF_IOCTL_GET_OA_BUFFER_INFO:
+		return i915_perf_oa_buffer_info_locked(stream, cmd, arg);
 	}
 
 	return -EINVAL;
@@ -3418,6 +3460,14 @@ static int i915_perf_release(struct inode *inode, struct file *file)
 	struct i915_perf_stream *stream = file->private_data;
 	struct i915_perf *perf = stream->perf;
 
+	/*
+	 * User could have multiple vmas from multiple mmaps. We want to zap
+	 * them all here. Note that a fresh fault cannot occur as the mmap holds
+	 * a reference to the stream via the vma->vm_file, so before user's
+	 * munmap, the stream cannot be destroyed.
+	 */
+	unmap_mapping_range(file->f_mapping, 0, -1, 1);
+
 	mutex_lock(&perf->lock);
 	i915_perf_destroy_locked(stream);
 	mutex_unlock(&perf->lock);
@@ -3428,6 +3478,75 @@ static int i915_perf_release(struct inode *inode, struct file *file)
 	return 0;
 }
 
+static vm_fault_t vm_fault_oa(struct vm_fault *vmf)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	struct i915_perf_stream *stream = vma->vm_private_data;
+	int err;
+
+	err = remap_io_sg(vma,
+			  vma->vm_start, vma->vm_end - vma->vm_start,
+			  stream->oa_buffer.vma->pages->sgl, -1);
+
+	return i915_error_to_vmf_fault(err);
+}
+
+static const struct vm_operations_struct vm_ops_oa = {
+	.fault = vm_fault_oa,
+};
+
+static int i915_perf_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct i915_perf_stream *stream = file->private_data;
+
+	/* mmap-ing OA buffer to user space MUST absolutely be privileged */
+	if (i915_perf_stream_paranoid && !perfmon_capable()) {
+		DRM_DEBUG("Insufficient privileges to map OA buffer\n");
+		return -EACCES;
+	}
+
+	switch (vma->vm_pgoff) {
+	/*
+	 * A non-zero offset ensures that we are mapping the right object. Also
+	 * leaves room for future objects added to this implementation.
+	 */
+	case I915_PERF_OA_BUFFER_MMAP_OFFSET:
+		if (!(stream->sample_flags & SAMPLE_OA_REPORT))
+			return -EINVAL;
+
+		if (vma->vm_end - vma->vm_start > stream->oa_buffer.vma->size)
+			return -EINVAL;
+
+		/*
+		 * Only support VM_READ. Enforce MAP_PRIVATE by checking for
+		 * VM_MAYSHARE.
+		 */
+		if (vma->vm_flags & (VM_WRITE | VM_EXEC |
+				     VM_SHARED | VM_MAYSHARE))
+			return -EINVAL;
+
+		vma->vm_flags &= ~(VM_MAYWRITE | VM_MAYEXEC);
+
+		/*
+		 * If the privileged parent forks and child drops root
+		 * privilege, we do not want the child to retain access to the
+		 * mapped OA buffer. Explicitly set VM_DONTCOPY to avoid such
+		 * cases.
+		 */
+		vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND |
+				 VM_DONTDUMP | VM_DONTCOPY;
+		break;
+
+	default:
+		return -EINVAL;
+	}
+
+	vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
+	vma->vm_private_data = stream;
+	vma->vm_ops = &vm_ops_oa;
+
+	return 0;
+}
 
 static const struct file_operations fops = {
 	.owner		= THIS_MODULE,
@@ -3440,6 +3559,7 @@ static const struct file_operations fops = {
 	 * to handle 32bits compatibility.
 	 */
 	.compat_ioctl   = i915_perf_ioctl,
+	.mmap		= i915_perf_mmap,
 };
 
 
@@ -4639,8 +4759,12 @@ int i915_perf_ioctl_version(void)
 	 *
 	 *    - OA buffer head/tail/status/buffer registers for read only
 	 *    - OA counters A18, A19, A20 for read/write
+	 *
+	 * 8: Added an option to map oa buffer at umd driver level and trigger
+	 *    oa reports within oa buffer from command buffer. See
+	 *    I915_PERF_IOCTL_GET_OA_BUFFER_INFO.
 	 */
-	return 7;
+	return 8;
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index bde5860b3686..2c17fe845604 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -2417,6 +2417,39 @@ struct drm_i915_perf_open_param {
  */
 #define I915_PERF_IOCTL_CONFIG	_IO('i', 0x2)
 
+/*
+ * Returns OA buffer properties to be used with mmap.
+ *
+ * This ioctl is available in perf revision 8.
+ */
+#define I915_PERF_IOCTL_GET_OA_BUFFER_INFO _IOWR('i', 0x3, struct drm_i915_perf_oa_buffer_info)
+
+/*
+ * OA buffer size and offset.
+ *
+ * OA output buffer
+ *   type: 0
+ *   flags: mbz
+ *
+ *   After querying the info, pass (size,offset) to mmap(),
+ *
+ *   mmap(0, info.size, PROT_READ, MAP_PRIVATE, perf_fd, info.offset).
+ *
+ *   Note that only a private (not shared between processes, or across fork())
+ *   read-only mmapping is allowed.
+ *
+ *   HW is continually writing data to the mapped  OA buffer and it conforms to
+ *   the OA format as specified by user config. The buffer provides reports that
+ *   have OA counters - A, B and C.
+ */
+struct drm_i915_perf_oa_buffer_info {
+	__u32 type;   /* in */
+	__u32 flags;  /* in */
+	__u64 size;   /* out */
+	__u64 offset; /* out */
+	__u64 rsvd;   /* mbz */
+};
+
 /*
  * Common to all i915 perf records
  */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BAT: failure for Enable triggered perf query for Xe_HP (rev2)
  2021-08-30 19:38 ` [Intel-gfx] " Umesh Nerlige Ramappa
                   ` (8 preceding siblings ...)
  (?)
@ 2021-08-30 22:16 ` Patchwork
  -1 siblings, 0 replies; 26+ messages in thread
From: Patchwork @ 2021-08-30 22:16 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 4963 bytes --]

== Series Details ==

Series: Enable triggered perf query for Xe_HP (rev2)
URL   : https://patchwork.freedesktop.org/series/93357/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_10538 -> Patchwork_20922
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_20922 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_20922, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20922/index.html

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_20922:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_selftest@live@workarounds:
    - fi-bsw-kefka:       [PASS][1] -> [DMESG-FAIL][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10538/fi-bsw-kefka/igt@i915_selftest@live@workarounds.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20922/fi-bsw-kefka/igt@i915_selftest@live@workarounds.html
    - fi-bsw-nick:        [PASS][3] -> [DMESG-FAIL][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10538/fi-bsw-nick/igt@i915_selftest@live@workarounds.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20922/fi-bsw-nick/igt@i915_selftest@live@workarounds.html

  
Known issues
------------

  Here are the changes found in Patchwork_20922 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@amdgpu/amd_basic@cs-gfx:
    - fi-rkl-guc:         NOTRUN -> [SKIP][5] ([fdo#109315]) +17 similar issues
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20922/fi-rkl-guc/igt@amdgpu/amd_basic@cs-gfx.html

  * igt@i915_selftest@live@gt_heartbeat:
    - fi-icl-y:           [PASS][6] -> [DMESG-FAIL][7] ([i915#2291] / [i915#541])
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10538/fi-icl-y/igt@i915_selftest@live@gt_heartbeat.html
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20922/fi-icl-y/igt@i915_selftest@live@gt_heartbeat.html

  * igt@i915_selftest@live@gt_lrc:
    - fi-rkl-guc:         NOTRUN -> [DMESG-WARN][8] ([i915#3958])
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20922/fi-rkl-guc/igt@i915_selftest@live@gt_lrc.html

  
#### Possible fixes ####

  * igt@core_hotunplug@unbind-rebind:
    - fi-rkl-guc:         [DMESG-WARN][9] ([i915#3925]) -> [PASS][10]
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10538/fi-rkl-guc/igt@core_hotunplug@unbind-rebind.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20922/fi-rkl-guc/igt@core_hotunplug@unbind-rebind.html

  * igt@gem_exec_suspend@basic-s3:
    - fi-tgl-1115g4:      [FAIL][11] ([i915#1888]) -> [PASS][12]
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10538/fi-tgl-1115g4/igt@gem_exec_suspend@basic-s3.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20922/fi-tgl-1115g4/igt@gem_exec_suspend@basic-s3.html

  
  [fdo#109315]: https://bugs.freedesktop.org/show_bug.cgi?id=109315
  [i915#1888]: https://gitlab.freedesktop.org/drm/intel/issues/1888
  [i915#2291]: https://gitlab.freedesktop.org/drm/intel/issues/2291
  [i915#3925]: https://gitlab.freedesktop.org/drm/intel/issues/3925
  [i915#3958]: https://gitlab.freedesktop.org/drm/intel/issues/3958
  [i915#541]: https://gitlab.freedesktop.org/drm/intel/issues/541


Participating hosts (44 -> 35)
------------------------------

  Missing    (9): fi-kbl-soraka fi-ilk-m540 bat-adls-5 bat-dg1-6 bat-dg1-5 fi-bsw-cyan fi-ctg-p8600 fi-bdw-samus bat-jsl-1 


Build changes
-------------

  * IGT: IGT_6191 -> IGTPW_6174
  * Linux: CI_DRM_10538 -> Patchwork_20922

  CI-20190529: 20190529
  CI_DRM_10538: 031f5dcf3ec226bcd0d5f700347780d51cddd2eb @ git://anongit.freedesktop.org/gfx-ci/linux
  IGTPW_6174: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6174/index.html
  IGT_6191: e9292b533691784f46eeb9bae522ca7a8710c920 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_20922: ba4c9f5e219912607fd084988eb7b3e3ca3ce68b @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

ba4c9f5e2199 drm/i915/perf: Map OA buffer to user space for gen12 performance query
0e1c469625db drm/i915/perf: Whitelist OA counter and buffer registers
316d15c65b1a drm/i915/perf: Whitelist OA report trigger registers
818814290091 drm/i915/perf: Ensure observation logic is not clock gated
11ac7d470d99 drm/i915/gt: Enable dynamic adjustment of RING_NONPRIV
813712375355 drm/i915/gt: Check for conflicting RING_NONPRIV
b0e3395c01ae drm/i915/gt: Refactor _wa_add to reuse wa_index and wa_list_grow
200d46fa9ad3 drm/i915/gt: Lock intel_engine_apply_whitelist with uncore->lock

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20922/index.html

[-- Attachment #2: Type: text/html, Size: 5810 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Intel-gfx] [PATCH 3/8] drm/i915/gt: Check for conflicting RING_NONPRIV
  2021-08-30 19:38   ` [Intel-gfx] " Umesh Nerlige Ramappa
@ 2021-08-31  3:39     ` kernel test robot
  -1 siblings, 0 replies; 26+ messages in thread
From: kernel test robot @ 2021-08-31  3:39 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa, intel-gfx, Lionel G Landwerlin, Ashutosh Dixit
  Cc: llvm, kbuild-all, dri-devel, daniel.vetter, Joonas Lahtinen, jason

[-- Attachment #1: Type: text/plain, Size: 2209 bytes --]

Hi Umesh,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on drm-intel/for-linux-next]
[also build test WARNING on drm-tip/drm-tip drm-exynos/exynos-drm-next next-20210830]
[cannot apply to tegra-drm/drm/tegra/for-next linus/master drm/drm-next v5.14]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Umesh-Nerlige-Ramappa/Enable-triggered-perf-query-for-Xe_HP/20210831-034105
base:   git://anongit.freedesktop.org/drm-intel for-linux-next
config: x86_64-randconfig-a004-20210830 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 4b1fde8a2b681dad2ce0c082a5d6422caa06b0bc)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/81219176b43209d12b8212d1281fbfed9546087f
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Umesh-Nerlige-Ramappa/Enable-triggered-perf-query-for-Xe_HP/20210831-034105
        git checkout 81219176b43209d12b8212d1281fbfed9546087f
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> drivers/gpu/drm/i915/gt/intel_workarounds.c:88:12: warning: function 'reg_flags' is not needed and will not be emitted [-Wunneeded-internal-declaration]
   static u32 reg_flags(i915_reg_t reg)
              ^
   1 warning generated.


vim +/reg_flags +88 drivers/gpu/drm/i915/gt/intel_workarounds.c

    87	
  > 88	static u32 reg_flags(i915_reg_t reg)
    89	{
    90		return i915_mmio_reg_offset(reg) & ~RING_FORCE_TO_NONPRIV_ADDRESS_MASK;
    91	}
    92	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 34354 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Intel-gfx] [PATCH 3/8] drm/i915/gt: Check for conflicting RING_NONPRIV
@ 2021-08-31  3:39     ` kernel test robot
  0 siblings, 0 replies; 26+ messages in thread
From: kernel test robot @ 2021-08-31  3:39 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2259 bytes --]

Hi Umesh,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on drm-intel/for-linux-next]
[also build test WARNING on drm-tip/drm-tip drm-exynos/exynos-drm-next next-20210830]
[cannot apply to tegra-drm/drm/tegra/for-next linus/master drm/drm-next v5.14]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Umesh-Nerlige-Ramappa/Enable-triggered-perf-query-for-Xe_HP/20210831-034105
base:   git://anongit.freedesktop.org/drm-intel for-linux-next
config: x86_64-randconfig-a004-20210830 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 4b1fde8a2b681dad2ce0c082a5d6422caa06b0bc)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/81219176b43209d12b8212d1281fbfed9546087f
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Umesh-Nerlige-Ramappa/Enable-triggered-perf-query-for-Xe_HP/20210831-034105
        git checkout 81219176b43209d12b8212d1281fbfed9546087f
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> drivers/gpu/drm/i915/gt/intel_workarounds.c:88:12: warning: function 'reg_flags' is not needed and will not be emitted [-Wunneeded-internal-declaration]
   static u32 reg_flags(i915_reg_t reg)
              ^
   1 warning generated.


vim +/reg_flags +88 drivers/gpu/drm/i915/gt/intel_workarounds.c

    87	
  > 88	static u32 reg_flags(i915_reg_t reg)
    89	{
    90		return i915_mmio_reg_offset(reg) & ~RING_FORCE_TO_NONPRIV_ADDRESS_MASK;
    91	}
    92	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 34354 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 8/8] drm/i915/perf: Map OA buffer to user space for gen12 performance query
  2021-08-30 19:38   ` Umesh Nerlige Ramappa
@ 2021-08-31 12:55     ` Daniel Vetter
  -1 siblings, 0 replies; 26+ messages in thread
From: Daniel Vetter @ 2021-08-31 12:55 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa
  Cc: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit, dri-devel,
	daniel.vetter, Joonas Lahtinen, jason

On Mon, Aug 30, 2021 at 12:38:51PM -0700, Umesh Nerlige Ramappa wrote:
> i915 used to support time based sampling mode which is good for overall
> system monitoring, but is not enough for query mode used to measure a
> single draw call or dispatch. Gen9-Gen11 are using current i915 perf
> implementation for query, but Gen12+ requires a new approach for query
> based on triggered reports within oa buffer.
> 
> Triggering reports into the OA buffer is achieved by writing into a
> a trigger register. Optionally an unused counter/register is set with a
> marker value such that a triggered report can be identified in the OA
> buffer. Reports are usually triggered at the start and end of work that
> is measured.
> 
> Since OA buffer is large and queries can be frequent, an efficient way
> to look for triggered reports is required. By knowing the current head
> and tail offsets into the OA buffer, it is easier to determine the
> locality of the reports of interest.
> 
> Current perf OA interface does not expose head/tail information to the
> user and it filters out invalid reports before sending data to user.
> Also considering limited size of user buffer used during a query,
> creating a 1:1 copy of the OA buffer at the user space added undesired
> complexity.
> 
> The solution was to map the OA buffer to user space provided
> 
> (1) that it is accessed from a privileged user.
> (2) OA report filtering is not used.
> 
> These 2 conditions would satisfy the safety criteria that the current
> perf interface addresses.

This is a perf improvement. Please include benchmark numbers to justify
it.

> 
> To enable the query:
> - Add an ioctl to expose head and tail to the user
> - Add an ioctl to return size and offset of the OA buffer
> - Map the OA buffer to the user space
> 
> v2:
> - Improve commit message (Chris)
> - Do not mmap based on gem object filp. Instead, use perf_fd and support
>   mmap syscall (Chris)
> - Pass non-zero offset in mmap to enforce the right object is
>   mapped (Chris)
> - Do not expose gpu_address (Chris)
> - Verify start and length of vma for page alignment (Lionel)
> - Move SQNTL config out (Lionel)
> 
> v3: (Chris)
> - Omit redundant checks
> - Return VM_FAULT_SIGBUS is old stream is closed
> - Maintain reference counts to stream in vm_open and vm_close
> - Use switch to identify object to be mapped
> 
> v4: Call kref_put on closing perf fd (Chris)
> v5:
> - Strip access to OA buffer from unprivileged child of a privileged
>   parent. Use VM_DONTCOPY
> - Enforce MAP_PRIVATE by checking for VM_MAYSHARE
> 
> v6:
> (Chris)
> - Use len of -1 in unmap_mapping_range
> - Don't use stream->oa_buffer.vma->obj in vm_fault_oa
> - Use kernel block comment style
> - do_mmap gets a reference to the file and puts it in do_munmap, so
>   no need to maintain a reference to i915_perf_stream. Hence, remove
>   vm_open/vm_close and stream->closed hooks/checks.
> (Umesh)
> - Do not allow mmap if SAMPLE_OA_REPORT is not set during
>   i915_perf_open_ioctl.
> - Drop ioctl returning head/tail since this information is already
>   whitelisted. Remove hooks to read head register.
> 
> v7: (Chris)
> - unmap before destroy
> - change ioctl argument struct
> 
> v8: Documentation and more checks (Chris)
> v9: Fix comment style (Umesh)
> v10: Update uapi comment (Ashutosh)
> 
> Signed-off-by: Piotr Maciejewski <piotr.maciejewski@intel.com>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_mman.c |   2 +-
>  drivers/gpu/drm/i915/gem/i915_gem_mman.h |   2 +
>  drivers/gpu/drm/i915/i915_perf.c         | 126 ++++++++++++++++++++++-
>  include/uapi/drm/i915_drm.h              |  33 ++++++
>  4 files changed, 161 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> index 5130e8ed9564..84cdce2ee447 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> @@ -213,7 +213,7 @@ compute_partial_view(const struct drm_i915_gem_object *obj,
>  	return view;
>  }
>  
> -static vm_fault_t i915_error_to_vmf_fault(int err)
> +vm_fault_t i915_error_to_vmf_fault(int err)
>  {
>  	switch (err) {
>  	default:
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.h b/drivers/gpu/drm/i915/gem/i915_gem_mman.h
> index efee9e0d2508..1190a3a228ea 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.h
> @@ -29,4 +29,6 @@ void i915_gem_object_release_mmap_gtt(struct drm_i915_gem_object *obj);
>  
>  void i915_gem_object_release_mmap_offset(struct drm_i915_gem_object *obj);
>  
> +vm_fault_t i915_error_to_vmf_fault(int err);
> +
>  #endif
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index de3d1738aabe..1f8d4f3a2148 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -192,10 +192,12 @@
>   */
>  
>  #include <linux/anon_inodes.h>
> +#include <linux/mman.h>
>  #include <linux/sizes.h>
>  #include <linux/uuid.h>
>  
>  #include "gem/i915_gem_context.h"
> +#include "gem/i915_gem_mman.h"
>  #include "gt/intel_engine_pm.h"
>  #include "gt/intel_engine_user.h"
>  #include "gt/intel_execlists_submission.h"
> @@ -3322,6 +3324,44 @@ static long i915_perf_config_locked(struct i915_perf_stream *stream,
>  	return ret;
>  }
>  
> +#define I915_PERF_OA_BUFFER_MMAP_OFFSET 1
> +
> +/**
> + * i915_perf_oa_buffer_info_locked - size and offset of the OA buffer
> + * @stream: i915 perf stream
> + * @cmd: ioctl command
> + * @arg: pointer to oa buffer info filled by this function.
> + */
> +static int i915_perf_oa_buffer_info_locked(struct i915_perf_stream *stream,
> +					   unsigned int cmd,
> +					   unsigned long arg)
> +{
> +	struct drm_i915_perf_oa_buffer_info info;
> +	void __user *output = (void __user *)arg;
> +
> +	if (i915_perf_stream_paranoid && !perfmon_capable()) {
> +		DRM_DEBUG("Insufficient privileges to access OA buffer info\n");
> +		return -EACCES;
> +	}
> +
> +	if (_IOC_SIZE(cmd) != sizeof(info))
> +		return -EINVAL;
> +
> +	if (copy_from_user(&info, output, sizeof(info)))
> +		return -EFAULT;
> +
> +	if (info.type || info.flags || info.rsvd)
> +		return -EINVAL;
> +
> +	info.size = stream->oa_buffer.vma->size;
> +	info.offset = I915_PERF_OA_BUFFER_MMAP_OFFSET * PAGE_SIZE;
> +
> +	if (copy_to_user(output, &info, sizeof(info)))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
>  /**
>   * i915_perf_ioctl_locked - support ioctl() usage with i915 perf stream FDs
>   * @stream: An i915 perf stream
> @@ -3347,6 +3387,8 @@ static long i915_perf_ioctl_locked(struct i915_perf_stream *stream,
>  		return 0;
>  	case I915_PERF_IOCTL_CONFIG:
>  		return i915_perf_config_locked(stream, arg);
> +	case I915_PERF_IOCTL_GET_OA_BUFFER_INFO:
> +		return i915_perf_oa_buffer_info_locked(stream, cmd, arg);
>  	}
>  
>  	return -EINVAL;
> @@ -3418,6 +3460,14 @@ static int i915_perf_release(struct inode *inode, struct file *file)
>  	struct i915_perf_stream *stream = file->private_data;
>  	struct i915_perf *perf = stream->perf;
>  
> +	/*
> +	 * User could have multiple vmas from multiple mmaps. We want to zap
> +	 * them all here. Note that a fresh fault cannot occur as the mmap holds
> +	 * a reference to the stream via the vma->vm_file, so before user's
> +	 * munmap, the stream cannot be destroyed.
> +	 */
> +	unmap_mapping_range(file->f_mapping, 0, -1, 1);
> +
>  	mutex_lock(&perf->lock);
>  	i915_perf_destroy_locked(stream);
>  	mutex_unlock(&perf->lock);
> @@ -3428,6 +3478,75 @@ static int i915_perf_release(struct inode *inode, struct file *file)
>  	return 0;
>  }
>  
> +static vm_fault_t vm_fault_oa(struct vm_fault *vmf)
> +{
> +	struct vm_area_struct *vma = vmf->vma;
> +	struct i915_perf_stream *stream = vma->vm_private_data;
> +	int err;
> +
> +	err = remap_io_sg(vma,
> +			  vma->vm_start, vma->vm_end - vma->vm_start,
> +			  stream->oa_buffer.vma->pages->sgl, -1);
> +
> +	return i915_error_to_vmf_fault(err);
> +}
> +
> +static const struct vm_operations_struct vm_ops_oa = {
> +	.fault = vm_fault_oa,
> +};
> +
> +static int i915_perf_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> +	struct i915_perf_stream *stream = file->private_data;
> +
> +	/* mmap-ing OA buffer to user space MUST absolutely be privileged */
> +	if (i915_perf_stream_paranoid && !perfmon_capable()) {
> +		DRM_DEBUG("Insufficient privileges to map OA buffer\n");
> +		return -EACCES;
> +	}
> +
> +	switch (vma->vm_pgoff) {
> +	/*
> +	 * A non-zero offset ensures that we are mapping the right object. Also
> +	 * leaves room for future objects added to this implementation.
> +	 */
> +	case I915_PERF_OA_BUFFER_MMAP_OFFSET:
> +		if (!(stream->sample_flags & SAMPLE_OA_REPORT))
> +			return -EINVAL;
> +
> +		if (vma->vm_end - vma->vm_start > stream->oa_buffer.vma->size)
> +			return -EINVAL;
> +
> +		/*
> +		 * Only support VM_READ. Enforce MAP_PRIVATE by checking for
> +		 * VM_MAYSHARE.
> +		 */
> +		if (vma->vm_flags & (VM_WRITE | VM_EXEC |
> +				     VM_SHARED | VM_MAYSHARE))
> +			return -EINVAL;
> +
> +		vma->vm_flags &= ~(VM_MAYWRITE | VM_MAYEXEC);
> +
> +		/*
> +		 * If the privileged parent forks and child drops root
> +		 * privilege, we do not want the child to retain access to the
> +		 * mapped OA buffer. Explicitly set VM_DONTCOPY to avoid such
> +		 * cases.
> +		 */
> +		vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND |
> +				 VM_DONTDUMP | VM_DONTCOPY;
> +		break;
> +
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
> +	vma->vm_private_data = stream;
> +	vma->vm_ops = &vm_ops_oa;
> +
> +	return 0;
> +}
>  
>  static const struct file_operations fops = {
>  	.owner		= THIS_MODULE,
> @@ -3440,6 +3559,7 @@ static const struct file_operations fops = {
>  	 * to handle 32bits compatibility.
>  	 */
>  	.compat_ioctl   = i915_perf_ioctl,
> +	.mmap		= i915_perf_mmap,
>  };
>  
>  
> @@ -4639,8 +4759,12 @@ int i915_perf_ioctl_version(void)
>  	 *
>  	 *    - OA buffer head/tail/status/buffer registers for read only
>  	 *    - OA counters A18, A19, A20 for read/write
> +	 *
> +	 * 8: Added an option to map oa buffer at umd driver level and trigger
> +	 *    oa reports within oa buffer from command buffer. See
> +	 *    I915_PERF_IOCTL_GET_OA_BUFFER_INFO.
>  	 */
> -	return 7;
> +	return 8;
>  }
>  
>  #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index bde5860b3686..2c17fe845604 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -2417,6 +2417,39 @@ struct drm_i915_perf_open_param {
>   */
>  #define I915_PERF_IOCTL_CONFIG	_IO('i', 0x2)
>  
> +/*

Please make this proper kerneldoc. Ideally also fix up all the other perf
related uapi and also document it all with kerneldoc.

Please make sure the result looks good with

$ make htmldocs

Also, this is uapi, therefore your patch needs to include
- link to the igts for this
- link to the userspace MR that uses this (I guess it's mesa?)

Cheers, Daniel

> + * Returns OA buffer properties to be used with mmap.
> + *
> + * This ioctl is available in perf revision 8.
> + */
> +#define I915_PERF_IOCTL_GET_OA_BUFFER_INFO _IOWR('i', 0x3, struct drm_i915_perf_oa_buffer_info)
> +
> +/*
> + * OA buffer size and offset.
> + *
> + * OA output buffer
> + *   type: 0
> + *   flags: mbz
> + *
> + *   After querying the info, pass (size,offset) to mmap(),
> + *
> + *   mmap(0, info.size, PROT_READ, MAP_PRIVATE, perf_fd, info.offset).
> + *
> + *   Note that only a private (not shared between processes, or across fork())
> + *   read-only mmapping is allowed.
> + *
> + *   HW is continually writing data to the mapped  OA buffer and it conforms to
> + *   the OA format as specified by user config. The buffer provides reports that
> + *   have OA counters - A, B and C.
> + */
> +struct drm_i915_perf_oa_buffer_info {
> +	__u32 type;   /* in */
> +	__u32 flags;  /* in */
> +	__u64 size;   /* out */
> +	__u64 offset; /* out */
> +	__u64 rsvd;   /* mbz */
> +};
> +
>  /*
>   * Common to all i915 perf records
>   */
> -- 
> 2.20.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Intel-gfx] [PATCH 8/8] drm/i915/perf: Map OA buffer to user space for gen12 performance query
@ 2021-08-31 12:55     ` Daniel Vetter
  0 siblings, 0 replies; 26+ messages in thread
From: Daniel Vetter @ 2021-08-31 12:55 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa
  Cc: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit, dri-devel,
	daniel.vetter, Joonas Lahtinen, jason

On Mon, Aug 30, 2021 at 12:38:51PM -0700, Umesh Nerlige Ramappa wrote:
> i915 used to support time based sampling mode which is good for overall
> system monitoring, but is not enough for query mode used to measure a
> single draw call or dispatch. Gen9-Gen11 are using current i915 perf
> implementation for query, but Gen12+ requires a new approach for query
> based on triggered reports within oa buffer.
> 
> Triggering reports into the OA buffer is achieved by writing into a
> a trigger register. Optionally an unused counter/register is set with a
> marker value such that a triggered report can be identified in the OA
> buffer. Reports are usually triggered at the start and end of work that
> is measured.
> 
> Since OA buffer is large and queries can be frequent, an efficient way
> to look for triggered reports is required. By knowing the current head
> and tail offsets into the OA buffer, it is easier to determine the
> locality of the reports of interest.
> 
> Current perf OA interface does not expose head/tail information to the
> user and it filters out invalid reports before sending data to user.
> Also considering limited size of user buffer used during a query,
> creating a 1:1 copy of the OA buffer at the user space added undesired
> complexity.
> 
> The solution was to map the OA buffer to user space provided
> 
> (1) that it is accessed from a privileged user.
> (2) OA report filtering is not used.
> 
> These 2 conditions would satisfy the safety criteria that the current
> perf interface addresses.

This is a perf improvement. Please include benchmark numbers to justify
it.

> 
> To enable the query:
> - Add an ioctl to expose head and tail to the user
> - Add an ioctl to return size and offset of the OA buffer
> - Map the OA buffer to the user space
> 
> v2:
> - Improve commit message (Chris)
> - Do not mmap based on gem object filp. Instead, use perf_fd and support
>   mmap syscall (Chris)
> - Pass non-zero offset in mmap to enforce the right object is
>   mapped (Chris)
> - Do not expose gpu_address (Chris)
> - Verify start and length of vma for page alignment (Lionel)
> - Move SQNTL config out (Lionel)
> 
> v3: (Chris)
> - Omit redundant checks
> - Return VM_FAULT_SIGBUS is old stream is closed
> - Maintain reference counts to stream in vm_open and vm_close
> - Use switch to identify object to be mapped
> 
> v4: Call kref_put on closing perf fd (Chris)
> v5:
> - Strip access to OA buffer from unprivileged child of a privileged
>   parent. Use VM_DONTCOPY
> - Enforce MAP_PRIVATE by checking for VM_MAYSHARE
> 
> v6:
> (Chris)
> - Use len of -1 in unmap_mapping_range
> - Don't use stream->oa_buffer.vma->obj in vm_fault_oa
> - Use kernel block comment style
> - do_mmap gets a reference to the file and puts it in do_munmap, so
>   no need to maintain a reference to i915_perf_stream. Hence, remove
>   vm_open/vm_close and stream->closed hooks/checks.
> (Umesh)
> - Do not allow mmap if SAMPLE_OA_REPORT is not set during
>   i915_perf_open_ioctl.
> - Drop ioctl returning head/tail since this information is already
>   whitelisted. Remove hooks to read head register.
> 
> v7: (Chris)
> - unmap before destroy
> - change ioctl argument struct
> 
> v8: Documentation and more checks (Chris)
> v9: Fix comment style (Umesh)
> v10: Update uapi comment (Ashutosh)
> 
> Signed-off-by: Piotr Maciejewski <piotr.maciejewski@intel.com>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_mman.c |   2 +-
>  drivers/gpu/drm/i915/gem/i915_gem_mman.h |   2 +
>  drivers/gpu/drm/i915/i915_perf.c         | 126 ++++++++++++++++++++++-
>  include/uapi/drm/i915_drm.h              |  33 ++++++
>  4 files changed, 161 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> index 5130e8ed9564..84cdce2ee447 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> @@ -213,7 +213,7 @@ compute_partial_view(const struct drm_i915_gem_object *obj,
>  	return view;
>  }
>  
> -static vm_fault_t i915_error_to_vmf_fault(int err)
> +vm_fault_t i915_error_to_vmf_fault(int err)
>  {
>  	switch (err) {
>  	default:
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.h b/drivers/gpu/drm/i915/gem/i915_gem_mman.h
> index efee9e0d2508..1190a3a228ea 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.h
> @@ -29,4 +29,6 @@ void i915_gem_object_release_mmap_gtt(struct drm_i915_gem_object *obj);
>  
>  void i915_gem_object_release_mmap_offset(struct drm_i915_gem_object *obj);
>  
> +vm_fault_t i915_error_to_vmf_fault(int err);
> +
>  #endif
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index de3d1738aabe..1f8d4f3a2148 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -192,10 +192,12 @@
>   */
>  
>  #include <linux/anon_inodes.h>
> +#include <linux/mman.h>
>  #include <linux/sizes.h>
>  #include <linux/uuid.h>
>  
>  #include "gem/i915_gem_context.h"
> +#include "gem/i915_gem_mman.h"
>  #include "gt/intel_engine_pm.h"
>  #include "gt/intel_engine_user.h"
>  #include "gt/intel_execlists_submission.h"
> @@ -3322,6 +3324,44 @@ static long i915_perf_config_locked(struct i915_perf_stream *stream,
>  	return ret;
>  }
>  
> +#define I915_PERF_OA_BUFFER_MMAP_OFFSET 1
> +
> +/**
> + * i915_perf_oa_buffer_info_locked - size and offset of the OA buffer
> + * @stream: i915 perf stream
> + * @cmd: ioctl command
> + * @arg: pointer to oa buffer info filled by this function.
> + */
> +static int i915_perf_oa_buffer_info_locked(struct i915_perf_stream *stream,
> +					   unsigned int cmd,
> +					   unsigned long arg)
> +{
> +	struct drm_i915_perf_oa_buffer_info info;
> +	void __user *output = (void __user *)arg;
> +
> +	if (i915_perf_stream_paranoid && !perfmon_capable()) {
> +		DRM_DEBUG("Insufficient privileges to access OA buffer info\n");
> +		return -EACCES;
> +	}
> +
> +	if (_IOC_SIZE(cmd) != sizeof(info))
> +		return -EINVAL;
> +
> +	if (copy_from_user(&info, output, sizeof(info)))
> +		return -EFAULT;
> +
> +	if (info.type || info.flags || info.rsvd)
> +		return -EINVAL;
> +
> +	info.size = stream->oa_buffer.vma->size;
> +	info.offset = I915_PERF_OA_BUFFER_MMAP_OFFSET * PAGE_SIZE;
> +
> +	if (copy_to_user(output, &info, sizeof(info)))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
>  /**
>   * i915_perf_ioctl_locked - support ioctl() usage with i915 perf stream FDs
>   * @stream: An i915 perf stream
> @@ -3347,6 +3387,8 @@ static long i915_perf_ioctl_locked(struct i915_perf_stream *stream,
>  		return 0;
>  	case I915_PERF_IOCTL_CONFIG:
>  		return i915_perf_config_locked(stream, arg);
> +	case I915_PERF_IOCTL_GET_OA_BUFFER_INFO:
> +		return i915_perf_oa_buffer_info_locked(stream, cmd, arg);
>  	}
>  
>  	return -EINVAL;
> @@ -3418,6 +3460,14 @@ static int i915_perf_release(struct inode *inode, struct file *file)
>  	struct i915_perf_stream *stream = file->private_data;
>  	struct i915_perf *perf = stream->perf;
>  
> +	/*
> +	 * User could have multiple vmas from multiple mmaps. We want to zap
> +	 * them all here. Note that a fresh fault cannot occur as the mmap holds
> +	 * a reference to the stream via the vma->vm_file, so before user's
> +	 * munmap, the stream cannot be destroyed.
> +	 */
> +	unmap_mapping_range(file->f_mapping, 0, -1, 1);
> +
>  	mutex_lock(&perf->lock);
>  	i915_perf_destroy_locked(stream);
>  	mutex_unlock(&perf->lock);
> @@ -3428,6 +3478,75 @@ static int i915_perf_release(struct inode *inode, struct file *file)
>  	return 0;
>  }
>  
> +static vm_fault_t vm_fault_oa(struct vm_fault *vmf)
> +{
> +	struct vm_area_struct *vma = vmf->vma;
> +	struct i915_perf_stream *stream = vma->vm_private_data;
> +	int err;
> +
> +	err = remap_io_sg(vma,
> +			  vma->vm_start, vma->vm_end - vma->vm_start,
> +			  stream->oa_buffer.vma->pages->sgl, -1);
> +
> +	return i915_error_to_vmf_fault(err);
> +}
> +
> +static const struct vm_operations_struct vm_ops_oa = {
> +	.fault = vm_fault_oa,
> +};
> +
> +static int i915_perf_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> +	struct i915_perf_stream *stream = file->private_data;
> +
> +	/* mmap-ing OA buffer to user space MUST absolutely be privileged */
> +	if (i915_perf_stream_paranoid && !perfmon_capable()) {
> +		DRM_DEBUG("Insufficient privileges to map OA buffer\n");
> +		return -EACCES;
> +	}
> +
> +	switch (vma->vm_pgoff) {
> +	/*
> +	 * A non-zero offset ensures that we are mapping the right object. Also
> +	 * leaves room for future objects added to this implementation.
> +	 */
> +	case I915_PERF_OA_BUFFER_MMAP_OFFSET:
> +		if (!(stream->sample_flags & SAMPLE_OA_REPORT))
> +			return -EINVAL;
> +
> +		if (vma->vm_end - vma->vm_start > stream->oa_buffer.vma->size)
> +			return -EINVAL;
> +
> +		/*
> +		 * Only support VM_READ. Enforce MAP_PRIVATE by checking for
> +		 * VM_MAYSHARE.
> +		 */
> +		if (vma->vm_flags & (VM_WRITE | VM_EXEC |
> +				     VM_SHARED | VM_MAYSHARE))
> +			return -EINVAL;
> +
> +		vma->vm_flags &= ~(VM_MAYWRITE | VM_MAYEXEC);
> +
> +		/*
> +		 * If the privileged parent forks and child drops root
> +		 * privilege, we do not want the child to retain access to the
> +		 * mapped OA buffer. Explicitly set VM_DONTCOPY to avoid such
> +		 * cases.
> +		 */
> +		vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND |
> +				 VM_DONTDUMP | VM_DONTCOPY;
> +		break;
> +
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
> +	vma->vm_private_data = stream;
> +	vma->vm_ops = &vm_ops_oa;
> +
> +	return 0;
> +}
>  
>  static const struct file_operations fops = {
>  	.owner		= THIS_MODULE,
> @@ -3440,6 +3559,7 @@ static const struct file_operations fops = {
>  	 * to handle 32bits compatibility.
>  	 */
>  	.compat_ioctl   = i915_perf_ioctl,
> +	.mmap		= i915_perf_mmap,
>  };
>  
>  
> @@ -4639,8 +4759,12 @@ int i915_perf_ioctl_version(void)
>  	 *
>  	 *    - OA buffer head/tail/status/buffer registers for read only
>  	 *    - OA counters A18, A19, A20 for read/write
> +	 *
> +	 * 8: Added an option to map oa buffer at umd driver level and trigger
> +	 *    oa reports within oa buffer from command buffer. See
> +	 *    I915_PERF_IOCTL_GET_OA_BUFFER_INFO.
>  	 */
> -	return 7;
> +	return 8;
>  }
>  
>  #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index bde5860b3686..2c17fe845604 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -2417,6 +2417,39 @@ struct drm_i915_perf_open_param {
>   */
>  #define I915_PERF_IOCTL_CONFIG	_IO('i', 0x2)
>  
> +/*

Please make this proper kerneldoc. Ideally also fix up all the other perf
related uapi and also document it all with kerneldoc.

Please make sure the result looks good with

$ make htmldocs

Also, this is uapi, therefore your patch needs to include
- link to the igts for this
- link to the userspace MR that uses this (I guess it's mesa?)

Cheers, Daniel

> + * Returns OA buffer properties to be used with mmap.
> + *
> + * This ioctl is available in perf revision 8.
> + */
> +#define I915_PERF_IOCTL_GET_OA_BUFFER_INFO _IOWR('i', 0x3, struct drm_i915_perf_oa_buffer_info)
> +
> +/*
> + * OA buffer size and offset.
> + *
> + * OA output buffer
> + *   type: 0
> + *   flags: mbz
> + *
> + *   After querying the info, pass (size,offset) to mmap(),
> + *
> + *   mmap(0, info.size, PROT_READ, MAP_PRIVATE, perf_fd, info.offset).
> + *
> + *   Note that only a private (not shared between processes, or across fork())
> + *   read-only mmapping is allowed.
> + *
> + *   HW is continually writing data to the mapped  OA buffer and it conforms to
> + *   the OA format as specified by user config. The buffer provides reports that
> + *   have OA counters - A, B and C.
> + */
> +struct drm_i915_perf_oa_buffer_info {
> +	__u32 type;   /* in */
> +	__u32 flags;  /* in */
> +	__u64 size;   /* out */
> +	__u64 offset; /* out */
> +	__u64 rsvd;   /* mbz */
> +};
> +
>  /*
>   * Common to all i915 perf records
>   */
> -- 
> 2.20.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 8/8] drm/i915/perf: Map OA buffer to user space for gen12 performance query
  2021-08-31 12:55     ` [Intel-gfx] " Daniel Vetter
@ 2021-08-31 20:35       ` Umesh Nerlige Ramappa
  -1 siblings, 0 replies; 26+ messages in thread
From: Umesh Nerlige Ramappa @ 2021-08-31 20:35 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit, dri-devel,
	daniel.vetter, Joonas Lahtinen, jason

On Tue, Aug 31, 2021 at 02:55:37PM +0200, Daniel Vetter wrote:
>On Mon, Aug 30, 2021 at 12:38:51PM -0700, Umesh Nerlige Ramappa wrote:
>> i915 used to support time based sampling mode which is good for overall
>> system monitoring, but is not enough for query mode used to measure a
>> single draw call or dispatch. Gen9-Gen11 are using current i915 perf
>> implementation for query, but Gen12+ requires a new approach for query
>> based on triggered reports within oa buffer.
>>
>> Triggering reports into the OA buffer is achieved by writing into a
>> a trigger register. Optionally an unused counter/register is set with a
>> marker value such that a triggered report can be identified in the OA
>> buffer. Reports are usually triggered at the start and end of work that
>> is measured.
>>
>> Since OA buffer is large and queries can be frequent, an efficient way
>> to look for triggered reports is required. By knowing the current head
>> and tail offsets into the OA buffer, it is easier to determine the
>> locality of the reports of interest.
>>
>> Current perf OA interface does not expose head/tail information to the
>> user and it filters out invalid reports before sending data to user.
>> Also considering limited size of user buffer used during a query,
>> creating a 1:1 copy of the OA buffer at the user space added undesired
>> complexity.
>>
>> The solution was to map the OA buffer to user space provided
>>
>> (1) that it is accessed from a privileged user.
>> (2) OA report filtering is not used.
>>
>> These 2 conditions would satisfy the safety criteria that the current
>> perf interface addresses.
>
>This is a perf improvement. Please include benchmark numbers to justify
>it.

OA supports 2 mechanisms of perf measurements.

1) query interface where perf countes can be queried.
2) OA buffer use case where counter-snapshots are captured periodically 
and analyzed for performance.

This patch series is actually just (1) query interface implementation 
for discrete and not a perf improvement.

The old mechanism to query OA report (MI_REPORT_PERF_COUNT) is not 
available for all engines. In the new mechanism, a query is triggered 
from a batch by writing to a whitelisted OA trigger register. Once a 
query is triggered, the resulting report is captured in the OA buffer.  
To locate the report quickly, the batch also captures the OA HW tail 
pointer before/after writing to the trigger register. This gives the 
user a window/locality in the OA buffer where the report of interest 
lies.  

For this new mechanism, the current interface to read reports from the 
OA buffer is inefficient since the reads are sequential and reports are 
copied to user buffer.

mmap provides an accurate and faster way to fetch the queried reports 
based on locality.

Note that mmap does not replace the OA buffer use case from (2) which 
still reads reports sequentially to analyze performance.

>
>>
>> To enable the query:
>> - Add an ioctl to expose head and tail to the user
>> - Add an ioctl to return size and offset of the OA buffer
>> - Map the OA buffer to the user space
>>
>> v2:
>> - Improve commit message (Chris)
>> - Do not mmap based on gem object filp. Instead, use perf_fd and support
>>   mmap syscall (Chris)
>> - Pass non-zero offset in mmap to enforce the right object is
>>   mapped (Chris)
>> - Do not expose gpu_address (Chris)
>> - Verify start and length of vma for page alignment (Lionel)
>> - Move SQNTL config out (Lionel)
>>
>> v3: (Chris)
>> - Omit redundant checks
>> - Return VM_FAULT_SIGBUS is old stream is closed
>> - Maintain reference counts to stream in vm_open and vm_close
>> - Use switch to identify object to be mapped
>>
>> v4: Call kref_put on closing perf fd (Chris)
>> v5:
>> - Strip access to OA buffer from unprivileged child of a privileged
>>   parent. Use VM_DONTCOPY
>> - Enforce MAP_PRIVATE by checking for VM_MAYSHARE
>>
>> v6:
>> (Chris)
>> - Use len of -1 in unmap_mapping_range
>> - Don't use stream->oa_buffer.vma->obj in vm_fault_oa
>> - Use kernel block comment style
>> - do_mmap gets a reference to the file and puts it in do_munmap, so
>>   no need to maintain a reference to i915_perf_stream. Hence, remove
>>   vm_open/vm_close and stream->closed hooks/checks.
>> (Umesh)
>> - Do not allow mmap if SAMPLE_OA_REPORT is not set during
>>   i915_perf_open_ioctl.
>> - Drop ioctl returning head/tail since this information is already
>>   whitelisted. Remove hooks to read head register.
>>
>> v7: (Chris)
>> - unmap before destroy
>> - change ioctl argument struct
>>
>> v8: Documentation and more checks (Chris)
>> v9: Fix comment style (Umesh)
>> v10: Update uapi comment (Ashutosh)
>>
>> Signed-off-by: Piotr Maciejewski <piotr.maciejewski@intel.com>
>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
>> ---
>>  drivers/gpu/drm/i915/gem/i915_gem_mman.c |   2 +-
>>  drivers/gpu/drm/i915/gem/i915_gem_mman.h |   2 +
>>  drivers/gpu/drm/i915/i915_perf.c         | 126 ++++++++++++++++++++++-
>>  include/uapi/drm/i915_drm.h              |  33 ++++++
>>  4 files changed, 161 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>> index 5130e8ed9564..84cdce2ee447 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>> @@ -213,7 +213,7 @@ compute_partial_view(const struct drm_i915_gem_object *obj,
>>  	return view;
>>  }
>>
>> -static vm_fault_t i915_error_to_vmf_fault(int err)
>> +vm_fault_t i915_error_to_vmf_fault(int err)
>>  {
>>  	switch (err) {
>>  	default:
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.h b/drivers/gpu/drm/i915/gem/i915_gem_mman.h
>> index efee9e0d2508..1190a3a228ea 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.h
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.h
>> @@ -29,4 +29,6 @@ void i915_gem_object_release_mmap_gtt(struct drm_i915_gem_object *obj);
>>
>>  void i915_gem_object_release_mmap_offset(struct drm_i915_gem_object *obj);
>>
>> +vm_fault_t i915_error_to_vmf_fault(int err);
>> +
>>  #endif
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>> index de3d1738aabe..1f8d4f3a2148 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -192,10 +192,12 @@
>>   */
>>
>>  #include <linux/anon_inodes.h>
>> +#include <linux/mman.h>
>>  #include <linux/sizes.h>
>>  #include <linux/uuid.h>
>>
>>  #include "gem/i915_gem_context.h"
>> +#include "gem/i915_gem_mman.h"
>>  #include "gt/intel_engine_pm.h"
>>  #include "gt/intel_engine_user.h"
>>  #include "gt/intel_execlists_submission.h"
>> @@ -3322,6 +3324,44 @@ static long i915_perf_config_locked(struct i915_perf_stream *stream,
>>  	return ret;
>>  }
>>
>> +#define I915_PERF_OA_BUFFER_MMAP_OFFSET 1
>> +
>> +/**
>> + * i915_perf_oa_buffer_info_locked - size and offset of the OA buffer
>> + * @stream: i915 perf stream
>> + * @cmd: ioctl command
>> + * @arg: pointer to oa buffer info filled by this function.
>> + */
>> +static int i915_perf_oa_buffer_info_locked(struct i915_perf_stream *stream,
>> +					   unsigned int cmd,
>> +					   unsigned long arg)
>> +{
>> +	struct drm_i915_perf_oa_buffer_info info;
>> +	void __user *output = (void __user *)arg;
>> +
>> +	if (i915_perf_stream_paranoid && !perfmon_capable()) {
>> +		DRM_DEBUG("Insufficient privileges to access OA buffer info\n");
>> +		return -EACCES;
>> +	}
>> +
>> +	if (_IOC_SIZE(cmd) != sizeof(info))
>> +		return -EINVAL;
>> +
>> +	if (copy_from_user(&info, output, sizeof(info)))
>> +		return -EFAULT;
>> +
>> +	if (info.type || info.flags || info.rsvd)
>> +		return -EINVAL;
>> +
>> +	info.size = stream->oa_buffer.vma->size;
>> +	info.offset = I915_PERF_OA_BUFFER_MMAP_OFFSET * PAGE_SIZE;
>> +
>> +	if (copy_to_user(output, &info, sizeof(info)))
>> +		return -EFAULT;
>> +
>> +	return 0;
>> +}
>> +
>>  /**
>>   * i915_perf_ioctl_locked - support ioctl() usage with i915 perf stream FDs
>>   * @stream: An i915 perf stream
>> @@ -3347,6 +3387,8 @@ static long i915_perf_ioctl_locked(struct i915_perf_stream *stream,
>>  		return 0;
>>  	case I915_PERF_IOCTL_CONFIG:
>>  		return i915_perf_config_locked(stream, arg);
>> +	case I915_PERF_IOCTL_GET_OA_BUFFER_INFO:
>> +		return i915_perf_oa_buffer_info_locked(stream, cmd, arg);
>>  	}
>>
>>  	return -EINVAL;
>> @@ -3418,6 +3460,14 @@ static int i915_perf_release(struct inode *inode, struct file *file)
>>  	struct i915_perf_stream *stream = file->private_data;
>>  	struct i915_perf *perf = stream->perf;
>>
>> +	/*
>> +	 * User could have multiple vmas from multiple mmaps. We want to zap
>> +	 * them all here. Note that a fresh fault cannot occur as the mmap holds
>> +	 * a reference to the stream via the vma->vm_file, so before user's
>> +	 * munmap, the stream cannot be destroyed.
>> +	 */
>> +	unmap_mapping_range(file->f_mapping, 0, -1, 1);
>> +
>>  	mutex_lock(&perf->lock);
>>  	i915_perf_destroy_locked(stream);
>>  	mutex_unlock(&perf->lock);
>> @@ -3428,6 +3478,75 @@ static int i915_perf_release(struct inode *inode, struct file *file)
>>  	return 0;
>>  }
>>
>> +static vm_fault_t vm_fault_oa(struct vm_fault *vmf)
>> +{
>> +	struct vm_area_struct *vma = vmf->vma;
>> +	struct i915_perf_stream *stream = vma->vm_private_data;
>> +	int err;
>> +
>> +	err = remap_io_sg(vma,
>> +			  vma->vm_start, vma->vm_end - vma->vm_start,
>> +			  stream->oa_buffer.vma->pages->sgl, -1);
>> +
>> +	return i915_error_to_vmf_fault(err);
>> +}
>> +
>> +static const struct vm_operations_struct vm_ops_oa = {
>> +	.fault = vm_fault_oa,
>> +};
>> +
>> +static int i915_perf_mmap(struct file *file, struct vm_area_struct *vma)
>> +{
>> +	struct i915_perf_stream *stream = file->private_data;
>> +
>> +	/* mmap-ing OA buffer to user space MUST absolutely be privileged */
>> +	if (i915_perf_stream_paranoid && !perfmon_capable()) {
>> +		DRM_DEBUG("Insufficient privileges to map OA buffer\n");
>> +		return -EACCES;
>> +	}
>> +
>> +	switch (vma->vm_pgoff) {
>> +	/*
>> +	 * A non-zero offset ensures that we are mapping the right object. Also
>> +	 * leaves room for future objects added to this implementation.
>> +	 */
>> +	case I915_PERF_OA_BUFFER_MMAP_OFFSET:
>> +		if (!(stream->sample_flags & SAMPLE_OA_REPORT))
>> +			return -EINVAL;
>> +
>> +		if (vma->vm_end - vma->vm_start > stream->oa_buffer.vma->size)
>> +			return -EINVAL;
>> +
>> +		/*
>> +		 * Only support VM_READ. Enforce MAP_PRIVATE by checking for
>> +		 * VM_MAYSHARE.
>> +		 */
>> +		if (vma->vm_flags & (VM_WRITE | VM_EXEC |
>> +				     VM_SHARED | VM_MAYSHARE))
>> +			return -EINVAL;
>> +
>> +		vma->vm_flags &= ~(VM_MAYWRITE | VM_MAYEXEC);
>> +
>> +		/*
>> +		 * If the privileged parent forks and child drops root
>> +		 * privilege, we do not want the child to retain access to the
>> +		 * mapped OA buffer. Explicitly set VM_DONTCOPY to avoid such
>> +		 * cases.
>> +		 */
>> +		vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND |
>> +				 VM_DONTDUMP | VM_DONTCOPY;
>> +		break;
>> +
>> +	default:
>> +		return -EINVAL;
>> +	}
>> +
>> +	vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
>> +	vma->vm_private_data = stream;
>> +	vma->vm_ops = &vm_ops_oa;
>> +
>> +	return 0;
>> +}
>>
>>  static const struct file_operations fops = {
>>  	.owner		= THIS_MODULE,
>> @@ -3440,6 +3559,7 @@ static const struct file_operations fops = {
>>  	 * to handle 32bits compatibility.
>>  	 */
>>  	.compat_ioctl   = i915_perf_ioctl,
>> +	.mmap		= i915_perf_mmap,
>>  };
>>
>>
>> @@ -4639,8 +4759,12 @@ int i915_perf_ioctl_version(void)
>>  	 *
>>  	 *    - OA buffer head/tail/status/buffer registers for read only
>>  	 *    - OA counters A18, A19, A20 for read/write
>> +	 *
>> +	 * 8: Added an option to map oa buffer at umd driver level and trigger
>> +	 *    oa reports within oa buffer from command buffer. See
>> +	 *    I915_PERF_IOCTL_GET_OA_BUFFER_INFO.
>>  	 */
>> -	return 7;
>> +	return 8;
>>  }
>>
>>  #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>> index bde5860b3686..2c17fe845604 100644
>> --- a/include/uapi/drm/i915_drm.h
>> +++ b/include/uapi/drm/i915_drm.h
>> @@ -2417,6 +2417,39 @@ struct drm_i915_perf_open_param {
>>   */
>>  #define I915_PERF_IOCTL_CONFIG	_IO('i', 0x2)
>>
>> +/*
>
>Please make this proper kerneldoc. Ideally also fix up all the other perf
>related uapi and also document it all with kerneldoc.
>
>Please make sure the result looks good with
>
>$ make htmldocs

>Also, this is uapi, therefore your patch needs to include
>- link to the igts for this
IGT tests: https://patchwork.freedesktop.org/series/94172/ Patches 1,2

>- link to the userspace MR that uses this (I guess it's mesa?)

For now GPUvis is the userspace for all perf OA interfaces.

GPUvis provides the visualization for OA reports. It has 2 parts
1) It uses perf library from IGT. The perf library changes are here:
IGT: https://patchwork.freedesktop.org/series/94172/ Patches 3,4 and 5.

2) GPUvis changes:
https://github.com/unerlige/gpuvis/commit/1c19c134a64564f7b8d7ca3b46449324040a40be
Should I create an MR for GPUvis while this kernel patch and (1) are not 
yet merged? Please advise.

Thanks,
Umesh

>
>Cheers, Daniel
>
>> + * Returns OA buffer properties to be used with mmap.
>> + *
>> + * This ioctl is available in perf revision 8.
>> + */
>> +#define I915_PERF_IOCTL_GET_OA_BUFFER_INFO _IOWR('i', 0x3, struct drm_i915_perf_oa_buffer_info)
>> +
>> +/*
>> + * OA buffer size and offset.
>> + *
>> + * OA output buffer
>> + *   type: 0
>> + *   flags: mbz
>> + *
>> + *   After querying the info, pass (size,offset) to mmap(),
>> + *
>> + *   mmap(0, info.size, PROT_READ, MAP_PRIVATE, perf_fd, info.offset).
>> + *
>> + *   Note that only a private (not shared between processes, or across fork())
>> + *   read-only mmapping is allowed.
>> + *
>> + *   HW is continually writing data to the mapped  OA buffer and it conforms to
>> + *   the OA format as specified by user config. The buffer provides reports that
>> + *   have OA counters - A, B and C.
>> + */
>> +struct drm_i915_perf_oa_buffer_info {
>> +	__u32 type;   /* in */
>> +	__u32 flags;  /* in */
>> +	__u64 size;   /* out */
>> +	__u64 offset; /* out */
>> +	__u64 rsvd;   /* mbz */
>> +};
>> +
>>  /*
>>   * Common to all i915 perf records
>>   */
>> --
>> 2.20.1
>>
>
>-- 
>Daniel Vetter
>Software Engineer, Intel Corporation
GPUvis pro
>http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Intel-gfx] [PATCH 8/8] drm/i915/perf: Map OA buffer to user space for gen12 performance query
@ 2021-08-31 20:35       ` Umesh Nerlige Ramappa
  0 siblings, 0 replies; 26+ messages in thread
From: Umesh Nerlige Ramappa @ 2021-08-31 20:35 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit, dri-devel,
	daniel.vetter, Joonas Lahtinen, jason

On Tue, Aug 31, 2021 at 02:55:37PM +0200, Daniel Vetter wrote:
>On Mon, Aug 30, 2021 at 12:38:51PM -0700, Umesh Nerlige Ramappa wrote:
>> i915 used to support time based sampling mode which is good for overall
>> system monitoring, but is not enough for query mode used to measure a
>> single draw call or dispatch. Gen9-Gen11 are using current i915 perf
>> implementation for query, but Gen12+ requires a new approach for query
>> based on triggered reports within oa buffer.
>>
>> Triggering reports into the OA buffer is achieved by writing into a
>> a trigger register. Optionally an unused counter/register is set with a
>> marker value such that a triggered report can be identified in the OA
>> buffer. Reports are usually triggered at the start and end of work that
>> is measured.
>>
>> Since OA buffer is large and queries can be frequent, an efficient way
>> to look for triggered reports is required. By knowing the current head
>> and tail offsets into the OA buffer, it is easier to determine the
>> locality of the reports of interest.
>>
>> Current perf OA interface does not expose head/tail information to the
>> user and it filters out invalid reports before sending data to user.
>> Also considering limited size of user buffer used during a query,
>> creating a 1:1 copy of the OA buffer at the user space added undesired
>> complexity.
>>
>> The solution was to map the OA buffer to user space provided
>>
>> (1) that it is accessed from a privileged user.
>> (2) OA report filtering is not used.
>>
>> These 2 conditions would satisfy the safety criteria that the current
>> perf interface addresses.
>
>This is a perf improvement. Please include benchmark numbers to justify
>it.

OA supports 2 mechanisms of perf measurements.

1) query interface where perf countes can be queried.
2) OA buffer use case where counter-snapshots are captured periodically 
and analyzed for performance.

This patch series is actually just (1) query interface implementation 
for discrete and not a perf improvement.

The old mechanism to query OA report (MI_REPORT_PERF_COUNT) is not 
available for all engines. In the new mechanism, a query is triggered 
from a batch by writing to a whitelisted OA trigger register. Once a 
query is triggered, the resulting report is captured in the OA buffer.  
To locate the report quickly, the batch also captures the OA HW tail 
pointer before/after writing to the trigger register. This gives the 
user a window/locality in the OA buffer where the report of interest 
lies.  

For this new mechanism, the current interface to read reports from the 
OA buffer is inefficient since the reads are sequential and reports are 
copied to user buffer.

mmap provides an accurate and faster way to fetch the queried reports 
based on locality.

Note that mmap does not replace the OA buffer use case from (2) which 
still reads reports sequentially to analyze performance.

>
>>
>> To enable the query:
>> - Add an ioctl to expose head and tail to the user
>> - Add an ioctl to return size and offset of the OA buffer
>> - Map the OA buffer to the user space
>>
>> v2:
>> - Improve commit message (Chris)
>> - Do not mmap based on gem object filp. Instead, use perf_fd and support
>>   mmap syscall (Chris)
>> - Pass non-zero offset in mmap to enforce the right object is
>>   mapped (Chris)
>> - Do not expose gpu_address (Chris)
>> - Verify start and length of vma for page alignment (Lionel)
>> - Move SQNTL config out (Lionel)
>>
>> v3: (Chris)
>> - Omit redundant checks
>> - Return VM_FAULT_SIGBUS is old stream is closed
>> - Maintain reference counts to stream in vm_open and vm_close
>> - Use switch to identify object to be mapped
>>
>> v4: Call kref_put on closing perf fd (Chris)
>> v5:
>> - Strip access to OA buffer from unprivileged child of a privileged
>>   parent. Use VM_DONTCOPY
>> - Enforce MAP_PRIVATE by checking for VM_MAYSHARE
>>
>> v6:
>> (Chris)
>> - Use len of -1 in unmap_mapping_range
>> - Don't use stream->oa_buffer.vma->obj in vm_fault_oa
>> - Use kernel block comment style
>> - do_mmap gets a reference to the file and puts it in do_munmap, so
>>   no need to maintain a reference to i915_perf_stream. Hence, remove
>>   vm_open/vm_close and stream->closed hooks/checks.
>> (Umesh)
>> - Do not allow mmap if SAMPLE_OA_REPORT is not set during
>>   i915_perf_open_ioctl.
>> - Drop ioctl returning head/tail since this information is already
>>   whitelisted. Remove hooks to read head register.
>>
>> v7: (Chris)
>> - unmap before destroy
>> - change ioctl argument struct
>>
>> v8: Documentation and more checks (Chris)
>> v9: Fix comment style (Umesh)
>> v10: Update uapi comment (Ashutosh)
>>
>> Signed-off-by: Piotr Maciejewski <piotr.maciejewski@intel.com>
>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
>> ---
>>  drivers/gpu/drm/i915/gem/i915_gem_mman.c |   2 +-
>>  drivers/gpu/drm/i915/gem/i915_gem_mman.h |   2 +
>>  drivers/gpu/drm/i915/i915_perf.c         | 126 ++++++++++++++++++++++-
>>  include/uapi/drm/i915_drm.h              |  33 ++++++
>>  4 files changed, 161 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>> index 5130e8ed9564..84cdce2ee447 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>> @@ -213,7 +213,7 @@ compute_partial_view(const struct drm_i915_gem_object *obj,
>>  	return view;
>>  }
>>
>> -static vm_fault_t i915_error_to_vmf_fault(int err)
>> +vm_fault_t i915_error_to_vmf_fault(int err)
>>  {
>>  	switch (err) {
>>  	default:
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.h b/drivers/gpu/drm/i915/gem/i915_gem_mman.h
>> index efee9e0d2508..1190a3a228ea 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.h
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.h
>> @@ -29,4 +29,6 @@ void i915_gem_object_release_mmap_gtt(struct drm_i915_gem_object *obj);
>>
>>  void i915_gem_object_release_mmap_offset(struct drm_i915_gem_object *obj);
>>
>> +vm_fault_t i915_error_to_vmf_fault(int err);
>> +
>>  #endif
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>> index de3d1738aabe..1f8d4f3a2148 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -192,10 +192,12 @@
>>   */
>>
>>  #include <linux/anon_inodes.h>
>> +#include <linux/mman.h>
>>  #include <linux/sizes.h>
>>  #include <linux/uuid.h>
>>
>>  #include "gem/i915_gem_context.h"
>> +#include "gem/i915_gem_mman.h"
>>  #include "gt/intel_engine_pm.h"
>>  #include "gt/intel_engine_user.h"
>>  #include "gt/intel_execlists_submission.h"
>> @@ -3322,6 +3324,44 @@ static long i915_perf_config_locked(struct i915_perf_stream *stream,
>>  	return ret;
>>  }
>>
>> +#define I915_PERF_OA_BUFFER_MMAP_OFFSET 1
>> +
>> +/**
>> + * i915_perf_oa_buffer_info_locked - size and offset of the OA buffer
>> + * @stream: i915 perf stream
>> + * @cmd: ioctl command
>> + * @arg: pointer to oa buffer info filled by this function.
>> + */
>> +static int i915_perf_oa_buffer_info_locked(struct i915_perf_stream *stream,
>> +					   unsigned int cmd,
>> +					   unsigned long arg)
>> +{
>> +	struct drm_i915_perf_oa_buffer_info info;
>> +	void __user *output = (void __user *)arg;
>> +
>> +	if (i915_perf_stream_paranoid && !perfmon_capable()) {
>> +		DRM_DEBUG("Insufficient privileges to access OA buffer info\n");
>> +		return -EACCES;
>> +	}
>> +
>> +	if (_IOC_SIZE(cmd) != sizeof(info))
>> +		return -EINVAL;
>> +
>> +	if (copy_from_user(&info, output, sizeof(info)))
>> +		return -EFAULT;
>> +
>> +	if (info.type || info.flags || info.rsvd)
>> +		return -EINVAL;
>> +
>> +	info.size = stream->oa_buffer.vma->size;
>> +	info.offset = I915_PERF_OA_BUFFER_MMAP_OFFSET * PAGE_SIZE;
>> +
>> +	if (copy_to_user(output, &info, sizeof(info)))
>> +		return -EFAULT;
>> +
>> +	return 0;
>> +}
>> +
>>  /**
>>   * i915_perf_ioctl_locked - support ioctl() usage with i915 perf stream FDs
>>   * @stream: An i915 perf stream
>> @@ -3347,6 +3387,8 @@ static long i915_perf_ioctl_locked(struct i915_perf_stream *stream,
>>  		return 0;
>>  	case I915_PERF_IOCTL_CONFIG:
>>  		return i915_perf_config_locked(stream, arg);
>> +	case I915_PERF_IOCTL_GET_OA_BUFFER_INFO:
>> +		return i915_perf_oa_buffer_info_locked(stream, cmd, arg);
>>  	}
>>
>>  	return -EINVAL;
>> @@ -3418,6 +3460,14 @@ static int i915_perf_release(struct inode *inode, struct file *file)
>>  	struct i915_perf_stream *stream = file->private_data;
>>  	struct i915_perf *perf = stream->perf;
>>
>> +	/*
>> +	 * User could have multiple vmas from multiple mmaps. We want to zap
>> +	 * them all here. Note that a fresh fault cannot occur as the mmap holds
>> +	 * a reference to the stream via the vma->vm_file, so before user's
>> +	 * munmap, the stream cannot be destroyed.
>> +	 */
>> +	unmap_mapping_range(file->f_mapping, 0, -1, 1);
>> +
>>  	mutex_lock(&perf->lock);
>>  	i915_perf_destroy_locked(stream);
>>  	mutex_unlock(&perf->lock);
>> @@ -3428,6 +3478,75 @@ static int i915_perf_release(struct inode *inode, struct file *file)
>>  	return 0;
>>  }
>>
>> +static vm_fault_t vm_fault_oa(struct vm_fault *vmf)
>> +{
>> +	struct vm_area_struct *vma = vmf->vma;
>> +	struct i915_perf_stream *stream = vma->vm_private_data;
>> +	int err;
>> +
>> +	err = remap_io_sg(vma,
>> +			  vma->vm_start, vma->vm_end - vma->vm_start,
>> +			  stream->oa_buffer.vma->pages->sgl, -1);
>> +
>> +	return i915_error_to_vmf_fault(err);
>> +}
>> +
>> +static const struct vm_operations_struct vm_ops_oa = {
>> +	.fault = vm_fault_oa,
>> +};
>> +
>> +static int i915_perf_mmap(struct file *file, struct vm_area_struct *vma)
>> +{
>> +	struct i915_perf_stream *stream = file->private_data;
>> +
>> +	/* mmap-ing OA buffer to user space MUST absolutely be privileged */
>> +	if (i915_perf_stream_paranoid && !perfmon_capable()) {
>> +		DRM_DEBUG("Insufficient privileges to map OA buffer\n");
>> +		return -EACCES;
>> +	}
>> +
>> +	switch (vma->vm_pgoff) {
>> +	/*
>> +	 * A non-zero offset ensures that we are mapping the right object. Also
>> +	 * leaves room for future objects added to this implementation.
>> +	 */
>> +	case I915_PERF_OA_BUFFER_MMAP_OFFSET:
>> +		if (!(stream->sample_flags & SAMPLE_OA_REPORT))
>> +			return -EINVAL;
>> +
>> +		if (vma->vm_end - vma->vm_start > stream->oa_buffer.vma->size)
>> +			return -EINVAL;
>> +
>> +		/*
>> +		 * Only support VM_READ. Enforce MAP_PRIVATE by checking for
>> +		 * VM_MAYSHARE.
>> +		 */
>> +		if (vma->vm_flags & (VM_WRITE | VM_EXEC |
>> +				     VM_SHARED | VM_MAYSHARE))
>> +			return -EINVAL;
>> +
>> +		vma->vm_flags &= ~(VM_MAYWRITE | VM_MAYEXEC);
>> +
>> +		/*
>> +		 * If the privileged parent forks and child drops root
>> +		 * privilege, we do not want the child to retain access to the
>> +		 * mapped OA buffer. Explicitly set VM_DONTCOPY to avoid such
>> +		 * cases.
>> +		 */
>> +		vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND |
>> +				 VM_DONTDUMP | VM_DONTCOPY;
>> +		break;
>> +
>> +	default:
>> +		return -EINVAL;
>> +	}
>> +
>> +	vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
>> +	vma->vm_private_data = stream;
>> +	vma->vm_ops = &vm_ops_oa;
>> +
>> +	return 0;
>> +}
>>
>>  static const struct file_operations fops = {
>>  	.owner		= THIS_MODULE,
>> @@ -3440,6 +3559,7 @@ static const struct file_operations fops = {
>>  	 * to handle 32bits compatibility.
>>  	 */
>>  	.compat_ioctl   = i915_perf_ioctl,
>> +	.mmap		= i915_perf_mmap,
>>  };
>>
>>
>> @@ -4639,8 +4759,12 @@ int i915_perf_ioctl_version(void)
>>  	 *
>>  	 *    - OA buffer head/tail/status/buffer registers for read only
>>  	 *    - OA counters A18, A19, A20 for read/write
>> +	 *
>> +	 * 8: Added an option to map oa buffer at umd driver level and trigger
>> +	 *    oa reports within oa buffer from command buffer. See
>> +	 *    I915_PERF_IOCTL_GET_OA_BUFFER_INFO.
>>  	 */
>> -	return 7;
>> +	return 8;
>>  }
>>
>>  #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>> index bde5860b3686..2c17fe845604 100644
>> --- a/include/uapi/drm/i915_drm.h
>> +++ b/include/uapi/drm/i915_drm.h
>> @@ -2417,6 +2417,39 @@ struct drm_i915_perf_open_param {
>>   */
>>  #define I915_PERF_IOCTL_CONFIG	_IO('i', 0x2)
>>
>> +/*
>
>Please make this proper kerneldoc. Ideally also fix up all the other perf
>related uapi and also document it all with kerneldoc.
>
>Please make sure the result looks good with
>
>$ make htmldocs

>Also, this is uapi, therefore your patch needs to include
>- link to the igts for this
IGT tests: https://patchwork.freedesktop.org/series/94172/ Patches 1,2

>- link to the userspace MR that uses this (I guess it's mesa?)

For now GPUvis is the userspace for all perf OA interfaces.

GPUvis provides the visualization for OA reports. It has 2 parts
1) It uses perf library from IGT. The perf library changes are here:
IGT: https://patchwork.freedesktop.org/series/94172/ Patches 3,4 and 5.

2) GPUvis changes:
https://github.com/unerlige/gpuvis/commit/1c19c134a64564f7b8d7ca3b46449324040a40be
Should I create an MR for GPUvis while this kernel patch and (1) are not 
yet merged? Please advise.

Thanks,
Umesh

>
>Cheers, Daniel
>
>> + * Returns OA buffer properties to be used with mmap.
>> + *
>> + * This ioctl is available in perf revision 8.
>> + */
>> +#define I915_PERF_IOCTL_GET_OA_BUFFER_INFO _IOWR('i', 0x3, struct drm_i915_perf_oa_buffer_info)
>> +
>> +/*
>> + * OA buffer size and offset.
>> + *
>> + * OA output buffer
>> + *   type: 0
>> + *   flags: mbz
>> + *
>> + *   After querying the info, pass (size,offset) to mmap(),
>> + *
>> + *   mmap(0, info.size, PROT_READ, MAP_PRIVATE, perf_fd, info.offset).
>> + *
>> + *   Note that only a private (not shared between processes, or across fork())
>> + *   read-only mmapping is allowed.
>> + *
>> + *   HW is continually writing data to the mapped  OA buffer and it conforms to
>> + *   the OA format as specified by user config. The buffer provides reports that
>> + *   have OA counters - A, B and C.
>> + */
>> +struct drm_i915_perf_oa_buffer_info {
>> +	__u32 type;   /* in */
>> +	__u32 flags;  /* in */
>> +	__u64 size;   /* out */
>> +	__u64 offset; /* out */
>> +	__u64 rsvd;   /* mbz */
>> +};
>> +
>>  /*
>>   * Common to all i915 perf records
>>   */
>> --
>> 2.20.1
>>
>
>-- 
>Daniel Vetter
>Software Engineer, Intel Corporation
GPUvis pro
>http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 8/8] drm/i915/perf: Map OA buffer to user space for gen12 performance query
  2021-08-03 20:13 [PATCH 0/8] Enable triggered perf query for Xe_HP Umesh Nerlige Ramappa
@ 2021-08-03 20:13 ` Umesh Nerlige Ramappa
  0 siblings, 0 replies; 26+ messages in thread
From: Umesh Nerlige Ramappa @ 2021-08-03 20:13 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit, daniel; +Cc: dri-devel

i915 used to support time based sampling mode which is good for overall
system monitoring, but is not enough for query mode used to measure a
single draw call or dispatch. Gen9-Gen11 are using current i915 perf
implementation for query, but Gen12+ requires a new approach for query
based on triggered reports within oa buffer.

Triggering reports into the OA buffer is achieved by writing into a
a trigger register. Optionally an unused counter/register is set with a
marker value such that a triggered report can be identified in the OA
buffer. Reports are usually triggered at the start and end of work that
is measured.

Since OA buffer is large and queries can be frequent, an efficient way
to look for triggered reports is required. By knowing the current head
and tail offsets into the OA buffer, it is easier to determine the
locality of the reports of interest.

Current perf OA interface does not expose head/tail information to the
user and it filters out invalid reports before sending data to user.
Also considering limited size of user buffer used during a query,
creating a 1:1 copy of the OA buffer at the user space added undesired
complexity.

The solution was to map the OA buffer to user space provided

(1) that it is accessed from a privileged user.
(2) OA report filtering is not used.

These 2 conditions would satisfy the safety criteria that the current
perf interface addresses.

To enable the query:
- Add an ioctl to expose head and tail to the user
- Add an ioctl to return size and offset of the OA buffer
- Map the OA buffer to the user space

v2:
- Improve commit message (Chris)
- Do not mmap based on gem object filp. Instead, use perf_fd and support
  mmap syscall (Chris)
- Pass non-zero offset in mmap to enforce the right object is
  mapped (Chris)
- Do not expose gpu_address (Chris)
- Verify start and length of vma for page alignment (Lionel)
- Move SQNTL config out (Lionel)

v3: (Chris)
- Omit redundant checks
- Return VM_FAULT_SIGBUS is old stream is closed
- Maintain reference counts to stream in vm_open and vm_close
- Use switch to identify object to be mapped

v4: Call kref_put on closing perf fd (Chris)
v5:
- Strip access to OA buffer from unprivileged child of a privileged
  parent. Use VM_DONTCOPY
- Enforce MAP_PRIVATE by checking for VM_MAYSHARE

v6:
(Chris)
- Use len of -1 in unmap_mapping_range
- Don't use stream->oa_buffer.vma->obj in vm_fault_oa
- Use kernel block comment style
- do_mmap gets a reference to the file and puts it in do_munmap, so
  no need to maintain a reference to i915_perf_stream. Hence, remove
  vm_open/vm_close and stream->closed hooks/checks.
(Umesh)
- Do not allow mmap if SAMPLE_OA_REPORT is not set during
  i915_perf_open_ioctl.
- Drop ioctl returning head/tail since this information is already
  whitelisted. Remove hooks to read head register.

v7: (Chris)
- unmap before destroy
- change ioctl argument struct

v8: Documentation and more checks (Chris)

Signed-off-by: Piotr Maciejewski <piotr.maciejewski@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gem/i915_gem_mman.c |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_mman.h |   2 +
 drivers/gpu/drm/i915/i915_perf.c         | 126 ++++++++++++++++++++++-
 include/uapi/drm/i915_drm.h              |  33 ++++++
 4 files changed, 161 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index 5130e8ed9564..84cdce2ee447 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -213,7 +213,7 @@ compute_partial_view(const struct drm_i915_gem_object *obj,
 	return view;
 }
 
-static vm_fault_t i915_error_to_vmf_fault(int err)
+vm_fault_t i915_error_to_vmf_fault(int err)
 {
 	switch (err) {
 	default:
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.h b/drivers/gpu/drm/i915/gem/i915_gem_mman.h
index efee9e0d2508..1190a3a228ea 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.h
@@ -29,4 +29,6 @@ void i915_gem_object_release_mmap_gtt(struct drm_i915_gem_object *obj);
 
 void i915_gem_object_release_mmap_offset(struct drm_i915_gem_object *obj);
 
+vm_fault_t i915_error_to_vmf_fault(int err);
+
 #endif
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index de3d1738aabe..1f8d4f3a2148 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -192,10 +192,12 @@
  */
 
 #include <linux/anon_inodes.h>
+#include <linux/mman.h>
 #include <linux/sizes.h>
 #include <linux/uuid.h>
 
 #include "gem/i915_gem_context.h"
+#include "gem/i915_gem_mman.h"
 #include "gt/intel_engine_pm.h"
 #include "gt/intel_engine_user.h"
 #include "gt/intel_execlists_submission.h"
@@ -3322,6 +3324,44 @@ static long i915_perf_config_locked(struct i915_perf_stream *stream,
 	return ret;
 }
 
+#define I915_PERF_OA_BUFFER_MMAP_OFFSET 1
+
+/**
+ * i915_perf_oa_buffer_info_locked - size and offset of the OA buffer
+ * @stream: i915 perf stream
+ * @cmd: ioctl command
+ * @arg: pointer to oa buffer info filled by this function.
+ */
+static int i915_perf_oa_buffer_info_locked(struct i915_perf_stream *stream,
+					   unsigned int cmd,
+					   unsigned long arg)
+{
+	struct drm_i915_perf_oa_buffer_info info;
+	void __user *output = (void __user *)arg;
+
+	if (i915_perf_stream_paranoid && !perfmon_capable()) {
+		DRM_DEBUG("Insufficient privileges to access OA buffer info\n");
+		return -EACCES;
+	}
+
+	if (_IOC_SIZE(cmd) != sizeof(info))
+		return -EINVAL;
+
+	if (copy_from_user(&info, output, sizeof(info)))
+		return -EFAULT;
+
+	if (info.type || info.flags || info.rsvd)
+		return -EINVAL;
+
+	info.size = stream->oa_buffer.vma->size;
+	info.offset = I915_PERF_OA_BUFFER_MMAP_OFFSET * PAGE_SIZE;
+
+	if (copy_to_user(output, &info, sizeof(info)))
+		return -EFAULT;
+
+	return 0;
+}
+
 /**
  * i915_perf_ioctl_locked - support ioctl() usage with i915 perf stream FDs
  * @stream: An i915 perf stream
@@ -3347,6 +3387,8 @@ static long i915_perf_ioctl_locked(struct i915_perf_stream *stream,
 		return 0;
 	case I915_PERF_IOCTL_CONFIG:
 		return i915_perf_config_locked(stream, arg);
+	case I915_PERF_IOCTL_GET_OA_BUFFER_INFO:
+		return i915_perf_oa_buffer_info_locked(stream, cmd, arg);
 	}
 
 	return -EINVAL;
@@ -3418,6 +3460,14 @@ static int i915_perf_release(struct inode *inode, struct file *file)
 	struct i915_perf_stream *stream = file->private_data;
 	struct i915_perf *perf = stream->perf;
 
+	/*
+	 * User could have multiple vmas from multiple mmaps. We want to zap
+	 * them all here. Note that a fresh fault cannot occur as the mmap holds
+	 * a reference to the stream via the vma->vm_file, so before user's
+	 * munmap, the stream cannot be destroyed.
+	 */
+	unmap_mapping_range(file->f_mapping, 0, -1, 1);
+
 	mutex_lock(&perf->lock);
 	i915_perf_destroy_locked(stream);
 	mutex_unlock(&perf->lock);
@@ -3428,6 +3478,75 @@ static int i915_perf_release(struct inode *inode, struct file *file)
 	return 0;
 }
 
+static vm_fault_t vm_fault_oa(struct vm_fault *vmf)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	struct i915_perf_stream *stream = vma->vm_private_data;
+	int err;
+
+	err = remap_io_sg(vma,
+			  vma->vm_start, vma->vm_end - vma->vm_start,
+			  stream->oa_buffer.vma->pages->sgl, -1);
+
+	return i915_error_to_vmf_fault(err);
+}
+
+static const struct vm_operations_struct vm_ops_oa = {
+	.fault = vm_fault_oa,
+};
+
+static int i915_perf_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct i915_perf_stream *stream = file->private_data;
+
+	/* mmap-ing OA buffer to user space MUST absolutely be privileged */
+	if (i915_perf_stream_paranoid && !perfmon_capable()) {
+		DRM_DEBUG("Insufficient privileges to map OA buffer\n");
+		return -EACCES;
+	}
+
+	switch (vma->vm_pgoff) {
+	/*
+	 * A non-zero offset ensures that we are mapping the right object. Also
+	 * leaves room for future objects added to this implementation.
+	 */
+	case I915_PERF_OA_BUFFER_MMAP_OFFSET:
+		if (!(stream->sample_flags & SAMPLE_OA_REPORT))
+			return -EINVAL;
+
+		if (vma->vm_end - vma->vm_start > stream->oa_buffer.vma->size)
+			return -EINVAL;
+
+		/*
+		 * Only support VM_READ. Enforce MAP_PRIVATE by checking for
+		 * VM_MAYSHARE.
+		 */
+		if (vma->vm_flags & (VM_WRITE | VM_EXEC |
+				     VM_SHARED | VM_MAYSHARE))
+			return -EINVAL;
+
+		vma->vm_flags &= ~(VM_MAYWRITE | VM_MAYEXEC);
+
+		/*
+		 * If the privileged parent forks and child drops root
+		 * privilege, we do not want the child to retain access to the
+		 * mapped OA buffer. Explicitly set VM_DONTCOPY to avoid such
+		 * cases.
+		 */
+		vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND |
+				 VM_DONTDUMP | VM_DONTCOPY;
+		break;
+
+	default:
+		return -EINVAL;
+	}
+
+	vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
+	vma->vm_private_data = stream;
+	vma->vm_ops = &vm_ops_oa;
+
+	return 0;
+}
 
 static const struct file_operations fops = {
 	.owner		= THIS_MODULE,
@@ -3440,6 +3559,7 @@ static const struct file_operations fops = {
 	 * to handle 32bits compatibility.
 	 */
 	.compat_ioctl   = i915_perf_ioctl,
+	.mmap		= i915_perf_mmap,
 };
 
 
@@ -4639,8 +4759,12 @@ int i915_perf_ioctl_version(void)
 	 *
 	 *    - OA buffer head/tail/status/buffer registers for read only
 	 *    - OA counters A18, A19, A20 for read/write
+	 *
+	 * 8: Added an option to map oa buffer at umd driver level and trigger
+	 *    oa reports within oa buffer from command buffer. See
+	 *    I915_PERF_IOCTL_GET_OA_BUFFER_INFO.
 	 */
-	return 7;
+	return 8;
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 7f13d241417f..7f6493bb5bba 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -2397,6 +2397,39 @@ struct drm_i915_perf_open_param {
  */
 #define I915_PERF_IOCTL_CONFIG	_IO('i', 0x2)
 
+/*
+ * Returns OA buffer properties to be used with mmap.
+ *
+ * This ioctl is available in perf revision 8.
+ */
+#define I915_PERF_IOCTL_GET_OA_BUFFER_INFO _IOWR('i', 0x3, struct drm_i915_perf_oa_buffer_info)
+
+/**
+ * OA buffer size and offset.
+ *
+ * OA output buffer
+ *   type: 0
+ *   flags: mbz
+ *
+ *   After querying the info, pass (size,offset) to mmap(),
+ *
+ *   mmap(0, info.size, PROT_READ, MAP_PRIVATE, perf_fd, info.offset).
+ *
+ *   Note that only a private (not shared between processes, or across fork())
+ *   read-only mmapping is allowed.
+ *
+ *   Userspace must treat the incoming data as tainted, but it conforms to the OA
+ *   format as specified by user config. The buffer provides reports that have
+ *   OA counters - A, B and C.
+ */
+struct drm_i915_perf_oa_buffer_info {
+	__u32 type;   /* in */
+	__u32 flags;  /* in */
+	__u64 size;   /* out */
+	__u64 offset; /* out */
+	__u64 rsvd;   /* mbz */
+};
+
 /*
  * Common to all i915 perf records
  */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2021-08-31 20:35 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-30 19:38 [PATCH 0/8] Enable triggered perf query for Xe_HP Umesh Nerlige Ramappa
2021-08-30 19:38 ` [Intel-gfx] " Umesh Nerlige Ramappa
2021-08-30 19:38 ` [PATCH 1/8] drm/i915/gt: Lock intel_engine_apply_whitelist with uncore->lock Umesh Nerlige Ramappa
2021-08-30 19:38   ` [Intel-gfx] " Umesh Nerlige Ramappa
2021-08-30 19:38 ` [PATCH 2/8] drm/i915/gt: Refactor _wa_add to reuse wa_index and wa_list_grow Umesh Nerlige Ramappa
2021-08-30 19:38   ` [Intel-gfx] " Umesh Nerlige Ramappa
2021-08-30 19:38 ` [PATCH 3/8] drm/i915/gt: Check for conflicting RING_NONPRIV Umesh Nerlige Ramappa
2021-08-30 19:38   ` [Intel-gfx] " Umesh Nerlige Ramappa
2021-08-31  3:39   ` kernel test robot
2021-08-31  3:39     ` kernel test robot
2021-08-30 19:38 ` [PATCH 4/8] drm/i915/gt: Enable dynamic adjustment of RING_NONPRIV Umesh Nerlige Ramappa
2021-08-30 19:38   ` [Intel-gfx] " Umesh Nerlige Ramappa
2021-08-30 19:38 ` [Intel-gfx] [PATCH 5/8] drm/i915/perf: Ensure observation logic is not clock gated Umesh Nerlige Ramappa
2021-08-30 19:38   ` Umesh Nerlige Ramappa
2021-08-30 19:38 ` [Intel-gfx] [PATCH 6/8] drm/i915/perf: Whitelist OA report trigger registers Umesh Nerlige Ramappa
2021-08-30 19:38   ` Umesh Nerlige Ramappa
2021-08-30 19:38 ` [PATCH 7/8] drm/i915/perf: Whitelist OA counter and buffer registers Umesh Nerlige Ramappa
2021-08-30 19:38   ` [Intel-gfx] " Umesh Nerlige Ramappa
2021-08-30 19:38 ` [Intel-gfx] [PATCH 8/8] drm/i915/perf: Map OA buffer to user space for gen12 performance query Umesh Nerlige Ramappa
2021-08-30 19:38   ` Umesh Nerlige Ramappa
2021-08-31 12:55   ` Daniel Vetter
2021-08-31 12:55     ` [Intel-gfx] " Daniel Vetter
2021-08-31 20:35     ` Umesh Nerlige Ramappa
2021-08-31 20:35       ` [Intel-gfx] " Umesh Nerlige Ramappa
2021-08-30 22:16 ` [Intel-gfx] ✗ Fi.CI.BAT: failure for Enable triggered perf query for Xe_HP (rev2) Patchwork
  -- strict thread matches above, loose matches on Subject: below --
2021-08-03 20:13 [PATCH 0/8] Enable triggered perf query for Xe_HP Umesh Nerlige Ramappa
2021-08-03 20:13 ` [PATCH 8/8] drm/i915/perf: Map OA buffer to user space for gen12 performance query Umesh Nerlige Ramappa

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.