* [RFC v2 00/10] i915 PMU and engine busy stats
@ 2017-08-02 12:32 Tvrtko Ursulin
  2017-08-02 12:32 ` [RFC 01/10] drm/i915: Convert intel_rc6_residency_us to ns Tvrtko Ursulin
                   ` (10 more replies)
  0 siblings, 11 replies; 32+ messages in thread
From: Tvrtko Ursulin @ 2017-08-02 12:32 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Second spin of the i915 PMU series.

This version squashes the fixups I was making against Chris' original PMU
prototype into patch 4 itself. Main changes in the core patch are:

 * Convert to class/instance uAPI.
 * Refactor some code to helpers for clarity.
 * Skip sampling disabled engines.
 * Expose counters in sysfs.
 * Pass in fake regs to avoid null ptr deref in perf core.
 * Use shared driver code for rc6 residency, power and frequency.

The series otherwise begins with three patches which expose some existing code
so it can be used from the PMU implementation.

Software engine busyness patches follow (6-9), which enable a cheaper and more
accurate busyness metric.

Finally, I have made the static key patch the last in the series so any
improvements it brings can be easily looked at separately.

Patch 8 also exports engine busyness in debugfs.

Still pending is a discussion with the perf folks on whether our PMU
usage/implementation is correct. Open questions there are:

 * Are the metrics we are exposing meaningful/appropriate for a software PMU?
 * Do we need / can we get anything useful by enumerating our events in sysfs?
 * Can we have "perf stat" or similar existing tools usefully parse these?
 * I had to create fake pt_regs to pass to perf_event_overflow, is that OK?
 * Can we control sampling frequency as requested by the perf API? We mostly
   do not need rapid polling on these counters.
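For anyone wanting to poke at the class/instance uAPI from userspace, the
event config encoding can be sketched in plain C. The shift and mask values
below are placeholders for illustration only, not the real constants from the
i915_drm.h uAPI header added by this series; the decoder shapes mirror the
engine_event_class/instance/sample helpers in i915_pmu.c:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shift/mask values -- the real ones live in the
 * i915_drm.h uAPI header added by this series. */
#define DEMO_PMU_SAMPLE_MASK  0xff
#define DEMO_PMU_SAMPLE_BITS  8
#define DEMO_PMU_CLASS_SHIFT  16

/* Pack engine class, instance and sample type into a perf config value. */
static uint64_t engine_config(uint8_t klass, uint8_t instance, uint8_t sample)
{
	return ((uint64_t)klass << DEMO_PMU_CLASS_SHIFT) |
	       ((uint64_t)instance << DEMO_PMU_SAMPLE_BITS) |
	       sample;
}

/* Decoders mirroring the helpers in i915_pmu.c. */
static uint8_t config_sample(uint64_t config)
{
	return config & DEMO_PMU_SAMPLE_MASK;
}

static uint8_t config_instance(uint64_t config)
{
	return (config >> DEMO_PMU_SAMPLE_BITS) & 0xff;
}

static uint8_t config_class(uint64_t config)
{
	return (config >> DEMO_PMU_CLASS_SHIFT) & 0xff;
}
```

A tool like gpu-top would put such a config value into perf_event_attr.config
when calling perf_event_open(2) against the i915 PMU type.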

My own high level TODOs:

 * Can we use OA instead of software tracking? On what platforms, with what
   limitations etc? Current thinking is to have this work completely separate.
 * See how to integrate with gpu-top.

I will also send IGT patches for people who are curious to exercise this.

Chris Wilson (1):
  drm/i915: Expose a PMU interface for perf queries

Tvrtko Ursulin (9):
  drm/i915: Convert intel_rc6_residency_us to ns
  drm/i915: Add intel_energy_uJ
  drm/i915: Extract intel_get_cagf
  drm/i915/pmu: Suspend sampling when GPU is idle
  drm/i915: Wrap context schedule notification
  drm/i915: Engine busy time tracking
  drm/i915: Export engine busy stats in debugfs
  drm/i915/pmu: Wire up engine busy stats to PMU
  drm/i915: Gate engine stats collection with a static key

 drivers/gpu/drm/i915/Makefile           |   1 +
 drivers/gpu/drm/i915/i915_debugfs.c     | 114 ++++-
 drivers/gpu/drm/i915/i915_drv.c         |   2 +
 drivers/gpu/drm/i915/i915_drv.h         |  41 +-
 drivers/gpu/drm/i915/i915_gem.c         |   1 +
 drivers/gpu/drm/i915/i915_gem_request.c |   1 +
 drivers/gpu/drm/i915/i915_pmu.c         | 734 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_reg.h         |   3 +
 drivers/gpu/drm/i915/intel_engine_cs.c  | 130 ++++++
 drivers/gpu/drm/i915/intel_lrc.c        |  19 +-
 drivers/gpu/drm/i915/intel_pm.c         |  57 ++-
 drivers/gpu/drm/i915/intel_ringbuffer.c |  25 ++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  78 ++++
 include/uapi/drm/i915_drm.h             |  55 +++
 kernel/events/core.c                    |   1 +
 15 files changed, 1224 insertions(+), 38 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_pmu.c

-- 
2.9.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC 01/10] drm/i915: Convert intel_rc6_residency_us to ns
  2017-08-02 12:32 [RFC v2 00/10] i915 PMU and engine busy stats Tvrtko Ursulin
@ 2017-08-02 12:32 ` Tvrtko Ursulin
  2017-08-02 12:32 ` [RFC 02/10] drm/i915: Add intel_energy_uJ Tvrtko Ursulin
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Tvrtko Ursulin @ 2017-08-02 12:32 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Will be used for exposing the PMU counters.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h |  8 +++++++-
 drivers/gpu/drm/i915/intel_pm.c | 23 +++++++++--------------
 2 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 6383940e8d55..d47215e73405 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -4009,9 +4009,15 @@ void vlv_phy_reset_lanes(struct intel_encoder *encoder);
 
 int intel_gpu_freq(struct drm_i915_private *dev_priv, int val);
 int intel_freq_opcode(struct drm_i915_private *dev_priv, int val);
-u64 intel_rc6_residency_us(struct drm_i915_private *dev_priv,
+u64 intel_rc6_residency_ns(struct drm_i915_private *dev_priv,
 			   const i915_reg_t reg);
 
+static inline u64 intel_rc6_residency_us(struct drm_i915_private *dev_priv,
+					 const i915_reg_t reg)
+{
+	return DIV_ROUND_UP_ULL(intel_rc6_residency_ns(dev_priv, reg), 1000);
+}
+
 #define I915_READ8(reg)		dev_priv->uncore.funcs.mmio_readb(dev_priv, (reg), true)
 #define I915_WRITE8(reg, val)	dev_priv->uncore.funcs.mmio_writeb(dev_priv, (reg), (val), true)
 
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 48785ef75d33..bfbce74a4013 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -9179,10 +9179,10 @@ static u64 vlv_residency_raw(struct drm_i915_private *dev_priv,
 	return lower | (u64)upper << 8;
 }
 
-u64 intel_rc6_residency_us(struct drm_i915_private *dev_priv,
+u64 intel_rc6_residency_ns(struct drm_i915_private *dev_priv,
 			   const i915_reg_t reg)
 {
-	u64 time_hw, units, div;
+	u64 res;
 
 	if (!intel_enable_rc6())
 		return 0;
@@ -9191,22 +9191,17 @@ u64 intel_rc6_residency_us(struct drm_i915_private *dev_priv,
 
 	/* On VLV and CHV, residency time is in CZ units rather than 1.28us */
 	if (IS_VALLEYVIEW(dev_priv) || IS_CHERRYVIEW(dev_priv)) {
-		units = 1000;
-		div = dev_priv->czclk_freq;
+		res = vlv_residency_raw(dev_priv, reg);
+		res = DIV_ROUND_UP_ULL(res * 1000000, dev_priv->czclk_freq);
 
-		time_hw = vlv_residency_raw(dev_priv, reg);
-	} else if (IS_GEN9_LP(dev_priv)) {
-		units = 1000;
-		div = 1200;		/* 833.33ns */
-
-		time_hw = I915_READ(reg);
 	} else {
-		units = 128000; /* 1.28us */
-		div = 100000;
+		/* 833.33ns units on Gen9LP, 1.28us elsewhere. */
+		unsigned int unit = IS_GEN9_LP(dev_priv) ? 833 : 1280;
 
-		time_hw = I915_READ(reg);
+		res = (u64)I915_READ(reg) * unit;
 	}
 
 	intel_runtime_pm_put(dev_priv);
-	return DIV_ROUND_UP_ULL(time_hw * units, div);
+
+	return res;
 }
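For clarity, the unit conversions this patch performs can be sketched in
user-space C. This assumes czclk_freq is in kHz (so ns = ticks * 1e6 / kHz),
and uses the 833ns / 1280ns fixed tick sizes from the patch; function names
are illustrative, not kernel API:

```c
#include <assert.h>
#include <stdint.h>

#define DIV_ROUND_UP_ULL(n, d) (((n) + (d) - 1) / (d))

/* VLV/CHV: the residency counter ticks at the CZ clock, so with the
 * clock in kHz, ns = ticks * 1000000 / czclk_freq, rounded up. */
static uint64_t vlv_residency_ns(uint64_t raw, uint64_t czclk_freq_khz)
{
	return DIV_ROUND_UP_ULL(raw * 1000000, czclk_freq_khz);
}

/* Other platforms: fixed tick of 833ns (Gen9 LP) or 1.28us elsewhere. */
static uint64_t fixed_unit_residency_ns(uint32_t raw, int is_gen9_lp)
{
	unsigned int unit = is_gen9_lp ? 833 : 1280;

	return (uint64_t)raw * unit;
}

/* The retained intel_rc6_residency_us() wrapper just divides by 1000. */
static uint64_t residency_us(uint64_t ns)
{
	return DIV_ROUND_UP_ULL(ns, 1000);
}
```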
-- 
2.9.4


* [RFC 02/10] drm/i915: Add intel_energy_uJ
  2017-08-02 12:32 [RFC v2 00/10] i915 PMU and engine busy stats Tvrtko Ursulin
  2017-08-02 12:32 ` [RFC 01/10] drm/i915: Convert intel_rc6_residency_us to ns Tvrtko Ursulin
@ 2017-08-02 12:32 ` Tvrtko Ursulin
  2017-08-02 12:32 ` [RFC 03/10] drm/i915: Extract intel_get_cagf Tvrtko Ursulin
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Tvrtko Ursulin @ 2017-08-02 12:32 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Extract code from i915_energy_uJ (debugfs) so it can be used by
other callers in future patches.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 14 +-------------
 drivers/gpu/drm/i915/i915_drv.h     |  2 ++
 drivers/gpu/drm/i915/intel_pm.c     | 20 ++++++++++++++++++++
 3 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index ea50c4a1efae..67287ecc2ed1 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2783,23 +2783,11 @@ static int i915_sink_crc(struct seq_file *m, void *data)
 static int i915_energy_uJ(struct seq_file *m, void *data)
 {
 	struct drm_i915_private *dev_priv = node_to_i915(m->private);
-	u64 power;
-	u32 units;
 
 	if (INTEL_GEN(dev_priv) < 6)
 		return -ENODEV;
 
-	intel_runtime_pm_get(dev_priv);
-
-	rdmsrl(MSR_RAPL_POWER_UNIT, power);
-	power = (power & 0x1f00) >> 8;
-	units = 1000000 / (1 << power); /* convert to uJ */
-	power = I915_READ(MCH_SECP_NRG_STTS);
-	power *= units;
-
-	intel_runtime_pm_put(dev_priv);
-
-	seq_printf(m, "%llu", (long long unsigned)power);
+	seq_printf(m, "%llu", (long long unsigned)intel_energy_uJ(dev_priv));
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d47215e73405..124cbfc9353c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -4018,6 +4018,8 @@ static inline u64 intel_rc6_residency_us(struct drm_i915_private *dev_priv,
 	return DIV_ROUND_UP_ULL(intel_rc6_residency_ns(dev_priv, reg), 1000);
 }
 
+u64 intel_energy_uJ(struct drm_i915_private *dev_priv);
+
 #define I915_READ8(reg)		dev_priv->uncore.funcs.mmio_readb(dev_priv, (reg), true)
 #define I915_WRITE8(reg, val)	dev_priv->uncore.funcs.mmio_writeb(dev_priv, (reg), (val), true)
 
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index bfbce74a4013..ad49932b445c 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -9205,3 +9205,23 @@ u64 intel_rc6_residency_ns(struct drm_i915_private *dev_priv,
 
 	return res;
 }
+
+u64 intel_energy_uJ(struct drm_i915_private *dev_priv)
+{
+	u64 power;
+	u32 units, val;
+
+	if (GEM_WARN_ON(INTEL_GEN(dev_priv) < 6))
+		return 0;
+
+	intel_runtime_pm_get(dev_priv);
+
+	rdmsrl(MSR_RAPL_POWER_UNIT, power);
+	val = (power >> 8) & 0x1f;
+	units = 1000000 >> val; /* convert to uJ */
+	power = I915_READ(MCH_SECP_NRG_STTS) * units;
+
+	intel_runtime_pm_put(dev_priv);
+
+	return power;
+}
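The RAPL arithmetic above can be sketched outside the kernel as follows. The
Energy Status Units field (bits 12:8 of MSR_RAPL_POWER_UNIT) is an exponent,
one counter tick being 1/2^ESU joules; like the patch, this truncates to whole
microjoules per tick. Function names are illustrative only:

```c
#include <assert.h>
#include <stdint.h>

/* Microjoules per energy-counter tick, derived from the ESU exponent
 * in bits 12:8 of MSR_RAPL_POWER_UNIT (tick = 1/2^ESU joules). */
static uint32_t rapl_uj_per_tick(uint64_t rapl_power_unit_msr)
{
	uint32_t esu = (rapl_power_unit_msr >> 8) & 0x1f;

	return 1000000u >> esu; /* truncates, as in the patch */
}

/* Scale the raw energy-status counter to microjoules. */
static uint64_t energy_uj(uint64_t rapl_power_unit_msr, uint32_t nrg_stts)
{
	return (uint64_t)nrg_stts * rapl_uj_per_tick(rapl_power_unit_msr);
}
```

With the common ESU of 14, one tick is ~61uJ, so the integer shift loses a
little precision per tick; the in-kernel code accepts the same rounding.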
-- 
2.9.4


* [RFC 03/10] drm/i915: Extract intel_get_cagf
  2017-08-02 12:32 [RFC v2 00/10] i915 PMU and engine busy stats Tvrtko Ursulin
  2017-08-02 12:32 ` [RFC 01/10] drm/i915: Convert intel_rc6_residency_us to ns Tvrtko Ursulin
  2017-08-02 12:32 ` [RFC 02/10] drm/i915: Add intel_energy_uJ Tvrtko Ursulin
@ 2017-08-02 12:32 ` Tvrtko Ursulin
  2017-08-02 12:32 ` [RFC 04/10] drm/i915: Expose a PMU interface for perf queries Tvrtko Ursulin
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Tvrtko Ursulin @ 2017-08-02 12:32 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Code to be shared between debugfs and the PMU implementation.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c |  8 +-------
 drivers/gpu/drm/i915/i915_drv.h     |  1 +
 drivers/gpu/drm/i915/intel_pm.c     | 14 ++++++++++++++
 3 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 67287ecc2ed1..2811229f6fb7 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1112,13 +1112,7 @@ static int i915_frequency_info(struct seq_file *m, void *unused)
 		rpdownei = I915_READ(GEN6_RP_CUR_DOWN_EI) & GEN6_CURIAVG_MASK;
 		rpcurdown = I915_READ(GEN6_RP_CUR_DOWN) & GEN6_CURBSYTAVG_MASK;
 		rpprevdown = I915_READ(GEN6_RP_PREV_DOWN) & GEN6_CURBSYTAVG_MASK;
-		if (INTEL_GEN(dev_priv) >= 9)
-			cagf = (rpstat & GEN9_CAGF_MASK) >> GEN9_CAGF_SHIFT;
-		else if (IS_HASWELL(dev_priv) || IS_BROADWELL(dev_priv))
-			cagf = (rpstat & HSW_CAGF_MASK) >> HSW_CAGF_SHIFT;
-		else
-			cagf = (rpstat & GEN6_CAGF_MASK) >> GEN6_CAGF_SHIFT;
-		cagf = intel_gpu_freq(dev_priv, cagf);
+		cagf = intel_gpu_freq(dev_priv, intel_get_cagf(dev_priv, rpstat));
 
 		intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 124cbfc9353c..0c8bd1cdcbbd 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -4019,6 +4019,7 @@ static inline u64 intel_rc6_residency_us(struct drm_i915_private *dev_priv,
 }
 
 u64 intel_energy_uJ(struct drm_i915_private *dev_priv);
+u32 intel_get_cagf(struct drm_i915_private *dev_priv, u32 rpstat1);
 
 #define I915_READ8(reg)		dev_priv->uncore.funcs.mmio_readb(dev_priv, (reg), true)
 #define I915_WRITE8(reg, val)	dev_priv->uncore.funcs.mmio_writeb(dev_priv, (reg), (val), true)
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index ad49932b445c..1533b6996c85 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -9225,3 +9225,17 @@ u64 intel_energy_uJ(struct drm_i915_private *dev_priv)
 
 	return power;
 }
+
+u32 intel_get_cagf(struct drm_i915_private *dev_priv, u32 rpstat)
+{
+	u32 cagf;
+
+	if (INTEL_GEN(dev_priv) >= 9)
+		cagf = (rpstat & GEN9_CAGF_MASK) >> GEN9_CAGF_SHIFT;
+	else if (IS_HASWELL(dev_priv) || IS_BROADWELL(dev_priv))
+		cagf = (rpstat & HSW_CAGF_MASK) >> HSW_CAGF_SHIFT;
+	else
+		cagf = (rpstat & GEN6_CAGF_MASK) >> GEN6_CAGF_SHIFT;
+
+	return  cagf;
+}
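The extraction being shared here is a plain mask-and-shift of the CAGF field
out of RPSTAT1; a minimal sketch, using placeholder mask/shift values rather
than the real per-gen definitions from i915_reg.h:

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder field definition for illustration; the real GEN6/HSW/GEN9
 * CAGF masks and shifts are defined in i915_reg.h. */
#define DEMO_CAGF_SHIFT 8
#define DEMO_CAGF_MASK  (0x7fu << DEMO_CAGF_SHIFT)

/* Mirrors the shape of intel_get_cagf(): isolate the current actual
 * graphics frequency field from an RPSTAT1 register value. */
static uint32_t demo_get_cagf(uint32_t rpstat)
{
	return (rpstat & DEMO_CAGF_MASK) >> DEMO_CAGF_SHIFT;
}
```

The caller then feeds the result through intel_gpu_freq() to convert the
encoded value into MHz, as i915_frequency_info() does above.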
-- 
2.9.4


* [RFC 04/10] drm/i915: Expose a PMU interface for perf queries
  2017-08-02 12:32 [RFC v2 00/10] i915 PMU and engine busy stats Tvrtko Ursulin
                   ` (2 preceding siblings ...)
  2017-08-02 12:32 ` [RFC 03/10] drm/i915: Extract intel_get_cagf Tvrtko Ursulin
@ 2017-08-02 12:32 ` Tvrtko Ursulin
  2017-08-02 13:00   ` Tvrtko Ursulin
  2017-08-02 12:32 ` [RFC 05/10] drm/i915/pmu: Suspend sampling when GPU is idle Tvrtko Ursulin
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Tvrtko Ursulin @ 2017-08-02 12:32 UTC (permalink / raw)
  To: Intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

The first goal is to be able to measure GPU (and individual ring) busyness
without having to poll registers from userspace. (Polling not only requires
holding the forcewake lock indefinitely, perturbing the system, but also
runs the risk of hanging the machine.) As an alternative we can use the
perf event counter interface to sample the ring registers periodically
and send those results to userspace.

To be able to do so, we need to export the two symbols from
kernel/events/core.c to register and unregister a PMU device.

v2: Use a common timer for the ring sampling.

v3:
 * Decouple uAPI from i915 engine ids.
 * Complete uAPI defines.
 * Refactor some code to helpers for clarity.
 * Skip sampling disabled engines.
 * Expose counters in sysfs.
 * Pass in fake regs to avoid null ptr deref in perf core.
 * Convert to class/instance uAPI.
 * Use shared driver code for rc6 residency, power and frequency.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/Makefile           |   1 +
 drivers/gpu/drm/i915/i915_drv.c         |   2 +
 drivers/gpu/drm/i915/i915_drv.h         |  25 ++
 drivers/gpu/drm/i915/i915_pmu.c         | 629 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_reg.h         |   3 +
 drivers/gpu/drm/i915/intel_engine_cs.c  |  10 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |  25 ++
 drivers/gpu/drm/i915/intel_ringbuffer.h |   8 +
 include/uapi/drm/i915_drm.h             |  55 +++
 kernel/events/core.c                    |   1 +
 10 files changed, 759 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/i915_pmu.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index f8227318dcaf..1c720013dc42 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -26,6 +26,7 @@ i915-y := i915_drv.o \
 
 i915-$(CONFIG_COMPAT)   += i915_ioc32.o
 i915-$(CONFIG_DEBUG_FS) += i915_debugfs.o intel_pipe_crc.o
+i915-$(CONFIG_PERF_EVENTS) += i915_pmu.o
 
 # GEM code
 i915-y += i915_cmd_parser.o \
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 214555e813f1..d75c2d790875 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1194,6 +1194,7 @@ static void i915_driver_register(struct drm_i915_private *dev_priv)
 	struct drm_device *dev = &dev_priv->drm;
 
 	i915_gem_shrinker_init(dev_priv);
+	i915_pmu_register(dev_priv);
 
 	/*
 	 * Notify a valid surface after modesetting,
@@ -1248,6 +1249,7 @@ static void i915_driver_unregister(struct drm_i915_private *dev_priv)
 	intel_opregion_unregister(dev_priv);
 
 	i915_perf_unregister(dev_priv);
+	i915_pmu_unregister(dev_priv);
 
 	i915_teardown_sysfs(dev_priv);
 	i915_guc_log_unregister(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0c8bd1cdcbbd..142826742b86 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -40,6 +40,7 @@
 #include <linux/hash.h>
 #include <linux/intel-iommu.h>
 #include <linux/kref.h>
+#include <linux/perf_event.h>
 #include <linux/pm_qos.h>
 #include <linux/reservation.h>
 #include <linux/shmem_fs.h>
@@ -2111,6 +2112,12 @@ struct intel_cdclk_state {
 	unsigned int cdclk, vco, ref;
 };
 
+enum {
+	__I915_SAMPLE_FREQ_ACT = 0,
+	__I915_SAMPLE_FREQ_REQ,
+	__I915_NUM_PMU_SAMPLERS
+};
+
 struct drm_i915_private {
 	struct drm_device drm;
 
@@ -2158,6 +2165,7 @@ struct drm_i915_private {
 	struct pci_dev *bridge_dev;
 	struct i915_gem_context *kernel_context;
 	struct intel_engine_cs *engine[I915_NUM_ENGINES];
+	struct intel_engine_cs *engine_class[MAX_ENGINE_CLASS + 1][MAX_ENGINE_INSTANCE + 1];
 	struct i915_vma *semaphore;
 
 	struct drm_dma_handle *status_page_dmah;
@@ -2605,6 +2613,14 @@ struct drm_i915_private {
 		int	irq;
 	} lpe_audio;
 
+	struct {
+		struct pmu base;
+		spinlock_t lock;
+		struct hrtimer timer;
+		u64 enable;
+		u64 sample[__I915_NUM_PMU_SAMPLERS];
+	} pmu;
+
 	/*
 	 * NOTE: This is the dri1/ums dungeon, don't add stuff here. Your patch
 	 * will be rejected. Instead look for a better place.
@@ -3813,6 +3829,15 @@ extern void i915_perf_fini(struct drm_i915_private *dev_priv);
 extern void i915_perf_register(struct drm_i915_private *dev_priv);
 extern void i915_perf_unregister(struct drm_i915_private *dev_priv);
 
+/* i915_pmu.c */
+#ifdef CONFIG_PERF_EVENTS
+extern void i915_pmu_register(struct drm_i915_private *i915);
+extern void i915_pmu_unregister(struct drm_i915_private *i915);
+#else
+static inline void i915_pmu_register(struct drm_i915_private *i915) {}
+static inline void i915_pmu_unregister(struct drm_i915_private *i915) {}
+#endif
+
 /* i915_suspend.c */
 extern int i915_save_state(struct drm_i915_private *dev_priv);
 extern int i915_restore_state(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
new file mode 100644
index 000000000000..62c527c12641
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -0,0 +1,629 @@
+#include <linux/perf_event.h>
+#include <linux/pm_runtime.h>
+
+#include "i915_drv.h"
+#include "intel_ringbuffer.h"
+
+#define FREQUENCY 200
+#define PERIOD max_t(u64, 10000, NSEC_PER_SEC / FREQUENCY)
+
+#define ENGINE_SAMPLE_MASK \
+	(BIT(I915_SAMPLE_QUEUED) | \
+	 BIT(I915_SAMPLE_BUSY) | \
+	 BIT(I915_SAMPLE_WAIT) | \
+	 BIT(I915_SAMPLE_SEMA))
+
+#define ENGINE_SAMPLE_BITS (16)
+
+static u8 engine_config_sample(u64 config)
+{
+	return config & I915_PMU_SAMPLE_MASK;
+}
+
+static u8 engine_event_sample(struct perf_event *event)
+{
+	return engine_config_sample(event->attr.config);
+}
+
+static u8 engine_event_class(struct perf_event *event)
+{
+	return (event->attr.config >> I915_PMU_CLASS_SHIFT) & 0xff;
+}
+
+static u8 engine_event_instance(struct perf_event *event)
+{
+	return (event->attr.config >> I915_PMU_SAMPLE_BITS) & 0xff;
+}
+
+static bool is_engine_config(u64 config)
+{
+	return config < __I915_PMU_OTHER(0);
+}
+
+static u64 config_enabled_mask(u64 config)
+{
+	if (is_engine_config(config))
+		return BIT_ULL(engine_config_sample(config));
+	else
+		return BIT_ULL(config - __I915_PMU_OTHER(0)) <<
+		       ENGINE_SAMPLE_BITS;
+}
+
+static bool is_engine_event(struct perf_event *event)
+{
+	return is_engine_config(event->attr.config);
+}
+
+static u64 event_enabled_mask(struct perf_event *event)
+{
+	return config_enabled_mask(event->attr.config);
+}
+
+static bool grab_forcewake(struct drm_i915_private *i915, bool fw)
+{
+	if (!fw)
+		intel_uncore_forcewake_get(i915, FORCEWAKE_ALL);
+
+	return true;
+}
+
+static void engines_sample(struct drm_i915_private *dev_priv)
+{
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	bool fw = false;
+
+	if ((dev_priv->pmu.enable & ENGINE_SAMPLE_MASK) == 0)
+		return;
+
+	if (!dev_priv->gt.awake)
+		return;
+
+	if (!intel_runtime_pm_get_if_in_use(dev_priv))
+		return;
+
+	for_each_engine(engine, dev_priv, id) {
+		u32 enable = engine->pmu.enable;
+
+		if (i915_seqno_passed(intel_engine_get_seqno(engine),
+				      intel_engine_last_submit(engine)))
+			continue;
+
+		if (enable & BIT(I915_SAMPLE_QUEUED))
+			engine->pmu.sample[I915_SAMPLE_QUEUED] += PERIOD;
+
+		if (enable & BIT(I915_SAMPLE_BUSY)) {
+			u32 val;
+
+			fw = grab_forcewake(dev_priv, fw);
+			val = I915_READ_FW(RING_MI_MODE(engine->mmio_base));
+			if (!(val & MODE_IDLE))
+				engine->pmu.sample[I915_SAMPLE_BUSY] += PERIOD;
+		}
+
+		if (enable & (BIT(I915_SAMPLE_WAIT) | BIT(I915_SAMPLE_SEMA))) {
+			u32 val;
+
+			fw = grab_forcewake(dev_priv, fw);
+			val = I915_READ_FW(RING_CTL(engine->mmio_base));
+			if ((enable & BIT(I915_SAMPLE_WAIT)) &&
+			    (val & RING_WAIT))
+				engine->pmu.sample[I915_SAMPLE_WAIT] += PERIOD;
+			if ((enable & BIT(I915_SAMPLE_SEMA)) &&
+			    (val & RING_WAIT_SEMAPHORE))
+				engine->pmu.sample[I915_SAMPLE_SEMA] += PERIOD;
+		}
+	}
+
+	if (fw)
+		intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
+	intel_runtime_pm_put(dev_priv);
+}
+
+static void frequency_sample(struct drm_i915_private *dev_priv)
+{
+	if (dev_priv->pmu.enable &
+	    config_enabled_mask(I915_PMU_ACTUAL_FREQUENCY)) {
+		u64 val;
+
+		val = dev_priv->rps.cur_freq;
+		if (dev_priv->gt.awake &&
+		    intel_runtime_pm_get_if_in_use(dev_priv)) {
+			val = intel_get_cagf(dev_priv,
+					     I915_READ_NOTRACE(GEN6_RPSTAT1));
+			intel_runtime_pm_put(dev_priv);
+		}
+		val = intel_gpu_freq(dev_priv, val);
+		dev_priv->pmu.sample[__I915_SAMPLE_FREQ_ACT] += val * PERIOD;
+	}
+
+	if (dev_priv->pmu.enable &
+	    config_enabled_mask(I915_PMU_REQUESTED_FREQUENCY)) {
+		u64 val = intel_gpu_freq(dev_priv, dev_priv->rps.cur_freq);
+		dev_priv->pmu.sample[__I915_SAMPLE_FREQ_REQ] += val * PERIOD;
+	}
+}
+
+static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
+{
+	struct drm_i915_private *i915 =
+		container_of(hrtimer, struct drm_i915_private, pmu.timer);
+
+	if (i915->pmu.enable == 0)
+		return HRTIMER_NORESTART;
+
+	engines_sample(i915);
+	frequency_sample(i915);
+
+	hrtimer_forward_now(hrtimer, ns_to_ktime(PERIOD));
+	return HRTIMER_RESTART;
+}
+
+static void i915_pmu_event_destroy(struct perf_event *event)
+{
+	WARN_ON(event->parent);
+}
+
+static int engine_event_init(struct perf_event *event)
+{
+	struct drm_i915_private *i915 =
+		container_of(event->pmu, typeof(*i915), pmu.base);
+
+	if (!intel_engine_lookup_user(i915, engine_event_class(event),
+				      engine_event_instance(event)))
+		return -ENODEV;
+
+	switch (engine_event_sample(event)) {
+	case I915_SAMPLE_QUEUED:
+	case I915_SAMPLE_BUSY:
+	case I915_SAMPLE_WAIT:
+		break;
+	case I915_SAMPLE_SEMA:
+		if (INTEL_GEN(i915) < 6)
+			return -ENODEV;
+		break;
+	default:
+		return -ENOENT;
+	}
+
+	return 0;
+}
+
+static DEFINE_PER_CPU(struct pt_regs, i915_pmu_pt_regs);
+
+static enum hrtimer_restart hrtimer_sample(struct hrtimer *hrtimer)
+{
+	struct pt_regs *regs = this_cpu_ptr(&i915_pmu_pt_regs);
+	struct perf_sample_data data;
+	struct perf_event *event;
+	u64 period;
+
+	event = container_of(hrtimer, struct perf_event, hw.hrtimer);
+	if (event->state != PERF_EVENT_STATE_ACTIVE)
+		return HRTIMER_NORESTART;
+
+	event->pmu->read(event);
+
+	perf_sample_data_init(&data, 0, event->hw.last_period);
+	perf_event_overflow(event, &data, regs);
+
+	period = max_t(u64, 10000, event->hw.sample_period);
+	hrtimer_forward_now(hrtimer, ns_to_ktime(period));
+	return HRTIMER_RESTART;
+}
+
+static void init_hrtimer(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	if (!is_sampling_event(event))
+		return;
+
+	hrtimer_init(&hwc->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	hwc->hrtimer.function = hrtimer_sample;
+
+	if (event->attr.freq) {
+		long freq = event->attr.sample_freq;
+
+		event->attr.sample_period = NSEC_PER_SEC / freq;
+		hwc->sample_period = event->attr.sample_period;
+		local64_set(&hwc->period_left, hwc->sample_period);
+		hwc->last_period = hwc->sample_period;
+		event->attr.freq = 0;
+	}
+}
+
+static int i915_pmu_event_init(struct perf_event *event)
+{
+	struct drm_i915_private *i915 =
+		container_of(event->pmu, typeof(*i915), pmu.base);
+	int ret;
+
+	/* XXX ideally only want pid == -1 && cpu == -1 */
+
+	if (event->attr.type != event->pmu->type)
+		return -ENOENT;
+
+	if (has_branch_stack(event))
+		return -EOPNOTSUPP;
+
+	ret = 0;
+	if (is_engine_event(event)) {
+		ret = engine_event_init(event);
+	} else switch (event->attr.config) {
+	case I915_PMU_ACTUAL_FREQUENCY:
+		if (IS_VALLEYVIEW(i915) || IS_CHERRYVIEW(i915))
+			ret = -ENODEV; /* requires a mutex for sampling! */
+	case I915_PMU_REQUESTED_FREQUENCY:
+	case I915_PMU_ENERGY:
+	case I915_PMU_RC6_RESIDENCY:
+	case I915_PMU_RC6p_RESIDENCY:
+	case I915_PMU_RC6pp_RESIDENCY:
+		if (INTEL_GEN(i915) < 6)
+			ret = -ENODEV;
+		break;
+	}
+	if (ret)
+		return ret;
+
+	if (!event->parent)
+		event->destroy = i915_pmu_event_destroy;
+
+	init_hrtimer(event);
+
+	return 0;
+}
+
+static void i915_pmu_timer_start(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	s64 period;
+
+	if (!is_sampling_event(event))
+		return;
+
+	period = local64_read(&hwc->period_left);
+	if (period) {
+		if (period < 0)
+			period = 10000;
+
+		local64_set(&hwc->period_left, 0);
+	} else {
+		period = max_t(u64, 10000, hwc->sample_period);
+	}
+
+	hrtimer_start_range_ns(&hwc->hrtimer,
+			       ns_to_ktime(period), 0,
+			       HRTIMER_MODE_REL_PINNED);
+}
+
+static void i915_pmu_timer_cancel(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	if (!is_sampling_event(event))
+		return;
+
+	local64_set(&hwc->period_left,
+		    ktime_to_ns(hrtimer_get_remaining(&hwc->hrtimer)));
+	hrtimer_cancel(&hwc->hrtimer);
+}
+
+static void i915_pmu_enable(struct perf_event *event)
+{
+	struct drm_i915_private *i915 =
+		container_of(event->pmu, typeof(*i915), pmu.base);
+	unsigned long flags;
+
+	spin_lock_irqsave(&i915->pmu.lock, flags);
+
+	if (i915->pmu.enable == 0)
+		hrtimer_start_range_ns(&i915->pmu.timer,
+				       ns_to_ktime(PERIOD), 0,
+				       HRTIMER_MODE_REL_PINNED);
+
+	i915->pmu.enable |= event_enabled_mask(event);
+
+	if (is_engine_event(event)) {
+		struct intel_engine_cs *engine;
+
+		engine = intel_engine_lookup_user(i915,
+						  engine_event_class(event),
+						  engine_event_instance(event));
+		GEM_BUG_ON(!engine);
+		engine->pmu.enable |= BIT(engine_event_sample(event));
+	}
+
+	spin_unlock_irqrestore(&i915->pmu.lock, flags);
+
+	i915_pmu_timer_start(event);
+}
+
+static void i915_pmu_disable(struct perf_event *event)
+{
+	struct drm_i915_private *i915 =
+		container_of(event->pmu, typeof(*i915), pmu.base);
+	unsigned long flags;
+	u64 mask;
+
+	spin_lock_irqsave(&i915->pmu.lock, flags);
+
+	if (is_engine_event(event)) {
+		struct intel_engine_cs *engine;
+		enum intel_engine_id id;
+
+		engine = intel_engine_lookup_user(i915,
+						  engine_event_class(event),
+						  engine_event_instance(event));
+		GEM_BUG_ON(!engine);
+		engine->pmu.enable &= ~BIT(engine_event_sample(event));
+		mask = 0;
+		for_each_engine(engine, i915, id)
+			mask |= engine->pmu.enable;
+		mask = ~mask;
+	} else {
+		mask = event_enabled_mask(event);
+	}
+
+	i915->pmu.enable &= ~mask;
+
+	spin_unlock_irqrestore(&i915->pmu.lock, flags);
+
+	i915_pmu_timer_cancel(event);
+}
+
+static int i915_pmu_event_add(struct perf_event *event, int flags)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	if (flags & PERF_EF_START)
+		i915_pmu_enable(event);
+
+	hwc->state = !(flags & PERF_EF_START);
+
+	return 0;
+}
+
+static void i915_pmu_event_del(struct perf_event *event, int flags)
+{
+	i915_pmu_disable(event);
+}
+
+static void i915_pmu_event_start(struct perf_event *event, int flags)
+{
+	i915_pmu_enable(event);
+}
+
+static void i915_pmu_event_stop(struct perf_event *event, int flags)
+{
+	i915_pmu_disable(event);
+}
+
+static u64 count_interrupts(struct drm_i915_private *i915)
+{
+	/* open-coded kstat_irqs() */
+	struct irq_desc *desc = irq_to_desc(i915->drm.pdev->irq);
+	u64 sum = 0;
+	int cpu;
+
+	if (!desc || !desc->kstat_irqs)
+		return 0;
+
+	for_each_possible_cpu(cpu)
+		sum += *per_cpu_ptr(desc->kstat_irqs, cpu);
+
+	return sum;
+}
+
+static void i915_pmu_event_read(struct perf_event *event)
+{
+	struct drm_i915_private *i915 =
+		container_of(event->pmu, typeof(*i915), pmu.base);
+	u64 val = 0;
+
+	if (is_engine_event(event)) {
+		u8 sample = engine_event_sample(event);
+		struct intel_engine_cs *engine;
+
+		engine = intel_engine_lookup_user(i915,
+						  engine_event_class(event),
+						  engine_event_instance(event));
+
+		if (WARN_ON_ONCE(!engine)) {
+			/* Do nothing */
+		} else {
+			val = engine->pmu.sample[sample];
+		}
+	} else switch (event->attr.config) {
+	case I915_PMU_ACTUAL_FREQUENCY:
+		val = i915->pmu.sample[__I915_SAMPLE_FREQ_ACT];
+		break;
+	case I915_PMU_REQUESTED_FREQUENCY:
+		val = i915->pmu.sample[__I915_SAMPLE_FREQ_REQ];
+		break;
+	case I915_PMU_ENERGY:
+		val = intel_energy_uJ(i915);
+		break;
+	case I915_PMU_INTERRUPTS:
+		val = count_interrupts(i915);
+		break;
+
+	case I915_PMU_RC6_RESIDENCY:
+		if (!i915->gt.awake)
+			return;
+
+		val = intel_rc6_residency_ns(i915,
+					     IS_VALLEYVIEW(i915) ?
+					     VLV_GT_RENDER_RC6 :
+					     GEN6_GT_GFX_RC6);
+		break;
+
+	case I915_PMU_RC6p_RESIDENCY:
+		if (!i915->gt.awake)
+			return;
+
+		if (!IS_VALLEYVIEW(i915))
+			val = intel_rc6_residency_ns(i915, GEN6_GT_GFX_RC6p);
+		break;
+
+	case I915_PMU_RC6pp_RESIDENCY:
+		if (!i915->gt.awake)
+			return;
+
+		if (!IS_VALLEYVIEW(i915))
+			val = intel_rc6_residency_ns(i915, GEN6_GT_GFX_RC6pp);
+		break;
+	}
+
+	local64_set(&event->count, val);
+}
+
+static int i915_pmu_event_event_idx(struct perf_event *event)
+{
+	return 0;
+}
+
+static ssize_t i915_pmu_format_show(struct device *dev,
+				    struct device_attribute *attr, char *buf)
+{
+	struct dev_ext_attribute *eattr;
+
+	eattr = container_of(attr, struct dev_ext_attribute, attr);
+	return sprintf(buf, "%s\n", (char *)eattr->var);
+}
+
+#define I915_PMU_FORMAT_ATTR(_name, _config) \
+	(&((struct dev_ext_attribute[]) { \
+		{ .attr = __ATTR(_name, S_IRUGO, i915_pmu_format_show, NULL), \
+		  .var = (void *)_config, } \
+	})[0].attr.attr)
+
+static struct attribute *i915_pmu_format_attrs[] = {
+	I915_PMU_FORMAT_ATTR(i915_eventid, "config:0-42"),
+	NULL,
+};
+
+static const struct attribute_group i915_pmu_format_attr_group = {
+	.name = "format",
+	.attrs = i915_pmu_format_attrs,
+};
+
+static ssize_t i915_pmu_event_show(struct device *dev,
+				   struct device_attribute *attr, char *buf)
+{
+	struct dev_ext_attribute *eattr;
+
+	eattr = container_of(attr, struct dev_ext_attribute, attr);
+	return sprintf(buf, "config=0x%lx\n", (unsigned long)eattr->var);
+}
+
+#define I915_PMU_EVENT_ATTR(_name, _config) \
+	(&((struct dev_ext_attribute[]) { \
+		{ .attr = __ATTR(_name, S_IRUGO, i915_pmu_event_show, NULL), \
+		  .var = (void *)_config, } \
+	})[0].attr.attr)
+
+static struct attribute *i915_pmu_events_attrs[] = {
+	I915_PMU_EVENT_ATTR(rcs0-queued,
+			    I915_PMU_ENGINE_QUEUED(I915_ENGINE_CLASS_RENDER, 0)),
+	I915_PMU_EVENT_ATTR(rcs0-busy,
+			    I915_PMU_ENGINE_BUSY(I915_ENGINE_CLASS_RENDER, 0)),
+	I915_PMU_EVENT_ATTR(rcs0-wait,
+			    I915_PMU_ENGINE_WAIT(I915_ENGINE_CLASS_RENDER, 0)),
+	I915_PMU_EVENT_ATTR(rcs0-sema,
+			    I915_PMU_ENGINE_SEMA(I915_ENGINE_CLASS_RENDER, 0)),
+
+	I915_PMU_EVENT_ATTR(bcs0-queued,
+			    I915_PMU_ENGINE_QUEUED(I915_ENGINE_CLASS_COPY, 0)),
+	I915_PMU_EVENT_ATTR(bcs0-busy,
+			    I915_PMU_ENGINE_BUSY(I915_ENGINE_CLASS_COPY, 0)),
+	I915_PMU_EVENT_ATTR(bcs0-wait,
+			    I915_PMU_ENGINE_WAIT(I915_ENGINE_CLASS_COPY, 0)),
+	I915_PMU_EVENT_ATTR(bcs0-sema,
+			    I915_PMU_ENGINE_SEMA(I915_ENGINE_CLASS_COPY, 0)),
+
+	I915_PMU_EVENT_ATTR(vcs0-queued,
+			    I915_PMU_ENGINE_QUEUED(I915_ENGINE_CLASS_VIDEO, 0)),
+	I915_PMU_EVENT_ATTR(vcs0-busy,
+			    I915_PMU_ENGINE_BUSY(I915_ENGINE_CLASS_VIDEO, 0)),
+	I915_PMU_EVENT_ATTR(vcs0-wait,
+			    I915_PMU_ENGINE_WAIT(I915_ENGINE_CLASS_VIDEO, 0)),
+	I915_PMU_EVENT_ATTR(vcs0-sema,
+			    I915_PMU_ENGINE_SEMA(I915_ENGINE_CLASS_VIDEO, 0)),
+
+	I915_PMU_EVENT_ATTR(vcs1-queued,
+			    I915_PMU_ENGINE_QUEUED(I915_ENGINE_CLASS_VIDEO, 1)),
+	I915_PMU_EVENT_ATTR(vcs1-busy,
+			    I915_PMU_ENGINE_BUSY(I915_ENGINE_CLASS_VIDEO, 1)),
+	I915_PMU_EVENT_ATTR(vcs1-wait,
+			    I915_PMU_ENGINE_WAIT(I915_ENGINE_CLASS_VIDEO, 1)),
+	I915_PMU_EVENT_ATTR(vcs1-sema,
+			    I915_PMU_ENGINE_SEMA(I915_ENGINE_CLASS_VIDEO, 1)),
+
+	I915_PMU_EVENT_ATTR(vecs0-queued,
+			    I915_PMU_ENGINE_QUEUED(I915_ENGINE_CLASS_VIDEO_ENHANCE, 0)),
+	I915_PMU_EVENT_ATTR(vecs0-busy,
+			    I915_PMU_ENGINE_BUSY(I915_ENGINE_CLASS_VIDEO_ENHANCE, 0)),
+	I915_PMU_EVENT_ATTR(vecs0-wait,
+			    I915_PMU_ENGINE_WAIT(I915_ENGINE_CLASS_VIDEO_ENHANCE, 0)),
+	I915_PMU_EVENT_ATTR(vecs0-sema,
+			    I915_PMU_ENGINE_SEMA(I915_ENGINE_CLASS_VIDEO_ENHANCE, 0)),
+
+	I915_PMU_EVENT_ATTR(actual-frequency,	 I915_PMU_ACTUAL_FREQUENCY),
+	I915_PMU_EVENT_ATTR(requested-frequency, I915_PMU_REQUESTED_FREQUENCY),
+	I915_PMU_EVENT_ATTR(energy,		 I915_PMU_ENERGY),
+	I915_PMU_EVENT_ATTR(interrupts,		 I915_PMU_INTERRUPTS),
+	I915_PMU_EVENT_ATTR(rc6-residency,	 I915_PMU_RC6_RESIDENCY),
+	I915_PMU_EVENT_ATTR(rc6p-residency,	 I915_PMU_RC6p_RESIDENCY),
+	I915_PMU_EVENT_ATTR(rc6pp-residency,	 I915_PMU_RC6pp_RESIDENCY),
+
+	NULL,
+};
+
+static const struct attribute_group i915_pmu_events_attr_group = {
+	.name = "events",
+	.attrs = i915_pmu_events_attrs,
+};
+
+static const struct attribute_group *i915_pmu_attr_groups[] = {
+	&i915_pmu_format_attr_group,
+	&i915_pmu_events_attr_group,
+	NULL
+};
+
+void i915_pmu_register(struct drm_i915_private *i915)
+{
+	if (INTEL_GEN(i915) <= 2)
+		return;
+
+	i915->pmu.base.attr_groups	= i915_pmu_attr_groups;
+	i915->pmu.base.task_ctx_nr	= perf_sw_context;
+	i915->pmu.base.event_init	= i915_pmu_event_init;
+	i915->pmu.base.add		= i915_pmu_event_add;
+	i915->pmu.base.del		= i915_pmu_event_del;
+	i915->pmu.base.start		= i915_pmu_event_start;
+	i915->pmu.base.stop		= i915_pmu_event_stop;
+	i915->pmu.base.read		= i915_pmu_event_read;
+	i915->pmu.base.event_idx	= i915_pmu_event_event_idx;
+
+	spin_lock_init(&i915->pmu.lock);
+	hrtimer_init(&i915->pmu.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	i915->pmu.timer.function = i915_sample;
+	i915->pmu.enable = 0;
+
+	if (perf_pmu_register(&i915->pmu.base, "i915", -1))
+		i915->pmu.base.event_init = NULL;
+}
+
+void i915_pmu_unregister(struct drm_i915_private *i915)
+{
+	if (!i915->pmu.base.event_init)
+		return;
+
+	i915->pmu.enable = 0;
+
+	perf_pmu_unregister(&i915->pmu.base);
+	i915->pmu.base.event_init = NULL;
+
+	hrtimer_cancel(&i915->pmu.timer);
+}
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 1dc7e7a2a23b..26bce766ab51 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -95,6 +95,9 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define VIDEO_ENHANCEMENT_CLASS	2
 #define COPY_ENGINE_CLASS	3
 #define OTHER_CLASS		4
+#define MAX_ENGINE_CLASS	4
+
+#define MAX_ENGINE_INSTANCE    1
 
 /* PCI config space */
 
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 9ab596941372..14630612325b 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -198,6 +198,15 @@ intel_engine_setup(struct drm_i915_private *dev_priv,
 	GEM_BUG_ON(info->class >= ARRAY_SIZE(intel_engine_classes));
 	class_info = &intel_engine_classes[info->class];
 
+	if (GEM_WARN_ON(info->class > MAX_ENGINE_CLASS))
+		return -EINVAL;
+
+	if (GEM_WARN_ON(info->instance > MAX_ENGINE_INSTANCE))
+		return -EINVAL;
+
+	if (GEM_WARN_ON(dev_priv->engine_class[info->class][info->instance]))
+		return -EINVAL;
+
 	GEM_BUG_ON(dev_priv->engine[id]);
 	engine = kzalloc(sizeof(*engine), GFP_KERNEL);
 	if (!engine)
@@ -225,6 +234,7 @@ intel_engine_setup(struct drm_i915_private *dev_priv,
 
 	ATOMIC_INIT_NOTIFIER_HEAD(&engine->context_status_notifier);
 
+	dev_priv->engine_class[info->class][info->instance] = engine;
 	dev_priv->engine[id] = engine;
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index cdf084ef5aae..af8d85794c44 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2282,3 +2282,28 @@ int intel_init_vebox_ring_buffer(struct intel_engine_cs *engine)
 
 	return intel_init_ring_buffer(engine);
 }
+
+static u8 user_class_map[I915_ENGINE_CLASS_MAX] = {
+	[I915_ENGINE_CLASS_OTHER] = OTHER_CLASS,
+	[I915_ENGINE_CLASS_RENDER] = RENDER_CLASS,
+	[I915_ENGINE_CLASS_COPY] = COPY_ENGINE_CLASS,
+	[I915_ENGINE_CLASS_VIDEO] = VIDEO_DECODE_CLASS,
+	[I915_ENGINE_CLASS_VIDEO_ENHANCE] = VIDEO_ENHANCEMENT_CLASS,
+};
+
+struct intel_engine_cs *
+intel_engine_lookup_user(struct drm_i915_private *i915, u8 class, u8 instance)
+{
+	if (class >= ARRAY_SIZE(user_class_map))
+		return NULL;
+
+	class = user_class_map[class];
+
+	if (WARN_ON_ONCE(class > MAX_ENGINE_CLASS))
+		return NULL;
+
+	if (instance > MAX_ENGINE_INSTANCE)
+		return NULL;
+
+	return i915->engine_class[class][instance];
+}
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index d33c93444c0d..9fdf0cdf6220 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -245,6 +245,11 @@ struct intel_engine_cs {
 		I915_SELFTEST_DECLARE(bool mock : 1);
 	} breadcrumbs;
 
+	struct {
+		u32 enable;
+		u64 sample[4];
+	} pmu;
+
 	/*
 	 * A pool of objects to use as shadow copies of client batch buffers
 	 * when the command parser is enabled. Prevents the client from
@@ -735,4 +740,7 @@ bool intel_engines_are_idle(struct drm_i915_private *dev_priv);
 void intel_engines_mark_idle(struct drm_i915_private *i915);
 void intel_engines_reset_default_submission(struct drm_i915_private *i915);
 
+struct intel_engine_cs *
+intel_engine_lookup_user(struct drm_i915_private *i915, u8 class, u8 instance);
+
 #endif /* _INTEL_RINGBUFFER_H_ */
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 7ccbd6a2bbe0..103874476a6d 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -86,6 +86,61 @@ enum i915_mocs_table_index {
 	I915_MOCS_CACHED,
 };
 
+enum drm_i915_gem_engine_class {
+	I915_ENGINE_CLASS_OTHER = 0,
+	I915_ENGINE_CLASS_RENDER = 1,
+	I915_ENGINE_CLASS_COPY = 2,
+	I915_ENGINE_CLASS_VIDEO = 3,
+	I915_ENGINE_CLASS_VIDEO_ENHANCE = 4,
+	I915_ENGINE_CLASS_MAX /* non-ABI */
+};
+
+/**
+ * DOC: perf_events exposed by i915 through /sys/bus/event_sources/drivers/i915
+ *
+ */
+
+enum drm_i915_pmu_engine_sample {
+	I915_SAMPLE_QUEUED = 0,
+	I915_SAMPLE_BUSY = 1,
+	I915_SAMPLE_WAIT = 2,
+	I915_SAMPLE_SEMA = 3
+};
+
+#define I915_PMU_SAMPLE_BITS (4)
+#define I915_PMU_SAMPLE_MASK (0xf)
+#define I915_PMU_SAMPLE_INSTANCE_BITS (8)
+#define I915_PMU_CLASS_SHIFT \
+	(I915_PMU_SAMPLE_BITS + I915_PMU_SAMPLE_INSTANCE_BITS)
+
+#define __I915_PMU_ENGINE(class, instance, sample) \
+	((class) << I915_PMU_CLASS_SHIFT | \
+	(instance) << I915_PMU_SAMPLE_BITS | \
+	(sample))
+
+#define I915_PMU_ENGINE_QUEUED(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_QUEUED)
+
+#define I915_PMU_ENGINE_BUSY(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_BUSY)
+
+#define I915_PMU_ENGINE_WAIT(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_WAIT)
+
+#define I915_PMU_ENGINE_SEMA(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_SEMA)
+
+#define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
+
+#define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
+#define I915_PMU_REQUESTED_FREQUENCY	__I915_PMU_OTHER(1)
+#define I915_PMU_ENERGY			__I915_PMU_OTHER(2)
+#define I915_PMU_INTERRUPTS		__I915_PMU_OTHER(3)
+
+#define I915_PMU_RC6_RESIDENCY		__I915_PMU_OTHER(4)
+#define I915_PMU_RC6p_RESIDENCY		__I915_PMU_OTHER(5)
+#define I915_PMU_RC6pp_RESIDENCY	__I915_PMU_OTHER(6)
+
 /* Each region is a minimum of 16k, and there are at most 255 of them.
  */
 #define I915_NR_TEX_REGIONS 255	/* table size 2k - maximum due to use
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7d34bc16ca1c..3b6eb0131204 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7382,6 +7382,7 @@ int perf_event_overflow(struct perf_event *event,
 {
 	return __perf_event_overflow(event, 1, data, regs);
 }
+EXPORT_SYMBOL_GPL(perf_event_overflow);
 
 /*
  * Generic software event infrastructure
-- 
2.9.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC 05/10] drm/i915/pmu: Suspend sampling when GPU is idle
  2017-08-02 12:32 [RFC v2 00/10] i915 PMU and engine busy stats Tvrtko Ursulin
                   ` (3 preceding siblings ...)
  2017-08-02 12:32 ` [RFC 04/10] drm/i915: Expose a PMU interface for perf queries Tvrtko Ursulin
@ 2017-08-02 12:32 ` Tvrtko Ursulin
  2017-08-02 12:32 ` [RFC 06/10] drm/i915: Wrap context schedule notification Tvrtko Ursulin
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Tvrtko Ursulin @ 2017-08-02 12:32 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

If only a subset of events is enabled, we can afford to suspend
the sampling timer when the GPU is idle, and so save some cycles
and power.

v2: Rebase and limit timer even more.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         |  5 ++++
 drivers/gpu/drm/i915/i915_gem.c         |  1 +
 drivers/gpu/drm/i915/i915_gem_request.c |  1 +
 drivers/gpu/drm/i915/i915_pmu.c         | 51 ++++++++++++++++++++++++++++++---
 4 files changed, 54 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 142826742b86..d5ebc524d30d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2617,6 +2617,7 @@ struct drm_i915_private {
 		struct pmu base;
 		spinlock_t lock;
 		struct hrtimer timer;
+		bool timer_enabled;
 		u64 enable;
 		u64 sample[__I915_NUM_PMU_SAMPLERS];
 	} pmu;
@@ -3833,9 +3834,13 @@ extern void i915_perf_unregister(struct drm_i915_private *dev_priv);
 #ifdef CONFIG_PERF_EVENTS
 extern void i915_pmu_register(struct drm_i915_private *i915);
 extern void i915_pmu_unregister(struct drm_i915_private *i915);
+extern void i915_pmu_gt_idle(struct drm_i915_private *i915);
+extern void i915_pmu_gt_active(struct drm_i915_private *i915);
 #else
 static inline void i915_pmu_register(struct drm_i915_private *i915) {}
 static inline void i915_pmu_unregister(struct drm_i915_private *i915) {}
+static inline void i915_pmu_gt_idle(struct drm_i915_private *i915) {}
+static inline void i915_pmu_gt_active(struct drm_i915_private *i915) {}
 #endif
 
 /* i915_suspend.c */
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a60885d6231b..1a2156f43d74 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3256,6 +3256,7 @@ i915_gem_idle_work_handler(struct work_struct *work)
 
 	intel_engines_mark_idle(dev_priv);
 	i915_gem_timelines_mark_idle(dev_priv);
+	i915_pmu_gt_idle(dev_priv);
 
 	GEM_BUG_ON(!dev_priv->gt.awake);
 	dev_priv->gt.awake = false;
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 9eedd33eb524..781f41461a71 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -879,6 +879,7 @@ static void i915_gem_mark_busy(const struct intel_engine_cs *engine)
 	i915_update_gfx_val(dev_priv);
 	if (INTEL_GEN(dev_priv) >= 6)
 		gen6_rps_busy(dev_priv);
+	i915_pmu_gt_active(dev_priv);
 
 	queue_delayed_work(dev_priv->wq,
 			   &dev_priv->gt.retire_work,
diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 62c527c12641..0d9c0d07a432 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -59,6 +59,46 @@ static u64 event_enabled_mask(struct perf_event *event)
 	return config_enabled_mask(event->attr.config);
 }
 
+static bool pmu_needs_timer(struct drm_i915_private *i915, bool gpu_active)
+{
+	u64 enable = i915->pmu.enable;
+
+	enable &= config_enabled_mask(I915_PMU_ACTUAL_FREQUENCY) |
+		  config_enabled_mask(I915_PMU_REQUESTED_FREQUENCY) |
+		  ENGINE_SAMPLE_MASK;
+
+	if (!gpu_active)
+		enable &= ~ENGINE_SAMPLE_MASK;
+
+	return enable;
+}
+
+void i915_pmu_gt_idle(struct drm_i915_private *i915)
+{
+	spin_lock_irq(&i915->pmu.lock);
+	/*
+	 * Signal sampling timer to stop if only engine events are enabled and
+	 * GPU went idle.
+	 */
+	i915->pmu.timer_enabled = pmu_needs_timer(i915, false);
+	spin_unlock_irq(&i915->pmu.lock);
+}
+
+void i915_pmu_gt_active(struct drm_i915_private *i915)
+{
+	spin_lock_irq(&i915->pmu.lock);
+	/*
+	 * Re-enable sampling timer when GPU goes active.
+	 */
+	if (!i915->pmu.timer_enabled && pmu_needs_timer(i915, true)) {
+		hrtimer_start_range_ns(&i915->pmu.timer,
+				       ns_to_ktime(PERIOD), 0,
+				       HRTIMER_MODE_REL_PINNED);
+		i915->pmu.timer_enabled = true;
+	}
+	spin_unlock_irq(&i915->pmu.lock);
+}
+
 static bool grab_forcewake(struct drm_i915_private *i915, bool fw)
 {
 	if (!fw)
@@ -149,7 +189,7 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
 	struct drm_i915_private *i915 =
 		container_of(hrtimer, struct drm_i915_private, pmu.timer);
 
-	if (i915->pmu.enable == 0)
+	if (!READ_ONCE(i915->pmu.timer_enabled))
 		return HRTIMER_NORESTART;
 
 	engines_sample(i915);
@@ -317,12 +357,14 @@ static void i915_pmu_enable(struct perf_event *event)
 
 	spin_lock_irqsave(&i915->pmu.lock, flags);
 
-	if (i915->pmu.enable == 0)
+	i915->pmu.enable |= event_enabled_mask(event);
+
+	if (pmu_needs_timer(i915, true) && !i915->pmu.timer_enabled) {
 		hrtimer_start_range_ns(&i915->pmu.timer,
 				       ns_to_ktime(PERIOD), 0,
 				       HRTIMER_MODE_REL_PINNED);
-
-	i915->pmu.enable |= event_enabled_mask(event);
+		i915->pmu.timer_enabled = true;
+	}
 
 	if (is_engine_event(event)) {
 		struct intel_engine_cs *engine;
@@ -366,6 +408,7 @@ static void i915_pmu_disable(struct perf_event *event)
 	}
 
 	i915->pmu.enable &= ~mask;
+	i915->pmu.timer_enabled &= pmu_needs_timer(i915, true);
 
 	spin_unlock_irqrestore(&i915->pmu.lock, flags);
 
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC 06/10] drm/i915: Wrap context schedule notification
  2017-08-02 12:32 [RFC v2 00/10] i915 PMU and engine busy stats Tvrtko Ursulin
                   ` (4 preceding siblings ...)
  2017-08-02 12:32 ` [RFC 05/10] drm/i915/pmu: Suspend sampling when GPU is idle Tvrtko Ursulin
@ 2017-08-02 12:32 ` Tvrtko Ursulin
  2017-08-02 12:32 ` [RFC 07/10] drm/i915: Engine busy time tracking Tvrtko Ursulin
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Tvrtko Ursulin @ 2017-08-02 12:32 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

No functional change, just something which will be handy in the
following patch.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index b0738d2b2a7f..4c2cb07c39e2 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -307,6 +307,18 @@ execlists_context_status_change(struct drm_i915_gem_request *rq,
 				   status, rq);
 }
 
+static inline void
+execlists_context_schedule_in(struct drm_i915_gem_request *rq)
+{
+	execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_IN);
+}
+
+static inline void
+execlists_context_schedule_out(struct drm_i915_gem_request *rq)
+{
+	execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_OUT);
+}
+
 static void
 execlists_update_context_pdps(struct i915_hw_ppgtt *ppgtt, u32 *reg_state)
 {
@@ -352,7 +364,7 @@ static void execlists_submit_ports(struct intel_engine_cs *engine)
 		if (rq) {
 			GEM_BUG_ON(count > !n);
 			if (!count++)
-				execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_IN);
+				execlists_context_schedule_in(rq);
 			port_set(&port[n], port_pack(rq, count));
 			desc = execlists_update_context(rq);
 			GEM_DEBUG_EXEC(port[n].context_id = upper_32_bits(desc));
@@ -603,8 +615,7 @@ static void intel_lrc_irq_handler(unsigned long data)
 			if (--count == 0) {
 				GEM_BUG_ON(status & GEN8_CTX_STATUS_PREEMPTED);
 				GEM_BUG_ON(!i915_gem_request_completed(rq));
-				execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_OUT);
-
+				execlists_context_schedule_out(rq);
 				trace_i915_gem_request_out(rq);
 				i915_gem_request_put(rq);
 
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC 07/10] drm/i915: Engine busy time tracking
  2017-08-02 12:32 [RFC v2 00/10] i915 PMU and engine busy stats Tvrtko Ursulin
                   ` (5 preceding siblings ...)
  2017-08-02 12:32 ` [RFC 06/10] drm/i915: Wrap context schedule notification Tvrtko Ursulin
@ 2017-08-02 12:32 ` Tvrtko Ursulin
  2017-08-02 12:32 ` [RFC 08/10] drm/i915: Export engine busy stats in debugfs Tvrtko Ursulin
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Tvrtko Ursulin @ 2017-08-02 12:32 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Track the total time requests have been executing on the hardware.

We add a new kernel API to allow software tracking of the time GPU
engines spend executing requests.

Both per-engine and global APIs are added, with the latter also
being exported for use by external users.

v2:
 * Squashed with the internal API.
 * Dropped static key.
 * Made per-engine.
 * Store time in monotonic ktime.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/intel_engine_cs.c  | 103 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c        |   2 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |  61 +++++++++++++++++++
 3 files changed, 166 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 14630612325b..e709f4edef90 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -232,6 +232,8 @@ intel_engine_setup(struct drm_i915_private *dev_priv,
 	/* Nothing to do here, execute in order of dependencies */
 	engine->schedule = NULL;
 
+	spin_lock_init(&engine->stats.lock);
+
 	ATOMIC_INIT_NOTIFIER_HEAD(&engine->context_status_notifier);
 
 	dev_priv->engine_class[info->class][info->instance] = engine;
@@ -1345,6 +1347,107 @@ void intel_engines_mark_idle(struct drm_i915_private *i915)
 	}
 }
 
+int intel_enable_engine_stats(struct intel_engine_cs *engine)
+{
+	unsigned long flags;
+
+	if (!i915.enable_execlists)
+		return -ENODEV;
+
+	spin_lock_irqsave(&engine->stats.lock, flags);
+	if (engine->stats.enabled++ == 0) {
+		engine->stats.ref = 0;
+		engine->stats.start = engine->stats.total = 0;
+	} else if (engine->stats.enabled == ~0) {
+		goto busy;
+	}
+	spin_unlock_irqrestore(&engine->stats.lock, flags);
+
+	return 0;
+
+busy:
+	spin_unlock_irqrestore(&engine->stats.lock, flags);
+
+	return -EBUSY;
+}
+
+void intel_disable_engine_stats(struct intel_engine_cs *engine)
+{
+	unsigned long flags;
+
+	if (!i915.enable_execlists)
+		return;
+
+	spin_lock_irqsave(&engine->stats.lock, flags);
+	WARN_ON_ONCE(engine->stats.enabled == 0);
+	engine->stats.enabled--;
+	spin_unlock_irqrestore(&engine->stats.lock, flags);
+}
+
+int intel_enable_engines_stats(struct drm_i915_private *dev_priv)
+{
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	int ret = 0;
+
+	if (!i915.enable_execlists)
+		return -ENODEV;
+
+	for_each_engine(engine, dev_priv, id) {
+		ret = intel_enable_engine_stats(engine);
+		if (WARN_ON_ONCE(ret))
+			break;
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL(intel_enable_engines_stats);
+
+void intel_disable_engines_stats(struct drm_i915_private *dev_priv)
+{
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+
+	for_each_engine(engine, dev_priv, id)
+		intel_disable_engine_stats(engine);
+}
+EXPORT_SYMBOL(intel_disable_engines_stats);
+
+ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine)
+{
+	ktime_t total;
+	unsigned long flags;
+
+	spin_lock_irqsave(&engine->stats.lock, flags);
+
+	total = engine->stats.total;
+
+	/*
+	 * If the engine is executing something at the moment
+	 * add it to the total.
+	 */
+	if (engine->stats.ref)
+		total = ktime_add(total,
+				  ktime_sub(ktime_get(), engine->stats.start));
+
+	spin_unlock_irqrestore(&engine->stats.lock, flags);
+
+	return total;
+}
+
+ktime_t intel_engines_get_busy_time(struct drm_i915_private *dev_priv)
+{
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	ktime_t total = 0;
+
+	for_each_engine(engine, dev_priv, id)
+		total = ktime_add(total, intel_engine_get_busy_time(engine));
+
+	return total;
+}
+EXPORT_SYMBOL(intel_engines_get_busy_time);
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftests/mock_engine.c"
 #endif
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 4c2cb07c39e2..5de2fdb86e61 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -310,12 +310,14 @@ execlists_context_status_change(struct drm_i915_gem_request *rq,
 static inline void
 execlists_context_schedule_in(struct drm_i915_gem_request *rq)
 {
+	intel_engine_context_in(rq->engine);
 	execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_IN);
 }
 
 static inline void
 execlists_context_schedule_out(struct drm_i915_gem_request *rq)
 {
+	intel_engine_context_out(rq->engine);
 	execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_OUT);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 9fdf0cdf6220..68f50ec72be6 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -446,6 +446,14 @@ struct intel_engine_cs {
 	 * certain bits to encode the command length in the header).
 	 */
 	u32 (*get_cmd_length_mask)(u32 cmd_header);
+
+	struct {
+		spinlock_t lock;
+		unsigned int enabled;
+		unsigned int ref;
+		ktime_t start; /* Timestamp of the last idle to active transition. */
+		ktime_t total; /* Total time the engine was busy. */
+	} stats;
 };
 
 static inline unsigned int
@@ -743,4 +751,57 @@ void intel_engines_reset_default_submission(struct drm_i915_private *i915);
 struct intel_engine_cs *
 intel_engine_lookup_user(struct drm_i915_private *i915, u8 class, u8 instance);
 
+static inline void intel_engine_context_in(struct intel_engine_cs *engine)
+{
+	unsigned long flags;
+
+	if (READ_ONCE(engine->stats.enabled) == 0)
+		return;
+
+	spin_lock_irqsave(&engine->stats.lock, flags);
+
+	if (engine->stats.enabled > 0) {
+		if (engine->stats.ref++ == 0)
+			engine->stats.start = ktime_get();
+		GEM_BUG_ON(engine->stats.ref == 0);
+	}
+
+	spin_unlock_irqrestore(&engine->stats.lock, flags);
+}
+
+static inline void intel_engine_context_out(struct intel_engine_cs *engine)
+{
+	unsigned long flags;
+
+	if (READ_ONCE(engine->stats.enabled) == 0)
+		return;
+
+	spin_lock_irqsave(&engine->stats.lock, flags);
+
+	if (engine->stats.enabled > 0) {
+		/*
+		 * After turning on engine stats, context out might be the
+		 * first event which then needs to be ignored (ref == 0).
+		 */
+		if (engine->stats.ref && --engine->stats.ref == 0) {
+			ktime_t last = ktime_sub(ktime_get(),
+						 engine->stats.start);
+
+			engine->stats.total = ktime_add(engine->stats.total,
+							last);
+		}
+	}
+
+	spin_unlock_irqrestore(&engine->stats.lock, flags);
+}
+
+int intel_enable_engine_stats(struct intel_engine_cs *engine);
+void intel_disable_engine_stats(struct intel_engine_cs *engine);
+
+int intel_enable_engines_stats(struct drm_i915_private *dev_priv);
+void intel_disable_engines_stats(struct drm_i915_private *dev_priv);
+
+ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine);
+ktime_t intel_engines_get_busy_time(struct drm_i915_private *dev_priv);
+
 #endif /* _INTEL_RINGBUFFER_H_ */
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC 08/10] drm/i915: Export engine busy stats in debugfs
  2017-08-02 12:32 [RFC v2 00/10] i915 PMU and engine busy stats Tvrtko Ursulin
                   ` (6 preceding siblings ...)
  2017-08-02 12:32 ` [RFC 07/10] drm/i915: Engine busy time tracking Tvrtko Ursulin
@ 2017-08-02 12:32 ` Tvrtko Ursulin
  2017-08-02 12:32 ` [RFC 09/10] drm/i915/pmu: Wire up engine busy stats to PMU Tvrtko Ursulin
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Tvrtko Ursulin @ 2017-08-02 12:32 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Export the stats added in the previous patch in debugfs.

The number of active clients reading this data is tracked, and engine
stats collection is only enabled whilst there are some.

Userspace is intended to keep the file descriptor open, seeking
to the beginning of the file periodically, and re-reading the
stats.

This is because the underlying implementation is costly on every
first open/last close transition, and also because the exported
stats mostly make sense when considered relative to the previous
sample.

The file lists, for each engine, the number of nanoseconds it has
been active since tracking started.

v2: Rebase.
v3: Rebase.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 92 +++++++++++++++++++++++++++++++++++++
 1 file changed, 92 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 2811229f6fb7..ad1a59776eca 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -4763,6 +4763,92 @@ static const struct file_operations i915_hpd_storm_ctl_fops = {
 	.write = i915_hpd_storm_ctl_write
 };
 
+struct i915_engine_stats_buf {
+	unsigned int len;
+	size_t available;
+	char buf[0];
+};
+
+static int i915_engine_stats_open(struct inode *inode, struct file *file)
+{
+	struct drm_i915_private *i915 = file->f_inode->i_private;
+	const unsigned int engine_name_len =
+		sizeof(((struct intel_engine_cs *)0)->name);
+	struct i915_engine_stats_buf *buf;
+	const unsigned int buf_size =
+		(engine_name_len + 2 + 19 + 1) * I915_NUM_ENGINES + 1 +
+		sizeof(*buf);
+	int ret;
+
+	buf = kzalloc(buf_size, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	ret = intel_enable_engines_stats(i915);
+	if (ret) {
+		kfree(buf);
+		return ret;
+	}
+
+	buf->len = buf_size;
+	file->private_data = buf;
+
+	return 0;
+}
+
+static int i915_engine_stats_release(struct inode *inode, struct file *file)
+{
+	struct drm_i915_private *i915 = file->f_inode->i_private;
+
+	intel_disable_engines_stats(i915);
+	kfree(file->private_data);
+
+	return 0;
+}
+
+static ssize_t i915_engine_stats_read(struct file *file, char __user *ubuf,
+				      size_t count, loff_t *pos)
+{
+	struct i915_engine_stats_buf *buf =
+		(struct i915_engine_stats_buf *)file->private_data;
+
+	if (*pos == 0) {
+		struct drm_i915_private *dev_priv = file->f_inode->i_private;
+		char *ptr = &buf->buf[0];
+		int left = buf->len;
+		struct intel_engine_cs *engine;
+		enum intel_engine_id id;
+
+		buf->available = 0;
+
+		for_each_engine(engine, dev_priv, id) {
+			u64 total =
+			       ktime_to_ns(intel_engine_get_busy_time(engine));
+			int len;
+
+			len = snprintf(ptr, left, "%s: %llu\n",
+				       engine->name, total);
+			buf->available += len;
+			left -= len;
+			ptr += len;
+
+			if (len == 0)
+				return -EFBIG;
+		}
+	}
+
+	return simple_read_from_buffer(ubuf, count, pos, &buf->buf[0],
+				       buf->available);
+}
+
+static const struct file_operations i915_engine_stats_fops = {
+	.owner = THIS_MODULE,
+	.open = i915_engine_stats_open,
+	.release = i915_engine_stats_release,
+	.read = i915_engine_stats_read,
+	.llseek = default_llseek,
+};
+
 static const struct drm_info_list i915_debugfs_list[] = {
 	{"i915_capabilities", i915_capabilities, 0},
 	{"i915_gem_objects", i915_gem_object_info, 0},
@@ -4852,6 +4938,12 @@ int i915_debugfs_register(struct drm_i915_private *dev_priv)
 	struct dentry *ent;
 	int ret, i;
 
+	ent = debugfs_create_file("i915_engine_stats", S_IRUGO,
+				  minor->debugfs_root, to_i915(minor->dev),
+				  &i915_engine_stats_fops);
+	if (!ent)
+		return -ENOMEM;
+
 	ent = debugfs_create_file("i915_forcewake_user", S_IRUSR,
 				  minor->debugfs_root, to_i915(minor->dev),
 				  &i915_forcewake_fops);
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC 09/10] drm/i915/pmu: Wire up engine busy stats to PMU
  2017-08-02 12:32 [RFC v2 00/10] i915 PMU and engine busy stats Tvrtko Ursulin
                   ` (7 preceding siblings ...)
  2017-08-02 12:32 ` [RFC 08/10] drm/i915: Export engine busy stats in debugfs Tvrtko Ursulin
@ 2017-08-02 12:32 ` Tvrtko Ursulin
  2017-08-02 12:39 ` [RFC 10/10] drm/i915: Gate engine stats collection with a static key Tvrtko Ursulin
  2017-08-02 12:56 ` ✗ Fi.CI.BAT: warning for i915 PMU and engine busy stats (rev2) Patchwork
  10 siblings, 0 replies; 32+ messages in thread
From: Tvrtko Ursulin @ 2017-08-02 12:32 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

We can use engine busy stats instead of the MMIO sampling timer
for better efficiency.

At a minimum this saves period * num_engines / sec MMIO reads,
and in the best case, when only engine busy samplers are active,
it allows us to not start the sampling timer at all.

v2: Rebase.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c         | 86 ++++++++++++++++++++++++++++-----
 drivers/gpu/drm/i915/intel_ringbuffer.h |  3 ++
 2 files changed, 77 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 0d9c0d07a432..3272ec0763bf 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -59,6 +59,11 @@ static u64 event_enabled_mask(struct perf_event *event)
 	return config_enabled_mask(event->attr.config);
 }
 
+static bool supports_busy_stats(void)
+{
+	return i915.enable_execlists;
+}
+
 static bool pmu_needs_timer(struct drm_i915_private *i915, bool gpu_active)
 {
 	u64 enable = i915->pmu.enable;
@@ -69,6 +74,8 @@ static bool pmu_needs_timer(struct drm_i915_private *i915, bool gpu_active)
 
 	if (!gpu_active)
 		enable &= ~ENGINE_SAMPLE_MASK;
+	else if (supports_busy_stats())
+		enable &= ~BIT(I915_SAMPLE_BUSY);
 
 	return enable;
 }
@@ -132,7 +139,8 @@ static void engines_sample(struct drm_i915_private *dev_priv)
 		if (enable & BIT(I915_SAMPLE_QUEUED))
 			engine->pmu.sample[I915_SAMPLE_QUEUED] += PERIOD;
 
-		if (enable & BIT(I915_SAMPLE_BUSY)) {
+		if ((enable & BIT(I915_SAMPLE_BUSY)) &&
+		    !engine->pmu.busy_stats) {
 			u32 val;
 
 			fw = grab_forcewake(dev_priv, fw);
@@ -349,14 +357,29 @@ static void i915_pmu_timer_cancel(struct perf_event *event)
 	hrtimer_cancel(&hwc->hrtimer);
 }
 
+static bool engine_needs_busy_stats(struct intel_engine_cs *engine)
+{
+	return supports_busy_stats() &&
+	       (engine->pmu.enable & BIT(I915_SAMPLE_BUSY));
+}
+
 static void i915_pmu_enable(struct perf_event *event)
 {
 	struct drm_i915_private *i915 =
 		container_of(event->pmu, typeof(*i915), pmu.base);
+	struct intel_engine_cs *engine = NULL;
 	unsigned long flags;
 
 	spin_lock_irqsave(&i915->pmu.lock, flags);
 
+	if (is_engine_event(event)) {
+		engine = intel_engine_lookup_user(i915,
+						  engine_event_class(event),
+						  engine_event_instance(event));
+		GEM_BUG_ON(!engine);
+		engine->pmu.enable |= BIT(engine_event_sample(event));
+	}
+
 	i915->pmu.enable |= event_enabled_mask(event);
 
 	if (pmu_needs_timer(i915, true) && !i915->pmu.timer_enabled) {
@@ -364,16 +387,11 @@ static void i915_pmu_enable(struct perf_event *event)
 				       ns_to_ktime(PERIOD), 0,
 				       HRTIMER_MODE_REL_PINNED);
 		i915->pmu.timer_enabled = true;
-	}
-
-	if (is_engine_event(event)) {
-		struct intel_engine_cs *engine;
-
-		engine = intel_engine_lookup_user(i915,
-						  engine_event_class(event),
-						  engine_event_instance(event));
-		GEM_BUG_ON(!engine);
-		engine->pmu.enable |= BIT(engine_event_sample(event));
+	} else if (is_engine_event(event) && engine_needs_busy_stats(engine) &&
+		   !engine->pmu.busy_stats) {
+		engine->pmu.busy_stats = true;
+		if (!cancel_delayed_work(&engine->pmu.disable_busy_stats))
+			queue_work(i915->wq, &engine->pmu.enable_busy_stats);
 	}
 
 	spin_unlock_irqrestore(&i915->pmu.lock, flags);
@@ -399,10 +417,17 @@ static void i915_pmu_disable(struct perf_event *event)
 						  engine_event_instance(event));
 		GEM_BUG_ON(!engine);
 		engine->pmu.enable &= ~BIT(engine_event_sample(event));
+		if (engine->pmu.busy_stats &&
+		    !engine_needs_busy_stats(engine)) {
+			engine->pmu.busy_stats = false;
+			queue_delayed_work(i915->wq,
+					   &engine->pmu.disable_busy_stats,
+					   round_jiffies_up_relative(2 * HZ));
+		}
 		mask = 0;
 		for_each_engine(engine, i915, id)
 			mask |= engine->pmu.enable;
-		mask = ~mask;
+		mask = (~mask) & ENGINE_SAMPLE_MASK;
 	} else {
 		mask = event_enabled_mask(event);
 	}
@@ -474,6 +499,9 @@ static void i915_pmu_event_read(struct perf_event *event)
 
 		if (WARN_ON_ONCE(!engine)) {
 			/* Do nothing */
+		} else if (sample == I915_SAMPLE_BUSY &&
+			   engine->pmu.busy_stats) {
+			val = ktime_to_ns(intel_engine_get_busy_time(engine));
 		} else {
 			val = engine->pmu.sample[sample];
 		}
@@ -634,8 +662,27 @@ static const struct attribute_group *i915_pmu_attr_groups[] = {
         NULL
 };
 
+static void __enable_busy_stats(struct work_struct *work)
+{
+	struct intel_engine_cs *engine =
+		container_of(work, typeof(*engine), pmu.enable_busy_stats);
+
+	WARN_ON_ONCE(intel_enable_engine_stats(engine));
+}
+
+static void __disable_busy_stats(struct work_struct *work)
+{
+	struct intel_engine_cs *engine =
+	       container_of(work, typeof(*engine), pmu.disable_busy_stats.work);
+
+	intel_disable_engine_stats(engine);
+}
+
 void i915_pmu_register(struct drm_i915_private *i915)
 {
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+
 	if (INTEL_GEN(i915) <= 2)
 		return;
 
@@ -651,6 +698,13 @@ void i915_pmu_register(struct drm_i915_private *i915)
 
 	spin_lock_init(&i915->pmu.lock);
 	hrtimer_init(&i915->pmu.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+
+	for_each_engine(engine, i915, id) {
+		INIT_WORK(&engine->pmu.enable_busy_stats, __enable_busy_stats);
+		INIT_DELAYED_WORK(&engine->pmu.disable_busy_stats,
+			  __disable_busy_stats);
+	}
+
 	i915->pmu.timer.function = i915_sample;
 	i915->pmu.enable = 0;
 
@@ -660,6 +714,9 @@ void i915_pmu_register(struct drm_i915_private *i915)
 
 void i915_pmu_unregister(struct drm_i915_private *i915)
 {
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+
 	if (!i915->pmu.base.event_init)
 		return;
 
@@ -669,4 +726,9 @@ void i915_pmu_unregister(struct drm_i915_private *i915)
 	i915->pmu.base.event_init = NULL;
 
 	hrtimer_cancel(&i915->pmu.timer);
+
+	for_each_engine(engine, i915, id) {
+		flush_work(&engine->pmu.enable_busy_stats);
+		flush_delayed_work(&engine->pmu.disable_busy_stats);
+	}
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 68f50ec72be6..fd5d838ca7b5 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -248,6 +248,9 @@ struct intel_engine_cs {
 	struct {
 		u32 enable;
 		u64 sample[4];
+		bool busy_stats;
+		struct work_struct enable_busy_stats;
+		struct delayed_work disable_busy_stats;
 	} pmu;
 
 	/*
-- 
2.9.4


* [RFC 10/10] drm/i915: Gate engine stats collection with a static key
  2017-08-02 12:32 [RFC v2 00/10] i915 PMU and engine busy stats Tvrtko Ursulin
                   ` (8 preceding siblings ...)
  2017-08-02 12:32 ` [RFC 09/10] drm/i915/pmu: Wire up engine busy stats to PMU Tvrtko Ursulin
@ 2017-08-02 12:39 ` Tvrtko Ursulin
  2017-08-02 12:56 ` ✗ Fi.CI.BAT: warning for i915 PMU and engine busy stats (rev2) Patchwork
  10 siblings, 0 replies; 32+ messages in thread
From: Tvrtko Ursulin @ 2017-08-02 12:39 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

This reduces the cost of the software engine busyness tracking
to a single no-op instruction when there are no listeners.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/intel_engine_cs.c  | 17 +++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h | 54 ++++++++++++++++++---------------
 2 files changed, 47 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index e709f4edef90..fb61df3f770c 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -21,6 +21,7 @@
  * IN THE SOFTWARE.
  *
  */
+#include <linux/static_key.h>
 
 #include "i915_drv.h"
 #include "intel_ringbuffer.h"
@@ -1347,6 +1348,10 @@ void intel_engines_mark_idle(struct drm_i915_private *i915)
 	}
 }
 
+DEFINE_STATIC_KEY_FALSE(i915_engine_stats_key);
+static DEFINE_MUTEX(i915_engine_stats_mutex);
+static int i915_engine_stats_ref;
+
 int intel_enable_engine_stats(struct intel_engine_cs *engine)
 {
 	unsigned long flags;
@@ -1354,6 +1359,8 @@ int intel_enable_engine_stats(struct intel_engine_cs *engine)
 	if (!i915.enable_execlists)
 		return -ENODEV;
 
+	mutex_lock(&i915_engine_stats_mutex);
+
 	spin_lock_irqsave(&engine->stats.lock, flags);
 	if (engine->stats.enabled++ == 0) {
 		engine->stats.ref = 0;
@@ -1363,10 +1370,16 @@ int intel_enable_engine_stats(struct intel_engine_cs *engine)
 	}
 	spin_unlock_irqrestore(&engine->stats.lock, flags);
 
+	if (i915_engine_stats_ref++ == 0)
+		static_branch_enable(&i915_engine_stats_key);
+
+	mutex_unlock(&i915_engine_stats_mutex);
+
 	return 0;
 
 busy:
 	spin_unlock_irqrestore(&engine->stats.lock, flags);
+	mutex_unlock(&i915_engine_stats_mutex);
 
 	return -EBUSY;
 }
@@ -1378,10 +1391,14 @@ void intel_disable_engine_stats(struct intel_engine_cs *engine)
 	if (!i915.enable_execlists)
 		return;
 
+	mutex_lock(&i915_engine_stats_mutex);
 	spin_lock_irqsave(&engine->stats.lock, flags);
 	WARN_ON_ONCE(engine->stats.enabled == 0);
 	engine->stats.enabled--;
 	spin_unlock_irqrestore(&engine->stats.lock, flags);
+	if (--i915_engine_stats_ref == 0)
+		static_branch_disable(&i915_engine_stats_key);
+	mutex_unlock(&i915_engine_stats_mutex);
 }
 
 int intel_enable_engines_stats(struct drm_i915_private *dev_priv)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index fd5d838ca7b5..74add0a18e7d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -754,48 +754,54 @@ void intel_engines_reset_default_submission(struct drm_i915_private *i915);
 struct intel_engine_cs *
 intel_engine_lookup_user(struct drm_i915_private *i915, u8 class, u8 instance);
 
+DECLARE_STATIC_KEY_FALSE(i915_engine_stats_key);
+
 static inline void intel_engine_context_in(struct intel_engine_cs *engine)
 {
 	unsigned long flags;
 
-	if (READ_ONCE(engine->stats.enabled) == 0)
-		return;
+	if (static_branch_unlikely(&i915_engine_stats_key)) {
+		if (READ_ONCE(engine->stats.enabled) == 0)
+			return;
 
-	spin_lock_irqsave(&engine->stats.lock, flags);
+		spin_lock_irqsave(&engine->stats.lock, flags);
 
-	if (engine->stats.enabled > 0) {
-		if (engine->stats.ref++ == 0)
-			engine->stats.start = ktime_get();
-		GEM_BUG_ON(engine->stats.ref == 0);
-	}
+		if (engine->stats.enabled > 0) {
+			if (engine->stats.ref++ == 0)
+				engine->stats.start = ktime_get();
+			GEM_BUG_ON(engine->stats.ref == 0);
+		}
 
-	spin_unlock_irqrestore(&engine->stats.lock, flags);
+		spin_unlock_irqrestore(&engine->stats.lock, flags);
+	}
 }
 
 static inline void intel_engine_context_out(struct intel_engine_cs *engine)
 {
 	unsigned long flags;
 
-	if (READ_ONCE(engine->stats.enabled) == 0)
-		return;
+	if (static_branch_unlikely(&i915_engine_stats_key)) {
+		if (READ_ONCE(engine->stats.enabled) == 0)
+			return;
 
-	spin_lock_irqsave(&engine->stats.lock, flags);
+		spin_lock_irqsave(&engine->stats.lock, flags);
 
-	if (engine->stats.enabled > 0) {
-		/*
-		 * After turning on engine stats, context out might be the
-		 * first event which then needs to be ignored (ref == 0).
-		 */
-		if (engine->stats.ref && --engine->stats.ref == 0) {
-			ktime_t last = ktime_sub(ktime_get(),
-						 engine->stats.start);
+		if (engine->stats.enabled > 0) {
+			/*
+			 * After turning on engine stats, context out might be
+			 * the first event which then needs to be ignored.
+			 */
+			if (engine->stats.ref && --engine->stats.ref == 0) {
+				ktime_t last = ktime_sub(ktime_get(),
+							 engine->stats.start);
 
-			engine->stats.total = ktime_add(engine->stats.total,
-							last);
+				engine->stats.total =
+					ktime_add(engine->stats.total, last);
+			}
 		}
-	}
 
-	spin_unlock_irqrestore(&engine->stats.lock, flags);
+		spin_unlock_irqrestore(&engine->stats.lock, flags);
+	}
 }
 
 int intel_enable_engine_stats(struct intel_engine_cs *engine);
-- 
2.9.4


* ✗ Fi.CI.BAT: warning for i915 PMU and engine busy stats (rev2)
  2017-08-02 12:32 [RFC v2 00/10] i915 PMU and engine busy stats Tvrtko Ursulin
                   ` (9 preceding siblings ...)
  2017-08-02 12:39 ` [RFC 10/10] drm/i915: Gate engine stats collection with a static key Tvrtko Ursulin
@ 2017-08-02 12:56 ` Patchwork
  10 siblings, 0 replies; 32+ messages in thread
From: Patchwork @ 2017-08-02 12:56 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: i915 PMU and engine busy stats (rev2)
URL   : https://patchwork.freedesktop.org/series/27488/
State : warning

== Summary ==

Series 27488v2 i915 PMU and engine busy stats
https://patchwork.freedesktop.org/api/1.0/series/27488/revisions/2/mbox/

Test kms_busy:
        Subgroup basic-flip-default-a:
                pass       -> DMESG-WARN (fi-skl-6700hq) fdo#101144
        Subgroup basic-flip-default-b:
                pass       -> FAIL       (fi-skl-6700hq) fdo#101518 +31
Test kms_flip:
        Subgroup basic-flip-vs-modeset:
                pass       -> SKIP       (fi-skl-6700hq)
        Subgroup basic-flip-vs-wf_vblank:
                pass       -> SKIP       (fi-skl-6700hq) fdo#99739
        Subgroup basic-plain-flip:
                pass       -> SKIP       (fi-skl-6700hq)
Test kms_frontbuffer_tracking:
        Subgroup basic:
                pass       -> SKIP       (fi-skl-6700hq)
Test kms_pipe_crc_basic:
        Subgroup suspend-read-crc-pipe-b:
                dmesg-warn -> PASS       (fi-byt-n2820) fdo#101705
Test pm_rpm:
        Subgroup basic-pci-d3-state:
                pass       -> SKIP       (fi-skl-6700hq)
        Subgroup basic-rte:
                pass       -> SKIP       (fi-skl-6700hq)

fdo#101144 https://bugs.freedesktop.org/show_bug.cgi?id=101144
fdo#101518 https://bugs.freedesktop.org/show_bug.cgi?id=101518
fdo#99739 https://bugs.freedesktop.org/show_bug.cgi?id=99739
fdo#101705 https://bugs.freedesktop.org/show_bug.cgi?id=101705

fi-bdw-5557u     total:280  pass:269  dwarn:0   dfail:0   fail:0   skip:11  time:454s
fi-bdw-gvtdvm    total:280  pass:266  dwarn:0   dfail:0   fail:0   skip:14  time:430s
fi-blb-e6850     total:280  pass:225  dwarn:1   dfail:0   fail:0   skip:54  time:355s
fi-bsw-n3050     total:280  pass:244  dwarn:0   dfail:0   fail:0   skip:36  time:542s
fi-bxt-j4205     total:280  pass:261  dwarn:0   dfail:0   fail:0   skip:19  time:510s
fi-byt-n2820     total:280  pass:252  dwarn:0   dfail:0   fail:0   skip:28  time:506s
fi-glk-2a        total:280  pass:261  dwarn:0   dfail:0   fail:0   skip:19  time:610s
fi-hsw-4770      total:280  pass:264  dwarn:0   dfail:0   fail:0   skip:16  time:434s
fi-hsw-4770r     total:280  pass:264  dwarn:0   dfail:0   fail:0   skip:16  time:419s
fi-ilk-650       total:280  pass:230  dwarn:0   dfail:0   fail:0   skip:50  time:412s
fi-ivb-3520m     total:280  pass:262  dwarn:0   dfail:0   fail:0   skip:18  time:490s
fi-ivb-3770      total:280  pass:262  dwarn:0   dfail:0   fail:0   skip:18  time:477s
fi-kbl-7500u     total:280  pass:262  dwarn:0   dfail:0   fail:0   skip:18  time:476s
fi-kbl-7560u     total:280  pass:270  dwarn:0   dfail:0   fail:0   skip:10  time:574s
fi-kbl-r         total:280  pass:262  dwarn:0   dfail:0   fail:0   skip:18  time:591s
fi-pnv-d510      total:280  pass:224  dwarn:1   dfail:0   fail:0   skip:55  time:575s
fi-skl-6260u     total:280  pass:270  dwarn:0   dfail:0   fail:0   skip:10  time:457s
fi-skl-6700hq    total:280  pass:224  dwarn:1   dfail:0   fail:30  skip:24  time:321s
fi-skl-6700k     total:280  pass:262  dwarn:0   dfail:0   fail:0   skip:18  time:472s
fi-skl-6770hq    total:280  pass:270  dwarn:0   dfail:0   fail:0   skip:10  time:480s
fi-skl-gvtdvm    total:280  pass:267  dwarn:0   dfail:0   fail:0   skip:13  time:435s
fi-skl-x1585l    total:280  pass:269  dwarn:0   dfail:0   fail:0   skip:11  time:474s
fi-snb-2520m     total:280  pass:252  dwarn:0   dfail:0   fail:0   skip:28  time:548s
fi-snb-2600      total:280  pass:251  dwarn:0   dfail:0   fail:0   skip:29  time:411s

fcb630a80579faf6d12ee62cb49bd7a4acff41e6 drm-tip: 2017y-08m-01d-17h-14m-57s UTC integration manifest
c45ea97dde7f drm/i915: Convert intel_rc6_residency_us to ns

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_5311/

* Re: [RFC 04/10] drm/i915: Expose a PMU interface for perf queries
  2017-08-02 12:32 ` [RFC 04/10] drm/i915: Expose a PMU interface for perf queries Tvrtko Ursulin
@ 2017-08-02 13:00   ` Tvrtko Ursulin
  2017-08-12  2:15     ` Rogozhkin, Dmitry V
  0 siblings, 1 reply; 32+ messages in thread
From: Tvrtko Ursulin @ 2017-08-02 13:00 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx, Peter Zijlstra


Hi Peter,

We are trying to expose some software PMU events from the i915 driver. 
Things like GPU engine busyness, power usage and similar.

There are quite a few opens regarding usage of the perf API and so on. 
Would you be able to snoop around the below patch and help with 
answering some questions?

Current opens would be something like:

Are the metrics we are exposing meaningful/appropriate as software PMU?

Do we need / can we get anything useful by enumerating our events in 
sysfs? For example can we have "perf stat" or similar do something 
useful with them today? Or should we just not expose them in sysfs and 
only use them from tools which will understand them?

Also, once I exported the events for enumeration and tried accessing 
them with perf, I had to create fake pt_regs to pass to 
perf_event_overflow. Is that acceptable?

Can we control sampling frequency as requested by the perf API? We 
mostly do not need rapid polling on these counters.

How is our usage of the perf API in general?

Regards,

Tvrtko

On 02/08/2017 13:32, Tvrtko Ursulin wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> The first goal is to be able to measure GPU (and individual ring) busyness
> without having to poll registers from userspace. (Which not only incurs
> holding the forcewake lock indefinitely, perturbing the system, but also
> runs the risk of hanging the machine.) As an alternative we can use the
> perf event counter interface to sample the ring registers periodically
> and send those results to userspace.
> 
> To be able to do so, we need to export the two symbols from
> kernel/events/core.c to register and unregister a PMU device.
> 
> v2: Use a common timer for the ring sampling.
> 
> v3:
>   * Decouple uAPI from i915 engine ids.
>   * Complete uAPI defines.
>   * Refactor some code to helpers for clarity.
>   * Skip sampling disabled engines.
>   * Expose counters in sysfs.
>   * Pass in fake regs to avoid null ptr deref in perf core.
>   * Convert to class/instance uAPI.
>   * Use shared driver code for rc6 residency, power and frequency.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/Makefile           |   1 +
>   drivers/gpu/drm/i915/i915_drv.c         |   2 +
>   drivers/gpu/drm/i915/i915_drv.h         |  25 ++
>   drivers/gpu/drm/i915/i915_pmu.c         | 629 ++++++++++++++++++++++++++++++++
>   drivers/gpu/drm/i915/i915_reg.h         |   3 +
>   drivers/gpu/drm/i915/intel_engine_cs.c  |  10 +
>   drivers/gpu/drm/i915/intel_ringbuffer.c |  25 ++
>   drivers/gpu/drm/i915/intel_ringbuffer.h |   8 +
>   include/uapi/drm/i915_drm.h             |  55 +++
>   kernel/events/core.c                    |   1 +
>   10 files changed, 759 insertions(+)
>   create mode 100644 drivers/gpu/drm/i915/i915_pmu.c
> 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index f8227318dcaf..1c720013dc42 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -26,6 +26,7 @@ i915-y := i915_drv.o \
>   
>   i915-$(CONFIG_COMPAT)   += i915_ioc32.o
>   i915-$(CONFIG_DEBUG_FS) += i915_debugfs.o intel_pipe_crc.o
> +i915-$(CONFIG_PERF_EVENTS) += i915_pmu.o
>   
>   # GEM code
>   i915-y += i915_cmd_parser.o \
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 214555e813f1..d75c2d790875 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -1194,6 +1194,7 @@ static void i915_driver_register(struct drm_i915_private *dev_priv)
>   	struct drm_device *dev = &dev_priv->drm;
>   
>   	i915_gem_shrinker_init(dev_priv);
> +	i915_pmu_register(dev_priv);
>   
>   	/*
>   	 * Notify a valid surface after modesetting,
> @@ -1248,6 +1249,7 @@ static void i915_driver_unregister(struct drm_i915_private *dev_priv)
>   	intel_opregion_unregister(dev_priv);
>   
>   	i915_perf_unregister(dev_priv);
> +	i915_pmu_unregister(dev_priv);
>   
>   	i915_teardown_sysfs(dev_priv);
>   	i915_guc_log_unregister(dev_priv);
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 0c8bd1cdcbbd..142826742b86 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -40,6 +40,7 @@
>   #include <linux/hash.h>
>   #include <linux/intel-iommu.h>
>   #include <linux/kref.h>
> +#include <linux/perf_event.h>
>   #include <linux/pm_qos.h>
>   #include <linux/reservation.h>
>   #include <linux/shmem_fs.h>
> @@ -2111,6 +2112,12 @@ struct intel_cdclk_state {
>   	unsigned int cdclk, vco, ref;
>   };
>   
> +enum {
> +	__I915_SAMPLE_FREQ_ACT = 0,
> +	__I915_SAMPLE_FREQ_REQ,
> +	__I915_NUM_PMU_SAMPLERS
> +};
> +
>   struct drm_i915_private {
>   	struct drm_device drm;
>   
> @@ -2158,6 +2165,7 @@ struct drm_i915_private {
>   	struct pci_dev *bridge_dev;
>   	struct i915_gem_context *kernel_context;
>   	struct intel_engine_cs *engine[I915_NUM_ENGINES];
> +	struct intel_engine_cs *engine_class[MAX_ENGINE_CLASS + 1][MAX_ENGINE_INSTANCE + 1];
>   	struct i915_vma *semaphore;
>   
>   	struct drm_dma_handle *status_page_dmah;
> @@ -2605,6 +2613,14 @@ struct drm_i915_private {
>   		int	irq;
>   	} lpe_audio;
>   
> +	struct {
> +		struct pmu base;
> +		spinlock_t lock;
> +		struct hrtimer timer;
> +		u64 enable;
> +		u64 sample[__I915_NUM_PMU_SAMPLERS];
> +	} pmu;
> +
>   	/*
>   	 * NOTE: This is the dri1/ums dungeon, don't add stuff here. Your patch
>   	 * will be rejected. Instead look for a better place.
> @@ -3813,6 +3829,15 @@ extern void i915_perf_fini(struct drm_i915_private *dev_priv);
>   extern void i915_perf_register(struct drm_i915_private *dev_priv);
>   extern void i915_perf_unregister(struct drm_i915_private *dev_priv);
>   
> +/* i915_pmu.c */
> +#ifdef CONFIG_PERF_EVENTS
> +extern void i915_pmu_register(struct drm_i915_private *i915);
> +extern void i915_pmu_unregister(struct drm_i915_private *i915);
> +#else
> +static inline void i915_pmu_register(struct drm_i915_private *i915) {}
> +static inline void i915_pmu_unregister(struct drm_i915_private *i915) {}
> +#endif
> +
>   /* i915_suspend.c */
>   extern int i915_save_state(struct drm_i915_private *dev_priv);
>   extern int i915_restore_state(struct drm_i915_private *dev_priv);
> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> new file mode 100644
> index 000000000000..62c527c12641
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> @@ -0,0 +1,629 @@
> +#include <linux/perf_event.h>
> +#include <linux/pm_runtime.h>
> +
> +#include "i915_drv.h"
> +#include "intel_ringbuffer.h"
> +
> +#define FREQUENCY 200
> +#define PERIOD max_t(u64, 10000, NSEC_PER_SEC / FREQUENCY)
> +
> +#define ENGINE_SAMPLE_MASK \
> +	(BIT(I915_SAMPLE_QUEUED) | \
> +	 BIT(I915_SAMPLE_BUSY) | \
> +	 BIT(I915_SAMPLE_WAIT) | \
> +	 BIT(I915_SAMPLE_SEMA))
> +
> +#define ENGINE_SAMPLE_BITS (16)
> +
> +static u8 engine_config_sample(u64 config)
> +{
> +	return config & I915_PMU_SAMPLE_MASK;
> +}
> +
> +static u8 engine_event_sample(struct perf_event *event)
> +{
> +	return engine_config_sample(event->attr.config);
> +}
> +
> +static u8 engine_event_class(struct perf_event *event)
> +{
> +	return (event->attr.config >> I915_PMU_CLASS_SHIFT) & 0xff;
> +}
> +
> +static u8 engine_event_instance(struct perf_event *event)
> +{
> +	return (event->attr.config >> I915_PMU_SAMPLE_BITS) & 0xff;
> +}
> +
> +static bool is_engine_config(u64 config)
> +{
> +	return config < __I915_PMU_OTHER(0);
> +}
> +
> +static u64 config_enabled_mask(u64 config)
> +{
> +	if (is_engine_config(config))
> +		return BIT_ULL(engine_config_sample(config));
> +	else
> +		return BIT_ULL(config - __I915_PMU_OTHER(0)) <<
> +		       ENGINE_SAMPLE_BITS;
> +}
> +
> +static bool is_engine_event(struct perf_event *event)
> +{
> +	return is_engine_config(event->attr.config);
> +}
> +
> +static u64 event_enabled_mask(struct perf_event *event)
> +{
> +	return config_enabled_mask(event->attr.config);
> +}
> +
> +static bool grab_forcewake(struct drm_i915_private *i915, bool fw)
> +{
> +	if (!fw)
> +		intel_uncore_forcewake_get(i915, FORCEWAKE_ALL);
> +
> +	return true;
> +}
> +
> +static void engines_sample(struct drm_i915_private *dev_priv)
> +{
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
> +	bool fw = false;
> +
> +	if ((dev_priv->pmu.enable & ENGINE_SAMPLE_MASK) == 0)
> +		return;
> +
> +	if (!dev_priv->gt.awake)
> +		return;
> +
> +	if (!intel_runtime_pm_get_if_in_use(dev_priv))
> +		return;
> +
> +	for_each_engine(engine, dev_priv, id) {
> +		u32 enable = engine->pmu.enable;
> +
> +		if (i915_seqno_passed(intel_engine_get_seqno(engine),
> +				      intel_engine_last_submit(engine)))
> +			continue;
> +
> +		if (enable & BIT(I915_SAMPLE_QUEUED))
> +			engine->pmu.sample[I915_SAMPLE_QUEUED] += PERIOD;
> +
> +		if (enable & BIT(I915_SAMPLE_BUSY)) {
> +			u32 val;
> +
> +			fw = grab_forcewake(dev_priv, fw);
> +			val = I915_READ_FW(RING_MI_MODE(engine->mmio_base));
> +			if (!(val & MODE_IDLE))
> +				engine->pmu.sample[I915_SAMPLE_BUSY] += PERIOD;
> +		}
> +
> +		if (enable & (BIT(I915_SAMPLE_WAIT) | BIT(I915_SAMPLE_SEMA))) {
> +			u32 val;
> +
> +			fw = grab_forcewake(dev_priv, fw);
> +			val = I915_READ_FW(RING_CTL(engine->mmio_base));
> +			if ((enable & BIT(I915_SAMPLE_WAIT)) &&
> +			    (val & RING_WAIT))
> +				engine->pmu.sample[I915_SAMPLE_WAIT] += PERIOD;
> +			if ((enable & BIT(I915_SAMPLE_SEMA)) &&
> +			    (val & RING_WAIT_SEMAPHORE))
> +				engine->pmu.sample[I915_SAMPLE_SEMA] += PERIOD;
> +		}
> +	}
> +
> +	if (fw)
> +		intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
> +	intel_runtime_pm_put(dev_priv);
> +}
> +
> +static void frequency_sample(struct drm_i915_private *dev_priv)
> +{
> +	if (dev_priv->pmu.enable &
> +	    config_enabled_mask(I915_PMU_ACTUAL_FREQUENCY)) {
> +		u64 val;
> +
> +		val = dev_priv->rps.cur_freq;
> +		if (dev_priv->gt.awake &&
> +		    intel_runtime_pm_get_if_in_use(dev_priv)) {
> +			val = intel_get_cagf(dev_priv,
> +					     I915_READ_NOTRACE(GEN6_RPSTAT1));
> +			intel_runtime_pm_put(dev_priv);
> +		}
> +		val = intel_gpu_freq(dev_priv, val);
> +		dev_priv->pmu.sample[__I915_SAMPLE_FREQ_ACT] += val * PERIOD;
> +	}
> +
> +	if (dev_priv->pmu.enable &
> +	    config_enabled_mask(I915_PMU_REQUESTED_FREQUENCY)) {
> +		u64 val = intel_gpu_freq(dev_priv, dev_priv->rps.cur_freq);
> +		dev_priv->pmu.sample[__I915_SAMPLE_FREQ_REQ] += val * PERIOD;
> +	}
> +}
> +
> +static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
> +{
> +	struct drm_i915_private *i915 =
> +		container_of(hrtimer, struct drm_i915_private, pmu.timer);
> +
> +	if (i915->pmu.enable == 0)
> +		return HRTIMER_NORESTART;
> +
> +	engines_sample(i915);
> +	frequency_sample(i915);
> +
> +	hrtimer_forward_now(hrtimer, ns_to_ktime(PERIOD));
> +	return HRTIMER_RESTART;
> +}
> +
> +static void i915_pmu_event_destroy(struct perf_event *event)
> +{
> +	WARN_ON(event->parent);
> +}
> +
> +static int engine_event_init(struct perf_event *event)
> +{
> +	struct drm_i915_private *i915 =
> +		container_of(event->pmu, typeof(*i915), pmu.base);
> +
> +	if (!intel_engine_lookup_user(i915, engine_event_class(event),
> +				      engine_event_instance(event)))
> +		return -ENODEV;
> +
> +	switch (engine_event_sample(event)) {
> +	case I915_SAMPLE_QUEUED:
> +	case I915_SAMPLE_BUSY:
> +	case I915_SAMPLE_WAIT:
> +		break;
> +	case I915_SAMPLE_SEMA:
> +		if (INTEL_GEN(i915) < 6)
> +			return -ENODEV;
> +		break;
> +	default:
> +		return -ENOENT;
> +	}
> +
> +	return 0;
> +}
> +
> +static DEFINE_PER_CPU(struct pt_regs, i915_pmu_pt_regs);
> +
> +static enum hrtimer_restart hrtimer_sample(struct hrtimer *hrtimer)
> +{
> +	struct pt_regs *regs = this_cpu_ptr(&i915_pmu_pt_regs);
> +	struct perf_sample_data data;
> +	struct perf_event *event;
> +	u64 period;
> +
> +	event = container_of(hrtimer, struct perf_event, hw.hrtimer);
> +	if (event->state != PERF_EVENT_STATE_ACTIVE)
> +		return HRTIMER_NORESTART;
> +
> +	event->pmu->read(event);
> +
> +	perf_sample_data_init(&data, 0, event->hw.last_period);
> +	perf_event_overflow(event, &data, regs);
> +
> +	period = max_t(u64, 10000, event->hw.sample_period);
> +	hrtimer_forward_now(hrtimer, ns_to_ktime(period));
> +	return HRTIMER_RESTART;
> +}
> +
> +static void init_hrtimer(struct perf_event *event)
> +{
> +	struct hw_perf_event *hwc = &event->hw;
> +
> +	if (!is_sampling_event(event))
> +		return;
> +
> +	hrtimer_init(&hwc->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> +	hwc->hrtimer.function = hrtimer_sample;
> +
> +	if (event->attr.freq) {
> +		long freq = event->attr.sample_freq;
> +
> +		event->attr.sample_period = NSEC_PER_SEC / freq;
> +		hwc->sample_period = event->attr.sample_period;
> +		local64_set(&hwc->period_left, hwc->sample_period);
> +		hwc->last_period = hwc->sample_period;
> +		event->attr.freq = 0;
> +	}
> +}
> +
> +static int i915_pmu_event_init(struct perf_event *event)
> +{
> +	struct drm_i915_private *i915 =
> +		container_of(event->pmu, typeof(*i915), pmu.base);
> +	int ret;
> +
> +	/* XXX ideally only want pid == -1 && cpu == -1 */
> +
> +	if (event->attr.type != event->pmu->type)
> +		return -ENOENT;
> +
> +	if (has_branch_stack(event))
> +		return -EOPNOTSUPP;
> +
> +	ret = 0;
> +	if (is_engine_event(event)) {
> +		ret = engine_event_init(event);
> +	} else switch (event->attr.config) {
> +	case I915_PMU_ACTUAL_FREQUENCY:
> +		if (IS_VALLEYVIEW(i915) || IS_CHERRYVIEW(i915))
> +			ret = -ENODEV; /* requires a mutex for sampling! */
> +	case I915_PMU_REQUESTED_FREQUENCY:
> +	case I915_PMU_ENERGY:
> +	case I915_PMU_RC6_RESIDENCY:
> +	case I915_PMU_RC6p_RESIDENCY:
> +	case I915_PMU_RC6pp_RESIDENCY:
> +		if (INTEL_GEN(i915) < 6)
> +			ret = -ENODEV;
> +		break;
> +	}
> +	if (ret)
> +		return ret;
> +
> +	if (!event->parent)
> +		event->destroy = i915_pmu_event_destroy;
> +
> +	init_hrtimer(event);
> +
> +	return 0;
> +}
> +
> +static void i915_pmu_timer_start(struct perf_event *event)
> +{
> +	struct hw_perf_event *hwc = &event->hw;
> +	s64 period;
> +
> +	if (!is_sampling_event(event))
> +		return;
> +
> +	period = local64_read(&hwc->period_left);
> +	if (period) {
> +		if (period < 0)
> +			period = 10000;
> +
> +		local64_set(&hwc->period_left, 0);
> +	} else {
> +		period = max_t(u64, 10000, hwc->sample_period);
> +	}
> +
> +	hrtimer_start_range_ns(&hwc->hrtimer,
> +			       ns_to_ktime(period), 0,
> +			       HRTIMER_MODE_REL_PINNED);
> +}
> +
> +static void i915_pmu_timer_cancel(struct perf_event *event)
> +{
> +	struct hw_perf_event *hwc = &event->hw;
> +
> +	if (!is_sampling_event(event))
> +		return;
> +
> +	local64_set(&hwc->period_left,
> +		    ktime_to_ns(hrtimer_get_remaining(&hwc->hrtimer)));
> +	hrtimer_cancel(&hwc->hrtimer);
> +}
> +
> +static void i915_pmu_enable(struct perf_event *event)
> +{
> +	struct drm_i915_private *i915 =
> +		container_of(event->pmu, typeof(*i915), pmu.base);
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&i915->pmu.lock, flags);
> +
> +	if (i915->pmu.enable == 0)
> +		hrtimer_start_range_ns(&i915->pmu.timer,
> +				       ns_to_ktime(PERIOD), 0,
> +				       HRTIMER_MODE_REL_PINNED);
> +
> +	i915->pmu.enable |= event_enabled_mask(event);
> +
> +	if (is_engine_event(event)) {
> +		struct intel_engine_cs *engine;
> +
> +		engine = intel_engine_lookup_user(i915,
> +						  engine_event_class(event),
> +						  engine_event_instance(event));
> +		GEM_BUG_ON(!engine);
> +		engine->pmu.enable |= BIT(engine_event_sample(event));
> +	}
> +
> +	spin_unlock_irqrestore(&i915->pmu.lock, flags);
> +
> +	i915_pmu_timer_start(event);
> +}
> +
> +static void i915_pmu_disable(struct perf_event *event)
> +{
> +	struct drm_i915_private *i915 =
> +		container_of(event->pmu, typeof(*i915), pmu.base);
> +	unsigned long flags;
> +	u64 mask;
> +
> +	spin_lock_irqsave(&i915->pmu.lock, flags);
> +
> +	if (is_engine_event(event)) {
> +		struct intel_engine_cs *engine;
> +		enum intel_engine_id id;
> +
> +		engine = intel_engine_lookup_user(i915,
> +						  engine_event_class(event),
> +						  engine_event_instance(event));
> +		GEM_BUG_ON(!engine);
> +		engine->pmu.enable &= ~BIT(engine_event_sample(event));
> +		mask = 0;
> +		for_each_engine(engine, i915, id)
> +			mask |= engine->pmu.enable;
> +		mask = ~mask;
> +	} else {
> +		mask = event_enabled_mask(event);
> +	}
> +
> +	i915->pmu.enable &= ~mask;
> +
> +	spin_unlock_irqrestore(&i915->pmu.lock, flags);
> +
> +	i915_pmu_timer_cancel(event);
> +}
> +
> +static int i915_pmu_event_add(struct perf_event *event, int flags)
> +{
> +	struct hw_perf_event *hwc = &event->hw;
> +
> +	if (flags & PERF_EF_START)
> +		i915_pmu_enable(event);
> +
> +	hwc->state = !(flags & PERF_EF_START);
> +
> +	return 0;
> +}
> +
> +static void i915_pmu_event_del(struct perf_event *event, int flags)
> +{
> +	i915_pmu_disable(event);
> +}
> +
> +static void i915_pmu_event_start(struct perf_event *event, int flags)
> +{
> +	i915_pmu_enable(event);
> +}
> +
> +static void i915_pmu_event_stop(struct perf_event *event, int flags)
> +{
> +	i915_pmu_disable(event);
> +}
> +
> +static u64 count_interrupts(struct drm_i915_private *i915)
> +{
> +	/* open-coded kstat_irqs() */
> +	struct irq_desc *desc = irq_to_desc(i915->drm.pdev->irq);
> +	u64 sum = 0;
> +	int cpu;
> +
> +	if (!desc || !desc->kstat_irqs)
> +		return 0;
> +
> +	for_each_possible_cpu(cpu)
> +		sum += *per_cpu_ptr(desc->kstat_irqs, cpu);
> +
> +	return sum;
> +}
> +
> +static void i915_pmu_event_read(struct perf_event *event)
> +{
> +	struct drm_i915_private *i915 =
> +		container_of(event->pmu, typeof(*i915), pmu.base);
> +	u64 val = 0;
> +
> +	if (is_engine_event(event)) {
> +		u8 sample = engine_event_sample(event);
> +		struct intel_engine_cs *engine;
> +
> +		engine = intel_engine_lookup_user(i915,
> +						  engine_event_class(event),
> +						  engine_event_instance(event));
> +
> +		if (WARN_ON_ONCE(!engine)) {
> +			/* Do nothing */
> +		} else {
> +			val = engine->pmu.sample[sample];
> +		}
> +	} else switch (event->attr.config) {
> +	case I915_PMU_ACTUAL_FREQUENCY:
> +		val = i915->pmu.sample[__I915_SAMPLE_FREQ_ACT];
> +		break;
> +	case I915_PMU_REQUESTED_FREQUENCY:
> +		val = i915->pmu.sample[__I915_SAMPLE_FREQ_REQ];
> +		break;
> +	case I915_PMU_ENERGY:
> +		val = intel_energy_uJ(i915);
> +		break;
> +	case I915_PMU_INTERRUPTS:
> +		val = count_interrupts(i915);
> +		break;
> +
> +	case I915_PMU_RC6_RESIDENCY:
> +		if (!i915->gt.awake)
> +			return;
> +
> +		val = intel_rc6_residency_ns(i915,
> +					     IS_VALLEYVIEW(i915) ?
> +					     VLV_GT_RENDER_RC6 :
> +					     GEN6_GT_GFX_RC6);
> +		break;
> +
> +	case I915_PMU_RC6p_RESIDENCY:
> +		if (!i915->gt.awake)
> +			return;
> +
> +		if (!IS_VALLEYVIEW(i915))
> +			val = intel_rc6_residency_ns(i915, GEN6_GT_GFX_RC6p);
> +		break;
> +
> +	case I915_PMU_RC6pp_RESIDENCY:
> +		if (!i915->gt.awake)
> +			return;
> +
> +		if (!IS_VALLEYVIEW(i915))
> +			val = intel_rc6_residency_ns(i915, GEN6_GT_GFX_RC6pp);
> +		break;
> +	}
> +
> +	local64_set(&event->count, val);
> +}
> +
> +static int i915_pmu_event_event_idx(struct perf_event *event)
> +{
> +	return 0;
> +}
> +
> +static ssize_t i915_pmu_format_show(struct device *dev,
> +				    struct device_attribute *attr, char *buf)
> +{
> +        struct dev_ext_attribute *eattr;
> +
> +        eattr = container_of(attr, struct dev_ext_attribute, attr);
> +        return sprintf(buf, "%s\n", (char *) eattr->var);
> +}
> +
> +#define I915_PMU_FORMAT_ATTR(_name, _config)           \
> +        (&((struct dev_ext_attribute[]) {               \
> +                { .attr = __ATTR(_name, S_IRUGO, i915_pmu_format_show, NULL), \
> +                  .var = (void *) _config, }            \
> +        })[0].attr.attr)
> +
> +static struct attribute *i915_pmu_format_attrs[] = {
> +        I915_PMU_FORMAT_ATTR(i915_eventid, "config:0-42"),
> +        NULL,
> +};
> +
> +static const struct attribute_group i915_pmu_format_attr_group = {
> +        .name = "format",
> +        .attrs = i915_pmu_format_attrs,
> +};
> +
> +static ssize_t i915_pmu_event_show(struct device *dev,
> +				   struct device_attribute *attr, char *buf)
> +{
> +        struct dev_ext_attribute *eattr;
> +
> +        eattr = container_of(attr, struct dev_ext_attribute, attr);
> +        return sprintf(buf, "config=0x%lx\n", (unsigned long) eattr->var);
> +}
> +
> +#define I915_PMU_EVENT_ATTR(_name, _config)            \
> +        (&((struct dev_ext_attribute[]) {               \
> +                { .attr = __ATTR(_name, S_IRUGO, i915_pmu_event_show, NULL), \
> +                  .var = (void *) _config, }            \
> +         })[0].attr.attr)
> +
> +static struct attribute *i915_pmu_events_attrs[] = {
> +	I915_PMU_EVENT_ATTR(rcs0-queued,
> +			    I915_PMU_ENGINE_QUEUED(I915_ENGINE_CLASS_RENDER, 0)),
> +	I915_PMU_EVENT_ATTR(rcs0-busy,
> +			    I915_PMU_ENGINE_BUSY(I915_ENGINE_CLASS_RENDER, 0)),
> +	I915_PMU_EVENT_ATTR(rcs0-wait,
> +			    I915_PMU_ENGINE_WAIT(I915_ENGINE_CLASS_RENDER, 0)),
> +	I915_PMU_EVENT_ATTR(rcs0-sema,
> +			    I915_PMU_ENGINE_SEMA(I915_ENGINE_CLASS_RENDER, 0)),
> +
> +	I915_PMU_EVENT_ATTR(bcs0-queued,
> +			    I915_PMU_ENGINE_QUEUED(I915_ENGINE_CLASS_COPY, 0)),
> +	I915_PMU_EVENT_ATTR(bcs0-busy,
> +			    I915_PMU_ENGINE_BUSY(I915_ENGINE_CLASS_COPY, 0)),
> +	I915_PMU_EVENT_ATTR(bcs0-wait,
> +			    I915_PMU_ENGINE_WAIT(I915_ENGINE_CLASS_COPY, 0)),
> +	I915_PMU_EVENT_ATTR(bcs0-sema,
> +			    I915_PMU_ENGINE_SEMA(I915_ENGINE_CLASS_COPY, 0)),
> +
> +	I915_PMU_EVENT_ATTR(vcs0-queued,
> +			    I915_PMU_ENGINE_QUEUED(I915_ENGINE_CLASS_VIDEO, 0)),
> +	I915_PMU_EVENT_ATTR(vcs0-busy,
> +			    I915_PMU_ENGINE_BUSY(I915_ENGINE_CLASS_VIDEO, 0)),
> +	I915_PMU_EVENT_ATTR(vcs0-wait,
> +			    I915_PMU_ENGINE_WAIT(I915_ENGINE_CLASS_VIDEO, 0)),
> +	I915_PMU_EVENT_ATTR(vcs0-sema,
> +			    I915_PMU_ENGINE_SEMA(I915_ENGINE_CLASS_VIDEO, 0)),
> +
> +	I915_PMU_EVENT_ATTR(vcs1-queued,
> +			    I915_PMU_ENGINE_QUEUED(I915_ENGINE_CLASS_VIDEO, 1)),
> +	I915_PMU_EVENT_ATTR(vcs1-busy,
> +			    I915_PMU_ENGINE_BUSY(I915_ENGINE_CLASS_VIDEO, 1)),
> +	I915_PMU_EVENT_ATTR(vcs1-wait,
> +			    I915_PMU_ENGINE_WAIT(I915_ENGINE_CLASS_VIDEO, 1)),
> +	I915_PMU_EVENT_ATTR(vcs1-sema,
> +			    I915_PMU_ENGINE_SEMA(I915_ENGINE_CLASS_VIDEO, 1)),
> +
> +	I915_PMU_EVENT_ATTR(vecs0-queued,
> +			    I915_PMU_ENGINE_QUEUED(I915_ENGINE_CLASS_VIDEO_ENHANCE, 0)),
> +	I915_PMU_EVENT_ATTR(vecs0-busy,
> +			    I915_PMU_ENGINE_BUSY(I915_ENGINE_CLASS_VIDEO_ENHANCE, 0)),
> +	I915_PMU_EVENT_ATTR(vecs0-wait,
> +			    I915_PMU_ENGINE_WAIT(I915_ENGINE_CLASS_VIDEO_ENHANCE, 0)),
> +	I915_PMU_EVENT_ATTR(vecs0-sema,
> +			    I915_PMU_ENGINE_SEMA(I915_ENGINE_CLASS_VIDEO_ENHANCE, 0)),
> +
> +        I915_PMU_EVENT_ATTR(actual-frequency,	 I915_PMU_ACTUAL_FREQUENCY),
> +        I915_PMU_EVENT_ATTR(requested-frequency, I915_PMU_REQUESTED_FREQUENCY),
> +        I915_PMU_EVENT_ATTR(energy,		 I915_PMU_ENERGY),
> +        I915_PMU_EVENT_ATTR(interrupts,		 I915_PMU_INTERRUPTS),
> +        I915_PMU_EVENT_ATTR(rc6-residency,	 I915_PMU_RC6_RESIDENCY),
> +        I915_PMU_EVENT_ATTR(rc6p-residency,	 I915_PMU_RC6p_RESIDENCY),
> +        I915_PMU_EVENT_ATTR(rc6pp-residency,	 I915_PMU_RC6pp_RESIDENCY),
> +
> +        NULL,
> +};
> +
> +static const struct attribute_group i915_pmu_events_attr_group = {
> +        .name = "events",
> +        .attrs = i915_pmu_events_attrs,
> +};
> +
> +static const struct attribute_group *i915_pmu_attr_groups[] = {
> +        &i915_pmu_format_attr_group,
> +        &i915_pmu_events_attr_group,
> +        NULL
> +};
> +
> +void i915_pmu_register(struct drm_i915_private *i915)
> +{
> +	if (INTEL_GEN(i915) <= 2)
> +		return;
> +
> +	i915->pmu.base.attr_groups	= i915_pmu_attr_groups;
> +	i915->pmu.base.task_ctx_nr	= perf_sw_context;
> +	i915->pmu.base.event_init	= i915_pmu_event_init;
> +	i915->pmu.base.add		= i915_pmu_event_add;
> +	i915->pmu.base.del		= i915_pmu_event_del;
> +	i915->pmu.base.start		= i915_pmu_event_start;
> +	i915->pmu.base.stop		= i915_pmu_event_stop;
> +	i915->pmu.base.read		= i915_pmu_event_read;
> +	i915->pmu.base.event_idx	= i915_pmu_event_event_idx;
> +
> +	spin_lock_init(&i915->pmu.lock);
> +	hrtimer_init(&i915->pmu.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> +	i915->pmu.timer.function = i915_sample;
> +	i915->pmu.enable = 0;
> +
> +	if (perf_pmu_register(&i915->pmu.base, "i915", -1))
> +		i915->pmu.base.event_init = NULL;
> +}
> +
> +void i915_pmu_unregister(struct drm_i915_private *i915)
> +{
> +	if (!i915->pmu.base.event_init)
> +		return;
> +
> +	i915->pmu.enable = 0;
> +
> +	perf_pmu_unregister(&i915->pmu.base);
> +	i915->pmu.base.event_init = NULL;
> +
> +	hrtimer_cancel(&i915->pmu.timer);
> +}
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 1dc7e7a2a23b..26bce766ab51 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -95,6 +95,9 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
>   #define VIDEO_ENHANCEMENT_CLASS	2
>   #define COPY_ENGINE_CLASS	3
>   #define OTHER_CLASS		4
> +#define MAX_ENGINE_CLASS	4
> +
> +#define MAX_ENGINE_INSTANCE    1
>   
>   /* PCI config space */
>   
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 9ab596941372..14630612325b 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -198,6 +198,15 @@ intel_engine_setup(struct drm_i915_private *dev_priv,
>   	GEM_BUG_ON(info->class >= ARRAY_SIZE(intel_engine_classes));
>   	class_info = &intel_engine_classes[info->class];
>   
> +	if (GEM_WARN_ON(info->class > MAX_ENGINE_CLASS))
> +		return -EINVAL;
> +
> +	if (GEM_WARN_ON(info->instance > MAX_ENGINE_INSTANCE))
> +		return -EINVAL;
> +
> +	if (GEM_WARN_ON(dev_priv->engine_class[info->class][info->instance]))
> +		return -EINVAL;
> +
>   	GEM_BUG_ON(dev_priv->engine[id]);
>   	engine = kzalloc(sizeof(*engine), GFP_KERNEL);
>   	if (!engine)
> @@ -225,6 +234,7 @@ intel_engine_setup(struct drm_i915_private *dev_priv,
>   
>   	ATOMIC_INIT_NOTIFIER_HEAD(&engine->context_status_notifier);
>   
> +	dev_priv->engine_class[info->class][info->instance] = engine;
>   	dev_priv->engine[id] = engine;
>   	return 0;
>   }
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index cdf084ef5aae..af8d85794c44 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -2282,3 +2282,28 @@ int intel_init_vebox_ring_buffer(struct intel_engine_cs *engine)
>   
>   	return intel_init_ring_buffer(engine);
>   }
> +
> +static u8 user_class_map[I915_ENGINE_CLASS_MAX] = {
> +	[I915_ENGINE_CLASS_OTHER] = OTHER_CLASS,
> +	[I915_ENGINE_CLASS_RENDER] = RENDER_CLASS,
> +	[I915_ENGINE_CLASS_COPY] = COPY_ENGINE_CLASS,
> +	[I915_ENGINE_CLASS_VIDEO] = VIDEO_DECODE_CLASS,
> +	[I915_ENGINE_CLASS_VIDEO_ENHANCE] = VIDEO_ENHANCEMENT_CLASS,
> +};
> +
> +struct intel_engine_cs *
> +intel_engine_lookup_user(struct drm_i915_private *i915, u8 class, u8 instance)
> +{
> +	if (class >= ARRAY_SIZE(user_class_map))
> +		return NULL;
> +
> +	class = user_class_map[class];
> +
> +	if (WARN_ON_ONCE(class > MAX_ENGINE_CLASS))
> +		return NULL;
> +
> +	if (instance > MAX_ENGINE_INSTANCE)
> +		return NULL;
> +
> +	return i915->engine_class[class][instance];
> +}
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index d33c93444c0d..9fdf0cdf6220 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -245,6 +245,11 @@ struct intel_engine_cs {
>   		I915_SELFTEST_DECLARE(bool mock : 1);
>   	} breadcrumbs;
>   
> +	struct {
> +		u32 enable;
> +		u64 sample[4];
> +	} pmu;
> +
>   	/*
>   	 * A pool of objects to use as shadow copies of client batch buffers
>   	 * when the command parser is enabled. Prevents the client from
> @@ -735,4 +740,7 @@ bool intel_engines_are_idle(struct drm_i915_private *dev_priv);
>   void intel_engines_mark_idle(struct drm_i915_private *i915);
>   void intel_engines_reset_default_submission(struct drm_i915_private *i915);
>   
> +struct intel_engine_cs *
> +intel_engine_lookup_user(struct drm_i915_private *i915, u8 class, u8 instance);
> +
>   #endif /* _INTEL_RINGBUFFER_H_ */
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 7ccbd6a2bbe0..103874476a6d 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -86,6 +86,61 @@ enum i915_mocs_table_index {
>   	I915_MOCS_CACHED,
>   };
>   
> +enum drm_i915_gem_engine_class {
> +	I915_ENGINE_CLASS_OTHER = 0,
> +	I915_ENGINE_CLASS_RENDER = 1,
> +	I915_ENGINE_CLASS_COPY = 2,
> +	I915_ENGINE_CLASS_VIDEO = 3,
> +	I915_ENGINE_CLASS_VIDEO_ENHANCE = 4,
> +	I915_ENGINE_CLASS_MAX /* non-ABI */
> +};
> +
> +/**
> + * DOC: perf_events exposed by i915 through /sys/bus/event_sources/drivers/i915
> + *
> + */
> +
> +enum drm_i915_pmu_engine_sample {
> +	I915_SAMPLE_QUEUED = 0,
> +	I915_SAMPLE_BUSY = 1,
> +	I915_SAMPLE_WAIT = 2,
> +	I915_SAMPLE_SEMA = 3
> +};
> +
> +#define I915_PMU_SAMPLE_BITS (4)
> +#define I915_PMU_SAMPLE_MASK (0xf)
> +#define I915_PMU_SAMPLE_INSTANCE_BITS (8)
> +#define I915_PMU_CLASS_SHIFT \
> +	(I915_PMU_SAMPLE_BITS + I915_PMU_SAMPLE_INSTANCE_BITS)
> +
> +#define __I915_PMU_ENGINE(class, instance, sample) \
> +	((class) << I915_PMU_CLASS_SHIFT | \
> +	(instance) << I915_PMU_SAMPLE_BITS | \
> +	(sample))
> +
> +#define I915_PMU_ENGINE_QUEUED(class, instance) \
> +	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_QUEUED)
> +
> +#define I915_PMU_ENGINE_BUSY(class, instance) \
> +	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_BUSY)
> +
> +#define I915_PMU_ENGINE_WAIT(class, instance) \
> +	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_WAIT)
> +
> +#define I915_PMU_ENGINE_SEMA(class, instance) \
> +	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_SEMA)
> +
> +#define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
> +
> +#define I915_PMU_ACTUAL_FREQUENCY 	__I915_PMU_OTHER(0)
> +#define I915_PMU_REQUESTED_FREQUENCY	__I915_PMU_OTHER(1)
> +#define I915_PMU_ENERGY			__I915_PMU_OTHER(2)
> +#define I915_PMU_INTERRUPTS		__I915_PMU_OTHER(3)
> +
> +#define I915_PMU_RC6_RESIDENCY		__I915_PMU_OTHER(4)
> +#define I915_PMU_RC6p_RESIDENCY		__I915_PMU_OTHER(5)
> +#define I915_PMU_RC6pp_RESIDENCY	__I915_PMU_OTHER(6)
> +
>   /* Each region is a minimum of 16k, and there are at most 255 of them.
>    */
>   #define I915_NR_TEX_REGIONS 255	/* table size 2k - maximum due to use
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 7d34bc16ca1c..3b6eb0131204 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -7382,6 +7382,7 @@ int perf_event_overflow(struct perf_event *event,
>   {
>   	return __perf_event_overflow(event, 1, data, regs);
>   }
> +EXPORT_SYMBOL_GPL(perf_event_overflow);
>   
>   /*
>    * Generic software event infrastructure
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [RFC 04/10] drm/i915: Expose a PMU interface for perf queries
  2017-08-02 13:00   ` Tvrtko Ursulin
@ 2017-08-12  2:15     ` Rogozhkin, Dmitry V
  2017-08-22 18:17       ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Rogozhkin, Dmitry V @ 2017-08-12  2:15 UTC (permalink / raw)
  To: peterz, tvrtko.ursulin; +Cc: Intel-gfx

Hi,

Peter, I would like to take this chance to remind you that we need your
help clarifying whether we use the perf API correctly.

I am trying the i915 PMU as implemented in these RFC patches and would
like to share my observations. I have not tried it in all possible ways
yet, but I still think the following is worth mentioning.

'perf' sees the events exported via sysfs. For example,
$ perf list
  i915/actual-frequency/                             [Kernel PMU event]
  i915/bcs0-busy/                                    [Kernel PMU event]
  i915/bcs0-queued/                                  [Kernel PMU event]
  i915/bcs0-sema/                                    [Kernel PMU event]
  i915/bcs0-wait/                                    [Kernel PMU event]
  i915/energy/                                       [Kernel PMU event]
  i915/interrupts/                                   [Kernel PMU event]
  i915/rc6-residency/                                [Kernel PMU event]
  i915/rc6p-residency/                               [Kernel PMU event]
  i915/rc6pp-residency/                              [Kernel PMU event]
  i915/rcs0-busy/                                    [Kernel PMU event]
  i915/rcs0-queued/                                  [Kernel PMU event]
  i915/rcs0-sema/                                    [Kernel PMU event]
  i915/rcs0-wait/                                    [Kernel PMU event]
  i915/requested-frequency/                          [Kernel PMU event]
  i915/vcs0-busy/                                    [Kernel PMU event]
  i915/vcs0-queued/                                  [Kernel PMU event]
  i915/vcs0-sema/                                    [Kernel PMU event]
  i915/vcs0-wait/                                    [Kernel PMU event]
  i915/vcs1-busy/                                    [Kernel PMU event]
  i915/vcs1-queued/                                  [Kernel PMU event]
  i915/vcs1-sema/                                    [Kernel PMU event]
  i915/vcs1-wait/                                    [Kernel PMU event]
  i915/vecs0-busy/                                   [Kernel PMU event]
  i915/vecs0-queued/                                 [Kernel PMU event]
  i915/vecs0-sema/                                   [Kernel PMU event]
  i915/vecs0-wait/                                   [Kernel PMU event]

I have tried to use some of the events with 'perf stat' and here I
failed. The result was something like:

$ perf stat -e instructions,i915/rcs0-busy/ workload.sh
<... workload.sh output ...>

Performance counter stats for 'workload.sh':
     1,204,616,268      instructions
                 0      i915/rcs0-busy/

       1.869169153 seconds time elapsed

As you can see, the instructions event works pretty well, while
i915/rcs0-busy/ doesn't.

I am afraid that our current understanding of how a PMU should work is
not fully correct. I think so because the way the PMU entry points
init(), add(), del(), start(), stop() and read() are implemented does
not correlate with how many times they are called. I counted the calls
and here is the result:
init()=19, add()=44310, del()=43900, start()=44534, stop()=0, read()=0

This means that we regularly attempt to start/stop the timer and/or the
busy stats calculations. Another thing that draws attention is that
read() was not called at all. How is perf supposed to get the counter
value? Yet another question: where are we supposed to initialize our
internal state? The numbers above are from a single run, and even
init() is called multiple times. And where are we supposed to
de-initialize it: each time in del()? That hardly makes sense.

I should note that if perf is invoked with the -I 10 option, then
read() is called: init_c()=265, add_c()=132726, del_c()=131482,
start_c()=133412, stop()=0, read()=71. However, the i915 counter is
still 0. I tried printing counter values from within read() and they
are non-zero. Actually, read() returns a sequence of <non_zero>, 0, 0,
0, ..., <non_zero> because with our add()/del() code we regularly
start/stop our counter, and execution in read() follows different
branches.

Thus, I think that right now we do not implement the PMU correctly and
do not meet perf's expectations of a PMU. Unfortunately, right now I
have no idea what those expectations are.

Regards,
Dmitry.


On Wed, 2017-08-02 at 14:00 +0100, Tvrtko Ursulin wrote:
> Hi Peter,
> 
> We are trying to expose some software PMU events from the i915 driver. 
> Things like GPU engine busyness, power usage and similar.
> 
> There are quite a few opens regarding usage of the perf API and so on. 
> Would you be able to snoop around the below patch and help with 
> answering some questions?
> 
> Current opens would be something like:
> 
> Are the metrics we are exposing meaningful/appropriate as software PMU?
> 
> Do we need / can we get anything useful by enumerating our events in 
> sysfs? For example can we have "perf stat" or similar do something 
> useful with them today? Or should we just not expose them in sysfs
> and only use them from tools which understand them?
> 
> Also, once I exported the events for enumeration and tried accessing 
> them with perf, I had to create fake pt_regs to pass to 
> perf_event_overflow.
> 
> Can we control sampling frequency as requested by the perf API? We 
> mostly do not need rapid polling on these counters.
> 
> How is our usage of the perf API in general?
> 
> Regards,
> 
> Tvrtko
> 
> On 02/08/2017 13:32, Tvrtko Ursulin wrote:
> > From: Chris Wilson <chris@chris-wilson.co.uk>
> > 
> > The first goal is to be able to measure GPU (and individual ring) busyness
> > without having to poll registers from userspace. (Which not only incurs
> > holding the forcewake lock indefinitely, perturbing the system, but also
> > runs the risk of hanging the machine.) As an alternative we can use the
> > perf event counter interface to sample the ring registers periodically
> > and send those results to userspace.
> > 
> > To be able to do so, we need to export the two symbols from
> > kernel/events/core.c to register and unregister a PMU device.
> > 
> > v2: Use a common timer for the ring sampling.
> > 
> > v3:
> >   * Decouple uAPI from i915 engine ids.
> >   * Complete uAPI defines.
> >   * Refactor some code to helpers for clarity.
> >   * Skip sampling disabled engines.
> >   * Expose counters in sysfs.
> >   * Pass in fake regs to avoid null ptr deref in perf core.
> >   * Convert to class/instance uAPI.
> >   * Use shared driver code for rc6 residency, power and frequency.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >   drivers/gpu/drm/i915/Makefile           |   1 +
> >   drivers/gpu/drm/i915/i915_drv.c         |   2 +
> >   drivers/gpu/drm/i915/i915_drv.h         |  25 ++
> >   drivers/gpu/drm/i915/i915_pmu.c         | 629 ++++++++++++++++++++++++++++++++
> >   drivers/gpu/drm/i915/i915_reg.h         |   3 +
> >   drivers/gpu/drm/i915/intel_engine_cs.c  |  10 +
> >   drivers/gpu/drm/i915/intel_ringbuffer.c |  25 ++
> >   drivers/gpu/drm/i915/intel_ringbuffer.h |   8 +
> >   include/uapi/drm/i915_drm.h             |  55 +++
> >   kernel/events/core.c                    |   1 +
> >   10 files changed, 759 insertions(+)
> >   create mode 100644 drivers/gpu/drm/i915/i915_pmu.c
> > 
> > diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> > index f8227318dcaf..1c720013dc42 100644
> > --- a/drivers/gpu/drm/i915/Makefile
> > +++ b/drivers/gpu/drm/i915/Makefile
> > @@ -26,6 +26,7 @@ i915-y := i915_drv.o \
> >   
> >   i915-$(CONFIG_COMPAT)   += i915_ioc32.o
> >   i915-$(CONFIG_DEBUG_FS) += i915_debugfs.o intel_pipe_crc.o
> > +i915-$(CONFIG_PERF_EVENTS) += i915_pmu.o
> >   
> >   # GEM code
> >   i915-y += i915_cmd_parser.o \
> > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> > index 214555e813f1..d75c2d790875 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.c
> > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > @@ -1194,6 +1194,7 @@ static void i915_driver_register(struct drm_i915_private *dev_priv)
> >   	struct drm_device *dev = &dev_priv->drm;
> >   
> >   	i915_gem_shrinker_init(dev_priv);
> > +	i915_pmu_register(dev_priv);
> >   
> >   	/*
> >   	 * Notify a valid surface after modesetting,
> > @@ -1248,6 +1249,7 @@ static void i915_driver_unregister(struct drm_i915_private *dev_priv)
> >   	intel_opregion_unregister(dev_priv);
> >   
> >   	i915_perf_unregister(dev_priv);
> > +	i915_pmu_unregister(dev_priv);
> >   
> >   	i915_teardown_sysfs(dev_priv);
> >   	i915_guc_log_unregister(dev_priv);
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 0c8bd1cdcbbd..142826742b86 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -40,6 +40,7 @@
> >   #include <linux/hash.h>
> >   #include <linux/intel-iommu.h>
> >   #include <linux/kref.h>
> > +#include <linux/perf_event.h>
> >   #include <linux/pm_qos.h>
> >   #include <linux/reservation.h>
> >   #include <linux/shmem_fs.h>
> > @@ -2111,6 +2112,12 @@ struct intel_cdclk_state {
> >   	unsigned int cdclk, vco, ref;
> >   };
> >   
> > +enum {
> > +	__I915_SAMPLE_FREQ_ACT = 0,
> > +	__I915_SAMPLE_FREQ_REQ,
> > +	__I915_NUM_PMU_SAMPLERS
> > +};
> > +
> >   struct drm_i915_private {
> >   	struct drm_device drm;
> >   
> > @@ -2158,6 +2165,7 @@ struct drm_i915_private {
> >   	struct pci_dev *bridge_dev;
> >   	struct i915_gem_context *kernel_context;
> >   	struct intel_engine_cs *engine[I915_NUM_ENGINES];
> > +	struct intel_engine_cs *engine_class[MAX_ENGINE_CLASS + 1][MAX_ENGINE_INSTANCE + 1];
> >   	struct i915_vma *semaphore;
> >   
> >   	struct drm_dma_handle *status_page_dmah;
> > @@ -2605,6 +2613,14 @@ struct drm_i915_private {
> >   		int	irq;
> >   	} lpe_audio;
> >   
> > +	struct {
> > +		struct pmu base;
> > +		spinlock_t lock;
> > +		struct hrtimer timer;
> > +		u64 enable;
> > +		u64 sample[__I915_NUM_PMU_SAMPLERS];
> > +	} pmu;
> > +
> >   	/*
> >   	 * NOTE: This is the dri1/ums dungeon, don't add stuff here. Your patch
> >   	 * will be rejected. Instead look for a better place.
> > @@ -3813,6 +3829,15 @@ extern void i915_perf_fini(struct drm_i915_private *dev_priv);
> >   extern void i915_perf_register(struct drm_i915_private *dev_priv);
> >   extern void i915_perf_unregister(struct drm_i915_private *dev_priv);
> >   
> > +/* i915_pmu.c */
> > +#ifdef CONFIG_PERF_EVENTS
> > +extern void i915_pmu_register(struct drm_i915_private *i915);
> > +extern void i915_pmu_unregister(struct drm_i915_private *i915);
> > +#else
> > +static inline void i915_pmu_register(struct drm_i915_private *i915) {}
> > +static inline void i915_pmu_unregister(struct drm_i915_private *i915) {}
> > +#endif
> > +
> >   /* i915_suspend.c */
> >   extern int i915_save_state(struct drm_i915_private *dev_priv);
> >   extern int i915_restore_state(struct drm_i915_private *dev_priv);
> > diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> > new file mode 100644
> > index 000000000000..62c527c12641
> > --- /dev/null
> > +++ b/drivers/gpu/drm/i915/i915_pmu.c
> > @@ -0,0 +1,629 @@
> > +#include <linux/perf_event.h>
> > +#include <linux/pm_runtime.h>
> > +
> > +#include "i915_drv.h"
> > +#include "intel_ringbuffer.h"
> > +
> > +#define FREQUENCY 200
> > +#define PERIOD max_t(u64, 10000, NSEC_PER_SEC / FREQUENCY)
> > +
> > +#define ENGINE_SAMPLE_MASK \
> > +	(BIT(I915_SAMPLE_QUEUED) | \
> > +	 BIT(I915_SAMPLE_BUSY) | \
> > +	 BIT(I915_SAMPLE_WAIT) | \
> > +	 BIT(I915_SAMPLE_SEMA))
> > +
> > +#define ENGINE_SAMPLE_BITS (16)
> > +
> > +static u8 engine_config_sample(u64 config)
> > +{
> > +	return config & I915_PMU_SAMPLE_MASK;
> > +}
> > +
> > +static u8 engine_event_sample(struct perf_event *event)
> > +{
> > +	return engine_config_sample(event->attr.config);
> > +}
> > +
> > +static u8 engine_event_class(struct perf_event *event)
> > +{
> > +	return (event->attr.config >> I915_PMU_CLASS_SHIFT) & 0xff;
> > +}
> > +
> > +static u8 engine_event_instance(struct perf_event *event)
> > +{
> > +	return (event->attr.config >> I915_PMU_SAMPLE_BITS) & 0xff;
> > +}
> > +
> > +static bool is_engine_config(u64 config)
> > +{
> > +	return config < __I915_PMU_OTHER(0);
> > +}
> > +
> > +static u64 config_enabled_mask(u64 config)
> > +{
> > +	if (is_engine_config(config))
> > +		return BIT_ULL(engine_config_sample(config));
> > +	else
> > +		return BIT_ULL(config - __I915_PMU_OTHER(0)) <<
> > +		       ENGINE_SAMPLE_BITS;
> > +}
> > +
> > +static bool is_engine_event(struct perf_event *event)
> > +{
> > +	return is_engine_config(event->attr.config);
> > +}
> > +
> > +static u64 event_enabled_mask(struct perf_event *event)
> > +{
> > +	return config_enabled_mask(event->attr.config);
> > +}
> > +
> > +static bool grab_forcewake(struct drm_i915_private *i915, bool fw)
> > +{
> > +	if (!fw)
> > +		intel_uncore_forcewake_get(i915, FORCEWAKE_ALL);
> > +
> > +	return true;
> > +}
> > +
> > +static void engines_sample(struct drm_i915_private *dev_priv)
> > +{
> > +	struct intel_engine_cs *engine;
> > +	enum intel_engine_id id;
> > +	bool fw = false;
> > +
> > +	if ((dev_priv->pmu.enable & ENGINE_SAMPLE_MASK) == 0)
> > +		return;
> > +
> > +	if (!dev_priv->gt.awake)
> > +		return;
> > +
> > +	if (!intel_runtime_pm_get_if_in_use(dev_priv))
> > +		return;
> > +
> > +	for_each_engine(engine, dev_priv, id) {
> > +		u32 enable = engine->pmu.enable;
> > +
> > +		if (i915_seqno_passed(intel_engine_get_seqno(engine),
> > +				      intel_engine_last_submit(engine)))
> > +			continue;
> > +
> > +		if (enable & BIT(I915_SAMPLE_QUEUED))
> > +			engine->pmu.sample[I915_SAMPLE_QUEUED] += PERIOD;
> > +
> > +		if (enable & BIT(I915_SAMPLE_BUSY)) {
> > +			u32 val;
> > +
> > +			fw = grab_forcewake(dev_priv, fw);
> > +			val = I915_READ_FW(RING_MI_MODE(engine->mmio_base));
> > +			if (!(val & MODE_IDLE))
> > +				engine->pmu.sample[I915_SAMPLE_BUSY] += PERIOD;
> > +		}
> > +
> > +		if (enable & (BIT(I915_SAMPLE_WAIT) | BIT(I915_SAMPLE_SEMA))) {
> > +			u32 val;
> > +
> > +			fw = grab_forcewake(dev_priv, fw);
> > +			val = I915_READ_FW(RING_CTL(engine->mmio_base));
> > +			if ((enable & BIT(I915_SAMPLE_WAIT)) &&
> > +			    (val & RING_WAIT))
> > +				engine->pmu.sample[I915_SAMPLE_WAIT] += PERIOD;
> > +			if ((enable & BIT(I915_SAMPLE_SEMA)) &&
> > +			    (val & RING_WAIT_SEMAPHORE))
> > +				engine->pmu.sample[I915_SAMPLE_SEMA] += PERIOD;
> > +		}
> > +	}
> > +
> > +	if (fw)
> > +		intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
> > +	intel_runtime_pm_put(dev_priv);
> > +}
> > +
> > +static void frequency_sample(struct drm_i915_private *dev_priv)
> > +{
> > +	if (dev_priv->pmu.enable &
> > +	    config_enabled_mask(I915_PMU_ACTUAL_FREQUENCY)) {
> > +		u64 val;
> > +
> > +		val = dev_priv->rps.cur_freq;
> > +		if (dev_priv->gt.awake &&
> > +		    intel_runtime_pm_get_if_in_use(dev_priv)) {
> > +			val = intel_get_cagf(dev_priv,
> > +					     I915_READ_NOTRACE(GEN6_RPSTAT1));
> > +			intel_runtime_pm_put(dev_priv);
> > +		}
> > +		val = intel_gpu_freq(dev_priv, val);
> > +		dev_priv->pmu.sample[__I915_SAMPLE_FREQ_ACT] += val * PERIOD;
> > +	}
> > +
> > +	if (dev_priv->pmu.enable &
> > +	    config_enabled_mask(I915_PMU_REQUESTED_FREQUENCY)) {
> > +		u64 val = intel_gpu_freq(dev_priv, dev_priv->rps.cur_freq);
> > +		dev_priv->pmu.sample[__I915_SAMPLE_FREQ_REQ] += val * PERIOD;
> > +	}
> > +}
> > +
> > +static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
> > +{
> > +	struct drm_i915_private *i915 =
> > +		container_of(hrtimer, struct drm_i915_private, pmu.timer);
> > +
> > +	if (i915->pmu.enable == 0)
> > +		return HRTIMER_NORESTART;
> > +
> > +	engines_sample(i915);
> > +	frequency_sample(i915);
> > +
> > +	hrtimer_forward_now(hrtimer, ns_to_ktime(PERIOD));
> > +	return HRTIMER_RESTART;
> > +}
> > +
> > +static void i915_pmu_event_destroy(struct perf_event *event)
> > +{
> > +	WARN_ON(event->parent);
> > +}
> > +
> > +static int engine_event_init(struct perf_event *event)
> > +{
> > +	struct drm_i915_private *i915 =
> > +		container_of(event->pmu, typeof(*i915), pmu.base);
> > +
> > +	if (!intel_engine_lookup_user(i915, engine_event_class(event),
> > +				      engine_event_instance(event)))
> > +		return -ENODEV;
> > +
> > +	switch (engine_event_sample(event)) {
> > +	case I915_SAMPLE_QUEUED:
> > +	case I915_SAMPLE_BUSY:
> > +	case I915_SAMPLE_WAIT:
> > +		break;
> > +	case I915_SAMPLE_SEMA:
> > +		if (INTEL_GEN(i915) < 6)
> > +			return -ENODEV;
> > +		break;
> > +	default:
> > +		return -ENOENT;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static DEFINE_PER_CPU(struct pt_regs, i915_pmu_pt_regs);
> > +
> > +static enum hrtimer_restart hrtimer_sample(struct hrtimer *hrtimer)
> > +{
> > +	struct pt_regs *regs = this_cpu_ptr(&i915_pmu_pt_regs);
> > +	struct perf_sample_data data;
> > +	struct perf_event *event;
> > +	u64 period;
> > +
> > +	event = container_of(hrtimer, struct perf_event, hw.hrtimer);
> > +	if (event->state != PERF_EVENT_STATE_ACTIVE)
> > +		return HRTIMER_NORESTART;
> > +
> > +	event->pmu->read(event);
> > +
> > +	perf_sample_data_init(&data, 0, event->hw.last_period);
> > +	perf_event_overflow(event, &data, regs);
> > +
> > +	period = max_t(u64, 10000, event->hw.sample_period);
> > +	hrtimer_forward_now(hrtimer, ns_to_ktime(period));
> > +	return HRTIMER_RESTART;
> > +}
> > +
> > +static void init_hrtimer(struct perf_event *event)
> > +{
> > +	struct hw_perf_event *hwc = &event->hw;
> > +
> > +	if (!is_sampling_event(event))
> > +		return;
> > +
> > +	hrtimer_init(&hwc->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> > +	hwc->hrtimer.function = hrtimer_sample;
> > +
> > +	if (event->attr.freq) {
> > +		long freq = event->attr.sample_freq;
> > +
> > +		event->attr.sample_period = NSEC_PER_SEC / freq;
> > +		hwc->sample_period = event->attr.sample_period;
> > +		local64_set(&hwc->period_left, hwc->sample_period);
> > +		hwc->last_period = hwc->sample_period;
> > +		event->attr.freq = 0;
> > +	}
> > +}
> > +
> > +static int i915_pmu_event_init(struct perf_event *event)
> > +{
> > +	struct drm_i915_private *i915 =
> > +		container_of(event->pmu, typeof(*i915), pmu.base);
> > +	int ret;
> > +
> > +	/* XXX ideally only want pid == -1 && cpu == -1 */
> > +
> > +	if (event->attr.type != event->pmu->type)
> > +		return -ENOENT;
> > +
> > +	if (has_branch_stack(event))
> > +		return -EOPNOTSUPP;
> > +
> > +	ret = 0;
> > +	if (is_engine_event(event)) {
> > +		ret = engine_event_init(event);
> > +	} else switch (event->attr.config) {
> > +	case I915_PMU_ACTUAL_FREQUENCY:
> > +		if (IS_VALLEYVIEW(i915) || IS_CHERRYVIEW(i915))
> > +			ret = -ENODEV; /* requires a mutex for sampling! */
> > +	case I915_PMU_REQUESTED_FREQUENCY:
> > +	case I915_PMU_ENERGY:
> > +	case I915_PMU_RC6_RESIDENCY:
> > +	case I915_PMU_RC6p_RESIDENCY:
> > +	case I915_PMU_RC6pp_RESIDENCY:
> > +		if (INTEL_GEN(i915) < 6)
> > +			ret = -ENODEV;
> > +		break;
> > +	}
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (!event->parent)
> > +		event->destroy = i915_pmu_event_destroy;
> > +
> > +	init_hrtimer(event);
> > +
> > +	return 0;
> > +}
> > +
> > +static void i915_pmu_timer_start(struct perf_event *event)
> > +{
> > +	struct hw_perf_event *hwc = &event->hw;
> > +	s64 period;
> > +
> > +	if (!is_sampling_event(event))
> > +		return;
> > +
> > +	period = local64_read(&hwc->period_left);
> > +	if (period) {
> > +		if (period < 0)
> > +			period = 10000;
> > +
> > +		local64_set(&hwc->period_left, 0);
> > +	} else {
> > +		period = max_t(u64, 10000, hwc->sample_period);
> > +	}
> > +
> > +	hrtimer_start_range_ns(&hwc->hrtimer,
> > +			       ns_to_ktime(period), 0,
> > +			       HRTIMER_MODE_REL_PINNED);
> > +}
> > +
> > +static void i915_pmu_timer_cancel(struct perf_event *event)
> > +{
> > +	struct hw_perf_event *hwc = &event->hw;
> > +
> > +	if (!is_sampling_event(event))
> > +		return;
> > +
> > +	local64_set(&hwc->period_left,
> > +		    ktime_to_ns(hrtimer_get_remaining(&hwc->hrtimer)));
> > +	hrtimer_cancel(&hwc->hrtimer);
> > +}
> > +
> > +static void i915_pmu_enable(struct perf_event *event)
> > +{
> > +	struct drm_i915_private *i915 =
> > +		container_of(event->pmu, typeof(*i915), pmu.base);
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&i915->pmu.lock, flags);
> > +
> > +	if (i915->pmu.enable == 0)
> > +		hrtimer_start_range_ns(&i915->pmu.timer,
> > +				       ns_to_ktime(PERIOD), 0,
> > +				       HRTIMER_MODE_REL_PINNED);
> > +
> > +	i915->pmu.enable |= event_enabled_mask(event);
> > +
> > +	if (is_engine_event(event)) {
> > +		struct intel_engine_cs *engine;
> > +
> > +		engine = intel_engine_lookup_user(i915,
> > +						  engine_event_class(event),
> > +						  engine_event_instance(event));
> > +		GEM_BUG_ON(!engine);
> > +		engine->pmu.enable |= BIT(engine_event_sample(event));
> > +	}
> > +
> > +	spin_unlock_irqrestore(&i915->pmu.lock, flags);
> > +
> > +	i915_pmu_timer_start(event);
> > +}
> > +
> > +static void i915_pmu_disable(struct perf_event *event)
> > +{
> > +	struct drm_i915_private *i915 =
> > +		container_of(event->pmu, typeof(*i915), pmu.base);
> > +	unsigned long flags;
> > +	u64 mask;
> > +
> > +	spin_lock_irqsave(&i915->pmu.lock, flags);
> > +
> > +	if (is_engine_event(event)) {
> > +		struct intel_engine_cs *engine;
> > +		enum intel_engine_id id;
> > +
> > +		engine = intel_engine_lookup_user(i915,
> > +						  engine_event_class(event),
> > +						  engine_event_instance(event));
> > +		GEM_BUG_ON(!engine);
> > +		engine->pmu.enable &= ~BIT(engine_event_sample(event));
> > +		mask = 0;
> > +		for_each_engine(engine, i915, id)
> > +			mask |= engine->pmu.enable;
> > +		mask = ~mask;
> > +	} else {
> > +		mask = event_enabled_mask(event);
> > +	}
> > +
> > +	i915->pmu.enable &= ~mask;
> > +
> > +	spin_unlock_irqrestore(&i915->pmu.lock, flags);
> > +
> > +	i915_pmu_timer_cancel(event);
> > +}
> > +
> > +static int i915_pmu_event_add(struct perf_event *event, int flags)
> > +{
> > +	struct hw_perf_event *hwc = &event->hw;
> > +
> > +	if (flags & PERF_EF_START)
> > +		i915_pmu_enable(event);
> > +
> > +	hwc->state = !(flags & PERF_EF_START);
> > +
> > +	return 0;
> > +}
> > +
> > +static void i915_pmu_event_del(struct perf_event *event, int flags)
> > +{
> > +	i915_pmu_disable(event);
> > +}
> > +
> > +static void i915_pmu_event_start(struct perf_event *event, int flags)
> > +{
> > +	i915_pmu_enable(event);
> > +}
> > +
> > +static void i915_pmu_event_stop(struct perf_event *event, int flags)
> > +{
> > +	i915_pmu_disable(event);
> > +}
> > +
> > +static u64 count_interrupts(struct drm_i915_private *i915)
> > +{
> > +	/* open-coded kstat_irqs() */
> > +	struct irq_desc *desc = irq_to_desc(i915->drm.pdev->irq);
> > +	u64 sum = 0;
> > +	int cpu;
> > +
> > +	if (!desc || !desc->kstat_irqs)
> > +		return 0;
> > +
> > +	for_each_possible_cpu(cpu)
> > +		sum += *per_cpu_ptr(desc->kstat_irqs, cpu);
> > +
> > +	return sum;
> > +}
> > +
> > +static void i915_pmu_event_read(struct perf_event *event)
> > +{
> > +	struct drm_i915_private *i915 =
> > +		container_of(event->pmu, typeof(*i915), pmu.base);
> > +	u64 val = 0;
> > +
> > +	if (is_engine_event(event)) {
> > +		u8 sample = engine_event_sample(event);
> > +		struct intel_engine_cs *engine;
> > +
> > +		engine = intel_engine_lookup_user(i915,
> > +						  engine_event_class(event),
> > +						  engine_event_instance(event));
> > +
> > +		if (WARN_ON_ONCE(!engine)) {
> > +			/* Do nothing */
> > +		} else {
> > +			val = engine->pmu.sample[sample];
> > +		}
> > +	} else switch (event->attr.config) {
> > +	case I915_PMU_ACTUAL_FREQUENCY:
> > +		val = i915->pmu.sample[__I915_SAMPLE_FREQ_ACT];
> > +		break;
> > +	case I915_PMU_REQUESTED_FREQUENCY:
> > +		val = i915->pmu.sample[__I915_SAMPLE_FREQ_REQ];
> > +		break;
> > +	case I915_PMU_ENERGY:
> > +		val = intel_energy_uJ(i915);
> > +		break;
> > +	case I915_PMU_INTERRUPTS:
> > +		val = count_interrupts(i915);
> > +		break;
> > +
> > +	case I915_PMU_RC6_RESIDENCY:
> > +		if (!i915->gt.awake)
> > +			return;
> > +
> > +		val = intel_rc6_residency_ns(i915,
> > +					     IS_VALLEYVIEW(i915) ?
> > +					     VLV_GT_RENDER_RC6 :
> > +					     GEN6_GT_GFX_RC6);
> > +		break;
> > +
> > +	case I915_PMU_RC6p_RESIDENCY:
> > +		if (!i915->gt.awake)
> > +			return;
> > +
> > +		if (!IS_VALLEYVIEW(i915))
> > +			val = intel_rc6_residency_ns(i915, GEN6_GT_GFX_RC6p);
> > +		break;
> > +
> > +	case I915_PMU_RC6pp_RESIDENCY:
> > +		if (!i915->gt.awake)
> > +			return;
> > +
> > +		if (!IS_VALLEYVIEW(i915))
> > +			val = intel_rc6_residency_ns(i915, GEN6_GT_GFX_RC6pp);
> > +		break;
> > +	}
> > +
> > +	local64_set(&event->count, val);
> > +}
> > +
> > +static int i915_pmu_event_event_idx(struct perf_event *event)
> > +{
> > +	return 0;
> > +}
> > +
> > +static ssize_t i915_pmu_format_show(struct device *dev,
> > +				    struct device_attribute *attr, char *buf)
> > +{
> > +        struct dev_ext_attribute *eattr;
> > +
> > +        eattr = container_of(attr, struct dev_ext_attribute, attr);
> > +        return sprintf(buf, "%s\n", (char *) eattr->var);
> > +}
> > +
> > +#define I915_PMU_FORMAT_ATTR(_name, _config)           \
> > +        (&((struct dev_ext_attribute[]) {               \
> > +                { .attr = __ATTR(_name, S_IRUGO, i915_pmu_format_show, NULL), \
> > +                  .var = (void *) _config, }            \
> > +        })[0].attr.attr)
> > +
> > +static struct attribute *i915_pmu_format_attrs[] = {
> > +        I915_PMU_FORMAT_ATTR(i915_eventid, "config:0-42"),
> > +        NULL,
> > +};
> > +
> > +static const struct attribute_group i915_pmu_format_attr_group = {
> > +        .name = "format",
> > +        .attrs = i915_pmu_format_attrs,
> > +};
> > +
> > +static ssize_t i915_pmu_event_show(struct device *dev,
> > +				   struct device_attribute *attr, char *buf)
> > +{
> > +        struct dev_ext_attribute *eattr;
> > +
> > +        eattr = container_of(attr, struct dev_ext_attribute, attr);
> > +        return sprintf(buf, "config=0x%lx\n", (unsigned long) eattr->var);
> > +}
> > +
> > +#define I915_PMU_EVENT_ATTR(_name, _config)            \
> > +        (&((struct dev_ext_attribute[]) {               \
> > +                { .attr = __ATTR(_name, S_IRUGO, i915_pmu_event_show, NULL), \
> > +                  .var = (void *) _config, }            \
> > +         })[0].attr.attr)
> > +
> > +static struct attribute *i915_pmu_events_attrs[] = {
> > +	I915_PMU_EVENT_ATTR(rcs0-queued,
> > +			    I915_PMU_ENGINE_QUEUED(I915_ENGINE_CLASS_RENDER, 0)),
> > +	I915_PMU_EVENT_ATTR(rcs0-busy,
> > +			    I915_PMU_ENGINE_BUSY(I915_ENGINE_CLASS_RENDER, 0)),
> > +	I915_PMU_EVENT_ATTR(rcs0-wait,
> > +			    I915_PMU_ENGINE_WAIT(I915_ENGINE_CLASS_RENDER, 0)),
> > +	I915_PMU_EVENT_ATTR(rcs0-sema,
> > +			    I915_PMU_ENGINE_SEMA(I915_ENGINE_CLASS_RENDER, 0)),
> > +
> > +	I915_PMU_EVENT_ATTR(bcs0-queued,
> > +			    I915_PMU_ENGINE_QUEUED(I915_ENGINE_CLASS_COPY, 0)),
> > +	I915_PMU_EVENT_ATTR(bcs0-busy,
> > +			    I915_PMU_ENGINE_BUSY(I915_ENGINE_CLASS_COPY, 0)),
> > +	I915_PMU_EVENT_ATTR(bcs0-wait,
> > +			    I915_PMU_ENGINE_WAIT(I915_ENGINE_CLASS_COPY, 0)),
> > +	I915_PMU_EVENT_ATTR(bcs0-sema,
> > +			    I915_PMU_ENGINE_SEMA(I915_ENGINE_CLASS_COPY, 0)),
> > +
> > +	I915_PMU_EVENT_ATTR(vcs0-queued,
> > +			    I915_PMU_ENGINE_QUEUED(I915_ENGINE_CLASS_VIDEO, 0)),
> > +	I915_PMU_EVENT_ATTR(vcs0-busy,
> > +			    I915_PMU_ENGINE_BUSY(I915_ENGINE_CLASS_VIDEO, 0)),
> > +	I915_PMU_EVENT_ATTR(vcs0-wait,
> > +			    I915_PMU_ENGINE_WAIT(I915_ENGINE_CLASS_VIDEO, 0)),
> > +	I915_PMU_EVENT_ATTR(vcs0-sema,
> > +			    I915_PMU_ENGINE_SEMA(I915_ENGINE_CLASS_VIDEO, 0)),
> > +
> > +	I915_PMU_EVENT_ATTR(vcs1-queued,
> > +			    I915_PMU_ENGINE_QUEUED(I915_ENGINE_CLASS_VIDEO, 1)),
> > +	I915_PMU_EVENT_ATTR(vcs1-busy,
> > +			    I915_PMU_ENGINE_BUSY(I915_ENGINE_CLASS_VIDEO, 1)),
> > +	I915_PMU_EVENT_ATTR(vcs1-wait,
> > +			    I915_PMU_ENGINE_WAIT(I915_ENGINE_CLASS_VIDEO, 1)),
> > +	I915_PMU_EVENT_ATTR(vcs1-sema,
> > +			    I915_PMU_ENGINE_SEMA(I915_ENGINE_CLASS_VIDEO, 1)),
> > +
> > +	I915_PMU_EVENT_ATTR(vecs0-queued,
> > +			    I915_PMU_ENGINE_QUEUED(I915_ENGINE_CLASS_VIDEO_ENHANCE, 0)),
> > +	I915_PMU_EVENT_ATTR(vecs0-busy,
> > +			    I915_PMU_ENGINE_BUSY(I915_ENGINE_CLASS_VIDEO_ENHANCE, 0)),
> > +	I915_PMU_EVENT_ATTR(vecs0-wait,
> > +			    I915_PMU_ENGINE_WAIT(I915_ENGINE_CLASS_VIDEO_ENHANCE, 0)),
> > +	I915_PMU_EVENT_ATTR(vecs0-sema,
> > +			    I915_PMU_ENGINE_SEMA(I915_ENGINE_CLASS_VIDEO_ENHANCE, 0)),
> > +
> > +        I915_PMU_EVENT_ATTR(actual-frequency,	 I915_PMU_ACTUAL_FREQUENCY),
> > +        I915_PMU_EVENT_ATTR(requested-frequency, I915_PMU_REQUESTED_FREQUENCY),
> > +        I915_PMU_EVENT_ATTR(energy,		 I915_PMU_ENERGY),
> > +        I915_PMU_EVENT_ATTR(interrupts,		 I915_PMU_INTERRUPTS),
> > +        I915_PMU_EVENT_ATTR(rc6-residency,	 I915_PMU_RC6_RESIDENCY),
> > +        I915_PMU_EVENT_ATTR(rc6p-residency,	 I915_PMU_RC6p_RESIDENCY),
> > +        I915_PMU_EVENT_ATTR(rc6pp-residency,	 I915_PMU_RC6pp_RESIDENCY),
> > +
> > +        NULL,
> > +};
> > +
> > +static const struct attribute_group i915_pmu_events_attr_group = {
> > +        .name = "events",
> > +        .attrs = i915_pmu_events_attrs,
> > +};
> > +
> > +static const struct attribute_group *i915_pmu_attr_groups[] = {
> > +        &i915_pmu_format_attr_group,
> > +        &i915_pmu_events_attr_group,
> > +        NULL
> > +};
> > +
> > +void i915_pmu_register(struct drm_i915_private *i915)
> > +{
> > +	if (INTEL_GEN(i915) <= 2)
> > +		return;
> > +
> > +	i915->pmu.base.attr_groups	= i915_pmu_attr_groups;
> > +	i915->pmu.base.task_ctx_nr	= perf_sw_context;
> > +	i915->pmu.base.event_init	= i915_pmu_event_init;
> > +	i915->pmu.base.add		= i915_pmu_event_add;
> > +	i915->pmu.base.del		= i915_pmu_event_del;
> > +	i915->pmu.base.start		= i915_pmu_event_start;
> > +	i915->pmu.base.stop		= i915_pmu_event_stop;
> > +	i915->pmu.base.read		= i915_pmu_event_read;
> > +	i915->pmu.base.event_idx	= i915_pmu_event_event_idx;
> > +
> > +	spin_lock_init(&i915->pmu.lock);
> > +	hrtimer_init(&i915->pmu.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> > +	i915->pmu.timer.function = i915_sample;
> > +	i915->pmu.enable = 0;
> > +
> > +	if (perf_pmu_register(&i915->pmu.base, "i915", -1))
> > +		i915->pmu.base.event_init = NULL;
> > +}
> > +
> > +void i915_pmu_unregister(struct drm_i915_private *i915)
> > +{
> > +	if (!i915->pmu.base.event_init)
> > +		return;
> > +
> > +	i915->pmu.enable = 0;
> > +
> > +	perf_pmu_unregister(&i915->pmu.base);
> > +	i915->pmu.base.event_init = NULL;
> > +
> > +	hrtimer_cancel(&i915->pmu.timer);
> > +}
> > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> > index 1dc7e7a2a23b..26bce766ab51 100644
> > --- a/drivers/gpu/drm/i915/i915_reg.h
> > +++ b/drivers/gpu/drm/i915/i915_reg.h
> > @@ -95,6 +95,9 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
> >   #define VIDEO_ENHANCEMENT_CLASS	2
> >   #define COPY_ENGINE_CLASS	3
> >   #define OTHER_CLASS		4
> > +#define MAX_ENGINE_CLASS	4
> > +
> > +#define MAX_ENGINE_INSTANCE    1
> >   
> >   /* PCI config space */
> >   
> > diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> > index 9ab596941372..14630612325b 100644
> > --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> > +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> > @@ -198,6 +198,15 @@ intel_engine_setup(struct drm_i915_private *dev_priv,
> >   	GEM_BUG_ON(info->class >= ARRAY_SIZE(intel_engine_classes));
> >   	class_info = &intel_engine_classes[info->class];
> >   
> > +	if (GEM_WARN_ON(info->class > MAX_ENGINE_CLASS))
> > +		return -EINVAL;
> > +
> > +	if (GEM_WARN_ON(info->instance > MAX_ENGINE_INSTANCE))
> > +		return -EINVAL;
> > +
> > +	if (GEM_WARN_ON(dev_priv->engine_class[info->class][info->instance]))
> > +		return -EINVAL;
> > +
> >   	GEM_BUG_ON(dev_priv->engine[id]);
> >   	engine = kzalloc(sizeof(*engine), GFP_KERNEL);
> >   	if (!engine)
> > @@ -225,6 +234,7 @@ intel_engine_setup(struct drm_i915_private *dev_priv,
> >   
> >   	ATOMIC_INIT_NOTIFIER_HEAD(&engine->context_status_notifier);
> >   
> > +	dev_priv->engine_class[info->class][info->instance] = engine;
> >   	dev_priv->engine[id] = engine;
> >   	return 0;
> >   }
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index cdf084ef5aae..af8d85794c44 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -2282,3 +2282,28 @@ int intel_init_vebox_ring_buffer(struct intel_engine_cs *engine)
> >   
> >   	return intel_init_ring_buffer(engine);
> >   }
> > +
> > +static u8 user_class_map[I915_ENGINE_CLASS_MAX] = {
> > +	[I915_ENGINE_CLASS_OTHER] = OTHER_CLASS,
> > +	[I915_ENGINE_CLASS_RENDER] = RENDER_CLASS,
> > +	[I915_ENGINE_CLASS_COPY] = COPY_ENGINE_CLASS,
> > +	[I915_ENGINE_CLASS_VIDEO] = VIDEO_DECODE_CLASS,
> > +	[I915_ENGINE_CLASS_VIDEO_ENHANCE] = VIDEO_ENHANCEMENT_CLASS,
> > +};
> > +
> > +struct intel_engine_cs *
> > +intel_engine_lookup_user(struct drm_i915_private *i915, u8 class, u8 instance)
> > +{
> > +	if (class >= ARRAY_SIZE(user_class_map))
> > +		return NULL;
> > +
> > +	class = user_class_map[class];
> > +
> > +	if (WARN_ON_ONCE(class > MAX_ENGINE_CLASS))
> > +		return NULL;
> > +
> > +	if (instance > MAX_ENGINE_INSTANCE)
> > +		return NULL;
> > +
> > +	return i915->engine_class[class][instance];
> > +}
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > index d33c93444c0d..9fdf0cdf6220 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > @@ -245,6 +245,11 @@ struct intel_engine_cs {
> >   		I915_SELFTEST_DECLARE(bool mock : 1);
> >   	} breadcrumbs;
> >   
> > +	struct {
> > +		u32 enable;
> > +		u64 sample[4];
> > +	} pmu;
> > +
> >   	/*
> >   	 * A pool of objects to use as shadow copies of client batch buffers
> >   	 * when the command parser is enabled. Prevents the client from
> > @@ -735,4 +740,7 @@ bool intel_engines_are_idle(struct drm_i915_private *dev_priv);
> >   void intel_engines_mark_idle(struct drm_i915_private *i915);
> >   void intel_engines_reset_default_submission(struct drm_i915_private *i915);
> >   
> > +struct intel_engine_cs *
> > +intel_engine_lookup_user(struct drm_i915_private *i915, u8 class, u8 instance);
> > +
> >   #endif /* _INTEL_RINGBUFFER_H_ */
> > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> > index 7ccbd6a2bbe0..103874476a6d 100644
> > --- a/include/uapi/drm/i915_drm.h
> > +++ b/include/uapi/drm/i915_drm.h
> > @@ -86,6 +86,61 @@ enum i915_mocs_table_index {
> >   	I915_MOCS_CACHED,
> >   };
> >   
> > +enum drm_i915_gem_engine_class {
> > +	I915_ENGINE_CLASS_OTHER = 0,
> > +	I915_ENGINE_CLASS_RENDER = 1,
> > +	I915_ENGINE_CLASS_COPY = 2,
> > +	I915_ENGINE_CLASS_VIDEO = 3,
> > +	I915_ENGINE_CLASS_VIDEO_ENHANCE = 4,
> > +	I915_ENGINE_CLASS_MAX /* non-ABI */
> > +};
> > +
> > +/**
> > + * DOC: perf_events exposed by i915 through /sys/bus/event_sources/drivers/i915
> > + *
> > + */
> > +
> > +enum drm_i915_pmu_engine_sample {
> > +	I915_SAMPLE_QUEUED = 0,
> > +	I915_SAMPLE_BUSY = 1,
> > +	I915_SAMPLE_WAIT = 2,
> > +	I915_SAMPLE_SEMA = 3
> > +};
> > +
> > +#define I915_PMU_SAMPLE_BITS (4)
> > +#define I915_PMU_SAMPLE_MASK (0xf)
> > +#define I915_PMU_SAMPLE_INSTANCE_BITS (8)
> > +#define I915_PMU_CLASS_SHIFT \
> > +	(I915_PMU_SAMPLE_BITS + I915_PMU_SAMPLE_INSTANCE_BITS)
> > +
> > +#define __I915_PMU_ENGINE(class, instance, sample) \
> > +	((class) << I915_PMU_CLASS_SHIFT | \
> > +	(instance) << I915_PMU_SAMPLE_BITS | \
> > +	(sample))
> > +
> > +#define I915_PMU_ENGINE_QUEUED(class, instance) \
> > +	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_QUEUED)
> > +
> > +#define I915_PMU_ENGINE_BUSY(class, instance) \
> > +	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_BUSY)
> > +
> > +#define I915_PMU_ENGINE_WAIT(class, instance) \
> > +	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_WAIT)
> > +
> > +#define I915_PMU_ENGINE_SEMA(class, instance) \
> > +	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_SEMA)
> > +
> > +#define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
> > +
> > +#define I915_PMU_ACTUAL_FREQUENCY 	__I915_PMU_OTHER(0)
> > +#define I915_PMU_REQUESTED_FREQUENCY	__I915_PMU_OTHER(1)
> > +#define I915_PMU_ENERGY			__I915_PMU_OTHER(2)
> > +#define I915_PMU_INTERRUPTS		__I915_PMU_OTHER(3)
> > +
> > +#define I915_PMU_RC6_RESIDENCY		__I915_PMU_OTHER(4)
> > +#define I915_PMU_RC6p_RESIDENCY		__I915_PMU_OTHER(5)
> > +#define I915_PMU_RC6pp_RESIDENCY	__I915_PMU_OTHER(6)
> > +
> >   /* Each region is a minimum of 16k, and there are at most 255 of them.
> >    */
> >   #define I915_NR_TEX_REGIONS 255	/* table size 2k - maximum due to use
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index 7d34bc16ca1c..3b6eb0131204 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -7382,6 +7382,7 @@ int perf_event_overflow(struct perf_event *event,
> >   {
> >   	return __perf_event_overflow(event, 1, data, regs);
> >   }
> > +EXPORT_SYMBOL_GPL(perf_event_overflow);
> >   
> >   /*
> >    * Generic software event infrastructure
> > 

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 04/10] drm/i915: Expose a PMU interface for perf queries
  2017-08-12  2:15     ` Rogozhkin, Dmitry V
@ 2017-08-22 18:17       ` Peter Zijlstra
  2017-08-23 17:51         ` Rogozhkin, Dmitry V
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2017-08-22 18:17 UTC (permalink / raw)
  To: Rogozhkin, Dmitry V; +Cc: Intel-gfx

On Sat, Aug 12, 2017 at 02:15:13AM +0000, Rogozhkin, Dmitry V wrote:
> $ perf stat -e instructions,i915/rcs0-busy/ workload.sh
> <... workload.sh output ...>
> 
> Performance counter stats for 'workload.sh':
>      1,204,616,268      instructions
>                  0      i915/rcs0-busy/
> 
>        1.869169153 seconds time elapsed
> 
> As you can see instructions event works pretty well, i915/rcs0-busy/
> doesn't.
> 
> I'm afraid that our current understanding of how a PMU should work is
> not fully correct.

Can we start off by explaining to me how this i915 stuff works? Because
all I have is ~750 lines of patch without comments, which sort of leaves
me confused.

The above command tries to add an event 'i915/rcs0-busy/' to a task. How
are i915 resources associated with any one particular task?

Is there a unique i915 resource for each task? If not, I don't see how
per-task event can ever work as expected.

> I think so, because the way the PMU entry points init(), add(), del(),
> start(), stop() and read() are implemented does not correlate with how
> many times they are called. I have counted them and here is the result:
> init()=19, add()=44310, del()=43900, start()=44534, stop()=0, read()=0
> 
> Which means that we regularly attempt to start/stop the timer and/or
> busy stats calculations. Another thing which draws attention is that
> read() was not called at all. How is perf supposed to get the counter
> value?

Both stop() and del() are supposed to update event->count. Only if we do
sys_read() while the event is active (something perf-stat never does
IIRC) will it issue pmu::read() to get an up-to-date number.
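That is, the expected flow is roughly the following userspace model (a
sketch of the accounting only, not kernel code; all names here are made
up):

```c
#include <assert.h>
#include <stdint.h>

/* Model of the perf accounting contract: stop()/del() fold the running
 * delta into event->count, and pmu::read() only refreshes the count
 * while the event is active. */
struct fake_event {
	uint64_t count;      /* what userspace sees via sys_read() */
	uint64_t hw_counter; /* free-running hardware value */
	uint64_t prev;       /* snapshot taken at start() */
	int active;
};

static void fake_update(struct fake_event *e)
{
	e->count += e->hw_counter - e->prev;
	e->prev = e->hw_counter;
}

static void fake_start(struct fake_event *e)
{
	e->prev = e->hw_counter; /* don't count ticks while inactive */
	e->active = 1;
}

static void fake_stop(struct fake_event *e)
{
	fake_update(e); /* fold in the delta before going inactive */
	e->active = 0;
}

static uint64_t fake_read(struct fake_event *e)
{
	if (e->active)
		fake_update(e); /* the pmu::read() path */
	return e->count;
}
```

The point being that ticks which happen while the event is inactive must
not leak into event->count, and read() only has to fold in the live
delta.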

> Yet another thing: where are we supposed to initialize our internal
> stuff? The numbers above are from a single run and even init is called
> multiple times. And where are we supposed to de-init our stuff - each
> time on del()? This hardly makes sense.

init happens in pmu::event_init(), which can set an optional
event->destroy() function for de-init.

init() is called once for each event created; the above creates an
inherited per-task event (I think, I lost track of what the perf tool
does) and 19 seems to suggest you did some 18 fork()/clone() calls after
that, resulting in your 1 parent event with 18 children.

> I should note that if perf is issued with the -I 10 option, then read()
> is called: init_c()=265, add_c()=132726, del_c()=131482,
> start_c()=133412, stop()=0, read()=71. However, the i915 counter is
> still 0. I have tried to print counter values from within read() and
> these values are non-zero. Actually read() returns a sequence of
> <non_zero>, 0, 0, 0, ..., <non_zero> because with our add(), del() code
> we regularly start/stop our counter and execution in read() follows
> different branches.
> 
> Thus, I think that right now we do not implement the PMU correctly and
> do not meet perf's expectations of a PMU. Unfortunately, right now I
> have no idea what these expectations are.

Please clarify how i915 works; right now I have no idea where to go.


* Re: [RFC 04/10] drm/i915: Expose a PMU interface for perf queries
  2017-08-22 18:17       ` Peter Zijlstra
@ 2017-08-23 17:51         ` Rogozhkin, Dmitry V
  2017-08-23 18:01           ` Peter Zijlstra
                             ` (3 more replies)
  0 siblings, 4 replies; 32+ messages in thread
From: Rogozhkin, Dmitry V @ 2017-08-23 17:51 UTC (permalink / raw)
  To: peterz; +Cc: Intel-gfx

On Tue, 2017-08-22 at 20:17 +0200, Peter Zijlstra wrote:
> On Sat, Aug 12, 2017 at 02:15:13AM +0000, Rogozhkin, Dmitry V wrote:
> > $ perf stat -e instructions,i915/rcs0-busy/ workload.sh
> > <... workload.sh output ...>
> > 
> > Performance counter stats for 'workload.sh':
> >      1,204,616,268      instructions
> >                  0      i915/rcs0-busy/
> > 
> >        1.869169153 seconds time elapsed
> > 
> > As you can see instructions event works pretty well, i915/rcs0-busy/
> > doesn't.
> > 
> > I'm afraid that our current understanding of how a PMU should work is
> > not fully correct.
> 
> Can we start off by explaining to me how this i915 stuff works? Because
> all I have is ~750 lines of patch without comments, which sort of leaves
> me confused.

i915 PMU tries to expose a number of SW metrics to characterize i915
performance: i915 interrupt counts, the time (counted in microseconds)
the GPU spent in specific states like RC6, the time (counted in
microseconds) GPU engines were busy executing tasks or were idle, etc.
For now these metrics are i915-global and can't be attached to some
particular task, although we may consider such metrics in the future. I
think a global PMU is a fine first step. I had a talk with Andi Kleen,
who pointed out that this is similar to how the uncore PMU behaves and
suggested I should use 'perf stat -e <event> -a -C0' and probably adjust
the code to register the i915 PMU with perf_invalid_context. I did that
in a patch which should probably be squashed into the patch from Tvrtko
(I did not want to step on his work). The change can be found here:
https://patchwork.freedesktop.org/patch/171953/. It makes 'perf stat -e
i915/rcs0-busy/' error out and supports 'perf stat -e i915/rcs0-busy/ -a
-C0'. I still think I am missing something, since 'perf stat -e
i915/rcs0-busy/ -a' will give metrics multiplied by the number of active
CPUs, but that looks closer to what is needed.

Anyhow, returning to the metrics i915 exposes. Some metrics are just an
exposure of counters already maintained inside i915 which do not require
any special sampling: at any given moment you can request the counter
value (these are the interrupt counts and i915 power consumption). Other
metrics are similar to the ones I just described, but they require
activation for i915 to start counting them - this is done on event
initialization (these are the engine busy stats). Finally, there is a
third group which requires sampled counting: they need to be
initialized, and the i915 PMU then starts an internal timer to
accumulate these values (these are some engine characteristics
referenced in the code as QUEUED, SEMA, WAIT).
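For completeness, the engine events above are addressed through the
class/instance/sample bit layout from the quoted i915_drm.h hunk. A
small userspace sketch of the encode/decode (the layout values are
copied from that hunk, the decode helper names are made up):

```c
#include <assert.h>

/* Layout from the quoted uAPI hunk: 4 sample bits at the bottom,
 * 8 instance bits above them, the engine class above that. */
#define I915_PMU_SAMPLE_BITS		4
#define I915_PMU_SAMPLE_MASK		0xf
#define I915_PMU_SAMPLE_INSTANCE_BITS	8
#define I915_PMU_CLASS_SHIFT \
	(I915_PMU_SAMPLE_BITS + I915_PMU_SAMPLE_INSTANCE_BITS)

#define __I915_PMU_ENGINE(class, instance, sample) \
	((class) << I915_PMU_CLASS_SHIFT | \
	 (instance) << I915_PMU_SAMPLE_BITS | \
	 (sample))

/* Hypothetical decode helpers mirroring what engine_event_class() and
 * friends in the posted PMU code would have to do. */
static unsigned int decode_sample(unsigned long config)
{
	return config & I915_PMU_SAMPLE_MASK;
}

static unsigned int decode_instance(unsigned long config)
{
	return (config >> I915_PMU_SAMPLE_BITS) &
	       ((1 << I915_PMU_SAMPLE_INSTANCE_BITS) - 1);
}

static unsigned int decode_class(unsigned long config)
{
	return config >> I915_PMU_CLASS_SHIFT;
}
```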

I hope this clarifies things, and that I have correctly covered the high
level description of the i915 PMU. I hope Chris will correct me if I was
wrong somewhere. Mind that the other author of the i915 PMU, Tvrtko, is
on vacation for one more week or so.


> 
> The above command tries to add an event 'i915/rcs0-busy/' to a task.
> How are i915 resources associated with any one particular task?

Currently they are not associated with any particular task; the metrics
are global.

> 
> Is there a unique i915 resource for each task? If not, I don't see how
> per-task event can ever work as expected.

This depends on what you mean by "expected". I see 2 variants:
1. Either this command line should return correct metric values, or
2. This command line should error out.
Right now the i915 PMU produces global metrics, thus I think the 2nd
variant is true, which I think is achievable if the PMU is registered
with perf_invalid_context.

> 
> > I think so, because the way the PMU entry points init(), add(), del(),
> > start(), stop() and read() are implemented does not correlate with how
> > many times they are called. I have counted them and here is the result:
> > init()=19, add()=44310, del()=43900, start()=44534, stop()=0, read()=0
> > 
> > Which means that we regularly attempt to start/stop the timer and/or
> > busy stats calculations. Another thing which draws attention is that
> > read() was not called at all. How is perf supposed to get the counter
> > value?
> 
> Both stop() and del() are supposed to update event->count. Only if we do
> sys_read() while the event is active (something perf-stat never does
> IIRC) will it issue pmu::read() to get an up-to-date number.
> 
> > Yet another thing: where are we supposed to initialize our internal
> > stuff? The numbers above are from a single run and even init is called
> > multiple times. And where are we supposed to de-init our stuff - each
> > time on del()? This hardly makes sense.
> 
> init happens in pmu::event_init(), which can set an optional
> event->destroy() function for de-init.
> 
> init() is called once for each event created; the above creates an
> inherited per-task event (I think, I lost track of what the perf tool
> does) and 19 seems to suggest you did some 18 fork()/clone() calls after
> that, resulting in your 1 parent event with 18 children.

This behavior changed significantly once I registered the i915 PMU with
perf_invalid_context, notifying the perf subsystem that our PMU produces
global metrics. Right now what I have is:
1. If I call "perf stat -e <event> -a -C0", the event is initialized
once.
2. If I call "perf stat -e <event> -a", the event is initialized as many
times as there are logical CPUs. On my system there were 8.

And there is another problem associated with "perf stat -e <event> -a":
since 8 events are initialized, I get metric values multiplied by 8. How
am I supposed to fix that? I suspect this is somehow related to the
cpumask attribute exposed by some other PMUs, for example uncore. They
are doing something like:

static cpumask_t uncore_cpu_mask;

static DEVICE_ATTR(cpumask, S_IRUGO, uncore_get_attr_cpumask, NULL);

static struct attribute *uncore_pmu_attrs[] = {
	&dev_attr_cpumask.attr,
	NULL,
};

static struct attribute_group uncore_pmu_attr_group = {
	.attrs = uncore_pmu_attrs,
};

Should i915 follow this approach as well?

> 
> > I should note that if perf will be issued with -I 10 option, then read()
> > is being called: init_c()=265, add_c()=132726, del_c()=131482,
> > start_c()=133412, stop()=0, read()=71. However, i915 counter is still 0.
> > I have tried to print counter values from within read() and these values
> > are non 0. Actually read() returns sequence of <non_zero>, 0, 0, 0, ...,
> > <no_zero> because with our add(), del() code we regularly start/stop our
> > counter and execution in read() follows different branches.
> > 
> > Thus, I think that right now we do not implement PMU correctly and do
> > not meet perf expectations from the PMU. Unfortunately, right now I have
> > no idea what are these expectations.
> 
> Please help clarify how i915 works, I have no idea where to go.

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 04/10] drm/i915: Expose a PMU interface for perf queries
  2017-08-23 17:51         ` Rogozhkin, Dmitry V
@ 2017-08-23 18:01           ` Peter Zijlstra
  2017-08-23 18:40             ` Rogozhkin, Dmitry V
  2017-08-23 18:04           ` Peter Zijlstra
                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2017-08-23 18:01 UTC (permalink / raw)
  To: Rogozhkin, Dmitry V; +Cc: Intel-gfx

On Wed, Aug 23, 2017 at 05:51:38PM +0000, Rogozhkin, Dmitry V wrote:

> https://patchwork.freedesktop.org/patch/171953/. This patch makes 'perf
> stat -e i915/rcs0-busy/' to error out and supports 'perf stat -e
> i915/rcs0-busy/ -a -C0'. I still think I miss something since 'perf stat
> -e i915/rcs0-busy/ -a' will give metrics multiplied by number of active
> CPUs, but that's look closer to what is needed.

IIRC an uncore PMU should expose a cpumask through sysfs, and then perf
tools will read that mask and auto-magically limit the number of CPUs it
instantiates the counter on.

See for instance arch/x86/events/intel/cstate.c, that is a fairly
trivial uncore PMU, look for "DEVICE_ATTR(cpumask".

For example, on my 2 socket 10 core ivb-ep:

root@ivb-ep:~# cat /sys/bus/event_source/devices/cstate_pkg/cpumask
0,10

* Re: [RFC 04/10] drm/i915: Expose a PMU interface for perf queries
  2017-08-23 17:51         ` Rogozhkin, Dmitry V
  2017-08-23 18:01           ` Peter Zijlstra
@ 2017-08-23 18:04           ` Peter Zijlstra
  2017-08-23 18:38             ` Rogozhkin, Dmitry V
  2017-08-23 18:05           ` Peter Zijlstra
  2017-08-23 18:22           ` Peter Zijlstra
  3 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2017-08-23 18:04 UTC (permalink / raw)
  To: Rogozhkin, Dmitry V; +Cc: Intel-gfx

On Wed, Aug 23, 2017 at 05:51:38PM +0000, Rogozhkin, Dmitry V wrote:

> > The above command tries to add an event 'i915/rcs0-busy/' to a task. How
> > are i915 resource associated to any one particular task?
> 
> Currently in no way, they are global.

Right. So no per DRM context things. Can you have multiple DRM contexts
per task? If so that would make things slightly tricky when you get away
from !global counters.

> > Is there a unique i915 resource for each task? If not, I don't see how
> > per-task event can ever work as expected.
> 
> This depends on what you mean under "expected"? I see 2 variants:
> 1. Either this command line should return correct metric values
> 2. This command line should error out

> Right now i915 PMU produces global metric, thus, I think 2nd variant is
> true. Which I think is achievable if PMU is registered with
> perf_invalid_context.

Agreed, and yes, perf_invalid_context is right for uncore stuff and will
result in refusal to create per-task counters.

* Re: [RFC 04/10] drm/i915: Expose a PMU interface for perf queries
  2017-08-23 17:51         ` Rogozhkin, Dmitry V
  2017-08-23 18:01           ` Peter Zijlstra
  2017-08-23 18:04           ` Peter Zijlstra
@ 2017-08-23 18:05           ` Peter Zijlstra
  2017-08-23 18:22           ` Peter Zijlstra
  3 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2017-08-23 18:05 UTC (permalink / raw)
  To: Rogozhkin, Dmitry V; +Cc: Intel-gfx

On Wed, Aug 23, 2017 at 05:51:38PM +0000, Rogozhkin, Dmitry V wrote:
> And there is another problem associated with the "perf stat -e <event>
> -a": since 8 events are initialized, I get metric values multiplied by
> 8. How I am supposed to fix that? I suspect that this is somehow related
> to cpumask attribute exposed by some other PMUs, for example uncore.
> They are doing something like:
> 
> static cpumask_t uncore_cpu_mask;
> 
> static DEVICE_ATTR(cpumask, S_IRUGO, uncore_get_attr_cpumask, NULL);
> 
> static struct attribute *uncore_pmu_attrs[] = {
> 	&dev_attr_cpumask.attr,
> 	NULL,
> };
> 
> static struct attribute_group uncore_pmu_attr_group = {
> 	.attrs = uncore_pmu_attrs,
> };
> 
> Should i915 follow this way as well?

Ah, so much for reading emails in parts, yes! that is what userspace
needs to know in order to DTRT.

* Re: [RFC 04/10] drm/i915: Expose a PMU interface for perf queries
  2017-08-23 17:51         ` Rogozhkin, Dmitry V
                             ` (2 preceding siblings ...)
  2017-08-23 18:05           ` Peter Zijlstra
@ 2017-08-23 18:22           ` Peter Zijlstra
  2017-08-23 19:00             ` Rogozhkin, Dmitry V
  2017-08-28 22:57             ` Rogozhkin, Dmitry V
  3 siblings, 2 replies; 32+ messages in thread
From: Peter Zijlstra @ 2017-08-23 18:22 UTC (permalink / raw)
  To: Rogozhkin, Dmitry V; +Cc: Intel-gfx

On Wed, Aug 23, 2017 at 05:51:38PM +0000, Rogozhkin, Dmitry V wrote:

> Anyhow, returning to the metrics i915 exposes. Some metrics are just
> exposure of some counters supported already inside i915 PMU which do not
> require any special sampling: at any given moment you can request the
> counter value (these are interrupts counts, i915 power consumption).

> Other metrics are similar to the ever-existing which I just described,
> but they require activation for i915 to start to count them - this is
> done on the event initialization (these are engine busy stats).

Right, so depending on how expensive this activation is and if it can be
done without scheduling, there are two options:

 1) activate/deactivate from pmu::start()/pmu::stop()
 2) activate/deactivate from pmu::event_init()/event->destroy() and
    disregard all counting between pmu::stop() and pmu::start().

> Finally, there is a third group which require sampling counting: they
> are needed to be initialized and i915 pmu starts an internal timer to
> count these values (these are some engines characteristics referenced
> in the code as QUEUED, SEMA, WAIT).

So uncore PMUs can't really do sampling. That is, perf defines sampling
as interrupting the relevant task and then providing things like the
%RIP value at interrupt time. Since uncore activity cannot be associated
with any one task, no sampling allowed.

Now, I'm thinking that what i915 does is slightly different, it doesn't
provide registers to read out the counter state, but instead
periodically writes state snapshots into some memory buffer, right?

That's a bit tricky, maybe the best fit would be what PPC HV 24x7 does.
They create an event-group, that is a set of counters that are
co-scheduled, matching the set of counters they get from the HV
interface (or a subset) and then sys_read() will use a TXN_READ to
group-read the entire thing at once. In your case it could consume the
last state snapshot instead of request one (or wait for the next,
whatever works best).

Would that work?

* Re: [RFC 04/10] drm/i915: Expose a PMU interface for perf queries
  2017-08-23 18:04           ` Peter Zijlstra
@ 2017-08-23 18:38             ` Rogozhkin, Dmitry V
  2017-08-23 19:28               ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Rogozhkin, Dmitry V @ 2017-08-23 18:38 UTC (permalink / raw)
  To: peterz; +Cc: Intel-gfx

On Wed, 2017-08-23 at 20:04 +0200, Peter Zijlstra wrote:
> On Wed, Aug 23, 2017 at 05:51:38PM +0000, Rogozhkin, Dmitry V wrote:
> 
> > > The above command tries to add an event 'i915/rcs0-busy/' to a task. How
> > > are i915 resource associated to any one particular task?
> > 
> > Currently in no way, they are global.
> 
> Right. So no per DRM context things. Can you have multiple DRM contexts
> per task? If so that would make things slightly tricky when you get away
> from !global counters.

Just to be sure we are on the same page: by "task" we mean the process
being monitored with perf, as in "perf stat -e <ev> task.sh"? Yes, each
process may have a few DRM contexts, so we are in a tricky space :).

If we decide to support per-task counters in the future, would it be
possible to preserve the global mode, giving the end user a choice:
monitor per-process or globally?

> 
> > > Is there a unique i915 resource for each task? If not, I don't see how
> > > per-task event can ever work as expected.
> > 
> > This depends on what you mean under "expected"? I see 2 variants:
> > 1. Either this command line should return correct metric values
> > 2. This command line should error out
> 
> > Right now i915 PMU produces global metric, thus, I think 2nd variant is
> > true. Which I think is achievable if PMU is registered with
> > perf_invalid_context.
> 
> Agreed, and yes, perf_invalid_context is right for uncore stuff and will
> result in refusal to create per-task counters.

* Re: [RFC 04/10] drm/i915: Expose a PMU interface for perf queries
  2017-08-23 18:01           ` Peter Zijlstra
@ 2017-08-23 18:40             ` Rogozhkin, Dmitry V
  0 siblings, 0 replies; 32+ messages in thread
From: Rogozhkin, Dmitry V @ 2017-08-23 18:40 UTC (permalink / raw)
  To: peterz; +Cc: Intel-gfx

On Wed, 2017-08-23 at 20:01 +0200, Peter Zijlstra wrote:
> On Wed, Aug 23, 2017 at 05:51:38PM +0000, Rogozhkin, Dmitry V wrote:
> 
> > https://patchwork.freedesktop.org/patch/171953/. This patch makes 'perf
> > stat -e i915/rcs0-busy/' to error out and supports 'perf stat -e
> > i915/rcs0-busy/ -a -C0'. I still think I miss something since 'perf stat
> > -e i915/rcs0-busy/ -a' will give metrics multiplied by number of active
> > CPUs, but that's look closer to what is needed.
> 
> IIRC an uncore PMU should expose a cpumask through sysfs, and then perf
> tools will read that mask and auto-magically limit the number of CPUs it
> instantiates the counter on.
> 
> See for instance arch/x86/events/intel/cstate.c, that is a fairly
> trivial uncore PMU, look for "DEVICE_ATTR(cpumask".

Thank you for the reference! I will go forward and try to implement this
for i915 PMU.

> 
> For example, on my 2 socket 10 core ivb-ep:
> 
> root@ivb-ep:~# cat /sys/bus/event_source/devices/cstate_pkg/cpumask
> 0,10
> 

* Re: [RFC 04/10] drm/i915: Expose a PMU interface for perf queries
  2017-08-23 18:22           ` Peter Zijlstra
@ 2017-08-23 19:00             ` Rogozhkin, Dmitry V
  2017-08-23 19:33               ` Peter Zijlstra
  2017-08-28 22:57             ` Rogozhkin, Dmitry V
  1 sibling, 1 reply; 32+ messages in thread
From: Rogozhkin, Dmitry V @ 2017-08-23 19:00 UTC (permalink / raw)
  To: peterz; +Cc: Intel-gfx

On Wed, 2017-08-23 at 20:22 +0200, Peter Zijlstra wrote:
> On Wed, Aug 23, 2017 at 05:51:38PM +0000, Rogozhkin, Dmitry V wrote:
> 
> > Anyhow, returning to the metrics i915 exposes. Some metrics are just
> > exposure of some counters supported already inside i915 PMU which do not
> > require any special sampling: at any given moment you can request the
> > counter value (these are interrupts counts, i915 power consumption).
> 
> > Other metrics are similar to the ever-existing which I just described,
> > but they require activation for i915 to start to count them - this is
> > done on the event initialization (these are engine busy stats).
> 
> Right, so depending on how expensive this activation is and if it can be
> done without scheduling, there are two options:
> 
>  1) activate/deactivate from pmu::start()/pmu::stop()
>  2) activate/deactivate from pmu::event_init()/event->destroy() and
>     disregard all counting between pmu::stop() and pmu::start().
> 
> > Finally, there is a third group which require sampling counting: they
> > are needed to be initialized and i915 pmu starts an internal timer to
> > count these values (these are some engines characteristics referenced
> > in the code as QUEUED, SEMA, WAIT).
> 
> So uncore PMUs can't really do sampling. That is, perf defines sampling
> as interrupting the relevant task and then providing things like the
> %RIP value at interrupt time. Since uncore activity cannot be associated
> with any one task, no sampling allowed.

I read this as we need to add:
static int i915_pmu_event_init(struct perf_event *event)
{
	struct hw_perf_event *hwc = &event->hw;
...
	/* Sampling not supported yet */
	if (hwc->sample_period)
		return -EINVAL;
...
}
and deny a sampling period for our events, right? And what should happen
is that the following command will start to fail: "perf stat -e ev -a -I
100"? Essentially you are saying that uncore PMUs should work only in a
"get value when asked" mode; for perf-stat the results are added at the
end of the run. However, there is no strict perf-subsystem denial of
uncore event sampling; this is delegated to each PMU implementation. Is
that correct? I wonder how this is supposed to work in case a PMU is
capable of producing both global and per-task metrics?


> 
> Now, I'm thinking that what i915 does is slightly different, it doesn't
> provide registers to read out the counter state, but instead
> periodically writes state snapshots into some memory buffer, right?
> 
> That's a bit tricky, maybe the best fit would be what PPC HV 24x7 does.
> They create an event-group, that is a set of counters that are
> co-scheduled, matching the set of counters they get from the HV
> interface (or a subset) and then sys_read() will use a TXN_READ to
> group-read the entire thing at once. In your case it could consume the
> last state snapshot instead of request one (or wait for the next,
> whatever works best).
> 
> Would that work?

Hm. Not sure I understand this with my userspace brain :). Why a group
read? Just to eliminate the overhead of doing the same/similar things
for all the metrics? Anyway, what completely slipped away from me is how
this relates to sampling if it is not allowed for uncore PMUs? So, will
there or will there not be sampling?

Essentially, the i915 PMU implements such "sampling" metrics in the
following way:
1. If such a metric is requested, the i915 PMU starts to do some
internal sampling which is not exposed to the perf subsystem. I think
there is some timer stuff currently associated with this.
2. On each i915_pmu_event_read() we just immediately return the values
we have gathered so far internally.
3. Since we were not aware of the no-sampling policy for uncore PMUs, we
have some code for sampling exposed to the perf subsystem, but it is
unrelated to #1. Essentially it is used just to call
i915_pmu_event_read(). Here is the code:

static DEFINE_PER_CPU(struct pt_regs, i915_pmu_pt_regs);

static enum hrtimer_restart hrtimer_sample(struct hrtimer *hrtimer)
{
	struct pt_regs *regs = this_cpu_ptr(&i915_pmu_pt_regs);
	struct perf_sample_data data;
	struct perf_event *event;
	u64 period;

	event = container_of(hrtimer, struct perf_event, hw.hrtimer);
	if (event->state != PERF_EVENT_STATE_ACTIVE)
		return HRTIMER_NORESTART;

	event->pmu->read(event);

	perf_sample_data_init(&data, 0, event->hw.last_period);
	perf_event_overflow(event, &data, regs);

	period = max_t(u64, 10000, event->hw.sample_period);
	hrtimer_forward_now(hrtimer, ns_to_ktime(period));
	return HRTIMER_RESTART;
}


By the way, I think Tvrtko had a question on why he needs this
"DEFINE_PER_CPU(struct pt_regs, i915_pmu_pt_regs)" stuff, which looked
somewhat strange. Here is a quote: "Also, once I exported the events for
enumeration and tried accessing them with perf, I had to create fake
pt_regs to pass to perf_event_overflow."

* Re: [RFC 04/10] drm/i915: Expose a PMU interface for perf queries
  2017-08-23 18:38             ` Rogozhkin, Dmitry V
@ 2017-08-23 19:28               ` Peter Zijlstra
  0 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2017-08-23 19:28 UTC (permalink / raw)
  To: Rogozhkin, Dmitry V; +Cc: Intel-gfx

On Wed, Aug 23, 2017 at 06:38:05PM +0000, Rogozhkin, Dmitry V wrote:
> On Wed, 2017-08-23 at 20:04 +0200, Peter Zijlstra wrote:
> > On Wed, Aug 23, 2017 at 05:51:38PM +0000, Rogozhkin, Dmitry V wrote:
> > 
> > > > The above command tries to add an event 'i915/rcs0-busy/' to a task. How
> > > > are i915 resource associated to any one particular task?
> > > 
> > > Currently in no way, they are global.
> > 
> > Right. So no per DRM context things. Can you have multiple DRM contexts
> > per task? If so that would make things slightly tricky when you get away
> > from !global counters.
> 
> Just to be sure we are on the same page: under task we understand the
> process which is being monitored with the perf: "perf stat -e <ev>
> task.sh"? Yes, each process may have few DRM contexts, so we are in a
> tricky space:).

task is really what userspace calls a thread.

> If we will decide to support per-task counters in the future, would it
> be possible to preserve global mode providing end-user a choice: monitor
> per-process or globally?

Of course, we could even do a whole new PMU for possible finer grained
stuff.

* Re: [RFC 04/10] drm/i915: Expose a PMU interface for perf queries
  2017-08-23 19:00             ` Rogozhkin, Dmitry V
@ 2017-08-23 19:33               ` Peter Zijlstra
  0 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2017-08-23 19:33 UTC (permalink / raw)
  To: Rogozhkin, Dmitry V; +Cc: Intel-gfx

On Wed, Aug 23, 2017 at 07:00:33PM +0000, Rogozhkin, Dmitry V wrote:
> On Wed, 2017-08-23 at 20:22 +0200, Peter Zijlstra wrote:
> > On Wed, Aug 23, 2017 at 05:51:38PM +0000, Rogozhkin, Dmitry V wrote:
> > 
> > > Anyhow, returning to the metrics i915 exposes. Some metrics are just
> > > exposure of some counters supported already inside i915 PMU which do not
> > > require any special sampling: at any given moment you can request the
> > > counter value (these are interrupts counts, i915 power consumption).
> > 
> > > Other metrics are similar to the ever-existing which I just described,
> > > but they require activation for i915 to start to count them - this is
> > > done on the event initialization (these are engine busy stats).
> > 
> > Right, so depending on how expensive this activation is and if it can be
> > done without scheduling, there are two options:
> > 
> >  1) activate/deactivate from pmu::start()/pmu::stop()
> >  2) activate/deactivate from pmu::event_init()/event->destroy() and
> >     disregard all counting between pmu::stop() and pmu::start().
> > 
> > > Finally, there is a third group which require sampling counting: they
> > > are needed to be initialized and i915 pmu starts an internal timer to
> > > count these values (these are some engines characteristics referenced
> > > in the code as QUEUED, SEMA, WAIT).
> > 
> > So uncore PMUs can't really do sampling. That is, perf defines sampling
> > as interrupting the relevant task and then providing things like the
> > %RIP value at interrupt time. Since uncore activity cannot be associated
> > with any one task, no sampling allowed.
> 
> I read this as we need to add:
> static int i915_pmu_event_init(struct perf_event *event)
> {
> ...
> 	/* Sampling not supported yet */
> 	if (hwc->sample_period)
> 		return -EINVAL;
> ...
> }
> And deny sampling period for our events, right? 

Something like that, yes.


> And what should happen
> is that the following command will start to fail: "perf stat -e ev -a -I
> 100"? 

No, perf stat -I works by userspace doing sys_read() with a userspace
timer.

It will stop perf-record from working.

* Re: [RFC 04/10] drm/i915: Expose a PMU interface for perf queries
  2017-08-23 18:22           ` Peter Zijlstra
  2017-08-23 19:00             ` Rogozhkin, Dmitry V
@ 2017-08-28 22:57             ` Rogozhkin, Dmitry V
  2017-08-29  9:30               ` Peter Zijlstra
  1 sibling, 1 reply; 32+ messages in thread
From: Rogozhkin, Dmitry V @ 2017-08-28 22:57 UTC (permalink / raw)
  To: peterz; +Cc: Intel-gfx

On Wed, 2017-08-23 at 20:22 +0200, Peter Zijlstra wrote:
> On Wed, Aug 23, 2017 at 05:51:38PM +0000, Rogozhkin, Dmitry V wrote:
> 
> > Anyhow, returning to the metrics i915 exposes. Some metrics are just
> > exposure of some counters supported already inside i915 PMU which do not
> > require any special sampling: at any given moment you can request the
> > counter value (these are interrupts counts, i915 power consumption).
> 
> > Other metrics are similar to the ever-existing which I just described,
> > but they require activation for i915 to start to count them - this is
> > done on the event initialization (these are engine busy stats).
> 
> Right, so depending on how expensive this activation is and if it can be
> done without scheduling, there are two options:
> 
>  1) activate/deactivate from pmu::start()/pmu::stop()
>  2) activate/deactivate from pmu::event_init()/event->destroy() and
>     disregard all counting between pmu::stop() and pmu::start().
> 
> > Finally, there is a third group which require sampling counting: they
> > are needed to be initialized and i915 pmu starts an internal timer to
> > count these values (these are some engines characteristics referenced
> > in the code as QUEUED, SEMA, WAIT).
> 
> So uncore PMUs can't really do sampling. That is, perf defines sampling
> as interrupting the relevant task and then providing things like the
> %RIP value at interrupt time. Since uncore activity cannot be associated
> with any one task, no sampling allowed.
> 
> Now, I'm thinking that what i915 does is slightly different, it doesn't
> provide registers to read out the counter state, but instead
> periodically writes state snapshots into some memory buffer, right?
> 
> That's a bit tricky, maybe the best fit would be what PPC HV 24x7 does.
> They create an event-group, that is a set of counters that are
> co-scheduled, matching the set of counters they get from the HV
> interface (or a subset) and then sys_read() will use a TXN_READ to
> group-read the entire thing at once. In your case it could consume the
> last state snapshot instead of request one (or wait for the next,
> whatever works best).
> 
> Would that work?

Hi Peter,

I have updated my fixes to Tvrtko's PMU, they are here:
https://patchwork.freedesktop.org/series/28842/, and I started to check
whether we will be able to cover all the use cases for this PMU which we
had in mind. Here I have some concerns and further questions.

So, as soon as I registered PMU with the perf_invalid_context, i.e. as
an uncore PMU, I got the effect that metrics from our PMU are available
under root only. This happens since we fall to the following case
described in 'man perf_event_open': "A pid == -1 and cpu >= 0 setting is
per-CPU and measures all processes on the specified CPU.  Per-CPU events
need  the  CAP_SYS_ADMIN  capability  or
a /proc/sys/kernel/perf_event_paranoid value of less than 1."

This is a trouble point for us... So could you please clarify:
1. How is the PMU API positioned? Is it for debug purposes only, or can
it be used in end-user release applications to monitor system activity
and make decisions based on that?
2. How can non-privileged applications access uncore PMU metrics?
3. Is restricting uncore PMU metrics reporting to privileged
applications a strong requirement, or can it be relaxed?


I understand why the restriction was relevant when only CPU-level
counters were available: system-wide events were expensive. But I don't
quite understand why these restrictions are in place now for uncore
PMUs, when they actually report metrics right away. Is that just a
remnant of CPU-only times that no one needed changed? Can this be
changed so that uncore metrics are allowed to be accessed from general
applications?


Regards,
Dmitry.

* Re: [RFC 04/10] drm/i915: Expose a PMU interface for perf queries
  2017-08-28 22:57             ` Rogozhkin, Dmitry V
@ 2017-08-29  9:30               ` Peter Zijlstra
  2017-08-29 19:16                 ` Rogozhkin, Dmitry V
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2017-08-29  9:30 UTC (permalink / raw)
  To: Rogozhkin, Dmitry V; +Cc: Intel-gfx

On Mon, Aug 28, 2017 at 10:43:17PM +0000, Rogozhkin, Dmitry V wrote:

> Hi Peter,
>
> I have updated my fixes to Tvrtko's PMU, they are here:
> https://patchwork.freedesktop.org/series/28842/, and I started to check
> whether we will be able to cover all the use cases for this PMU which we
> had in mind. Here I have some concerns and further questions.
>
> So, as soon as I registered PMU with the perf_invalid_context, i.e. as
> an uncore PMU, I got the effect that metrics from our PMU are available
> under root only. This happens since we fall to the following case
> described in 'man perf_event_open': "A pid == -1 and cpu >= 0 setting is
> per-CPU and measures all processes on the specified CPU.  Per-CPU events
> need  the  CAP_SYS_ADMIN  capability  or
> a /proc/sys/kernel/perf_event_paranoid value of less than 1."
>
> This a trouble point for us... So, could you, please, clarify:
> 1. How PMU API is positioned? It is for debug purposes only or it can be
> used in the end-user release applications to monitor system activity and
> make some decisions based on that?

Perf is meant to also be usable for end-users, _however_ by default it
will only allow users to profile their own userspace (tasks).

Allowing unpriv users access to kernel data is a clear data leak (you
instantly void KASLR for instance).

And since uncore PMUs are not uniquely tied to individual user context,
unpriv users have no access.

> 2. How applications can access uncore PMU metrics from non-privileged
> applications?

Only by lowering sysctl.kernel.perf_event_paranoid.

Since uncore is shared among many users, its events can be used to
construct side-channel attacks. So from a security pov this is not
something that can otherwise be done.

Imagine user A using the GPU to do crypto and user B reading the GPU
events to infer state or similar things.

Without privilege separation we cannot allow unpriv access to system
wide resources.

> 3. Is that a strong requirement to restrict uncore PMU metrics reporting
> to privileged applications or this can be relaxed?

Pretty strict, people tend to get fairly upset every time we leak stuff.
In fact Debian and Android carry a perf_event_paranoid patch that
default disables _everything_ :-(

* Re: [RFC 04/10] drm/i915: Expose a PMU interface for perf queries
  2017-08-29  9:30               ` Peter Zijlstra
@ 2017-08-29 19:16                 ` Rogozhkin, Dmitry V
  2017-08-29 19:21                   ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Rogozhkin, Dmitry V @ 2017-08-29 19:16 UTC (permalink / raw)
  To: peterz; +Cc: Intel-gfx

On Tue, 2017-08-29 at 11:30 +0200, Peter Zijlstra wrote:
> On Mon, Aug 28, 2017 at 10:43:17PM +0000, Rogozhkin, Dmitry V wrote:
> 
> > Hi Peter,
> >
> > I have updated my fixes to Tvrtko's PMU, they are here:
> > https://patchwork.freedesktop.org/series/28842/, and I started to check
> > whether we will be able to cover all the use cases for this PMU which we
> > had in mind. Here I have some concerns and further questions.
> >
> > So, as soon as I registered PMU with the perf_invalid_context, i.e. as
> > an uncore PMU, I got the effect that metrics from our PMU are available
> > under root only. This happens since we fall to the following case
> > described in 'man perf_event_open': "A pid == -1 and cpu >= 0 setting is
> > per-CPU and measures all processes on the specified CPU.  Per-CPU events
> > need  the  CAP_SYS_ADMIN  capability  or
> > a /proc/sys/kernel/perf_event_paranoid value of less than 1."
> >
> > This a trouble point for us... So, could you, please, clarify:
> > 1. How PMU API is positioned? It is for debug purposes only or it can be
> > used in the end-user release applications to monitor system activity and
> > make some decisions based on that?
> 
> Perf is meant to also be usable for end-users, _however_ by default it
> will only allow users to profile their own userspace (tasks).
> 
> Allowing unpriv users access to kernel data is a clear data leak (you
> instantly void KASLR for instance).
> 
> And since uncore PMUs are not uniquely tied to individual user context,
> unpriv users have no access.
> 

If only someone knew how much I don't like to step into such
situations :). Ok, thank you for the clarification. This affects how we
can use this API, but it does not make it unusable. Good that it is
intended for end-users as well.

> > 2. How applications can access uncore PMU metrics from non-privileged
> > applications?
> 
> Only by lowering sysctl.kernel.perf_event_paranoid.
> Since uncore is shared among many users, its events can be used to
> construct side-channel attacks. So from a security pov this is not
> something that can otherwise be done.
> 
> Imagine user A using the GPU to do crypto and user B reading the GPU
> events to infer state or similar things.
> 
> Without privilege separation we cannot allow unpriv access to system
> wide resources.
> 
> > 3. Is that a strong requirement to restrict uncore PMU metrics reporting
> > to privileged applications or this can be relaxed?
> 
> Pretty strict, people tend to get fairly upset every time we leak stuff.
> In fact Debian and Android carry a perf_event_paranoid patch that
> default disables _everything_ :-(

Can you say more on that for Debian and Android? What exactly they do?
What is the value of perf_event_paranoid there? They disable everything
even for root and CAP_SYS_ADMIN? But still they don't remove this from
kernel on compilation stage, right? So users can explicitly change
perf_event_paranoid to the desired value?
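
The per-CPU restriction quoted from `man perf_event_open` earlier in this
message can be modeled as a small predicate. This is an illustrative Python
sketch of the documented rule, not the kernel's actual implementation; the
default paranoid value of 2 used in the examples is an assumption about a
typical distribution:

```python
def can_open_percpu_event(pid, cpu, perf_event_paranoid, has_cap_sys_admin):
    """Model the rule quoted above from man perf_event_open:
    a pid == -1 && cpu >= 0 (per-CPU / uncore-style) event needs
    CAP_SYS_ADMIN or perf_event_paranoid < 1."""
    if pid == -1 and cpu >= 0:
        return has_cap_sys_admin or perf_event_paranoid < 1
    # Other pid/cpu combinations are governed by different (looser) checks.
    return True

# An unprivileged user on a perf_event_paranoid=2 system is refused:
assert not can_open_percpu_event(-1, 0, 2, False)
# Root (CAP_SYS_ADMIN) is still allowed:
assert can_open_percpu_event(-1, 0, 2, True)
```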

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [RFC 04/10] drm/i915: Expose a PMU interface for perf queries
  2017-08-29 19:16                 ` Rogozhkin, Dmitry V
@ 2017-08-29 19:21                   ` Peter Zijlstra
  2017-09-13 23:05                     ` Rogozhkin, Dmitry V
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2017-08-29 19:21 UTC (permalink / raw)
  To: Rogozhkin, Dmitry V; +Cc: Intel-gfx

On Tue, Aug 29, 2017 at 07:16:31PM +0000, Rogozhkin, Dmitry V wrote:
> > Pretty strict, people tend to get fairly upset every time we leak stuff.
> > In fact Debian and Android carry a perf_event_paranoid patch that
> > default disables _everything_ :-(
> 
> Can you say more on that for Debian and Android? What exactly they do?
> What is the value of perf_event_paranoid there? They disable everything
> even for root and CAP_SYS_ADMIN? But still they don't remove this from
> kernel on compilation stage, right? So users can explicitly change
> perf_event_paranoid to the desired value?

They introduce (and default to) perf_event_paranoid = 3. Which
disallows everything for unpriv user, root can still do things IIRC, I'd
have to dig out the patch.

This way apps have no access to the syscall, but you can enable it using
ADB by lowering the setting. So developers still have access, but
regular apps do not.
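
Taking the mainline levels documented in `man perf_event_open` together with
the out-of-tree level described above, the effect of perf_event_paranoid on
unprivileged users can be summarized in a small helper. This is an
illustrative sketch, not kernel source; the restrictions are cumulative, and
level 3 exists only in the Debian/Android patch:

```python
def strictest_restriction(paranoid):
    """Return the strictest restriction placed on *unprivileged* users at
    a given perf_event_paranoid level. Levels -1..2 are mainline; 3 is
    the out-of-tree Debian/Android default."""
    if paranoid >= 3:
        return "perf_event_open() disallowed entirely"
    if paranoid >= 2:
        return "kernel profiling disallowed"
    if paranoid >= 1:
        return "per-CPU (system-wide) events disallowed"
    if paranoid >= 0:
        return "raw tracepoint access disallowed"
    return "no restrictions"

assert strictest_restriction(3) == "perf_event_open() disallowed entirely"
assert strictest_restriction(-1) == "no restrictions"
```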



* Re: [RFC 04/10] drm/i915: Expose a PMU interface for perf queries
  2017-08-29 19:21                   ` Peter Zijlstra
@ 2017-09-13 23:05                     ` Rogozhkin, Dmitry V
  2017-09-20 20:14                       ` Rogozhkin, Dmitry V
  0 siblings, 1 reply; 32+ messages in thread
From: Rogozhkin, Dmitry V @ 2017-09-13 23:05 UTC (permalink / raw)
  To: peterz; +Cc: Intel-gfx

On Tue, 2017-08-29 at 21:21 +0200, Peter Zijlstra wrote:
> On Tue, Aug 29, 2017 at 07:16:31PM +0000, Rogozhkin, Dmitry V wrote:
> > > Pretty strict, people tend to get fairly upset every time we leak stuff.
> > > In fact Debian and Android carry a perf_event_paranoid patch that
> > > default disables _everything_ :-(
> > 
> > Can you say more on that for Debian and Android? What exactly they do?
> > What is the value of perf_event_paranoid there? They disable everything
> > even for root and CAP_SYS_ADMIN? But still they don't remove this from
> > kernel on compilation stage, right? So users can explicitly change
> > perf_event_paranoid to the desired value?
> 
> They introduce (and default to) perf_event_paranoid = 3. Which
> disallows everything for unpriv user, root can still do things IIRC, I'd
> have to dig out the patch.
> 
> This way apps have no access to the syscall, but you can enable it using
> ADB by lowering the setting. So developers still have access, but
> regular apps do not.
> 

Hi, Peter.

How would you feel about the following idea (or something close to it):
1. We introduce one more level, perf_event_paranoid=4 (or =3, I am not
sure whether the Debian/Android =3 is considered uAPI), which would mean:
"disallow kernel profiling for unpriv, but let individual kernel modules
have their own settings".
2. We will have an i915 PMU custom setting,
"/sys/module/i915/parameters/perf_event_paranoid", which will be in
effect only if the global perf_event_paranoid=4 (or =3), and it will
prevail over the global setting.

Would anything like that be acceptable upstream? This would permit
customers to configure i915 PMU support for unpriv users separately from
the rest of the PMU subsystem.
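
As an illustration only, the proposed two-level check might look like the
sketch below. Everything here is hypothetical: neither perf_event_paranoid=4
nor the i915 module parameter exists upstream, and the `< 1` threshold for
per-CPU events is taken from the man-page rule discussed earlier:

```python
def i915_event_allowed(global_paranoid, i915_paranoid, has_cap_sys_admin):
    """Sketch of the proposal: with global perf_event_paranoid == 4, the
    per-module /sys/module/i915/parameters/perf_event_paranoid value takes
    precedence for i915 events; otherwise the global setting applies."""
    if has_cap_sys_admin:
        return True
    effective = i915_paranoid if global_paranoid == 4 else global_paranoid
    return effective < 1  # per-CPU (uncore) events need paranoid < 1

# Global lockdown at 4, but the admin opted i915 in for unpriv users:
assert i915_event_allowed(4, 0, False)
# Default global policy (2) still blocks unpriv uncore access:
assert not i915_event_allowed(2, 0, False)
```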

Regards,
Dmitry.



* Re: [RFC 04/10] drm/i915: Expose a PMU interface for perf queries
  2017-09-13 23:05                     ` Rogozhkin, Dmitry V
@ 2017-09-20 20:14                       ` Rogozhkin, Dmitry V
  2017-10-02 21:28                         ` Rogozhkin, Dmitry V
  0 siblings, 1 reply; 32+ messages in thread
From: Rogozhkin, Dmitry V @ 2017-09-20 20:14 UTC (permalink / raw)
  To: Rogozhkin, Dmitry V, peterz; +Cc: Intel-gfx

Hi Peter, could you please comment on the below?

-----Original Message-----
From: Intel-gfx [mailto:intel-gfx-bounces@lists.freedesktop.org] On Behalf Of Rogozhkin, Dmitry V
Sent: Wednesday, September 13, 2017 4:06 PM
To: peterz@infradead.org
Cc: Intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [RFC 04/10] drm/i915: Expose a PMU interface for perf queries

On Tue, 2017-08-29 at 21:21 +0200, Peter Zijlstra wrote:
> On Tue, Aug 29, 2017 at 07:16:31PM +0000, Rogozhkin, Dmitry V wrote:
> > > Pretty strict, people tend to get fairly upset every time we leak stuff.
> > > In fact Debian and Android carry a perf_event_paranoid patch that 
> > > default disables _everything_ :-(
> > 
> > Can you say more on that for Debian and Android? What exactly they do?
> > What is the value of perf_event_paranoid there? They disable 
> > everything even for root and CAP_SYS_ADMIN? But still they don't 
> > remove this from kernel on compilation stage, right? So users can 
> > explicitly change perf_event_paranoid to the desired value?
> 
> They introduce (and default to) perf_event_paranoid = 3. Which 
> disallows everything for unpriv user, root can still do things IIRC, 
> I'd have to dig out the patch.
> 
> This way apps have no access to the syscall, but you can enable it 
> using ADB by lowering the setting. So developers still have access, 
> but regular apps do not.
> 

Hi, Peter.

How would you feel about the following idea (or something close to it):
1. We introduce one more level, perf_event_paranoid=4 (or =3, I am not sure whether the Debian/Android =3 is considered uAPI), which would mean:
"disallow kernel profiling for unpriv, but let individual kernel modules have their own settings".
2. We will have an i915 PMU custom setting, "/sys/module/i915/parameters/perf_event_paranoid", which will be in effect only if the global perf_event_paranoid=4 (or =3), and it will prevail over the global setting.

Would anything like that be acceptable upstream? This would permit customers to configure i915 PMU support for unpriv users separately from the rest of the PMU subsystem.

Regards,
Dmitry.



* Re: [RFC 04/10] drm/i915: Expose a PMU interface for perf queries
  2017-09-20 20:14                       ` Rogozhkin, Dmitry V
@ 2017-10-02 21:28                         ` Rogozhkin, Dmitry V
  0 siblings, 0 replies; 32+ messages in thread
From: Rogozhkin, Dmitry V @ 2017-10-02 21:28 UTC (permalink / raw)
  To: peterz; +Cc: Intel-gfx

Hi Peter, a friendly reminder. Could you please respond?

-----Original Message-----
From: Rogozhkin, Dmitry V 
Sent: Wednesday, September 20, 2017 1:15 PM
To: Rogozhkin, Dmitry V <dmitry.v.rogozhkin@intel.com>; peterz@infradead.org
Cc: Intel-gfx@lists.freedesktop.org
Subject: RE: [Intel-gfx] [RFC 04/10] drm/i915: Expose a PMU interface for perf queries

Hi Peter, could you, please, comment on below?

-----Original Message-----
From: Intel-gfx [mailto:intel-gfx-bounces@lists.freedesktop.org] On Behalf Of Rogozhkin, Dmitry V
Sent: Wednesday, September 13, 2017 4:06 PM
To: peterz@infradead.org
Cc: Intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [RFC 04/10] drm/i915: Expose a PMU interface for perf queries

On Tue, 2017-08-29 at 21:21 +0200, Peter Zijlstra wrote:
> On Tue, Aug 29, 2017 at 07:16:31PM +0000, Rogozhkin, Dmitry V wrote:
> > > Pretty strict, people tend to get fairly upset every time we leak stuff.
> > > In fact Debian and Android carry a perf_event_paranoid patch that 
> > > default disables _everything_ :-(
> > 
> > Can you say more on that for Debian and Android? What exactly they do?
> > What is the value of perf_event_paranoid there? They disable 
> > everything even for root and CAP_SYS_ADMIN? But still they don't 
> > remove this from kernel on compilation stage, right? So users can 
> > explicitly change perf_event_paranoid to the desired value?
> 
> They introduce (and default to) perf_event_paranoid = 3. Which 
> disallows everything for unpriv user, root can still do things IIRC, 
> I'd have to dig out the patch.
> 
> This way apps have no access to the syscall, but you can enable it 
> using ADB by lowering the setting. So developers still have access, 
> but regular apps do not.
> 

Hi, Peter.

How would you feel about the following idea (or something close to it):
1. We introduce one more level, perf_event_paranoid=4 (or =3, I am not sure whether the Debian/Android =3 is considered uAPI), which would mean:
"disallow kernel profiling for unpriv, but let individual kernel modules have their own settings".
2. We will have an i915 PMU custom setting, "/sys/module/i915/parameters/perf_event_paranoid", which will be in effect only if the global perf_event_paranoid=4 (or =3), and it will prevail over the global setting.

Would anything like that be acceptable upstream? This would permit customers to configure i915 PMU support for unpriv users separately from the rest of the PMU subsystem.

Regards,
Dmitry.



end of thread, other threads:[~2017-10-02 21:28 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-02 12:32 [RFC v2 00/10] i915 PMU and engine busy stats Tvrtko Ursulin
2017-08-02 12:32 ` [RFC 01/10] drm/i915: Convert intel_rc6_residency_us to ns Tvrtko Ursulin
2017-08-02 12:32 ` [RFC 02/10] drm/i915: Add intel_energy_uJ Tvrtko Ursulin
2017-08-02 12:32 ` [RFC 03/10] drm/i915: Extract intel_get_cagf Tvrtko Ursulin
2017-08-02 12:32 ` [RFC 04/10] drm/i915: Expose a PMU interface for perf queries Tvrtko Ursulin
2017-08-02 13:00   ` Tvrtko Ursulin
2017-08-12  2:15     ` Rogozhkin, Dmitry V
2017-08-22 18:17       ` Peter Zijlstra
2017-08-23 17:51         ` Rogozhkin, Dmitry V
2017-08-23 18:01           ` Peter Zijlstra
2017-08-23 18:40             ` Rogozhkin, Dmitry V
2017-08-23 18:04           ` Peter Zijlstra
2017-08-23 18:38             ` Rogozhkin, Dmitry V
2017-08-23 19:28               ` Peter Zijlstra
2017-08-23 18:05           ` Peter Zijlstra
2017-08-23 18:22           ` Peter Zijlstra
2017-08-23 19:00             ` Rogozhkin, Dmitry V
2017-08-23 19:33               ` Peter Zijlstra
2017-08-28 22:57             ` Rogozhkin, Dmitry V
2017-08-29  9:30               ` Peter Zijlstra
2017-08-29 19:16                 ` Rogozhkin, Dmitry V
2017-08-29 19:21                   ` Peter Zijlstra
2017-09-13 23:05                     ` Rogozhkin, Dmitry V
2017-09-20 20:14                       ` Rogozhkin, Dmitry V
2017-10-02 21:28                         ` Rogozhkin, Dmitry V
2017-08-02 12:32 ` [RFC 05/10] drm/i915/pmu: Suspend sampling when GPU is idle Tvrtko Ursulin
2017-08-02 12:32 ` [RFC 06/10] drm/i915: Wrap context schedule notification Tvrtko Ursulin
2017-08-02 12:32 ` [RFC 07/10] drm/i915: Engine busy time tracking Tvrtko Ursulin
2017-08-02 12:32 ` [RFC 08/10] drm/i915: Export engine busy stats in debugfs Tvrtko Ursulin
2017-08-02 12:32 ` [RFC 09/10] drm/i915/pmu: Wire up engine busy stats to PMU Tvrtko Ursulin
2017-08-02 12:39 ` [RFC 10/10] drm/i915: Gate engine stats collection with a static key Tvrtko Ursulin
2017-08-02 12:56 ` ✗ Fi.CI.BAT: warning for i915 PMU and engine busy stats (rev2) Patchwork
