intel-gfx.lists.freedesktop.org archive mirror
* [Intel-gfx] [PATCH 0/7] Add MTL PMU support for multi-gt
@ 2023-03-30  0:40 Umesh Nerlige Ramappa
  2023-03-30  0:40 ` [Intel-gfx] [PATCH 1/9] drm/i915/pmu: Support PMU for all engines Umesh Nerlige Ramappa
                   ` (12 more replies)
  0 siblings, 13 replies; 28+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-03-30  0:40 UTC
  To: intel-gfx

On MTL, the frequency and RC6 counters are specific to a GT. Export these
counters to user space via GT-specific events.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Test-with: 20230330003656.1294873-1-umesh.nerlige.ramappa@intel.com

Tvrtko Ursulin (6):
  drm/i915/pmu: Support PMU for all engines
  drm/i915/pmu: Skip sampling engines with no enabled counters
  drm/i915/pmu: Transform PMU parking code to be GT based
  drm/i915/pmu: Add reference counting to the sampling timer
  drm/i915/pmu: Prepare for multi-tile non-engine counters
  drm/i915/pmu: Export counters from all tiles

Umesh Nerlige Ramappa (3):
  drm/i915/pmu: Use a helper to convert to MHz
  drm/i915/pmu: Split reading engine and other events into helpers
  drm/i915/pmu: Enable legacy PMU events for MTL

 drivers/gpu/drm/i915/gt/intel_gt_pm.c |   4 +-
 drivers/gpu/drm/i915/i915_pmu.c       | 464 ++++++++++++++++++++------
 drivers/gpu/drm/i915/i915_pmu.h       |  22 +-
 include/uapi/drm/i915_drm.h           |  22 +-
 4 files changed, 394 insertions(+), 118 deletions(-)

-- 
2.36.1


* [Intel-gfx] [PATCH 1/9] drm/i915/pmu: Support PMU for all engines
  2023-03-30  0:40 [Intel-gfx] [PATCH 0/7] Add MTL PMU support for multi-gt Umesh Nerlige Ramappa
@ 2023-03-30  0:40 ` Umesh Nerlige Ramappa
  2023-03-30 12:27   ` Tvrtko Ursulin
  2023-03-30  0:40 ` [Intel-gfx] [PATCH 2/9] drm/i915/pmu: Skip sampling engines with no enabled counters Umesh Nerlige Ramappa
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 28+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-03-30  0:40 UTC
  To: intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Given how the metrics are already exported, we also need to run sampling
over the engines of all GTs.

The problem of per-GT frequencies is left for later.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 7ece883a7d95..e274dba58629 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -10,6 +10,7 @@
 #include "gt/intel_engine_pm.h"
 #include "gt/intel_engine_regs.h"
 #include "gt/intel_engine_user.h"
+#include "gt/intel_gt.h"
 #include "gt/intel_gt_pm.h"
 #include "gt/intel_gt_regs.h"
 #include "gt/intel_rc6.h"
@@ -414,8 +415,9 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
 	struct drm_i915_private *i915 =
 		container_of(hrtimer, struct drm_i915_private, pmu.timer);
 	struct i915_pmu *pmu = &i915->pmu;
-	struct intel_gt *gt = to_gt(i915);
 	unsigned int period_ns;
+	struct intel_gt *gt;
+	unsigned int i;
 	ktime_t now;
 
 	if (!READ_ONCE(pmu->timer_enabled))
@@ -431,8 +433,14 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
 	 * grabbing the forcewake. However the potential error from timer call-
 	 * back delay greatly dominates this so we keep it simple.
 	 */
-	engines_sample(gt, period_ns);
-	frequency_sample(gt, period_ns);
+
+	for_each_gt(gt, i915, i) {
+		engines_sample(gt, period_ns);
+
+		/* Sample only gt0 until gt support is added for frequency */
+		if (i == 0)
+			frequency_sample(gt, period_ns);
+	}
 
 	hrtimer_forward(hrtimer, now, ns_to_ktime(PERIOD));
 
-- 
2.36.1


* [Intel-gfx] [PATCH 2/9] drm/i915/pmu: Skip sampling engines with no enabled counters
  2023-03-30  0:40 [Intel-gfx] [PATCH 0/7] Add MTL PMU support for multi-gt Umesh Nerlige Ramappa
  2023-03-30  0:40 ` [Intel-gfx] [PATCH 1/9] drm/i915/pmu: Support PMU for all engines Umesh Nerlige Ramappa
@ 2023-03-30  0:40 ` Umesh Nerlige Ramappa
  2023-03-30  0:40 ` [Intel-gfx] [PATCH 3/9] drm/i915/pmu: Transform PMU parking code to be GT based Umesh Nerlige Ramappa
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-03-30  0:40 UTC
  To: intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

As we have more and more engines, do not waste time sampling the ones
that no one is monitoring.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index e274dba58629..6abd5042dea3 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -339,6 +339,9 @@ engines_sample(struct intel_gt *gt, unsigned int period_ns)
 		return;
 
 	for_each_engine(engine, gt, id) {
+		if (!engine->pmu.enable)
+			continue;
+
 		if (!intel_engine_pm_get_if_awake(engine))
 			continue;
 
-- 
2.36.1


* [Intel-gfx] [PATCH 3/9] drm/i915/pmu: Transform PMU parking code to be GT based
  2023-03-30  0:40 [Intel-gfx] [PATCH 0/7] Add MTL PMU support for multi-gt Umesh Nerlige Ramappa
  2023-03-30  0:40 ` [Intel-gfx] [PATCH 1/9] drm/i915/pmu: Support PMU for all engines Umesh Nerlige Ramappa
  2023-03-30  0:40 ` [Intel-gfx] [PATCH 2/9] drm/i915/pmu: Skip sampling engines with no enabled counters Umesh Nerlige Ramappa
@ 2023-03-30  0:40 ` Umesh Nerlige Ramappa
  2023-03-30  0:40 ` [Intel-gfx] [PATCH 4/9] drm/i915/pmu: Add reference counting to the sampling timer Umesh Nerlige Ramappa
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-03-30  0:40 UTC
  To: intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Trivial prep work for full multi-tile enablement later.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt_pm.c |  4 ++--
 drivers/gpu/drm/i915/i915_pmu.c       | 16 ++++++++--------
 drivers/gpu/drm/i915/i915_pmu.h       |  9 +++++----
 3 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index e02cb90723ae..c2e69bafd02b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -87,7 +87,7 @@ static int __gt_unpark(struct intel_wakeref *wf)
 
 	intel_rc6_unpark(&gt->rc6);
 	intel_rps_unpark(&gt->rps);
-	i915_pmu_gt_unparked(i915);
+	i915_pmu_gt_unparked(gt);
 	intel_guc_busyness_unpark(gt);
 
 	intel_gt_unpark_requests(gt);
@@ -109,7 +109,7 @@ static int __gt_park(struct intel_wakeref *wf)
 
 	intel_guc_busyness_park(gt);
 	i915_vma_parked(gt);
-	i915_pmu_gt_parked(i915);
+	i915_pmu_gt_parked(gt);
 	intel_rps_park(&gt->rps);
 	intel_rc6_park(&gt->rc6);
 
diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 6abd5042dea3..6f7f9b40860d 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -217,11 +217,11 @@ static void init_rc6(struct i915_pmu *pmu)
 	}
 }
 
-static void park_rc6(struct drm_i915_private *i915)
+static void park_rc6(struct intel_gt *gt)
 {
-	struct i915_pmu *pmu = &i915->pmu;
+	struct i915_pmu *pmu = &gt->i915->pmu;
 
-	pmu->sample[__I915_SAMPLE_RC6].cur = __get_rc6(to_gt(i915));
+	pmu->sample[__I915_SAMPLE_RC6].cur = __get_rc6(gt);
 	pmu->sleep_last = ktime_get_raw();
 }
 
@@ -236,16 +236,16 @@ static void __i915_pmu_maybe_start_timer(struct i915_pmu *pmu)
 	}
 }
 
-void i915_pmu_gt_parked(struct drm_i915_private *i915)
+void i915_pmu_gt_parked(struct intel_gt *gt)
 {
-	struct i915_pmu *pmu = &i915->pmu;
+	struct i915_pmu *pmu = &gt->i915->pmu;
 
 	if (!pmu->base.event_init)
 		return;
 
 	spin_lock_irq(&pmu->lock);
 
-	park_rc6(i915);
+	park_rc6(gt);
 
 	/*
 	 * Signal sampling timer to stop if only engine events are enabled and
@@ -256,9 +256,9 @@ void i915_pmu_gt_parked(struct drm_i915_private *i915)
 	spin_unlock_irq(&pmu->lock);
 }
 
-void i915_pmu_gt_unparked(struct drm_i915_private *i915)
+void i915_pmu_gt_unparked(struct intel_gt *gt)
 {
-	struct i915_pmu *pmu = &i915->pmu;
+	struct i915_pmu *pmu = &gt->i915->pmu;
 
 	if (!pmu->base.event_init)
 		return;
diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h
index 449057648f39..d98fbc7a2f45 100644
--- a/drivers/gpu/drm/i915/i915_pmu.h
+++ b/drivers/gpu/drm/i915/i915_pmu.h
@@ -13,6 +13,7 @@
 #include <uapi/drm/i915_drm.h>
 
 struct drm_i915_private;
+struct intel_gt;
 
 /**
  * Non-engine events that we need to track enabled-disabled transition and
@@ -151,15 +152,15 @@ int i915_pmu_init(void);
 void i915_pmu_exit(void);
 void i915_pmu_register(struct drm_i915_private *i915);
 void i915_pmu_unregister(struct drm_i915_private *i915);
-void i915_pmu_gt_parked(struct drm_i915_private *i915);
-void i915_pmu_gt_unparked(struct drm_i915_private *i915);
+void i915_pmu_gt_parked(struct intel_gt *gt);
+void i915_pmu_gt_unparked(struct intel_gt *gt);
 #else
 static inline int i915_pmu_init(void) { return 0; }
 static inline void i915_pmu_exit(void) {}
 static inline void i915_pmu_register(struct drm_i915_private *i915) {}
 static inline void i915_pmu_unregister(struct drm_i915_private *i915) {}
-static inline void i915_pmu_gt_parked(struct drm_i915_private *i915) {}
-static inline void i915_pmu_gt_unparked(struct drm_i915_private *i915) {}
+static inline void i915_pmu_gt_parked(struct intel_gt *gt) {}
+static inline void i915_pmu_gt_unparked(struct intel_gt *gt) {}
 #endif
 
 #endif
-- 
2.36.1


* [Intel-gfx] [PATCH 4/9] drm/i915/pmu: Add reference counting to the sampling timer
  2023-03-30  0:40 [Intel-gfx] [PATCH 0/7] Add MTL PMU support for multi-gt Umesh Nerlige Ramappa
                   ` (2 preceding siblings ...)
  2023-03-30  0:40 ` [Intel-gfx] [PATCH 3/9] drm/i915/pmu: Transform PMU parking code to be GT based Umesh Nerlige Ramappa
@ 2023-03-30  0:40 ` Umesh Nerlige Ramappa
  2023-03-30  0:40 ` [Intel-gfx] [PATCH 5/9] drm/i915/pmu: Prepare for multi-tile non-engine counters Umesh Nerlige Ramappa
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-03-30  0:40 UTC
  To: intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

We do not want one timer per tile, since multiple wake-up sources would
waste CPU cycles and energy on the relatively unimportant task of PMU
sampling, so we keep a single timer. But we also do not want the first GT
that goes idle to turn off the timer.

Add some reference counting, via a mask of unparked GTs, to solve this.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
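The mask-based reference counting described above can be sketched in
user-space C as follows. This is only an illustrative model, not the driver
code: `struct pmu_model`, `MAX_GTS`, `gt_parked()` and `gt_unparked()` are
hypothetical names, locking is omitted, and the timer decision is reduced to
a boolean, which roughly corresponds to the case where only engine events
are enabled.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_GTS 4u /* hypothetical limit for this sketch */

struct pmu_model {
	unsigned int unparked; /* bit N is set while GT N is awake */
	bool timer_enabled;    /* one timer shared by all GTs */
};

/* A GT woke up: start the timer only if no other GT was already awake. */
static void gt_unparked(struct pmu_model *pmu, unsigned int gt_id)
{
	assert(gt_id < MAX_GTS);

	if (pmu->unparked == 0)
		pmu->timer_enabled = true;

	pmu->unparked |= 1u << gt_id;
}

/* A GT went idle: stop the timer only once every GT is parked. */
static void gt_parked(struct pmu_model *pmu, unsigned int gt_id)
{
	assert(gt_id < MAX_GTS);

	pmu->unparked &= ~(1u << gt_id);
	if (pmu->unparked == 0)
		pmu->timer_enabled = false;
}
```

In this model, parking gt0 while gt1 is still awake leaves the timer
running, and only the last park clears it.
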
 drivers/gpu/drm/i915/i915_pmu.c | 12 ++++++++++--
 drivers/gpu/drm/i915/i915_pmu.h |  4 ++++
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 6f7f9b40860d..c00b94c7f509 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -251,7 +251,9 @@ void i915_pmu_gt_parked(struct intel_gt *gt)
 	 * Signal sampling timer to stop if only engine events are enabled and
 	 * GPU went idle.
 	 */
-	pmu->timer_enabled = pmu_needs_timer(pmu, false);
+	pmu->unparked &= ~BIT(gt->info.id);
+	if (pmu->unparked == 0)
+		pmu->timer_enabled = pmu_needs_timer(pmu, false);
 
 	spin_unlock_irq(&pmu->lock);
 }
@@ -268,7 +270,10 @@ void i915_pmu_gt_unparked(struct intel_gt *gt)
 	/*
 	 * Re-enable sampling timer when GPU goes active.
 	 */
-	__i915_pmu_maybe_start_timer(pmu);
+	if (pmu->unparked == 0)
+		__i915_pmu_maybe_start_timer(pmu);
+
+	pmu->unparked |= BIT(gt->info.id);
 
 	spin_unlock_irq(&pmu->lock);
 }
@@ -438,6 +443,9 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
 	 */
 
 	for_each_gt(gt, i915, i) {
+		if (!(pmu->unparked & BIT(i)))
+			continue;
+
 		engines_sample(gt, period_ns);
 
 		/* Sample only gt0 until gt support is added for frequency */
diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h
index d98fbc7a2f45..1b04c79907e8 100644
--- a/drivers/gpu/drm/i915/i915_pmu.h
+++ b/drivers/gpu/drm/i915/i915_pmu.h
@@ -76,6 +76,10 @@ struct i915_pmu {
 	 * @lock: Lock protecting enable mask and ref count handling.
 	 */
 	spinlock_t lock;
+	/**
+	 * @unparked: GT unparked mask.
+	 */
+	unsigned int unparked;
 	/**
 	 * @timer: Timer for internal i915 PMU sampling.
 	 */
-- 
2.36.1


* [Intel-gfx] [PATCH 5/9] drm/i915/pmu: Prepare for multi-tile non-engine counters
  2023-03-30  0:40 [Intel-gfx] [PATCH 0/7] Add MTL PMU support for multi-gt Umesh Nerlige Ramappa
                   ` (3 preceding siblings ...)
  2023-03-30  0:40 ` [Intel-gfx] [PATCH 4/9] drm/i915/pmu: Add reference counting to the sampling timer Umesh Nerlige Ramappa
@ 2023-03-30  0:40 ` Umesh Nerlige Ramappa
  2023-03-30 12:39   ` Tvrtko Ursulin
  2023-03-30  0:41 ` [Intel-gfx] [PATCH 6/9] drm/i915/pmu: Export counters from all tiles Umesh Nerlige Ramappa
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 28+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-03-30  0:40 UTC
  To: intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Reserve some bits in the counter config namespace to carry the tile id,
and prepare the code to handle this.

No per-tile counters have been added yet.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
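The config layout this patch introduces can be illustrated with a small
user-space sketch. It mirrors `___I915_PMU_OTHER()`, `config_gt_id()`,
`config_counter()` and `__sample_idx()` from the diff, with assumptions:
`GT_SHIFT` is the interim value 60 used here, `OTHER_BASE` stands in for
`__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1` and its numeric value is an
assumption to be checked against the uapi header, and `NUM_SAMPLERS` and
`MAX_GTS` are placeholder sizes.

```c
#include <assert.h>
#include <stdint.h>

#define GT_SHIFT     60          /* __I915_PMU_GT_SHIFT in this patch */
#define OTHER_BASE   0x100000ULL /* assumed value of the non-engine base */
#define NUM_SAMPLERS 4u          /* assumed __I915_NUM_PMU_SAMPLERS */
#define MAX_GTS      4u          /* I915_PMU_MAX_GTS in this patch */

/* Build a non-engine config with the GT id placed in the top bits. */
static uint64_t pmu_other(unsigned int gt, unsigned int x)
{
	assert(gt < (1u << (64 - GT_SHIFT)));
	return (OTHER_BASE + x) | ((uint64_t)gt << GT_SHIFT);
}

/* Extract the GT id from the top bits of a config. */
static unsigned int config_gt_id(uint64_t config)
{
	return (unsigned int)(config >> GT_SHIFT);
}

/* Clear the GT bits so only the counter part of the config remains. */
static uint64_t config_counter(uint64_t config)
{
	return config & ~(~0ULL << GT_SHIFT);
}

/* Flatten (gt, sample) into an index into one per-device sample array. */
static unsigned int sample_idx(unsigned int gt_id, unsigned int sample)
{
	unsigned int idx = gt_id * NUM_SAMPLERS + sample;

	assert(idx < MAX_GTS * NUM_SAMPLERS);
	return idx;
}
```

With this layout a gt0 config is numerically identical to the existing
counter value, which is what lets the legacy defines keep their values.
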
 drivers/gpu/drm/i915/i915_pmu.c | 153 +++++++++++++++++++++++---------
 drivers/gpu/drm/i915/i915_pmu.h |   9 +-
 include/uapi/drm/i915_drm.h     |  18 +++-
 3 files changed, 132 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index c00b94c7f509..5d1de98d86b4 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -56,11 +56,21 @@ static bool is_engine_config(u64 config)
 	return config < __I915_PMU_OTHER(0);
 }
 
+static unsigned int config_gt_id(const u64 config)
+{
+	return config >> __I915_PMU_GT_SHIFT;
+}
+
+static u64 config_counter(const u64 config)
+{
+	return config & ~(~0ULL << __I915_PMU_GT_SHIFT);
+}
+
 static unsigned int other_bit(const u64 config)
 {
 	unsigned int val;
 
-	switch (config) {
+	switch (config_counter(config)) {
 	case I915_PMU_ACTUAL_FREQUENCY:
 		val =  __I915_PMU_ACTUAL_FREQUENCY_ENABLED;
 		break;
@@ -78,15 +88,20 @@ static unsigned int other_bit(const u64 config)
 		return -1;
 	}
 
-	return I915_ENGINE_SAMPLE_COUNT + val;
+	return I915_ENGINE_SAMPLE_COUNT +
+	       config_gt_id(config) * __I915_PMU_TRACKED_EVENT_COUNT +
+	       val;
 }
 
 static unsigned int config_bit(const u64 config)
 {
-	if (is_engine_config(config))
+	if (is_engine_config(config)) {
+		GEM_BUG_ON(config_gt_id(config));
+
 		return engine_config_sample(config);
-	else
+	} else {
 		return other_bit(config);
+	}
 }
 
 static u64 config_mask(u64 config)
@@ -104,6 +119,18 @@ static unsigned int event_bit(struct perf_event *event)
 	return config_bit(event->attr.config);
 }
 
+static u64 frequency_enabled_mask(void)
+{
+	unsigned int i;
+	u64 mask = 0;
+
+	for (i = 0; i < I915_PMU_MAX_GTS; i++)
+		mask |= config_mask(__I915_PMU_ACTUAL_FREQUENCY(i)) |
+			config_mask(__I915_PMU_REQUESTED_FREQUENCY(i));
+
+	return mask;
+}
+
 static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active)
 {
 	struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu);
@@ -120,9 +147,7 @@ static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active)
 	 * Mask out all the ones which do not need the timer, or in
 	 * other words keep all the ones that could need the timer.
 	 */
-	enable &= config_mask(I915_PMU_ACTUAL_FREQUENCY) |
-		  config_mask(I915_PMU_REQUESTED_FREQUENCY) |
-		  ENGINE_SAMPLE_MASK;
+	enable &= frequency_enabled_mask() | ENGINE_SAMPLE_MASK;
 
 	/*
 	 * When the GPU is idle per-engine counters do not need to be
@@ -164,9 +189,39 @@ static inline s64 ktime_since_raw(const ktime_t kt)
 	return ktime_to_ns(ktime_sub(ktime_get_raw(), kt));
 }
 
+static unsigned int
+__sample_idx(struct i915_pmu *pmu, unsigned int gt_id, int sample)
+{
+	unsigned int idx = gt_id * __I915_NUM_PMU_SAMPLERS + sample;
+
+	GEM_BUG_ON(idx >= ARRAY_SIZE(pmu->sample));
+
+	return idx;
+}
+
+static u64 read_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample)
+{
+	return pmu->sample[__sample_idx(pmu, gt_id, sample)].cur;
+}
+
+static void
+store_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample, u64 val)
+{
+	pmu->sample[__sample_idx(pmu, gt_id, sample)].cur = val;
+}
+
+static void
+add_sample_mult(struct i915_pmu *pmu, unsigned int gt_id, int sample, u32 val,
+		u32 mul)
+{
+	pmu->sample[__sample_idx(pmu, gt_id, sample)].cur +=
+							mul_u32_u32(val, mul);
+}
+
 static u64 get_rc6(struct intel_gt *gt)
 {
 	struct drm_i915_private *i915 = gt->i915;
+	const unsigned int gt_id = gt->info.id;
 	struct i915_pmu *pmu = &i915->pmu;
 	unsigned long flags;
 	bool awake = false;
@@ -181,7 +236,7 @@ static u64 get_rc6(struct intel_gt *gt)
 	spin_lock_irqsave(&pmu->lock, flags);
 
 	if (awake) {
-		pmu->sample[__I915_SAMPLE_RC6].cur = val;
+		store_sample(pmu, gt_id, __I915_SAMPLE_RC6, val);
 	} else {
 		/*
 		 * We think we are runtime suspended.
@@ -190,14 +245,14 @@ static u64 get_rc6(struct intel_gt *gt)
 		 * on top of the last known real value, as the approximated RC6
 		 * counter value.
 		 */
-		val = ktime_since_raw(pmu->sleep_last);
-		val += pmu->sample[__I915_SAMPLE_RC6].cur;
+		val = ktime_since_raw(pmu->sleep_last[gt_id]);
+		val += read_sample(pmu, gt_id, __I915_SAMPLE_RC6);
 	}
 
-	if (val < pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur)
-		val = pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur;
+	if (val < read_sample(pmu, gt_id, __I915_SAMPLE_RC6_LAST_REPORTED))
+		val = read_sample(pmu, gt_id, __I915_SAMPLE_RC6_LAST_REPORTED);
 	else
-		pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur = val;
+		store_sample(pmu, gt_id, __I915_SAMPLE_RC6_LAST_REPORTED, val);
 
 	spin_unlock_irqrestore(&pmu->lock, flags);
 
@@ -207,13 +262,20 @@ static u64 get_rc6(struct intel_gt *gt)
 static void init_rc6(struct i915_pmu *pmu)
 {
 	struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu);
-	intel_wakeref_t wakeref;
+	struct intel_gt *gt;
+	unsigned int i;
+
+	for_each_gt(gt, i915, i) {
+		intel_wakeref_t wakeref;
 
-	with_intel_runtime_pm(to_gt(i915)->uncore->rpm, wakeref) {
-		pmu->sample[__I915_SAMPLE_RC6].cur = __get_rc6(to_gt(i915));
-		pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur =
-					pmu->sample[__I915_SAMPLE_RC6].cur;
-		pmu->sleep_last = ktime_get_raw();
+		with_intel_runtime_pm(gt->uncore->rpm, wakeref) {
+			u64 val = __get_rc6(gt);
+
+			store_sample(pmu, i, __I915_SAMPLE_RC6, val);
+			store_sample(pmu, i, __I915_SAMPLE_RC6_LAST_REPORTED,
+				     val);
+			pmu->sleep_last[i] = ktime_get_raw();
+		}
 	}
 }
 
@@ -221,8 +283,8 @@ static void park_rc6(struct intel_gt *gt)
 {
 	struct i915_pmu *pmu = &gt->i915->pmu;
 
-	pmu->sample[__I915_SAMPLE_RC6].cur = __get_rc6(gt);
-	pmu->sleep_last = ktime_get_raw();
+	store_sample(pmu, gt->info.id, __I915_SAMPLE_RC6, __get_rc6(gt));
+	pmu->sleep_last[gt->info.id] = ktime_get_raw();
 }
 
 static void __i915_pmu_maybe_start_timer(struct i915_pmu *pmu)
@@ -362,34 +424,30 @@ engines_sample(struct intel_gt *gt, unsigned int period_ns)
 	}
 }
 
-static void
-add_sample_mult(struct i915_pmu_sample *sample, u32 val, u32 mul)
-{
-	sample->cur += mul_u32_u32(val, mul);
-}
-
-static bool frequency_sampling_enabled(struct i915_pmu *pmu)
+static bool
+frequency_sampling_enabled(struct i915_pmu *pmu, unsigned int gt)
 {
 	return pmu->enable &
-	       (config_mask(I915_PMU_ACTUAL_FREQUENCY) |
-		config_mask(I915_PMU_REQUESTED_FREQUENCY));
+	       (config_mask(__I915_PMU_ACTUAL_FREQUENCY(gt)) |
+		config_mask(__I915_PMU_REQUESTED_FREQUENCY(gt)));
 }
 
 static void
 frequency_sample(struct intel_gt *gt, unsigned int period_ns)
 {
 	struct drm_i915_private *i915 = gt->i915;
+	const unsigned int gt_id = gt->info.id;
 	struct i915_pmu *pmu = &i915->pmu;
 	struct intel_rps *rps = &gt->rps;
 
-	if (!frequency_sampling_enabled(pmu))
+	if (!frequency_sampling_enabled(pmu, gt_id))
 		return;
 
 	/* Report 0/0 (actual/requested) frequency while parked. */
 	if (!intel_gt_pm_get_if_awake(gt))
 		return;
 
-	if (pmu->enable & config_mask(I915_PMU_ACTUAL_FREQUENCY)) {
+	if (pmu->enable & config_mask(__I915_PMU_ACTUAL_FREQUENCY(gt_id))) {
 		u32 val;
 
 		/*
@@ -405,12 +463,12 @@ frequency_sample(struct intel_gt *gt, unsigned int period_ns)
 		if (!val)
 			val = intel_gpu_freq(rps, rps->cur_freq);
 
-		add_sample_mult(&pmu->sample[__I915_SAMPLE_FREQ_ACT],
+		add_sample_mult(pmu, gt_id, __I915_SAMPLE_FREQ_ACT,
 				val, period_ns / 1000);
 	}
 
-	if (pmu->enable & config_mask(I915_PMU_REQUESTED_FREQUENCY)) {
-		add_sample_mult(&pmu->sample[__I915_SAMPLE_FREQ_REQ],
+	if (pmu->enable & config_mask(__I915_PMU_REQUESTED_FREQUENCY(gt_id))) {
+		add_sample_mult(pmu, gt_id, __I915_SAMPLE_FREQ_REQ,
 				intel_rps_get_requested_frequency(rps),
 				period_ns / 1000);
 	}
@@ -447,10 +505,7 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
 			continue;
 
 		engines_sample(gt, period_ns);
-
-		/* Sample only gt0 until gt support is added for frequency */
-		if (i == 0)
-			frequency_sample(gt, period_ns);
+		frequency_sample(gt, period_ns);
 	}
 
 	hrtimer_forward(hrtimer, now, ns_to_ktime(PERIOD));
@@ -492,7 +547,12 @@ config_status(struct drm_i915_private *i915, u64 config)
 {
 	struct intel_gt *gt = to_gt(i915);
 
-	switch (config) {
+	unsigned int gt_id = config_gt_id(config);
+
+	if (gt_id)
+		return -ENOENT;
+
+	switch (config_counter(config)) {
 	case I915_PMU_ACTUAL_FREQUENCY:
 		if (IS_VALLEYVIEW(i915) || IS_CHERRYVIEW(i915))
 			/* Requires a mutex for sampling! */
@@ -600,22 +660,27 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
 			val = engine->pmu.sample[sample].cur;
 		}
 	} else {
-		switch (event->attr.config) {
+		const unsigned int gt_id = config_gt_id(event->attr.config);
+		const u64 config = config_counter(event->attr.config);
+
+		switch (config) {
 		case I915_PMU_ACTUAL_FREQUENCY:
 			val =
-			   div_u64(pmu->sample[__I915_SAMPLE_FREQ_ACT].cur,
+			   div_u64(read_sample(pmu, gt_id,
+					       __I915_SAMPLE_FREQ_ACT),
 				   USEC_PER_SEC /* to MHz */);
 			break;
 		case I915_PMU_REQUESTED_FREQUENCY:
 			val =
-			   div_u64(pmu->sample[__I915_SAMPLE_FREQ_REQ].cur,
+			   div_u64(read_sample(pmu, gt_id,
+					       __I915_SAMPLE_FREQ_REQ),
 				   USEC_PER_SEC /* to MHz */);
 			break;
 		case I915_PMU_INTERRUPTS:
 			val = READ_ONCE(pmu->irq_count);
 			break;
 		case I915_PMU_RC6_RESIDENCY:
-			val = get_rc6(to_gt(i915));
+			val = get_rc6(i915->gt[gt_id]);
 			break;
 		case I915_PMU_SOFTWARE_GT_AWAKE_TIME:
 			val = ktime_to_ns(intel_gt_get_awake_time(to_gt(i915)));
diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h
index 1b04c79907e8..a708e44a227e 100644
--- a/drivers/gpu/drm/i915/i915_pmu.h
+++ b/drivers/gpu/drm/i915/i915_pmu.h
@@ -38,13 +38,16 @@ enum {
 	__I915_NUM_PMU_SAMPLERS
 };
 
+#define I915_PMU_MAX_GTS (4) /* FIXME */
+
 /**
  * How many different events we track in the global PMU mask.
  *
  * It is also used to know to needed number of event reference counters.
  */
 #define I915_PMU_MASK_BITS \
-	(I915_ENGINE_SAMPLE_COUNT + __I915_PMU_TRACKED_EVENT_COUNT)
+	(I915_ENGINE_SAMPLE_COUNT + \
+	 I915_PMU_MAX_GTS * __I915_PMU_TRACKED_EVENT_COUNT)
 
 #define I915_ENGINE_SAMPLE_COUNT (I915_SAMPLE_SEMA + 1)
 
@@ -124,11 +127,11 @@ struct i915_pmu {
 	 * Only global counters are held here, while the per-engine ones are in
 	 * struct intel_engine_cs.
 	 */
-	struct i915_pmu_sample sample[__I915_NUM_PMU_SAMPLERS];
+	struct i915_pmu_sample sample[I915_PMU_MAX_GTS * __I915_NUM_PMU_SAMPLERS];
 	/**
 	 * @sleep_last: Last time GT parked for RC6 estimation.
 	 */
-	ktime_t sleep_last;
+	ktime_t sleep_last[I915_PMU_MAX_GTS];
 	/**
 	 * @irq_count: Number of interrupts
 	 *
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index dba7c5a5b25e..bbab7f3dbeb4 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -280,7 +280,17 @@ enum drm_i915_pmu_engine_sample {
 #define I915_PMU_ENGINE_SEMA(class, instance) \
 	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_SEMA)
 
-#define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
+/*
+ * Top 8 bits of every non-engine counter are GT id.
+ * FIXME: __I915_PMU_GT_SHIFT will be changed to 56
+ */
+#define __I915_PMU_GT_SHIFT (60)
+
+#define ___I915_PMU_OTHER(gt, x) \
+	(((__u64)__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x)) | \
+	((__u64)(gt) << __I915_PMU_GT_SHIFT))
+
+#define __I915_PMU_OTHER(x) ___I915_PMU_OTHER(0, x)
 
 #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
 #define I915_PMU_REQUESTED_FREQUENCY	__I915_PMU_OTHER(1)
@@ -290,6 +300,12 @@ enum drm_i915_pmu_engine_sample {
 
 #define I915_PMU_LAST /* Deprecated - do not use */ I915_PMU_RC6_RESIDENCY
 
+#define __I915_PMU_ACTUAL_FREQUENCY(gt)		___I915_PMU_OTHER(gt, 0)
+#define __I915_PMU_REQUESTED_FREQUENCY(gt)	___I915_PMU_OTHER(gt, 1)
+#define __I915_PMU_INTERRUPTS(gt)		___I915_PMU_OTHER(gt, 2)
+#define __I915_PMU_RC6_RESIDENCY(gt)		___I915_PMU_OTHER(gt, 3)
+#define __I915_PMU_SOFTWARE_GT_AWAKE_TIME(gt)	___I915_PMU_OTHER(gt, 4)
+
 /* Each region is a minimum of 16k, and there are at most 255 of them.
  */
 #define I915_NR_TEX_REGIONS 255	/* table size 2k - maximum due to use
-- 
2.36.1


* [Intel-gfx] [PATCH 6/9] drm/i915/pmu: Export counters from all tiles
  2023-03-30  0:40 [Intel-gfx] [PATCH 0/7] Add MTL PMU support for multi-gt Umesh Nerlige Ramappa
                   ` (4 preceding siblings ...)
  2023-03-30  0:40 ` [Intel-gfx] [PATCH 5/9] drm/i915/pmu: Prepare for multi-tile non-engine counters Umesh Nerlige Ramappa
@ 2023-03-30  0:41 ` Umesh Nerlige Ramappa
  2023-03-30 13:01   ` Tvrtko Ursulin
  2023-03-30  0:41 ` [Intel-gfx] [PATCH 7/9] drm/i915/pmu: Use a helper to convert to MHz Umesh Nerlige Ramappa
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 28+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-03-30  0:41 UTC
  To: intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Start exporting frequency and RC6 counters from all tiles.

Existing counters keep their names and config values, and new ones use the
namespace added in the previous patch, with "-gtN" appended to their
names.

The interrupts counter is the odd one out. Because it is a global device
counter (not only GT), we choose not to add per-tile versions for now.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
---
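The naming scheme can be sketched in user-space C as below. The helper
names `gt_event_name()` and `gt_event_unit_name()` are hypothetical, and
`snprintf()` stands in for the `kasprintf()` calls in the diff. Note that,
as the diff reads, the "-gtN" suffix is applied to the events of every GT,
gt0 included.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format "<name>-gt<N>" into buf, as the per-GT event names are built. */
static int gt_event_name(char *buf, size_t len, const char *name,
			 unsigned int gt)
{
	assert(buf && name && len > 0);
	return snprintf(buf, len, "%s-gt%u", name, gt);
}

/* Format the matching unit attribute name, "<name>-gt<N>.unit". */
static int gt_event_unit_name(char *buf, size_t len, const char *name,
			      unsigned int gt)
{
	assert(buf && name && len > 0);
	return snprintf(buf, len, "%s-gt%u.unit", name, gt);
}
```

For instance, "rc6-residency" on GT 1 becomes "rc6-residency-gt1", with a
unit attribute named "rc6-residency-gt1.unit".
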
 drivers/gpu/drm/i915/i915_pmu.c | 96 ++++++++++++++++++++++++++-------
 1 file changed, 77 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 5d1de98d86b4..2a5deabff088 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -548,8 +548,9 @@ config_status(struct drm_i915_private *i915, u64 config)
 	struct intel_gt *gt = to_gt(i915);
 
 	unsigned int gt_id = config_gt_id(config);
+	unsigned int max_gt_id = HAS_EXTRA_GT_LIST(i915) ? 1 : 0;
 
-	if (gt_id)
+	if (gt_id > max_gt_id)
 		return -ENOENT;
 
 	switch (config_counter(config)) {
@@ -563,6 +564,8 @@ config_status(struct drm_i915_private *i915, u64 config)
 			return -ENODEV;
 		break;
 	case I915_PMU_INTERRUPTS:
+		if (gt_id)
+			return -ENOENT;
 		break;
 	case I915_PMU_RC6_RESIDENCY:
 		if (!gt->rc6.supported)
@@ -932,9 +935,9 @@ static const struct attribute_group i915_pmu_cpumask_attr_group = {
 	.attrs = i915_cpumask_attrs,
 };
 
-#define __event(__config, __name, __unit) \
+#define __event(__counter, __name, __unit) \
 { \
-	.config = (__config), \
+	.counter = (__counter), \
 	.name = (__name), \
 	.unit = (__unit), \
 }
@@ -975,15 +978,21 @@ create_event_attributes(struct i915_pmu *pmu)
 {
 	struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu);
 	static const struct {
-		u64 config;
+		unsigned int counter;
 		const char *name;
 		const char *unit;
 	} events[] = {
-		__event(I915_PMU_ACTUAL_FREQUENCY, "actual-frequency", "M"),
-		__event(I915_PMU_REQUESTED_FREQUENCY, "requested-frequency", "M"),
-		__event(I915_PMU_INTERRUPTS, "interrupts", NULL),
-		__event(I915_PMU_RC6_RESIDENCY, "rc6-residency", "ns"),
-		__event(I915_PMU_SOFTWARE_GT_AWAKE_TIME, "software-gt-awake-time", "ns"),
+		__event(0, "actual-frequency", "M"),
+		__event(1, "requested-frequency", "M"),
+		__event(3, "rc6-residency", "ns"),
+		__event(4, "software-gt-awake-time", "ns"),
+	};
+	static const struct {
+		unsigned int counter;
+		const char *name;
+		const char *unit;
+	} global_events[] = {
+		__event(2, "interrupts", NULL),
 	};
 	static const struct {
 		enum drm_i915_pmu_engine_sample sample;
@@ -998,14 +1007,29 @@ create_event_attributes(struct i915_pmu *pmu)
 	struct i915_ext_attribute *i915_attr = NULL, *i915_iter;
 	struct attribute **attr = NULL, **attr_iter;
 	struct intel_engine_cs *engine;
-	unsigned int i;
+	struct intel_gt *gt;
+	unsigned int i, j;
 
 	/* Count how many counters we will be exposing. */
-	for (i = 0; i < ARRAY_SIZE(events); i++) {
-		if (!config_status(i915, events[i].config))
+	/* per gt counters */
+	for_each_gt(gt, i915, j) {
+		for (i = 0; i < ARRAY_SIZE(events); i++) {
+			u64 config = ___I915_PMU_OTHER(j, events[i].counter);
+
+			if (!config_status(i915, config))
+				count++;
+		}
+	}
+
+	/* global (per GPU) counters */
+	for (i = 0; i < ARRAY_SIZE(global_events); i++) {
+		u64 config = ___I915_PMU_OTHER(0, global_events[i].counter);
+
+		if (!config_status(i915, config))
 			count++;
 	}
 
+	/* per engine counters */
 	for_each_uabi_engine(engine, i915) {
 		for (i = 0; i < ARRAY_SIZE(engine_events); i++) {
 			if (!engine_event_status(engine,
@@ -1033,26 +1057,60 @@ create_event_attributes(struct i915_pmu *pmu)
 	attr_iter = attr;
 
 	/* Initialize supported non-engine counters. */
-	for (i = 0; i < ARRAY_SIZE(events); i++) {
+	/* per gt counters */
+	for_each_gt(gt, i915, j) {
+		for (i = 0; i < ARRAY_SIZE(events); i++) {
+			u64 config = ___I915_PMU_OTHER(j, events[i].counter);
+			char *str;
+
+			if (config_status(i915, config))
+				continue;
+
+			str = kasprintf(GFP_KERNEL, "%s-gt%u",
+					events[i].name, j);
+			if (!str)
+				goto err;
+
+			*attr_iter++ = &i915_iter->attr.attr;
+			i915_iter = add_i915_attr(i915_iter, str, config);
+
+			if (events[i].unit) {
+				str = kasprintf(GFP_KERNEL, "%s-gt%u.unit",
+						events[i].name, j);
+				if (!str)
+					goto err;
+
+				*attr_iter++ = &pmu_iter->attr.attr;
+				pmu_iter = add_pmu_attr(pmu_iter, str,
+							events[i].unit);
+			}
+		}
+	}
+
+	/* global (per GPU) counters */
+	for (i = 0; i < ARRAY_SIZE(global_events); i++) {
+		u64 config = ___I915_PMU_OTHER(0, global_events[i].counter);
 		char *str;
 
-		if (config_status(i915, events[i].config))
+		if (config_status(i915, config))
 			continue;
 
-		str = kstrdup(events[i].name, GFP_KERNEL);
+		str = kstrdup(global_events[i].name, GFP_KERNEL);
 		if (!str)
 			goto err;
 
 		*attr_iter++ = &i915_iter->attr.attr;
-		i915_iter = add_i915_attr(i915_iter, str, events[i].config);
+		i915_iter = add_i915_attr(i915_iter, str, config);
 
-		if (events[i].unit) {
-			str = kasprintf(GFP_KERNEL, "%s.unit", events[i].name);
+		if (global_events[i].unit) {
+			str = kasprintf(GFP_KERNEL, "%s.unit",
+					global_events[i].name);
 			if (!str)
 				goto err;
 
 			*attr_iter++ = &pmu_iter->attr.attr;
-			pmu_iter = add_pmu_attr(pmu_iter, str, events[i].unit);
+			pmu_iter = add_pmu_attr(pmu_iter, str,
+						global_events[i].unit);
 		}
 	}
 
-- 
2.36.1



* [Intel-gfx] [PATCH 7/9] drm/i915/pmu: Use a helper to convert to MHz
  2023-03-30  0:40 [Intel-gfx] [PATCH 0/7] Add MTL PMU support for multi-gt Umesh Nerlige Ramappa
                   ` (5 preceding siblings ...)
  2023-03-30  0:41 ` [Intel-gfx] [PATCH 6/9] drm/i915/pmu: Export counters from all tiles Umesh Nerlige Ramappa
@ 2023-03-30  0:41 ` Umesh Nerlige Ramappa
  2023-03-30 13:13   ` Tvrtko Ursulin
  2023-03-30  0:41 ` [Intel-gfx] [PATCH 8/9] drm/i915/pmu: Split reading engine and other events into helpers Umesh Nerlige Ramappa
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 28+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-03-30  0:41 UTC (permalink / raw)
  To: intel-gfx

Use a helper to convert frequency values to MHz.
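
A minimal sketch of the arithmetic behind the helper, assuming (as the
add_sample_mult() calls elsewhere in this series suggest) that each
sampling period adds "frequency in MHz * period in microseconds" to an
accumulator. The struct and function names below are hypothetical and
are not part of the driver; only the divide by USEC_PER_SEC mirrors the
patch. Note that although the helper in the patch is named
read_sample_us(), the value it returns is scaled for MHz reporting.

```c
#include <stdint.h>

#define USEC_PER_SEC 1000000ULL

/* Hypothetical stand-in for one PMU frequency sample slot. */
struct freq_sample {
	uint64_t cur;	/* accumulated MHz * microseconds */
};

/* Accumulate one sampling period: frequency weighted by period length. */
static void freq_sample_add(struct freq_sample *s, uint32_t mhz,
			    uint32_t period_ns)
{
	s->cur += (uint64_t)mhz * (period_ns / 1000);
}

/* Scale the accumulator down, mirroring the div_u64(..., USEC_PER_SEC) step. */
static uint64_t freq_sample_read(const struct freq_sample *s)
{
	return s->cur / USEC_PER_SEC;
}
```

With 200 periods of 5 ms each at a constant 1000 MHz, the accumulator
holds 10^9 and the read returns 1000, so the counter advances by the
average frequency in MHz for every second of sampling.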

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 2a5deabff088..40ce1dc00067 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -636,6 +636,11 @@ static int i915_pmu_event_init(struct perf_event *event)
 	return 0;
 }
 
+static u64 read_sample_us(struct i915_pmu *pmu, unsigned int gt_id, int sample)
+{
+	return div_u64(read_sample(pmu, gt_id, sample), USEC_PER_SEC);
+}
+
 static u64 __i915_pmu_event_read(struct perf_event *event)
 {
 	struct drm_i915_private *i915 =
@@ -668,16 +673,10 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
 
 		switch (config) {
 		case I915_PMU_ACTUAL_FREQUENCY:
-			val =
-			   div_u64(read_sample(pmu, gt_id,
-					       __I915_SAMPLE_FREQ_ACT),
-				   USEC_PER_SEC /* to MHz */);
+			val = read_sample_us(pmu, gt_id, __I915_SAMPLE_FREQ_ACT);
 			break;
 		case I915_PMU_REQUESTED_FREQUENCY:
-			val =
-			   div_u64(read_sample(pmu, gt_id,
-					       __I915_SAMPLE_FREQ_REQ),
-				   USEC_PER_SEC /* to MHz */);
+			val = read_sample_us(pmu, gt_id, __I915_SAMPLE_FREQ_REQ);
 			break;
 		case I915_PMU_INTERRUPTS:
 			val = READ_ONCE(pmu->irq_count);
-- 
2.36.1



* [Intel-gfx] [PATCH 8/9] drm/i915/pmu: Split reading engine and other events into helpers
  2023-03-30  0:40 [Intel-gfx] [PATCH 0/7] Add MTL PMU support for multi-gt Umesh Nerlige Ramappa
                   ` (6 preceding siblings ...)
  2023-03-30  0:41 ` [Intel-gfx] [PATCH 7/9] drm/i915/pmu: Use a helper to convert to MHz Umesh Nerlige Ramappa
@ 2023-03-30  0:41 ` Umesh Nerlige Ramappa
  2023-03-30 13:26   ` Tvrtko Ursulin
  2023-03-30  0:41 ` [Intel-gfx] [PATCH 9/9] drm/i915/pmu: Enable legacy PMU events for MTL Umesh Nerlige Ramappa
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 28+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-03-30  0:41 UTC (permalink / raw)
  To: intel-gfx

Split the event reading function into engine and other helpers.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c | 93 ++++++++++++++++++---------------
 1 file changed, 52 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 40ce1dc00067..9bd9605d2662 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -641,58 +641,69 @@ static u64 read_sample_us(struct i915_pmu *pmu, unsigned int gt_id, int sample)
 	return div_u64(read_sample(pmu, gt_id, sample), USEC_PER_SEC);
 }
 
-static u64 __i915_pmu_event_read(struct perf_event *event)
+static u64 __i915_pmu_event_read_engine(struct perf_event *event)
 {
-	struct drm_i915_private *i915 =
-		container_of(event->pmu, typeof(*i915), pmu.base);
-	struct i915_pmu *pmu = &i915->pmu;
+	struct drm_i915_private *i915 = container_of(event->pmu, typeof(*i915), pmu.base);
+	u8 sample = engine_event_sample(event);
+	struct intel_engine_cs *engine;
 	u64 val = 0;
 
-	if (is_engine_event(event)) {
-		u8 sample = engine_event_sample(event);
-		struct intel_engine_cs *engine;
-
-		engine = intel_engine_lookup_user(i915,
-						  engine_event_class(event),
-						  engine_event_instance(event));
+	engine = intel_engine_lookup_user(i915,
+					  engine_event_class(event),
+					  engine_event_instance(event));
 
-		if (drm_WARN_ON_ONCE(&i915->drm, !engine)) {
-			/* Do nothing */
-		} else if (sample == I915_SAMPLE_BUSY &&
-			   intel_engine_supports_stats(engine)) {
-			ktime_t unused;
+	if (drm_WARN_ON_ONCE(&i915->drm, !engine)) {
+		/* Do nothing */
+	} else if (sample == I915_SAMPLE_BUSY &&
+		   intel_engine_supports_stats(engine)) {
+		ktime_t unused;
 
-			val = ktime_to_ns(intel_engine_get_busy_time(engine,
-								     &unused));
-		} else {
-			val = engine->pmu.sample[sample].cur;
-		}
+		val = ktime_to_ns(intel_engine_get_busy_time(engine,
+							     &unused));
 	} else {
-		const unsigned int gt_id = config_gt_id(event->attr.config);
-		const u64 config = config_counter(event->attr.config);
-
-		switch (config) {
-		case I915_PMU_ACTUAL_FREQUENCY:
-			val = read_sample_us(pmu, gt_id, __I915_SAMPLE_FREQ_ACT);
-			break;
-		case I915_PMU_REQUESTED_FREQUENCY:
-			val = read_sample_us(pmu, gt_id, __I915_SAMPLE_FREQ_REQ);
-			break;
-		case I915_PMU_INTERRUPTS:
-			val = READ_ONCE(pmu->irq_count);
-			break;
-		case I915_PMU_RC6_RESIDENCY:
-			val = get_rc6(i915->gt[gt_id]);
-			break;
-		case I915_PMU_SOFTWARE_GT_AWAKE_TIME:
-			val = ktime_to_ns(intel_gt_get_awake_time(to_gt(i915)));
-			break;
-		}
+		val = engine->pmu.sample[sample].cur;
 	}
 
 	return val;
 }
 
+static u64 __i915_pmu_event_read_other(struct perf_event *event)
+{
+	struct drm_i915_private *i915 = container_of(event->pmu, typeof(*i915), pmu.base);
+	const unsigned int gt_id = config_gt_id(event->attr.config);
+	const u64 config = config_counter(event->attr.config);
+	struct i915_pmu *pmu = &i915->pmu;
+	u64 val = 0;
+
+	switch (config) {
+	case I915_PMU_ACTUAL_FREQUENCY:
+		val = read_sample_us(pmu, gt_id, __I915_SAMPLE_FREQ_ACT);
+		break;
+	case I915_PMU_REQUESTED_FREQUENCY:
+		val = read_sample_us(pmu, gt_id, __I915_SAMPLE_FREQ_REQ);
+		break;
+	case I915_PMU_INTERRUPTS:
+		val = READ_ONCE(pmu->irq_count);
+		break;
+	case I915_PMU_RC6_RESIDENCY:
+		val = get_rc6(i915->gt[gt_id]);
+		break;
+	case I915_PMU_SOFTWARE_GT_AWAKE_TIME:
+		val = ktime_to_ns(intel_gt_get_awake_time(to_gt(i915)));
+		break;
+	}
+
+	return val;
+}
+
+static u64 __i915_pmu_event_read(struct perf_event *event)
+{
+	if (is_engine_event(event))
+		return __i915_pmu_event_read_engine(event);
+	else
+		return __i915_pmu_event_read_other(event);
+}
+
 static void i915_pmu_event_read(struct perf_event *event)
 {
 	struct drm_i915_private *i915 =
-- 
2.36.1



* [Intel-gfx] [PATCH 9/9] drm/i915/pmu: Enable legacy PMU events for MTL
  2023-03-30  0:40 [Intel-gfx] [PATCH 0/7] Add MTL PMU support for multi-gt Umesh Nerlige Ramappa
                   ` (7 preceding siblings ...)
  2023-03-30  0:41 ` [Intel-gfx] [PATCH 8/9] drm/i915/pmu: Split reading engine and other events into helpers Umesh Nerlige Ramappa
@ 2023-03-30  0:41 ` Umesh Nerlige Ramappa
  2023-03-30 13:38   ` Tvrtko Ursulin
  2023-03-30  1:37 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Add MTL PMU support for multi-gt Patchwork
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 28+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-03-30  0:41 UTC (permalink / raw)
  To: intel-gfx

MTL introduces separate GTs for render and media. This complicates the
definition of frequency and rc6 counters for the GPU as a whole, since
each GT has an independent counter. The cleanest way to support this
change would be to deprecate the GPU-wide legacy counters and create
GT-specific counters, but that breaks the ABI. Since perf tools and
scripts are decentralized and probably have many users, it is hard to
deprecate the legacy counters and get all the users on board with that.

Re-introduce the legacy counters and support them as min/max of
GT-specific counters as necessary to ensure backwards compatibility.

I915_PMU_ACTUAL_FREQUENCY - will show max of GT-specific counters
I915_PMU_REQUESTED_FREQUENCY - will show max of GT-specific counters
I915_PMU_INTERRUPTS - no changes since it is GPU specific on all platforms
I915_PMU_RC6_RESIDENCY - will show min of GT-specific counters
I915_PMU_SOFTWARE_GT_AWAKE_TIME - will show max of GT-specific counters
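
The aggregation rule in the list above can be sketched as follows. This
is only an illustration: the function names and the plain arrays of
per-GT values are hypothetical, whereas the patch itself walks the GTs
with for_each_gt() and reads each counter in place.

```c
#include <stddef.h>
#include <stdint.h>

/* Largest per-GT value: the rule listed for frequency and awake time. */
static uint64_t aggregate_max(const uint64_t *per_gt, size_t num_gts)
{
	uint64_t val = 0;

	for (size_t i = 0; i < num_gts; i++)
		if (per_gt[i] > val)
			val = per_gt[i];

	return val;
}

/*
 * Smallest per-GT value: the rule listed for RC6 residency.
 * Returns UINT64_MAX if num_gts is 0.
 */
static uint64_t aggregate_min(const uint64_t *per_gt, size_t num_gts)
{
	uint64_t val = UINT64_MAX;

	for (size_t i = 0; i < num_gts; i++)
		if (per_gt[i] < val)
			val = per_gt[i];

	return val;
}
```

Taking the minimum for RC6 means the legacy counter only advances as
fast as the least idle GT, while the maximum for frequency reports the
busiest GT.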

Note:
- For deeper debugging of performance issues, tools must be upgraded to
  read the GT-specific counters.
- This patch deserves to be separate from the other PMU features so that
  it can be easily dropped if legacy events are ever deprecated.
- Internal implementation relies on creating an extra entry in the
  arrays used for GT specific counters. Index 0 is empty.
  Index 1 through N are mapped to GTs 0 through N - 1.
- User interface will use GT numbers indexed from 0 to specify the GT of
  interest.
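
The index mapping described in the notes can be sketched like this. The
helper names are hypothetical, and SKETCH_GT_SHIFT and
SKETCH_OTHER_BASE are placeholder values assumed for illustration only;
the real values come from __I915_PMU_GT_SHIFT and __I915_PMU_ENGINE()
in i915_drm.h, which are not reproduced here.

```c
#include <stdint.h>

#define SKETCH_GT_SHIFT		60		/* assumed placeholder */
#define SKETCH_OTHER_BASE	0x100000ULL	/* assumed placeholder */

/* Internal array slot for a user-visible GT number; slot 0 is reserved. */
static unsigned int gt_to_slot(unsigned int gt_id)
{
	return gt_id + 1;
}

/* GT number behind a slot, or -1 for slot 0 (the aggregate/legacy slot). */
static int slot_to_gt(unsigned int slot)
{
	return slot ? (int)slot - 1 : -1;
}

/* Build a config value with the slot stored in the upper bits. */
static uint64_t sketch_config(unsigned int slot, unsigned int counter)
{
	return (SKETCH_OTHER_BASE + counter) |
	       ((uint64_t)slot << SKETCH_GT_SHIFT);
}

/* Config for a specific GT, indexed from 0 as seen by the user. */
static uint64_t sketch_gt_config(unsigned int gt_id, unsigned int counter)
{
	return sketch_config(gt_to_slot(gt_id), counter);
}

/* Recover the slot from a config value. */
static unsigned int sketch_config_slot(uint64_t config)
{
	return (unsigned int)(config >> SKETCH_GT_SHIFT);
}
```

Under these assumptions a legacy config keeps slot 0 in its upper bits,
so existing users that never set those bits continue to select the
aggregate event.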

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c | 134 +++++++++++++++++++++++++++-----
 drivers/gpu/drm/i915/i915_pmu.h |   2 +-
 include/uapi/drm/i915_drm.h     |  14 ++--
 3 files changed, 125 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 9bd9605d2662..0dc7711c3b4b 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -221,7 +221,7 @@ add_sample_mult(struct i915_pmu *pmu, unsigned int gt_id, int sample, u32 val,
 static u64 get_rc6(struct intel_gt *gt)
 {
 	struct drm_i915_private *i915 = gt->i915;
-	const unsigned int gt_id = gt->info.id;
+	const unsigned int gt_id = gt->info.id + 1;
 	struct i915_pmu *pmu = &i915->pmu;
 	unsigned long flags;
 	bool awake = false;
@@ -267,24 +267,26 @@ static void init_rc6(struct i915_pmu *pmu)
 
 	for_each_gt(gt, i915, i) {
 		intel_wakeref_t wakeref;
+		const unsigned int gt_id = i + 1;
 
 		with_intel_runtime_pm(gt->uncore->rpm, wakeref) {
 			u64 val = __get_rc6(gt);
 
-			store_sample(pmu, i, __I915_SAMPLE_RC6, val);
-			store_sample(pmu, i, __I915_SAMPLE_RC6_LAST_REPORTED,
+			store_sample(pmu, gt_id, __I915_SAMPLE_RC6, val);
+			store_sample(pmu, gt_id, __I915_SAMPLE_RC6_LAST_REPORTED,
 				     val);
-			pmu->sleep_last[i] = ktime_get_raw();
+			pmu->sleep_last[gt_id] = ktime_get_raw();
 		}
 	}
 }
 
 static void park_rc6(struct intel_gt *gt)
 {
+	const unsigned int gt_id = gt->info.id + 1;
 	struct i915_pmu *pmu = &gt->i915->pmu;
 
-	store_sample(pmu, gt->info.id, __I915_SAMPLE_RC6, __get_rc6(gt));
-	pmu->sleep_last[gt->info.id] = ktime_get_raw();
+	store_sample(pmu, gt_id, __I915_SAMPLE_RC6, __get_rc6(gt));
+	pmu->sleep_last[gt_id] = ktime_get_raw();
 }
 
 static void __i915_pmu_maybe_start_timer(struct i915_pmu *pmu)
@@ -436,18 +438,18 @@ static void
 frequency_sample(struct intel_gt *gt, unsigned int period_ns)
 {
 	struct drm_i915_private *i915 = gt->i915;
-	const unsigned int gt_id = gt->info.id;
+	const unsigned int gt_id = gt->info.id + 1;
 	struct i915_pmu *pmu = &i915->pmu;
 	struct intel_rps *rps = &gt->rps;
 
-	if (!frequency_sampling_enabled(pmu, gt_id))
+	if (!frequency_sampling_enabled(pmu, gt->info.id))
 		return;
 
 	/* Report 0/0 (actual/requested) frequency while parked. */
 	if (!intel_gt_pm_get_if_awake(gt))
 		return;
 
-	if (pmu->enable & config_mask(__I915_PMU_ACTUAL_FREQUENCY(gt_id))) {
+	if (pmu->enable & config_mask(__I915_PMU_ACTUAL_FREQUENCY(gt->info.id))) {
 		u32 val;
 
 		/*
@@ -467,7 +469,7 @@ frequency_sample(struct intel_gt *gt, unsigned int period_ns)
 				val, period_ns / 1000);
 	}
 
-	if (pmu->enable & config_mask(__I915_PMU_REQUESTED_FREQUENCY(gt_id))) {
+	if (pmu->enable & config_mask(__I915_PMU_REQUESTED_FREQUENCY(gt->info.id))) {
 		add_sample_mult(pmu, gt_id, __I915_SAMPLE_FREQ_REQ,
 				intel_rps_get_requested_frequency(rps),
 				period_ns / 1000);
@@ -545,14 +547,15 @@ engine_event_status(struct intel_engine_cs *engine,
 static int
 config_status(struct drm_i915_private *i915, u64 config)
 {
-	struct intel_gt *gt = to_gt(i915);
-
 	unsigned int gt_id = config_gt_id(config);
-	unsigned int max_gt_id = HAS_EXTRA_GT_LIST(i915) ? 1 : 0;
+	unsigned int max_gt_id = HAS_EXTRA_GT_LIST(i915) ? 2 : 1;
+	struct intel_gt *gt;
 
 	if (gt_id > max_gt_id)
 		return -ENOENT;
 
+	gt = !gt_id ? to_gt(i915) : i915->gt[gt_id - 1];
+
 	switch (config_counter(config)) {
 	case I915_PMU_ACTUAL_FREQUENCY:
 		if (IS_VALLEYVIEW(i915) || IS_CHERRYVIEW(i915))
@@ -673,23 +676,58 @@ static u64 __i915_pmu_event_read_other(struct perf_event *event)
 	const unsigned int gt_id = config_gt_id(event->attr.config);
 	const u64 config = config_counter(event->attr.config);
 	struct i915_pmu *pmu = &i915->pmu;
+	struct intel_gt *gt;
 	u64 val = 0;
+	int i;
 
 	switch (config) {
 	case I915_PMU_ACTUAL_FREQUENCY:
-		val = read_sample_us(pmu, gt_id, __I915_SAMPLE_FREQ_ACT);
+		if (gt_id)
+			return read_sample_us(pmu, gt_id, __I915_SAMPLE_FREQ_ACT);
+
+		if (!HAS_EXTRA_GT_LIST(i915))
+			return read_sample_us(pmu, 1, __I915_SAMPLE_FREQ_ACT);
+
+		for_each_gt(gt, i915, i)
+			val = max(val, read_sample_us(pmu, i + 1, __I915_SAMPLE_FREQ_ACT));
+
 		break;
 	case I915_PMU_REQUESTED_FREQUENCY:
-		val = read_sample_us(pmu, gt_id, __I915_SAMPLE_FREQ_REQ);
+		if (gt_id)
+			return read_sample_us(pmu, gt_id, __I915_SAMPLE_FREQ_REQ);
+
+		if (!HAS_EXTRA_GT_LIST(i915))
+			return read_sample_us(pmu, 1, __I915_SAMPLE_FREQ_REQ);
+
+		for_each_gt(gt, i915, i)
+			val = max(val, read_sample_us(pmu, i + 1, __I915_SAMPLE_FREQ_REQ));
+
 		break;
 	case I915_PMU_INTERRUPTS:
 		val = READ_ONCE(pmu->irq_count);
 		break;
 	case I915_PMU_RC6_RESIDENCY:
-		val = get_rc6(i915->gt[gt_id]);
+		if (gt_id)
+			return get_rc6(i915->gt[gt_id - 1]);
+
+		if (!HAS_EXTRA_GT_LIST(i915))
+			return get_rc6(i915->gt[0]);
+
+		val = U64_MAX;
+		for_each_gt(gt, i915, i)
+			val = min(val, get_rc6(gt));
+
 		break;
 	case I915_PMU_SOFTWARE_GT_AWAKE_TIME:
-		val = ktime_to_ns(intel_gt_get_awake_time(to_gt(i915)));
+		if (gt_id)
+			return ktime_to_ns(intel_gt_get_awake_time(i915->gt[gt_id - 1]));
+
+		if (!HAS_EXTRA_GT_LIST(i915))
+			return ktime_to_ns(intel_gt_get_awake_time(i915->gt[0]));
+
+		val = 0;
+		for_each_gt(gt, i915, i)
+			val = max((s64)val, ktime_to_ns(intel_gt_get_awake_time(gt)));
 		break;
 	}
 
@@ -728,11 +766,14 @@ static void i915_pmu_event_read(struct perf_event *event)
 
 static void i915_pmu_enable(struct perf_event *event)
 {
+	const unsigned int gt_id = config_gt_id(event->attr.config);
 	struct drm_i915_private *i915 =
 		container_of(event->pmu, typeof(*i915), pmu.base);
 	struct i915_pmu *pmu = &i915->pmu;
+	struct intel_gt *gt;
 	unsigned long flags;
 	unsigned int bit;
+	u64 i;
 
 	bit = event_bit(event);
 	if (bit == -1)
@@ -745,12 +786,42 @@ static void i915_pmu_enable(struct perf_event *event)
 	 * the event reference counter.
 	 */
 	BUILD_BUG_ON(ARRAY_SIZE(pmu->enable_count) != I915_PMU_MASK_BITS);
+	BUILD_BUG_ON(BITS_PER_TYPE(pmu->enable) < I915_PMU_MASK_BITS);
 	GEM_BUG_ON(bit >= ARRAY_SIZE(pmu->enable_count));
 	GEM_BUG_ON(pmu->enable_count[bit] == ~0);
 
 	pmu->enable |= BIT_ULL(bit);
 	pmu->enable_count[bit]++;
 
+	/*
+	 * The arrays that i915_pmu maintains are now indexed as
+	 *
+	 * 0 - aggregate events (a.k.a !gt_id)
+	 * 1 - gt0
+	 * 2 - gt1
+	 *
+	 * The same logic applies to event_bit masks. The first set of masks is
+	 * for aggregate, followed by gt0 and gt1 masks. The idea here is to
+	 * enable the event on all gts if the aggregate event bit is set. This
+	 * applies only to the non-engine-events.
+	 */
+	if (!gt_id && !is_engine_event(event)) {
+		for_each_gt(gt, i915, i) {
+			u64 counter = config_counter(event->attr.config);
+			u64 config = ((i + 1) << __I915_PMU_GT_SHIFT) | counter;
+			unsigned int bit = config_bit(config);
+
+			if (bit == -1)
+				continue;
+
+			GEM_BUG_ON(bit >= ARRAY_SIZE(pmu->enable_count));
+			GEM_BUG_ON(pmu->enable_count[bit] == ~0);
+
+			pmu->enable |= BIT_ULL(bit);
+			pmu->enable_count[bit]++;
+		}
+	}
+
 	/*
 	 * Start the sampling timer if needed and not already enabled.
 	 */
@@ -793,6 +864,7 @@ static void i915_pmu_enable(struct perf_event *event)
 
 static void i915_pmu_disable(struct perf_event *event)
 {
+	const unsigned int gt_id = config_gt_id(event->attr.config);
 	struct drm_i915_private *i915 =
 		container_of(event->pmu, typeof(*i915), pmu.base);
 	unsigned int bit = event_bit(event);
@@ -822,6 +894,26 @@ static void i915_pmu_disable(struct perf_event *event)
 		 */
 		if (--engine->pmu.enable_count[sample] == 0)
 			engine->pmu.enable &= ~BIT(sample);
+	} else if (!gt_id) {
+		struct intel_gt *gt;
+		u64 i;
+
+		for_each_gt(gt, i915, i) {
+			u64 counter = config_counter(event->attr.config);
+			u64 config = ((i + 1) << __I915_PMU_GT_SHIFT) | counter;
+			unsigned int bit = config_bit(config);
+
+			if (bit == -1)
+				continue;
+
+			GEM_BUG_ON(bit >= ARRAY_SIZE(pmu->enable_count));
+			GEM_BUG_ON(pmu->enable_count[bit] == 0);
+
+			if (--pmu->enable_count[bit] == 0) {
+				pmu->enable &= ~BIT_ULL(bit);
+				pmu->timer_enabled &= pmu_needs_timer(pmu, true);
+			}
+		}
 	}
 
 	GEM_BUG_ON(bit >= ARRAY_SIZE(pmu->enable_count));
@@ -1002,7 +1094,11 @@ create_event_attributes(struct i915_pmu *pmu)
 		const char *name;
 		const char *unit;
 	} global_events[] = {
+		__event(0, "actual-frequency", "M"),
+		__event(1, "requested-frequency", "M"),
 		__event(2, "interrupts", NULL),
+		__event(3, "rc6-residency", "ns"),
+		__event(4, "software-gt-awake-time", "ns"),
 	};
 	static const struct {
 		enum drm_i915_pmu_engine_sample sample;
@@ -1024,7 +1120,7 @@ create_event_attributes(struct i915_pmu *pmu)
 	/* per gt counters */
 	for_each_gt(gt, i915, j) {
 		for (i = 0; i < ARRAY_SIZE(events); i++) {
-			u64 config = ___I915_PMU_OTHER(j, events[i].counter);
+			u64 config = ___I915_PMU_OTHER(j + 1, events[i].counter);
 
 			if (!config_status(i915, config))
 				count++;
@@ -1070,7 +1166,7 @@ create_event_attributes(struct i915_pmu *pmu)
 	/* per gt counters */
 	for_each_gt(gt, i915, j) {
 		for (i = 0; i < ARRAY_SIZE(events); i++) {
-			u64 config = ___I915_PMU_OTHER(j, events[i].counter);
+			u64 config = ___I915_PMU_OTHER(j + 1, events[i].counter);
 			char *str;
 
 			if (config_status(i915, config))
diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h
index a708e44a227e..a4cc1eb218fc 100644
--- a/drivers/gpu/drm/i915/i915_pmu.h
+++ b/drivers/gpu/drm/i915/i915_pmu.h
@@ -38,7 +38,7 @@ enum {
 	__I915_NUM_PMU_SAMPLERS
 };
 
-#define I915_PMU_MAX_GTS (4) /* FIXME */
+#define I915_PMU_MAX_GTS (4 + 1) /* FIXME */
 
 /**
  * How many different events we track in the global PMU mask.
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index bbab7f3dbeb4..18794c30027f 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -290,6 +290,7 @@ enum drm_i915_pmu_engine_sample {
 	(((__u64)__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x)) | \
 	((__u64)(gt) << __I915_PMU_GT_SHIFT))
 
+/* Aggregate from all gts */
 #define __I915_PMU_OTHER(x) ___I915_PMU_OTHER(0, x)
 
 #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
@@ -300,11 +301,14 @@ enum drm_i915_pmu_engine_sample {
 
 #define I915_PMU_LAST /* Deprecated - do not use */ I915_PMU_RC6_RESIDENCY
 
-#define __I915_PMU_ACTUAL_FREQUENCY(gt)		___I915_PMU_OTHER(gt, 0)
-#define __I915_PMU_REQUESTED_FREQUENCY(gt)	___I915_PMU_OTHER(gt, 1)
-#define __I915_PMU_INTERRUPTS(gt)		___I915_PMU_OTHER(gt, 2)
-#define __I915_PMU_RC6_RESIDENCY(gt)		___I915_PMU_OTHER(gt, 3)
-#define __I915_PMU_SOFTWARE_GT_AWAKE_TIME(gt)	___I915_PMU_OTHER(gt, 4)
+/* GT specific counters */
+#define ____I915_PMU_OTHER(gt, x) ___I915_PMU_OTHER(((gt) + 1), x)
+
+#define __I915_PMU_ACTUAL_FREQUENCY(gt)		____I915_PMU_OTHER(gt, 0)
+#define __I915_PMU_REQUESTED_FREQUENCY(gt)	____I915_PMU_OTHER(gt, 1)
+#define __I915_PMU_INTERRUPTS(gt)		____I915_PMU_OTHER(gt, 2)
+#define __I915_PMU_RC6_RESIDENCY(gt)		____I915_PMU_OTHER(gt, 3)
+#define __I915_PMU_SOFTWARE_GT_AWAKE_TIME(gt)	____I915_PMU_OTHER(gt, 4)
 
 /* Each region is a minimum of 16k, and there are at most 255 of them.
  */
-- 
2.36.1



* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Add MTL PMU support for multi-gt
  2023-03-30  0:40 [Intel-gfx] [PATCH 0/7] Add MTL PMU support for multi-gt Umesh Nerlige Ramappa
                   ` (8 preceding siblings ...)
  2023-03-30  0:41 ` [Intel-gfx] [PATCH 9/9] drm/i915/pmu: Enable legacy PMU events for MTL Umesh Nerlige Ramappa
@ 2023-03-30  1:37 ` Patchwork
  2023-03-30  1:37 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Patchwork @ 2023-03-30  1:37 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

== Series Details ==

Series: Add MTL PMU support for multi-gt
URL   : https://patchwork.freedesktop.org/series/115836/
State : warning

== Summary ==

Error: dim checkpatch failed
11153695e917 drm/i915/pmu: Support PMU for all engines
e81f792130c7 drm/i915/pmu: Skip sampling engines with no enabled counters
ecedc375d617 drm/i915/pmu: Transform PMU parking code to be GT based
12bdfd2a45f4 drm/i915/pmu: Add reference counting to the sampling timer
927155f231d6 drm/i915/pmu: Prepare for multi-tile non-engine counters
-:54: WARNING:AVOID_BUG: Do not crash the kernel unless it is absolutely unavoidable--use WARN_ON_ONCE() plus recovery code (if feasible) instead of BUG() or variants
#54: FILE: drivers/gpu/drm/i915/i915_pmu.c:99:
+		GEM_BUG_ON(config_gt_id(config));

-:103: WARNING:AVOID_BUG: Do not crash the kernel unless it is absolutely unavoidable--use WARN_ON_ONCE() plus recovery code (if feasible) instead of BUG() or variants
#103: FILE: drivers/gpu/drm/i915/i915_pmu.c:197:
+	GEM_BUG_ON(idx >= ARRAY_SIZE(pmu->sample));

total: 0 errors, 2 warnings, 0 checks, 346 lines checked
ab6717f2e5ca drm/i915/pmu: Export counters from all tiles
2a5e80bfe71e drm/i915/pmu: Use a helper to convert to MHz
3b4e837ed3d7 drm/i915/pmu: Split reading engine and other events into helpers
7d0b9b050afc drm/i915/pmu: Enable legacy PMU events for MTL
-:241: WARNING:AVOID_BUG: Do not crash the kernel unless it is absolutely unavoidable--use WARN_ON_ONCE() plus recovery code (if feasible) instead of BUG() or variants
#241: FILE: drivers/gpu/drm/i915/i915_pmu.c:817:
+			GEM_BUG_ON(bit >= ARRAY_SIZE(pmu->enable_count));

-:242: WARNING:AVOID_BUG: Do not crash the kernel unless it is absolutely unavoidable--use WARN_ON_ONCE() plus recovery code (if feasible) instead of BUG() or variants
#242: FILE: drivers/gpu/drm/i915/i915_pmu.c:818:
+			GEM_BUG_ON(pmu->enable_count[bit] == ~0);

-:276: WARNING:AVOID_BUG: Do not crash the kernel unless it is absolutely unavoidable--use WARN_ON_ONCE() plus recovery code (if feasible) instead of BUG() or variants
#276: FILE: drivers/gpu/drm/i915/i915_pmu.c:909:
+			GEM_BUG_ON(bit >= ARRAY_SIZE(pmu->enable_count));

-:277: WARNING:AVOID_BUG: Do not crash the kernel unless it is absolutely unavoidable--use WARN_ON_ONCE() plus recovery code (if feasible) instead of BUG() or variants
#277: FILE: drivers/gpu/drm/i915/i915_pmu.c:910:
+			GEM_BUG_ON(pmu->enable_count[bit] == 0);

total: 0 errors, 4 warnings, 0 checks, 298 lines checked




* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for Add MTL PMU support for multi-gt
  2023-03-30  0:40 [Intel-gfx] [PATCH 0/7] Add MTL PMU support for multi-gt Umesh Nerlige Ramappa
                   ` (9 preceding siblings ...)
  2023-03-30  1:37 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Add MTL PMU support for multi-gt Patchwork
@ 2023-03-30  1:37 ` Patchwork
  2023-03-30  1:46 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
  2023-03-30 19:50 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
  12 siblings, 0 replies; 28+ messages in thread
From: Patchwork @ 2023-03-30  1:37 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

== Series Details ==

Series: Add MTL PMU support for multi-gt
URL   : https://patchwork.freedesktop.org/series/115836/
State : warning

== Summary ==

Error: dim sparse failed
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.




* [Intel-gfx] ✓ Fi.CI.BAT: success for Add MTL PMU support for multi-gt
  2023-03-30  0:40 [Intel-gfx] [PATCH 0/7] Add MTL PMU support for multi-gt Umesh Nerlige Ramappa
                   ` (10 preceding siblings ...)
  2023-03-30  1:37 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
@ 2023-03-30  1:46 ` Patchwork
  2023-03-30 19:50 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
  12 siblings, 0 replies; 28+ messages in thread
From: Patchwork @ 2023-03-30  1:46 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx


== Series Details ==

Series: Add MTL PMU support for multi-gt
URL   : https://patchwork.freedesktop.org/series/115836/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_12937 -> Patchwork_115836v1
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v1/index.html

Participating hosts (36 -> 35)
------------------------------

  Additional (1): fi-pnv-d510 
  Missing    (2): fi-kbl-soraka fi-snb-2520m 

Known issues
------------

  Here are the changes found in Patchwork_115836v1 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_suspend@basic-s3@smem:
    - bat-rpls-1:         [PASS][1] -> [ABORT][2] ([i915#6687] / [i915#7978])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12937/bat-rpls-1/igt@gem_exec_suspend@basic-s3@smem.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v1/bat-rpls-1/igt@gem_exec_suspend@basic-s3@smem.html

  * igt@i915_selftest@live@migrate:
    - bat-dg2-11:         [PASS][3] -> [DMESG-WARN][4] ([i915#7699])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12937/bat-dg2-11/igt@i915_selftest@live@migrate.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v1/bat-dg2-11/igt@i915_selftest@live@migrate.html

  * igt@i915_selftest@live@mman:
    - bat-rpls-1:         [PASS][5] -> [TIMEOUT][6] ([i915#6794])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12937/bat-rpls-1/igt@i915_selftest@live@mman.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v1/bat-rpls-1/igt@i915_selftest@live@mman.html

  * igt@kms_chamelium_hpd@common-hpd-after-suspend:
    - bat-dg2-11:         NOTRUN -> [SKIP][7] ([i915#7828])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v1/bat-dg2-11/igt@kms_chamelium_hpd@common-hpd-after-suspend.html

  * igt@kms_pipe_crc_basic@nonblocking-crc-frame-sequence@pipe-d-dp-1:
    - bat-dg2-8:          [PASS][8] -> [FAIL][9] ([i915#7932])
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12937/bat-dg2-8/igt@kms_pipe_crc_basic@nonblocking-crc-frame-sequence@pipe-d-dp-1.html
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v1/bat-dg2-8/igt@kms_pipe_crc_basic@nonblocking-crc-frame-sequence@pipe-d-dp-1.html

  * igt@kms_pipe_crc_basic@read-crc:
    - bat-dg2-11:         NOTRUN -> [SKIP][10] ([i915#5354])
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v1/bat-dg2-11/igt@kms_pipe_crc_basic@read-crc.html

  * igt@kms_psr@primary_page_flip:
    - fi-pnv-d510:        NOTRUN -> [SKIP][11] ([fdo#109271]) +38 similar issues
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v1/fi-pnv-d510/igt@kms_psr@primary_page_flip.html

  
#### Possible fixes ####

  * igt@i915_selftest@live@gt_heartbeat:
    - fi-glk-j4005:       [DMESG-FAIL][12] ([i915#5334]) -> [PASS][13]
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12937/fi-glk-j4005/igt@i915_selftest@live@gt_heartbeat.html
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v1/fi-glk-j4005/igt@i915_selftest@live@gt_heartbeat.html

  * igt@i915_selftest@live@hangcheck:
    - bat-dg2-11:         [ABORT][14] ([i915#7913]) -> [PASS][15]
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12937/bat-dg2-11/igt@i915_selftest@live@hangcheck.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v1/bat-dg2-11/igt@i915_selftest@live@hangcheck.html

  
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [i915#5334]: https://gitlab.freedesktop.org/drm/intel/issues/5334
  [i915#5354]: https://gitlab.freedesktop.org/drm/intel/issues/5354
  [i915#6687]: https://gitlab.freedesktop.org/drm/intel/issues/6687
  [i915#6794]: https://gitlab.freedesktop.org/drm/intel/issues/6794
  [i915#7699]: https://gitlab.freedesktop.org/drm/intel/issues/7699
  [i915#7828]: https://gitlab.freedesktop.org/drm/intel/issues/7828
  [i915#7913]: https://gitlab.freedesktop.org/drm/intel/issues/7913
  [i915#7932]: https://gitlab.freedesktop.org/drm/intel/issues/7932
  [i915#7978]: https://gitlab.freedesktop.org/drm/intel/issues/7978


Build changes
-------------

  * IGT: IGT_7226 -> IGTPW_8716
  * Linux: CI_DRM_12937 -> Patchwork_115836v1

  CI-20190529: 20190529
  CI_DRM_12937: 6848d3613c0a63382d00ff550c41394902bda903 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGTPW_8716: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8716/index.html
  IGT_7226: 41be8b4ab86f9e11388c10366dfd71e5032589c1 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_115836v1: 6848d3613c0a63382d00ff550c41394902bda903 @ git://anongit.freedesktop.org/gfx-ci/linux


### Linux commits

5d29036213d8 drm/i915/pmu: Enable legacy PMU events for MTL
d60928cf3de7 drm/i915/pmu: Split reading engine and other events into helpers
57ff3141158f drm/i915/pmu: Use a helper to convert to MHz
2422132d18bc drm/i915/pmu: Export counters from all tiles
1fa5cb3609ec drm/i915/pmu: Prepare for multi-tile non-engine counters
9848cebb18fb drm/i915/pmu: Add reference counting to the sampling timer
1123e295524a drm/i915/pmu: Transform PMU parking code to be GT based
d454acb1683a drm/i915/pmu: Skip sampling engines with no enabled counters
b62cb9c20f42 drm/i915/pmu: Support PMU for all engines

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v1/index.html



* Re: [Intel-gfx] [PATCH 1/9] drm/i915/pmu: Support PMU for all engines
  2023-03-30  0:40 ` [Intel-gfx] [PATCH 1/9] drm/i915/pmu: Support PMU for all engines Umesh Nerlige Ramappa
@ 2023-03-30 12:27   ` Tvrtko Ursulin
  0 siblings, 0 replies; 28+ messages in thread
From: Tvrtko Ursulin @ 2023-03-30 12:27 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa, intel-gfx


On 30/03/2023 01:40, Umesh Nerlige Ramappa wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Given how the metrics are already exported, we also need to run sampling
> over engines from all GTs.
> 
> Problem of GT frequencies is left for later.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Just a reminder to add your s-o-b while moving patches from internal to 
upstream.

Regards,

Tvrtko

> ---
>   drivers/gpu/drm/i915/i915_pmu.c | 14 +++++++++++---
>   1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> index 7ece883a7d95..e274dba58629 100644
> --- a/drivers/gpu/drm/i915/i915_pmu.c
> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> @@ -10,6 +10,7 @@
>   #include "gt/intel_engine_pm.h"
>   #include "gt/intel_engine_regs.h"
>   #include "gt/intel_engine_user.h"
> +#include "gt/intel_gt.h"
>   #include "gt/intel_gt_pm.h"
>   #include "gt/intel_gt_regs.h"
>   #include "gt/intel_rc6.h"
> @@ -414,8 +415,9 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
>   	struct drm_i915_private *i915 =
>   		container_of(hrtimer, struct drm_i915_private, pmu.timer);
>   	struct i915_pmu *pmu = &i915->pmu;
> -	struct intel_gt *gt = to_gt(i915);
>   	unsigned int period_ns;
> +	struct intel_gt *gt;
> +	unsigned int i;
>   	ktime_t now;
>   
>   	if (!READ_ONCE(pmu->timer_enabled))
> @@ -431,8 +433,14 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
>   	 * grabbing the forcewake. However the potential error from timer call-
>   	 * back delay greatly dominates this so we keep it simple.
>   	 */
> -	engines_sample(gt, period_ns);
> -	frequency_sample(gt, period_ns);
> +
> +	for_each_gt(gt, i915, i) {
> +		engines_sample(gt, period_ns);
> +
> +		/* Sample only gt0 until gt support is added for frequency */
> +		if (i == 0)
> +			frequency_sample(gt, period_ns);
> +	}
>   
>   	hrtimer_forward(hrtimer, now, ns_to_ktime(PERIOD));
>   


* Re: [Intel-gfx] [PATCH 5/9] drm/i915/pmu: Prepare for multi-tile non-engine counters
  2023-03-30  0:40 ` [Intel-gfx] [PATCH 5/9] drm/i915/pmu: Prepare for multi-tile non-engine counters Umesh Nerlige Ramappa
@ 2023-03-30 12:39   ` Tvrtko Ursulin
  2023-03-30 22:28     ` Dixit, Ashutosh
  0 siblings, 1 reply; 28+ messages in thread
From: Tvrtko Ursulin @ 2023-03-30 12:39 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa, intel-gfx


On 30/03/2023 01:40, Umesh Nerlige Ramappa wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Reserve some bits in the counter config namespace which will carry the
> tile id and prepare the code to handle this.
> 
> No per tile counters have been added yet.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_pmu.c | 153 +++++++++++++++++++++++---------
>   drivers/gpu/drm/i915/i915_pmu.h |   9 +-
>   include/uapi/drm/i915_drm.h     |  18 +++-
>   3 files changed, 132 insertions(+), 48 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> index c00b94c7f509..5d1de98d86b4 100644
> --- a/drivers/gpu/drm/i915/i915_pmu.c
> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> @@ -56,11 +56,21 @@ static bool is_engine_config(u64 config)
>   	return config < __I915_PMU_OTHER(0);
>   }
>   
> +static unsigned int config_gt_id(const u64 config)
> +{
> +	return config >> __I915_PMU_GT_SHIFT;
> +}
> +
> +static u64 config_counter(const u64 config)
> +{
> +	return config & ~(~0ULL << __I915_PMU_GT_SHIFT);
> +}
> +
>   static unsigned int other_bit(const u64 config)
>   {
>   	unsigned int val;
>   
> -	switch (config) {
> +	switch (config_counter(config)) {
>   	case I915_PMU_ACTUAL_FREQUENCY:
>   		val =  __I915_PMU_ACTUAL_FREQUENCY_ENABLED;
>   		break;
> @@ -78,15 +88,20 @@ static unsigned int other_bit(const u64 config)
>   		return -1;
>   	}
>   
> -	return I915_ENGINE_SAMPLE_COUNT + val;
> +	return I915_ENGINE_SAMPLE_COUNT +
> +	       config_gt_id(config) * __I915_PMU_TRACKED_EVENT_COUNT +
> +	       val;
>   }
>   
>   static unsigned int config_bit(const u64 config)
>   {
> -	if (is_engine_config(config))
> +	if (is_engine_config(config)) {
> +		GEM_BUG_ON(config_gt_id(config));
> +
>   		return engine_config_sample(config);
> -	else
> +	} else {
>   		return other_bit(config);
> +	}
>   }
>   
>   static u64 config_mask(u64 config)
> @@ -104,6 +119,18 @@ static unsigned int event_bit(struct perf_event *event)
>   	return config_bit(event->attr.config);
>   }
>   
> +static u64 frequency_enabled_mask(void)
> +{
> +	unsigned int i;
> +	u64 mask = 0;
> +
> +	for (i = 0; i < I915_PMU_MAX_GTS; i++)
> +		mask |= config_mask(__I915_PMU_ACTUAL_FREQUENCY(i)) |
> +			config_mask(__I915_PMU_REQUESTED_FREQUENCY(i));
> +
> +	return mask;
> +}
> +
>   static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active)
>   {
>   	struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu);
> @@ -120,9 +147,7 @@ static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active)
>   	 * Mask out all the ones which do not need the timer, or in
>   	 * other words keep all the ones that could need the timer.
>   	 */
> -	enable &= config_mask(I915_PMU_ACTUAL_FREQUENCY) |
> -		  config_mask(I915_PMU_REQUESTED_FREQUENCY) |
> -		  ENGINE_SAMPLE_MASK;
> +	enable &= frequency_enabled_mask() | ENGINE_SAMPLE_MASK;
>   
>   	/*
>   	 * When the GPU is idle per-engine counters do not need to be
> @@ -164,9 +189,39 @@ static inline s64 ktime_since_raw(const ktime_t kt)
>   	return ktime_to_ns(ktime_sub(ktime_get_raw(), kt));
>   }
>   
> +static unsigned int
> +__sample_idx(struct i915_pmu *pmu, unsigned int gt_id, int sample)
> +{
> +	unsigned int idx = gt_id * __I915_NUM_PMU_SAMPLERS + sample;
> +
> +	GEM_BUG_ON(idx >= ARRAY_SIZE(pmu->sample));
> +
> +	return idx;
> +}
> +
> +static u64 read_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample)
> +{
> +	return pmu->sample[__sample_idx(pmu, gt_id, sample)].cur;
> +}
> +
> +static void
> +store_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample, u64 val)
> +{
> +	pmu->sample[__sample_idx(pmu, gt_id, sample)].cur = val;
> +}
> +
> +static void
> +add_sample_mult(struct i915_pmu *pmu, unsigned int gt_id, int sample, u32 val,
> +		u32 mul)
> +{
> +	pmu->sample[__sample_idx(pmu, gt_id, sample)].cur +=
> +							mul_u32_u32(val, mul);
> +}
> +
>   static u64 get_rc6(struct intel_gt *gt)
>   {
>   	struct drm_i915_private *i915 = gt->i915;
> +	const unsigned int gt_id = gt->info.id;
>   	struct i915_pmu *pmu = &i915->pmu;
>   	unsigned long flags;
>   	bool awake = false;
> @@ -181,7 +236,7 @@ static u64 get_rc6(struct intel_gt *gt)
>   	spin_lock_irqsave(&pmu->lock, flags);
>   
>   	if (awake) {
> -		pmu->sample[__I915_SAMPLE_RC6].cur = val;
> +		store_sample(pmu, gt_id, __I915_SAMPLE_RC6, val);
>   	} else {
>   		/*
>   		 * We think we are runtime suspended.
> @@ -190,14 +245,14 @@ static u64 get_rc6(struct intel_gt *gt)
>   		 * on top of the last known real value, as the approximated RC6
>   		 * counter value.
>   		 */
> -		val = ktime_since_raw(pmu->sleep_last);
> -		val += pmu->sample[__I915_SAMPLE_RC6].cur;
> +		val = ktime_since_raw(pmu->sleep_last[gt_id]);
> +		val += read_sample(pmu, gt_id, __I915_SAMPLE_RC6);
>   	}
>   
> -	if (val < pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur)
> -		val = pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur;
> +	if (val < read_sample(pmu, gt_id, __I915_SAMPLE_RC6_LAST_REPORTED))
> +		val = read_sample(pmu, gt_id, __I915_SAMPLE_RC6_LAST_REPORTED);
>   	else
> -		pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur = val;
> +		store_sample(pmu, gt_id, __I915_SAMPLE_RC6_LAST_REPORTED, val);
>   
>   	spin_unlock_irqrestore(&pmu->lock, flags);
>   
> @@ -207,13 +262,20 @@ static u64 get_rc6(struct intel_gt *gt)
>   static void init_rc6(struct i915_pmu *pmu)
>   {
>   	struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu);
> -	intel_wakeref_t wakeref;
> +	struct intel_gt *gt;
> +	unsigned int i;
> +
> +	for_each_gt(gt, i915, i) {
> +		intel_wakeref_t wakeref;
>   
> -	with_intel_runtime_pm(to_gt(i915)->uncore->rpm, wakeref) {
> -		pmu->sample[__I915_SAMPLE_RC6].cur = __get_rc6(to_gt(i915));
> -		pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur =
> -					pmu->sample[__I915_SAMPLE_RC6].cur;
> -		pmu->sleep_last = ktime_get_raw();
> +		with_intel_runtime_pm(gt->uncore->rpm, wakeref) {
> +			u64 val = __get_rc6(gt);
> +
> +			store_sample(pmu, i, __I915_SAMPLE_RC6, val);
> +			store_sample(pmu, i, __I915_SAMPLE_RC6_LAST_REPORTED,
> +				     val);
> +			pmu->sleep_last[i] = ktime_get_raw();
> +		}
>   	}
>   }
>   
> @@ -221,8 +283,8 @@ static void park_rc6(struct intel_gt *gt)
>   {
>   	struct i915_pmu *pmu = &gt->i915->pmu;
>   
> -	pmu->sample[__I915_SAMPLE_RC6].cur = __get_rc6(gt);
> -	pmu->sleep_last = ktime_get_raw();
> +	store_sample(pmu, gt->info.id, __I915_SAMPLE_RC6, __get_rc6(gt));
> +	pmu->sleep_last[gt->info.id] = ktime_get_raw();
>   }
>   
>   static void __i915_pmu_maybe_start_timer(struct i915_pmu *pmu)
> @@ -362,34 +424,30 @@ engines_sample(struct intel_gt *gt, unsigned int period_ns)
>   	}
>   }
>   
> -static void
> -add_sample_mult(struct i915_pmu_sample *sample, u32 val, u32 mul)
> -{
> -	sample->cur += mul_u32_u32(val, mul);
> -}
> -
> -static bool frequency_sampling_enabled(struct i915_pmu *pmu)
> +static bool
> +frequency_sampling_enabled(struct i915_pmu *pmu, unsigned int gt)
>   {
>   	return pmu->enable &
> -	       (config_mask(I915_PMU_ACTUAL_FREQUENCY) |
> -		config_mask(I915_PMU_REQUESTED_FREQUENCY));
> +	       (config_mask(__I915_PMU_ACTUAL_FREQUENCY(gt)) |
> +		config_mask(__I915_PMU_REQUESTED_FREQUENCY(gt)));
>   }
>   
>   static void
>   frequency_sample(struct intel_gt *gt, unsigned int period_ns)
>   {
>   	struct drm_i915_private *i915 = gt->i915;
> +	const unsigned int gt_id = gt->info.id;
>   	struct i915_pmu *pmu = &i915->pmu;
>   	struct intel_rps *rps = &gt->rps;
>   
> -	if (!frequency_sampling_enabled(pmu))
> +	if (!frequency_sampling_enabled(pmu, gt_id))
>   		return;
>   
>   	/* Report 0/0 (actual/requested) frequency while parked. */
>   	if (!intel_gt_pm_get_if_awake(gt))
>   		return;
>   
> -	if (pmu->enable & config_mask(I915_PMU_ACTUAL_FREQUENCY)) {
> +	if (pmu->enable & config_mask(__I915_PMU_ACTUAL_FREQUENCY(gt_id))) {
>   		u32 val;
>   
>   		/*
> @@ -405,12 +463,12 @@ frequency_sample(struct intel_gt *gt, unsigned int period_ns)
>   		if (!val)
>   			val = intel_gpu_freq(rps, rps->cur_freq);
>   
> -		add_sample_mult(&pmu->sample[__I915_SAMPLE_FREQ_ACT],
> +		add_sample_mult(pmu, gt_id, __I915_SAMPLE_FREQ_ACT,
>   				val, period_ns / 1000);
>   	}
>   
> -	if (pmu->enable & config_mask(I915_PMU_REQUESTED_FREQUENCY)) {
> -		add_sample_mult(&pmu->sample[__I915_SAMPLE_FREQ_REQ],
> +	if (pmu->enable & config_mask(__I915_PMU_REQUESTED_FREQUENCY(gt_id))) {
> +		add_sample_mult(pmu, gt_id, __I915_SAMPLE_FREQ_REQ,
>   				intel_rps_get_requested_frequency(rps),
>   				period_ns / 1000);
>   	}
> @@ -447,10 +505,7 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
>   			continue;
>   
>   		engines_sample(gt, period_ns);
> -
> -		/* Sample only gt0 until gt support is added for frequency */
> -		if (i == 0)
> -			frequency_sample(gt, period_ns);
> +		frequency_sample(gt, period_ns);
>   	}
>   
>   	hrtimer_forward(hrtimer, now, ns_to_ktime(PERIOD));
> @@ -492,7 +547,12 @@ config_status(struct drm_i915_private *i915, u64 config)
>   {
>   	struct intel_gt *gt = to_gt(i915);
>   
> -	switch (config) {
> +	unsigned int gt_id = config_gt_id(config);
> +
> +	if (gt_id)
> +		return -ENOENT;
> +
> +	switch (config_counter(config)) {
>   	case I915_PMU_ACTUAL_FREQUENCY:
>   		if (IS_VALLEYVIEW(i915) || IS_CHERRYVIEW(i915))
>   			/* Requires a mutex for sampling! */
> @@ -600,22 +660,27 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
>   			val = engine->pmu.sample[sample].cur;
>   		}
>   	} else {
> -		switch (event->attr.config) {
> +		const unsigned int gt_id = config_gt_id(event->attr.config);
> +		const u64 config = config_counter(event->attr.config);
> +
> +		switch (config) {
>   		case I915_PMU_ACTUAL_FREQUENCY:
>   			val =
> -			   div_u64(pmu->sample[__I915_SAMPLE_FREQ_ACT].cur,
> +			   div_u64(read_sample(pmu, gt_id,
> +					       __I915_SAMPLE_FREQ_ACT),
>   				   USEC_PER_SEC /* to MHz */);
>   			break;
>   		case I915_PMU_REQUESTED_FREQUENCY:
>   			val =
> -			   div_u64(pmu->sample[__I915_SAMPLE_FREQ_REQ].cur,
> +			   div_u64(read_sample(pmu, gt_id,
> +					       __I915_SAMPLE_FREQ_REQ),
>   				   USEC_PER_SEC /* to MHz */);
>   			break;
>   		case I915_PMU_INTERRUPTS:
>   			val = READ_ONCE(pmu->irq_count);
>   			break;
>   		case I915_PMU_RC6_RESIDENCY:
> -			val = get_rc6(to_gt(i915));
> +			val = get_rc6(i915->gt[gt_id]);
>   			break;
>   		case I915_PMU_SOFTWARE_GT_AWAKE_TIME:
>   			val = ktime_to_ns(intel_gt_get_awake_time(to_gt(i915)));
> diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h
> index 1b04c79907e8..a708e44a227e 100644
> --- a/drivers/gpu/drm/i915/i915_pmu.h
> +++ b/drivers/gpu/drm/i915/i915_pmu.h
> @@ -38,13 +38,16 @@ enum {
>   	__I915_NUM_PMU_SAMPLERS
>   };
>   
> +#define I915_PMU_MAX_GTS (4) /* FIXME */

Three or four years after writing this I have no idea what I meant by
this FIXME. I should have written a better comment. :( It was early
platform enablement, so it was somewhat passable then, but now I think
we need to figure out what I actually meant. Maybe removing the comment
is fine.
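
For context, here is a minimal standalone sketch of what the constant
controls in this patch: it sizes the flat per-GT sample array and bounds
the index computed by __sample_idx(). The names and values below
(MAX_GTS, NUM_SAMPLERS, struct pmu_sketch) are simplified stand-ins
chosen for illustration, not the driver's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins for I915_PMU_MAX_GTS and __I915_NUM_PMU_SAMPLERS. */
#define MAX_GTS      4
#define NUM_SAMPLERS 4

struct pmu_sketch {
	/* One contiguous block of NUM_SAMPLERS slots per GT. */
	uint64_t sample[MAX_GTS * NUM_SAMPLERS];
};

/* Map (gt_id, sampler) to a slot in the flat array. */
static unsigned int sample_idx(unsigned int gt_id, unsigned int sampler)
{
	unsigned int idx = gt_id * NUM_SAMPLERS + sampler;

	/* Plays the role of GEM_BUG_ON() in the patch. */
	assert(idx < MAX_GTS * NUM_SAMPLERS);

	return idx;
}

static void store_sample(struct pmu_sketch *pmu, unsigned int gt_id,
			 unsigned int sampler, uint64_t val)
{
	pmu->sample[sample_idx(gt_id, sampler)] = val;
}

static uint64_t read_sample(const struct pmu_sketch *pmu, unsigned int gt_id,
			    unsigned int sampler)
{
	return pmu->sample[sample_idx(gt_id, sampler)];
}
```

Under these assumptions, raising or lowering the constant only changes
how many per-GT blocks are reserved; the indexing scheme stays the same.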

> +
>   /**
>    * How many different events we track in the global PMU mask.
>    *
>    * It is also used to know to needed number of event reference counters.
>    */
>   #define I915_PMU_MASK_BITS \
> -	(I915_ENGINE_SAMPLE_COUNT + __I915_PMU_TRACKED_EVENT_COUNT)
> +	(I915_ENGINE_SAMPLE_COUNT + \
> +	 I915_PMU_MAX_GTS * __I915_PMU_TRACKED_EVENT_COUNT)
>   
>   #define I915_ENGINE_SAMPLE_COUNT (I915_SAMPLE_SEMA + 1)
>   
> @@ -124,11 +127,11 @@ struct i915_pmu {
>   	 * Only global counters are held here, while the per-engine ones are in
>   	 * struct intel_engine_cs.
>   	 */
> -	struct i915_pmu_sample sample[__I915_NUM_PMU_SAMPLERS];
> +	struct i915_pmu_sample sample[I915_PMU_MAX_GTS * __I915_NUM_PMU_SAMPLERS];
>   	/**
>   	 * @sleep_last: Last time GT parked for RC6 estimation.
>   	 */
> -	ktime_t sleep_last;
> +	ktime_t sleep_last[I915_PMU_MAX_GTS];
>   	/**
>   	 * @irq_count: Number of interrupts
>   	 *
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index dba7c5a5b25e..bbab7f3dbeb4 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -280,7 +280,17 @@ enum drm_i915_pmu_engine_sample {
>   #define I915_PMU_ENGINE_SEMA(class, instance) \
>   	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_SEMA)
>   
> -#define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
> +/*
> + * Top 8 bits of every non-engine counter are GT id.
> + * FIXME: __I915_PMU_GT_SHIFT will be changed to 56
> + */

I asked before and don't think I got an answer: why is 4 bits not enough
for the gt id? I am pretty sure the comment is not from my code.

Regards,

Tvrtko
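
To make the bit arithmetic in the question above concrete, here is a
small standalone sketch of the encoding, assuming only what the quoted
patch shows: the GT id is placed above a shift of 60 in a 64-bit config.
The helper names make_config() and max_gt_ids() are hypothetical;
config_gt_id() and config_counter() mirror the helpers in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define GT_SHIFT 60 /* mirrors __I915_PMU_GT_SHIFT in the patch */

/* Pack a GT id into the top bits of a non-engine counter config. */
static uint64_t make_config(unsigned int gt, uint64_t counter)
{
	return counter | ((uint64_t)gt << GT_SHIFT);
}

/* Extract the GT id from the top bits. */
static unsigned int config_gt_id(uint64_t config)
{
	return (unsigned int)(config >> GT_SHIFT);
}

/* Strip the GT id, leaving the plain counter config. */
static uint64_t config_counter(uint64_t config)
{
	return config & ~(~0ULL << GT_SHIFT);
}

/* Number of distinct GT ids that fit above the shift. */
static unsigned int max_gt_ids(void)
{
	return 1u << (64 - GT_SHIFT);
}
```

With a shift of 60 the GT id occupies the top 4 bits, so 16 ids are
representable; a shift of 56 would give 8 bits.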

> +#define __I915_PMU_GT_SHIFT (60)
> +
> +#define ___I915_PMU_OTHER(gt, x) \
> +	(((__u64)__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x)) | \
> +	((__u64)(gt) << __I915_PMU_GT_SHIFT))
> +
> +#define __I915_PMU_OTHER(x) ___I915_PMU_OTHER(0, x)
>   
>   #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
>   #define I915_PMU_REQUESTED_FREQUENCY	__I915_PMU_OTHER(1)
> @@ -290,6 +300,12 @@ enum drm_i915_pmu_engine_sample {
>   
>   #define I915_PMU_LAST /* Deprecated - do not use */ I915_PMU_RC6_RESIDENCY
>   
> +#define __I915_PMU_ACTUAL_FREQUENCY(gt)		___I915_PMU_OTHER(gt, 0)
> +#define __I915_PMU_REQUESTED_FREQUENCY(gt)	___I915_PMU_OTHER(gt, 1)
> +#define __I915_PMU_INTERRUPTS(gt)		___I915_PMU_OTHER(gt, 2)
> +#define __I915_PMU_RC6_RESIDENCY(gt)		___I915_PMU_OTHER(gt, 3)
> +#define __I915_PMU_SOFTWARE_GT_AWAKE_TIME(gt)	___I915_PMU_OTHER(gt, 4)
> +
>   /* Each region is a minimum of 16k, and there are at most 255 of them.
>    */
>   #define I915_NR_TEX_REGIONS 255	/* table size 2k - maximum due to use


* Re: [Intel-gfx] [PATCH 6/9] drm/i915/pmu: Export counters from all tiles
  2023-03-30  0:41 ` [Intel-gfx] [PATCH 6/9] drm/i915/pmu: Export counters from all tiles Umesh Nerlige Ramappa
@ 2023-03-30 13:01   ` Tvrtko Ursulin
  2023-03-30 17:33     ` Umesh Nerlige Ramappa
  0 siblings, 1 reply; 28+ messages in thread
From: Tvrtko Ursulin @ 2023-03-30 13:01 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa, intel-gfx


On 30/03/2023 01:41, Umesh Nerlige Ramappa wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Start exporting frequency and RC6 counters from all tiles.
> 
> Existing counters keep their names and config values and new one use the
> namespace added in the previous patch, with the "-gtN" added to their
> names.

The part about keeping the names is not in the code any more. So something will have to give, either the commit text or the code.

Even without that detail, I suspect someone might want to add themselves as Co-developed-by, since I *think* someone made some changes.
  
> Interrupts counter is an odd one off. Because it is the global device
> counters (not only GT) we choose not to add per tile versions for now.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_pmu.c | 96 ++++++++++++++++++++++++++-------
>   1 file changed, 77 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> index 5d1de98d86b4..2a5deabff088 100644
> --- a/drivers/gpu/drm/i915/i915_pmu.c
> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> @@ -548,8 +548,9 @@ config_status(struct drm_i915_private *i915, u64 config)
>   	struct intel_gt *gt = to_gt(i915);
>   
>   	unsigned int gt_id = config_gt_id(config);
> +	unsigned int max_gt_id = HAS_EXTRA_GT_LIST(i915) ? 1 : 0;
>   
> -	if (gt_id)
> +	if (gt_id > max_gt_id)
>   		return -ENOENT;
>   
>   	switch (config_counter(config)) {
> @@ -563,6 +564,8 @@ config_status(struct drm_i915_private *i915, u64 config)
>   			return -ENODEV;
>   		break;
>   	case I915_PMU_INTERRUPTS:
> +		if (gt_id)
> +			return -ENOENT;
>   		break;
>   	case I915_PMU_RC6_RESIDENCY:
>   		if (!gt->rc6.supported)
> @@ -932,9 +935,9 @@ static const struct attribute_group i915_pmu_cpumask_attr_group = {
>   	.attrs = i915_cpumask_attrs,
>   };
>   
> -#define __event(__config, __name, __unit) \
> +#define __event(__counter, __name, __unit) \
>   { \
> -	.config = (__config), \
> +	.counter = (__counter), \
>   	.name = (__name), \
>   	.unit = (__unit), \
>   }
> @@ -975,15 +978,21 @@ create_event_attributes(struct i915_pmu *pmu)
>   {
>   	struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu);
>   	static const struct {
> -		u64 config;
> +		unsigned int counter;
>   		const char *name;
>   		const char *unit;
>   	} events[] = {
> -		__event(I915_PMU_ACTUAL_FREQUENCY, "actual-frequency", "M"),
> -		__event(I915_PMU_REQUESTED_FREQUENCY, "requested-frequency", "M"),
> -		__event(I915_PMU_INTERRUPTS, "interrupts", NULL),
> -		__event(I915_PMU_RC6_RESIDENCY, "rc6-residency", "ns"),
> -		__event(I915_PMU_SOFTWARE_GT_AWAKE_TIME, "software-gt-awake-time", "ns"),
> +		__event(0, "actual-frequency", "M"),
> +		__event(1, "requested-frequency", "M"),
> +		__event(3, "rc6-residency", "ns"),
> +		__event(4, "software-gt-awake-time", "ns"),
> +	};
> +	static const struct {
> +		unsigned int counter;
> +		const char *name;
> +		const char *unit;
> +	} global_events[] = {
> +		__event(2, "interrupts", NULL),
>   	};
>   	static const struct {
>   		enum drm_i915_pmu_engine_sample sample;
> @@ -998,14 +1007,29 @@ create_event_attributes(struct i915_pmu *pmu)
>   	struct i915_ext_attribute *i915_attr = NULL, *i915_iter;
>   	struct attribute **attr = NULL, **attr_iter;
>   	struct intel_engine_cs *engine;
> -	unsigned int i;
> +	struct intel_gt *gt;
> +	unsigned int i, j;
>   
>   	/* Count how many counters we will be exposing. */
> -	for (i = 0; i < ARRAY_SIZE(events); i++) {
> -		if (!config_status(i915, events[i].config))
> +	/* per gt counters */

Two comments one after another, in two styles - the inconsistency hurts.

Not sure why global events needed to be split out into a separate array? Like this, two loops are needed for each stage below instead of one. AFAIR one array and one loop would just work because config_status would report global ones as unsupported for gt > 0.

[Comes back later. It looked like this in my code:

         static const struct {
-               u64 config;
+               unsigned int counter;
                 const char *name;
                 const char *unit;
+               bool global;
         } events[] = {
-               __event(I915_PMU_ACTUAL_FREQUENCY, "actual-frequency", "M"),
-               __event(I915_PMU_REQUESTED_FREQUENCY, "requested-frequency", "M"),
-               __event(I915_PMU_INTERRUPTS, "interrupts", NULL),
-               __event(I915_PMU_RC6_RESIDENCY, "rc6-residency", "ns"),
+               /*
+                * #define __I915_PMU_ACTUAL_FREQUENCY(gt)    ___I915_PMU_OTHER(gt, 0)
+                * #define __I915_PMU_REQUESTED_FREQUENCY(gt) ___I915_PMU_OTHER(gt, 1)
+                * #define __I915_PMU_INTERRUPTS(gt)          ___I915_PMU_OTHER(gt, 2)
+                * #define __I915_PMU_RC6_RESIDENCY(gt)       ___I915_PMU_OTHER(gt, 3)
+                */
+               __event(0, "actual-frequency", "M"),
+               __event(1, "requested-frequency", "M"),
+               __global_event(2, "interrupts", NULL),
+               __event(3, "rc6-residency", "ns"),

...

         /* Count how many counters we will be exposing. */
-       for (i = 0; i < ARRAY_SIZE(events); i++) {
-               if (!config_status(i915, events[i].config))
-                       count++;
+       for_each_gt(i915, j, gt) {
+               for (i = 0; i < ARRAY_SIZE(events); i++) {
+                       u64 config = ___I915_PMU_OTHER(j, events[i].counter);
+
+                       if (!config_status(i915, config))
+                               count++;
+               }

So AFAICT it just worked.

]
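
The single-array approach described above can be sketched standalone as
follows. This is an illustration under stated assumptions, not the
driver code: status() is a hypothetical stand-in for config_status()
that rejects a global counter for any gt_id other than 0, and the
per-event "global" flag replaces the switch on the counter value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct event_desc {
	unsigned int counter;
	const char *name;
	bool global; /* device-wide counter, exposed only once */
};

static const struct event_desc events[] = {
	{ 0, "actual-frequency",    false },
	{ 1, "requested-frequency", false },
	{ 2, "interrupts",          true  },
	{ 3, "rc6-residency",       false },
};

/* Hypothetical stand-in for config_status(): 0 means supported. */
static int status(unsigned int gt_id, unsigned int num_gts,
		  const struct event_desc *ev)
{
	if (gt_id >= num_gts)
		return -1;

	/* Global counters only exist on GT 0. */
	if (ev->global && gt_id != 0)
		return -1;

	return 0;
}

/* One array and one nested loop cover both per-GT and global events. */
static unsigned int count_events(unsigned int num_gts)
{
	unsigned int count = 0;

	for (unsigned int gt = 0; gt < num_gts; gt++)
		for (size_t i = 0; i < sizeof(events) / sizeof(events[0]); i++)
			if (!status(gt, num_gts, &events[i]))
				count++;

	return count;
}
```

Because the status check filters out global events on the extra GTs, no
second array or second loop is needed at either the counting or the
initialization stage.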

> +	for_each_gt(gt, i915, j) {
> +		for (i = 0; i < ARRAY_SIZE(events); i++) {
> +			u64 config = ___I915_PMU_OTHER(j, events[i].counter);
> +
> +			if (!config_status(i915, config))
> +				count++;
> +		}
> +	}
> +
> +	/* global (per GPU) counters */
> +	for (i = 0; i < ARRAY_SIZE(global_events); i++) {
> +		u64 config = ___I915_PMU_OTHER(0, global_events[i].counter);
> +
> +		if (!config_status(i915, config))
>   			count++;
>   	}
>   
> +	/* per engine counters */
>   	for_each_uabi_engine(engine, i915) {
>   		for (i = 0; i < ARRAY_SIZE(engine_events); i++) {
>   			if (!engine_event_status(engine,
> @@ -1033,26 +1057,60 @@ create_event_attributes(struct i915_pmu *pmu)
>   	attr_iter = attr;
>   
>   	/* Initialize supported non-engine counters. */
> -	for (i = 0; i < ARRAY_SIZE(events); i++) {
> +	/* per gt counters */
> +	for_each_gt(gt, i915, j) {
> +		for (i = 0; i < ARRAY_SIZE(events); i++) {
> +			u64 config = ___I915_PMU_OTHER(j, events[i].counter);
> +			char *str;
> +
> +			if (config_status(i915, config))
> +				continue;
> +
> +			str = kasprintf(GFP_KERNEL, "%s-gt%u",
> +					events[i].name, j);

So with this patch all old platforms change the event names. This is not how I wrote it, and more importantly, it breaks userspace. Why would we do it?

For reference I dug out my code from 2020 and it looked like this:

+                       if (events[i].global || !i915->remote_tiles)
+                               str = kstrdup(events[i].name, GFP_KERNEL);
+                       else
+                               str = kasprintf(GFP_KERNEL, "%s-gt%u",
+                                               events[i].name, j);

So on single tile platforms names remain the same.

Regards,

Tvrtko
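
As an illustration of the naming rule in the 2020 snippet above, here is
a userspace sketch using snprintf() in place of kstrdup()/kasprintf().
The function event_name() and its multi_gt parameter are hypothetical;
multi_gt stands in for the i915->remote_tiles check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Keep the legacy name for global events and on single-GT platforms;
 * append "-gtN" only when the device really has more than one GT.
 */
static int event_name(char *buf, size_t len, const char *base,
		      bool global, bool multi_gt, unsigned int gt_id)
{
	if (global || !multi_gt)
		return snprintf(buf, len, "%s", base);

	return snprintf(buf, len, "%s-gt%u", base, gt_id);
}
```

With this rule existing userspace on single-GT platforms keeps seeing
"rc6-residency", while a two-GT device would expose "rc6-residency-gt0"
and "rc6-residency-gt1".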

> +			if (!str)
> +				goto err;
> +
> +			*attr_iter++ = &i915_iter->attr.attr;
> +			i915_iter = add_i915_attr(i915_iter, str, config);
> +
> +			if (events[i].unit) {
> +				str = kasprintf(GFP_KERNEL, "%s-gt%u.unit",
> +						events[i].name, j);
> +				if (!str)
> +					goto err;
> +
> +				*attr_iter++ = &pmu_iter->attr.attr;
> +				pmu_iter = add_pmu_attr(pmu_iter, str,
> +							events[i].unit);
> +			}
> +		}
> +	}
> +
> +	/* global (per GPU) counters */
> +	for (i = 0; i < ARRAY_SIZE(global_events); i++) {
> +		u64 config = ___I915_PMU_OTHER(0, global_events[i].counter);
>   		char *str;
>   
> -		if (config_status(i915, events[i].config))
> +		if (config_status(i915, config))
>   			continue;
>   
> -		str = kstrdup(events[i].name, GFP_KERNEL);
> +		str = kstrdup(global_events[i].name, GFP_KERNEL);
>   		if (!str)
>   			goto err;
>   
>   		*attr_iter++ = &i915_iter->attr.attr;
> -		i915_iter = add_i915_attr(i915_iter, str, events[i].config);
> +		i915_iter = add_i915_attr(i915_iter, str, config);
>   
> -		if (events[i].unit) {
> -			str = kasprintf(GFP_KERNEL, "%s.unit", events[i].name);
> +		if (global_events[i].unit) {
> +			str = kasprintf(GFP_KERNEL, "%s.unit",
> +					global_events[i].name);
>   			if (!str)
>   				goto err;
>   
>   			*attr_iter++ = &pmu_iter->attr.attr;
> -			pmu_iter = add_pmu_attr(pmu_iter, str, events[i].unit);
> +			pmu_iter = add_pmu_attr(pmu_iter, str,
> +						global_events[i].unit);
>   		}
>   	}
>   


* Re: [Intel-gfx] [PATCH 7/9] drm/i915/pmu: Use a helper to convert to MHz
  2023-03-30  0:41 ` [Intel-gfx] [PATCH 7/9] drm/i915/pmu: Use a helper to convert to MHz Umesh Nerlige Ramappa
@ 2023-03-30 13:13   ` Tvrtko Ursulin
  0 siblings, 0 replies; 28+ messages in thread
From: Tvrtko Ursulin @ 2023-03-30 13:13 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa, intel-gfx


On 30/03/2023 01:41, Umesh Nerlige Ramappa wrote:
> Use a helper to convert frequency values to MHz.
> 
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_pmu.c | 15 +++++++--------
>   1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> index 2a5deabff088..40ce1dc00067 100644
> --- a/drivers/gpu/drm/i915/i915_pmu.c
> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> @@ -636,6 +636,11 @@ static int i915_pmu_event_init(struct perf_event *event)
>   	return 0;
>   }
>   
> +static u64 read_sample_us(struct i915_pmu *pmu, unsigned int gt_id, int sample)

Maybe better as read_freq_sample_mhz? Anyway:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
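
For readers following the units, here is a standalone sketch of the
arithmetic behind the helper, based on the quoted code: each sample adds
frequency in MHz multiplied by the period in microseconds, and the read
side divides by USEC_PER_SEC. The function names are illustrative
(read_freq_sample_mhz is the name suggested above), and the 5 ms period
used in the example is an assumption made only for the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

#define USEC_PER_SEC 1000000ULL

/* Accumulate one sampling period: MHz weighted by the period in us. */
static void add_freq_sample(uint64_t *acc, uint32_t freq_mhz,
			    uint32_t period_ns)
{
	*acc += (uint64_t)freq_mhz * (period_ns / 1000);
}

/* Scale the accumulator down from MHz * us to MHz * s. */
static uint64_t read_freq_sample_mhz(uint64_t acc)
{
	return acc / USEC_PER_SEC;
}
```

For example, 200 samples of 5 ms at a constant 1000 MHz cover one second
and read back as 1000, so a consumer that divides the counter delta by
the elapsed time in seconds gets the average frequency in MHz.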

> +{
> +	return div_u64(read_sample(pmu, gt_id, sample), USEC_PER_SEC);
> +}
> +
>   static u64 __i915_pmu_event_read(struct perf_event *event)
>   {
>   	struct drm_i915_private *i915 =
> @@ -668,16 +673,10 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
>   
>   		switch (config) {
>   		case I915_PMU_ACTUAL_FREQUENCY:
> -			val =
> -			   div_u64(read_sample(pmu, gt_id,
> -					       __I915_SAMPLE_FREQ_ACT),
> -				   USEC_PER_SEC /* to MHz */);
> +			val = read_sample_us(pmu, gt_id, __I915_SAMPLE_FREQ_ACT);
>   			break;
>   		case I915_PMU_REQUESTED_FREQUENCY:
> -			val =
> -			   div_u64(read_sample(pmu, gt_id,
> -					       __I915_SAMPLE_FREQ_REQ),
> -				   USEC_PER_SEC /* to MHz */);
> +			val = read_sample_us(pmu, gt_id, __I915_SAMPLE_FREQ_REQ);
>   			break;
>   		case I915_PMU_INTERRUPTS:
>   			val = READ_ONCE(pmu->irq_count);


* Re: [Intel-gfx] [PATCH 8/9] drm/i915/pmu: Split reading engine and other events into helpers
  2023-03-30  0:41 ` [Intel-gfx] [PATCH 8/9] drm/i915/pmu: Split reading engine and other events into helpers Umesh Nerlige Ramappa
@ 2023-03-30 13:26   ` Tvrtko Ursulin
  0 siblings, 0 replies; 28+ messages in thread
From: Tvrtko Ursulin @ 2023-03-30 13:26 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa, intel-gfx


On 30/03/2023 01:41, Umesh Nerlige Ramappa wrote:
> Split the event reading function into engine and other helpers.

What, why and how please, third bit not being needed in this case. :)

> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_pmu.c | 93 ++++++++++++++++++---------------
>   1 file changed, 52 insertions(+), 41 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> index 40ce1dc00067..9bd9605d2662 100644
> --- a/drivers/gpu/drm/i915/i915_pmu.c
> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> @@ -641,58 +641,69 @@ static u64 read_sample_us(struct i915_pmu *pmu, unsigned int gt_id, int sample)
>   	return div_u64(read_sample(pmu, gt_id, sample), USEC_PER_SEC);
>   }
>   
> -static u64 __i915_pmu_event_read(struct perf_event *event)
> +static u64 __i915_pmu_event_read_engine(struct perf_event *event)
>   {
> -	struct drm_i915_private *i915 =
> -		container_of(event->pmu, typeof(*i915), pmu.base);
> -	struct i915_pmu *pmu = &i915->pmu;
> +	struct drm_i915_private *i915 = container_of(event->pmu, typeof(*i915), pmu.base);

Liking the over 80 look, if you insist. :)

> +	u8 sample = engine_event_sample(event);
> +	struct intel_engine_cs *engine;
>   	u64 val = 0;
>   
> -	if (is_engine_event(event)) {
> -		u8 sample = engine_event_sample(event);
> -		struct intel_engine_cs *engine;
> -
> -		engine = intel_engine_lookup_user(i915,
> -						  engine_event_class(event),
> -						  engine_event_instance(event));
> +	engine = intel_engine_lookup_user(i915,
> +					  engine_event_class(event),
> +					  engine_event_instance(event));
>   
> -		if (drm_WARN_ON_ONCE(&i915->drm, !engine)) {
> -			/* Do nothing */
> -		} else if (sample == I915_SAMPLE_BUSY &&
> -			   intel_engine_supports_stats(engine)) {
> -			ktime_t unused;
> +	if (drm_WARN_ON_ONCE(&i915->drm, !engine)) {
> +		/* Do nothing */
> +	} else if (sample == I915_SAMPLE_BUSY &&
> +		   intel_engine_supports_stats(engine)) {
> +		ktime_t unused;
>   
> -			val = ktime_to_ns(intel_engine_get_busy_time(engine,
> -								     &unused));
> -		} else {
> -			val = engine->pmu.sample[sample].cur;
> -		}
> +		val = ktime_to_ns(intel_engine_get_busy_time(engine,
> +							     &unused));
>   	} else {
> -		const unsigned int gt_id = config_gt_id(event->attr.config);
> -		const u64 config = config_counter(event->attr.config);
> -
> -		switch (config) {
> -		case I915_PMU_ACTUAL_FREQUENCY:
> -			val = read_sample_us(pmu, gt_id, __I915_SAMPLE_FREQ_ACT);
> -			break;
> -		case I915_PMU_REQUESTED_FREQUENCY:
> -			val = read_sample_us(pmu, gt_id, __I915_SAMPLE_FREQ_REQ);
> -			break;
> -		case I915_PMU_INTERRUPTS:
> -			val = READ_ONCE(pmu->irq_count);
> -			break;
> -		case I915_PMU_RC6_RESIDENCY:
> -			val = get_rc6(i915->gt[gt_id]);
> -			break;
> -		case I915_PMU_SOFTWARE_GT_AWAKE_TIME:
> -			val = ktime_to_ns(intel_gt_get_awake_time(to_gt(i915)));
> -			break;
> -		}
> +		val = engine->pmu.sample[sample].cur;
>   	}
>   
>   	return val;
>   }
>   
> +static u64 __i915_pmu_event_read_other(struct perf_event *event)
> +{
> +	struct drm_i915_private *i915 = container_of(event->pmu, typeof(*i915), pmu.base);
> +	const unsigned int gt_id = config_gt_id(event->attr.config);
> +	const u64 config = config_counter(event->attr.config);
> +	struct i915_pmu *pmu = &i915->pmu;
> +	u64 val = 0;
> +
> +	switch (config) {
> +	case I915_PMU_ACTUAL_FREQUENCY:
> +		val = read_sample_us(pmu, gt_id, __I915_SAMPLE_FREQ_ACT);
> +		break;
> +	case I915_PMU_REQUESTED_FREQUENCY:
> +		val = read_sample_us(pmu, gt_id, __I915_SAMPLE_FREQ_REQ);
> +		break;
> +	case I915_PMU_INTERRUPTS:
> +		val = READ_ONCE(pmu->irq_count);
> +		break;
> +	case I915_PMU_RC6_RESIDENCY:
> +		val = get_rc6(i915->gt[gt_id]);
> +		break;
> +	case I915_PMU_SOFTWARE_GT_AWAKE_TIME:
> +		val = ktime_to_ns(intel_gt_get_awake_time(to_gt(i915)));
> +		break;
> +	}
> +
> +	return val;
> +}
> +
> +static u64 __i915_pmu_event_read(struct perf_event *event)
> +{
> +	if (is_engine_event(event))
> +		return __i915_pmu_event_read_engine(event);
> +	else
> +		return __i915_pmu_event_read_other(event);
> +}
> +
>   static void i915_pmu_event_read(struct perf_event *event)
>   {
>   	struct drm_i915_private *i915 =

No real complaints - it is tidier and more readable. Just drop a note in 
the commit message saying that this is the reason, and:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Intel-gfx] [PATCH 9/9] drm/i915/pmu: Enable legacy PMU events for MTL
  2023-03-30  0:41 ` [Intel-gfx] [PATCH 9/9] drm/i915/pmu: Enable legacy PMU events for MTL Umesh Nerlige Ramappa
@ 2023-03-30 13:38   ` Tvrtko Ursulin
  2023-03-30 18:31     ` Umesh Nerlige Ramappa
  0 siblings, 1 reply; 28+ messages in thread
From: Tvrtko Ursulin @ 2023-03-30 13:38 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa, intel-gfx


On 30/03/2023 01:41, Umesh Nerlige Ramappa wrote:
> MTL introduces separate GTs for render and media. This complicates the
> definition of frequency and rc6 counters for the GPU as a whole since
> each GT has an independent counter. The best way to support this change
> is to deprecate the GPU-specific counters and create GT-specific
> counters, however that just breaks ABI. Since perf tools and scripts may
> be decentralized with probably many users, it's hard to deprecate the
> legacy counters and have all the users on board with that.
> 
> Re-introduce the legacy counters and support them as min/max of
> GT-specific counters as necessary to ensure backwards compatibility.
> 
> I915_PMU_ACTUAL_FREQUENCY - will show max of GT-specific counters
> I915_PMU_REQUESTED_FREQUENCY - will show max of GT-specific counters
> I915_PMU_INTERRUPTS - no changes since it is GPU specific on all platforms
> I915_PMU_RC6_RESIDENCY - will show min of GT-specific counters
> I915_PMU_SOFTWARE_GT_AWAKE_TIME - will show max of GT-specific counters

IMO max/min games are _very_ low value and probably just confusing.

I am not convinced we need to burden the kernel with this. New platform, 
new counters.. userspace can just deal with it.

In intel_gpu_top we can do the smarts in maybe default aggregated view 
(piggy back/extend on engines aggregation via command line '-p' or '1' 
at runtime). But then it's not min/max but probably normalized by number 
of gts.
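
For illustration only, a minimal userspace sketch of such a normalized 
aggregate. The helper name and the use of plain doubles are assumptions 
made for this example; this is not existing intel_gpu_top code, and 
"normalized" is read here as "summed, then divided by the number of gts".

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (assumption, not from intel_gpu_top): aggregate
 * per-GT readings of one counter by summing them and dividing by the
 * number of GTs, instead of taking the min or the max.
 */
static double sketch_normalized_aggregate(const double *per_gt, size_t num_gts)
{
	double sum = 0.0;
	size_t i;

	assert(per_gt && num_gts > 0);

	for (i = 0; i < num_gts; i++)
		sum += per_gt[i];

	return sum / (double)num_gts;
}
```

With per-GT RC6 residencies of 40% and 80%, this reports 60%, where a 
min-based legacy counter would report 40%.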

Regards,

Tvrtko

> 
> Note:
> - For deeper debugging of performance issues, tools must be upgraded to
>    read the GT-specific counters.
> - This patch deserves to be separate from the other PMU features so that
>    it can be easily dropped if legacy events are ever deprecated.
> - Internal implementation relies on creating an extra entry in the
>    arrays used for GT specific counters. Index 0 is empty.
>    Index 1 through N are mapped to GTs 0 through N - 1.
> - User interface will use GT numbers indexed from 0 to specify the GT of
>    interest.
> 
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_pmu.c | 134 +++++++++++++++++++++++++++-----
>   drivers/gpu/drm/i915/i915_pmu.h |   2 +-
>   include/uapi/drm/i915_drm.h     |  14 ++--
>   3 files changed, 125 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> index 9bd9605d2662..0dc7711c3b4b 100644
> --- a/drivers/gpu/drm/i915/i915_pmu.c
> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> @@ -221,7 +221,7 @@ add_sample_mult(struct i915_pmu *pmu, unsigned int gt_id, int sample, u32 val,
>   static u64 get_rc6(struct intel_gt *gt)
>   {
>   	struct drm_i915_private *i915 = gt->i915;
> -	const unsigned int gt_id = gt->info.id;
> +	const unsigned int gt_id = gt->info.id + 1;
>   	struct i915_pmu *pmu = &i915->pmu;
>   	unsigned long flags;
>   	bool awake = false;
> @@ -267,24 +267,26 @@ static void init_rc6(struct i915_pmu *pmu)
>   
>   	for_each_gt(gt, i915, i) {
>   		intel_wakeref_t wakeref;
> +		const unsigned int gt_id = i + 1;
>   
>   		with_intel_runtime_pm(gt->uncore->rpm, wakeref) {
>   			u64 val = __get_rc6(gt);
>   
> -			store_sample(pmu, i, __I915_SAMPLE_RC6, val);
> -			store_sample(pmu, i, __I915_SAMPLE_RC6_LAST_REPORTED,
> +			store_sample(pmu, gt_id, __I915_SAMPLE_RC6, val);
> +			store_sample(pmu, gt_id, __I915_SAMPLE_RC6_LAST_REPORTED,
>   				     val);
> -			pmu->sleep_last[i] = ktime_get_raw();
> +			pmu->sleep_last[gt_id] = ktime_get_raw();
>   		}
>   	}
>   }
>   
>   static void park_rc6(struct intel_gt *gt)
>   {
> +	const unsigned int gt_id = gt->info.id + 1;
>   	struct i915_pmu *pmu = &gt->i915->pmu;
>   
> -	store_sample(pmu, gt->info.id, __I915_SAMPLE_RC6, __get_rc6(gt));
> -	pmu->sleep_last[gt->info.id] = ktime_get_raw();
> +	store_sample(pmu, gt_id, __I915_SAMPLE_RC6, __get_rc6(gt));
> +	pmu->sleep_last[gt_id] = ktime_get_raw();
>   }
>   
>   static void __i915_pmu_maybe_start_timer(struct i915_pmu *pmu)
> @@ -436,18 +438,18 @@ static void
>   frequency_sample(struct intel_gt *gt, unsigned int period_ns)
>   {
>   	struct drm_i915_private *i915 = gt->i915;
> -	const unsigned int gt_id = gt->info.id;
> +	const unsigned int gt_id = gt->info.id + 1;
>   	struct i915_pmu *pmu = &i915->pmu;
>   	struct intel_rps *rps = &gt->rps;
>   
> -	if (!frequency_sampling_enabled(pmu, gt_id))
> +	if (!frequency_sampling_enabled(pmu, gt->info.id))
>   		return;
>   
>   	/* Report 0/0 (actual/requested) frequency while parked. */
>   	if (!intel_gt_pm_get_if_awake(gt))
>   		return;
>   
> -	if (pmu->enable & config_mask(__I915_PMU_ACTUAL_FREQUENCY(gt_id))) {
> +	if (pmu->enable & config_mask(__I915_PMU_ACTUAL_FREQUENCY(gt->info.id))) {
>   		u32 val;
>   
>   		/*
> @@ -467,7 +469,7 @@ frequency_sample(struct intel_gt *gt, unsigned int period_ns)
>   				val, period_ns / 1000);
>   	}
>   
> -	if (pmu->enable & config_mask(__I915_PMU_REQUESTED_FREQUENCY(gt_id))) {
> +	if (pmu->enable & config_mask(__I915_PMU_REQUESTED_FREQUENCY(gt->info.id))) {
>   		add_sample_mult(pmu, gt_id, __I915_SAMPLE_FREQ_REQ,
>   				intel_rps_get_requested_frequency(rps),
>   				period_ns / 1000);
> @@ -545,14 +547,15 @@ engine_event_status(struct intel_engine_cs *engine,
>   static int
>   config_status(struct drm_i915_private *i915, u64 config)
>   {
> -	struct intel_gt *gt = to_gt(i915);
> -
>   	unsigned int gt_id = config_gt_id(config);
> -	unsigned int max_gt_id = HAS_EXTRA_GT_LIST(i915) ? 1 : 0;
> +	unsigned int max_gt_id = HAS_EXTRA_GT_LIST(i915) ? 2 : 1;
> +	struct intel_gt *gt;
>   
>   	if (gt_id > max_gt_id)
>   		return -ENOENT;
>   
> +	gt = !gt_id ? to_gt(i915) : i915->gt[gt_id - 1];
> +
>   	switch (config_counter(config)) {
>   	case I915_PMU_ACTUAL_FREQUENCY:
>   		if (IS_VALLEYVIEW(i915) || IS_CHERRYVIEW(i915))
> @@ -673,23 +676,58 @@ static u64 __i915_pmu_event_read_other(struct perf_event *event)
>   	const unsigned int gt_id = config_gt_id(event->attr.config);
>   	const u64 config = config_counter(event->attr.config);
>   	struct i915_pmu *pmu = &i915->pmu;
> +	struct intel_gt *gt;
>   	u64 val = 0;
> +	int i;
>   
>   	switch (config) {
>   	case I915_PMU_ACTUAL_FREQUENCY:
> -		val = read_sample_us(pmu, gt_id, __I915_SAMPLE_FREQ_ACT);
> +		if (gt_id)
> +			return read_sample_us(pmu, gt_id, __I915_SAMPLE_FREQ_ACT);
> +
> +		if (!HAS_EXTRA_GT_LIST(i915))
> +			return read_sample_us(pmu, 1, __I915_SAMPLE_FREQ_ACT);
> +
> +		for_each_gt(gt, i915, i)
> +			val = max(val, read_sample_us(pmu, i + 1, __I915_SAMPLE_FREQ_ACT));
> +
>   		break;
>   	case I915_PMU_REQUESTED_FREQUENCY:
> -		val = read_sample_us(pmu, gt_id, __I915_SAMPLE_FREQ_REQ);
> +		if (gt_id)
> +			return read_sample_us(pmu, gt_id, __I915_SAMPLE_FREQ_REQ);
> +
> +		if (!HAS_EXTRA_GT_LIST(i915))
> +			return read_sample_us(pmu, 1, __I915_SAMPLE_FREQ_REQ);
> +
> +		for_each_gt(gt, i915, i)
> +			val = max(val, read_sample_us(pmu, i + 1, __I915_SAMPLE_FREQ_REQ));
> +
>   		break;
>   	case I915_PMU_INTERRUPTS:
>   		val = READ_ONCE(pmu->irq_count);
>   		break;
>   	case I915_PMU_RC6_RESIDENCY:
> -		val = get_rc6(i915->gt[gt_id]);
> +		if (gt_id)
> +			return get_rc6(i915->gt[gt_id - 1]);
> +
> +		if (!HAS_EXTRA_GT_LIST(i915))
> +			return get_rc6(i915->gt[0]);
> +
> +		val = U64_MAX;
> +		for_each_gt(gt, i915, i)
> +			val = min(val, get_rc6(gt));
> +
>   		break;
>   	case I915_PMU_SOFTWARE_GT_AWAKE_TIME:
> -		val = ktime_to_ns(intel_gt_get_awake_time(to_gt(i915)));
> +		if (gt_id)
> +			return ktime_to_ns(intel_gt_get_awake_time(i915->gt[gt_id - 1]));
> +
> +		if (!HAS_EXTRA_GT_LIST(i915))
> +			return ktime_to_ns(intel_gt_get_awake_time(i915->gt[0]));
> +
> +		val = 0;
> +		for_each_gt(gt, i915, i)
> +			val = max((s64)val, ktime_to_ns(intel_gt_get_awake_time(gt)));
>   		break;
>   	}
>   
> @@ -728,11 +766,14 @@ static void i915_pmu_event_read(struct perf_event *event)
>   
>   static void i915_pmu_enable(struct perf_event *event)
>   {
> +	const unsigned int gt_id = config_gt_id(event->attr.config);
>   	struct drm_i915_private *i915 =
>   		container_of(event->pmu, typeof(*i915), pmu.base);
>   	struct i915_pmu *pmu = &i915->pmu;
> +	struct intel_gt *gt;
>   	unsigned long flags;
>   	unsigned int bit;
> +	u64 i;
>   
>   	bit = event_bit(event);
>   	if (bit == -1)
> @@ -745,12 +786,42 @@ static void i915_pmu_enable(struct perf_event *event)
>   	 * the event reference counter.
>   	 */
>   	BUILD_BUG_ON(ARRAY_SIZE(pmu->enable_count) != I915_PMU_MASK_BITS);
> +	BUILD_BUG_ON(BITS_PER_TYPE(pmu->enable) < I915_PMU_MASK_BITS);
>   	GEM_BUG_ON(bit >= ARRAY_SIZE(pmu->enable_count));
>   	GEM_BUG_ON(pmu->enable_count[bit] == ~0);
>   
>   	pmu->enable |= BIT_ULL(bit);
>   	pmu->enable_count[bit]++;
>   
> +	/*
> +	 * The arrays that i915_pmu maintains are now indexed as
> +	 *
> +	 * 0 - aggregate events (a.k.a !gt_id)
> +	 * 1 - gt0
> +	 * 2 - gt1
> +	 *
> +	 * The same logic applies to event_bit masks. The first set of mask are
> +	 * for aggregate, followed by gt0 and gt1 masks. The idea here is to
> +	 * enable the event on all gts if the aggregate event bit is set. This
> +	 * applies only to the non-engine-events.
> +	 */
> +	if (!gt_id && !is_engine_event(event)) {
> +		for_each_gt(gt, i915, i) {
> +			u64 counter = config_counter(event->attr.config);
> +			u64 config = ((i + 1) << __I915_PMU_GT_SHIFT) | counter;
> +			unsigned int bit = config_bit(config);
> +
> +			if (bit == -1)
> +				continue;
> +
> +			GEM_BUG_ON(bit >= ARRAY_SIZE(pmu->enable_count));
> +			GEM_BUG_ON(pmu->enable_count[bit] == ~0);
> +
> +			pmu->enable |= BIT_ULL(bit);
> +			pmu->enable_count[bit]++;
> +		}
> +	}
> +
>   	/*
>   	 * Start the sampling timer if needed and not already enabled.
>   	 */
> @@ -793,6 +864,7 @@ static void i915_pmu_enable(struct perf_event *event)
>   
>   static void i915_pmu_disable(struct perf_event *event)
>   {
> +	const unsigned int gt_id = config_gt_id(event->attr.config);
>   	struct drm_i915_private *i915 =
>   		container_of(event->pmu, typeof(*i915), pmu.base);
>   	unsigned int bit = event_bit(event);
> @@ -822,6 +894,26 @@ static void i915_pmu_disable(struct perf_event *event)
>   		 */
>   		if (--engine->pmu.enable_count[sample] == 0)
>   			engine->pmu.enable &= ~BIT(sample);
> +	} else if (!gt_id) {
> +		struct intel_gt *gt;
> +		u64 i;
> +
> +		for_each_gt(gt, i915, i) {
> +			u64 counter = config_counter(event->attr.config);
> +			u64 config = ((i + 1) << __I915_PMU_GT_SHIFT) | counter;
> +			unsigned int bit = config_bit(config);
> +
> +			if (bit == -1)
> +				continue;
> +
> +			GEM_BUG_ON(bit >= ARRAY_SIZE(pmu->enable_count));
> +			GEM_BUG_ON(pmu->enable_count[bit] == 0);
> +
> +			if (--pmu->enable_count[bit] == 0) {
> +				pmu->enable &= ~BIT_ULL(bit);
> +				pmu->timer_enabled &= pmu_needs_timer(pmu, true);
> +			}
> +		}
>   	}
>   
>   	GEM_BUG_ON(bit >= ARRAY_SIZE(pmu->enable_count));
> @@ -1002,7 +1094,11 @@ create_event_attributes(struct i915_pmu *pmu)
>   		const char *name;
>   		const char *unit;
>   	} global_events[] = {
> +		__event(0, "actual-frequency", "M"),
> +		__event(1, "requested-frequency", "M"),
>   		__event(2, "interrupts", NULL),
> +		__event(3, "rc6-residency", "ns"),
> +		__event(4, "software-gt-awake-time", "ns"),
>   	};
>   	static const struct {
>   		enum drm_i915_pmu_engine_sample sample;
> @@ -1024,7 +1120,7 @@ create_event_attributes(struct i915_pmu *pmu)
>   	/* per gt counters */
>   	for_each_gt(gt, i915, j) {
>   		for (i = 0; i < ARRAY_SIZE(events); i++) {
> -			u64 config = ___I915_PMU_OTHER(j, events[i].counter);
> +			u64 config = ___I915_PMU_OTHER(j + 1, events[i].counter);
>   
>   			if (!config_status(i915, config))
>   				count++;
> @@ -1070,7 +1166,7 @@ create_event_attributes(struct i915_pmu *pmu)
>   	/* per gt counters */
>   	for_each_gt(gt, i915, j) {
>   		for (i = 0; i < ARRAY_SIZE(events); i++) {
> -			u64 config = ___I915_PMU_OTHER(j, events[i].counter);
> +			u64 config = ___I915_PMU_OTHER(j + 1, events[i].counter);
>   			char *str;
>   
>   			if (config_status(i915, config))
> diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h
> index a708e44a227e..a4cc1eb218fc 100644
> --- a/drivers/gpu/drm/i915/i915_pmu.h
> +++ b/drivers/gpu/drm/i915/i915_pmu.h
> @@ -38,7 +38,7 @@ enum {
>   	__I915_NUM_PMU_SAMPLERS
>   };
>   
> -#define I915_PMU_MAX_GTS (4) /* FIXME */
> +#define I915_PMU_MAX_GTS (4 + 1) /* FIXME */
>   
>   /**
>    * How many different events we track in the global PMU mask.
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index bbab7f3dbeb4..18794c30027f 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -290,6 +290,7 @@ enum drm_i915_pmu_engine_sample {
>   	(((__u64)__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x)) | \
>   	((__u64)(gt) << __I915_PMU_GT_SHIFT))
>   
> +/* Aggregate from all gts */
>   #define __I915_PMU_OTHER(x) ___I915_PMU_OTHER(0, x)
>   
>   #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
> @@ -300,11 +301,14 @@ enum drm_i915_pmu_engine_sample {
>   
>   #define I915_PMU_LAST /* Deprecated - do not use */ I915_PMU_RC6_RESIDENCY
>   
> -#define __I915_PMU_ACTUAL_FREQUENCY(gt)		___I915_PMU_OTHER(gt, 0)
> -#define __I915_PMU_REQUESTED_FREQUENCY(gt)	___I915_PMU_OTHER(gt, 1)
> -#define __I915_PMU_INTERRUPTS(gt)		___I915_PMU_OTHER(gt, 2)
> -#define __I915_PMU_RC6_RESIDENCY(gt)		___I915_PMU_OTHER(gt, 3)
> -#define __I915_PMU_SOFTWARE_GT_AWAKE_TIME(gt)	___I915_PMU_OTHER(gt, 4)
> +/* GT specific counters */
> +#define ____I915_PMU_OTHER(gt, x) ___I915_PMU_OTHER(((gt) + 1), x)
> +
> +#define __I915_PMU_ACTUAL_FREQUENCY(gt)		____I915_PMU_OTHER(gt, 0)
> +#define __I915_PMU_REQUESTED_FREQUENCY(gt)	____I915_PMU_OTHER(gt, 1)
> +#define __I915_PMU_INTERRUPTS(gt)		____I915_PMU_OTHER(gt, 2)
> +#define __I915_PMU_RC6_RESIDENCY(gt)		____I915_PMU_OTHER(gt, 3)
> +#define __I915_PMU_SOFTWARE_GT_AWAKE_TIME(gt)	____I915_PMU_OTHER(gt, 4)
>   
>   /* Each region is a minimum of 16k, and there are at most 255 of them.
>    */

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Intel-gfx] [PATCH 6/9] drm/i915/pmu: Export counters from all tiles
  2023-03-30 13:01   ` Tvrtko Ursulin
@ 2023-03-30 17:33     ` Umesh Nerlige Ramappa
  2023-03-31  8:57       ` Tvrtko Ursulin
  0 siblings, 1 reply; 28+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-03-30 17:33 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Thu, Mar 30, 2023 at 02:01:42PM +0100, Tvrtko Ursulin wrote:
>
>On 30/03/2023 01:41, Umesh Nerlige Ramappa wrote:
>>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>>Start exporting frequency and RC6 counters from all tiles.
>>
>>Existing counters keep their names and config values and new one use the
>>namespace added in the previous patch, with the "-gtN" added to their
>>names.
>
>The part about keeping the names is not in the code any more. So something will have to give, either the commit text or the code.
>
>Even without that detail, I suspect someone might want to add them Co-developed-by since I *think* someone did some changes.
>>Interrupts counter is an odd one off. Because it is the global device
>>counters (not only GT) we choose not to add per tile versions for now.
>>
>>Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
>>---
>>  drivers/gpu/drm/i915/i915_pmu.c | 96 ++++++++++++++++++++++++++-------
>>  1 file changed, 77 insertions(+), 19 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
>>index 5d1de98d86b4..2a5deabff088 100644
>>--- a/drivers/gpu/drm/i915/i915_pmu.c
>>+++ b/drivers/gpu/drm/i915/i915_pmu.c
>>@@ -548,8 +548,9 @@ config_status(struct drm_i915_private *i915, u64 config)
>>  	struct intel_gt *gt = to_gt(i915);
>>  	unsigned int gt_id = config_gt_id(config);
>>+	unsigned int max_gt_id = HAS_EXTRA_GT_LIST(i915) ? 1 : 0;
>>-	if (gt_id)
>>+	if (gt_id > max_gt_id)
>>  		return -ENOENT;
>>  	switch (config_counter(config)) {
>>@@ -563,6 +564,8 @@ config_status(struct drm_i915_private *i915, u64 config)
>>  			return -ENODEV;
>>  		break;
>>  	case I915_PMU_INTERRUPTS:
>>+		if (gt_id)
>>+			return -ENOENT;
>>  		break;
>>  	case I915_PMU_RC6_RESIDENCY:
>>  		if (!gt->rc6.supported)
>>@@ -932,9 +935,9 @@ static const struct attribute_group i915_pmu_cpumask_attr_group = {
>>  	.attrs = i915_cpumask_attrs,
>>  };
>>-#define __event(__config, __name, __unit) \
>>+#define __event(__counter, __name, __unit) \
>>  { \
>>-	.config = (__config), \
>>+	.counter = (__counter), \
>>  	.name = (__name), \
>>  	.unit = (__unit), \
>>  }
>>@@ -975,15 +978,21 @@ create_event_attributes(struct i915_pmu *pmu)
>>  {
>>  	struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu);
>>  	static const struct {
>>-		u64 config;
>>+		unsigned int counter;
>>  		const char *name;
>>  		const char *unit;
>>  	} events[] = {
>>-		__event(I915_PMU_ACTUAL_FREQUENCY, "actual-frequency", "M"),
>>-		__event(I915_PMU_REQUESTED_FREQUENCY, "requested-frequency", "M"),
>>-		__event(I915_PMU_INTERRUPTS, "interrupts", NULL),
>>-		__event(I915_PMU_RC6_RESIDENCY, "rc6-residency", "ns"),
>>-		__event(I915_PMU_SOFTWARE_GT_AWAKE_TIME, "software-gt-awake-time", "ns"),
>>+		__event(0, "actual-frequency", "M"),
>>+		__event(1, "requested-frequency", "M"),
>>+		__event(3, "rc6-residency", "ns"),
>>+		__event(4, "software-gt-awake-time", "ns"),
>>+	};
>>+	static const struct {
>>+		unsigned int counter;
>>+		const char *name;
>>+		const char *unit;
>>+	} global_events[] = {
>>+		__event(2, "interrupts", NULL),
>>  	};
>>  	static const struct {
>>  		enum drm_i915_pmu_engine_sample sample;
>>@@ -998,14 +1007,29 @@ create_event_attributes(struct i915_pmu *pmu)
>>  	struct i915_ext_attribute *i915_attr = NULL, *i915_iter;
>>  	struct attribute **attr = NULL, **attr_iter;
>>  	struct intel_engine_cs *engine;
>>-	unsigned int i;
>>+	struct intel_gt *gt;
>>+	unsigned int i, j;
>>  	/* Count how many counters we will be exposing. */
>>-	for (i = 0; i < ARRAY_SIZE(events); i++) {
>>-		if (!config_status(i915, events[i].config))
>>+	/* per gt counters */
>
>Two comments one by another, two styles - the inconsistency hurts.
>
>Not sure why global events needed to be split out into a separate array? Like this, two loops are needed below for each stage instead of one. AFAIR one array and one loop would just work because config_status would report global ones as unsupported for gt > 0.

The idea was to add the legacy events into the global array. These 
events will not have -gtN appended to them. Note that on a single gt 
platform, my idea is to have both legacy as well as gt0 events.

ADLP:
actual-frequency
actual-frequency-gt0

MTL:
actual-frequency
actual-frequency-gt0
actual-frequency-gt1
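
The naming scheme listed above can be sketched as a small helper. This 
is only an illustration: the function name and the "gt < 0 means legacy" 
convention are assumptions for the example, not what the driver does 
internally (the driver uses kstrdup()/kasprintf() as in the patch).

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of i915: build an event name.
 * gt < 0 selects the legacy (aggregate) name without a suffix,
 * gt >= 0 selects the GT-specific name with "-gtN" appended.
 * Returns the length snprintf() reports for the formatted name.
 */
static int sketch_event_name(char *buf, size_t len, const char *base, int gt)
{
	assert(buf && base && len > 0);

	if (gt < 0)
		return snprintf(buf, len, "%s", base);

	return snprintf(buf, len, "%s-gt%d", base, gt);
}
```

On a single gt platform this would be called with gt = -1 and gt = 0, 
on MTL with gt = -1, 0 and 1.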

>
>[Comes back later. It looked like this in my code:
>
>        static const struct {
>-               u64 config;
>+               unsigned int counter;
>                const char *name;
>                const char *unit;
>+               bool global;
>        } events[] = {
>-               __event(I915_PMU_ACTUAL_FREQUENCY, "actual-frequency", "M"),
>-               __event(I915_PMU_REQUESTED_FREQUENCY, "requested-frequency", "M"),
>-               __event(I915_PMU_INTERRUPTS, "interrupts", NULL),
>-               __event(I915_PMU_RC6_RESIDENCY, "rc6-residency", "ns"),
>+               /*
>+                * #define __I915_PMU_ACTUAL_FREQUENCY(gt)    ___I915_PMU_OTHER(gt, 0)
>+                * #define __I915_PMU_REQUESTED_FREQUENCY(gt) ___I915_PMU_OTHER(gt, 1)
>+                * #define __I915_PMU_INTERRUPTS(gt)          ___I915_PMU_OTHER(gt, 2)
>+                * #define __I915_PMU_RC6_RESIDENCY(gt)       ___I915_PMU_OTHER(gt, 3)
>+                */
>+               __event(0, "actual-frequency", "M"),
>+               __event(1, "requested-frequency", "M"),
>+               __global_event(2, "interrupts", NULL),
>+               __event(3, "rc6-residency", "ns"),
>
>...
>
>        /* Count how many counters we will be exposing. */
>-       for (i = 0; i < ARRAY_SIZE(events); i++) {
>-               if (!config_status(i915, events[i].config))
>-                       count++;
>+       for_each_gt(i915, j, gt) {
>+               for (i = 0; i < ARRAY_SIZE(events); i++) {
>+                       u64 config = ___I915_PMU_OTHER(j, events[i].counter);
>+
>+                       if (!config_status(i915, config))
>+                               count++;
>+               }
>
>So AFAICT it just worked.

If we decide to drop 9/9, then I would drop 7/9 and 8/9 and just 
move back to your original patches because they worked as is. The only 
open question then would be if we want to have the -gt0 events as well 
for single gt platforms.

The idea is to make this similar to what's implemented for the sysfs 
frequency/rc6 attributes in /sys/class/drm/card0. There is a root version 
as well as a gt/gt0 version. From what I understand, gt/gt0 attributes 
are used on a single gt platform.

>
>]
>
>>+	for_each_gt(gt, i915, j) {
>>+		for (i = 0; i < ARRAY_SIZE(events); i++) {
>>+			u64 config = ___I915_PMU_OTHER(j, events[i].counter);
>>+
>>+			if (!config_status(i915, config))
>>+				count++;
>>+		}
>>+	}
>>+
>>+	/* global (per GPU) counters */
>>+	for (i = 0; i < ARRAY_SIZE(global_events); i++) {
>>+		u64 config = ___I915_PMU_OTHER(0, global_events[i].counter);
>>+
>>+		if (!config_status(i915, config))
>>  			count++;
>>  	}
>>+	/* per engine counters */
>>  	for_each_uabi_engine(engine, i915) {
>>  		for (i = 0; i < ARRAY_SIZE(engine_events); i++) {
>>  			if (!engine_event_status(engine,
>>@@ -1033,26 +1057,60 @@ create_event_attributes(struct i915_pmu *pmu)
>>  	attr_iter = attr;
>>  	/* Initialize supported non-engine counters. */
>>-	for (i = 0; i < ARRAY_SIZE(events); i++) {
>>+	/* per gt counters */
>>+	for_each_gt(gt, i915, j) {
>>+		for (i = 0; i < ARRAY_SIZE(events); i++) {
>>+			u64 config = ___I915_PMU_OTHER(j, events[i].counter);
>>+			char *str;
>>+
>>+			if (config_status(i915, config))
>>+				continue;
>>+
>>+			str = kasprintf(GFP_KERNEL, "%s-gt%u",
>>+					events[i].name, j);
>
>So with this patch all old platforms change the event names. This is not how I wrote it, and more importantly, it breaks userspace. Why would we do it?

With this patch alone, yes, this would break the uapi. With the series, 
not really because those events are added back in 9/9. Should we retain 
uapi compatibility in each patch? If yes, then I need to change this.

>
>For reference I dug out my code from 2020 and it looked like this:
>
>+                       if (events[i].global || !i915->remote_tiles)
>+                               str = kstrdup(events[i].name, GFP_KERNEL);
>+                       else
>+                               str = kasprintf(GFP_KERNEL, "%s-gt%u",
>+                                               events[i].name, j);
>
>So on single tile platforms names remain the same.

The series still maintains the same idea, but also adds xxxx-gt0 events 
for the old platforms.

Thanks,
Umesh

>
>Regards,
>
>Tvrtko
>
>>+			if (!str)
>>+				goto err;
>>+
>>+			*attr_iter++ = &i915_iter->attr.attr;
>>+			i915_iter = add_i915_attr(i915_iter, str, config);
>>+
>>+			if (events[i].unit) {
>>+				str = kasprintf(GFP_KERNEL, "%s-gt%u.unit",
>>+						events[i].name, j);
>>+				if (!str)
>>+					goto err;
>>+
>>+				*attr_iter++ = &pmu_iter->attr.attr;
>>+				pmu_iter = add_pmu_attr(pmu_iter, str,
>>+							events[i].unit);
>>+			}
>>+		}
>>+	}
>>+
>>+	/* global (per GPU) counters */
>>+	for (i = 0; i < ARRAY_SIZE(global_events); i++) {
>>+		u64 config = ___I915_PMU_OTHER(0, global_events[i].counter);
>>  		char *str;
>>-		if (config_status(i915, events[i].config))
>>+		if (config_status(i915, config))
>>  			continue;
>>-		str = kstrdup(events[i].name, GFP_KERNEL);
>>+		str = kstrdup(global_events[i].name, GFP_KERNEL);
>>  		if (!str)
>>  			goto err;
>>  		*attr_iter++ = &i915_iter->attr.attr;
>>-		i915_iter = add_i915_attr(i915_iter, str, events[i].config);
>>+		i915_iter = add_i915_attr(i915_iter, str, config);
>>-		if (events[i].unit) {
>>-			str = kasprintf(GFP_KERNEL, "%s.unit", events[i].name);
>>+		if (global_events[i].unit) {
>>+			str = kasprintf(GFP_KERNEL, "%s.unit",
>>+					global_events[i].name);
>>  			if (!str)
>>  				goto err;
>>  			*attr_iter++ = &pmu_iter->attr.attr;
>>-			pmu_iter = add_pmu_attr(pmu_iter, str, events[i].unit);
>>+			pmu_iter = add_pmu_attr(pmu_iter, str,
>>+						global_events[i].unit);
>>  		}
>>  	}

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Intel-gfx] [PATCH 9/9] drm/i915/pmu: Enable legacy PMU events for MTL
  2023-03-30 13:38   ` Tvrtko Ursulin
@ 2023-03-30 18:31     ` Umesh Nerlige Ramappa
  2023-03-31 13:02       ` Tvrtko Ursulin
  2023-04-03 19:16       ` Umesh Nerlige Ramappa
  0 siblings, 2 replies; 28+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-03-30 18:31 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

+ Joonas for comments on this

On Thu, Mar 30, 2023 at 02:38:03PM +0100, Tvrtko Ursulin wrote:
>
>On 30/03/2023 01:41, Umesh Nerlige Ramappa wrote:
>>MTL introduces separate GTs for render and media. This complicates the
>>definition of frequency and rc6 counters for the GPU as a whole since
>>each GT has an independent counter. The best way to support this change
>>is to deprecate the GPU-specific counters and create GT-specific
>>counters, however that just breaks ABI. Since perf tools and scripts may
>>be decentralized with probably many users, it's hard to deprecate the
>>legacy counters and have all the users on board with that.
>>
>>Re-introduce the legacy counters and support them as min/max of
>>GT-specific counters as necessary to ensure backwards compatibility.
>>
>>I915_PMU_ACTUAL_FREQUENCY - will show max of GT-specific counters
>>I915_PMU_REQUESTED_FREQUENCY - will show max of GT-specific counters
>>I915_PMU_INTERRUPTS - no changes since it is GPU specific on all platforms
>>I915_PMU_RC6_RESIDENCY - will show min of GT-specific counters
>>I915_PMU_SOFTWARE_GT_AWAKE_TIME - will show max of GT-specific counters
>
>IMO max/min games are _very_ low value and probably just confusing.

By value, do you mean ROI or actually that the values would be 
incorrect?
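
To make the question concrete, here is a standalone sketch of what the 
legacy counters would report under the min/max scheme from the commit 
message. The helper names and the plain arrays are assumptions for the 
example; the patch itself loops with for_each_gt() over the driver's own 
samples.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: legacy frequency / awake time = max over all GTs. */
static uint64_t sketch_legacy_max(const uint64_t *per_gt, size_t num_gts)
{
	uint64_t val = 0;
	size_t i;

	assert(per_gt && num_gts > 0);

	for (i = 0; i < num_gts; i++)
		if (per_gt[i] > val)
			val = per_gt[i];

	return val;
}

/* Hypothetical: legacy rc6 residency = min over all GTs. */
static uint64_t sketch_legacy_min(const uint64_t *per_gt, size_t num_gts)
{
	uint64_t val = UINT64_MAX;
	size_t i;

	assert(per_gt && num_gts > 0);

	for (i = 0; i < num_gts; i++)
		if (per_gt[i] < val)
			val = per_gt[i];

	return val;
}
```

So with one GT mostly idle and the other mostly busy, the legacy rc6 
value only reflects the busy GT and the legacy frequency only reflects 
the faster GT.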

>
>I am not convinced we need to burden the kernel with this. New 
>platform, new counters.. userspace can just deal with it.

I agree and would prefer to drop this patch. There are some counter 
arguments though, so I have added Joonas here for comments.

1) an app/script hard-coded with the legacy events would fail when used 
on a new platform, so we should maintain backwards compatibility.

2) the sysfs attributes for rc6/frequency have already adopted an 
aggregate vs gt0/gt1 approach to address that and pmu should have a 
similar solution (or rather, the PMU and sysfs approaches should match, 
whichever approach is chosen)
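
For reference, a userspace-style sketch of the config encoding this 
patch proposes, where slot 0 is the aggregate (legacy) event and slot 
gt + 1 is the GT-specific one. The SKETCH_* names are hypothetical, and 
the two constants (shift of 60, base of 0x100000) are my assumption 
about what __I915_PMU_GT_SHIFT and __I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 
evaluate to; they are not spelled out in this thread, so check them 
against i915_drm.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, see the note above; not taken from this thread. */
#define SKETCH_PMU_GT_SHIFT	60
#define SKETCH_PMU_OTHER_BASE	0x100000ULL

/* Mirrors the shape of ___I915_PMU_OTHER(gt, x) with a raw slot number. */
static uint64_t sketch_pmu_other(unsigned int slot, unsigned int counter)
{
	/* Only 4 bits are left above bit 60 in a 64-bit config. */
	assert(slot < 16);

	return (SKETCH_PMU_OTHER_BASE + counter) |
	       ((uint64_t)slot << SKETCH_PMU_GT_SHIFT);
}

/* Legacy/aggregate event: slot 0, same config value as before. */
static uint64_t sketch_legacy_config(unsigned int counter)
{
	return sketch_pmu_other(0, counter);
}

/* GT-specific event as proposed in this patch: GT n lives in slot n + 1. */
static uint64_t sketch_gt_config(unsigned int gt, unsigned int counter)
{
	return sketch_pmu_other(gt + 1, counter);
}
```

The point of the off-by-one is that a script using the old config values 
keeps hitting slot 0, while gt0 gets its own distinct config.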

Regards,
Umesh

>
>In intel_gpu_top we can do the smarts in maybe default aggregated view 
>(piggy back/extend on engines aggregation via command line '-p' or '1' 
>at runtime). But then it's not min/max but probably normalized by 
>number of gts.
>
>Regards,
>
>Tvrtko
>
>>
>>Note:
>>- For deeper debugging of performance issues, tools must be upgraded to
>>   read the GT-specific counters.
>>- This patch deserves to be separate from the other PMU features so that
>>   it can be easily dropped if legacy events are ever deprecated.
>>- Internal implementation relies on creating an extra entry in the
>>   arrays used for GT specific counters. Index 0 is empty.
>>   Index 1 through N are mapped to GTs 0 through N - 1.
>>- User interface will use GT numbers indexed from 0 to specify the GT of
>>   interest.
>>
>>Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>>---
>>  drivers/gpu/drm/i915/i915_pmu.c | 134 +++++++++++++++++++++++++++-----
>>  drivers/gpu/drm/i915/i915_pmu.h |   2 +-
>>  include/uapi/drm/i915_drm.h     |  14 ++--
>>  3 files changed, 125 insertions(+), 25 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
>>index 9bd9605d2662..0dc7711c3b4b 100644
>>--- a/drivers/gpu/drm/i915/i915_pmu.c
>>+++ b/drivers/gpu/drm/i915/i915_pmu.c
>>@@ -221,7 +221,7 @@ add_sample_mult(struct i915_pmu *pmu, unsigned int gt_id, int sample, u32 val,
>>  static u64 get_rc6(struct intel_gt *gt)
>>  {
>>  	struct drm_i915_private *i915 = gt->i915;
>>-	const unsigned int gt_id = gt->info.id;
>>+	const unsigned int gt_id = gt->info.id + 1;
>>  	struct i915_pmu *pmu = &i915->pmu;
>>  	unsigned long flags;
>>  	bool awake = false;
>>@@ -267,24 +267,26 @@ static void init_rc6(struct i915_pmu *pmu)
>>  	for_each_gt(gt, i915, i) {
>>  		intel_wakeref_t wakeref;
>>+		const unsigned int gt_id = i + 1;
>>  		with_intel_runtime_pm(gt->uncore->rpm, wakeref) {
>>  			u64 val = __get_rc6(gt);
>>-			store_sample(pmu, i, __I915_SAMPLE_RC6, val);
>>-			store_sample(pmu, i, __I915_SAMPLE_RC6_LAST_REPORTED,
>>+			store_sample(pmu, gt_id, __I915_SAMPLE_RC6, val);
>>+			store_sample(pmu, gt_id, __I915_SAMPLE_RC6_LAST_REPORTED,
>>  				     val);
>>-			pmu->sleep_last[i] = ktime_get_raw();
>>+			pmu->sleep_last[gt_id] = ktime_get_raw();
>>  		}
>>  	}
>>  }
>>  static void park_rc6(struct intel_gt *gt)
>>  {
>>+	const unsigned int gt_id = gt->info.id + 1;
>>  	struct i915_pmu *pmu = &gt->i915->pmu;
>>-	store_sample(pmu, gt->info.id, __I915_SAMPLE_RC6, __get_rc6(gt));
>>-	pmu->sleep_last[gt->info.id] = ktime_get_raw();
>>+	store_sample(pmu, gt_id, __I915_SAMPLE_RC6, __get_rc6(gt));
>>+	pmu->sleep_last[gt_id] = ktime_get_raw();
>>  }
>>  static void __i915_pmu_maybe_start_timer(struct i915_pmu *pmu)
>>@@ -436,18 +438,18 @@ static void
>>  frequency_sample(struct intel_gt *gt, unsigned int period_ns)
>>  {
>>  	struct drm_i915_private *i915 = gt->i915;
>>-	const unsigned int gt_id = gt->info.id;
>>+	const unsigned int gt_id = gt->info.id + 1;
>>  	struct i915_pmu *pmu = &i915->pmu;
>>  	struct intel_rps *rps = &gt->rps;
>>-	if (!frequency_sampling_enabled(pmu, gt_id))
>>+	if (!frequency_sampling_enabled(pmu, gt->info.id))
>>  		return;
>>  	/* Report 0/0 (actual/requested) frequency while parked. */
>>  	if (!intel_gt_pm_get_if_awake(gt))
>>  		return;
>>-	if (pmu->enable & config_mask(__I915_PMU_ACTUAL_FREQUENCY(gt_id))) {
>>+	if (pmu->enable & config_mask(__I915_PMU_ACTUAL_FREQUENCY(gt->info.id))) {
>>  		u32 val;
>>  		/*
>>@@ -467,7 +469,7 @@ frequency_sample(struct intel_gt *gt, unsigned int period_ns)
>>  				val, period_ns / 1000);
>>  	}
>>-	if (pmu->enable & config_mask(__I915_PMU_REQUESTED_FREQUENCY(gt_id))) {
>>+	if (pmu->enable & config_mask(__I915_PMU_REQUESTED_FREQUENCY(gt->info.id))) {
>>  		add_sample_mult(pmu, gt_id, __I915_SAMPLE_FREQ_REQ,
>>  				intel_rps_get_requested_frequency(rps),
>>  				period_ns / 1000);
>>@@ -545,14 +547,15 @@ engine_event_status(struct intel_engine_cs *engine,
>>  static int
>>  config_status(struct drm_i915_private *i915, u64 config)
>>  {
>>-	struct intel_gt *gt = to_gt(i915);
>>-
>>  	unsigned int gt_id = config_gt_id(config);
>>-	unsigned int max_gt_id = HAS_EXTRA_GT_LIST(i915) ? 1 : 0;
>>+	unsigned int max_gt_id = HAS_EXTRA_GT_LIST(i915) ? 2 : 1;
>>+	struct intel_gt *gt;
>>  	if (gt_id > max_gt_id)
>>  		return -ENOENT;
>>+	gt = !gt_id ? to_gt(i915) : i915->gt[gt_id - 1];
>>+
>>  	switch (config_counter(config)) {
>>  	case I915_PMU_ACTUAL_FREQUENCY:
>>  		if (IS_VALLEYVIEW(i915) || IS_CHERRYVIEW(i915))
>>@@ -673,23 +676,58 @@ static u64 __i915_pmu_event_read_other(struct perf_event *event)
>>  	const unsigned int gt_id = config_gt_id(event->attr.config);
>>  	const u64 config = config_counter(event->attr.config);
>>  	struct i915_pmu *pmu = &i915->pmu;
>>+	struct intel_gt *gt;
>>  	u64 val = 0;
>>+	int i;
>>  	switch (config) {
>>  	case I915_PMU_ACTUAL_FREQUENCY:
>>-		val = read_sample_us(pmu, gt_id, __I915_SAMPLE_FREQ_ACT);
>>+		if (gt_id)
>>+			return read_sample_us(pmu, gt_id, __I915_SAMPLE_FREQ_ACT);
>>+
>>+		if (!HAS_EXTRA_GT_LIST(i915))
>>+			return read_sample_us(pmu, 1, __I915_SAMPLE_FREQ_ACT);
>>+
>>+		for_each_gt(gt, i915, i)
>>+			val = max(val, read_sample_us(pmu, i + 1, __I915_SAMPLE_FREQ_ACT));
>>+
>>  		break;
>>  	case I915_PMU_REQUESTED_FREQUENCY:
>>-		val = read_sample_us(pmu, gt_id, __I915_SAMPLE_FREQ_REQ);
>>+		if (gt_id)
>>+			return read_sample_us(pmu, gt_id, __I915_SAMPLE_FREQ_REQ);
>>+
>>+		if (!HAS_EXTRA_GT_LIST(i915))
>>+			return read_sample_us(pmu, 1, __I915_SAMPLE_FREQ_REQ);
>>+
>>+		for_each_gt(gt, i915, i)
>>+			val = max(val, read_sample_us(pmu, i + 1, __I915_SAMPLE_FREQ_REQ));
>>+
>>  		break;
>>  	case I915_PMU_INTERRUPTS:
>>  		val = READ_ONCE(pmu->irq_count);
>>  		break;
>>  	case I915_PMU_RC6_RESIDENCY:
>>-		val = get_rc6(i915->gt[gt_id]);
>>+		if (gt_id)
>>+			return get_rc6(i915->gt[gt_id - 1]);
>>+
>>+		if (!HAS_EXTRA_GT_LIST(i915))
>>+			return get_rc6(i915->gt[0]);
>>+
>>+		val = U64_MAX;
>>+		for_each_gt(gt, i915, i)
>>+			val = min(val, get_rc6(gt));
>>+
>>  		break;
>>  	case I915_PMU_SOFTWARE_GT_AWAKE_TIME:
>>-		val = ktime_to_ns(intel_gt_get_awake_time(to_gt(i915)));
>>+		if (gt_id)
>>+			return ktime_to_ns(intel_gt_get_awake_time(i915->gt[gt_id - 1]));
>>+
>>+		if (!HAS_EXTRA_GT_LIST(i915))
>>+			return ktime_to_ns(intel_gt_get_awake_time(i915->gt[0]));
>>+
>>+		val = 0;
>>+		for_each_gt(gt, i915, i)
>>+			val = max((s64)val, ktime_to_ns(intel_gt_get_awake_time(gt)));
>>  		break;
>>  	}
>>@@ -728,11 +766,14 @@ static void i915_pmu_event_read(struct perf_event *event)
>>  static void i915_pmu_enable(struct perf_event *event)
>>  {
>>+	const unsigned int gt_id = config_gt_id(event->attr.config);
>>  	struct drm_i915_private *i915 =
>>  		container_of(event->pmu, typeof(*i915), pmu.base);
>>  	struct i915_pmu *pmu = &i915->pmu;
>>+	struct intel_gt *gt;
>>  	unsigned long flags;
>>  	unsigned int bit;
>>+	u64 i;
>>  	bit = event_bit(event);
>>  	if (bit == -1)
>>@@ -745,12 +786,42 @@ static void i915_pmu_enable(struct perf_event *event)
>>  	 * the event reference counter.
>>  	 */
>>  	BUILD_BUG_ON(ARRAY_SIZE(pmu->enable_count) != I915_PMU_MASK_BITS);
>>+	BUILD_BUG_ON(BITS_PER_TYPE(pmu->enable) < I915_PMU_MASK_BITS);
>>  	GEM_BUG_ON(bit >= ARRAY_SIZE(pmu->enable_count));
>>  	GEM_BUG_ON(pmu->enable_count[bit] == ~0);
>>  	pmu->enable |= BIT_ULL(bit);
>>  	pmu->enable_count[bit]++;
>>+	/*
>>+	 * The arrays that i915_pmu maintains are now indexed as
>>+	 *
>>+	 * 0 - aggregate events (a.k.a !gt_id)
>>+	 * 1 - gt0
>>+	 * 2 - gt1
>>+	 *
>>+	 * The same logic applies to event_bit masks. The first set of mask are
>>+	 * for aggregate, followed by gt0 and gt1 masks. The idea here is to
>>+	 * enable the event on all gts if the aggregate event bit is set. This
>>+	 * applies only to the non-engine-events.
>>+	 */
>>+	if (!gt_id && !is_engine_event(event)) {
>>+		for_each_gt(gt, i915, i) {
>>+			u64 counter = config_counter(event->attr.config);
>>+			u64 config = ((i + 1) << __I915_PMU_GT_SHIFT) | counter;
>>+			unsigned int bit = config_bit(config);
>>+
>>+			if (bit == -1)
>>+				continue;
>>+
>>+			GEM_BUG_ON(bit >= ARRAY_SIZE(pmu->enable_count));
>>+			GEM_BUG_ON(pmu->enable_count[bit] == ~0);
>>+
>>+			pmu->enable |= BIT_ULL(bit);
>>+			pmu->enable_count[bit]++;
>>+		}
>>+	}
>>+
>>  	/*
>>  	 * Start the sampling timer if needed and not already enabled.
>>  	 */
>>@@ -793,6 +864,7 @@ static void i915_pmu_enable(struct perf_event *event)
>>  static void i915_pmu_disable(struct perf_event *event)
>>  {
>>+	const unsigned int gt_id = config_gt_id(event->attr.config);
>>  	struct drm_i915_private *i915 =
>>  		container_of(event->pmu, typeof(*i915), pmu.base);
>>  	unsigned int bit = event_bit(event);
>>@@ -822,6 +894,26 @@ static void i915_pmu_disable(struct perf_event *event)
>>  		 */
>>  		if (--engine->pmu.enable_count[sample] == 0)
>>  			engine->pmu.enable &= ~BIT(sample);
>>+	} else if (!gt_id) {
>>+		struct intel_gt *gt;
>>+		u64 i;
>>+
>>+		for_each_gt(gt, i915, i) {
>>+			u64 counter = config_counter(event->attr.config);
>>+			u64 config = ((i + 1) << __I915_PMU_GT_SHIFT) | counter;
>>+			unsigned int bit = config_bit(config);
>>+
>>+			if (bit == -1)
>>+				continue;
>>+
>>+			GEM_BUG_ON(bit >= ARRAY_SIZE(pmu->enable_count));
>>+			GEM_BUG_ON(pmu->enable_count[bit] == 0);
>>+
>>+			if (--pmu->enable_count[bit] == 0) {
>>+				pmu->enable &= ~BIT_ULL(bit);
>>+				pmu->timer_enabled &= pmu_needs_timer(pmu, true);
>>+			}
>>+		}
>>  	}
>>  	GEM_BUG_ON(bit >= ARRAY_SIZE(pmu->enable_count));
>>@@ -1002,7 +1094,11 @@ create_event_attributes(struct i915_pmu *pmu)
>>  		const char *name;
>>  		const char *unit;
>>  	} global_events[] = {
>>+		__event(0, "actual-frequency", "M"),
>>+		__event(1, "requested-frequency", "M"),
>>  		__event(2, "interrupts", NULL),
>>+		__event(3, "rc6-residency", "ns"),
>>+		__event(4, "software-gt-awake-time", "ns"),
>>  	};
>>  	static const struct {
>>  		enum drm_i915_pmu_engine_sample sample;
>>@@ -1024,7 +1120,7 @@ create_event_attributes(struct i915_pmu *pmu)
>>  	/* per gt counters */
>>  	for_each_gt(gt, i915, j) {
>>  		for (i = 0; i < ARRAY_SIZE(events); i++) {
>>-			u64 config = ___I915_PMU_OTHER(j, events[i].counter);
>>+			u64 config = ___I915_PMU_OTHER(j + 1, events[i].counter);
>>  			if (!config_status(i915, config))
>>  				count++;
>>@@ -1070,7 +1166,7 @@ create_event_attributes(struct i915_pmu *pmu)
>>  	/* per gt counters */
>>  	for_each_gt(gt, i915, j) {
>>  		for (i = 0; i < ARRAY_SIZE(events); i++) {
>>-			u64 config = ___I915_PMU_OTHER(j, events[i].counter);
>>+			u64 config = ___I915_PMU_OTHER(j + 1, events[i].counter);
>>  			char *str;
>>  			if (config_status(i915, config))
>>diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h
>>index a708e44a227e..a4cc1eb218fc 100644
>>--- a/drivers/gpu/drm/i915/i915_pmu.h
>>+++ b/drivers/gpu/drm/i915/i915_pmu.h
>>@@ -38,7 +38,7 @@ enum {
>>  	__I915_NUM_PMU_SAMPLERS
>>  };
>>-#define I915_PMU_MAX_GTS (4) /* FIXME */
>>+#define I915_PMU_MAX_GTS (4 + 1) /* FIXME */
>>  /**
>>   * How many different events we track in the global PMU mask.
>>diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>>index bbab7f3dbeb4..18794c30027f 100644
>>--- a/include/uapi/drm/i915_drm.h
>>+++ b/include/uapi/drm/i915_drm.h
>>@@ -290,6 +290,7 @@ enum drm_i915_pmu_engine_sample {
>>  	(((__u64)__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x)) | \
>>  	((__u64)(gt) << __I915_PMU_GT_SHIFT))
>>+/* Aggregate from all gts */
>>  #define __I915_PMU_OTHER(x) ___I915_PMU_OTHER(0, x)
>>  #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
>>@@ -300,11 +301,14 @@ enum drm_i915_pmu_engine_sample {
>>  #define I915_PMU_LAST /* Deprecated - do not use */ I915_PMU_RC6_RESIDENCY
>>-#define __I915_PMU_ACTUAL_FREQUENCY(gt)		___I915_PMU_OTHER(gt, 0)
>>-#define __I915_PMU_REQUESTED_FREQUENCY(gt)	___I915_PMU_OTHER(gt, 1)
>>-#define __I915_PMU_INTERRUPTS(gt)		___I915_PMU_OTHER(gt, 2)
>>-#define __I915_PMU_RC6_RESIDENCY(gt)		___I915_PMU_OTHER(gt, 3)
>>-#define __I915_PMU_SOFTWARE_GT_AWAKE_TIME(gt)	___I915_PMU_OTHER(gt, 4)
>>+/* GT specific counters */
>>+#define ____I915_PMU_OTHER(gt, x) ___I915_PMU_OTHER(((gt) + 1), x)
>>+
>>+#define __I915_PMU_ACTUAL_FREQUENCY(gt)		____I915_PMU_OTHER(gt, 0)
>>+#define __I915_PMU_REQUESTED_FREQUENCY(gt)	____I915_PMU_OTHER(gt, 1)
>>+#define __I915_PMU_INTERRUPTS(gt)		____I915_PMU_OTHER(gt, 2)
>>+#define __I915_PMU_RC6_RESIDENCY(gt)		____I915_PMU_OTHER(gt, 3)
>>+#define __I915_PMU_SOFTWARE_GT_AWAKE_TIME(gt)	____I915_PMU_OTHER(gt, 4)
>>  /* Each region is a minimum of 16k, and there are at most 255 of them.
>>   */
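
The indexing change in the quoted diff (slot 0 for the aggregate, slot
gt + 1 for each GT) and the aggregate reads (highest frequency, lowest
RC6 residency across GTs) can be sketched in standalone C. This is only
an illustration under stated assumptions: every sketch_* name and the
two-GT layout are hypothetical, and none of this is the driver code.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NUM_GTS   2                    /* hypothetical: gt0 and gt1 */
#define SKETCH_NUM_SLOTS (SKETCH_NUM_GTS + 1) /* slot 0 is the aggregate */

/* Map a GT id to its slot in the per-GT sample arrays (0 is reserved). */
static unsigned int sketch_gt_slot(unsigned int gt)
{
	return gt + 1;
}

/* Aggregate frequency: the highest value sampled on any GT. */
static uint64_t sketch_aggregate_freq(const uint64_t freq[SKETCH_NUM_SLOTS])
{
	uint64_t val = 0;
	unsigned int gt;

	for (gt = 0; gt < SKETCH_NUM_GTS; gt++) {
		uint64_t sample = freq[sketch_gt_slot(gt)];

		if (sample > val)
			val = sample;
	}

	return val;
}

/* Aggregate RC6: the lowest residency reported by any GT. */
static uint64_t sketch_aggregate_rc6(const uint64_t rc6[SKETCH_NUM_SLOTS])
{
	uint64_t val = UINT64_MAX;
	unsigned int gt;

	for (gt = 0; gt < SKETCH_NUM_GTS; gt++) {
		uint64_t sample = rc6[sketch_gt_slot(gt)];

		if (sample < val)
			val = sample;
	}

	return val;
}
```

Starting the RC6 reduction from UINT64_MAX mirrors the U64_MAX seed in
the diff, so the first GT always replaces the initial value.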

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Intel-gfx] ✓ Fi.CI.IGT: success for Add MTL PMU support for multi-gt
  2023-03-30  0:40 [Intel-gfx] [PATCH 0/7] Add MTL PMU support for multi-gt Umesh Nerlige Ramappa
                   ` (11 preceding siblings ...)
  2023-03-30  1:46 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
@ 2023-03-30 19:50 ` Patchwork
  12 siblings, 0 replies; 28+ messages in thread
From: Patchwork @ 2023-03-30 19:50 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 14733 bytes --]

== Series Details ==

Series: Add MTL PMU support for multi-gt
URL   : https://patchwork.freedesktop.org/series/115836/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_12937_full -> Patchwork_115836v1_full
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  

Participating hosts (7 -> 7)
------------------------------

  No changes in participating hosts

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_115836v1_full:

### IGT changes ###

#### Possible regressions ####

  * {igt@perf_pmu@rc6-all-gts} (NEW):
    - {shard-dg1}:        NOTRUN -> [SKIP][1] +1 similar issue
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v1/shard-dg1-14/igt@perf_pmu@rc6-all-gts.html
    - {shard-tglu}:       NOTRUN -> [SKIP][2] +1 similar issue
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v1/shard-tglu-9/igt@perf_pmu@rc6-all-gts.html

  
New tests
---------

  New tests have been introduced between CI_DRM_12937_full and Patchwork_115836v1_full:

### New IGT tests (7) ###

  * igt@perf_pmu@frequency@gt0:
    - Statuses : 5 pass(s)
    - Exec time: [0.0] s

  * igt@perf_pmu@frequency@idle-gt0:
    - Statuses : 5 pass(s)
    - Exec time: [0.0] s

  * igt@perf_pmu@rc6-all-gts:
    - Statuses : 5 skip(s)
    - Exec time: [0.0] s

  * igt@perf_pmu@rc6@gt0:
    - Statuses : 5 pass(s)
    - Exec time: [0.0] s

  * igt@perf_pmu@rc6@other-idle-gt0:
    - Statuses : 5 skip(s)
    - Exec time: [0.0] s

  * igt@perf_pmu@rc6@runtime-pm-gt0:
    - Statuses : 4 pass(s) 1 skip(s)
    - Exec time: [0.0] s

  * igt@perf_pmu@rc6@runtime-pm-long-gt0:
    - Statuses : 4 pass(s) 1 skip(s)
    - Exec time: [0.0] s

  

Known issues
------------

  Here are the changes found in Patchwork_115836v1_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_fair@basic-deadline:
    - shard-apl:          [PASS][3] -> [FAIL][4] ([i915#2846])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12937/shard-apl1/igt@gem_exec_fair@basic-deadline.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v1/shard-apl2/igt@gem_exec_fair@basic-deadline.html

  * igt@gem_exec_fair@basic-pace-share@rcs0:
    - shard-glk:          [PASS][5] -> [FAIL][6] ([i915#2842])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12937/shard-glk7/igt@gem_exec_fair@basic-pace-share@rcs0.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v1/shard-glk7/igt@gem_exec_fair@basic-pace-share@rcs0.html

  * igt@gem_exec_schedule@thriceslice:
    - shard-snb:          NOTRUN -> [SKIP][7] ([fdo#109271]) +59 similar issues
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v1/shard-snb5/igt@gem_exec_schedule@thriceslice.html

  * igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions:
    - shard-glk:          [PASS][8] -> [FAIL][9] ([i915#2346]) +1 similar issue
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12937/shard-glk2/igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions.html
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v1/shard-glk4/igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions.html

  * igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions-varying-size:
    - shard-apl:          [PASS][10] -> [FAIL][11] ([i915#2346]) +1 similar issue
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12937/shard-apl3/igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions-varying-size.html
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v1/shard-apl3/igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions-varying-size.html

  * igt@kms_flip@2x-plain-flip-fb-recreate-interruptible@ab-hdmi-a1-hdmi-a2:
    - shard-glk:          [PASS][12] -> [FAIL][13] ([i915#2122])
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12937/shard-glk8/igt@kms_flip@2x-plain-flip-fb-recreate-interruptible@ab-hdmi-a1-hdmi-a2.html
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v1/shard-glk8/igt@kms_flip@2x-plain-flip-fb-recreate-interruptible@ab-hdmi-a1-hdmi-a2.html

  * igt@kms_flip@flip-vs-expired-vblank-interruptible@a-dp1:
    - shard-apl:          [PASS][14] -> [FAIL][15] ([i915#79])
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12937/shard-apl7/igt@kms_flip@flip-vs-expired-vblank-interruptible@a-dp1.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v1/shard-apl1/igt@kms_flip@flip-vs-expired-vblank-interruptible@a-dp1.html

  * {igt@perf_pmu@rc6-all-gts} (NEW):
    - shard-apl:          NOTRUN -> [SKIP][16] ([fdo#109271]) +1 similar issue
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v1/shard-apl4/igt@perf_pmu@rc6-all-gts.html
    - shard-glk:          NOTRUN -> [SKIP][17] ([fdo#109271]) +1 similar issue
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v1/shard-glk8/igt@perf_pmu@rc6-all-gts.html

  
#### Possible fixes ####

  * igt@gem_exec_endless@dispatch@rcs0:
    - {shard-tglu}:       [TIMEOUT][18] ([i915#3778]) -> [PASS][19]
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12937/shard-tglu-5/igt@gem_exec_endless@dispatch@rcs0.html
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v1/shard-tglu-10/igt@gem_exec_endless@dispatch@rcs0.html

  * igt@gem_exec_whisper@basic-fds-forked-all:
    - {shard-tglu}:       [INCOMPLETE][20] ([i915#6755] / [i915#7663]) -> [PASS][21]
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12937/shard-tglu-10/igt@gem_exec_whisper@basic-fds-forked-all.html
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v1/shard-tglu-10/igt@gem_exec_whisper@basic-fds-forked-all.html

  * igt@i915_pm_rc6_residency@rc6-idle@rcs0:
    - {shard-tglu}:       [WARN][22] ([i915#2681]) -> [PASS][23]
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12937/shard-tglu-2/igt@i915_pm_rc6_residency@rc6-idle@rcs0.html
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v1/shard-tglu-9/igt@i915_pm_rc6_residency@rc6-idle@rcs0.html

  * igt@sysfs_heartbeat_interval@precise@rcs0:
    - {shard-dg1}:        [FAIL][24] ([i915#1755]) -> [PASS][25] +4 similar issues
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12937/shard-dg1-18/igt@sysfs_heartbeat_interval@precise@rcs0.html
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v1/shard-dg1-16/igt@sysfs_heartbeat_interval@precise@rcs0.html

  
#### Warnings ####

  * igt@i915_pm_rps@reset:
    - shard-snb:          [DMESG-FAIL][26] ([i915#8319]) -> [INCOMPLETE][27] ([i915#7790])
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12937/shard-snb4/igt@i915_pm_rps@reset.html
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v1/shard-snb2/igt@i915_pm_rps@reset.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109274]: https://bugs.freedesktop.org/show_bug.cgi?id=109274
  [fdo#109280]: https://bugs.freedesktop.org/show_bug.cgi?id=109280
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#109289]: https://bugs.freedesktop.org/show_bug.cgi?id=109289
  [fdo#109291]: https://bugs.freedesktop.org/show_bug.cgi?id=109291
  [fdo#109307]: https://bugs.freedesktop.org/show_bug.cgi?id=109307
  [fdo#109309]: https://bugs.freedesktop.org/show_bug.cgi?id=109309
  [fdo#109315]: https://bugs.freedesktop.org/show_bug.cgi?id=109315
  [fdo#110189]: https://bugs.freedesktop.org/show_bug.cgi?id=110189
  [fdo#111068]: https://bugs.freedesktop.org/show_bug.cgi?id=111068
  [fdo#111614]: https://bugs.freedesktop.org/show_bug.cgi?id=111614
  [fdo#111615]: https://bugs.freedesktop.org/show_bug.cgi?id=111615
  [fdo#111656]: https://bugs.freedesktop.org/show_bug.cgi?id=111656
  [fdo#111825]: https://bugs.freedesktop.org/show_bug.cgi?id=111825
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [fdo#112054]: https://bugs.freedesktop.org/show_bug.cgi?id=112054
  [fdo#112283]: https://bugs.freedesktop.org/show_bug.cgi?id=112283
  [i915#1072]: https://gitlab.freedesktop.org/drm/intel/issues/1072
  [i915#1257]: https://gitlab.freedesktop.org/drm/intel/issues/1257
  [i915#1397]: https://gitlab.freedesktop.org/drm/intel/issues/1397
  [i915#1755]: https://gitlab.freedesktop.org/drm/intel/issues/1755
  [i915#1839]: https://gitlab.freedesktop.org/drm/intel/issues/1839
  [i915#1902]: https://gitlab.freedesktop.org/drm/intel/issues/1902
  [i915#2122]: https://gitlab.freedesktop.org/drm/intel/issues/2122
  [i915#2346]: https://gitlab.freedesktop.org/drm/intel/issues/2346
  [i915#2437]: https://gitlab.freedesktop.org/drm/intel/issues/2437
  [i915#2527]: https://gitlab.freedesktop.org/drm/intel/issues/2527
  [i915#2575]: https://gitlab.freedesktop.org/drm/intel/issues/2575
  [i915#2587]: https://gitlab.freedesktop.org/drm/intel/issues/2587
  [i915#2658]: https://gitlab.freedesktop.org/drm/intel/issues/2658
  [i915#2672]: https://gitlab.freedesktop.org/drm/intel/issues/2672
  [i915#2681]: https://gitlab.freedesktop.org/drm/intel/issues/2681
  [i915#280]: https://gitlab.freedesktop.org/drm/intel/issues/280
  [i915#284]: https://gitlab.freedesktop.org/drm/intel/issues/284
  [i915#2842]: https://gitlab.freedesktop.org/drm/intel/issues/2842
  [i915#2846]: https://gitlab.freedesktop.org/drm/intel/issues/2846
  [i915#2856]: https://gitlab.freedesktop.org/drm/intel/issues/2856
  [i915#3281]: https://gitlab.freedesktop.org/drm/intel/issues/3281
  [i915#3282]: https://gitlab.freedesktop.org/drm/intel/issues/3282
  [i915#3297]: https://gitlab.freedesktop.org/drm/intel/issues/3297
  [i915#3318]: https://gitlab.freedesktop.org/drm/intel/issues/3318
  [i915#3359]: https://gitlab.freedesktop.org/drm/intel/issues/3359
  [i915#3458]: https://gitlab.freedesktop.org/drm/intel/issues/3458
  [i915#3469]: https://gitlab.freedesktop.org/drm/intel/issues/3469
  [i915#3539]: https://gitlab.freedesktop.org/drm/intel/issues/3539
  [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555
  [i915#3637]: https://gitlab.freedesktop.org/drm/intel/issues/3637
  [i915#3638]: https://gitlab.freedesktop.org/drm/intel/issues/3638
  [i915#3689]: https://gitlab.freedesktop.org/drm/intel/issues/3689
  [i915#3708]: https://gitlab.freedesktop.org/drm/intel/issues/3708
  [i915#3742]: https://gitlab.freedesktop.org/drm/intel/issues/3742
  [i915#3778]: https://gitlab.freedesktop.org/drm/intel/issues/3778
  [i915#3886]: https://gitlab.freedesktop.org/drm/intel/issues/3886
  [i915#3938]: https://gitlab.freedesktop.org/drm/intel/issues/3938
  [i915#3952]: https://gitlab.freedesktop.org/drm/intel/issues/3952
  [i915#4077]: https://gitlab.freedesktop.org/drm/intel/issues/4077
  [i915#4079]: https://gitlab.freedesktop.org/drm/intel/issues/4079
  [i915#4083]: https://gitlab.freedesktop.org/drm/intel/issues/4083
  [i915#4103]: https://gitlab.freedesktop.org/drm/intel/issues/4103
  [i915#4212]: https://gitlab.freedesktop.org/drm/intel/issues/4212
  [i915#4213]: https://gitlab.freedesktop.org/drm/intel/issues/4213
  [i915#4270]: https://gitlab.freedesktop.org/drm/intel/issues/4270
  [i915#4281]: https://gitlab.freedesktop.org/drm/intel/issues/4281
  [i915#433]: https://gitlab.freedesktop.org/drm/intel/issues/433
  [i915#4538]: https://gitlab.freedesktop.org/drm/intel/issues/4538
  [i915#4565]: https://gitlab.freedesktop.org/drm/intel/issues/4565
  [i915#4579]: https://gitlab.freedesktop.org/drm/intel/issues/4579
  [i915#4771]: https://gitlab.freedesktop.org/drm/intel/issues/4771
  [i915#4812]: https://gitlab.freedesktop.org/drm/intel/issues/4812
  [i915#4818]: https://gitlab.freedesktop.org/drm/intel/issues/4818
  [i915#4833]: https://gitlab.freedesktop.org/drm/intel/issues/4833
  [i915#4852]: https://gitlab.freedesktop.org/drm/intel/issues/4852
  [i915#4859]: https://gitlab.freedesktop.org/drm/intel/issues/4859
  [i915#4860]: https://gitlab.freedesktop.org/drm/intel/issues/4860
  [i915#4880]: https://gitlab.freedesktop.org/drm/intel/issues/4880
  [i915#4885]: https://gitlab.freedesktop.org/drm/intel/issues/4885
  [i915#5176]: https://gitlab.freedesktop.org/drm/intel/issues/5176
  [i915#5235]: https://gitlab.freedesktop.org/drm/intel/issues/5235
  [i915#5286]: https://gitlab.freedesktop.org/drm/intel/issues/5286
  [i915#5288]: https://gitlab.freedesktop.org/drm/intel/issues/5288
  [i915#5325]: https://gitlab.freedesktop.org/drm/intel/issues/5325
  [i915#5439]: https://gitlab.freedesktop.org/drm/intel/issues/5439
  [i915#5461]: https://gitlab.freedesktop.org/drm/intel/issues/5461
  [i915#5563]: https://gitlab.freedesktop.org/drm/intel/issues/5563
  [i915#5723]: https://gitlab.freedesktop.org/drm/intel/issues/5723
  [i915#6095]: https://gitlab.freedesktop.org/drm/intel/issues/6095
  [i915#6227]: https://gitlab.freedesktop.org/drm/intel/issues/6227
  [i915#6230]: https://gitlab.freedesktop.org/drm/intel/issues/6230
  [i915#6524]: https://gitlab.freedesktop.org/drm/intel/issues/6524
  [i915#658]: https://gitlab.freedesktop.org/drm/intel/issues/658
  [i915#6755]: https://gitlab.freedesktop.org/drm/intel/issues/6755
  [i915#7561]: https://gitlab.freedesktop.org/drm/intel/issues/7561
  [i915#7663]: https://gitlab.freedesktop.org/drm/intel/issues/7663
  [i915#7697]: https://gitlab.freedesktop.org/drm/intel/issues/7697
  [i915#7701]: https://gitlab.freedesktop.org/drm/intel/issues/7701
  [i915#7711]: https://gitlab.freedesktop.org/drm/intel/issues/7711
  [i915#7790]: https://gitlab.freedesktop.org/drm/intel/issues/7790
  [i915#7828]: https://gitlab.freedesktop.org/drm/intel/issues/7828
  [i915#79]: https://gitlab.freedesktop.org/drm/intel/issues/79
  [i915#7959]: https://gitlab.freedesktop.org/drm/intel/issues/7959
  [i915#8211]: https://gitlab.freedesktop.org/drm/intel/issues/8211
  [i915#8319]: https://gitlab.freedesktop.org/drm/intel/issues/8319


Build changes
-------------

  * IGT: IGT_7226 -> IGTPW_8716
  * Linux: CI_DRM_12937 -> Patchwork_115836v1

  CI-20190529: 20190529
  CI_DRM_12937: 6848d3613c0a63382d00ff550c41394902bda903 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGTPW_8716: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8716/index.html
  IGT_7226: 41be8b4ab86f9e11388c10366dfd71e5032589c1 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_115836v1: 6848d3613c0a63382d00ff550c41394902bda903 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115836v1/index.html

[-- Attachment #2: Type: text/html, Size: 10043 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Intel-gfx] [PATCH 5/9] drm/i915/pmu: Prepare for multi-tile non-engine counters
  2023-03-30 12:39   ` Tvrtko Ursulin
@ 2023-03-30 22:28     ` Dixit, Ashutosh
  2023-03-31  8:22       ` Tvrtko Ursulin
  0 siblings, 1 reply; 28+ messages in thread
From: Dixit, Ashutosh @ 2023-03-30 22:28 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Thu, 30 Mar 2023 05:39:04 -0700, Tvrtko Ursulin wrote:
>

Hi Tvrtko,

> > diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h
> > index 1b04c79907e8..a708e44a227e 100644
> > --- a/drivers/gpu/drm/i915/i915_pmu.h
> > +++ b/drivers/gpu/drm/i915/i915_pmu.h
> > @@ -38,13 +38,16 @@ enum {
> >	__I915_NUM_PMU_SAMPLERS
> >   };
> >   +#define I915_PMU_MAX_GTS (4) /* FIXME */
>
> 3-4 years since writing this I have no idea what I meant by this
> FIXME. Should have put a better comment.. :( It was early platform
> enablement times so it was somewhat passable, but now I think we need to
> figure out what I actually meant. Maybe removing the comment is fine.
>
> > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> > index dba7c5a5b25e..bbab7f3dbeb4 100644
> > --- a/include/uapi/drm/i915_drm.h
> > +++ b/include/uapi/drm/i915_drm.h
> > @@ -280,7 +280,17 @@ enum drm_i915_pmu_engine_sample {
> >   #define I915_PMU_ENGINE_SEMA(class, instance) \
> >	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_SEMA)
> >   -#define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 +
> > (x))
> > +/*
> > + * Top 8 bits of every non-engine counter are GT id.
> > + * FIXME: __I915_PMU_GT_SHIFT will be changed to 56
> > + */
>
> I asked before and don't think I got an answer: Why is 4 bits not enough
> for gt id? The comment is not my code I am pretty sure.

Both of the above FIXME's are the work of yours truly :-) (added during
PRELIM work).

Anyway given that now i915 will not support new product generations I think
we can just drop the FIXME's. Otherwise I was saying since we are only
using a few bottom bits, why not future proof things a bit and allow for
num_gt's to expand beyond 16.

So for now just drop the FIXME's for i915, revisit if needed with xe.

Thanks.
--
Ashutosh

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Intel-gfx] [PATCH 5/9] drm/i915/pmu: Prepare for multi-tile non-engine counters
  2023-03-30 22:28     ` Dixit, Ashutosh
@ 2023-03-31  8:22       ` Tvrtko Ursulin
  0 siblings, 0 replies; 28+ messages in thread
From: Tvrtko Ursulin @ 2023-03-31  8:22 UTC (permalink / raw)
  To: Dixit, Ashutosh; +Cc: intel-gfx


On 30/03/2023 23:28, Dixit, Ashutosh wrote:
> On Thu, 30 Mar 2023 05:39:04 -0700, Tvrtko Ursulin wrote:
>>
> 
> Hi Tvrtko,
> 
>>> diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h
>>> index 1b04c79907e8..a708e44a227e 100644
>>> --- a/drivers/gpu/drm/i915/i915_pmu.h
>>> +++ b/drivers/gpu/drm/i915/i915_pmu.h
>>> @@ -38,13 +38,16 @@ enum {
>>> 	__I915_NUM_PMU_SAMPLERS
>>>    };
>>>    +#define I915_PMU_MAX_GTS (4) /* FIXME */
>>
>> 3-4 years since writing this I have no idea what I meant by this
>> FIXME. Should have put a better comment.. :( It was early platform
>> enablement times so it was somewhat passable, but now I think we need to
>> figure out what I actually meant. Maybe removing the comment is fine.
>>
>>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>>> index dba7c5a5b25e..bbab7f3dbeb4 100644
>>> --- a/include/uapi/drm/i915_drm.h
>>> +++ b/include/uapi/drm/i915_drm.h
>>> @@ -280,7 +280,17 @@ enum drm_i915_pmu_engine_sample {
>>>    #define I915_PMU_ENGINE_SEMA(class, instance) \
>>> 	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_SEMA)
>>>    -#define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 +
>>> (x))
>>> +/*
>>> + * Top 8 bits of every non-engine counter are GT id.
>>> + * FIXME: __I915_PMU_GT_SHIFT will be changed to 56
>>> + */
>>
>> I asked before and don't think I got an answer: Why is 4 bits not enough
>> for gt id? The comment is not my code I am pretty sure.
> 
> Both of the above FIXME's are the work of yours truly :-) (added during
> PRELIM work).

Very kind of you but I think first one is mine. ;) I can find it in my 
local branch dating from at least June 2020.

I had an idea that maybe it was supposed to mean I wanted to reuse the 
I915_MAX_GT define and not duplicate a '4' here. Perhaps there was some 
header mess which made me give up at the time.

I think it is worth trying that now, maybe something changed.

> Anyway given that now i915 will not support new product generations I think
> we can just drop the FIXME's. Otherwise I was saying since we are only
> using a few bottom bits, why not future proof things a bit and allow for
> num_gt's to expand beyond 16.

Oh right.. I thought 16 gts will be enough but I also don't think I mind 
if it is 4 or 8 bits. Possibly at the time, as I was seeing more and 
more counters getting added, or better say classes of counters, I was 
starting to get wary of getting out of bits for future expansion. All of 
those were done by segmenting the numerical space, not bit wise, so 
perhaps the concern shouldn't have been there and 8 is also fine. Don't 
know really, don't think I have a strong opinion. Let's pick one and drop 
the FIXME comment.
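
To make the 4 vs 8 bits trade-off concrete, here is a minimal sketch,
assuming the GT id sits in the top bits of a 64-bit config so that a
shift of 60 leaves 4 bits and a shift of 56 leaves 8. The sketch_*
helpers are hypothetical names for illustration, not the uapi macros.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_CONFIG_BITS 64

/* Pack a GT id above the counter value; excess GT bits are truncated. */
static uint64_t sketch_encode(unsigned int shift, uint64_t gt, uint64_t counter)
{
	return (gt << shift) | counter;
}

/* Recover the GT id from the top (64 - shift) bits. */
static uint64_t sketch_gt_id(unsigned int shift, uint64_t config)
{
	return config >> shift;
}

/* Recover the counter from the low 'shift' bits. */
static uint64_t sketch_counter(unsigned int shift, uint64_t config)
{
	return config & ((UINT64_C(1) << shift) - 1);
}

/* Number of distinct GT id values that fit above 'shift'. */
static uint64_t sketch_max_ids(unsigned int shift)
{
	return UINT64_C(1) << (SKETCH_CONFIG_BITS - shift);
}
```

With a shift of 60 there are 16 id values and with 56 there are 256. An
id of 17 survives a round trip only with the wider field; with 4 bits it
is silently truncated to 1.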

Regards,

Tvrtko

> 
> So for now just drop the FIXME's for i915, revisit if needed with xe.



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Intel-gfx] [PATCH 6/9] drm/i915/pmu: Export counters from all tiles
  2023-03-30 17:33     ` Umesh Nerlige Ramappa
@ 2023-03-31  8:57       ` Tvrtko Ursulin
  0 siblings, 0 replies; 28+ messages in thread
From: Tvrtko Ursulin @ 2023-03-31  8:57 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx


On 30/03/2023 18:33, Umesh Nerlige Ramappa wrote:
> On Thu, Mar 30, 2023 at 02:01:42PM +0100, Tvrtko Ursulin wrote:
>>
>> On 30/03/2023 01:41, Umesh Nerlige Ramappa wrote:
>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>
>>> Start exporting frequency and RC6 counters from all tiles.
>>>
>>> Existing counters keep their names and config values and new one use the
>>> namespace added in the previous patch, with the "-gtN" added to their
>>> names.
>>
>> The part about keeping the names is not in the code any more. So 
>> something will have to give, either the commit text or the code.
>>
>> Even without that detail, I suspect someone might want to add them 
>> Co-developed-by since I *think* someone did some changes.
>>> Interrupts counter is an odd one off. Because it is the global device
>>> counters (not only GT) we choose not to add per tile versions for now.
>>>
>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
>>> ---
>>>  drivers/gpu/drm/i915/i915_pmu.c | 96 ++++++++++++++++++++++++++-------
>>>  1 file changed, 77 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_pmu.c 
>>> b/drivers/gpu/drm/i915/i915_pmu.c
>>> index 5d1de98d86b4..2a5deabff088 100644
>>> --- a/drivers/gpu/drm/i915/i915_pmu.c
>>> +++ b/drivers/gpu/drm/i915/i915_pmu.c
>>> @@ -548,8 +548,9 @@ config_status(struct drm_i915_private *i915, u64 
>>> config)
>>>      struct intel_gt *gt = to_gt(i915);
>>>      unsigned int gt_id = config_gt_id(config);
>>> +    unsigned int max_gt_id = HAS_EXTRA_GT_LIST(i915) ? 1 : 0;
>>> -    if (gt_id)
>>> +    if (gt_id > max_gt_id)
>>>          return -ENOENT;
>>>      switch (config_counter(config)) {
>>> @@ -563,6 +564,8 @@ config_status(struct drm_i915_private *i915, u64 
>>> config)
>>>              return -ENODEV;
>>>          break;
>>>      case I915_PMU_INTERRUPTS:
>>> +        if (gt_id)
>>> +            return -ENOENT;
>>>          break;
>>>      case I915_PMU_RC6_RESIDENCY:
>>>          if (!gt->rc6.supported)
>>> @@ -932,9 +935,9 @@ static const struct attribute_group 
>>> i915_pmu_cpumask_attr_group = {
>>>      .attrs = i915_cpumask_attrs,
>>>  };
>>> -#define __event(__config, __name, __unit) \
>>> +#define __event(__counter, __name, __unit) \
>>>  { \
>>> -    .config = (__config), \
>>> +    .counter = (__counter), \
>>>      .name = (__name), \
>>>      .unit = (__unit), \
>>>  }
>>> @@ -975,15 +978,21 @@ create_event_attributes(struct i915_pmu *pmu)
>>>  {
>>>      struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), 
>>> pmu);
>>>      static const struct {
>>> -        u64 config;
>>> +        unsigned int counter;
>>>          const char *name;
>>>          const char *unit;
>>>      } events[] = {
>>> -        __event(I915_PMU_ACTUAL_FREQUENCY, "actual-frequency", "M"),
>>> -        __event(I915_PMU_REQUESTED_FREQUENCY, "requested-frequency", 
>>> "M"),
>>> -        __event(I915_PMU_INTERRUPTS, "interrupts", NULL),
>>> -        __event(I915_PMU_RC6_RESIDENCY, "rc6-residency", "ns"),
>>> -        __event(I915_PMU_SOFTWARE_GT_AWAKE_TIME, 
>>> "software-gt-awake-time", "ns"),
>>> +        __event(0, "actual-frequency", "M"),
>>> +        __event(1, "requested-frequency", "M"),
>>> +        __event(3, "rc6-residency", "ns"),
>>> +        __event(4, "software-gt-awake-time", "ns"),
>>> +    };
>>> +    static const struct {
>>> +        unsigned int counter;
>>> +        const char *name;
>>> +        const char *unit;
>>> +    } global_events[] = {
>>> +        __event(2, "interrupts", NULL),
>>>      };
>>>      static const struct {
>>>          enum drm_i915_pmu_engine_sample sample;
>>> @@ -998,14 +1007,29 @@ create_event_attributes(struct i915_pmu *pmu)
>>>      struct i915_ext_attribute *i915_attr = NULL, *i915_iter;
>>>      struct attribute **attr = NULL, **attr_iter;
>>>      struct intel_engine_cs *engine;
>>> -    unsigned int i;
>>> +    struct intel_gt *gt;
>>> +    unsigned int i, j;
>>>      /* Count how many counters we will be exposing. */
>>> -    for (i = 0; i < ARRAY_SIZE(events); i++) {
>>> -        if (!config_status(i915, events[i].config))
>>> +    /* per gt counters */
>>
>> Two comments one by another, two styles - the inconsistency hurts.
>>
>> Not sure why global events needed to be split out into a separate 
>> array? Like this below two loops are needed for each stage instead of 
>> one. AFAIR one array and one loop would just work because 
>> config_status would report global ones as unsupported for gt > 0.
> 
> The idea was to add the legacy events into the global array. These 
> events will not have -gtN appended to them. Note that on a single gt 
> platform, my idea is to have both legacy as well as gt0 events.
> 
> ADLP:
> actual-frequency
> actual-frequency-gt0

IMO that would be pointless and harmful even.

> MTL:
> actual-frequency
> actual-frequency-gt0
> actual-frequency-gt1

Let's cover this one in the discussion of 9/9.

>> [Comes back later. It looked like this in my code:
>>
>>        static const struct {
>> -               u64 config;
>> +               unsigned int counter;
>>                const char *name;
>>                const char *unit;
>> +               bool global;
>>        } events[] = {
>> -               __event(I915_PMU_ACTUAL_FREQUENCY, "actual-frequency", 
>> "M"),
>> -               __event(I915_PMU_REQUESTED_FREQUENCY, 
>> "requested-frequency", "M"),
>> -               __event(I915_PMU_INTERRUPTS, "interrupts", NULL),
>> -               __event(I915_PMU_RC6_RESIDENCY, "rc6-residency", "ns"),
>> +               /*
>> +                * #define __I915_PMU_ACTUAL_FREQUENCY(gt)    
>> ___I915_PMU_OTHER(gt, 0)
>> +                * #define __I915_PMU_REQUESTED_FREQUENCY(gt) 
>> ___I915_PMU_OTHER(gt, 1)
>> +                * #define __I915_PMU_INTERRUPTS(gt)          
>> ___I915_PMU_OTHER(gt, 2)
>> +                * #define __I915_PMU_RC6_RESIDENCY(gt)       
>> ___I915_PMU_OTHER(gt, 3)
>> +                */
>> +               __event(0, "actual-frequency", "M"),
>> +               __event(1, "requested-frequency", "M"),
>> +               __global_event(2, "interrupts", NULL),
>> +               __event(3, "rc6-residency", "ns"),
>>
>> ...
>>
>>        /* Count how many counters we will be exposing. */
>> -       for (i = 0; i < ARRAY_SIZE(events); i++) {
>> -               if (!config_status(i915, events[i].config))
>> -                       count++;
>> +       for_each_gt(i915, j, gt) {
>> +               for (i = 0; i < ARRAY_SIZE(events); i++) {
>> +                       u64 config = ___I915_PMU_OTHER(j, 
>> events[i].counter);
>> +
>> +                       if (!config_status(i915, config))
>> +                               count++;
>> +               }
>>
>> So AFAICT it just worked.
> 
> If we decide to drop 9/9, then I would drop the 7/9 and 8/9 and just 
> move back to your original patches because they worked as is. The only 
> open then would be if we want to have the -gt0 events as well for single 
> gt platforms.
> 
> The idea is to make this similar to what's implemented for the sysfs 
> frequency/rc6 attributes in /sys/class/drm/card0. There is a root version 
> as well as a gt/gt0 version. fwiu, gt/gt0 attributes are used on a 
> single gt platform.

For sysfs I can understand the write multiplexers, but I am not sure about 
the min/max read policy; I don't like that either.

>>
>> ]
>>
>>> +    for_each_gt(gt, i915, j) {
>>> +        for (i = 0; i < ARRAY_SIZE(events); i++) {
>>> +            u64 config = ___I915_PMU_OTHER(j, events[i].counter);
>>> +
>>> +            if (!config_status(i915, config))
>>> +                count++;
>>> +        }
>>> +    }
>>> +
>>> +    /* global (per GPU) counters */
>>> +    for (i = 0; i < ARRAY_SIZE(global_events); i++) {
>>> +        u64 config = ___I915_PMU_OTHER(0, global_events[i].counter);
>>> +
>>> +        if (!config_status(i915, config))
>>>              count++;
>>>      }
>>> +    /* per engine counters */
>>>      for_each_uabi_engine(engine, i915) {
>>>          for (i = 0; i < ARRAY_SIZE(engine_events); i++) {
>>>              if (!engine_event_status(engine,
>>> @@ -1033,26 +1057,60 @@ create_event_attributes(struct i915_pmu *pmu)
>>>      attr_iter = attr;
>>>      /* Initialize supported non-engine counters. */
>>> -    for (i = 0; i < ARRAY_SIZE(events); i++) {
>>> +    /* per gt counters */
>>> +    for_each_gt(gt, i915, j) {
>>> +        for (i = 0; i < ARRAY_SIZE(events); i++) {
>>> +            u64 config = ___I915_PMU_OTHER(j, events[i].counter);
>>> +            char *str;
>>> +
>>> +            if (config_status(i915, config))
>>> +                continue;
>>> +
>>> +            str = kasprintf(GFP_KERNEL, "%s-gt%u",
>>> +                    events[i].name, j);
>>
>> So with this patch all old platforms change the event names. This is 
>> not how I wrote it, and more importantly, it breaks userspace. Why 
>> would we do it?
> 
> With this patch alone, yes, this would break the uapi. With the series, 
> not really because those events are added back in 9/9. Should we retain 
> uapi compatibility in each patch? If yes, then I need to change this.

It probably would have been better if this one was left as it was, and 
then in the last patch you propose and implement the complete change. 
That way there would be no interim state where existing platforms are 
broken.  I guess you can hold off doing that until the discussion settles.
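
The interim breakage comes from applying the "%s-gt%u" suffix unconditionally. As a rough userspace-style illustration of the alternative naming rule quoted elsewhere in this thread (plain names for global events and on single-GT platforms, a -gtN suffix otherwise), a minimal sketch could look like the following. The function name and its parameters are hypothetical and are not part of i915; the kernel code uses kstrdup()/kasprintf() instead of snprintf().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not i915 code: build the event name that would be
 * exposed for a non-engine counter.
 *
 * - global events (e.g. "interrupts") never get a suffix
 * - on a single-GT device the legacy name is kept unchanged
 * - on a multi-GT device "-gtN" is appended
 *
 * Returns the snprintf() result (length that was or would be written).
 */
static int sketch_event_name(char *buf, size_t len, const char *name,
			     bool global, unsigned int num_gts,
			     unsigned int gt)
{
	assert(buf && name && len > 0);

	if (global || num_gts <= 1)
		return snprintf(buf, len, "%s", name);

	return snprintf(buf, len, "%s-gt%u", name, gt);
}
```

With a rule like this, existing platforms keep their event names at every point in the series, so no individual patch changes what userspace sees there.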

Regards,

Tvrtko

> 
>>
>> For reference I dug out my code from 2020 and it looked like this:
>>
>> +                       if (events[i].global || !i915->remote_tiles)
>> +                               str = kstrdup(events[i].name, 
>> GFP_KERNEL);
>> +                       else
>> +                               str = kasprintf(GFP_KERNEL, "%s-gt%u",
>> +                                               events[i].name, j);
>>
>> So on single tile platforms names remain the same.
> 
> The series still maintains the same idea, but also adds xxxx-gt0 events 
> for the old platforms.
> 
> Thanks,
> Umesh
> 
>>
>> Regards,
>>
>> Tvrtko
>>
>>> +            if (!str)
>>> +                goto err;
>>> +
>>> +            *attr_iter++ = &i915_iter->attr.attr;
>>> +            i915_iter = add_i915_attr(i915_iter, str, config);
>>> +
>>> +            if (events[i].unit) {
>>> +                str = kasprintf(GFP_KERNEL, "%s-gt%u.unit",
>>> +                        events[i].name, j);
>>> +                if (!str)
>>> +                    goto err;
>>> +
>>> +                *attr_iter++ = &pmu_iter->attr.attr;
>>> +                pmu_iter = add_pmu_attr(pmu_iter, str,
>>> +                            events[i].unit);
>>> +            }
>>> +        }
>>> +    }
>>> +
>>> +    /* global (per GPU) counters */
>>> +    for (i = 0; i < ARRAY_SIZE(global_events); i++) {
>>> +        u64 config = ___I915_PMU_OTHER(0, global_events[i].counter);
>>>          char *str;
>>> -        if (config_status(i915, events[i].config))
>>> +        if (config_status(i915, config))
>>>              continue;
>>> -        str = kstrdup(events[i].name, GFP_KERNEL);
>>> +        str = kstrdup(global_events[i].name, GFP_KERNEL);
>>>          if (!str)
>>>              goto err;
>>>          *attr_iter++ = &i915_iter->attr.attr;
>>> -        i915_iter = add_i915_attr(i915_iter, str, events[i].config);
>>> +        i915_iter = add_i915_attr(i915_iter, str, config);
>>> -        if (events[i].unit) {
>>> -            str = kasprintf(GFP_KERNEL, "%s.unit", events[i].name);
>>> +        if (global_events[i].unit) {
>>> +            str = kasprintf(GFP_KERNEL, "%s.unit",
>>> +                    global_events[i].name);
>>>              if (!str)
>>>                  goto err;
>>>              *attr_iter++ = &pmu_iter->attr.attr;
>>> -            pmu_iter = add_pmu_attr(pmu_iter, str, events[i].unit);
>>> +            pmu_iter = add_pmu_attr(pmu_iter, str,
>>> +                        global_events[i].unit);
>>>          }
>>>      }

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Intel-gfx] [PATCH 9/9] drm/i915/pmu: Enable legacy PMU events for MTL
  2023-03-30 18:31     ` Umesh Nerlige Ramappa
@ 2023-03-31 13:02       ` Tvrtko Ursulin
  2023-04-20 20:12         ` Umesh Nerlige Ramappa
  2023-04-03 19:16       ` Umesh Nerlige Ramappa
  1 sibling, 1 reply; 28+ messages in thread
From: Tvrtko Ursulin @ 2023-03-31 13:02 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx


On 30/03/2023 19:31, Umesh Nerlige Ramappa wrote:
> + Joonas for comments on this
> 
> On Thu, Mar 30, 2023 at 02:38:03PM +0100, Tvrtko Ursulin wrote:
>>
>> On 30/03/2023 01:41, Umesh Nerlige Ramappa wrote:
>>> MTL introduces separate GTs for render and media. This complicates the
>>> definition of frequency and rc6 counters for the GPU as a whole since
>>> each GT has an independent counter. The best way to support this change
>>> is to deprecate the GPU-specific counters and create GT-specific
>>> counters, however that just breaks ABI. Since perf tools and scripts may
>>> be decentralized with probably many users, it's hard to deprecate the
>>> legacy counters and have all the users on board with that.
>>>
>>> Re-introduce the legacy counters and support them as min/max of
>>> GT-specific counters as necessary to ensure backwards compatibility.
>>>
>>> I915_PMU_ACTUAL_FREQUENCY - will show max of GT-specific counters
>>> I915_PMU_REQUESTED_FREQUENCY - will show max of GT-specific counters
>>> I915_PMU_INTERRUPTS - no changes since it is GPU specific on all 
>>> platforms
>>> I915_PMU_RC6_RESIDENCY - will show min of GT-specific counters
>>> I915_PMU_SOFTWARE_GT_AWAKE_TIME - will show max of GT-specific counters
>>
>> IMO max/min games are _very_ low value and probably just confusing.
> 
> By value, do you mean ROI or actually that the values would be incorrect?

Both really.

>> I am not convinced we need to burden the kernel with this. New 
>> platform, new counters.. userspace can just deal with it.
> 
> I agree and would prefer to drop this patch. There are some counter 
> arguments, I have added Joonas here for comments.
> 
> 1) an app/script hard-coded with the legacy events would be used on a 
> new platform and fail and we should maintain backwards compatibility.

I thought we pretty much agreed multiple times in the past (on different 
topics) that a new platform can require new userspace.

PMU is probably even a more clear cut case since it is exposing hardware 
counters (or close) so sometimes it is not even theoretically possible 
to preserve "backward" compatibility.

(I double quote backward because I think real backward compatibility 
does not apply on a new platform. And MTL is under force probe still.)

So for me it all comes under the "would be nice" category. But since we 
need to add kernel code to do it, code which say intel_gpu_top could run 
in userspace, I am not at all convinced it wouldn't be a bad idea.

The aggregated counters wouldn't even be giving the full picture.

So I'd simply add tiles/gt support to intel_gpu_top. Same as it 
currently can do "-p" on the command line, or '1' in the interactive 
mode, to aggregate the engine classes into one line item, I'd extend 
that concept into frequencies and RC6.

By default we start with normalized values and in physical mode we show 
separate counters per tile/gt.
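
To make the "normalized" default concrete, here is a minimal sketch of how a userspace tool might fold per-GT samples into one line item by averaging over the number of GTs, rather than taking a min or max in the kernel. This is only an illustration under assumed names: struct gt_sample and both helpers are hypothetical and are not taken from intel_gpu_top.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-GT sample as a userspace tool might hold it. */
struct gt_sample {
	double freq_mhz; /* actual frequency averaged over the period */
	double rc6_pct;  /* RC6 residency over the period, 0..100 */
};

/* Aggregated view: frequency averaged over the number of GTs. */
static double aggregate_freq_mhz(const struct gt_sample *s, size_t num_gts)
{
	double sum = 0.0;
	size_t i;

	assert(s && num_gts > 0);

	for (i = 0; i < num_gts; i++)
		sum += s[i].freq_mhz;

	return sum / (double)num_gts;
}

/* Aggregated view: RC6 residency averaged over the number of GTs. */
static double aggregate_rc6_pct(const struct gt_sample *s, size_t num_gts)
{
	double sum = 0.0;
	size_t i;

	assert(s && num_gts > 0);

	for (i = 0; i < num_gts; i++)
		sum += s[i].rc6_pct;

	return sum / (double)num_gts;
}
```

In the physical mode the tool would simply print each s[i] on its own line instead of calling the aggregate helpers.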

Someone running old intel_gpu_top on MTL gets to see nothing since the 
counter names are different. Which is IMO fine - better than showing 
tile 0 data, or some minimums/maximums from one tile only.

> 2) the sysfs attributes for rc6/frequency have already adopted an 
> aggregate vs gt0/gt1 approach to address that and pmu should have a 
> similar solution (or rather, PMU and the sysfs approaches should match 
> based on whatever is the approach)

Yeah I disagreed with min/max reads in sysfs too and am pretty sure I 
expressed that at the time. :shrug:

But I don't think there is a strong argument that PMU needs to follow.

Only impact is to people who access perf_event_open directly so yeah, if 
there are such users, they will need to add multi-tile support.
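
For such direct users, multi-tile support mostly means building a per-GT config value before calling perf_event_open. The sketch below mirrors the ___I915_PMU_OTHER(gt, x) layout from the earlier patches in this series (GT id in the top bits, without the +1 offset that 9/9 proposes). The SKETCH_* names are local stand-ins, and the numeric shift and bit-width values are assumptions that this thread does not state; real code should use the macros from include/uapi/drm/i915_drm.h instead.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Local stand-ins for the uapi layout. ASSUMPTION: the values below are
 * not given in this thread; check them against the i915_drm.h you build
 * against before relying on them.
 */
#define SKETCH_SAMPLE_BITS   4
#define SKETCH_INSTANCE_BITS 8
#define SKETCH_CLASS_SHIFT   (SKETCH_SAMPLE_BITS + SKETCH_INSTANCE_BITS)
#define SKETCH_GT_SHIFT      60

#define SKETCH_ENGINE(class, instance, sample) \
	(((uint64_t)(class) << SKETCH_CLASS_SHIFT) | \
	 ((uint64_t)(instance) << SKETCH_SAMPLE_BITS) | \
	 (uint64_t)(sample))

/* First config value after the engine event space. */
#define SKETCH_OTHER_BASE (SKETCH_ENGINE(0xff, 0xff, 0xf) + 1)

/* Compose a non-engine config: counter index plus GT id in the top bits. */
static uint64_t sketch_other_config(unsigned int gt, unsigned int counter)
{
	assert(gt < (1u << (64 - SKETCH_GT_SHIFT)));

	return (SKETCH_OTHER_BASE + counter) |
	       ((uint64_t)gt << SKETCH_GT_SHIFT);
}

/* Extract the GT id from a config value. */
static unsigned int sketch_config_gt_id(uint64_t config)
{
	return (unsigned int)(config >> SKETCH_GT_SHIFT);
}

/* Strip the GT bits, leaving the legacy (gt 0) config value. */
static uint64_t sketch_config_counter(uint64_t config)
{
	return config & ~(~0ULL << SKETCH_GT_SHIFT);
}
```

The value returned by sketch_other_config() is what would go into perf_event_attr.config; with gt == 0 it equals the legacy config, which is why old single-GT users keep working under this encoding.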

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Intel-gfx] [PATCH 9/9] drm/i915/pmu: Enable legacy PMU events for MTL
  2023-03-30 18:31     ` Umesh Nerlige Ramappa
  2023-03-31 13:02       ` Tvrtko Ursulin
@ 2023-04-03 19:16       ` Umesh Nerlige Ramappa
  1 sibling, 0 replies; 28+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-04-03 19:16 UTC (permalink / raw)
  To: Tvrtko Ursulin, joonas.lahtinen; +Cc: intel-gfx

+ Joonas for inputs

On Thu, Mar 30, 2023 at 11:31:53AM -0700, Umesh Nerlige Ramappa wrote:
>+ Joonas for comments on this
>
>On Thu, Mar 30, 2023 at 02:38:03PM +0100, Tvrtko Ursulin wrote:
>>
>>On 30/03/2023 01:41, Umesh Nerlige Ramappa wrote:
>>>MTL introduces separate GTs for render and media. This complicates the
>>>definition of frequency and rc6 counters for the GPU as a whole since
>>>each GT has an independent counter. The best way to support this change
>>>is to deprecate the GPU-specific counters and create GT-specific
>>>counters, however that just breaks ABI. Since perf tools and scripts may
>>>be decentralized with probably many users, it's hard to deprecate the
>>>legacy counters and have all the users on board with that.
>>>
>>>Re-introduce the legacy counters and support them as min/max of
>>>GT-specific counters as necessary to ensure backwards compatibility.
>>>
>>>I915_PMU_ACTUAL_FREQUENCY - will show max of GT-specific counters
>>>I915_PMU_REQUESTED_FREQUENCY - will show max of GT-specific counters
>>>I915_PMU_INTERRUPTS - no changes since it is GPU specific on all platforms
>>>I915_PMU_RC6_RESIDENCY - will show min of GT-specific counters
>>>I915_PMU_SOFTWARE_GT_AWAKE_TIME - will show max of GT-specific counters
>>
>>IMO max/min games are _very_ low value and probably just confusing.
>
>By value, do you mean ROI or actually that the values would be 
>incorrect?
>
>>
>>I am not convinced we need to burden the kernel with this. New 
>>platform, new counters.. userspace can just deal with it.
>
>I agree and would prefer to drop this patch. There are some counter 
>arguments, I have added Joonas here for comments.
>
>1) an app/script hard-coded with the legacy events would be used on a 
>new platform and fail and we should maintain backwards compatibility.
>
>2) the sysfs attributes for rc6/frequency have already adopted an 
>aggregate vs gt0/gt1 approach to address that and pmu should have a 
>similar solution (or rather, PMU and the sysfs approaches should match 
>based on whatever is the approach)
>
>Regards,
>Umesh
>
>>
>>In intel_gpu_top we can do the smarts in maybe default aggregated 
>>view (piggy back/extend on engines aggregation via command line '-p' 
>>or '1' at runtime). But then it's not min/max but probably 
>>normalized by number of gts.
>>
>>Regards,
>>
>>Tvrtko
>>
>>>
>>>Note:
>>>- For deeper debugging of performance issues, tools must be upgraded to
>>>  read the GT-specific counters.
>>>- This patch deserves to be separate from the other PMU features so that
>>>  it can be easily dropped if legacy events are ever deprecated.
>>>- Internal implementation relies on creating an extra entry in the
>>>  arrays used for GT specific counters. Index 0 is empty.
>>>  Index 1 through N are mapped to GTs 0 through N - 1.
>>>- User interface will use GT numbers indexed from 0 to specify the GT of
>>>  interest.
>>>
>>>Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>>>---
>>> drivers/gpu/drm/i915/i915_pmu.c | 134 +++++++++++++++++++++++++++-----
>>> drivers/gpu/drm/i915/i915_pmu.h |   2 +-
>>> include/uapi/drm/i915_drm.h     |  14 ++--
>>> 3 files changed, 125 insertions(+), 25 deletions(-)
>>>
>>>diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
>>>index 9bd9605d2662..0dc7711c3b4b 100644
>>>--- a/drivers/gpu/drm/i915/i915_pmu.c
>>>+++ b/drivers/gpu/drm/i915/i915_pmu.c
>>>@@ -221,7 +221,7 @@ add_sample_mult(struct i915_pmu *pmu, unsigned int gt_id, int sample, u32 val,
>>> static u64 get_rc6(struct intel_gt *gt)
>>> {
>>> 	struct drm_i915_private *i915 = gt->i915;
>>>-	const unsigned int gt_id = gt->info.id;
>>>+	const unsigned int gt_id = gt->info.id + 1;
>>> 	struct i915_pmu *pmu = &i915->pmu;
>>> 	unsigned long flags;
>>> 	bool awake = false;
>>>@@ -267,24 +267,26 @@ static void init_rc6(struct i915_pmu *pmu)
>>> 	for_each_gt(gt, i915, i) {
>>> 		intel_wakeref_t wakeref;
>>>+		const unsigned int gt_id = i + 1;
>>> 		with_intel_runtime_pm(gt->uncore->rpm, wakeref) {
>>> 			u64 val = __get_rc6(gt);
>>>-			store_sample(pmu, i, __I915_SAMPLE_RC6, val);
>>>-			store_sample(pmu, i, __I915_SAMPLE_RC6_LAST_REPORTED,
>>>+			store_sample(pmu, gt_id, __I915_SAMPLE_RC6, val);
>>>+			store_sample(pmu, gt_id, __I915_SAMPLE_RC6_LAST_REPORTED,
>>> 				     val);
>>>-			pmu->sleep_last[i] = ktime_get_raw();
>>>+			pmu->sleep_last[gt_id] = ktime_get_raw();
>>> 		}
>>> 	}
>>> }
>>> static void park_rc6(struct intel_gt *gt)
>>> {
>>>+	const unsigned int gt_id = gt->info.id + 1;
>>> 	struct i915_pmu *pmu = &gt->i915->pmu;
>>>-	store_sample(pmu, gt->info.id, __I915_SAMPLE_RC6, __get_rc6(gt));
>>>-	pmu->sleep_last[gt->info.id] = ktime_get_raw();
>>>+	store_sample(pmu, gt_id, __I915_SAMPLE_RC6, __get_rc6(gt));
>>>+	pmu->sleep_last[gt_id] = ktime_get_raw();
>>> }
>>> static void __i915_pmu_maybe_start_timer(struct i915_pmu *pmu)
>>>@@ -436,18 +438,18 @@ static void
>>> frequency_sample(struct intel_gt *gt, unsigned int period_ns)
>>> {
>>> 	struct drm_i915_private *i915 = gt->i915;
>>>-	const unsigned int gt_id = gt->info.id;
>>>+	const unsigned int gt_id = gt->info.id + 1;
>>> 	struct i915_pmu *pmu = &i915->pmu;
>>> 	struct intel_rps *rps = &gt->rps;
>>>-	if (!frequency_sampling_enabled(pmu, gt_id))
>>>+	if (!frequency_sampling_enabled(pmu, gt->info.id))
>>> 		return;
>>> 	/* Report 0/0 (actual/requested) frequency while parked. */
>>> 	if (!intel_gt_pm_get_if_awake(gt))
>>> 		return;
>>>-	if (pmu->enable & config_mask(__I915_PMU_ACTUAL_FREQUENCY(gt_id))) {
>>>+	if (pmu->enable & config_mask(__I915_PMU_ACTUAL_FREQUENCY(gt->info.id))) {
>>> 		u32 val;
>>> 		/*
>>>@@ -467,7 +469,7 @@ frequency_sample(struct intel_gt *gt, unsigned int period_ns)
>>> 				val, period_ns / 1000);
>>> 	}
>>>-	if (pmu->enable & config_mask(__I915_PMU_REQUESTED_FREQUENCY(gt_id))) {
>>>+	if (pmu->enable & config_mask(__I915_PMU_REQUESTED_FREQUENCY(gt->info.id))) {
>>> 		add_sample_mult(pmu, gt_id, __I915_SAMPLE_FREQ_REQ,
>>> 				intel_rps_get_requested_frequency(rps),
>>> 				period_ns / 1000);
>>>@@ -545,14 +547,15 @@ engine_event_status(struct intel_engine_cs *engine,
>>> static int
>>> config_status(struct drm_i915_private *i915, u64 config)
>>> {
>>>-	struct intel_gt *gt = to_gt(i915);
>>>-
>>> 	unsigned int gt_id = config_gt_id(config);
>>>-	unsigned int max_gt_id = HAS_EXTRA_GT_LIST(i915) ? 1 : 0;
>>>+	unsigned int max_gt_id = HAS_EXTRA_GT_LIST(i915) ? 2 : 1;
>>>+	struct intel_gt *gt;
>>> 	if (gt_id > max_gt_id)
>>> 		return -ENOENT;
>>>+	gt = !gt_id ? to_gt(i915) : i915->gt[gt_id - 1];
>>>+
>>> 	switch (config_counter(config)) {
>>> 	case I915_PMU_ACTUAL_FREQUENCY:
>>> 		if (IS_VALLEYVIEW(i915) || IS_CHERRYVIEW(i915))
>>>@@ -673,23 +676,58 @@ static u64 __i915_pmu_event_read_other(struct perf_event *event)
>>> 	const unsigned int gt_id = config_gt_id(event->attr.config);
>>> 	const u64 config = config_counter(event->attr.config);
>>> 	struct i915_pmu *pmu = &i915->pmu;
>>>+	struct intel_gt *gt;
>>> 	u64 val = 0;
>>>+	int i;
>>> 	switch (config) {
>>> 	case I915_PMU_ACTUAL_FREQUENCY:
>>>-		val = read_sample_us(pmu, gt_id, __I915_SAMPLE_FREQ_ACT);
>>>+		if (gt_id)
>>>+			return read_sample_us(pmu, gt_id, __I915_SAMPLE_FREQ_ACT);
>>>+
>>>+		if (!HAS_EXTRA_GT_LIST(i915))
>>>+			return read_sample_us(pmu, 1, __I915_SAMPLE_FREQ_ACT);
>>>+
>>>+		for_each_gt(gt, i915, i)
>>>+			val = max(val, read_sample_us(pmu, i + 1, __I915_SAMPLE_FREQ_ACT));
>>>+
>>> 		break;
>>> 	case I915_PMU_REQUESTED_FREQUENCY:
>>>-		val = read_sample_us(pmu, gt_id, __I915_SAMPLE_FREQ_REQ);
>>>+		if (gt_id)
>>>+			return read_sample_us(pmu, gt_id, __I915_SAMPLE_FREQ_REQ);
>>>+
>>>+		if (!HAS_EXTRA_GT_LIST(i915))
>>>+			return read_sample_us(pmu, 1, __I915_SAMPLE_FREQ_REQ);
>>>+
>>>+		for_each_gt(gt, i915, i)
>>>+			val = max(val, read_sample_us(pmu, i + 1, __I915_SAMPLE_FREQ_REQ));
>>>+
>>> 		break;
>>> 	case I915_PMU_INTERRUPTS:
>>> 		val = READ_ONCE(pmu->irq_count);
>>> 		break;
>>> 	case I915_PMU_RC6_RESIDENCY:
>>>-		val = get_rc6(i915->gt[gt_id]);
>>>+		if (gt_id)
>>>+			return get_rc6(i915->gt[gt_id - 1]);
>>>+
>>>+		if (!HAS_EXTRA_GT_LIST(i915))
>>>+			return get_rc6(i915->gt[0]);
>>>+
>>>+		val = U64_MAX;
>>>+		for_each_gt(gt, i915, i)
>>>+			val = min(val, get_rc6(gt));
>>>+
>>> 		break;
>>> 	case I915_PMU_SOFTWARE_GT_AWAKE_TIME:
>>>-		val = ktime_to_ns(intel_gt_get_awake_time(to_gt(i915)));
>>>+		if (gt_id)
>>>+			return ktime_to_ns(intel_gt_get_awake_time(i915->gt[gt_id - 1]));
>>>+
>>>+		if (!HAS_EXTRA_GT_LIST(i915))
>>>+			return ktime_to_ns(intel_gt_get_awake_time(i915->gt[0]));
>>>+
>>>+		val = 0;
>>>+		for_each_gt(gt, i915, i)
>>>+			val = max((s64)val, ktime_to_ns(intel_gt_get_awake_time(gt)));
>>> 		break;
>>> 	}
>>>@@ -728,11 +766,14 @@ static void i915_pmu_event_read(struct perf_event *event)
>>> static void i915_pmu_enable(struct perf_event *event)
>>> {
>>>+	const unsigned int gt_id = config_gt_id(event->attr.config);
>>> 	struct drm_i915_private *i915 =
>>> 		container_of(event->pmu, typeof(*i915), pmu.base);
>>> 	struct i915_pmu *pmu = &i915->pmu;
>>>+	struct intel_gt *gt;
>>> 	unsigned long flags;
>>> 	unsigned int bit;
>>>+	u64 i;
>>> 	bit = event_bit(event);
>>> 	if (bit == -1)
>>>@@ -745,12 +786,42 @@ static void i915_pmu_enable(struct perf_event *event)
>>> 	 * the event reference counter.
>>> 	 */
>>> 	BUILD_BUG_ON(ARRAY_SIZE(pmu->enable_count) != I915_PMU_MASK_BITS);
>>>+	BUILD_BUG_ON(BITS_PER_TYPE(pmu->enable) < I915_PMU_MASK_BITS);
>>> 	GEM_BUG_ON(bit >= ARRAY_SIZE(pmu->enable_count));
>>> 	GEM_BUG_ON(pmu->enable_count[bit] == ~0);
>>> 	pmu->enable |= BIT_ULL(bit);
>>> 	pmu->enable_count[bit]++;
>>>+	/*
>>>+	 * The arrays that i915_pmu maintains are now indexed as
>>>+	 *
>>>+	 * 0 - aggregate events (a.k.a !gt_id)
>>>+	 * 1 - gt0
>>>+	 * 2 - gt1
>>>+	 *
>>>+	 * The same logic applies to event_bit masks. The first set of mask are
>>>+	 * for aggregate, followed by gt0 and gt1 masks. The idea here is to
>>>+	 * enable the event on all gts if the aggregate event bit is set. This
>>>+	 * applies only to the non-engine-events.
>>>+	 */
>>>+	if (!gt_id && !is_engine_event(event)) {
>>>+		for_each_gt(gt, i915, i) {
>>>+			u64 counter = config_counter(event->attr.config);
>>>+			u64 config = ((i + 1) << __I915_PMU_GT_SHIFT) | counter;
>>>+			unsigned int bit = config_bit(config);
>>>+
>>>+			if (bit == -1)
>>>+				continue;
>>>+
>>>+			GEM_BUG_ON(bit >= ARRAY_SIZE(pmu->enable_count));
>>>+			GEM_BUG_ON(pmu->enable_count[bit] == ~0);
>>>+
>>>+			pmu->enable |= BIT_ULL(bit);
>>>+			pmu->enable_count[bit]++;
>>>+		}
>>>+	}
>>>+
>>> 	/*
>>> 	 * Start the sampling timer if needed and not already enabled.
>>> 	 */
>>>@@ -793,6 +864,7 @@ static void i915_pmu_enable(struct perf_event *event)
>>> static void i915_pmu_disable(struct perf_event *event)
>>> {
>>>+	const unsigned int gt_id = config_gt_id(event->attr.config);
>>> 	struct drm_i915_private *i915 =
>>> 		container_of(event->pmu, typeof(*i915), pmu.base);
>>> 	unsigned int bit = event_bit(event);
>>>@@ -822,6 +894,26 @@ static void i915_pmu_disable(struct perf_event *event)
>>> 		 */
>>> 		if (--engine->pmu.enable_count[sample] == 0)
>>> 			engine->pmu.enable &= ~BIT(sample);
>>>+	} else if (!gt_id) {
>>>+		struct intel_gt *gt;
>>>+		u64 i;
>>>+
>>>+		for_each_gt(gt, i915, i) {
>>>+			u64 counter = config_counter(event->attr.config);
>>>+			u64 config = ((i + 1) << __I915_PMU_GT_SHIFT) | counter;
>>>+			unsigned int bit = config_bit(config);
>>>+
>>>+			if (bit == -1)
>>>+				continue;
>>>+
>>>+			GEM_BUG_ON(bit >= ARRAY_SIZE(pmu->enable_count));
>>>+			GEM_BUG_ON(pmu->enable_count[bit] == 0);
>>>+
>>>+			if (--pmu->enable_count[bit] == 0) {
>>>+				pmu->enable &= ~BIT_ULL(bit);
>>>+				pmu->timer_enabled &= pmu_needs_timer(pmu, true);
>>>+			}
>>>+		}
>>> 	}
>>> 	GEM_BUG_ON(bit >= ARRAY_SIZE(pmu->enable_count));
>>>@@ -1002,7 +1094,11 @@ create_event_attributes(struct i915_pmu *pmu)
>>> 		const char *name;
>>> 		const char *unit;
>>> 	} global_events[] = {
>>>+		__event(0, "actual-frequency", "M"),
>>>+		__event(1, "requested-frequency", "M"),
>>> 		__event(2, "interrupts", NULL),
>>>+		__event(3, "rc6-residency", "ns"),
>>>+		__event(4, "software-gt-awake-time", "ns"),
>>> 	};
>>> 	static const struct {
>>> 		enum drm_i915_pmu_engine_sample sample;
>>>@@ -1024,7 +1120,7 @@ create_event_attributes(struct i915_pmu *pmu)
>>> 	/* per gt counters */
>>> 	for_each_gt(gt, i915, j) {
>>> 		for (i = 0; i < ARRAY_SIZE(events); i++) {
>>>-			u64 config = ___I915_PMU_OTHER(j, events[i].counter);
>>>+			u64 config = ___I915_PMU_OTHER(j + 1, events[i].counter);
>>> 			if (!config_status(i915, config))
>>> 				count++;
>>>@@ -1070,7 +1166,7 @@ create_event_attributes(struct i915_pmu *pmu)
>>> 	/* per gt counters */
>>> 	for_each_gt(gt, i915, j) {
>>> 		for (i = 0; i < ARRAY_SIZE(events); i++) {
>>>-			u64 config = ___I915_PMU_OTHER(j, events[i].counter);
>>>+			u64 config = ___I915_PMU_OTHER(j + 1, events[i].counter);
>>> 			char *str;
>>> 			if (config_status(i915, config))
>>>diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h
>>>index a708e44a227e..a4cc1eb218fc 100644
>>>--- a/drivers/gpu/drm/i915/i915_pmu.h
>>>+++ b/drivers/gpu/drm/i915/i915_pmu.h
>>>@@ -38,7 +38,7 @@ enum {
>>> 	__I915_NUM_PMU_SAMPLERS
>>> };
>>>-#define I915_PMU_MAX_GTS (4) /* FIXME */
>>>+#define I915_PMU_MAX_GTS (4 + 1) /* FIXME */
>>> /**
>>>  * How many different events we track in the global PMU mask.
>>>diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>>>index bbab7f3dbeb4..18794c30027f 100644
>>>--- a/include/uapi/drm/i915_drm.h
>>>+++ b/include/uapi/drm/i915_drm.h
>>>@@ -290,6 +290,7 @@ enum drm_i915_pmu_engine_sample {
>>> 	(((__u64)__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x)) | \
>>> 	((__u64)(gt) << __I915_PMU_GT_SHIFT))
>>>+/* Aggregate from all gts */
>>> #define __I915_PMU_OTHER(x) ___I915_PMU_OTHER(0, x)
>>> #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
>>>@@ -300,11 +301,14 @@ enum drm_i915_pmu_engine_sample {
>>> #define I915_PMU_LAST /* Deprecated - do not use */ I915_PMU_RC6_RESIDENCY
>>>-#define __I915_PMU_ACTUAL_FREQUENCY(gt)		___I915_PMU_OTHER(gt, 0)
>>>-#define __I915_PMU_REQUESTED_FREQUENCY(gt)	___I915_PMU_OTHER(gt, 1)
>>>-#define __I915_PMU_INTERRUPTS(gt)		___I915_PMU_OTHER(gt, 2)
>>>-#define __I915_PMU_RC6_RESIDENCY(gt)		___I915_PMU_OTHER(gt, 3)
>>>-#define __I915_PMU_SOFTWARE_GT_AWAKE_TIME(gt)	___I915_PMU_OTHER(gt, 4)
>>>+/* GT specific counters */
>>>+#define ____I915_PMU_OTHER(gt, x) ___I915_PMU_OTHER(((gt) + 1), x)
>>>+
>>>+#define __I915_PMU_ACTUAL_FREQUENCY(gt)		____I915_PMU_OTHER(gt, 0)
>>>+#define __I915_PMU_REQUESTED_FREQUENCY(gt)	____I915_PMU_OTHER(gt, 1)
>>>+#define __I915_PMU_INTERRUPTS(gt)		____I915_PMU_OTHER(gt, 2)
>>>+#define __I915_PMU_RC6_RESIDENCY(gt)		____I915_PMU_OTHER(gt, 3)
>>>+#define __I915_PMU_SOFTWARE_GT_AWAKE_TIME(gt)	____I915_PMU_OTHER(gt, 4)
>>> /* Each region is a minimum of 16k, and there are at most 255 of them.
>>>  */

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Intel-gfx] [PATCH 9/9] drm/i915/pmu: Enable legacy PMU events for MTL
  2023-03-31 13:02       ` Tvrtko Ursulin
@ 2023-04-20 20:12         ` Umesh Nerlige Ramappa
  0 siblings, 0 replies; 28+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-04-20 20:12 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Fri, Mar 31, 2023 at 02:02:40PM +0100, Tvrtko Ursulin wrote:
>
>On 30/03/2023 19:31, Umesh Nerlige Ramappa wrote:
>>+ Joonas for comments on this
>>
>>On Thu, Mar 30, 2023 at 02:38:03PM +0100, Tvrtko Ursulin wrote:
>>>
>>>On 30/03/2023 01:41, Umesh Nerlige Ramappa wrote:
>>>>MTL introduces separate GTs for render and media. This complicates the
>>>>definition of frequency and rc6 counters for the GPU as a whole since
>>>>each GT has an independent counter. The best way to support this change
>>>>is to deprecate the GPU-specific counters and create GT-specific
>>>>counters, however that just breaks ABI. Since perf tools and scripts may
>>>>be decentralized with probably many users, it's hard to deprecate the
>>>>legacy counters and have all the users on board with that.
>>>>
>>>>Re-introduce the legacy counters and support them as min/max of
>>>>GT-specific counters as necessary to ensure backwards compatibility.
>>>>
>>>>I915_PMU_ACTUAL_FREQUENCY - will show max of GT-specific counters
>>>>I915_PMU_REQUESTED_FREQUENCY - will show max of GT-specific counters
>>>>I915_PMU_INTERRUPTS - no changes since it is GPU specific on all 
>>>>platforms
>>>>I915_PMU_RC6_RESIDENCY - will show min of GT-specific counters
>>>>I915_PMU_SOFTWARE_GT_AWAKE_TIME - will show max of GT-specific counters
>>>
>>>IMO max/min games are _very_ low value and probably just confusing.
>>
>>By value, do you mean ROI or actually that the values would be incorrect?
>
>Both really.
>
>>>I am not convinced we need to burden the kernel with this. New 
>>>platform, new counters.. userspace can just deal with it.
>>
>>I agree and would prefer to drop this patch. There are some counter 
>>arguments, I have added Joonas here for comments.
>>
>>1) an app/script hard-coded with the legacy events would be used on 
>>a new platform and fail and we should maintain backwards 
>>compatibility.
>
>I thought we pretty much agreed multiple times in the past (on 
>different topics) that a new platform can require new userspace.
>
>PMU is probably even a more clear cut case since it is exposing 
>hardware counters (or close) so sometimes it is not even theoretically 
>possible to preserve "backward" compatibility.
>
>(I double quote backward because I think real backward compatibility 
>does not apply on a new platform. And MTL is under force probe still.)
>
>So for me it all comes under the "would be nice" category. But since 
>we need to add kernel code to do it, code which say intel_gpu_top 
>could run in userspace, I am not at all convinced it would be a good 
>idea.
>
>The aggregated counters wouldn't even be giving the full picture.
>
>So I'd simply add tiles/gt support to intel_gpu_top. Same as it 
>currently can do "-p" on the command line, or '1' in the interactive 
>mode, to aggregate the engine classes into one line item, I'd extend 
>that concept into frequencies and RC6.
>
>By default we start with normalized values and in physical mode we 
>show separate counters per tile/gt.
>
>Someone running old intel_gpu_top on MTL gets to see nothing since the 
>counter names are different. Which is IMO fine - better than showing 
>tile 0 data, or some minimums/maximums from one tile only.
>
>>2) the sysfs attributes for rc6/frequency have already adopted an 
>>aggregate vs gt0/gt1 approach to address that and pmu should have a 
>>similar solution (or rather, PMU and the sysfs approaches should 
>>match based on whatever is the approach)
>
>Yeah I disagreed with min/max reads in sysfs too and am pretty sure I 
>expressed that at the time. :shrug:
>
>But I don't think there is a strong argument that PMU needs to follow.
>
>Only impact is to people who access perf_event_open directly so yeah, 
>if there are such users, they will need to add multi-tile support.

I discussed with Joonas offline and I guess I had a wrong idea regarding 
ABI. Looks like ABI is only broken if we remove something that existed 
for a platform, so this does not break ABI for MTL. The motivation was 
to have per-platform differences smaller for applications/UMD.

The other aspect he pointed out was that we should not push anything 
that does not have an IGT. For aggregate events, we do not have any 
plans to support IGTs (large effort as well as no clear way to support 
such aggregation).

In short, I will drop this patch and post what you originally had for 
multi-gt PMU support.

Thanks,
Umesh
>
>Regards,
>
>Tvrtko
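
For reference, the max/min aggregation discussed in this thread could be
done in userspace rather than in the kernel. Below is a minimal sketch in
C, assuming the per-GT values have already been read from the GT-specific
events. The struct and function names (gt_sample, gpu_summary,
summarize_gts) are hypothetical and are not part of the i915 uAPI.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-GT sample; field names are illustrative only. */
struct gt_sample {
	uint64_t actual_freq_mhz;
	uint64_t requested_freq_mhz;
	uint64_t rc6_residency_ns;
};

/* Device-wide summary built from the per-GT samples. */
struct gpu_summary {
	uint64_t actual_freq_mhz;    /* max over all GTs */
	uint64_t requested_freq_mhz; /* max over all GTs */
	uint64_t rc6_residency_ns;   /* min over all GTs */
};

static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }
static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }

/*
 * Fold per-GT samples into one summary: max for the frequencies,
 * min for RC6 residency. Returns 0 on success, -1 if there is
 * nothing to aggregate.
 */
static int summarize_gts(const struct gt_sample *gt, size_t num_gt,
			 struct gpu_summary *out)
{
	size_t i;

	if (!gt || !out || num_gt == 0)
		return -1;

	/* Seed the summary with GT0, then fold in the remaining GTs. */
	out->actual_freq_mhz = gt[0].actual_freq_mhz;
	out->requested_freq_mhz = gt[0].requested_freq_mhz;
	out->rc6_residency_ns = gt[0].rc6_residency_ns;

	for (i = 1; i < num_gt; i++) {
		out->actual_freq_mhz = max_u64(out->actual_freq_mhz,
					       gt[i].actual_freq_mhz);
		out->requested_freq_mhz = max_u64(out->requested_freq_mhz,
						  gt[i].requested_freq_mhz);
		out->rc6_residency_ns = min_u64(out->rc6_residency_ns,
						gt[i].rc6_residency_ns);
	}

	return 0;
}
```

As pointed out in the thread, a summary like this does not give the full
picture, so a tool would likely want to show the per-GT values as well.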


end of thread, other threads:[~2023-04-20 20:13 UTC | newest]

Thread overview: 28+ messages
2023-03-30  0:40 [Intel-gfx] [PATCH 0/7] Add MTL PMU support for multi-gt Umesh Nerlige Ramappa
2023-03-30  0:40 ` [Intel-gfx] [PATCH 1/9] drm/i915/pmu: Support PMU for all engines Umesh Nerlige Ramappa
2023-03-30 12:27   ` Tvrtko Ursulin
2023-03-30  0:40 ` [Intel-gfx] [PATCH 2/9] drm/i915/pmu: Skip sampling engines with no enabled counters Umesh Nerlige Ramappa
2023-03-30  0:40 ` [Intel-gfx] [PATCH 3/9] drm/i915/pmu: Transform PMU parking code to be GT based Umesh Nerlige Ramappa
2023-03-30  0:40 ` [Intel-gfx] [PATCH 4/9] drm/i915/pmu: Add reference counting to the sampling timer Umesh Nerlige Ramappa
2023-03-30  0:40 ` [Intel-gfx] [PATCH 5/9] drm/i915/pmu: Prepare for multi-tile non-engine counters Umesh Nerlige Ramappa
2023-03-30 12:39   ` Tvrtko Ursulin
2023-03-30 22:28     ` Dixit, Ashutosh
2023-03-31  8:22       ` Tvrtko Ursulin
2023-03-30  0:41 ` [Intel-gfx] [PATCH 6/9] drm/i915/pmu: Export counters from all tiles Umesh Nerlige Ramappa
2023-03-30 13:01   ` Tvrtko Ursulin
2023-03-30 17:33     ` Umesh Nerlige Ramappa
2023-03-31  8:57       ` Tvrtko Ursulin
2023-03-30  0:41 ` [Intel-gfx] [PATCH 7/9] drm/i915/pmu: Use a helper to convert to MHz Umesh Nerlige Ramappa
2023-03-30 13:13   ` Tvrtko Ursulin
2023-03-30  0:41 ` [Intel-gfx] [PATCH 8/9] drm/i915/pmu: Split reading engine and other events into helpers Umesh Nerlige Ramappa
2023-03-30 13:26   ` Tvrtko Ursulin
2023-03-30  0:41 ` [Intel-gfx] [PATCH 9/9] drm/i915/pmu: Enable legacy PMU events for MTL Umesh Nerlige Ramappa
2023-03-30 13:38   ` Tvrtko Ursulin
2023-03-30 18:31     ` Umesh Nerlige Ramappa
2023-03-31 13:02       ` Tvrtko Ursulin
2023-04-20 20:12         ` Umesh Nerlige Ramappa
2023-04-03 19:16       ` Umesh Nerlige Ramappa
2023-03-30  1:37 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Add MTL PMU support for multi-gt Patchwork
2023-03-30  1:37 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2023-03-30  1:46 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2023-03-30 19:50 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
